Siebenmorgen, T. ; Cardoso Micu Menezes, F.M. ; Benassou, S.* ; Merdivan, E. ; Didi, K.* ; Mourao, A. ; Kitel, R.* ; Liò, P.* ; Kesselheim, S.* ; Piraud, M. ; Theis, F.J. ; Sattler, M. ; Popowicz, G.M.
MISATO: Machine learning dataset of protein-ligand complexes for structure-based drug discovery.
Nat. Comput. Sci. 4, 367–378 (2024)
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Publication type
Article: Journal article
Document type
Scientific article
Keywords
Scoring Function; Force-field; Binding; Affinity; Efficient; Models; Parameterization; Generation; Prediction; Accuracy
Language
English
Publication year
2024
HGF reporting year
2024
ISSN (print) / ISBN
2662-8457
e-ISSN
2662-8457
Source details
Volume: 4
Pages: 367–378
Publisher
Springer
Place of publication
Campus, 4 Crinan St, London, N1 9XW, England
Review status
Peer reviewed
POF Topic(s)
30203 - Molecular Targets and Therapies
30205 - Bioengineering and Digital Health
Research field(s)
Enabling and Novel Technologies
PSP element(s)
G-503000-001
G-530001-001
G-503800-001
Funding
Helmholtz Association's Initiative and Networking Fund on the HAICORE@FZJ partition
BMBF
Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research)
Date of entry
2024-07-26