PuSH - Publication Server of Helmholtz Zentrum München

Van Herck, J.* ; Gil, M.V.* ; Jablonka, K.M.* ; Abrudan, A.* ; Anker, A.S.* ; Asgari, M.* ; Blaiszik, B.* ; Buffo, A.* ; Choudhury, L.* ; Corminboeuf, C.* ; Daglar, H.* ; Elahi, A.M.* ; Foster, I.T.* ; García, S.A.* ; Garvin, M.* ; Godin, G.* ; Good, L.L.* ; Gu, J.* ; Xiao Hu, N.* ; Jin, X.* ; Junkers, T.* ; Keskin, S.* ; Knowles, T.P.J.* ; Laplaza, R.* ; Lessona, M.* ; Majumdar, S.K.* ; Mashhadimoslem, H.* ; McIntosh, R.D.* ; Moosavi, S.M.* ; Mouriño, B.* ; Nerli, F.* ; Pevida, C.* ; Poudineh, N.* ; Rajabi-Kochi, M.* ; Saar, K.L.* ; Hooriabad Saboor, F.* ; Sagharichiha, M.* ; Schmidt, K.J.* ; Shi, J.* ; Simone, E.* ; Svatunek, D.* ; Taddei, M.* ; Tetko, I.V. ; Tolnai, D.* ; Vahdatifar, S.* ; Whitmer, J.* ; Wieland, D.C.F.* ; Willumeit-Römer, R.* ; Züttel, A.* ; Smit, B.*

Assessment of fine-tuned large language models for real-world chemistry and material science applications.

Chem. Sci. 16, 670-684 (2025)
Free journal
Creative Commons license
Green Open Access is possible once the postprint has been submitted to the ZB.
The current generation of large language models (LLMs) has limited chemical knowledge. Recently, it has been shown that these LLMs can learn and predict chemical properties through fine-tuning. Using natural language to train machine learning models opens doors to a wider chemical audience, as field-specific featurization techniques can be omitted. In this work, we explore the potential and limitations of this approach. We studied the performance of fine-tuning three open-source LLMs (GPT-J-6B, Llama-3.1-8B, and Mistral-7B) for a range of different chemical questions. We benchmark their performances against "traditional" machine learning models and find that, in most cases, the fine-tuning approach is superior for a simple classification problem. Depending on the size of the dataset and the type of questions, we also successfully address more sophisticated problems. The most important conclusions of this work are that, for all datasets considered, their conversion into an LLM fine-tuning training set is straightforward and that fine-tuning with even relatively small datasets leads to predictive models. These results suggest that the systematic use of LLMs to guide experiments and simulations will be a powerful technique in any research study, significantly reducing unnecessary experiments or computations.
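The abstract notes that converting a dataset into an LLM fine-tuning training set is straightforward. A minimal sketch of what such a conversion could look like is given below. It is an illustration under stated assumptions, not the authors' pipeline: the question template, the `prompt`/`completion` field names, the helper names `to_finetune_records` and `to_jsonl`, and the toy rows are all hypothetical.

```python
import json


def to_finetune_records(rows, property_name):
    """Turn (representation, label) pairs into prompt/completion records.

    The question template is an assumption made for this sketch; the
    abstract does not specify the wording used in the paper.
    """
    records = []
    for representation, label in rows:
        records.append({
            "prompt": f"What is the {property_name} of {representation}?",
            "completion": str(label),
        })
    return records


def to_jsonl(records):
    """Serialize the records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)


# Placeholder rows: SMILES strings with made-up binary class labels.
# These are NOT measured values and only show the shape of the data.
rows = [("CCO", 1), ("c1ccccc1", 0)]
records = to_finetune_records(rows, "solubility class")
print(to_jsonl(records))
```

Because every example is plain text, no field-specific featurization step appears in the sketch, which is the point the abstract makes about lowering the barrier for a wider chemical audience.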
Impact Factor 7.400
Scopus SNIP 0.000
Web of Science Times Cited 1
Publication type Article: Journal article
Document type Scientific article
Language English
Publication year 2025
Prepublished in 2024
HGF reporting year 2024
ISSN (print) / ISBN 2041-6520
e-ISSN 2041-6539
Journal Chemical Science
Source details Volume: 16, Issue: 2, Pages: 670-684
Publisher Royal Society of Chemistry (RSC)
Place of publication Thomas Graham House, Science Park, Milton Rd, Cambridge CB4 0WF, Cambs, England
Review status Peer reviewed
POF Topic(s) 30203 - Molecular Targets and Therapies
Research field(s) Enabling and Novel Technologies
PSP element(s) G-503000-001
Funding UK Research and Innovation (UKRI) under the UK government's Horizon Europe
European Research Council (ERC)
Data Sciences Institute at the University of Toronto
Novo Nordisk Foundation
USorb-DAC Project through Grantham Foundation for the Protection of the Environment
Carl Zeiss Foundation
European Regional Development Fund (ERDF)
Galician Government
Spanish Ministry of Science and Innovation
Spanish National Research Council (CSIC)
European Union NextGenerationEU/PRTR
Spanish Agencia Estatal de Investigacion (AEI) - MICIU/AEI
European Research Council under the European Union's Horizon 2020 research and innovation program through the ERC grant DiProPhys
National Institutes of Health Oxford-Cambridge Scholars Program
Cambridge Trust's Cambridge International Scholarship

European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme
European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant
NCCR MARVEL, a National Centre of Competence in Research - Swiss National Science Foundation
Italian MUR
St. John's College Research Fellowship programme
Rhodes Trust
Schmidt Science Fellowship
Frances and Augustus Newman Foundation
European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)
Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases at the National Institutes of Health
Swiss Science Foundation
Scopus ID 85212108442
PubMed ID 39664810
Record creation date 2024-12-13