Eissa, T.* ; Huber, M.* ; Obermayer-Pietsch, B.* ; Linkohr, B. ; Peters, A. ; Fleischmann, F.* ; Zigman, M.*
CODI: Enhancing machine learning-based molecular profiling through contextual out-of-distribution integration.
PNAS Nexus 3:pgae449 (2024)
Molecular analytics increasingly utilize machine learning (ML) for predictive modeling based on data acquired through molecular profiling technologies. However, developing robust models that accurately capture physiological phenotypes is challenged by the dynamics inherent to biological systems, variability stemming from analytical procedures, and the resource-intensive nature of obtaining sufficiently representative datasets. Here, we propose and evaluate a new method: Contextual Out-of-Distribution Integration (CODI). Based on experimental observations, CODI generates synthetic data that integrate unrepresented sources of variation encountered in real-world applications into a given molecular fingerprint dataset. By augmenting a dataset with out-of-distribution variance, CODI enables an ML model to better generalize to samples beyond the seed training data, reducing the need for extensive experimental data collection. Using three independent longitudinal clinical studies and a case-control study, we demonstrate CODI's application to several classification tasks involving vibrational spectroscopy of human blood. We showcase our approach's ability to enable personalized fingerprinting for multiyear longitudinal molecular monitoring and enhance the robustness of trained ML models for improved disease detection. Our comparative analyses reveal that incorporating CODI into the classification workflow consistently leads to increased robustness against data variability and improved predictive accuracy.
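Illustrative note (not part of the published abstract): the general idea of augmenting a seed dataset with experimentally observed variation can be sketched as below. This is a minimal, generic sketch and not the authors' CODI algorithm. The function name `augment_with_observed_variation`, the `deltas` input (difference vectors assumed to come from repeated measurements), and the uniform random weighting are all assumptions made for illustration.

```python
import random


def augment_with_observed_variation(X, y, deltas, n_copies=3, max_weight=1.0, seed=0):
    """Extend (X, y) with synthetic samples built from observed variation.

    X      : list of feature vectors (the seed fingerprints)
    y      : list of labels, one per row of X
    deltas : list of difference vectors with the same length as each row of X;
             assumed here to be measured experimentally (hypothetical input)
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    if any(len(d) != n_features for d in deltas):
        raise ValueError("each delta must have as many features as the rows of X")

    # Start from copies of the original samples so the inputs are not mutated.
    X_aug = [list(row) for row in X]
    y_aug = list(y)

    for row, label in zip(X, y):
        for _ in range(n_copies):
            delta = rng.choice(deltas)            # pick one observed variation
            w = rng.uniform(0.0, max_weight)      # assumed random strength
            X_aug.append([v + w * d for v, d in zip(row, delta)])
            y_aug.append(label)                   # the label is kept unchanged

    return X_aug, y_aug


if __name__ == "__main__":
    X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    y = [0, 1]
    deltas = [[0.1, 0.0, -0.1]]
    X_aug, y_aug = augment_with_observed_variation(X, y, deltas, n_copies=2)
    print(len(X_aug), y_aug)
```

The augmented set could then be passed to any classifier in place of the seed data; how the variation is actually modeled and integrated is described in the article itself.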
Publication type
Article: Journal article
Document type
Scientific Article
Keywords
Data Augmentation ; Machine Learning ; Molecular Analytics ; Out-of-distribution ; Variability Modeling ; Metabolic Phenotypes ; Spectroscopy ; Hallmarks ; Cancer
ISSN (print) / ISBN
2752-6542
e-ISSN
2752-6542
Source details
Volume: 3
Issue: 10
Article Number: pgae449
Publisher
Oxford University Press
Publishing Place
Great Clarendon St, Oxford OX2 6DP, England
Reviewing status
Peer reviewed
Grants
Styrian Business Promotion Agency (SFG)
Austrian Federal Ministry of Economics and Labour/the Federal Ministry of Economy, Family and Youth (BMWA/BMWFJ)
Austrian Research Fund, as a COMET K-project - Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT)
State of Bavaria
Helmholtz Zentrum München - German Research Center for Environmental Health - German Federal Ministry of Education and Research (BMBF)
LMU Munich, Centre for Advanced Laser Applications (CALA)