Heumos, L. ; Ehmele, P. ; Treis, T. ; Upmeier Zu Belzen, J.* ; Roellin, E. ; May, L. ; Namsaraeva, A. ; Horlava, N. ; Shitov, V.A. ; Zhang, X. ; Zappia, L. ; Knöll, R.* ; Lang, N.J. ; Hetzel, L. ; Virshup, I. ; Sikkema, L. ; Curion, F. ; Eils, R.* ; Schiller, H. ; Hilgendorff, A. ; Theis, F.J.
An open-source framework for end-to-end analysis of electronic health record data.
Nat. Med., DOI: 10.1038/s41591-024-03214-0 (2024)
With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy's features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.
Publication type
Article: Journal article
Document type
Scientific article
Keywords
Clinical-trials; Primary-care; Missing Data; Resource; Common; Associations; Discovery; Multiple; Profile; Models
Language
English
Publication year
2024
HGF reporting year
2024
ISSN (print) / ISBN
1078-8956
e-ISSN
1546-170X
Publisher
Nature Publishing Group
Place of publication
New York, NY
Review status
Peer reviewed
POF Topic(s)
30205 - Bioengineering and Digital Health
80000 - German Center for Lung Research
30202 - Environmental Health
Research field(s)
Enabling and Novel Technologies
Lung Research
PSP element(s)
G-503800-001
G-503800-004
G-503893-001
G-501800-810
G-501693-001
G-552100-001
Funding
Chan Zuckerberg Initiative
Federal Ministry of Education and Research
Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA) through the DAAD
European Union (ERC)
German Federal Ministry of Education and Research (BMBF)
Helmholtz Association
German Center for Lung Research (DZL)
Date of record entry
2024-10-23