TY - JOUR
AB - Large language models based on the transformer deep learning architecture have revolutionized natural language processing. Motivated by the analogy between human language and the genome’s biological code, researchers have begun to develop genome language models (gLMs) based on transformers and related architectures. This Review explores the use of transformers and language models in genomics. We survey open questions in genomics amenable to the use of gLMs, and motivate the use of gLMs and the transformer architecture for these problems. We discuss the potential of gLMs for modelling the genome using unsupervised pretraining tasks, specifically focusing on the power of zero- and few-shot learning. We explore the strengths and limitations of the transformer architecture, as well as those of current gLMs more broadly. Additionally, we contemplate the future of genomic modelling beyond the transformer architecture, based on current trends in research. This Review serves as a guide for computational biologists and computer scientists interested in transformers and language models for genomic data.
AU - Consens, M.E.*
AU - Dufault, C.*
AU - Wainberg, M.*
AU - Forster, D.*
AU - Karimzadeh, M.*
AU - Goodarzi, H.*
AU - Theis, F.J.
AU - Moses, A.*
AU - Wang, B.*
C1 - 73799
C2 - 57233
TI - Transformers and genome language models.
JO - Nat. Mach. Intell.
PY - 2025
SN - 2522-5839
ER -
TY - JOUR
AB - Self-supervised learning (SSL) has emerged as a powerful method for extracting meaningful representations from vast, unlabelled datasets, transforming computer vision and natural language processing. In single-cell genomics (SCG), representation learning offers insights into complex biological data, especially with emerging foundation models. However, identifying scenarios in SCG where SSL outperforms traditional learning methods remains a nuanced challenge. Furthermore, selecting the most effective pretext tasks within the SSL framework for SCG is a critical yet unresolved question. Here we address this gap by adapting and benchmarking SSL methods in SCG, including masked autoencoders with multiple masking strategies and contrastive learning methods. Models trained on over 20 million cells were examined across multiple downstream tasks, including cell-type prediction, gene-expression reconstruction, cross-modality prediction and data integration. Our empirical analyses underscore the nuanced role of SSL, namely, in transfer learning scenarios leveraging auxiliary data or analysing unseen datasets. Masked autoencoders excel over contrastive methods in SCG, diverging from computer vision trends. Moreover, our findings reveal the notable capabilities of SSL in zero-shot settings and its potential in cross-modality prediction and data integration. In summary, we study SSL methods in SCG on fully connected networks and benchmark their utility across key representation learning scenarios.
AU - Richter, T.
AU - Bahrami, M.
AU - Xia, Y.*
AU - Fischer, D.S.
AU - Theis, F.J.
C1 - 72924
C2 - 56785
TI - Delineating the effective use of self-supervised learning in single-cell genomics.
JO - Nat. Mach. Intell.
PY - 2025
SN - 2522-5839
ER -
TY - JOUR
AB - A chief goal of artificial intelligence is to build machines that think like people. Yet it has been argued that deep neural network architectures fail to accomplish this. Researchers have asserted these models’ limitations in the domains of causal reasoning, intuitive physics and intuitive psychology. Yet recent advancements, namely the rise of large language models, particularly those designed for visual processing, have rekindled interest in the potential to emulate human-like cognitive abilities. This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning and intuitive psychology. Through a series of controlled experiments, we investigate the extent to which these modern models grasp complex physical interactions, causal relationships and intuitive understanding of others’ preferences. Our findings reveal that, while some of these models demonstrate a notable proficiency in processing and interpreting visual data, they still fall short of human capabilities in these areas. Our results emphasize the need for integrating more robust mechanisms for understanding causality, physical dynamics and social cognition into modern-day, vision-based language models, and point out the importance of cognitively inspired benchmarks.
AU - Schulze Buschoff, L.M.
AU - Akata, E.
AU - Bethge, M.*
AU - Schulz, E.
C1 - 73619
C2 - 57140
SP - 96-106
TI - Visual cognition in multimodal large language models.
JO - Nat. Mach. Intell.
VL - 7
IS - 1
PY - 2025
SN - 2522-5839
ER -
TY - JOUR
AB - Although federated learning is often seen as a promising solution to allow AI innovation while addressing privacy concerns, we argue that this technology does not fix all underlying data ethics concerns. Benefiting from federated learning in digital health requires acknowledgement of its limitations.
AU - Bak, M.*
AU - Madai, V.I.*
AU - Celi, L.A.*
AU - Kaissis, G.
AU - Cornet, R.*
AU - Maris, M.*
AU - Rueckert, D.*
AU - Buyx, A.*
AU - McLennan, S.*
C1 - 70318
C2 - 55511
SP - 370-372
TI - Federated learning is not a cure-all for data ethics.
JO - Nat. Mach. Intell.
VL - 6
PY - 2024
SN - 2522-5839
ER -
TY - JOUR
AB - Artificial intelligence (AI) models are vulnerable to information leakage of their training data, which can be highly sensitive, for example, in medical imaging. Privacy-enhancing technologies, such as differential privacy (DP), aim to circumvent these susceptibilities. DP is the strongest possible protection for training models while bounding the risks of inferring the inclusion of training samples or reconstructing the original data. DP achieves this by setting a quantifiable privacy budget. Although a lower budget decreases the risk of information leakage, it typically also reduces the performance of such models. This imposes a trade-off between robust performance and stringent privacy. Additionally, the interpretation of a privacy budget remains abstract and challenging to contextualize. Here we contrast the performance of artificial intelligence models at various privacy budgets against both theoretical risk bounds and empirical success of reconstruction attacks. We show that using very large privacy budgets can render reconstruction attacks impossible, while drops in performance are negligible. We thus conclude that not using DP at all is negligent when applying artificial intelligence models to sensitive data. We deem our results to lay a foundation for further debates on striking a balance between privacy risks and model performance.
AU - Ziller, A.*
AU - Mueller, T.T.*
AU - Stieger, S.
AU - Feiner, L.F.*
AU - Brandt, J.*
AU - Braren, R.*
AU - Rueckert, D.*
AU - Kaissis, G.
C1 - 70929
C2 - 55818
CY - Heidelberger Platz 3, Berlin, 14197, Germany
TI - Reconciling privacy and accuracy in AI for medical imaging.
JO - Nat. Mach. Intell.
PB - Nature Portfolio
PY - 2024
SN - 2522-5839
ER -
TY - JOUR
AB - The rise of artificial intelligence (AI) has relied on an increasing demand for energy, which threatens to outweigh its promised positive effects. To steer AI onto a more sustainable path, quantifying and comparing its energy consumption is key.
AU - Debus, C.*
AU - Piraud, M.
AU - Streit, A.*
AU - Theis, F.J.
AU - Götz, M.*
C1 - 68797
C2 - 53723
CY - Heidelberger Platz 3, Berlin, 14197, Germany
SP - 1176-1178
TI - Reporting electricity consumption is essential for sustainable AI.
JO - Nat. Mach. Intell.
VL - 5
IS - 11
PB - Nature Portfolio
PY - 2023
SN - 2522-5839
ER -
TY - JOUR
AB - Multispectral optoacoustic tomography is a high-resolution functional imaging modality that can non-invasively access a broad range of pathophysiological phenomena. Real-time imaging would enable translation of multispectral optoacoustic tomography into clinical imaging, visualize dynamic pathophysiological changes associated with disease progression and enable in situ diagnoses. Model-based reconstruction affords state-of-the-art optoacoustic images but cannot be used for real-time imaging. On the other hand, deep learning enables fast reconstruction of optoacoustic images, but the lack of experimental ground-truth training data leads to reduced image quality for in vivo scans. In this work we achieve accurate optoacoustic image reconstruction in 31 ms per image for arbitrary (experimental) input data by expressing model-based reconstruction with a deep neural network. The proposed deep learning framework, DeepMB, generalizes to experimental test data through training on optoacoustic signals synthesized from real-world images and ground-truth optoacoustic images generated by model-based reconstruction. Based on qualitative and quantitative evaluation on a diverse dataset of in vivo images, we show that DeepMB reconstructs images approximately 1,000-times faster than the iterative model-based reference method while affording near-identical image quality. Accurate and real-time image reconstruction with DeepMB can enable full access to the high-resolution and multispectral contrast of handheld optoacoustic tomography, and thus facilitate its adoption into clinical routines.
AU - Dehner, C.
AU - Zahnd, G.
AU - Ntziachristos, V.
AU - Jüstel, D.
C1 - 68195
C2 - 53616
CY - Heidelberger Platz 3, Berlin, 14197, Germany
SP - 1130-1141
TI - A deep neural network for real-time optoacoustic image reconstruction with adjustable speed of sound.
JO - Nat. Mach. Intell.
VL - 5
PB - Nature Portfolio
PY - 2023
SN - 2522-5839
ER -
TY - JOUR
AB - Biomedical image analysis algorithm validation depends on high-quality annotation of reference datasets, for which labelling instructions are key. Despite their importance, their optimization remains largely unexplored. Here we present a systematic study of labelling instructions and their impact on annotation quality in the field. Through comprehensive examination of professional practice and international competitions registered at the Medical Image Computing and Computer Assisted Intervention Society, the largest international society in the biomedical imaging field, we uncovered a discrepancy between annotators’ needs for labelling instructions and their current quality and availability. On the basis of an analysis of 14,040 images annotated by 156 annotators from four professional annotation companies and 708 Amazon Mechanical Turk crowdworkers using instructions with different information density levels, we further found that including exemplary images substantially boosts annotation performance compared with text-only descriptions, while solely extending text descriptions does not. Finally, professional annotators consistently outperform Amazon Mechanical Turk crowdworkers. Our study raises awareness of the need for quality standards in biomedical image analysis labelling instructions.
AU - Rädsch, T.*
AU - Reinke, A.*
AU - Weru, V.*
AU - Tizabi, M.D.*
AU - Schreck, N.*
AU - Kavur, A.E.*
AU - Pekdemir, B.
AU - Roß, T.*
AU - Kopp-Schneider, A.*
AU - Maier-Hein, L.*
C1 - 67589
C2 - 53597
CY - Heidelberger Platz 3, Berlin, 14197, Germany
SP - 273-283
TI - Labelling instructions matter in biomedical image analysis.
JO - Nat. Mach. Intell.
VL - 5
IS - 3
PB - Nature Portfolio
PY - 2023
SN - 2522-5839
ER -
TY - JOUR
AB - With the advent of deep learning and increasing use of brain MRIs, a great amount of interest has arisen in automated anomaly segmentation to improve clinical workflows; however, it is time-consuming and expensive to curate medical imaging data. Moreover, data are often scattered across many institutions, with privacy regulations hampering their use. Here we present FedDis to collaboratively train an unsupervised deep convolutional autoencoder on 1,532 healthy magnetic resonance scans from four different institutions, and evaluate its performance in identifying pathologies such as multiple sclerosis, vascular lesions, and low- and high-grade tumours/glioblastoma on a total of 538 volumes from six different institutions. To mitigate the statistical heterogeneity among different institutions, we disentangle the parameter space into global (shape) and local (appearance) parameters. Four institutes jointly train shape parameters to model healthy brain anatomical structures. Every institute trains appearance parameters locally to allow for client-specific personalization of the global domain-invariant features. We have shown that our collaborative approach, FedDis, improves anomaly segmentation results by 99.74% for multiple sclerosis, 83.33% for vascular lesions and 40.45% for tumours over locally trained models without the need for annotations or sharing of private local data. We found that FedDis is especially beneficial for institutes that share both healthy and anomaly data, improving their local model performance by up to 227% for multiple sclerosis lesions and 77% for brain tumours.
AU - Bercea, C.-I.
AU - Wiestler, B.*
AU - Rueckert, D.*
AU - Albarqouni, S.
C1 - 66076
C2 - 52816
SP - 685-695
TI - Federated disentangled representation learning for unsupervised brain anomaly detection.
JO - Nat. Mach. Intell.
VL - 4
IS - 8
PY - 2022
SN - 2522-5839
ER -
TY - JOUR
AB - The increase in available high-throughput molecular data creates computational challenges for the identification of cancer genes. Genetic as well as non-genetic causes contribute to tumorigenesis, and this necessitates the development of predictive models to effectively integrate different data modalities while being interpretable. We introduce EMOGI, an explainable machine learning method based on graph convolutional networks to predict cancer genes by combining multiomics pan-cancer data—such as mutations, copy number changes, DNA methylation and gene expression—together with protein–protein interaction (PPI) networks. EMOGI was on average more accurate than other methods across different PPI networks and datasets. We used layer-wise relevance propagation to stratify genes according to whether their classification was driven by the interactome or any of the omics levels, and to identify important modules in the PPI network. We propose 165 novel cancer genes that do not necessarily harbour recurrent alterations but interact with known cancer genes, and we show that they correspond to essential genes from loss-of-function screens. We believe that our method can open new avenues in precision oncology and be applied to predict biomarkers for other complex diseases.
AU - Schulte-Sasse, R.*
AU - Budach, S.*
AU - Hnisz, D.*
AU - Marsico, A.
C1 - 61856
C2 - 50487
CY - Campus, 4 Crinan St, London, N1 9XW, England
SP - 513-526
TI - Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms.
JO - Nat. Mach. Intell.
VL - 3
PB - Springer Nature
PY - 2021
SN - 2522-5839
ER -
TY - JOUR
AB - Access to large, annotated samples represents a considerable challenge for training accurate deep-learning models in medical imaging. Although at present transfer learning from pre-trained models can help with cases lacking data, this limits design choices and generally results in the use of unnecessarily large models. Here we propose a self-supervised training scheme for obtaining high-quality, pre-trained networks from unlabelled, cross-modal medical imaging data, which will allow the creation of accurate and efficient models. We demonstrate the utility of the scheme by accurately predicting retinal thickness measurements based on optical coherence tomography from simple infrared fundus images. Subsequently, learned representations outperformed advanced classifiers on a separate diabetic retinopathy classification task in a scenario of scarce training data. Our cross-modal, three-stage scheme effectively replaced 26,343 diabetic retinopathy annotations with 1,009 semantic segmentations on optical coherence tomography and reached the same classification accuracy using only 25% of fundus images, without any drawbacks, since optical coherence tomography is not required for predictions. We expect this concept to apply to other multimodal clinical imaging, health records and genomics data, and to corresponding sample-starved learning problems.
AU - Holmberg, O.
AU - Köhler, N.
AU - Martins, T.*
AU - Siedlecki, J.*
AU - Herold, T.*
AU - Keidel, L.*
AU - Asani, B.*
AU - Schiefelbein, J.*
AU - Priglinger, S.*
AU - Kortuem, K.U.*
AU - Theis, F.J.
C1 - 60501
C2 - 49354
CY - Campus, 4 Crinan St, London, N1 9XW, England
SP - 719-726
TI - Self-supervised retinal thickness prediction enables deep learning from unlabelled data to boost classification of diabetic retinopathy.
JO - Nat. Mach. Intell.
VL - 2
IS - 11
PB - Springer Nature
PY - 2020
SN - 2522-5839
ER -
TY - JOUR
AB - Reliable recognition of malignant white blood cells is a key step in the diagnosis of haematologic malignancies such as acute myeloid leukaemia. Microscopic morphological examination of blood cells is usually performed by trained human examiners, making the process tedious, time-consuming and hard to standardize. Here, we compile an annotated image dataset of over 18,000 white blood cells, use it to train a convolutional neural network for leukocyte classification and evaluate the network’s performance by comparing it with inter- and intra-expert variability. The network classifies the most important cell types with high accuracy. It also allows us to answer two clinically relevant questions with human-level performance: (1) whether a given cell has blast character and (2) whether it belongs to the cell types normally present in non-pathological blood smears. Our approach holds the potential to be used as a classification aid for examining much larger numbers of cells in a smear than can usually be done by a human expert. This will allow clinicians to recognize malignant cell populations with lower prevalence at an earlier stage of the disease.
AU - Matek, C.
AU - Schwarz, S.*
AU - Spiekermann, K.*
AU - Marr, C.
C1 - 57413
C2 - 47752
SP - 538-544
TI - Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks.
JO - Nat. Mach. Intell.
VL - 1
PY - 2019
SN - 2522-5839
ER -