TY  - JOUR
AB  - Quantitative mass spectrometry has revolutionized proteomics by enabling simultaneous quantification of thousands of proteins. Pooling patient-derived data from multiple institutions enhances statistical power but raises serious privacy concerns. Here we introduce FedProt, the first privacy-preserving tool for collaborative differential protein abundance analysis of distributed data, which utilizes federated learning and additive secret sharing. In the absence of a multicenter patient-derived dataset for evaluation, we created two: one at five centers from E. coli experiments and one at three centers from human serum. Evaluations using these datasets confirm that FedProt achieves accuracy equivalent to the DEqMS method applied to pooled data, with completely negligible absolute differences no greater than 4 × 10⁻¹². By contrast, -log10(P) computed by the most accurate meta-analysis methods diverged from the centralized analysis results by up to 25–26.
AU  - Burankova, Y.*
AU  - Abele, M.*
AU  - Bakhtiari, M.*
AU  - von Toerne, C.
AU  - Barth, T.K.*
AU  - Schweizer, L.*
AU  - Giesbertz, P.*
AU  - Schmidt, J.R.*
AU  - Kalkhof, S.*
AU  - Müller-Deile, J.*
AU  - van Veelen, P.A.*
AU  - Mohammed, Y.*
AU  - Hammer, E.*
AU  - Arend, L.*
AU  - Adamowicz, K.*
AU  - Laske, T.*
AU  - Hartebrodt, A.*
AU  - Frisch, T.*
AU  - Meng, C.*
AU  - Matschinske, J.*
AU  - Späth, J.*
AU  - Röttger, R.*
AU  - Schwämmle, V.*
AU  - Hauck, S.M.
AU  - Lichtenthaler, S.F.*
AU  - Imhof, A.*
AU  - Mann, M.*
AU  - Ludwig, C.*
AU  - Kuster, B.*
AU  - Baumbach, J.*
AU  - Zolotareva, O.*
C1  - 75126
C2  - 57825
CY  - Campus, 4 Crinan St, London, N1 9XW, England
TI  - Privacy-preserving multicenter differential protein abundance analysis with FedProt.
JO  - Nat. Comput. Sci.
PB  - Springer Nature
PY  - 2025
SN  - 2662-8457
ER  - 

TY  - JOUR
AB  - Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks, from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here we present the large perturbation model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks. LPM learns meaningful joint representations of perturbations, readouts and contexts, enables the study of biological relationships in silico and could considerably accelerate the derivation of insights from pooled perturbation experiments.
AU  - Miladinovic, D.*
AU  - Höppe, T.
AU  - Chevalley, M.*
AU  - Georgiou, A.*
AU  - Stuart, L.*
AU  - Mehrjou, A.*
AU  - Bantscheff, M.*
AU  - Schölkopf, B.*
AU  - Schwab, P.*
C1  - 75799
C2  - 58152
TI  - In silico biological discovery with large perturbation models.
JO  - Nat. Comput. Sci.
PY  - 2025
SN  - 2662-8457
ER  - 

TY  - JOUR
AB  - Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
AU  - Siebenmorgen, T.
AU  - Cardoso Micu Menezes, F.M.
AU  - Benassou, S.*
AU  - Merdivan, E.
AU  - Didi, K.*
AU  - Mourao, A.
AU  - Kitel, R.*
AU  - Liò, P.*
AU  - Kesselheim, S.*
AU  - Piraud, M.
AU  - Theis, F.J.
AU  - Sattler, M.
AU  - Popowicz, G.M.
C1  - 70649
C2  - 56015
CY  - Campus, 4 Crinan St, London, N1 9XW, England
SP  - 367
EP  - 378
TI  - MISATO: Machine learning dataset of protein-ligand complexes for structure-based drug discovery.
JO  - Nat. Comput. Sci.
VL  - 4
PB  - Springer Nature
PY  - 2024
SN  - 2662-8457
ER  - 