TY - JOUR AB - Identifying spatial domains for spatial transcriptomics is crucial for achieving comprehensive insights into the pathogenesis of gene expression. Increasingly, computational methods based on graph neural networks are being developed for spatial transcriptomics. However, previous methods have solely focused on the Euclidean manifold. To effectively exploit and explore the informative and deeper topological structures of inherent manifolds, we presented a Multi-Manifolds fusing hyperbolic graph network, balanced by Pareto optimization, for identifying spatial domains in Spatial Transcriptomics (MManiST). First, we developed multi-manifolds encoders for distinct manifolds using the hyperbolic neural network. Features from different manifolds were then combined using an attention mechanism, with multiple reconstruction losses balanced by Pareto optimization. Extensive experiments on commonly used benchmark datasets show that our method consistently outperforms seven state-of-the-art methods. Additionally, we investigated the validity of each component and the impact of fusion methods in ablation experiments. AU - Li, Y.* AU - Hu, Q.* AU - Han, S.* AU - Wang-Sattler, R. AU - Du, W.* C1 - 74072 C2 - 57324 CY - Great Clarendon St, Oxford Ox2 6dp, England TI - Multi-manifolds fusing hyperbolic graph network balanced by pareto optimization for identifying spatial domains of spatial transcriptomics. JO - Brief. Bioinform. VL - 26 IS - 2 PB - Oxford Univ Press PY - 2025 SN - 1467-5463 ER - TY - JOUR AB - Numerous imaging techniques are available for observing and interrogating biological samples, and several of them can be used consecutively to enable correlative analysis of different image modalities with varying resolutions and the inclusion of structural or molecular information. 
Achieving accurate registration of multimodal images is essential for the correlative analysis process, but it remains a challenging computer vision task with no widely accepted solution. Moreover, supervised registration methods require annotated data produced by experts, which is limited. To address this challenge, we propose a general unsupervised pipeline for multimodal image registration using deep learning. We provide a comprehensive evaluation of the proposed pipeline versus the current state-of-the-art image registration and style transfer methods on four types of biological problems utilizing different microscopy modalities. We found that style transfer of modality domains paired with fully unsupervised training leads to comparable image registration accuracy to supervised methods and, most importantly, does not require human intervention. AU - Grexa, I.* AU - Iván, Z.Z.* AU - Migh, E.* AU - Kovács, F.* AU - Bolck, H.A.* AU - Zheng, X.* AU - Mund, A.* AU - Moshkov, N.* AU - Miczán, V.* AU - Koos, K.* AU - Horvath, P. C1 - 70251 C2 - 56029 CY - Great Clarendon St, Oxford Ox2 6dp, England TI - SuperCUT, an unsupervised multimodal image registration with deep learning for biomedical microscopy. JO - Brief. Bioinform. VL - 25 IS - 2 PB - Oxford Univ Press PY - 2024 SN - 1467-5463 ER - TY - JOUR AB - Proteomics stands as the crucial link between genomics and human diseases. Quantitative proteomics provides detailed insights into protein levels, enabling differentiation between distinct phenotypes. OLINK, a biotechnology company from Uppsala, Sweden, offers a targeted, affinity-based protein measurement method called Target 96, which has become prominent in the field of proteomics. The SCALLOP consortium, for instance, contains data from over 70,000 individuals across 45 independent cohort studies, all sampled by OLINK. 
However, when independent cohorts want to collaborate and quantitatively compare their Target 96 protein values, it is currently advised to include 'identical biological bridging' samples in each sampling run to perform a reference sample normalization, correcting technical variations across measurements. Such a 'biological bridging sample' approach requires each of the involved cohorts to resend their biological bridging samples to OLINK to run them all together, which is logistically challenging, costly and time-consuming. Hence, alternatives are being sought, and an evaluation of the current state of the art exposes the need for a more robust method that allows all OLINK Target 96 studies to compare proteomics data accurately and cost-efficiently. To meet these goals we developed the Synthetic Plasma Pool Cohort Correction, the 'SPOC correction' approach, based on the use of an OLINK-composed synthetic plasma sample. The method can easily be implemented in a federated data-sharing context, which is illustrated on a sepsis use case. AU - Heylen, D.* AU - Pusparum, M.* AU - Kuliesius, J.* AU - Wilson, J.* AU - Park, Y.-C. AU - Jamiołkowski, J.* AU - D'Onofrio, V.* AU - Valkenborg, D.* AU - Aerts, J.* AU - Ertaylan, G.* AU - Hooyberghs, J.* C1 - 72859 C2 - 56752 CY - Great Clarendon St, Oxford Ox2 6dp, England TI - Synthetic plasma pool cohort correction for affinity-based proteomics datasets allows multiple study comparison. JO - Brief. Bioinform. VL - 26 IS - 1 PB - Oxford Univ Press PY - 2024 SN - 1467-5463 ER - TY - JOUR AB - Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. 
Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives. AU - Zahedi, R.P.* AU - Ghamsari, R.* AU - Argha, A.* AU - Macphillamy, C.* AU - Beheshti, A.* AU - Alizadehsani, R.* AU - Lovell, N.H.* AU - Lotfollahi, M. AU - Alinejad-Rokny, H.* C1 - 70252 C2 - 55468 TI - Deep learning in spatially resolved transcriptomics: A comprehensive technical view. JO - Brief. Bioinform. 
VL - 25 IS - 2 PY - 2024 SN - 1467-5463 ER - TY - JOUR AB - A key problem in systems biology is the discovery of regulatory mechanisms that drive phenotypic behaviour of complex biological systems in the form of multi-level networks. Modern multi-omics profiling techniques probe these fundamental regulatory networks but are often hampered by experimental restrictions leading to missing data or partially measured omics types for subsets of individuals due to cost restrictions. In such scenarios, in which missing data is present, classical computational approaches to infer regulatory networks are limited. In recent years, approaches have been proposed to infer sparse regression models in the presence of missing information. Nevertheless, these methods have not been adopted for regulatory network inference yet. In this study, we integrated regression-based methods that can handle missingness into KiMONo, a Knowledge guided Multi-Omics Network inference approach, and benchmarked their performance on commonly encountered missing data scenarios in single- and multi-omics studies. Overall, two-step approaches that explicitly handle missingness performed best for a wide range of random- and block-missingness scenarios on imbalanced omics-layers dimensions, while methods implicitly handling missingness performed best on balanced omics-layers dimensions. Our results show that robust multi-omics network inference in the presence of missing data with KiMONo is feasible and thus allows users to leverage available multi-omics data to its full extent. AU - Henao, J. AU - Lauber, M.* AU - Azevedo, M. AU - Grekova, A. AU - Theis, F.J. AU - List, M.* AU - Ogris, C. AU - Schubert, B. C1 - 68219 C2 - 54738 CY - Great Clarendon St, Oxford Ox2 6dp, England TI - Multi-omics regulatory network inference in the presence of missing data. JO - Brief. Bioinform. 
VL - 24 IS - 5 PB - Oxford Univ Press PY - 2023 SN - 1467-5463 ER - TY - JOUR AB - RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile-binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine-learning based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP-RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation. AU - Horlacher, M. AU - Cantini, G. AU - Hesse, J. AU - Schinke, P. AU - Goedert, N. AU - Londhe, S. AU - Moyon, L. AU - Marsico, A. C1 - 67938 C2 - 54416 TI - A systematic benchmark of machine learning methods for protein-RNA interaction prediction. JO - Brief. Bioinform. VL - 24 IS - 5 PY - 2023 SN - 1467-5463 ER - TY - JOUR AB - RNA localization is essential for regulating spatial translation, where RNAs are trafficked to their target locations via various biological mechanisms. 
In this review, we discuss RNA localization in the context of molecular mechanisms, experimental techniques and machine learning-based prediction tools. Three main types of molecular mechanisms that control the localization of RNA to distinct cellular compartments are reviewed, including directed transport, protection from mRNA degradation, as well as diffusion and local entrapment. Advances in experimental methods, both image and sequence based, provide substantial data resources, which allow for the design of powerful machine learning models to predict RNA localizations. We review the publicly available predictive tools to serve as a guide for users and inspire developers to build more effective prediction models. Finally, we provide an overview of multimodal learning, which may provide a new avenue for the prediction of RNA localization. AU - Wang, J.* AU - Horlacher, M. AU - Cheng, L.* AU - Winther, O.* C1 - 68090 C2 - 54568 CY - Great Clarendon St, Oxford Ox2 6dp, England TI - RNA trafficking and subcellular localization-a review of mechanisms, experimental and predictive methodologies. JO - Brief. Bioinform. VL - 24 IS - 5 PB - Oxford Univ Press PY - 2023 SN - 1467-5463 ER - TY - JOUR AB - Large metabolomics datasets inevitably contain unwanted technical variations which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of many existing methods only hold for a few specific scenarios. Some tools remove technical variations with models trained on quality control (QC) samples which may not generalize well on subject samples. Additionally, almost none of the existing methods supports datasets with multiple types of QC samples, which greatly limits their performance and flexibility. 
To address these issues, a non-parametric method TIGER (Technical variation elImination with ensemble learninG architEctuRe) is developed in this study and released as an R package (https://CRAN.R-project.org/package=TIGERr). TIGER integrates the random forest algorithm into an adaptable ensemble learning architecture. Evaluation results show that TIGER outperforms four popular methods with respect to robustness and reliability on three human cohort datasets constructed with targeted or untargeted metabolomics data. Additionally, a case study aiming to identify age-associated metabolites is performed to illustrate how TIGER can be used for cross-kit adjustment in a longitudinal analysis with experimental data of three time-points generated by different analytical kits. A dynamic website is developed to help evaluate the performance of TIGER and examine the patterns revealed in our longitudinal analysis (https://han-siyu.github.io/TIGER_web/). Overall, TIGER is expected to be a powerful tool for metabolomics data analysis. AU - Han, S. AU - Huang, J. AU - Foppiano, F. AU - Prehn, C. AU - Adamski, J.* AU - Suhre, K.* AU - Li, Y.* AU - Matullo, G.* AU - Schliess, F.* AU - Gieger, C. AU - Peters, A. AU - Wang-Sattler, R. C1 - 64017 C2 - 51818 TI - TIGER: Technical variation elimination for metabolomics data using ensemble learning architecture. JO - Brief. Bioinform. VL - 23 IS - 2 PY - 2022 SN - 1467-5463 ER - TY - JOUR AB - DNA methylation analysis by sequencing is becoming increasingly popular, yielding methylomes at single-base pair and single-molecule resolution. It has tremendous potential for cell-type heterogeneity analysis using intrinsic read-level information. Although diverse deconvolution methods were developed to infer cell-type composition based on bulk sequencing-based methylomes, systematic evaluation has not been performed yet. 
Here, we thoroughly benchmark six previously published methods: Bayesian epiallele detection, DXM, PRISM, csmFinder+coMethy, ClubCpG and MethylPurify, together with two array-based methods, MeDeCom and Houseman, as a comparison group. Sequencing-based deconvolution methods consist of two main steps, informative region selection and cell-type composition estimation, thus each was individually assessed. With this elaborate evaluation, we aimed to establish which method achieves the highest performance in different scenarios of synthetic bulk samples. We found that cell-type deconvolution performance is influenced by different factors depending on the number of cell types within the mixture. Finally, we propose a best-practice deconvolution strategy for sequencing data and point out limitations that need to be handled. Array-based methods-both reference-based and reference-free-generally outperformed sequencing-based methods, despite the absence of read-level information. This implies that the current sequencing-based methods still struggle with correctly identifying cell-type-specific signals and eliminating confounding methylation patterns, which needs to be handled in future studies. AU - Jeong, Y.* AU - Barros De Andrade E Sousa, L. AU - Thalmeier, D. AU - Toth, R.* AU - Ganslmeier, M.* AU - Breuer, K.* AU - Plass, C.* AU - Lutsik, P.* C1 - 65659 C2 - 52854 TI - Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes. JO - Brief. Bioinform. VL - 23 IS - 4 PY - 2022 SN - 1467-5463 ER - TY - JOUR AB - Ordinary differential equation models are nowadays widely used for the mechanistic description of biological processes and their temporal evolution. These models typically have many unknown and nonmeasurable parameters, which have to be determined by fitting the model to experimental data. 
In order to perform this task, known as parameter estimation or model calibration, the modeller faces challenges such as poor parameter identifiability, lack of sufficiently informative experimental data and the existence of local minima in the objective function landscape. These issues tend to worsen with larger model sizes, increasing the computational complexity and the number of unknown parameters. An incorrectly calibrated model is problematic because it may result in inaccurate predictions and misleading conclusions. For nonexpert users, there are a large number of potential pitfalls. Here, we provide a protocol that guides the user through all the steps involved in the calibration of dynamic models. We illustrate the methodology with two models and provide all the code required to reproduce the results and perform the same analysis on new models. Our protocol provides practitioners and researchers in biological modelling with a one-stop guide that is at the same time compact and sufficiently comprehensive to cover all aspects of the problem. AU - Villaverde, A.F.* AU - Pathirana, D.* AU - Fröhlich, F. AU - Hasenauer, J.* AU - Banga, J.R.* C1 - 63295 C2 - 51279 TI - A protocol for dynamic model calibration. JO - Brief. Bioinform. VL - 23 IS - 1 PY - 2022 SN - 1467-5463 ER - TY - JOUR AB - Least absolute shrinkage and selection operator (LASSO) regression is often applied to select the most promising set of single nucleotide polymorphisms (SNPs) associated with a molecular phenotype of interest. While the penalization parameter λ restricts the number of selected SNPs and the potential model overfitting, the least-squares loss function of standard LASSO regression translates into a strong dependence of statistical results on a small number of individuals with phenotypes or genotypes divergent from the majority of the study population-typically comprised of outliers and high-leverage observations. 
Robust methods have been developed to constrain the influence of divergent observations and generate statistical results that apply to the bulk of study data, but they have rarely been applied to genetic association studies. In this article, we review, for newcomers to the field of robust statistics, a novel version of standard LASSO that utilizes the Huber loss function. We conduct comprehensive simulations and analyze real protein, metabolite, mRNA expression and genotype data to compare the stability of penalization, the cross-iteration concordance of the model, the false-positive and true-positive rates and the prediction accuracy of standard and robust Huber-LASSO. Although the two methods showed controlled false-positive rates ≤2.1% and similar true-positive rates, robust Huber-LASSO outperformed standard LASSO in the accuracy of predicted protein, metabolite and gene expression levels using individual SNP data. The conducted simulations and real-data analyses show that robust Huber-LASSO represents a valuable alternative to standard LASSO in genetic studies of molecular phenotypes. AU - Deutelmoser, H.* AU - Scherer, D.* AU - Brenner, H.* AU - Waldenberger, M. AU - Suhre, K.* AU - Kastenmüller, G. AU - Lorenzo Bermejo, J.* C1 - 61234 C2 - 49765 CY - Great Clarendon St, Oxford Ox2 6dp, England TI - Robust Huber-LASSO for improved prediction of protein, metabolite and gene expression levels relying on individual genotype data. JO - Brief. Bioinform. VL - 22 IS - 4 PB - Oxford Univ Press PY - 2021 SN - 1467-5463 ER - TY - JOUR AB - MOTIVATION: Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. 
Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data. RESULTS: Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi's wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children's rooms between samples from two study centers (Ulm and Munich). AVAILABILITY: R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi. CONTACT: Tel:+49 89 3187 43258; stefanie.peschel@mail.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online. AU - Peschel, S. AU - Müller, C.L. AU - von Mutius, E. AU - Boulesteix, A.L.* AU - Depner, M. 
C1 - 60782 C2 - 49524 CY - Great Clarendon St, Oxford Ox2 6dp, England TI - NetCoMi: Network construction and comparison for microbiome data in R. JO - Brief. Bioinform. VL - 22 IS - 4 PB - Oxford Univ Press PY - 2021 SN - 1467-5463 ER - TY - JOUR AB - The vast amount of experimental data from recent advances in the field of high-throughput biology begs for integration into more complex data structures such as genome-wide functional association networks. Such networks have been used for elucidation of the interplay of intra-cellular molecules to make advances ranging from the basic science understanding of evolutionary processes to the more translational field of precision medicine. The allure of the field has resulted in rapid growth of the number of available network resources, each with unique attributes exploitable to answer different biological questions. Unfortunately, the high volume of network resources makes it impossible for the intended user to select an appropriate tool for their particular research question. The aim of this paper is to provide an overview of the underlying data and representative network resources as well as to mention methods of integration, allowing a customized approach to resource selection. Additionally, this report will provide a primer for researchers venturing into the field of network integration. AU - Guala, D.* AU - Ogris, C. AU - Müller, N.S. AU - Sonnhammer, E.L.L.* C1 - 56546 C2 - 47091 CY - Great Clarendon St, Oxford Ox2 6dp, England SP - 1224-1237 TI - Genome-wide functional association networks: Background, data & state-of-the-art resources. JO - Brief. Bioinform. VL - 21 IS - 4 PB - Oxford Univ Press PY - 2020 SN - 1467-5463 ER - TY - JOUR AB - The understanding of complex biological networks often relies on both a dedicated layout and a topology. Currently, there are three major competing layout-aware systems biology formats, but there are no software tools or software libraries supporting all of them. 
This complicates the management of molecular network layouts and hinders their reuse and extension. In this paper, we present a high-level overview of the layout formats in systems biology, focusing on their commonalities and differences, review their support in existing software tools, libraries and repositories and finally introduce a new conversion module within the MINERVA platform. The module is available via a REST API and offers, besides the ability to convert between layout-aware systems biology formats, the possibility to export layouts into several graphical formats. The module enables conversion of very large networks with thousands of elements, such as disease maps or metabolic reconstructions, rendering it widely applicable in systems biology. AU - Hoksza, D.* AU - Gawron, P.* AU - Ostaszewski, M.* AU - Hasenauer, J. AU - Schneider, R.* C1 - 56551 C2 - 47099 CY - Great Clarendon St, Oxford Ox2 6dp, England SP - 1249-1260 TI - Closing the gap between formats for storing layout information in systems biology. JO - Brief. Bioinform. VL - 21 IS - 4 PB - Oxford Univ Press PY - 2020 SN - 1467-5463 ER - TY - JOUR AB - The first version of this article listed one of its authors as Jan Hausenauer rather than Jan Hasenauer. This has now been corrected. The authors regret the error. AU - Hoksza, D.* AU - Gawron, P.* AU - Ostaszewski, M.* AU - Hasenauer, J. AU - Schneider, R.* C1 - 58356 C2 - 48079 SP - 608 TI - Closing the gap between formats for storing layout information in systems biology. JO - Brief. Bioinform. VL - 22 IS - 1 PY - 2020 SN - 1467-5463 ER - TY - JOUR AB - Genome-wide DNA methylation studies have quickly expanded due to advances in next-generation sequencing techniques along with a wealth of computational tools to analyze the data. Most of our knowledge about DNA methylation profiles, epigenetic heritability and the function of DNA methylation in plants derives from the model species Arabidopsis thaliana. 
There are increasingly many studies on DNA methylation in plants-uncovering methylation profiles and explaining variations in different plant tissues. Additionally, DNA methylation comparisons of different plant tissue types and dynamics during development processes are only slowly emerging but are crucial for understanding developmental and regulatory decisions. Translating this knowledge from plant model species to commercial crops could allow the establishment of new varieties with increased stress resilience and improved yield. In this review, we provide an overview of the most commonly applied bioinformatics tools for the analysis of DNA methylation data (particularly bisulfite sequencing data). The performances of a selection of the tools are analyzed for computational time and agreement in predicted methylated sites for A. thaliana, which has a smaller genome compared to the hexaploid bread wheat. The performance of the tools was benchmarked on five plant genomes. We give examples of applications of DNA methylation data analysis in crops (with a focus on cereals) and an outlook for future developments for DNA methylation status manipulations and data integration. AU - Omony, J. AU - Nussbaumer, T. AU - Gutzat, R.* C1 - 55895 C2 - 46618 CY - Great Clarendon St, Oxford Ox2 6dp, England SP - 906-918 TI - DNA methylation analysis in plants: review of computational tools and future perspectives. JO - Brief. Bioinform. VL - 21 IS - 3 PB - Oxford Univ Press PY - 2020 SN - 1467-5463 ER - TY - JOUR AB - Copy number aberrations (CNAs) are known to strongly affect oncogenes and tumour suppressor genes. Given the critical role CNAs play in cancer research, it is essential to accurately identify CNAs from tumour genomes. One particular challenge in finding CNAs is the effect of confounding variables. To address this issue, we assessed how commonly used CNA identification algorithms perform on SNP 6.0 genotyping data in the presence of confounding variables. 
We simulated realistic synthetic data with varying levels of three confounding variables-the tumour purity, the length of a copy number region and the CNA burden (the percentage of CNAs present in a profiled genome)-and evaluated the performance of OncoSNP, ASCAT, GenoCNA, GISTIC and CGHcall. Furthermore, we implemented and assessed CGHcall*, an adjusted version of CGHcall accounting for high CNA burden. Our analysis on synthetic data indicates that tumour purity and the CNA burden strongly influence the performance of all the algorithms. No algorithm can correctly find lost and gained genomic regions across all tumour purities. The length of CNA regions influenced the performance of ASCAT, CGHcall and GISTIC. OncoSNP, GenoCNA and CGHcall* showed little sensitivity. Overall, CGHcall* and OncoSNP showed reasonable performance, particularly in samples with high tumour purity. Our analysis on the HapMap data revealed a good overlap between CGHcall, CGHcall* and GenoCNA results and experimentally validated data. Our exploratory analysis on the TCGA HNSCC data revealed plausible results of CGHcall, CGHcall* and GISTIC in consensus HNSCC CNA regions. AU - Pitea, A. AU - Kondofersky, I.* AU - Sass, S. AU - Theis, F.J. AU - Müller, N.S. AU - Unger, K. C1 - 54574 C2 - 45678 CY - Great Clarendon St, Oxford Ox2 6dp, England SP - 272-281 TI - Copy number aberrations from Affymetrix SNP 6.0 genotyping data-how accurate are commonly used prediction approaches? JO - Brief. Bioinform. VL - 21 IS - 1 PB - Oxford Univ Press PY - 2020 SN - 1467-5463 ER - TY - JOUR AB - The Disease Maps Project builds on a network of scientific and clinical groups that exchange best practices, share information and develop systems biomedicine tools. The project aims for an integrated, highly curated and user-friendly platform for disease-related knowledge. The primary focus of disease maps is on interconnected signaling, metabolic and gene regulatory network pathways represented in standard formats. 
The involvement of domain experts ensures that the key disease hallmarks are covered and relevant, up-to-date knowledge is adequately represented. Expert-curated and computer readable, disease maps may serve as a compendium of knowledge, allow for data-supported hypothesis generation or serve as a scaffold for the generation of predictive mathematical models. This article summarizes the 2nd Disease Maps Community meeting, highlighting its important topics and outcomes. We outline milestones on the roadmap for the future development of disease maps, including creating and maintaining standardized disease maps; sharing parts of maps that encode common human disease mechanisms; providing technical solutions for complexity management of maps; and Web tools for in-depth exploration of such maps. A dedicated discussion was focused on mathematical modeling approaches, as one of the main goals of disease map development is the generation of mathematically interpretable representations to predict disease comorbidity or drug response and to suggest drug repositioning, altogether supporting clinical decisions. AU - Ostaszewski, M.* AU - Gebel, S.* AU - Kuperstein, I.* AU - Mazein, A.* AU - Zinovyev, A.* AU - Dogrusoz, U.* AU - Hasenauer, J. AU - Fleming, R.M.T.* AU - Le Novère, N.* AU - Gawron, P.* AU - Ligon, T.* AU - Niarakis, A.* AU - Nickerson, D.A.* AU - Weindl, D. AU - Balling, R.* AU - Barillot, E.* AU - Auffray, C.* AU - Schneider, R.* C1 - 53446 C2 - 44865 SP - 659-670 TI - Community-driven roadmap for integrated disease maps. JO - Brief. Bioinform. VL - 20 IS - 2 PY - 2019 SN - 1467-5463 ER - TY - JOUR AB - In the present contribution we propose two recently developed classification algorithms for the analysis of mass-spectrometric data-the supervised neural gas and the fuzzy-labeled self-organizing map. 
The algorithms are inherently regularizing, which is recommended for these spectral data because of their high dimensionality and their sparseness for specific problems. The algorithms are both prototype-based such that the principle of characteristic representants is realized. This leads to an easy interpretation of the generated classification model. Further, the fuzzy-labeled self-organizing map is able to process uncertainty in data, and classification results can be obtained as fuzzy decisions. Moreover, this fuzzy classification together with the property of topographic mapping offers the possibility of class similarity detection, which can be used for class visualization. We demonstrate the power of both methods for two examples: the classification of bacteria (listeria types) and of neoplastic and non-neoplastic cell populations in breast cancer tissue sections. AU - Villmann, T.* AU - Schleif, F.M.* AU - Kostrzewa, M.* AU - Walch, A.K. AU - Hammer, B.* C1 - 3102 C2 - 25201 SP - 129-143 TI - Classification of mass-spectrometric data in clinical proteomics using learning vector quantization methods. JO - Brief. Bioinform. VL - 9 IS - 2 PB - Oxford Univ. Press PY - 2008 SN - 1467-5463 ER - TY - JOUR AB - Biobanks are well-organized resources comprising biological samples and associated information that are accessible to scientific investigation. Across Europe, millions of samples with related data are held in different types of collections. While individual collections can be well organized and accessible, the resources are subject to fragmentation, insecurity of funding and incompleteness. To address these issues, a Biobanking and BioMolecular Resources Infrastructure (BBMRI) is to be developed across Europe, thereby implementing a European 'roadmap' for research infrastructures that was developed by a forum of EU member states and that has been received by the European Commission. 
In this review, we describe the work involved in preparing for the construction of BBMRI in a European and global context. AU - Yuille, M.* AU - van Ommen, G.J.* AU - Bréchot, C.* AU - Cambon-Thomsen, A.* AU - Dagher, G.* AU - Landegren, U.* AU - Litton, J.E.* AU - Pasterk, M.* AU - Peltonen, L.* AU - Taussig, M.* AU - Wichmann, H.-E. AU - Zatloukal, K.* C1 - 3309 C2 - 25279 SP - 14-24 TI - Biobanking for Europe. JO - Brief. Bioinform. VL - 9 IS - 1 PB - Oxford Univ. Press PY - 2008 SN - 1467-5463 ER - TY - JOUR AB - Assessing the patterns of linkage disequilibrium (LD) has become an important issue in both evolutionary biology and medical genetics since the rapid accumulation of densely spaced DNA sequence variation data in several organisms. LD deals with the correlation of genetic variation at two or more loci or sites in the genome within a given population. There are a variety of LD measures which range from traditional pairwise LD measures such as D′ or r² to entropy-based multi-locus measures or haplotype-specific approaches. Understanding the evolutionary forces (in particular recombination) that generate the observed variation of LD patterns across genomic regions is addressed by model-based LD analysis. Marker type and its allelic composition also influence the observed LD pattern, microsatellites having a greater power to detect LD in population isolates than SNPs. This review aims to explain basic LD measures and their application properties. AU - Mueller, J.C. C1 - 2694 C2 - 22286 SP - 1-10 TI - Linkage disequilibrium for different scales and applications. JO - Brief. Bioinform. VL - 5 IS - 4 PY - 2004 SN - 1467-5463 ER - TY - JOUR AB - The draft sequences of whole genomes are being published at an ever-increasing pace, thus providing access to the human genomic sequence and, more recently, the mouse sequence. Genomes of the invertebrates are also becoming available. 
Now that the genomic DNA of mammalian species is available, an old problem can be tackled with renewed vigour: mammalian promoter prediction. Gene promoters have proved elusive for more than a decade, despite their pivotal role in gene regulation. Recently, however, several new developments have made it possible to make meaningful large-scale predictions. This paper reviews the methods used for the prediction of mammalian, mostly human, promoters. AU - Werner, T. C1 - 4624 C2 - 22438 SP - 22-30 TI - The state of the art of mammalian promoter recognition. JO - Brief. Bioinform. VL - 4 IS - 1 PY - 2003 SN - 1467-5463 ER - TY - JOUR AB - This paper is aimed principally at bioinformaticians and biologists as an introduction to recent advances in mouse mutagenesis, concentrating on genome-wide screens utilising the powerful mutagen N-ethyl-N-nitroso-urea (ENU). It contains a brief background to the underlying genetics as well as details of the practical aspects of organisation and data capture for such projects. AU - Hrabě de Angelis, M. AU - Strivens, M.* C1 - 22312 C2 - 21120 SP - 170-180 TI - Large-scale production of mouse phenotypes : the search for animal models for inherited diseases in humans. JO - Brief. Bioinform. VL - 2 IS - 2 PY - 2001 SN - 1467-5463 ER - TY - JOUR AB - Identification of transcriptional elements in large sequences is a very difficult task, as individual transcription elements (e.g. transcription factor binding sites, TF-sites) are not clearly correlated with regions exerting transcription control. However, elucidation of the molecular organisation of genomic regions responsible for the control of gene expression is an essential part of the efforts to annotate the genomic sequences, especially within the Human Genome Project. The task for bioinformatics in this context is twofold. The first step required is the approximate localisation of regulatory sequences in large anonymous DNA sequences. 
Once those regions are located, the second task is the identification of individual transcriptional control elements and correlation of a subset of such elements with transcriptional functions. Part of this second task can be achieved by constructing organisational models of regulatory regions like promoters which can reveal elements important for a gene class or the coexpression of a set of genes. Comparative genomics in non-coding regions (e.g. phylogenetic footprinting) is a very promising approach that allows identification of potential new regulatory elements which may be used in modelling approaches. AU - Werner, T. C1 - 2271 C2 - 22798 SP - 372-380 TI - Identification and functional modelling of DNA sequence elements of transcription. JO - Brief. Bioinform. VL - 1 IS - 4 PY - 2000 SN - 1467-5463 ER -