PuSH - Publication Server of Helmholtz Zentrum München

Beyond the 'best' match: Machine learning annotation of protein sequences by integration of different sources of information.

Bioinformatics 24, 621-628 (2008)
Free by publisher
Green Open Access is possible as soon as the postprint has been submitted to the ZB.
Accurate automatic assignment of protein functions remains a challenge for genome annotation. We developed automatic annotation procedures for four bacterial genomes and compared several machine learning methods using 5-fold cross-validation.

RESULTS: The analyzed genomes were manually annotated with FunCat categories at MIPS, providing a gold standard. Features describing a pair of sequences, rather than each sequence alone, were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence length and calculated protein properties. After training, we scored all pairs from the validation sets, selected the pair with the highest predicted score and annotated the target protein with the functional categories of the prototype protein. Data integration using machine learning methods provided significantly higher annotation accuracy than the use of individual descriptors alone. The neural network approach showed the best performance. The descriptors derived from InterPro domains and sequence similarity contributed most to the performance of the method. The predicted annotation scores allow reliable annotations to be distinguished from unreliable ones. The developed approach was applied to annotate the protein sequences from 180 complete bacterial genomes.

AVAILABILITY: The FUNcat Annotation Tool (FUNAT) is available online as Web Services at http://mips.gsf.de/proj/funat.
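The annotation-transfer step described in the abstract (score all pairs, keep the best-scoring pair, copy the prototype's categories to the target) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the FUNAT implementation: the names `ScoredPair` and `annotate_by_best_pair`, the category labels, the score range and the reliability threshold `reliable_at` are all hypothetical, and the scores are assumed to come from an already trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPair:
    target: str      # protein to be annotated
    prototype: str   # protein that already carries functional categories
    score: float     # predicted score for the pair (assumed to lie in [0, 1])


def annotate_by_best_pair(pairs, prototype_categories, reliable_at=0.5):
    """Annotate each target with the categories of its best-scoring prototype.

    `reliable_at` is a hypothetical cut-off used only to illustrate how a
    predicted score could separate reliable from unreliable annotations.
    """
    best = {}
    for pair in pairs:
        current = best.get(pair.target)
        # Keep only the highest-scoring pair seen so far for this target.
        if current is None or pair.score > current.score:
            best[pair.target] = pair

    return {
        target: {
            "prototype": pair.prototype,
            "categories": list(prototype_categories.get(pair.prototype, [])),
            "score": pair.score,
            "reliable": pair.score >= reliable_at,
        }
        for target, pair in best.items()
    }


# Made-up scores and placeholder category labels, for illustration only.
pairs = [
    ScoredPair("t1", "p1", 0.35),
    ScoredPair("t1", "p2", 0.91),
    ScoredPair("t2", "p1", 0.20),
]
prototype_categories = {"p1": ["CAT_A"], "p2": ["CAT_B", "CAT_C"]}

annotations = annotate_by_best_pair(pairs, prototype_categories)
for target, info in sorted(annotations.items()):
    print(target, info["prototype"], info["categories"], info["reliable"])
```

In this toy input, target `t1` takes its categories from `p2` because that pair has the higher score, while `t2` has only a low-scoring pair and is flagged as unreliable under the assumed threshold.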
Publication type: Article: Journal article
Document type: Scientific article
Keywords: FUNCTION PREDICTION; NEURAL-NETWORK; AUTOMATIC ANNOTATION; VARIABLE SELECTION; SORTING SIGNALS; GENE ONTOLOGY; DATABASE; CLASSIFICATION; GENOMES; ALGORITHM
ISSN (print) / ISBN: 1367-4803
e-ISSN: 1367-4811
Journal: Bioinformatics
Citation details: Volume: 24, Issue: 5, Pages: 621-628
Publisher: Oxford University Press
Place of publication: Oxford
Non-patent literature: Publications
Review status: Peer reviewed