PuSH - Publication Server of Helmholtz Zentrum München

Friedel, C.C.* ; Jahn, K.H.* ; Sommer, S.* ; Rudd, S.* ; Mewes, H.-W. ; Tetko, I.V.

Support vector machines for separation of mixed plant-pathogen EST collections based on codon usage.

Bioinformatics 21, 1383-1388 (2005)
Open Access Gold
MOTIVATION: Discovery of host and pathogen genes expressed at the plant-pathogen interface often requires the construction of mixed libraries that contain sequences from both genomes. Sequence identification requires high-throughput and reliable classification of genome origin. When using single-pass cDNA sequences difficulties arise from the short sequence length, the lack of sufficient taxonomically relevant sequence data in public databases and ambiguous sequence homology between plant and pathogen genes.

RESULTS: A novel method is described, which is independent of the availability of homologous genes and relies on subtle differences in codon usage between plant and fungal genes. We used support vector machines (SVMs) to identify the probable origin of sequences. SVMs were compared to several other machine learning techniques and to a probabilistic algorithm (PF-IND) for expressed sequence tag (EST) classification also based on codon bias differences. Our software (Eclat) has achieved a classification accuracy of 93.1% on a test set of 3217 EST sequences from Hordeum vulgare and Blumeria graminis, which is a significant improvement compared to PF-IND (prediction accuracy of 81.2% on the same test set). EST sequences with at least 50 nt of coding sequence can be classified using Eclat with high confidence. Eclat allows training of classifiers for any host-pathogen combination for which there are sufficient classified training sequences.

AVAILABILITY: Eclat is freely available on the Internet (http://mips.gsf.de/proj/est) or on request as a standalone version.
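The abstract's RESULTS rely on codon usage differences as the classification signal. As a rough illustration only, the sketch below turns a coding sequence into a 64-dimensional vector of relative codon frequencies. The function name `codon_frequencies`, the fixed codon ordering, and the assumption that the reading frame is already known are all hypothetical choices made for this example; the record does not describe how Eclat actually encodes its features.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed, reproducible order (an assumption of this sketch).
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(coding_seq, frame=0):
    """Return relative codon frequencies for a coding sequence.

    The reading frame is assumed to be known. Codons containing
    ambiguous bases (e.g. N) are skipped.
    """
    seq = coding_seq.upper().replace("U", "T")
    # Split into non-overlapping triplets, dropping any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


if __name__ == "__main__":
    vec = codon_frequencies("ATGGCCGCCTAA")
    print(len(vec), vec[CODONS.index("GCC")])
```

A vector like this could serve as input to any SVM implementation trained on sequences of known origin. The abstract's threshold of at least 50 nt of coding sequence corresponds to about 16 codons, so frequency estimates for short ESTs are necessarily coarse.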
Impact Factor 5.742
Scopus SNIP 0.000
Web of Science Times Cited 19
Scopus Cited By 22
Publication type Article: Journal article
Document type Scientific article
Keywords FUNGAL SEQUENCES; DROSOPHILA; PATTERNS
Language English
Publication year 2005
HGF reporting year 2005
e-ISSN 1367-4811
Journal Bioinformatics
Source Volume: 21, Issue: 8, Pages: 1383-1388
Publisher Oxford University Press
Place of publication Oxford
Review status Peer reviewed
POF Topic(s) 30505 - New Technologies for Biomedical Discoveries
Research field(s) Enabling and Novel Technologies
PSP element(s) G-503700-001
PubMed ID 15585526
Scopus ID 17444396383
Date of entry 2005-12-01