PuSH - Publication Server of Helmholtz Zentrum München: Applying negative rule mining to improve genome annotation.

Artamonova, I.I. ; Frishman, G. ; Frishman, D.

Applying negative rule mining to improve genome annotation.

BMC Bioinformatics 8:261 (2007)

Open Access Gold

Abstract

BACKGROUND: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items.

RESULTS: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower.

CONCLUSION: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.
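The idea of flagging exceptions from strong negative rules can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `negative_rule_exceptions`, the thresholds `min_count` and `min_confidence`, the restriction to pairwise rules, and the toy annotation items are all assumptions made for illustration. A rule "A implies NOT B" is treated as strong when both items are frequent and rarely co-occur; the few records that carry both items are the exceptions reported for inspection.

```python
from collections import Counter
from itertools import combinations


def negative_rule_exceptions(records, min_count=3, min_confidence=0.8):
    """Return {(a, b): [record ids]} for exceptions from strong negative rules.

    records maps a record id to a set of annotation items.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    # How many records carry each item, and each unordered pair of items.
    item_count = Counter(item for items in records.values() for item in set(items))
    pair_count = Counter()
    for items in records.values():
        for pair in combinations(sorted(set(items)), 2):
            pair_count[pair] += 1

    flagged = {}
    for a, b in combinations(sorted(item_count), 2):
        if item_count[a] < min_count or item_count[b] < min_count:
            continue  # rule would rest on too few observations
        both = pair_count[(a, b)]
        # Confidence of "a => NOT b" and "b => NOT a"; keep the weaker direction.
        confidence = 1 - both / min(item_count[a], item_count[b])
        if both > 0 and confidence >= min_confidence:
            # Records that break the rule are candidates for manual review.
            flagged[(a, b)] = sorted(
                rid for rid, items in records.items() if a in items and b in items
            )
    return flagged


# Hypothetical toy annotation table (made-up record ids and items).
records = {
    "p1": {"loc:membrane", "kw:kinase"},
    "p2": {"loc:membrane", "kw:kinase"},
    "p3": {"loc:membrane"},
    "p4": {"loc:membrane"},
    "p5": {"loc:membrane"},
    "p6": {"loc:nucleus", "kw:kinase"},
    "p7": {"loc:nucleus", "kw:kinase"},
    "p8": {"loc:nucleus"},
    "p9": {"loc:nucleus"},
    "p10": {"loc:nucleus"},
    "p11": {"loc:membrane", "loc:nucleus"},  # breaks the negative rule
}

flagged = negative_rule_exceptions(records)
print(flagged)
```

In this toy table the two location items each occur six times but co-occur only once, so record `p11` is flagged, while the kinase keyword co-occurs too often with either location to form a negative rule. A flag only marks a feature combination as doubtful; deciding which attribute is wrong still requires inspection.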

Publication type Article: Journal article

Document type Scientific article

Keywords PROTEIN SEQUENCES; DATABASE; RESOURCE; PEDANT; YEAST; MIPS

ISSN (print) / ISBN 1471-2105

e-ISSN 1471-2105

Journal BMC Bioinformatics

Citation details Volume: 8, Article number: 261

Publisher Springer

Review status Peer reviewed

Institute(s) Institute of Bioinformatics and Systems Biology (IBIS)