PuSH - Publication Server of Helmholtz Zentrum München

Smialowski, P. ; Doose, G.* ; Torkler, P.* ; Kaufmann, S.* ; Frishman, D.

PROSO II - a new method for protein solubility prediction.

FEBS J. 279, 2192-2200 (2012)
Closed
Green open access is possible once the postprint has been submitted to the ZB.
Many fields of science and industry depend on the efficient production of active protein by heterologous expression in Escherichia coli. The solubility of a protein upon expression depends on its amino acid sequence, so predicting solubility from sequence is highly valuable. We present a novel machine-learning-based model, PROSO II, which exploits new classification methods and the growth in experimental data to improve the coverage and accuracy of solubility predictions. The classification algorithm has a two-layered structure: the outputs of a primary Parzen window model for sequence similarity and of a logistic regression classifier of amino acid k-mer composition serve as input to a second-level logistic regression classifier. Our model is trained on 82 000 proteins, five times more data than used by any previously published method. When tested on a separate holdout set not used at any point during method development, our server attained the best results among currently available methods: accuracy 75.4%, Matthews correlation coefficient 0.39, sensitivity 0.731, specificity 0.759, gain (soluble) 2.263. In summary, by combining cutting-edge machine learning techniques with the largest currently available experimental data set, the PROSO II server constitutes a substantial improvement in protein solubility prediction. PROSO II is available at .
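The evaluation measures quoted in the abstract (accuracy, Matthews correlation coefficient, sensitivity, specificity) all follow from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper. The gain measure is left out because the abstract does not define it.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion counts.

    tp / fn: soluble proteins predicted soluble / insoluble
    tn / fp: insoluble proteins predicted insoluble / soluble
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of truly soluble proteins that were recognized.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of truly insoluble proteins that were recognized.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the paper).
    for name, value in binary_metrics(tp=50, fp=10, tn=30, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four counts, so it stays informative when soluble and insoluble proteins are unevenly represented in the test set.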
Impact Factor 3.790
Scopus SNIP 0.890
Web of Science Times Cited 107
Scopus Cited By 126
Publication type Article: Journal article
Document type Scientific article
Keywords Classification; Feature Selection; Machine Learning; Prediction; Protein Solubility; SEQUENCE-BASED PREDICTION; ESCHERICHIA-COLI; STRUCTURAL PROTEOMICS; STABILITY CHANGES; POINT MUTATIONS; INCLUSION-BODY; EXPRESSION; DATABASE; SERVER; TOOL
Language
Publication year 2012
HGF reporting year 2012
ISSN (print) / ISBN 1742-464X
e-ISSN 1742-4658
Journal FEBS Journal, The
Citation details Volume: 279, Issue: 12, Pages: 2192-2200
Publisher Wiley
Review status Peer reviewed
POF Topic(s) 30505 - New Technologies for Biomedical Discoveries
Research field(s) Enabling and Novel Technologies
PSP element(s) G-503700-001
PubMed ID 22536855
Scopus ID 84861888345
Date of entry 2012-06-21