
Improving protein fold recognition by random forest.

Taeho Jo, Jianlin Cheng.   

Abstract

BACKGROUND: Recognizing the correct structural fold among known template protein structures for a target protein (i.e., fold recognition) is essential for template-based protein structure modeling. Because fold recognition can be framed as a binary classification problem, namely predicting whether the unknown fold of a target protein is similar to that of a known template structure in a library, machine learning methods have been applied effectively to it. In this work, we developed RF-Fold, which uses random forest, one of the most powerful and scalable machine learning classification methods, to recognize protein folds.
RESULTS: RF-Fold consists of hundreds of decision trees that can be trained efficiently on very large datasets and make accurate predictions on a highly imbalanced dataset. We evaluated RF-Fold by cross-validation on the standard Lindahl benchmark dataset, which comprises 976 × 975 target-template protein pairs. Compared with 17 different fold recognition methods, RF-Fold performs generally comparably to the best of them at every level of difficulty: the easiest family level, the medium-hard superfamily level, and the hardest fold level. Based on the top-one template protein ranked by RF-Fold, the correct recognition rate is 84.5%, 63.4%, and 40.8% at the family, superfamily, and fold levels, respectively. Based on the top-five template proteins ranked by RF-Fold, the correct recognition rate increases to 91.5%, 79.3%, and 58.3% at the family, superfamily, and fold levels, respectively.
CONCLUSIONS: The good performance of RF-Fold demonstrates the effectiveness of random forest for protein fold recognition.
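The top-one and top-five recognition rates reported above can be illustrated with a small sketch. It is not the authors' evaluation code. It assumes that each target-template pair carries a classifier score and a binary label (similar or not at a given level), that templates are ranked per target by descending score, and that targets with no correct template at that level are excluded from the rate. The function names, the toy scores, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def count_ordered_pairs(n_proteins):
    """Number of ordered target-template pairs when each protein is
    paired with every other protein (e.g. 976 x 975)."""
    return n_proteins * (n_proteins - 1)


def top_k_recognition_rate(scores, labels, k):
    """Fraction of targets with at least one correct template among
    their k highest-scoring templates.

    scores: dict mapping (target, template) -> classifier score
    labels: dict mapping (target, template) -> True if the pair is similar
    Assumption: targets with no correct template are skipped.
    """
    per_target = defaultdict(list)
    for (target, template), score in scores.items():
        per_target[target].append((score, template))

    hits = 0
    evaluated = 0
    for target, candidates in per_target.items():
        if not any(labels[(target, tpl)] for _, tpl in candidates):
            continue  # no correct template exists for this target
        evaluated += 1
        top = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if any(labels[(target, tpl)] for _, tpl in top):
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Hypothetical toy data: three proteins, each paired with the other two.
scores = {
    ("A", "B"): 0.9, ("A", "C"): 0.2,
    ("B", "A"): 0.3, ("B", "C"): 0.6,
    ("C", "A"): 0.4, ("C", "B"): 0.1,
}
labels = {
    ("A", "B"): True, ("A", "C"): False,
    ("B", "A"): True, ("B", "C"): False,
    ("C", "A"): False, ("C", "B"): False,
}

print(count_ordered_pairs(976))                   # 951600
print(top_k_recognition_rate(scores, labels, 1))  # 0.5
print(top_k_recognition_rate(scores, labels, 2))  # 1.0
```

In the toy data, target A is recognized at rank one, target B only at rank two, and target C is skipped because it has no correct template, so the rate rises from 0.5 to 1.0 as k grows from one to two.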

Year:  2014        PMID: 25350499      PMCID: PMC4251042          DOI: 10.1186/1471-2105-15-S11-S14

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


