Supervised machine learning algorithms for protein structure classification.

Pooja Jain, Jonathan M Garibaldi, Jonathan D Hirst.

Abstract

We explore automation of protein structural classification using supervised machine learning methods on a set of 11,360 pairs of protein domains (up to 35% sequence identity), each domain consisting of three secondary structure elements. Fifteen algorithms from five categories of supervised algorithms are evaluated for their ability to learn, for a pair of protein domains, the deepest common structural level within the SCOP hierarchy, given a one-dimensional representation of the domain structures. This representation encapsulates evolutionary information in terms of sequence identity, and structural information characterising the secondary structure elements and lengths of the respective domains. The evaluation is performed in two steps: first selecting the best-performing base learners, then evaluating boosted and bagged meta learners. The boosted random forest, a collection of decision trees, is found to be the most accurate, with a cross-validated accuracy of 97.0% and F-measures of 0.97, 0.85, 0.93 and 0.98 for classification of proteins to the Class, Fold, Super-Family and Family levels of the SCOP hierarchy, respectively. The meta learning regime, especially boosting, improved performance by classifying instances from less populated classes more accurately.
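The target label and the reported metric can be made concrete with a small sketch. It assumes each domain carries a SCOP sccs string of the form class.fold.superfamily.family (for example "a.4.5.1"); the function names and the example identifiers are hypothetical, chosen only for illustration. It does not reproduce the authors' one-dimensional representation or their learners. It shows how the deepest common level of a pair could be derived, and how a per-class F-measure is computed from true and predicted labels.

```python
from collections import Counter

# SCOP levels in order, matching the dot-separated fields of an sccs
# string such as "a.4.5.1" (class.fold.superfamily.family).
LEVELS = ("Class", "Fold", "Super-Family", "Family")


def deepest_common_level(sccs_a, sccs_b):
    """Return the deepest SCOP level two domains share, or None.

    Assumption: a pair whose classes differ has no common level.
    """
    deepest = None
    for level, part_a, part_b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if part_a != part_b:
            break
        deepest = level
    return deepest


def f_measure_per_class(y_true, y_pred):
    """Per-class F-measure: harmonic mean of precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Illustrative pairs only; these are not taken from the paper's data set.
pairs = [("a.4.5.1", "a.4.5.2"), ("a.4.5.1", "a.4.1.1"), ("a.4.5.1", "a.1.1.1")]
labels = [deepest_common_level(a, b) for a, b in pairs]
print(labels)
```

Reporting the F-measure per level, rather than overall accuracy alone, is what makes the weaker result on a sparsely populated level (Fold, at 0.85) visible next to the 97.0% accuracy figure.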

Year:  2009        PMID: 19473879     DOI: 10.1016/j.compbiolchem.2009.04.004

Source DB:  PubMed          Journal:  Comput Biol Chem        ISSN: 1476-9271            Impact factor:   2.877


  10 in total

1.  Improving protein fold recognition by random forest.

Authors:  Taeho Jo; Jianlin Cheng
Journal:  BMC Bioinformatics       Date:  2014-10-21       Impact factor: 3.169

2.  A Comparison of Deep Learning Techniques for Arterial Blood Pressure Prediction.

Authors:  Annunziata Paviglianiti; Vincenzo Randazzo; Stefano Villata; Giansalvo Cirrincione; Eros Pasero
Journal:  Cognit Comput       Date:  2021-08-27       Impact factor: 4.890

3.  A Hybrid Levenberg-Marquardt Algorithm on a Recursive Neural Network for Scoring Protein Models.

Authors:  Eshel Faraggi; Robert L Jernigan; Andrzej Kloczkowski
Journal:  Methods Mol Biol       Date:  2021

4.  A machine learning approach to the digitalization of bank customers: Evidence from random and causal forests.

Authors:  Santiago Carbo-Valverde; Pedro Cuadros-Solas; Francisco Rodríguez-Fernández
Journal:  PLoS One       Date:  2020-10-28       Impact factor: 3.240

5.  Automatic structure classification of small proteins using random forest.

Authors:  Pooja Jain; Jonathan D Hirst
Journal:  BMC Bioinformatics       Date:  2010-07-01       Impact factor: 3.169

6.  A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants.

Authors:  Yunqi Li; C Russell Middaugh; Jianwen Fang
Journal:  BMC Bioinformatics       Date:  2010-01-28       Impact factor: 3.169

7.  Gene expression prediction using low-rank matrix completion.

Authors:  Arnav Kapur; Kshitij Marwah; Gil Alterovitz
Journal:  BMC Bioinformatics       Date:  2016-06-17       Impact factor: 3.169

8.  Prediction of donor splice sites using random forest with a new sequence encoding approach.

Authors:  Prabina Kumar Meher; Tanmaya Kumar Sahu; Atmakuri Ramakrishna Rao
Journal:  BioData Min       Date:  2016-01-22       Impact factor: 2.522

9.  Network-based protein structural classification.

Authors:  Khalique Newaz; Mahboobeh Ghalehnovi; Arash Rahnama; Panos J Antsaklis; Tijana Milenković
Journal:  R Soc Open Sci       Date:  2020-06-03       Impact factor: 2.963

10.  Predicting ATP-Binding Cassette Transporters Using the Random Forest Method.

Authors:  Ruiyan Hou; Lida Wang; Yi-Jun Wu
Journal:  Front Genet       Date:  2020-03-25       Impact factor: 4.599
