
A top-down approach to classify enzyme functional classes and sub-classes using random forest.

Chetan Kumar, Alok Choudhary.

Abstract

Advances in sequencing technologies have led to an exponential rise in the number of newly discovered enzymes. Enzymes are proteins that catalyze biochemical reactions and play an important role in metabolic pathways. The function of such enzymes is commonly determined by experiments, which can be time-consuming and costly. Hence, there is a need for a computational method that can distinguish enzyme sequences from non-enzyme sequences and reliably predict the function of the former. To address this problem, approaches that cluster enzymes by sequence and structural similarity have been presented. However, these approaches are known to fail for proteins that perform the same function but are dissimilar in sequence and structure. In this article, we present a supervised machine learning model that predicts the function class and sub-class of enzymes from a set of 73 sequence-derived features. The functional classes are those defined by the International Union of Biochemistry and Molecular Biology. Using an efficient data mining algorithm called random forest, we construct a top-down, three-layer model in which the top layer classifies a query protein sequence as an enzyme or non-enzyme, the second layer predicts the main function class, and the bottom layer predicts the sub-function class. The model achieved an overall classification accuracy of 94.87% for the first level, 87.7% for the second, and 84.25% for the bottom level. Our results compare very well with existing methods and in many cases show better performance. Using feature selection methods, we show the biological relevance of a few of the top-ranked attributes.
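The top-down routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names `NearestCentroid` and `TopDownEnzymeModel`, the two-dimensional toy features, and the toy labels are all hypothetical. The per-layer learner is a simple nearest-centroid stand-in rather than a random forest, so that the example needs only the standard library. Training one sub-class model per main class is also an assumption, since the abstract does not say how the bottom layer is organized.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Stand-in learner (NOT a random forest): predicts the label whose
    mean training vector lies closest to the query vector."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # Column-wise mean of the feature vectors for each label.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, features):
        return min(self.centroids,
                   key=lambda label: dist(features, self.centroids[label]))


class TopDownEnzymeModel:
    """Three-layer cascade: enzyme or not -> main class -> sub-class."""

    def __init__(self, make_classifier=NearestCentroid):
        # Any factory returning an object with fit / predict_one would do.
        self.make_classifier = make_classifier

    def fit(self, X, labels):
        # labels[i] is None for a non-enzyme, else (main_class, sub_class).
        enzymes = [(x, lab) for x, lab in zip(X, labels) if lab is not None]
        # Layer 1 sees every protein and learns a binary target.
        self.layer1 = self.make_classifier().fit(
            X, [lab is not None for lab in labels])
        # Layer 2 sees enzymes only and learns the main class.
        self.layer2 = self.make_classifier().fit(
            [x for x, _ in enzymes], [lab[0] for _, lab in enzymes])
        # Layer 3 (assumed layout): one sub-class model per main class.
        by_main = defaultdict(list)
        for x, (main, sub) in enzymes:
            by_main[main].append((x, sub))
        self.layer3 = {
            main: self.make_classifier().fit(
                [x for x, _ in rows], [sub for _, sub in rows])
            for main, rows in by_main.items()
        }
        return self

    def predict_one(self, features):
        if not self.layer1.predict_one(features):
            return None  # stop at the top layer: predicted non-enzyme
        main = self.layer2.predict_one(features)
        return (main, self.layer3[main].predict_one(features))


# Toy data: two non-enzymes and four enzymes in two main classes.
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.0, 1.0], [0.0, 5.0], [1.0, 5.0]]
labels = [None, None, (1, "1.1"), (1, "1.2"), (3, "3.1"), (3, "3.4")]
model = TopDownEnzymeModel().fit(X, labels)
print(model.predict_one([0.1, 0.1]))
print(model.predict_one([5.0, 0.9]))
```

Because each layer only receives queries that passed the layer above, an error at the top propagates downward. To approximate the setup in the abstract, `make_classifier` could be replaced by a wrapper around a random forest implementation, with 73 sequence-derived features per protein instead of the two toy coordinates.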


Year:  2012        PMID: 22376768      PMCID: PMC3351021          DOI: 10.1186/1687-4153-2012-1

Source DB:  PubMed          Journal:  EURASIP J Bioinform Syst Biol        ISSN: 1687-4145


  8 in total

1.  Prediction of detailed enzyme functions and identification of specificity determining residues by random forests.

Authors:  Chioko Nagao; Nozomi Nagano; Kenji Mizuguchi
Journal:  PLoS One       Date:  2014-01-08       Impact factor: 3.240

Review 2.  A survey of computational intelligence techniques in protein function prediction.

Authors:  Arvind Kumar Tiwari; Rajeev Srivastava
Journal:  Int J Proteomics       Date:  2014-12-11

3.  Automatic single- and multi-label enzymatic function prediction by machine learning.

Authors:  Shervine Amidi; Afshine Amidi; Dimitrios Vlachakis; Nikos Paragios; Evangelia I Zacharaki
Journal:  PeerJ       Date:  2017-03-29       Impact factor: 2.984

4.  EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation.

Authors:  Afshine Amidi; Shervine Amidi; Dimitrios Vlachakis; Vasileios Megalooikonomou; Nikos Paragios; Evangelia I Zacharaki
Journal:  PeerJ       Date:  2018-05-04       Impact factor: 2.984

5.  DEEPre: sequence-based enzyme EC number prediction by deep learning.

Authors:  Yu Li; Sheng Wang; Ramzan Umarov; Bingqing Xie; Ming Fan; Lihua Li; Xin Gao
Journal:  Bioinformatics       Date:  2018-03-01       Impact factor: 6.937

6.  Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering.

Authors:  Elisa Boari de Lima; Wagner Meira; Raquel Cardoso de Melo-Minardi
Journal:  PLoS Comput Biol       Date:  2016-06-27       Impact factor: 4.475

7.  ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature.

Authors:  Alperen Dalkiran; Ahmet Sureyya Rifaioglu; Maria Jesus Martin; Rengul Cetin-Atalay; Volkan Atalay; Tunca Doğan
Journal:  BMC Bioinformatics       Date:  2018-09-21       Impact factor: 3.169

8.  Alignment-Free Method to Predict Enzyme Classes and Subclasses.

Authors:  Riccardo Concu; M Natália D S Cordeiro
Journal:  Int J Mol Sci       Date:  2019-10-29       Impact factor: 5.923

