Multiple classifier integration for the prediction of protein structural classes.

Lei Chen, Lin Lu, Kairui Feng, Wenjin Li, Jie Song, Lulu Zheng, Youlang Yuan, Zhenbin Zeng, Kaiyan Feng, Wencong Lu, Yudong Cai.

Abstract

Supervised classifiers, such as artificial neural networks, partition trees, and support vector machines, are often used for the prediction and analysis of biological data. However, choosing an appropriate classifier is not straightforward because each classifier has its own strengths and weaknesses, and each biological dataset has its own characteristics. By integrating many classifiers, one can avoid the dilemma of choosing a single classifier out of many and still achieve optimized classification results (Rahman et al., Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variation, Springer, Berlin, 2002, 167-178). The classification algorithms come from Weka (Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005), a collection of software tools for machine learning. By integrating many predictors (classifiers) through simple voting, the correct prediction (classification) rates are 65.21% and 65.63% for a basic training dataset and an independent test set, respectively. These results are better than those of any single machine learning algorithm collected in Weka when exactly the same data are used. Furthermore, we introduce an integration strategy that accounts for both classifier weightings and classifier redundancy. A feature selection strategy called minimum redundancy maximum relevance (mRMR) is adapted to algorithm selection to deal with classifier redundancy, and the weightings are based on the performance of each classifier. The best classification results are obtained when 11 algorithms are selected by the mRMR method and integrated through weighted majority voting. As a result, the correct prediction rates are 68.56% and 69.29% for the basic training dataset and the independent test dataset, respectively. The web server is available at http://chemdata.shu.edu.cn/protein_st/.
2009 Wiley Periodicals, Inc.
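
The two ideas in the abstract, mRMR-style selection of classifiers and weighted majority voting, can be illustrated with a minimal Python sketch. This is not the authors' implementation (which builds on Weka). The function names, the use of mutual information between discrete prediction vectors as the relevance and redundancy measure, the "relevance minus mean redundancy" score, and the idea of using something like validation accuracy as a weight are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (assumed here to be
    something like validation accuracy; the abstract only says weights
    are based on classifier performance).
    """
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    # Ties go to the label that was seen first.
    return max(scores, key=scores.get)


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def mrmr_select(outputs, truth, k):
    """Greedily pick k classifiers by relevance minus mean redundancy.

    outputs: dict mapping a classifier name to its predicted labels on a
    held-out set; truth: the true labels for the same samples.
    """
    relevance = {
        name: mutual_information(pred, truth) for name, pred in outputs.items()
    }
    selected = []
    remaining = list(outputs)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(outputs[name], outputs[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, a classifier whose outputs duplicate those of an already selected classifier is penalized by its redundancy term, so a weaker but complementary classifier can be chosen ahead of it. The selected classifiers' predictions for a new sample would then be passed to `weighted_vote` together with their weights.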

Year:  2009        PMID: 19274708     DOI: 10.1002/jcc.21230

Source DB:  PubMed          Journal:  J Comput Chem        ISSN: 0192-8651            Impact factor:   3.376


  10 in total

1.  Identifying novel protein phenotype annotations by hybridizing protein-protein interactions and protein sequence similarities.

Authors:  Lei Chen; Yu-Hang Zhang; Tao Huang; Yu-Dong Cai
Journal:  Mol Genet Genomics       Date:  2016-01-04       Impact factor: 3.291

2.  Heterogeneous ensemble approach with discriminative features and modified-SMOTEbagging for pre-miRNA classification.

Authors:  Supatcha Lertampaiporn; Chinae Thammarongtham; Chakarida Nukoolkit; Boonserm Kaewkamnerdpong; Marasri Ruengjitchatchawalya
Journal:  Nucleic Acids Res       Date:  2012-09-24       Impact factor: 16.971

3.  Classification and analysis of regulatory pathways using graph property, biochemical and physicochemical property, and functional property.

Authors:  Tao Huang; Lei Chen; Yu-Dong Cai; Kuo-Chen Chou
Journal:  PLoS One       Date:  2011-09-28       Impact factor: 3.240

4.  Prediction of RNA-binding proteins by voting systems.

Authors:  C R Peng; L Liu; B Niu; Y L Lv; M J Li; Y L Yuan; Y B Zhu; W C Lu; Y D Cai
Journal:  J Biomed Biotechnol       Date:  2011-07-26

5.  Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG.

Authors:  Zhen Li; Bi-Qing Li; Min Jiang; Lei Chen; Jian Zhang; Lin Liu; Tao Huang
Journal:  Biomed Res Int       Date:  2013-08-13       Impact factor: 3.411

6.  Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach.

Authors:  Taigang Liu; Yufang Qin; Yongjie Wang; Chunhua Wang
Journal:  Int J Mol Sci       Date:  2015-12-24       Impact factor: 5.923

7.  Missing Value Estimation Methods Research for Arrhythmia Classification Using the Modified Kernel Difference-Weighted KNN Algorithms.

Authors:  Fei Yang; Jiazhi Du; Jiying Lang; Weigang Lu; Lei Liu; Changlong Jin; Qinma Kang
Journal:  Biomed Res Int       Date:  2020-06-21       Impact factor: 3.411

8.  KStable: A Computational Method for Predicting Protein Thermal Stability Changes by K-Star with Regular-mRMR Feature Selection.

Authors:  Chi-Wei Chen; Kai-Po Chang; Cheng-Wei Ho; Hsung-Pin Chang; Yen-Wei Chu
Journal:  Entropy (Basel)       Date:  2018-12-19       Impact factor: 2.524

9.  Predicting the DPP-IV inhibitory activity pIC₅₀ based on their physicochemical properties.

Authors:  Tianhong Gu; Xiaoyan Yang; Minjie Li; Milin Wu; Qiang Su; Wencong Lu; Yuhui Zhang
Journal:  Biomed Res Int       Date:  2013-06-20       Impact factor: 3.411

10.  Gene ontology and KEGG enrichment analyses of genes related to age-related macular degeneration.

Authors:  Jian Zhang; ZhiHao Xing; Mingming Ma; Ning Wang; Yu-Dong Cai; Lei Chen; Xun Xu
Journal:  Biomed Res Int       Date:  2014-08-06       Impact factor: 3.411
