
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

Qianwu Ni1, Lei Chen1.   

Abstract

AIM AND OBJECTIVE: Correct prediction of protein structural class aids the investigation of protein functions, regulation, and interactions. In recent years, several computational methods have been proposed for this task. However, given the variety of available features, it remains a great challenge to select a proper classification algorithm and to extract the features essential for classification.
MATERIAL AND METHODS: In this study, a feature and algorithm selection method is presented for improving the accuracy of protein structural class prediction. Amino acid compositions and physicochemical features were adopted to represent proteins, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The classes predicted by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. Algorithms were then taken from this list one by one to build ensemble prediction models. Finally, the ensemble prediction model with the best performance was selected as the optimal ensemble prediction model.
RESULTS: Experimental results indicate that the constructed model is clearly superior to models using a single algorithm and to models that adopt only the feature selection procedure or only the algorithm selection procedure.
CONCLUSION: The feature selection procedure and the algorithm selection procedure are helpful for building an ensemble prediction model that yields better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
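
The algorithm selection step described in the methods can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the ensemble combines its members or how mRMR was configured, so the sketch assumes majority voting and a simple greedy mRMR variant (mutual information with the true class, minus mean mutual information with the columns already chosen). All function names and the toy data are hypothetical. The same ranking function could in principle order discretized protein features as well as algorithm outputs.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def mrmr_rank(columns, target):
    """Greedy mRMR ordering (assumed variant): repeatedly pick the column with
    the highest relevance to the target minus mean redundancy with picked columns."""
    relevance = [mutual_information(col, target) for col in columns]
    remaining = list(range(len(columns)))
    ranked = []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[k]) for k in ranked
            ) / len(ranked)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def majority_vote(predictions):
    """Combine per-algorithm label lists into one label list (assumed combiner)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy(predicted, truth):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

def best_ensemble(predictions, truth):
    """Rank algorithms by mRMR on their predicted classes, grow the ensemble
    one algorithm at a time, and keep the prefix with the highest accuracy."""
    order = mrmr_rank(predictions, truth)
    best_size, best_acc = 0, -1.0
    for size in range(1, len(order) + 1):
        voted = majority_vote([predictions[i] for i in order[:size]])
        acc = accuracy(voted, truth)
        if acc > best_acc:
            best_size, best_acc = size, acc
    return order[:best_size], best_acc

# Toy data: three hypothetical algorithms predicting the class of four proteins.
truth = [0, 0, 1, 1]
predictions = [
    [0, 1, 0, 1],  # uninformative about the true class
    [0, 0, 1, 1],  # always correct
    [0, 0, 1, 0],  # one mistake
]
selected, acc = best_ensemble(predictions, truth)
print(selected, acc)
```

In this sketch accuracy is measured on the same proteins used for ranking, which is only acceptable for illustration; in practice the predicted classes would come from cross-validation or a held-out set.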

Keywords:  Protein structural class prediction; algorithm selection; ensemble classifier; feature selection; minimum redundancy maximum relevance; optimal ensemble prediction model

Year:  2017        PMID: 28292249     DOI: 10.2174/1386207320666170314103147

Source DB:  PubMed          Journal:  Comb Chem High Throughput Screen        ISSN: 1386-2073            Impact factor:   1.339


  8 in total

1.  Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection.

Authors:  Lei Chen; Yu-Hang Zhang; Guohua Huang; Xiaoyong Pan; ShaoPeng Wang; Tao Huang; Yu-Dong Cai
Journal:  Mol Genet Genomics       Date:  2017-09-14       Impact factor: 3.291

2.  Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

Authors:  Lei Chen; Yu-Hang Zhang; ShaoPeng Wang; YunHua Zhang; Tao Huang; Yu-Dong Cai
Journal:  PLoS One       Date:  2017-09-05       Impact factor: 3.240

3.  Identification of Differentially Expressed Genes between Original Breast Cancer and Xenograft Using Machine Learning Algorithms.

Authors:  Deling Wang; Jia-Rui Li; Yu-Hang Zhang; Lei Chen; Tao Huang; Yu-Dong Cai
Journal:  Genes (Basel)       Date:  2018-03-12       Impact factor: 4.096

4.  Computational Approach to Investigating Key GO Terms and KEGG Pathways Associated with CNV.

Authors:  YuanYuan Luo; Yan Yan; Shiqi Zhang; Zhen Li
Journal:  Biomed Res Int       Date:  2018-04-11       Impact factor: 3.411

5.  Analysis of Protein-Protein Functional Associations by Using Gene Ontology and KEGG Pathway.

Authors:  Fei Yuan; Xiaoyong Pan; Lei Chen; Yu-Hang Zhang; Tao Huang; Yu-Dong Cai
Journal:  Biomed Res Int       Date:  2019-07-18       Impact factor: 3.411

6.  Roles of Physicochemical and Structural Properties of RNA-Binding Proteins in Predicting the Activities of Trans-Acting Splicing Factors with Machine Learning.

Authors:  Lin Zhu; Wenjin Li
Journal:  Int J Mol Sci       Date:  2022-04-17       Impact factor: 6.208

7.  Identifying and analyzing different cancer subtypes using RNA-seq data of blood platelets.

Authors:  Yu-Hang Zhang; Tao Huang; Lei Chen; YaoChen Xu; Yu Hu; Lan-Dian Hu; Yudong Cai; Xiangyin Kong
Journal:  Oncotarget       Date:  2017-09-15

8.  A Computational Method for Classifying Different Human Tissues with Quantitatively Tissue-Specific Expressed Genes.

Authors:  JiaRui Li; Lei Chen; Yu-Hang Zhang; XiangYin Kong; Tao Huang; Yu-Dong Cai
Journal:  Genes (Basel)       Date:  2018-09-07       Impact factor: 4.096
