Predicting antifreeze proteins with weighted generalized dipeptide composition and multi-regression feature selection ensemble.

Shunfang Wang, Lin Deng, Xinnan Xia, Zicheng Cao, Yu Fei.

Abstract

BACKGROUND: Antifreeze proteins (AFPs) are a group of proteins that inhibit the growth of ice crystals in body fluids and thereby improve an organism's resistance to freezing. They are vital to the survival of living organisms in extremely cold environments. However, little research has addressed sequence feature extraction and selection for antifreeze protein classification, even though this is of great significance for structure and function prediction.
RESULTS: To predict antifreeze proteins, this paper proposes a feature representation, weighted generalized dipeptide composition (W-GDipC), and a two-stage, multi-regression ensemble feature selection method (LRMR-Ri). In the first stage of LRMR-Ri, four feature selection algorithms (Lasso regression, Ridge regression, maximal information coefficient (MIC) and Relief) each select a feature set. If the four sets share a common feature subset, it is taken as the optimal subset; otherwise, Ridge regression is used to select the optimal subset from the pooled set formed by the four sets, which is the second stage of LRMR-Ri. LRMR-Ri combined with W-GDipC was evaluated on an antifreeze protein dataset (binary classification) and on a membrane protein dataset (multi-class classification). Experimental results show that the method performs well with support vector machine (SVM), decision tree (DT) and stochastic gradient descent (SGD) classifiers. On the antifreeze protein dataset with the SVM classifier, LRMR-Ri with W-GDipC reached ACC, RE and MCC values of 95.56%, 97.06% and 0.9105, respectively, much higher than those of each single method (Lasso, Ridge, MIC and Relief); ACC was nearly 13% higher than with Lasso alone.
CONCLUSION: The experimental results show that the proposed LRMR-Ri and W-GDipC method significantly improves the accuracy of antifreeze protein prediction compared with other similar single-method approaches. The method also achieved good results in the classification and prediction of membrane proteins, which supports its broad reliability to a certain extent.
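
The two-stage selection logic described under RESULTS can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the four first-stage feature sets have already been produced by some selectors and are passed in as sets of column indices. The abstract does not say how Ridge regression picks features from the pooled set, so the sketch assumes a ranking by absolute Ridge coefficient with a hypothetical cutoff `k`; the names `ridge_coefficients`, `two_stage_select`, `k` and `alpha` are illustrative only.

```python
from typing import List, Sequence, Set


def ridge_coefficients(X: Sequence[Sequence[float]], y: Sequence[float],
                       alpha: float = 1.0) -> List[float]:
    """Solve (X^T X + alpha*I) w = X^T y by Gauss-Jordan elimination (no intercept)."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X + alpha*I | X^T y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) + (alpha if i == j else 0.0)
          for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting; with alpha > 0 the system is positive definite.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def two_stage_select(subsets: Sequence[Set[int]],
                     X: Sequence[Sequence[float]], y: Sequence[float],
                     k: int, alpha: float = 1.0) -> List[int]:
    """Return the common subset if one exists, else a Ridge-ranked pick from the pool."""
    common = set.intersection(*subsets)
    if common:
        return sorted(common)
    # No feature is shared by all selectors: pool the sets and rank with Ridge.
    pooled = sorted(set.union(*subsets))
    X_pooled = [[row[j] for j in pooled] for row in X]
    weights = ridge_coefficients(X_pooled, y, alpha)
    # ASSUMPTION: keep the k pooled features with the largest |coefficient|.
    ranked = sorted(zip(pooled, weights), key=lambda t: abs(t[1]), reverse=True)
    return sorted(j for j, _ in ranked[:k])
```

In practice the four input sets would come from Lasso, Ridge, MIC and Relief selectors run on the W-GDipC feature matrix, and the selected columns would then be passed to a classifier such as an SVM.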


Keywords:  Antifreeze proteins prediction; Ensemble feature selection; Lasso regression; Ridge regression; Two-stage multiple regressions; Weighted general dipeptide composition

Year:  2021        PMID: 34162327     DOI: 10.1186/s12859-021-04251-z

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


