Prediction of oxidoreductase subfamily classes based on RFE-SND-CC-PSSM and machine learning methods.

Fang Yuan, Gan Liu, Xiwen Yang, Shunfang Wang, Xueren Wang

Abstract

Oxidoreductases are enzymes found widely in organisms, where they play an important role in cellular energy metabolism and biotransformation processes. They fall into many subclasses with different functions, which makes subclass prediction an important classification task in bioinformatics. In this paper, a dataset of 2640 oxidoreductase sequences was used for analysis and comparison. The idea of dipeptides was introduced to process the Position Specific Score Matrix (PSSM): each dipeptide consists of two amino acids, and each column of the PSSM carries the information for one amino acid. Two kinds of dipeptide scores were proposed, the Standardization Normal Distribution PSSM (SND-PSSM) and the Correlation Coefficient PSSM (CC-PSSM). Recursive Feature Elimination (RFE) was used to select features from the SND-PSSM and the CC-PSSM, and the two sets of selected features were combined to form a new feature matrix, the RFE-SND-CC-PSSM. With the proposed method and a kernel-based nonlinear SVM classifier, the accuracy reached 95.56% in the Jackknife test, greatly improving the accuracy of oxidoreductase subclass prediction. Applied to predicting the 6 major classes of enzymes, the method reached a prediction accuracy of 94.54%, indicating that it is generally applicable to other protein classification problems. The results show that the method is effective and broadly applicable, and that it might be complementary to existing methods.
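
The abstract does not give the formula for either dipeptide score, so the following is only a minimal sketch of one plausible reading of a correlation-coefficient feature: since each PSSM column describes one amino acid, a dipeptide (an ordered pair of amino acids) is scored here by the Pearson correlation between the two corresponding columns. The function name `column_correlation_features`, the toy matrix, and the choice of Pearson correlation are all assumptions made for illustration, not the authors' published definition of CC-PSSM.

```python
import math
import random


def column_correlation_features(pssm):
    """Flatten pairwise Pearson correlations between PSSM columns.

    pssm: list of rows (one per residue), each a list of per-amino-acid scores.
    Returns n_cols * n_cols values (400 for a 20-column PSSM), one per
    ordered pair of columns, i.e. one per dipeptide under this assumption.
    """
    n_cols = len(pssm[0])
    cols = [[row[j] for row in pssm] for j in range(n_cols)]
    means = [sum(c) / len(c) for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]

    feats = []
    for i in range(n_cols):
        for j in range(n_cols):
            denom = norms[i] * norms[j]
            if denom == 0.0:
                # A constant column has no defined correlation; use 0 (assumption).
                feats.append(0.0)
            else:
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                feats.append(dot / denom)
    return feats


# Hypothetical toy PSSM: 60 residues x 20 amino-acid columns of integer scores.
random.seed(0)
toy_pssm = [[random.randint(-5, 9) for _ in range(20)] for _ in range(60)]
features = column_correlation_features(toy_pssm)
```

A fixed-length vector of this kind can be computed for sequences of any length, which is what makes it usable as input to a feature-selection step such as RFE followed by an SVM.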

Keywords:  Oxidoreductase subfamily classes; RFE-SND-CC-PSSM; SVM; correlation coefficient PSSM; recursive feature elimination; standardization normal distribution PSSM

Year:  2019        PMID: 31617464     DOI: 10.1142/S021972001950029X

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  1 in total

1.  Predicting antifreeze proteins with weighted generalized dipeptide composition and multi-regression feature selection ensemble.

Authors:  Shunfang Wang; Lin Deng; Xinnan Xia; Zicheng Cao; Yu Fei
Journal:  BMC Bioinformatics       Date:  2021-06-23       Impact factor: 3.169
