Fang Yuan, Gan Liu, Xiwen Yang, Shunfang Wang, Xueren Wang.
Abstract
Oxidoreductases are enzymes found widely across organisms, where they play an important role in cellular energy metabolism and biotransformation. They comprise many subclasses with different functions, which makes subclass prediction an important classification task in bioinformatics. In this paper, a dataset of 2640 oxidoreductase sequences was used for analysis and comparison. The idea of dipeptides was introduced to process the Position Specific Score Matrix (PSSM): each dipeptide consists of two amino acids, and each column of the PSSM carries the information for one amino acid. Two kinds of dipeptide scores were proposed, the Standardization Normal Distribution PSSM (SND-PSSM) and the Correlation Coefficient PSSM (CC-PSSM). Recursive Feature Elimination (RFE) was used to select features from the SND-PSSM and CC-PSSM, and the two sets of selected features were combined to form a new feature matrix, the RFE-SND-CC-PSSM. With the proposed method and a kernel-based nonlinear SVM classifier, the accuracy reached 95.56% in the Jackknife test, a substantial improvement in oxidoreductase subclass prediction. Applied to predicting the six major enzyme classes, the method reached an accuracy of 94.54%, indicating that it is generally applicable to other protein classification problems. The results show that the method is effective and broadly applicable, and that it might be complementary to existing methods.
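The selection and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with scikit-learn, not the authors' implementation. The abstract does not define how SND-PSSM and CC-PSSM are computed, so the two feature matrices here are synthetic stand-ins (`X_snd`, `X_cc`, and the helper names are hypothetical). It also assumes a linear SVM as the ranking estimator inside RFE, since RFE needs per-feature weights that an RBF-kernel SVM does not expose, and it treats the Jackknife test as leave-one-out cross-validation.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(0)
n_samples, n_classes = 60, 3
# Balanced synthetic labels standing in for oxidoreductase subclasses.
y = np.repeat(np.arange(n_classes), n_samples // n_classes)


def fake_features(n_features):
    """Synthetic stand-in for a dipeptide-score matrix (hypothetical data)."""
    X = rng.normal(size=(n_samples, n_features))
    X[:, :5] += y[:, None] * 2.0  # make the first 5 columns informative
    return X


X_snd = fake_features(40)  # placeholder for SND-PSSM features
X_cc = fake_features(40)   # placeholder for CC-PSSM features


def select_with_rfe(X, y, k):
    """Keep k features, ranked by a linear SVM (an assumed ranking estimator)."""
    ranker = LinearSVC(dual=False, max_iter=5000)
    selector = RFE(ranker, n_features_to_select=k, step=5)
    return selector.fit_transform(X, y)


# Select from each matrix separately, then concatenate the selected columns.
X_combined = np.hstack([select_with_rfe(X_snd, y, 10),
                        select_with_rfe(X_cc, y, 10)])

# Kernel-based nonlinear SVM evaluated by leave-one-out (jackknife-style) CV.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(model, X_combined, y, cv=LeaveOneOut())
accuracy = scores.mean()
print(f"Leave-one-out accuracy on synthetic data: {accuracy:.3f}")
```

In this sketch the features are selected on the full dataset before the leave-one-out loop, which can make the estimate optimistic. Wrapping the selection step inside the cross-validation pipeline would be the stricter choice. The abstract does not say which of the two the authors used.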
Keywords: Oxidoreductase subfamily classes; RFE-SND-CC-PSSM; SVM; correlation coefficient PSSM; recursive feature elimination; standardization normal distribution PSSM
Year: 2019; PMID: 31617464; DOI: 10.1142/S021972001950029X
Source DB: PubMed; Journal: J Bioinform Comput Biol; ISSN: 0219-7200; Impact factor: 1.122