Chaolu Meng1,2, Shunshan Jin3, Lei Wang4, Fei Guo1, Quan Zou1,5,6.
Abstract
Antioxidant proteins play important roles in countering oxidative damage in organisms. Identifying them accurately through biological experiments is challenging because such experiments are time-consuming and costly. We therefore propose AOPs-SVM, a machine-learning model built on sequence features and a support vector machine. In a jackknife cross-validation test on a testing dataset, the AOPs-SVM classifier achieved a sensitivity of 0.68, a specificity of 0.985, an average accuracy of 0.942, an MCC of 0.741, and an AUC of 0.832, outperforming existing classifiers. The experimental results demonstrate that AOPs-SVM is an effective classifier that can contribute to research on antioxidant proteins. A web server is openly accessible at http://server.malab.cn/AOPs-SVM/index.jsp.
Keywords: antioxidant proteins; classifier; machine-learning; sequence features; support vector machine
Year: 2019 PMID: 31620433 PMCID: PMC6759716 DOI: 10.3389/fbioe.2019.00224
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. AOPs-SVM flowchart. The original dataset (positive and negative datasets) is processed in three phases. (A) In the feature extraction phase, two types of profiles are constructed using the PSI-BLAST and PSI-PRED programs. Then, 473D discrete vectors are generated by combining evolutionary information and secondary structure feature information: 20D PSSM features, 20D 1-gram features, 400D 2-gram features, 6D secondary structure sequence features, and 27D global and local structural features. (B) In the feature selection phase, the 473D features are ranked by MRMD score and the optimal feature set is selected using Random Forest. (C) Finally, in the model generation phase, the optimal feature set is fed into an SVM to generate the AOPs-SVM model, which is optimized via a grid search.
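The 20D 1-gram and 400D 2-gram dimensions in the caption above match the number of single residues and residue pairs over the 20 standard amino acids. The following is a minimal sketch of such a composition vector, assuming the n-grams are counted over a plain amino-acid string and normalized by the number of windows; the record does not say exactly which sequence the n-grams are computed from, and the helper name `ngram_composition` and the example sequence are hypothetical.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_composition(sequence: str, n: int) -> list:
    """Return normalized n-gram frequencies over the 20-letter alphabet.

    Hypothetical helper: windows containing non-standard residues are skipped.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(sequence) - n + 1):
        gram = sequence[i:i + n]
        if gram in counts:
            counts[gram] += 1
            total += 1
    # Guard against empty or too-short sequences.
    return [counts[k] / total if total else 0.0 for k in keys]


seq = "MKVLAAGIVGL"  # made-up example sequence, not from the dataset
one_gram = ngram_composition(seq, 1)  # 20 values
two_gram = ngram_composition(seq, 2)  # 400 values
features = one_gram + two_gram
print(len(one_gram), len(two_gram), len(features))
```

Concatenating further blocks (PSSM, secondary structure, global and local features) in the same way would give the full 473D vector described in the caption.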
Figure 2. Performance comparison of AOPs-SVM and other classifiers. (A) Comparison with other SVM models trained on the original feature set (473D). SVM-473D and SVM-473D-weight are SVM classifiers trained on the original feature set without and with class weighting (negative:positive = 1:6), respectively. (B) Comparison with three traditional classifiers on the optimal feature set (176D). RandomForest-176D, BayesNet-176D, and AdaBoostM1-176D denote RandomForest, BayesNet, and AdaBoostM1 trained on the optimal feature set, respectively. (C) Comparison with other SVM models based on the optimal feature sets generated by ANOVA and mRMR, which yielded 302D and 180D optimal feature sets, respectively. (D) Comparison with state-of-the-art methods. “<” denotes that the Sn and Sp of SeqSVM are <0.65 and <0.935, respectively.
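The Sn, Sp, accuracy, and MCC values compared above can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions, which are assumed here because this record does not spell out the formulas; the function name `binary_metrics` and the counts in the example are made up for illustration and are not results from the paper.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary classification metrics from confusion counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Illustrative counts only.
metrics = binary_metrics(tp=3, fn=1, tn=4, fp=0)
print(metrics)
```

On an imbalanced dataset, a high specificity can keep accuracy high even when sensitivity is modest, which is one reason MCC is usually reported alongside accuracy.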
Figure 3. The MRMD score and composition of the optimal feature set. The X-coordinate corresponds to the 473 features; the Y-coordinate gives the MRMD score and the participation rate. The orange vertical line represents the MRMD score of the 176D optimal feature set. As with the original feature set, the optimal feature set also consists of 6 feature types. The six horizontal lines represent the participation rate of each feature type: 20D 1-gram feature (yellow); 400D 2-gram feature (blue); 20D feature from PSSM (black); 6D secondary structure feature (red); and 27D global and local feature (green). We define the participation rate of a feature type as the number of features of that type in the 176D feature set divided by the total number of features of that type. For example, all of the 6D secondary structure features (red) are selected for inclusion in the optimal feature set, so their participation rate is 1.
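The participation rate defined in the caption above is a per-type ratio, which can be sketched as follows. The ordering of feature types inside the 473D vector, the type names, and the selected indices are all assumptions made for illustration; only the block sizes come from the captions.

```python
def participation_rates(type_sizes: dict, selected: list) -> dict:
    """Fraction of each feature type's dimensions present in the selected set.

    type_sizes maps a feature-type name to its dimension count, listed in the
    order in which the types are laid out in the full feature vector.
    """
    chosen = set(selected)
    rates = {}
    start = 0
    for name, size in type_sizes.items():
        hits = sum(1 for i in range(start, start + size) if i in chosen)
        rates[name] = hits / size
        start += size
    return rates


# Assumed layout of the 473D vector (order is hypothetical).
TYPE_SIZES = {
    "pssm": 20,
    "1-gram": 20,
    "2-gram": 400,
    "secondary_structure": 6,
    "global_local": 27,
}

# Made-up selection: all six secondary structure features plus a few others.
selected = list(range(440, 446)) + [0, 1, 25, 100]
rates = participation_rates(TYPE_SIZES, selected)
print(rates)
```

With the real 176 selected indices in place of the made-up list, the secondary structure entry would be 1, as stated in the caption.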
MRMD score of F.
| MRMD score | 0.271886 | 0.443691 | 0.427483 | 0.486034 | 0.870973 | 0.355192 | 0.360546 | 0.628175 | 0.499345 | 0.500683 |
| MRMD score | 0.508348 | 0.355434 | 0.414661 | 0.649391 | 0.672217 | 0.286041 | 0.289247 | 1 | 0.560823 | 0.408259 |