Kaiyang Qu, Leyi Wei, Jiantao Yu, Chunyu Wang.
Abstract
Motivation: The pentatricopeptide repeat (PPR) is a triangular pentapeptide repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods that focus on sequence, physical, and chemical properties as well as amino acid composition, and then mix the resulting features. The Max-Relevant-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification is performed with random forest, J48, and naïve Bayes classifiers, each evaluated with 10-fold cross-validation.
Keywords: J48; maximum relevant maximum distance; mixed feature extraction methods; naïve bayes; pentatricopeptide repeat; random forest
Year: 2019 PMID: 30687359 PMCID: PMC6335366 DOI: 10.3389/fpls.2018.01961
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
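To make the idea of mixing feature extraction methods concrete, the following is a minimal sketch in which two feature sets are computed from a protein sequence and mixed by concatenating their vectors. It is an illustration under assumptions, not the authors' code: the function names `kmer_frequencies` and `mix_features`, the choice of k = 1 and k = 2, and the toy sequence are all hypothetical, and this record does not state how the paper's Kmer features are defined or how the mixing is implemented.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k=2):
    """Return normalized k-mer frequencies over the 20 standard amino acids.

    Assumption: windows containing non-standard residues are skipped.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def mix_features(*feature_vectors):
    """Mix feature sets by concatenating their vectors (an assumed strategy)."""
    mixed = []
    for vector in feature_vectors:
        mixed.extend(vector)
    return mixed


toy_sequence = "MKVLAAGLLK"  # made-up sequence, for illustration only
composition = kmer_frequencies(toy_sequence, k=1)  # 20 values
dipeptides = kmer_frequencies(toy_sequence, k=2)   # 400 values
mixed = mix_features(composition, dipeptides)      # 420 values
print(len(composition), len(dipeptides), len(mixed))
```

Concatenation keeps every individual feature intact, which is why a dimension-reduction step is a natural follow-up once several feature sets have been combined.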
Figure 1. Overall process of the method described in this paper.
Figure 2. Three groups of amino acids divided according to properties.
PPR prediction results using a single feature extraction method.
| Method | Classifier | AUC | F-measure |
|---|---|---|---|
| 188D | RF | | |
| 188D | J48 | 0.8684 | 0.8786 |
| 188D | Naïve Bayes | 0.907 | 0.8192 |
| Kmer | RF | | |
| Kmer | J48 | 0.8284 | 0.8312 |
| Kmer | Naïve Bayes | 0.9162 | 0.8344 |
| ACC | RF | | |
| ACC | J48 | 0.8456 | 0.8406 |
| ACC | Naïve Bayes | 0.9428 | 0.8594 |
| PC-PseAAC | RF | | |
| PC-PseAAC | J48 | 0.8710 | 0.8740 |
| PC-PseAAC | Naïve Bayes | 0.9678 | 0.9076 |
To represent the experimental results more intuitively, they are displayed as a histogram in Figure .
Results from mixing the features.
| Method | Classifier | AUC | F-measure |
|---|---|---|---|
| Kmer | RF | 0.9826 | 0.9492 |
| 188D + ACC | RF | | |
| 188D + ACC | J48 | 0.8868 | 0.8886 |
| 188D + ACC | Naïve Bayes | 0.9150 | 0.8294 |
| 188D + Kmer | RF | | |
| 188D + Kmer | J48 | 0.8554 | 0.8608 |
| 188D + Kmer | Naïve Bayes | 0.9088 | 0.8340 |
| 188D + PseAAC | RF | | |
| 188D + PseAAC | J48 | 0.8806 | 0.8866 |
| 188D + PseAAC | Naïve Bayes | 0.9174 | 0.8368 |
| ACC + Kmer | RF | | |
| ACC + Kmer | J48 | 0.8518 | 0.8538 |
| ACC + Kmer | Naïve Bayes | 0.9252 | 0.8516 |
| PseAAC + Kmer | RF | | |
| PseAAC + Kmer | J48 | 0.8386 | 0.8446 |
| PseAAC + Kmer | Naïve Bayes | 0.9252 | 0.8532 |
| ACC + PseAAC | RF | | |
| ACC + PseAAC | J48 | 0.8632 | 0.8748 |
| ACC + PseAAC | Naïve Bayes | 0.9736 | 0.9214 |
Bold values indicate the best result in each experiment, that is, the best combination of method and classifier.
Results after reducing the features.
| Method | Classifier | AUC | F-measure |
|---|---|---|---|
| 188D + ACC | RF | | |
| 188D + ACC | J48 | 0.8840 | 0.8854 |
| 188D + ACC | Naïve Bayes | 0.9148 | 0.8240 |
| 188D + Kmer | RF | | |
| 188D + Kmer | J48 | 0.8652 | 0.8662 |
| 188D + Kmer | Naïve Bayes | 0.9174 | 0.8650 |
| 188D + PseAAC | RF | | |
| 188D + PseAAC | J48 | 0.8748 | 0.8836 |
| 188D + PseAAC | Naïve Bayes | 0.9166 | 0.8318 |
| ACC + Kmer | RF | | |
| ACC + Kmer | J48 | 0.8500 | 0.8572 |
| ACC + Kmer | Naïve Bayes | 0.9512 | 0.8808 |
| PseAAC + Kmer | RF | | |
| PseAAC + Kmer | J48 | 0.8400 | 0.8400 |
| PseAAC + Kmer | Naïve Bayes | 0.9412 | 0.8706 |
| ACC + PseAAC | RF | | |
| ACC + PseAAC | J48 | 0.8682 | 0.8830 |
| ACC + PseAAC | Naïve Bayes | 0.9738 | 0.9210 |
Bold values indicate the best result in each experiment, that is, the best combination of method and classifier.
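The abstract names MRMD as the technique used to reduce the feature dimension. As a rough illustration of the underlying idea (prefer features that are relevant to the class label and distant from the other features), the sketch below ranks features by a combined score. It is a simplified stand-in, not the MRMD tool itself: the scoring formula (absolute Pearson correlation plus normalized mean Euclidean distance), the function names, and the toy data are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def euclidean(x, y):
    """Euclidean distance between two feature columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_features(columns, labels):
    """Rank feature indices by relevance plus distance (simplified, assumed score).

    columns: one list of values per feature, aligned with labels.
    """
    m = len(columns)
    relevance = [abs(pearson(col, labels)) for col in columns]
    distance = []
    for i in range(m):
        others = [euclidean(columns[i], columns[j]) for j in range(m) if j != i]
        distance.append(sum(others) / len(others) if others else 0.0)
    max_distance = max(distance) or 1.0  # avoid dividing by zero
    scores = [r + d / max_distance for r, d in zip(relevance, distance)]
    return sorted(range(m), key=lambda i: scores[i], reverse=True)


# Toy data (made up): feature 0 tracks the label, feature 1 is noisy,
# feature 2 is constant and therefore carries no label information.
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    [0.6, 0.4, 0.7, 0.5, 0.3, 0.6],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
]
ranking = rank_features(columns, labels)
selected = ranking[:2]  # keep the two highest-scoring features
print(ranking, selected)
```

Keeping only the top-ranked features shrinks the mixed feature vector before classification, which is the role the reduction step plays in the tables above.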
Figure 4. AUC when using MRMD to reduce the dimension.
Figure 5. F-measure when using MRMD to reduce the dimension.
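Figures 4 and 5 report AUC and F-measure. For readers unfamiliar with these metrics, the sketch below computes both from first principles for a binary task. It is illustrative only: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions, and this record does not say whether the reported F-measure is for the positive class or averaged across classes.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f_measure(labels, predictions):
    """F-measure (harmonic mean of precision and recall) for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up values): 1 = PPR, 0 = non-PPR.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

toy_auc = auc(toy_labels, toy_scores)
toy_f = f_measure(toy_labels, toy_predictions)
print(round(toy_auc, 4), round(toy_f, 4))
```

AUC is threshold-free, while the F-measure depends on the chosen decision threshold, so the two metrics can rank classifiers differently.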