Yanyuan Pan, Hui Gao, Hao Lin, Zhen Liu, Lixia Tang, Songtao Li.
Abstract
Bacteriophages, which are tremendously important to the ecology and evolution of bacteria, play a key role in the development of genetic engineering. Bacteriophage virion proteins are essential components of infectious viral particles and are responsible for several biological functions. The correct identification of bacteriophage virion proteins is of great importance for understanding both life at the molecular level and genetic evolution. However, few computational methods are available for identifying bacteriophage virion proteins. In this paper, we propose a new method to predict bacteriophage virion proteins using a Multinomial Naïve Bayes classification model based on discrete features generated from the g-gap feature tree. The accuracy of the proposed model reaches 98.37% with an MCC of 96.27% in 10-fold cross-validation. This result suggests that the proposed method can be a useful approach for identifying bacteriophage virion proteins from sequence information. For the convenience of experimental scientists, a web server (PhagePred) that implements the proposed predictor is freely accessible on the Internet.
Keywords: ANOVA; Multinomial Naïve Bayes; bacteriophage virion proteins; g-gap peptides
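As a rough illustration of the kind of discrete feature the abstract refers to, the sketch below counts g-gap dipeptides, i.e. pairs of residues separated by g intervening positions. This is a simplified assumption about the feature encoding: the function name `g_gap_dipeptide_counts`, the 20-letter alphabet, and the example sequence are hypothetical, and the full g-gap feature tree and ANOVA-based feature selection of the paper are not reproduced here.

```python
from itertools import product

# Standard 20 amino acids (assumed alphabet; not specified in this record).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_counts(sequence, g):
    """Count residue pairs separated by exactly g intervening residues.

    Returns a dict with 400 keys (one per ordered amino-acid pair).
    Pairs containing a non-standard residue are skipped.
    """
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    seq = sequence.upper()
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
    return counts


# Hypothetical toy sequence: with g = 1 the pairs are A_D and C_A.
features = g_gap_dipeptide_counts("ACDA", g=1)
```

Non-negative integer counts of this kind are the natural input for a multinomial Naïve Bayes model, which treats each feature as an event count.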
Year: 2018 PMID: 29914091 PMCID: PMC6032154 DOI: 10.3390/ijms19061779
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The computational framework of PhagePred.
Figure 2. The incremental feature selection curves for the values of accuracy against the discrete feature vector.
Figure 3. The incremental feature selection curve for the values of accuracy against the combination subsets.
Comparison of PhagePred with other classifiers.
| Classifier | Sn (%) | Sp (%) | Acc (%) | MCC (%) |
|---|---|---|---|---|
| XGBoost | 52.52 | 81.25 | 71.98 | 46.05 |
| Random Forest | 25.25 | 97.60 | 74.26 | 38.67 |
| AdaBoost + CART | 52.53 | 88.94 | 77.20 | 41.03 |
| SVM | 73.74 | 90.87 | 85.34 | 65.92 |
Acc: accuracy, Sn: sensitivity, Sp: specificity, MCC: Matthews correlation coefficient, CART: Classification and Regression Trees.
Figure 4. The receiver operating characteristic (ROC) curves calculated from the 10-fold cross-validation of the five different classifiers.
Comparison of state-of-the-art methods with PhagePred.
| Classifier | Sn (%) | Sp (%) | Acc (%) | MCC (%) |
|---|---|---|---|---|
| Naïve Bayes | 75.76 | 80.77 | 79.15 | 54.59 |
| SVM | 75.76 | 89.42 | 85.02 | 65.53 |
| PVP-SVM | 73.73 | 93.27 | 86.97 | 69.50 |
Acc: accuracy, Sn: sensitivity, Sp: specificity, MCC: Matthews correlation coefficient.
Figure 5. The g-gap feature tree.