Xiaomei Gu1,2,3,4, Lina Guo5, Bo Liao1,3,4, Qinghua Jiang1,3,4.
Abstract
Phages have a profound effect on the world's biochemical systems. They are closely related to human health, and medical treatments for many cancers and skin infections involve phages; this paper therefore sought to identify phage proteins. To this end, a Pseudo-188D model was established. The digital features of the phage were extracted by PseudoKNC, an appropriate feature vector was selected with the AdaBoost tool, and further features were extracted by 188D. The extracted digital features were then combined, and the viral proteins of the phage were predicted by a stochastic gradient descent algorithm. Our model achieved a performance of 93.4853%. To verify the stability of the model, we randomly selected 80% of the downloaded data to train it and used the remaining 20% to verify its robustness.
Keywords: digital characteristics; dimensional disaster; model pseudo-188D; phage; stochastic gradient descent
Year: 2021 PMID: 34925468 PMCID: PMC8672092 DOI: 10.3389/fgene.2021.796327
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
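The final stage described in the abstract (a random 80%/20% split followed by a stochastic gradient descent classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the synthetic four-dimensional features stand in for the combined PseudoKNC and 188D features, the logistic loss, learning rate, and epoch count are not stated in the record, and the helper names `holdout_split`, `sgd_fit`, and `predict` are hypothetical.

```python
import math
import random

def holdout_split(rows, labels, train_fraction=0.8, seed=0):
    """Shuffle the sample indices and split them into train and held-out parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def sgd_fit(rows, labels, epochs=50, lr=0.1, seed=0):
    """Fit a linear classifier with per-sample SGD on the logistic loss (assumed loss)."""
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, rows[i]))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - labels[i]  # gradient of the log-loss with respect to z
            w = [wj - lr * g * xj for wj, xj in zip(w, rows[i])]
            b -= lr * g
    return w, b

def predict(w, b, row):
    """Return 1 (positive class) when the linear score is above zero."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0

# Synthetic stand-in data: two well-separated Gaussian clusters (not phage data).
data_rng = random.Random(42)
rows, labels = [], []
for i in range(200):
    y = i % 2
    center = 1.0 if y else -1.0
    rows.append([data_rng.gauss(center, 0.5) for _ in range(4)])
    labels.append(y)

X_tr, y_tr, X_te, y_te = holdout_split(rows, labels)
w, b = sgd_fit(X_tr, y_tr)
acc = sum(predict(w, b, x) == y for x, y in zip(X_te, y_te)) / len(y_te)
print(f"train={len(X_tr)} test={len(X_te)} held-out accuracy={acc:.3f}")
```

The held-out 20% is never seen during fitting, which is what makes it usable as a robustness check of the kind the abstract describes.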
FIGURE 1. Process of establishing the Pseudo-188D model.
FIGURE 2. Feature-vector extraction process using PseudoKNC.
Performance comparison of different methods under 10-fold cross-validation.
| Methods | Cross validation | Classification method | Sn | Sp | ACC | MCC |
|---|---|---|---|---|---|---|
| monoTriKGap | 10-fold | SGD | 0.79 | 0.96 | 0.93 | 0.85 |
| SC-PseAAC | 10-fold | SGD | 0.66 | 0.87 | 0.80 | 0.54 |
| 188D | 10-fold | SGD | 0.52 | 0.87 | 0.76 | 0.41 |

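The Sn, Sp, ACC, and MCC columns in these tables are not defined in this record. The sketch below assumes the conventional definitions (sensitivity, specificity, accuracy, and the Matthews correlation coefficient) computed from a binary confusion matrix. The function name `binary_metrics` is hypothetical, and the counts in the example are made up for illustration, not taken from the paper.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, ACC and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on positives
    sp = tn / (tn + fp)                    # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}

# Illustrative counts only: 100 positives and 200 negatives.
m = binary_metrics(tp=80, fn=20, tn=180, fp=20)
print({name: round(value, 3) for name, value in m.items()})
```

Because MCC uses all four cells of the confusion matrix, it can stay low on imbalanced data even when ACC looks high.
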
Performance comparison of the same method in different classifiers.
| Methods | Cross validation | Classification method | Sn | Sp | ACC | MCC |
|---|---|---|---|---|---|---|
| Pseudo-188D | 10-fold | NaiveBayes | 0.59 | 0.88 | 0.79 | 0.49 |
| Pseudo-188D | 10-fold | Logistic | 0.69 | 0.84 | 0.79 | 0.79 |
| Pseudo-188D | 10-fold | Multi-layer perceptron | 0.88 | 0.94 | 0.92 | 0.83 |

Performance comparison of Pseudo-188D models under different cross-validations.
| Methods | Classification method | Cross validation | Sn | Sp | ACC | MCC |
|---|---|---|---|---|---|---|
| Pseudo-188D | SGD | 5 | 0.86 | 0.94 | 0.91 | 0.80 |
| Pseudo-188D | SGD | 6 | 0.87 | 0.95 | 0.93 | 0.84 |
| Pseudo-188D | SGD | 8 | 0.88 | 0.94 | 0.92 | 0.83 |

Performance comparison under different Ktuple (k).
| Ktuple (k) | Dimension | Sn | Sp | ACC | MCC |
|---|---|---|---|---|---|
| 1 | 208 | 0.54 | 0.83 | 0.74 | 0.379 |
| 2 | 335 | 0.65 | 0.85 | 0.78 | 0.503 |