Jesús Herrera-Bravo, Jorge G Farías, Fernanda Parraguez Contreras, Lisandra Herrera-Belén, Juan-Alejandro Norambuena, Jorge F Beltrán.
Abstract
Viral antigens are key in the development of vaccines that prevent or eradicate infections caused by these pathogens. Bioinformatics tools are modern alternatives that facilitate the discovery of viral antigens, reducing the costs of experimental assays. We developed a bioinformatics tool called VirVACPRED, which is highly efficient in predicting viral antigens. In this study, we obtained a model based on the gradient boosting classifier, which showed high performance both during training with leave-one-out cross-validation (accuracy = 0.7402, sensitivity = 0.7319, precision = 0.7503, F1 = 0.7251, kappa = 0.4774, Matthews correlation coefficient = 0.4981) and during testing (accuracy = 0.8889, sensitivity = 1.0, precision = 0.8276, F1 = 0.9057, kappa = 0.7734, Matthews correlation coefficient = 0.7941). VirVACPRED is a robust tool that can be of great help in the search for and proposal of new viral antigens, which can be considered in the development of future vaccines against infections caused by viruses.
Keywords: Bioinformatics; Machine learning; Protective antigen; Server; Vaccine; Virus
Year: 2021 PMID: 34934411 PMCID: PMC8679566 DOI: 10.1007/s10989-021-10345-2
Source DB: PubMed Journal: Int J Pept Res Ther ISSN: 1573-3149 Impact factor: 1.931
Fig. 1. The architecture used for the generation of the predictive models of protective viral antigens
Training performance measurements obtained during the LOOCV using 16 machine learning algorithms
| Algorithms | ACC | AUC | TPR | PVV | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| RF | 0.7471 | 0.8365 | 0.7611 | 0.7504 | 0.7437 | 0.4938 | 0.5086 |
| ETC | 0.7467 | 0.8485 | 0.7486 | 0.7507 | 0.7393 | 0.4927 | 0.5041 |
| QDA | 0.7412 | 0.8110 | 0.8278 | 0.7048 | 0.7543 | 0.4824 | 0.5010 |
| LGBM | 0.7405 | 0.8265 | 0.7333 | 0.7575 | 0.7311 | 0.4792 | 0.4962 |
| GBC | 0.7402 | 0.8045 | 0.7319 | 0.7503 | 0.7251 | 0.4774 | 0.4981 |
| NBC | 0.7173 | 0.8265 | 0.8750 | 0.6715 | 0.7542 | 0.4386 | 0.4710 |
| LDA | 0.7075 | 0.7827 | 0.7264 | 0.7012 | 0.7057 | 0.4142 | 0.4246 |
| ABC | 0.7069 | 0.7665 | 0.7375 | 0.7023 | 0.7028 | 0.4142 | 0.4339 |
| KNN | 0.7052 | 0.7792 | 0.7778 | 0.6821 | 0.7194 | 0.4104 | 0.4294 |
| DTC | 0.6775 | 0.6750 | 0.6389 | 0.7004 | 0.6560 | 0.3506 | 0.3582 |
| SVM-LK | 0.5144 | 0.0000 | 0.1222 | 0.2000 | 0.1048 | 0.0111 | 0.0243 |
| LR | 0.5088 | 0.7415 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| SVM-RK | 0.5088 | 0.2573 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| GPC | 0.5088 | 0.7427 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| MLPC | 0.5088 | 0.7591 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| RC | 0.5033 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | -0.0111 | -0.0243 |
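ACC: accuracy; AUC: area under the curve; TPR: true positive rate (sensitivity); PVV: precision; MCC: Matthews correlation coefficient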
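The leave-one-out cross-validation (LOOCV) protocol behind the training figures can be illustrated with a minimal sketch: each sample is held out once, a model is trained on the remaining samples, and the held-out sample is predicted. Everything in the block is an assumption made for illustration. The toy two-feature data are hypothetical, and a nearest-centroid rule stands in for the gradient boosting classifier, so the block shows the validation loop only, not the authors' model or features.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Callable[[Vector], int]:
    """Stand-in classifier (NOT gradient boosting): predict the class with the closest mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        # Column-wise mean of all training rows that carry this label.
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        def sq_dist(c: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(centroids, key=lambda label: sq_dist(centroids[label]))

    return predict


def leave_one_out_predictions(X: List[Vector], y: List[int], fit) -> List[int]:
    """Hold out each sample once, train on the rest, and predict the held-out sample."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(model(X[i]))
    return preds


# Hypothetical toy data: two well-separated classes with two features each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.9, 1.1], [1.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]

preds = leave_one_out_predictions(X, y, fit_nearest_centroid)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)
```

Because every prediction comes from a model that never saw the held-out sample, the pooled predictions give one set of metrics per algorithm, as in the rows above.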
Performance measurements obtained during the testing phase (independent dataset) with the 16 machine learning algorithms assessed
| Algorithms | ACC | AUC | TPR | PVV | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| RF | 0.8667 | 0.9266 | 1.0 | 0.8000 | 0.8889 | 0.7273 | 0.7559 |
| ETC | 0.8444 | 0.9266 | 0.9583 | 0.7931 | 0.8679 | 0.6828 | 0.7010 |
| QDA | 0.7556 | 0.8492 | 0.8333 | 0.7407 | 0.7843 | 0.5045 | 0.5092 |
| LGBM | 0.8667 | 0.9315 | 0.9167 | 0.8462 | 0.8800 | 0.7305 | 0.7335 |
| GBC | 0.8889 | 0.9008 | 1.0 | 0.8276 | 0.9057 | 0.7734 | 0.7941 |
| NBC | 0.8000 | 0.8373 | 0.9583 | 0.7419 | 0.8364 | 0.5897 | 0.6222 |
| LDA | 0.8000 | 0.8611 | 0.8333 | 0.8000 | 0.8163 | 0.5970 | 0.5976 |
| ABC | 0.8667 | 0.8502 | 1.0 | 0.8000 | 0.8889 | 0.7273 | 0.7559 |
| KNN | 0.8222 | 0.8621 | 0.9167 | 0.7857 | 0.8462 | 0.6386 | 0.6492 |
| DTC | 0.8667 | 0.8661 | 0.8750 | 0.8750 | 0.8750 | 0.7321 | 0.7321 |
| SVM-LK | 0.5333 | 0.5000 | 1.0 | 0.5333 | 0.6957 | 0.0000 | 0.0000 |
| LR | 0.4667 | 0.7778 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| SVM-RK | 0.4667 | 0.2222 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| GPC | 0.4667 | 0.7778 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| MLPC | 0.4667 | 0.7560 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| RC | 0.4667 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
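ACC: accuracy; AUC: area under the curve; TPR: true positive rate (sensitivity); PVV: precision; MCC: Matthews correlation coefficient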
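The GBC testing row can be cross-checked from a 2x2 confusion matrix. The sketch below computes accuracy, sensitivity, precision, F1, kappa and MCC from raw counts. The counts themselves (24 true positives, 0 false negatives, 5 false positives, 16 true negatives) are not reported in this record. They are back-calculated here under the assumption of a 45-sequence test set, the smallest size consistent with the reported ACC, TPR and PVV values, and kappa is assumed to be Cohen's kappa. The function name `binary_metrics` is illustrative.

```python
import math


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    tpr = tp / (tp + fn)          # sensitivity / recall
    ppv = tp / (tp + fp)          # precision
    f1 = 2 * ppv * tpr / (ppv + tpr)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (acc - p_exp) / (1 - p_exp)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "TPR": tpr, "PVV": ppv, "F1": f1, "Kappa": kappa, "MCC": mcc}


# Assumed (back-calculated) counts for the GBC row; not given in the source.
gbc_test = binary_metrics(tp=24, fn=0, fp=5, tn=16)
for name, value in gbc_test.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the values rounded to four decimals agree with the GBC row (0.8889, 1.0, 0.8276, 0.9057, 0.7734, 0.7941).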
Comparison of the Vaxijen v2.0 and VirVACPRED performance measures
| Tool | Phase | ACC | AUC | TPR | PVV | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| VirVACPRED | LOOCV | 0.7402 | 0.8045 | 0.7319 | 0.7503 | 0.7251 | 0.4774 | 0.4981 |
| Vaxijen v2.0 (Doytchinova and Flower) | LOOCV | 0.73 | 0.810 | 0.74 | 0.71 | – | – | – |
| VirVACPRED | Testing | 0.8889 | 0.9008 | 1.0 | 0.8276 | 0.9057 | 0.7734 | 0.7941 |
| Vaxijen v2.0 (Doytchinova and Flower) | Testing | 0.70 | 0.743 | 0.84 | – | – | – | – |
Fig. 2. Receiver operating characteristic curve of the gradient boosting classifier on the independent dataset. The classifier reaches an AUC of 0.90 on the unseen testing data, which is indicative of a good model for predicting the viral antigen and non-viral antigen classes, represented by zero and one, respectively
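For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The labels and scores are hypothetical, and treating label 1 as the positive class is an assumption of this example, not something taken from the paper.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores; label 1 is assumed to be the positive class.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.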
Fig. 3. User interface of the VirVACPRED tool for prediction of protective viral antigens. (A) Input and (B) result interfaces