Rahila Sardar1,2, Arun Sharma1, Dinesh Gupta1.
Abstract
With the availability of COVID-19-related clinical data, healthcare researchers can now explore the potential of computational technologies such as artificial intelligence (AI) and machine learning (ML) to discover biomarkers for accurate detection, early diagnosis, and prognosis in the management of COVID-19. However, identifying biomarkers associated with survival and death remains a major challenge for early prognosis. In the present study, we developed and evaluated AI-based algorithms for predicting a COVID-19 patient's survival or death from a publicly available dataset comprising clinical parameters and protein profile data of hospital-admitted COVID-19 patients. The best classification model based on clinical parameters achieved a maximum accuracy of 89.47% for predicting survival or death, with a sensitivity of 85.71% and a specificity of 92.45%. The classification model based on normalized protein expression (NPX) values of 45 proteins achieved a maximum accuracy of 89.01%, with a sensitivity of 92.68% and a specificity of 86%. We identified 9 clinical and 45 protein-based putative biomarkers associated with the survival or death of COVID-19 patients. A few of these clinical features and proteins agree with previously published reports, reaffirming their role in COVID-19 disease progression at the molecular level. The ML-based models developed in the present study have the potential to predict the survival chances of COVID-19-positive patients in the early stages of the disease or at the time of hospitalization. However, they must be validated on a larger cohort of patients before being used in clinical practice. We have also developed a webserver, CovidPrognosis, to which clinical information can be uploaded to predict the survival chances of a COVID-19 patient.
The webserver is available at http://14.139.62.220/covidprognosis/.
Keywords: COVID-19; biomarkers discovery; feature selection; machine learning; proteomics and bioinformatics
Year: 2021 PMID: 34093642 PMCID: PMC8175075 DOI: 10.3389/fgene.2021.636441
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. ML-based pipeline to identify key features associated with survival based on clinical and proteomics data. (The figure images were generated using biorender.com.)
Performance of the best models based on the full set of clinical parameters.
| Dataset (no. of clinical parameters used) | Day(s) | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | ROC | WEKA technique used |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Whole dataset I (19) | 0 | 50 | 94.7 | 88.56 | 0.48 | 0.806 | AttributeSelectedClassifier |
| Average of P1–P5 splits (19) | 0 | 81.90 | 82.94 | 82.48 | 0.65 | 0.808 | IterativeClassifierOptimizer |
| Whole dataset I (33) | 0, 3, 7 | 47.62 | 96.21 | 89.54 | 0.51 | 0.739 | J48 |
| Average of P1–P5 splits (33) | 0, 3, 7 | 75.24 | 81.43 | 78.68 | 0.57 | 0.868 | RandomForest (with -K 4) |
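The Sensitivity, Specificity, Accuracy, and MCC columns in the table above all derive from a binary confusion matrix. As a minimal sketch of the standard definitions (not the authors' WEKA workflow), they can be written in Python as follows. The function name `binary_metrics` and the counts passed to it are hypothetical, chosen only for illustration; they are not taken from the study.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy (all in %) and MCC
    from the four cells of a binary confusion matrix."""
    sensitivity = 100.0 * tp / (tp + fn)   # true-positive rate
    specificity = 100.0 * tn / (tn + fp)   # true-negative rate
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (NOT data from the study).
m = binary_metrics(tp=40, fn=10, tn=90, fp=10)
print(m)
```

With these illustrative counts the sketch gives a sensitivity of 80%, a specificity of 90%, and an MCC of 0.7. Unlike accuracy, MCC uses all four cells of the matrix, which is why it is often reported alongside accuracy when the two outcome classes differ in size.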
FIGURE 2. Selected features from clinical data to classify COVID-19 patients who survived vs. those who died.
Performance of the best models based on selected clinical parameter values.
| Dataset (no. of clinical parameters used) | Day(s) | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | ROC | WEKA technique used |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Whole dataset I (3) | 0 | 50 | 94.7 | 88.56 | 0.48 | 0.806 | J48 |
| Average of P1–P5 splits (3) | 0 | 83.33 | 80.31 | 81.64 | 0.63 | 0.831 | RandomSubSpace |
| Whole dataset I (9) | 0 | 50 | 94.7 | 88.56 | 0.48 | 0.806 | AttributeSelectedClassifier |
| Average of P1–P5 splits (9) | 0, 3 | 81.43 | 78.02 | 79.54 | 0.59 | 0.823 | IterativeClassifierOptimizer |
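In the whole-dataset rows of the table above, accuracy stays close to specificity even though sensitivity is only 50%. One way to read this is through the identity accuracy = p × sensitivity + (1 − p) × specificity, where p is the fraction of evaluated samples in the positive class. The sketch below solves that identity for p. It assumes that all three figures were computed on the same set of predictions and ignores rounding in the reported values; the function name `implied_positive_fraction` is hypothetical, and this section does not state which outcome the authors treated as the positive class.

```python
def implied_positive_fraction(sensitivity, specificity, accuracy):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p.

    All three arguments are percentages measured on the same predictions.
    """
    if sensitivity == specificity:
        # Accuracy then equals both rates, so p cannot be recovered.
        raise ValueError("p is not identifiable when sensitivity equals specificity")
    return (specificity - accuracy) / (specificity - sensitivity)


# Values from the "Whole dataset I (3)" row, Day 0, of the table above.
p = implied_positive_fraction(sensitivity=50.0, specificity=94.7, accuracy=88.56)
print(f"implied positive-class fraction: {p:.3f}")
```

Under these assumptions the positive class would make up roughly 14% of the evaluated samples, which would explain why a model that misses half of that class can still reach an accuracy near 89%. Rows that report averages over the P1–P5 splits cannot be interpreted as precisely this way, because an average of per-split ratios need not satisfy the identity exactly.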
Performance of the best models based on NPX values of all 1,428 proteins.
| Dataset (no. of proteins used) | Day(s) | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | ROC | WEKA technique used |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Whole dataset II (1428) | 0 | 39.02 | 95.18 | 87.24 | 0.4 | 0.791 | AdaBoostM1 |
| P4 (1428) | 0 | 82.93 | 84 | 83.52 | 0.67 | 0.868 | LogitBoost |
| Average of P1–P5 splits (1428) | 0 | 69.76 | 71.90 | 70.94 | 0.42 | 0.755 | LogitBoost |
Performance of the best models based on NPX values of the 45 selected proteins.
| Dataset (no. of proteins used) | Day(s) | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | ROC | WEKA technique used |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Whole dataset II (45) | 0 | 80.49 | 92.77 | 91.03 | 0.67 | 0.948 | BayesNet |
| Average of P1–P5 splits (45) | 0 | 82.44 | 82.72 | 82.59 | 0.65 | 0.902 | BayesNet |
| Average of P1–P5 splits (45) | 0 | 83.42 | 79.97 | 81.51 | 0.63 | 0.817 | SMO; NormalizedPolyKernel |
FIGURE 3. Pathway analysis of the selected 45 proteins.
FIGURE 4. A screenshot showing the functionality of the CovidPrognosis webserver with three clinical parameters for Day 0.