Maryam AlJame1, Imtiaz Ahmad1, Ayyub Imtiaz2, Ameer Mohammed1.
Abstract
BACKGROUND AND OBJECTIVES: The novel coronavirus disease 2019 (COVID-19) pandemic has severely impacted human society, with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and more timely patient care and to combat the spread of the disease. In this context, recent studies have reported key advantages of using routine blood tests for the initial screening of COVID-19 patients. In this article, we first review the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. We then propose ERLX, an ensemble learning model for COVID-19 diagnosis from routine blood tests.
Keywords: COVID-19; Diagnostic model; Ensemble; Machine learning; Routine blood tests
Year: 2020 PMID: 33102686 PMCID: PMC7572278 DOI: 10.1016/j.imu.2020.100449
Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148
Comparison of related techniques.
| Ref. | Dataset Source | Dataset Size (COVID-19) | Total Features (Selected) | Model Used | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| [ | Hospitals, Lanzhou, China | 253 (105) | 49 (11) | RF | 96.95% | 95.12% | 96.97% |
| [ | Tongji Hospital of Wuhan, China | 110 (−) | 47 (7) | LASSO-LR | – | 98% | 91% |
| [ | Tongji Hospital of Wuhan, China | 375 (201) | 300 (3) | XGBoost | – | 83% | – |
| [ | First Medical Center, Beijing, China | 132 (26) | 46 (18) | LASSO-LR, DT, Adaboost | – | 100% | 77.8% |
| [ | Albert Einstein Hospital, Brazil | 599 (81) | 108 (16) | Ensemble of 10 SVM models | – | 70.25% | 85.98% |
| [ | Albert Einstein Hospital, Brazil | 598 (81) | 108 (14) | RF, LR, GLMNET, ANN | 81%–87% | 43%–65% | 81%–91% |
| [ | San Raffaele Hospital, Milan, Italy | 279 (177) | - (15) | DT, ET, KNN, LR, NB, RF, SVM | 82%–86% | 92%–95% | – |
| [ | Albert Einstein Hospital, Brazil | 253 (102) | 108 (15) | NN, RF, GBT, LR, SVM | – | 68% | 85% |
| [ | Hospitals in Wuhan, China | 294 (208) | 15 (−) | RF, SVM | 84% | 88% | 80% |
| [ | University Medical Center, Ljubljana, Slovenia | 5333 (160) | 117 (35) | XGBoost, RF, DNN | – | 81.9% | 97.9% |
| [ | Albert Einstein Hospital, Brazil | 5644 (559) | 108 (24) | XMLP, SVM, RT, RF, BN, NB | 95.159% | 96.8% | 93.6% |
| [ | New York Presbyterian Hospital/WCM, LMH, USA | 3346 (1394); 1822 (549) | 685 (33) | RF, LR, DT, GBDT | – | 75.8% | 80.2% |
| [ | Hospitals in Zhejiang, China | 912 (361) | 31 (10) | LR, DT, RF, SVM, DNN | 91% | 87% | 95% |
| [ | Stanford Health Care, CA, USA | 390 (31) | - (4) | LR | – | 86–93% | 35–55% |
| [ | – | 398 (−) | 42 (19) | SOM, XGBoost | – | 92.5% | 97.9% |
| [ | Veterans Health Administration Sites, USA | 5002 (1079) | 68 (54) | RF | 83.3% | 83.4% | 89.8% |
| [ | Hospital in Milan, Italy | 199 (127) | 74 (42) | ANNs, LR, RF, DT | 91.4% | 94.1% | 88.7% |
| [ | Oxford University Hospitals, UK | 40732 (437) | 74 (−) | RF, LR, XGBoost | 92.3% | 77.4% | 95.7% |
| [ | Albert Einstein Hospital, Brazil | 5644 (279) | 106 (97) | LR, NN, RF, SVM, XGB | – | 80% | 98% |
Fig. 1. The ERLX model.
Impact of outlier removal and data balancing on performance metrics.
| Dataset | Accuracy | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| Imbal w/outliers | 99.24% [95%CI: 98.7–99.7] | 98.81% [95%CI: 97.1–100] | 93.66% [95%CI: 88.7–98.7] | 99.85% [95%CI: 99.5–100] |
| Bal w/outliers | 99.35% [95%CI: 98.7–99.8] | 97.83% [95%CI: 94.1–99.9] | 95.69% [95%CI: 90.2–100] | 99.75% [95%CI: 99.3–100] |
| Imbal w/o outliers | 99.85% [95%CI: 99.5–100] | 99.79% [95%CI: 98.8–100] | 98.43% [95%CI: 95.0–100] | 99.97% [95%CI: 99.8–100] |
| Bal w/o outliers | 99.88% [95%CI: 99.6–100] | 99.38% [95%CI: 97.5–100] | 98.72% [95%CI: 94.6–100] | 99.99% [95%CI: 99.99–100] |
Fig. 2. The receiver operating characteristic (ROC) curve for the test set.
Fig. 3. Average confusion matrix obtained from 100 replications of ERLX.
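The accuracy, sensitivity, and specificity values in the tables here are all functions of confusion-matrix counts, summarized over repeated runs. The following is a minimal sketch of that arithmetic, not the authors' code: the function names `metrics_from_confusion` and `percentile_interval` are hypothetical, the counts in the usage note are made up for illustration, and the percentile interval is only one common way to summarize replications (the source does not state how its 95% CIs were computed).

```python
def metrics_from_confusion(tp, fn, fp, tn):
    """Derive standard screening metrics from confusion-matrix counts.

    tp/fn: COVID-19 positive cases predicted positive/negative.
    fp/tn: COVID-19 negative cases predicted positive/negative.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
    }


def percentile_interval(values, lower=2.5, upper=97.5):
    """Empirical percentile interval over per-replication metric values.

    Assumption: a percentile interval is used here purely for illustration;
    the source does not say which CI method it applied.
    """
    ordered = sorted(values)

    def pct(p):
        # Linear interpolation between the two nearest order statistics.
        k = (len(ordered) - 1) * p / 100
        lo = int(k)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)

    return pct(lower), pct(upper)
```

For example, with illustrative counts `metrics_from_confusion(tp=90, fn=10, fp=5, tn=95)`, sensitivity is 90/100 and specificity is 95/100. Collecting one such metric per replication and passing the list to `percentile_interval` gives a lower and upper bound for that metric.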
Model performance with 95% CIs: ERLX vs. ER-CoV.
| Model | AUC | Sensitivity | Specificity |
|---|---|---|---|
| ERLX with [ | 99.73% [95%CI: 98.6–100] | 99.47% [95%CI: 97.2–100] | 99.99% [95%CI: 99.9–100] |
| ER-CoV [ | 86.78% [95%CI: 85.65–87.90] | 70.25% [95%CI: 66.57–73.12] | 85.98% [95%CI: 84.94–86.84] |
Fig. 4. Average confusion matrix obtained from 100 replications of ERLX with [71] features.
Average performance metrics for each model.
| Model | Accuracy | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| ERLX with [ | 99.94% [95%CI: 99.8–100] | 99.7% [95%CI: 98.7–100] | 99.38% [95%CI: 97.3–100] | 99.99% [95%CI: 99.9–100] |
| ANN with SMOTE [ | 87% | 80% | 43% | 91% |
| ERLX with [ | 99.94% [95%CI: 99.6–100] | 99.69% [95%CI: 98.3–100] | 99.93% [95%CI: 99.6–100] | 99.96% [95%CI: 99.4–100] |
| Bayes Net [ | 95.159% | – | 96.8% | 93.6% |
| ERLX with [ | 99.94% [95%CI: 99.6–100] | 99.77% [95%CI: 98.3–100] | 99.55% [95%CI: 96.5–100] | 99.98% [95%CI: 99.9–100] |
| SVM [ | – | 85% | 68% | 85% |
Fig. 5. SHAP summary plot.