Ibrahim Arpaci, Shigao Huang, Mostafa Al-Emran, Mohammed N Al-Kabi, Minfei Peng.
Abstract
While RT-PCR is the silver bullet test for confirming COVID-19 infection, it is limited by reagent shortages, long turnaround times, and the need for specialized labs. As an alternative, most prior studies have applied deep learning algorithms to chest CT and chest X-ray images. However, these two approaches cannot always be used for patient screening because of radiation doses, high costs, and the small number of available devices. Hence, there is a need for a less expensive and faster diagnostic model to identify positive and negative cases of COVID-19. This study therefore develops six predictive models for COVID-19 diagnosis using six different classifiers (i.e., BayesNet, Logistic, IBk, CR, PART, and J48) based on 14 clinical features. The study retrospectively analyzed 114 cases from the Taizhou hospital of Zhejiang Province in China. The results showed that the CR meta-classifier is the most accurate classifier for predicting positive and negative COVID-19 cases, with an accuracy of 84.21%. The results could help in the early diagnosis of COVID-19, specifically when RT-PCR kits are insufficient for testing, and could assist countries, particularly developing ones, that suffer from a shortage of RT-PCR tests and specialized laboratories.
Keywords: COVID-19; Classification algorithms; Diagnosis; Machine learning; Novel coronavirus; Prediction
Year: 2021 PMID: 33437173 PMCID: PMC7790521 DOI: 10.1007/s11042-020-10340-7
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.757
Fig. 1 Applications of machine learning algorithms on COVID-19
Examples of studies focusing on applying ML algorithms on patients’ clinical features of COVID-19
| Source | Objective | Dataset size | Features | Algorithms | Accuracy |
|---|---|---|---|---|---|
| [ | Proposing and validating a diagnostic model for COVID-19 based on clinical and radiological features | 136 (COVID-19 patients (N = 70) and non-COVID-19 pneumonia patients (N = 66)) | 67 features (41 images + 26 clinical) | C Model; R Model; CR Model | 95.2%; 96.9%; 98.6% |
| [ | Evaluating clinical and imaging features for measuring the need for intensive care unit (ICU) treatment | 65 | Clinical, laboratory, and imaging features | Multivariate random forest modeling | 80% |
| [ | Identifying the positive COVID-19 cases based on blood tests analysis | 279 | Patient’s age, gender, blood tests, and RT-PCR tests for COVID-19 | Decision Tree Three-Way Random Forest (TWRF) classifier | 82% − 86% |
| [ | Identifying the positive COVID-19 cases based on blood tests analysis | 786 | 81 COVID-19 (+), 517 COVID-19 (-), and 188 Pathogens (non COVID-19) | ANN classifier | 90% |
| [ | Chest CT image-based-diagnose of COVID-19 | 275 | 88 COVID-19 (+) Chest CT images, 101 Bacterial Pneumonia (+) Chest CT images, and 86 Chest CT images of healthy people | DeepPneumonia | 99% |
| [ | Chest CT image-based-diagnose of COVID-19 | 1020 | CT images (50% of COVID-19 patients) | 10 Convolutional Neural Networks: AlexNet, VGG-16, VGG-19, SqueezeNet, GoogleNet, MobileNet-V2, ResNet-18, ResNet-50, ResNet-101, and Xception | 99.51% |
| [ | Chest X-Ray image-based-diagnose of COVID-19 | 1157 | 157 Pneumonia (+) Chest X-Ray images, 500 Pneumonia (+) Chest X-Ray images, and 500 Chest X-Ray images of healthy people | CoroNet | 90.21% |
| [ | Chest CT image-based-diagnose of COVID-19 | 460 | 230 CT images from 79 COVID-19 patients, 100 CT images from 100 common pneumonia patients, and 130 CT images from 130 healthy people | AD3D-MIL | 97.9% |
| [ | Chest X-Ray image-based-diagnose of COVID-19 | 3150 | 1050 COVID-19 (+) Chest X-Ray images, 1050 no-findings Chest X-Ray images, and 1050 pneumonia Chest X-Ray images | Capsule networks | 84.22% (multi-class); 97.24% (binary-class) |
| [ | Chest X-Ray image-based-diagnose of COVID-19 | 381 | 127 COVID-19 (+) Chest X-Ray images and 127 Pneumonia (+) Chest X-Ray images | ResNet50 plus SVM | 95.33% |
| [ | Chest X-Ray image-based-diagnose of COVID-19 | 16,700 | 313 COVID-19 (+) Chest X-Ray images, 2780 Bacterial Pneumonia (+) Chest X-Ray images, 6012 unknown Pneumonia Chest X-Ray images, and 7595 Chest X-Ray images of healthy people | Weighted averaging (iteratively pruned) | 99.01% |
Fig. 2 Data flow diagram
Confusion matrix
| Predicted Class | Actual Class: Has COVID-19 | Actual Class: Doesn’t Have COVID-19 |
|---|---|---|
| Has COVID-19 (Positive) | TP | FP |
| Doesn’t Have COVID-19 (Negative) | FN | TN |
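The four cells of the confusion matrix are enough to derive the usual summary measures. The following is a minimal sketch of those standard formulas for the positive class; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not the study's results).
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that these are per-class values for the positive class; averaged figures reported by a toolkit can differ depending on how the classes are weighted.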
Descriptive statistics of the 14 attributes (features)
| Attribute | Min | Max | Mean (all) | S.D. (all) | Infection | N | Mean (group) | S.D. (group) |
|---|---|---|---|---|---|---|---|---|
| WBC (10⁹/L) | 1.9 | 22.5 | 7.08 | 3.70 | Negative | 82 | 7.78 | 3.98 |
| | | | | | Positive | 32 | 5.28 | 1.94 |
| N (%) | 18.7 | 94.9 | 68.20 | 14.00 | Negative | 82 | 69.15 | 14.76 |
| | | | | | Positive | 32 | 65.73 | 11.67 |
| L (%) | 1.9 | 70.7 | 22.48 | 12.08 | Negative | 82 | 21.81 | 13.05 |
| | | | | | Positive | 32 | 24.19 | 9.12 |
| M (%) | 1.2 | 20.2 | 8.18 | 3.47 | Negative | 82 | 7.65 | 3.34 |
| | | | | | Positive | 32 | 9.52 | 3.47 |
| E (%) | 0 | 7.6 | 0.87 | 1.35 | Negative | 82 | 1.09 | 1.50 |
| | | | | | Positive | 32 | 0.31 | 0.56 |
| B (%) | 0 | 0.80 | 0.26 | 0.17 | Negative | 82 | 0.28 | 0.18 |
| | | | | | Positive | 32 | 0.22 | 0.13 |
| N/L | 0.26 | 54.5 | 5.73 | 8.22 | Negative | 82 | 6.51 | 9.35 |
| | | | | | Positive | 32 | 3.73 | 3.44 |
| L/M | 0.18 | 13.33 | 3.17 | 2.25 | Negative | 82 | 3.28 | 2.49 |
| | | | | | Positive | 32 | 2.87 | 1.43 |
| Hb (g/L) | 74 | 168 | 135.94 | 17.60 | Negative | 82 | 134.04 | 18.24 |
| | | | | | Positive | 32 | 140.78 | 14.98 |
| Hct (g/L) | 0.22 | 0.49 | 0.40 | 0.05 | Negative | 82 | 0.39 | 0.050 |
| | | | | | Positive | 32 | 0.41 | 0.044 |
| MCV (fl) | 76.7 | 111.4 | 89.84 | 5.61 | Negative | 82 | 89.88 | 6.00 |
| | | | | | Positive | 32 | 89.71 | 4.50 |
| PLT (10⁹/L) | 30 | 462 | 215.90 | 73.76 | Negative | 82 | 225.45 | 76.12 |
| | | | | | Positive | 32 | 191.43 | 61.88 |
| PCT (%) | 0.03 | 0.47 | 0.22 | 0.07 | Negative | 82 | 0.23 | 0.07 |
| | | | | | Positive | 32 | 0.19 | 0.06 |
| Pro (ng/ml) | 0.02 | 82.45 | 1.10 | 8.27 | Negative | 67 | 1.62 | 10.08 |
| | | | | | Positive | 27 | 0.06 | 0.071 |
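The overall mean of each attribute should be the sample-size-weighted average of the two per-group means. As a quick consistency check, the sketch below recomputes the overall WBC mean from the Negative (N = 82) and Positive (N = 32) rows; the helper name `pooled_mean` is hypothetical.

```python
def pooled_mean(groups):
    """Combine (n, mean) pairs into the mean of the pooled sample."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# WBC rows from the table: Negative (82 cases, mean 7.78), Positive (32 cases, mean 5.28).
wbc_overall = pooled_mean([(82, 7.78), (32, 5.28)])
print(round(wbc_overall, 2))  # 7.08, matching the reported overall mean
```

Because the group means in the table are themselves rounded, the same check on other attributes can differ from the reported overall mean in the last digit.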
Classifiers performance using 10-fold cross-validation method
| Classifier | CCI (%) | TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area |
|---|---|---|---|---|---|---|---|
| BayesNet | 71.93 | 0.719 | 0.643 | 0.670 | 0.719 | 0.653 | 0.675 |
| Logistic | 80.70 | 0.807 | 0.304 | 0.804 | 0.807 | 0.805 | 0.782 |
| IBk | 72.81 | 0.728 | 0.392 | 0.731 | 0.728 | 0.729 | 0.649 |
| CR | 84.21 | . | . | . | . | . | 0.873 |
| PART | 76.32 | 0.763 | 0.397 | 0.753 | 0.763 | 0.757 | 0.719 |
| J48 | 73.68 | 0.737 | 0.369 | 0.742 | 0.737 | 0.739 | 0.722 |
CCI: Correctly classified instances, TP: True positive, FP: False positive, and ROC: Receiver operating characteristic
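In 10-fold cross-validation, the cases are split into ten folds, each fold is held out once for testing, and the model is trained on the remaining nine. The sketch below shows one simple way to build stratified folds for the study's class sizes (82 negative, 32 positive). It is an illustration under stated assumptions: the function name `stratified_folds` and the round-robin assignment are hypothetical, and real toolkits typically shuffle the data first and may stratify differently.

```python
def stratified_folds(labels, k=10):
    """Assign each sample index to one of k folds, class by class, round-robin."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Dealing each class out in turn keeps the class ratio similar in every fold.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Class sizes from the study: 82 negative (0) and 32 positive (1) cases.
labels = [0] * 82 + [1] * 32
folds = stratified_folds(labels, k=10)

for fold_number, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in test_idx)
    # A classifier would be fitted on train_idx and scored on test_idx here.
    print(fold_number, len(train_idx), len(test_idx), positives)
```

The reported figures would then be obtained by aggregating the predictions over all ten held-out folds.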
Fig. 3 J48 Decision Tree
Performance comparison of the classifiers
| Classifier | Kappa statistic | MAE | RMSE | MCC | ROC Area | PRC Area |
|---|---|---|---|---|---|---|
| BayesNet | 0.0988 | 0.3456 | 0.4312 | 0.134 | 0.675 | 0.705 |
| Logistic | 0.5128 | 0.2756 | 0.3975 | 0.513 | 0.782 | 0.821 |
| IBk | 0.3330 | 0.2763 | 0.5165 | 0.333 | 0.649 | 0.681 |
| CR | 0.5853 | 0.2577 | 0.3488 | 0.591 | 0.873 | 0.895 |
| PART | 0.3842 | 0.2782 | 0.4549 | 0.387 | 0.719 | 0.737 |
| J48 | 0.3605 | 0.2713 | 0.4831 | 0.361 | 0.722 | 0.733 |
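The Kappa statistic and MCC both correct raw accuracy for agreement expected by chance, which matters when the classes are imbalanced. The following is a minimal sketch of the standard two-class formulas, computed from confusion-matrix counts; the function names and the example counts are hypothetical and do not reproduce the study's values.

```python
import math


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals of the 2x2 matrix.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only (not the study's results).
print(round(cohen_kappa(tp=8, fp=2, fn=4, tn=6), 4))
print(round(matthews_corrcoef(tp=8, fp=2, fn=4, tn=6), 4))
```

Both measures are 0 for chance-level prediction and 1 for perfect prediction, so a classifier with high accuracy but a low Kappa is likely favoring the majority class.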
Comparison of the proposed algorithm with prior studies
| Study | Technique | # of features | Validation method | Accuracy (%) |
|---|---|---|---|---|
| [ | Logistic regression | 13 | 10-fold cross-validation | 85 |
| [ | Logistic regression | 13 | 10-fold cross-validation | 89 |
| [ | Logistic regression-LASSO | 6 | 10-fold cross-validation | 89 |
| [ | Voting-naive Bayes logistic regression | 9 | 10-fold cross-validation | 87.41 |
| [ | Bagging-DT | 20 | 10-fold cross-validation | 61.46–79.54 |
| This study | CR | 14 | 10-fold cross-validation | 84.21 |
Fig. 4 ROC curves for the CR
Fig. 5 Cost/benefit analysis curves