Abstract
Objectives: Early detection of COVID-19 prognosis can partially reduce the intense pressure on health services and the associated loss of workforce. The primary purpose of this article is to determine a feature dataset, consisting of routine blood values (RBV) and demographic data, that affects the prognosis of COVID-19. The secondary purpose is to identify severely and mildly infected COVID-19 patients at the time of admission by applying this feature dataset to supervised machine learning (ML) models.

Material and methods: The study sample consists of severely (n = 192) and mildly (n = 4010) infected patients hospitalized with a diagnosis of COVID-19 between March and September 2021. The RBV data measured at the time of admission and the age and gender characteristics of these patients were analyzed retrospectively. Features were selected using the minimum redundancy maximum relevance (MRMR) method, principal component analysis, and forward multiple logistic regression analysis. The feature set was compared statistically between mildly and severely infected patients. The performance of various supervised ML models in identifying severely and mildly infected patients was then compared using the feature set.
Keywords: Biochemical and hematological biomarkers; COVID-19; Classification; Feature selection methods; Routine blood values; Supervised machine learning models
Year: 2022 PMID: 35673548 PMCID: PMC9158375 DOI: 10.1016/j.irbm.2022.05.006
Source DB: PubMed Journal: Ing Rech Biomed ISSN: 1876-0988
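The MRMR step named in the abstract can be illustrated with a greedy selection loop. The sketch below is a simplified variant and not the authors' implementation: it assumes absolute Pearson correlation as a stand-in for both relevance (feature versus label) and redundancy (feature versus already selected features), whereas MRMR is often defined with mutual information. The function names `pearson` and `mrmr_select` and the toy columns are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / sqrt(sxx * syy)


def mrmr_select(features, target, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    Assumption: |Pearson r| is used for both terms (a simplification).
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a", so only one of them should be kept.
toy = {
    "a": [1, 2, 3, 7, 8, 9],
    "b": [2, 4, 6, 14, 16, 18],
    "c": [5, 1, 4, 6, 3, 8],
}
labels = [0, 0, 0, 1, 1, 1]  # 0 = mild, 1 = severe (assumed coding)
chosen = mrmr_select(toy, labels, k=2)
print(chosen)
```

In the toy data, the second pick is the less relevant but non-duplicate column `c`, which is the behavior the redundancy penalty is meant to produce.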
Descriptive statistics of the feature dataset and comparison of these data between severely and mildly infected COVID-19 patients.
| Sex | Mildly infected patients, n (%) | Severely infected patients, n (%) | P |
|---|---|---|---|
| Male | 2015 (50.2) | 105 (54.7) | 0.45 |
| Female | 1995 (49.8) | 87 (45.3) | |

| Parameter | Unit of measure | Mild: mean | Mild: median | Mild: IQR | Severe: mean | Severe: median | Severe: IQR |
|---|---|---|---|---|---|---|---|
| Age | | 55 | 57 | 40-70 | 69 | 71 | 64-80 |
| ALT | U/L | 34.34 | 23.00 | 16.0-39.0 | 58.68 | 32.00 | 19.00-61.00 |
| AST | U/L | 33.50 | 26.00 | 20.0-37.0 | 93.86 | 44.00 | 26.00-70.00 |
| Albumin | g/L | 38.02 | 37.90 | 34.3-42.0 | 29.27 | 29.25 | 25.40-32.60 |
| Alkaline phosphatase | U/L | 153.88 | 75.00 | 56.0-100.0 | 171.59 | 87.00 | 66.00-128.00 |
| Iron (Fe) | μmol/L | 48.18 | 46.00 | 24.0-62.0 | 18.50 | 11.50 | 5.00-39.00 |
| Glucose | mmol/L | 133.28 | 107.00 | 93.0-142.0 | 191.59 | 163.00 | 118.00-240.00 |
| Creatine kinase | U/L | 106.19 | 70.00 | 45.0-109.0 | 210.06 | 98.00 | 52.50-183.00 |
| LDH | U/L | 262.94 | 237.00 | 195.00-297.00 | 452.22 | 351.00 | 232.00-601.00 |
| Total bilirubin | mg/dL | 0.55 | 0.46 | 0.34-0.63 | 0.81 | 0.62 | 0.46-0.91 |
| Total protein | g/L | 68.19 | 68.26 | 63.70-72.80 | 59.12 | 58.50 | 54.40-64.80 |
| Eosinophils count | 10^9/L | 0.11 | 0.07 | 0.03-0.15 | 0.09 | 0.04 | 0.02-0.12 |
| Hematocrit | % | 39.41 | 39.40 | 36.00-43.0 | 36.82 | 36.50 | 32.50-40.5 |
| Hemoglobin | g/L | 13.25 | 13.30 | 12.00-14.5 | 12.05 | 11.90 | 10.70-13.6 |
| Lymphocytes count | 10^9/L | 1.61 | 1.47 | 1.01-2.01 | 1.91 | 0.73 | 0.48-1.22 |
| Monocytes count | 10^9/L | 0.53 | 0.50 | 0.37-0.65 | 0.52 | 0.44 | 0.30-0.66 |
| Neutrophils count | 10^9/L | 4.44 | 3.76 | 2.65-5.51 | 9.99 | 8.58 | 5.95-13.04 |
| Red blood cells | 10^12/L | 4.70 | 4.70 | 4.32-5.1 | 4.31 | 4.30 | 3.79-4.82 |
| RDW | % | 13.51 | 13.10 | 12.50-14.0 | 15.10 | 14.40 | 13.40-16.1 |
| White blood cells | 10^9/L | 6.70 | 6.10 | 4.80-7.9 | 12.52 | 10.20 | 7.50-15.6 |
| D-dimer | μg/mL | 1120.04 | 541.83 | 320.00-954.0 | 3659.22 | 1365.0 | 989.10-3982.0 |
| C-reactive protein | mg/L | 28.08 | 10.20 | 3.14-34.0 | 88.92 | 79.0 | 30.05-138.0 |
| Ferritin | mg/L | 242.59 | 142.45 | 57.5-313.6 | 549.74 | 433.4 | 171.0-851.5 |
| Fibrinogen | mg/L | 336.63 | 327.79 | 293.7-370.4 | 357.98 | 359.70 | 321.1-392.5 |
| INR | | 1.31 | 1.10 | 1.04-1.17 | 2.21 | 1.20 | 1.12-1.38 |
| Prothrombin time | s | 13.98 | 13.10 | 12.50-13.9 | 16.46 | 14.3 | 13.20-16.0 |
| Procalcitonin | mg/L | 1.76 | 0.12 | 0.12-0.12 | 8.17 | 0.39 | 0.13-6.5 |
| ESR | mm/h | 27.99 | 20.00 | 10.0-41.0 | 47.38 | 45.50 | 18.0-68.0 |
| Troponin | ng/L | 20.87 | 10.00 | 10.0-10.0 | 153.5 | 10.0 | 10.0-68.0 |
ALT: alanine aminotransferase; AST: aspartate aminotransferase; LDH: lactate dehydrogenase; MCHC: mean corpuscular hemoglobin concentration; RDW: erythrocyte distribution width; INR: international normalized ratio; ESR: erythrocyte sedimentation rate; IQR: interquartile range. P values compare the severe and mild patient groups and are shown in bold when P < 0.05.
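For readers who want to reproduce this kind of summary on their own data, the mean, median, and IQR columns can be computed as in the minimal sketch below. The quartile convention used by the authors is not stated, so the default method of Python's `statistics.quantiles` is an assumption here, and the helper name `describe` and the sample values are hypothetical.

```python
from statistics import mean, median, quantiles


def describe(values):
    """Return mean, median, and the IQR as a (Q1, Q3) pair."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = quantiles(values, n=4)
    return {"mean": mean(values), "median": median(values), "iqr": (q1, q3)}


# Hypothetical measurements, not taken from the study
summary = describe([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

Reporting the IQR as a Q1-Q3 range, as the table does, is useful for skewed laboratory values where the mean and median differ markedly.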
Fig. 1 Workflow diagram of this manuscript.
Fig. 2 A confusion matrix.
Performance results of supervised ML models in detecting severely and mildly infected patients.
| Classifier model | PPV, % (95% CI) | NPV, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | F1 score, % | ACC, % (95% CI) | AUC (95% CI) |
|---|---|---|---|---|---|---|---|
| LR | 95.18 (95.91-98.36) | 72.60 (62.46-80.85) | 97.33 (95.91-98.36) | 58.89 (48.02-69.16) | 96.24 | 93.21 (91.30-94.82) | 0.75 (0.62-0.78) |
| SVM | 93.44 (99.44-95.86) | 46.43 (39.10-53.92) | 92.00 (89.82-93.84) | 57.78 (46.91-68.12) | 92.71 | 88.33 (85.97-90.43) | 0.64 (0.52-0.71) |
| VP | 94.86 (93.51-95.95) | 44.17 (37.27-51.30) | 91.07 (88.79-93.01) | 58.89 (48.02-69.16) | 92.92 | 87.62 (85.20-89.77) | 0.61 (0.51-0.68) |
| KNN | 96.92 (95.67-97.81) | 71.28 (62.70-78.56) | 96.40 (94.81-97.61) | 74.44 (64.16-83.06) | 96.66 | 94.05 (92.23-95.55) | 0.80 (0.77-0.88) |
| K* | 97.62 (96.44-98.41) | 84.71 (76.19-90.55) | 98.27 (97.05-99.07) | 80.00 (70.25-87.69) | 97.94 | 96.31 (94.80-97.48) | 0.91 (0.75-0.93) |
| LWL | 98.41 (97.34-99.06) | 92.86 (85.37-96.66) | 99.20 (98.27-99.71) | 86.67 (77.87-92.92) | 98.80 | 97.86 (96.63-98.73) | 0.95 (0.85-0.96) |
| NB | 97.34 (96.13-98.18) | 78.65 (69.99-85.34) | 97.47 (96.07-98.47) | 77.78 (67.79-85.87) | 97.40 | 95.36 (93.71-96.68) | 0.85 (0.80-0.88) |
| SGD | 95.01 (93.66-96.09) | 45.76 (38.73-52.97) | 91.47 (89.23-93.37) | 60.00 (49.13-70.19) | 93.21 | 88.10 (85.71-90.21) | 0.63 (80.51-0.67) |
| DT | 94.77 (93.42-95.85) | 45.61 (38.40-53.02) | 91.73 (89.53-93.60) | 57.78 (46.91-68.12) | 93.23 | 88.10 (85.71-90.21) | 0.62 (0.54-0.66) |
| HDT | 95.28 (94.00-96.30) | 70.13 (60.28-78.41) | 96.93 (95.43-98.05) | 60.00 (49.13-70.19) | 96.10 | 92.98 (91.03-94.61) | 0.74 (0.71-0.82) |
| RF | 95.40 (94.08-96.43) | 55.45 (47.32-63.29) | 94.00 (92.05-95.59) | 62.22 (51.38-72.23) | 94.70 | 90.60 (88.42-92.48) | 0.67 (0.58-0.71) |
CI: Confidence Interval; LR: Logistic Regression; SVM: Support Vector Machine; VP: Voted Perceptron; KNN: K Nearest Neighbor; K*: K Star; LWL: Locally Weighted Learning; NB: Naive Bayes; SGD: Stochastic Gradient Descent; DT: Decision Tables; HDT: Hoeffding Decision Trees; RF: Random Forest.
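The point estimates in this table (PPV, NPV, sensitivity, specificity, F1 score, ACC) all follow from the four cells of a confusion matrix. The sketch below shows the standard formulas. The counts are an assumption: they are back-calculated so that they reproduce the LR point estimates when the mildly infected class is treated as positive, and they are not reported in the source. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics, returned as fractions in [0, 1]."""
    return {
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),          # harmonic mean of PPV and sensitivity
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Assumed counts (back-calculated, not from the source); positive class = mild
lr = classification_metrics(tp=730, fn=20, tn=53, fp=37)
for name, value in lr.items():
    print(f"{name}: {100 * value:.2f}")
```

With heavily imbalanced classes such as these, accuracy and F1 are dominated by the majority class, so specificity and NPV are the columns that show how well the minority class is recognized.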
The supervised ML methods utilized in this manuscript.
| Categories | Method | Abbreviation |
|---|---|---|
| Functions | Logistic Regression | LR |
| | Support Vector Machines | SVM |
| | Voted Perceptron | VP |
| Lazy-learning algorithms | K Nearest Neighbor | KNN |
| | K Star | K* |
| | Locally Weighted Learning | LWL |
| | Naive Bayes | NB |
| | Stochastic Gradient Descent | SGD |
| Tree-based learning algorithms | Decision Trees | DT |
| | Hoeffding Decision Trees | HDT |
| | Random Forest | RF |
Fig. 3 Comparison of the distributions of gender and age of mild and severe COVID-19 infected patients.
Fig. 4 Confusion matrix results of the ML models used in this study on the test dataset. LR: Logistic Regression; SVM: Support Vector Machine; VP: Voted Perceptron; KNN: K Nearest Neighbor; K*: K Star; LWL: Locally Weighted Learning; NB: Naive Bayes; SGD: Stochastic Gradient Descent; DT: Decision Tables; HDT: Hoeffding Decision Trees; RF: Random Forest.
Fig. 5 Receiver operating characteristic (ROC) curves and AUC results of ML models for detecting severely and mildly COVID-19 infected patients.
Error metric results of supervised ML models in detecting mild and severe COVID-19 patients.
| Classifier model | MAE (%) | RMSE (%) |
|---|---|---|
| LR | 6.05 | 24.02 |
| SVM | 2.96 | 17.22 |
| VP | 3.71 | 19.25 |
| KNN | 5.31 | 20.29 |
| K* | 5.5 | 19.53 |
| LWL | 6.58 | 19.88 |
| NB | 7.6 | 25.30 |
| SGD | 3.65 | 19.11 |
| DT | 8.23 | 20.00 |
| HDT | 11.14 | 23.82 |
| RF | 6.21 | 18.35 |
CI: Confidence Interval; LR: Logistic Regression; NB: Naive Bayes; SGD: Stochastic Gradient Descent; SVM: Support Vector Machine; VP: Voted Perceptron; KNN: K Nearest Neighbor; LWL: Locally Weighted Learning; MC: Multi Classifier; DT: Decision Tables; HDT: Hoeffding Decision Trees; RF: Random Forest; MAE: Mean Absolute Error; RMSE: Root Mean Squared Error.
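MAE and RMSE can be computed for a classifier by comparing its predicted probability for one class with the 0/1 indicator of the true class. Whether the authors computed the values above this way is not stated in the source, so the sketch below is only one plausible reading; the function names and the sample numbers are hypothetical.

```python
from math import sqrt


def mean_absolute_error(y_true, p_pred):
    """Mean of |true indicator - predicted probability|."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(y_true, p_pred)) / len(y_true)


def root_mean_squared_error(y_true, p_pred):
    """Square root of the mean squared difference."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true))


# Hypothetical labels (1 = severe, assumed coding) and predicted probabilities
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.6, 1.0]
mae = mean_absolute_error(y, p)
rmse = root_mean_squared_error(y, p)
print(f"MAE = {100 * mae:.2f}%, RMSE = {100 * rmse:.2f}%")
```

Because squaring penalizes large individual errors more heavily, RMSE is never smaller than MAE, which matches the pattern in every row of the table.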