Chansik An, Hyunsun Lim, Dong-Wook Kim, Jung Hyun Chang, Yoon Jung Choi, Seong Woo Kim.
Abstract
The rapid spread of COVID-19 has resulted in a shortage of medical resources, which necessitates accurate prognosis prediction to triage patients effectively. This study used the nationwide cohort of South Korea to develop a machine learning model to predict prognosis based on sociodemographic and medical information. Of 10,237 COVID-19 patients, 228 (2.2%) died, 7772 (75.9%) recovered, and 2237 (21.9%) were still in isolation or being treated at the last follow-up (April 16, 2020). Cox proportional hazards regression analysis revealed that age > 70, male sex, moderate or severe disability, the presence of symptoms, nursing home residence, and comorbidities of diabetes mellitus (DM), chronic lung disease, or asthma were significantly associated with increased risk of mortality (p ≤ 0.047). For machine learning, the least absolute shrinkage and selection operator (LASSO), linear support vector machine (SVM), SVM with radial basis function kernel, random forest (RF), and k-nearest neighbors were tested. In prediction of mortality, LASSO and linear SVM demonstrated high sensitivities (90.7% [95% confidence interval: 83.3, 97.3] and 92.0% [85.9, 98.1], respectively) and high specificities (91.4% [90.3, 92.5] and 91.8% [90.7, 92.9], respectively), as well as high areas under the receiver operating characteristic curve (0.963 [0.946, 0.979] and 0.962 [0.945, 0.979], respectively). The most significant predictors for LASSO included old age and preexisting DM or cancer; for RF they were old age, infection route (cluster infection or infection from personal contact), and underlying hypertension. The proposed prediction model may be helpful for the quick triage of patients without having to wait for the results of additional tests such as laboratory or radiologic studies, during a pandemic when limited medical resources must be wisely allocated without hesitation.
Year: 2020 PMID: 33127965 PMCID: PMC7599238 DOI: 10.1038/s41598-020-75767-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Baseline characteristics.
| Variable | Category | Mortality (N = 228) | Survived: Recovered (N = 7772) | Survived: Undetermined* (N = 2237) | Total (N = 10,237) |
|---|---|---|---|---|---|
| Age (years) | < 40 | 1 (0.4%) | 3520 (45.3%) | 907 (40.5%) | 4428 (43.3%) |
| | 40–50 | 3 (1.3%) | 1093 (14.1%) | 231 (10.3%) | 1327 (13.0%) |
| | 50–60 | 12 (5.3%) | 1529 (19.7%) | 338 (15.1%) | 1879 (18.4%) |
| | 60–70 | 30 (13.2%) | 1023 (13.2%) | 332 (14.8%) | 1385 (13.5%) |
| | 70–80 | 65 (28.5%) | 429 (5.5%) | 204 (9.1%) | 698 (6.8%) |
| | > 80 | 117 (51.3%) | 178 (2.3%) | 225 (10.1%) | 520 (5.1%) |
| Sex | Female | 121 (53.1%) | 4790 (61.6%) | 1252 (56.0%) | 6149 (60.1%) |
| | Male | 107 (46.9%) | 2982 (38.4%) | 985 (44.0%) | 4088 (39.9%) |
| Income level† | Medicaid | 42 (18.4%) | 569 (7.3%) | 260 (11.6%) | 871 (8.5%) |
| | < 25% | 39 (17.1%) | 2000 (25.7%) | 453 (20.3%) | 2492 (24.3%) |
| | 25–50% | 29 (12.7%) | 1437 (18.5%) | 334 (14.9%) | 1800 (17.6%) |
| | 50–75% | 35 (15.4%) | 1675 (21.6%) | 436 (19.5%) | 2146 (21.0%) |
| | > 75% | 82 (36.0%) | 2032 (26.1%) | 747 (33.4%) | 2861 (27.9%) |
| Residence | Suburban/rural | 2 (0.9%) | 240 (3.1%) | 419 (18.7%) | 661 (6.5%) |
| | Urban | 149 (65.4%) | 5657 (72.8%) | 1018 (45.5%) | 6824 (66.7%) |
| | Metropolitan | 77 (33.8%) | 1875 (24.1%) | 800 (35.8%) | 2752 (26.9%) |
| Household type | Others | 161 (70.6%) | 7503 (96.5%) | 2044 (91.4%) | 9708 (94.8%) |
| | Seniors (> 65 y) living alone | 67 (29.4%) | 269 (3.5%) | 193 (8.6%) | 529 (5.2%) |
| Disability | None | 166 (72.8%) | 7352 (94.6%) | 1959 (87.6%) | 9477 (92.6%) |
| | Mild | 40 (17.5%) | 301 (3.9%) | 175 (7.8%) | 516 (5.0%) |
| | Moderate or severe | 22 (9.6%) | 119 (1.5%) | 103 (4.6%) | 244 (2.4%) |
| Symptom | Absent | 133 (58.3%) | 4370 (56.2%) | 1791 (80.1%) | 6294 (61.5%) |
| | Present | 95 (41.7%) | 3402 (43.8%) | 446 (19.9%) | 3943 (38.5%) |
| Infection route | Personal contact | 16 (7.0%) | 1043 (13.4%) | 250 (11.2%) | 1309 (12.8%) |
| | Cluster infection | 26 (11.4%) | 5379 (69.2%) | 542 (24.2%) | 5947 (58.1%) |
| | Nursing home | 127 (55.7%) | 443 (5.7%) | 584 (26.1%) | 1154 (11.3%) |
| | From abroad | 0 (0.0%) | 188 (2.4%) | 710 (31.7%) | 898 (8.8%) |
| | Unclassified | 59 (25.9%) | 719 (9.3%) | 151 (6.8%) | 929 (9.1%) |
| Underlying medical condition‡ | None | 26 (11.4%) | 5733 (73.8%) | 1331 (59.5%) | 7090 (69.3%) |
| | Hypertension | 165 (72.4%) | 1154 (14.8%) | 545 (24.4%) | 1864 (18.2%) |
| | Diabetes mellitus | 107 (46.9%) | 580 (7.5%) | 334 (14.9%) | 1021 (10.0%) |
| | Hyperlipidemia | 112 (49.1%) | 1252 (16.1%) | 479 (21.4%) | 1843 (18.0%) |
| | Cardiovascular disease | 70 (30.7%) | 280 (3.6%) | 161 (7.2%) | 511 (5.0%) |
| | Cerebrovascular disease | 4 (1.8%) | 5 (0.1%) | 18 (0.8%) | 27 (0.3%) |
| | Cancer | 9 (3.9%) | 40 (0.5%) | 27 (1.2%) | 76 (0.7%) |
| | Chronic lung disease or Asthma | 93 (40.8%) | 730 (9.4%) | 257 (11.5%) | 1080 (10.5%) |
| | Chronic renal disease | 13 (5.7%) | 42 (0.5%) | 23 (1.0%) | 78 (0.8%) |
| | Mental illness | 58 (25.4%) | 126 (1.6%) | 313 (14.0%) | 497 (4.9%) |
| | Chronic liver disease | 10 (4.4%) | 157 (2.0%) | 64 (2.9%) | 231 (2.3%) |
| Medication‡ | ACE inhibitor | 5 (2.2%) | 30 (0.4%) | 13 (0.6%) | 48 (0.5%) |
| | AR blocker | 62 (27.2%) | 636 (8.2%) | 224 (10.0%) | 922 (9.0%) |
| | Beta blocker | 29 (12.7%) | 189 (2.4%) | 89 (4.0%) | 307 (3.0%) |
| | Calcium channel blocker | 59 (25.9%) | 529 (6.8%) | 209 (9.3%) | 797 (7.8%) |
| | Loop diuretics | 14 (6.1%) | 21 (0.3%) | 21 (0.9%) | 56 (0.5%) |
| | Acarbose | 2 (0.9%) | 3 (0.0%) | 2 (0.1%) | 7 (0.1%) |
| | Sulfonylurea | 22 (9.6%) | 125 (1.6%) | 65 (2.9%) | 212 (2.1%) |
| | Metformin | 45 (19.7%) | 261 (3.4%) | 117 (5.2%) | 423 (4.1%) |
| | DPP-4 inhibitor | 26 (11.4%) | 141 (1.8%) | 62 (2.8%) | 229 (2.2%) |
| | Fenofibrate | 4 (1.8%) | 44 (0.6%) | 15 (0.7%) | 63 (0.6%) |
| | Statin | 69 (30.3%) | 742 (9.5%) | 263 (11.8%) | 1074 (10.5%) |
| | NSAID | 12 (5.3%) | 64 (0.8%) | 29 (1.3%) | 105 (1.0%) |
| | Aspirin | 57 (25.0%) | 305 (3.9%) | 136 (6.1%) | 498 (4.9%) |
Recovered or undetermined cases were censored at the date of recovery or the date of last follow-up (April 16, 2020), respectively.
ACE angiotensin-converting enzyme, AR angiotensin receptor, NSAID non-steroidal anti-inflammatory drug.
*In isolation or under treatment.
†67 patients with missing values were excluded.
‡Some patients had more than one medical condition or medication.
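The age rows of the baseline table can be turned into crude mortality proportions per age group, which makes the age gradient easier to see. The following is a minimal sketch, not part of the study's analysis: the counts are copied from the Mortality and Total columns, the names `AGE_GROUPS` and `crude_mortality` are illustrative, and the denominators include patients whose outcome was still undetermined at the last follow-up, so these figures are unadjusted and likely understate final mortality.

```python
# Counts copied from the age rows of the baseline table: (deaths, total patients).
AGE_GROUPS = {
    "< 40": (1, 4428),
    "40-50": (3, 1327),
    "50-60": (12, 1879),
    "60-70": (30, 1385),
    "70-80": (65, 698),
    "> 80": (117, 520),
}


def crude_mortality(deaths: int, total: int) -> float:
    """Return deaths / total as a percentage (unadjusted)."""
    return 100.0 * deaths / total


# Crude mortality per age group; denominators include undetermined cases.
crude = {group: crude_mortality(d, n) for group, (d, n) in AGE_GROUPS.items()}

for group, pct in crude.items():
    print(f"{group:>6}: {pct:5.2f}%")
```

Because these are raw proportions, they are not comparable to the hazard ratios reported for the Cox model, which account for follow-up time and censoring.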
Figure 1. Histogram illustrating the distribution of the time interval between diagnosis and recovery (A) or mortality (B).
Figure 2. Box plot illustrating the time interval between diagnosis and recovery or mortality according to the age group.
Results of Cox proportional hazards regression without medication.
| Variable | Category | Univariable HR (95% CI) | P value | Multivariable HR (95% CI) | P value |
|---|---|---|---|---|---|
| Age (years) | < 40 | 0.04 (0.01, 0.28) | 0.001 | 0.06 (0.01, 0.46) | 0.007 |
| | 40–50 | 0.35 (0.10, 1.25) | 0.107 | 0.47 (0.13, 1.68) | 0.246 |
| | 50–60 | Reference | | | |
| | 60–70 | 3.12 (1.58, 6.17) | 0.001 | 1.97 (0.99, 3.93) | 0.054 |
| | 70–80 | 13.49 (7.23, 25.17) | < .0001 | 7.31 (3.77, 14.16) | < .0001 |
| | > 80 | 40.49 (22.30, 73.50) | < .0001 | 17.46 (9.01, 33.85) | < .0001 |
| Sex | Female | Reference | | | |
| | Male | 1.86 (1.42, 2.44) | < .0001 | 2.37 (1.78, 3.15) | < .0001 |
| Income level | Medicaid | 3.35 (2.12, 5.29) | < .0001 | 1.34 (0.82, 2.19) | 0.250 |
| | < 25% | Reference | | | |
| | 25–50% | 1.07 (0.65, 1.77) | 0.781 | 1.35 (0.81, 2.26) | 0.247 |
| | 50–75% | 1.18 (0.74, 1.88) | 0.492 | 1.15 (0.71, 1.85) | 0.566 |
| | > 75% | 1.89 (1.26, 2.83) | 0.002 | 1.05 (0.69, 1.60) | 0.814 |
| Residence | Suburban/rural | Reference | | | |
| | Urban | 0.72 (0.54, 0.96) | 0.023 | 1.29 (0.94, 1.77) | 0.119 |
| | Metropolitan | 0.13 (0.03, 0.53) | 0.004 | 0.82 (0.20, 3.40) | 0.784 |
| Household type | Others | Reference | | | |
| | Seniors (> 65 y) living alone | 8.33 (6.21, 11.2) | < .0001 | 1.06 (0.76, 1.48) | 0.717 |
| Disability | None | Reference | | | |
| | Mild | 4.76 (3.32, 6.82) | < .0001 | 0.98 (0.67, 1.42) | 0.911 |
| | Moderate or severe | 6.19 (3.96, 9.68) | < .0001 | 1.63 (1.01, 2.63) | 0.047 |
| Symptom | Absent | Reference | | | |
| | Present | 1.08 (0.82, 1.42) | 0.591 | 2.29 (1.70, 3.09) | < .0001 |
| Infection route | Unclassified | Reference | | | |
| | Large clusters | 0.08 (0.05, 0.12) | < .0001 | 0.31 (0.19, 0.52) | < .0001 |
| | Nursing home | 2.42 (1.73, 3.38) | < .0001 | 1.68 (1.10, 2.56) | 0.017 |
| | Personal contact | 0.22 (0.12, 0.39) | < .0001 | 0.24 (0.13, 0.43) | < .0001 |
| Underlying medical condition | None | Reference | | | |
| | Hypertension | 12.18 (9.02, 16.46) | < .0001 | 1.22 (0.87, 1.73) | 0.254 |
| | Diabetes mellitus | 8.30 (6.33, 10.89) | < .0001 | 1.75 (1.29, 2.36) | 0.001 |
| | Hyperlipidemia | 4.27 (3.26, 5.60) | < .0001 | 0.89 (0.66, 1.20) | 0.446 |
| | Cardiovascular disease | 8.48 (6.29, 11.42) | < .0001 | 1.23 (0.89, 1.70) | 0.220 |
| | Cerebrovascular disease | 8.53 (3.17, 22.96) | < .0001 | 0.88 (0.32, 2.44) | 0.801 |
| | Cancer | 5.06 (2.38, 10.75) | < .0001 | 1.64 (0.75, 3.60) | 0.216 |
| | Chronic lung disease or Asthma | 5.54 (4.20, 7.31) | < .0001 | 1.83 (1.37, 2.46) | < .0001 |
| | Chronic renal disease | 9.37 (5.35, 16.43) | < .0001 | 1.47 (0.80, 2.69) | 0.215 |
| | Mental illness | 8.49 (6.22, 11.57) | < .0001 | 1.01 (0.69, 1.49) | 0.948 |
| | Chronic liver disease | 1.74 (0.86, 3.52) | 0.126 | 0.76 (0.37, 1.57) | 0.462 |
HR hazard ratio, CI confidence interval.
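A reported hazard ratio and its 95% confidence interval are enough to approximately recover the p-value, which is a handy consistency check when reading a table like the one above. The sketch below assumes a Wald test on the log hazard ratio with a symmetric normal-theory interval; whether the study computed its p-values this way is not stated, and the function name `wald_p_from_hr` is illustrative. Because the tabulated values are rounded to two decimals, the recovered p-values can only be expected to land near the published ones (0.047 and 0.054 for the two rows used here), not to match them exactly.

```python
import math


def wald_p_from_hr(hr: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value from a hazard ratio and its 95% CI.

    Assumes the CI is exp(log(HR) +/- 1.96 * SE), i.e. a Wald interval
    on the log scale. This is an assumption, not the paper's stated method.
    """
    z_crit = 1.959964  # two-sided 95% standard normal quantile
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Multivariable estimates copied from the table above.
p_disability = wald_p_from_hr(1.63, 1.01, 2.63)  # moderate or severe disability
p_age_60_70 = wald_p_from_hr(1.97, 0.99, 3.93)   # age 60-70 vs. 50-60

print(f"disability: p ~ {p_disability:.3f}")
print(f"age 60-70:  p ~ {p_age_60_70:.3f}")
```

The age 60–70 row illustrates the usual correspondence between the two summaries: its interval includes 1 and its p-value sits just above 0.05.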
Results of Cox proportional hazards regression for medication.
| Medication | Univariable HR (95% CI) | P value | Multivariable* HR (95% CI) | P value |
|---|---|---|---|---|
| ACE inhibitor | 4.30 (1.60, 11.56) | 0.004 | 0.58 (0.20, 1.68) | 0.314 |
| AR blocker | 3.58 (2.63, 4.87) | < .0001 | 0.93 (0.65, 1.32) | 0.668 |
| Beta blocker | 4.77 (3.16, 7.19) | < .0001 | 1.18 (0.73, 1.88) | 0.502 |
| Calcium channel blocker | 3.81 (2.77, 5.24) | < .0001 | 1.03 (0.72, 1.48) | 0.875 |
| Loop diuretics | 14.64 (8.35, 25.67) | < .0001 | 2.17 (1.14, 4.11) | 0.018 |
| Acarbose | 15.45 (3.84, 62.20) | 0.001 | 8.36 (1.89, 36.93) | 0.005 |
| Sulfonylurea | 5.43 (3.46, 8.53) | < .0001 | 1.12 (0.66, 1.93) | 0.671 |
| Metformin | 5.82 (4.13, 8.18) | < .0001 | 1.41 (0.86, 2.33) | 0.179 |
| DPP-4 inhibitor | 5.84 (3.82, 8.93) | < .0001 | 1.29 (0.75, 2.21) | 0.358 |
| Fenofibrate | 3.47 (2.57, 4.68) | < .0001 | 0.87 (0.59, 1.28) | 0.470 |
| Statin | 3.23 (1.20, 8.68) | 0.020 | 1.23 (0.44, 3.44) | 0.688 |
| NSAID | 5.11 (2.71, 9.65) | < .0001 | 1.31 (0.68, 2.53) | 0.424 |
| Aspirin | 6.58 (4.80, 9.02) | < .0001 | 1.19 (0.79, 1.79) | 0.397 |
*Adjusted for age, sex, income level, residence, household type, disability, symptom, and infection route.
HR hazard ratio, CI confidence interval, ACE angiotensin-converting enzyme, AR angiotensin receptor, DPP-4 dipeptidyl peptidase-4, NSAID non-steroidal anti-inflammatory drug.
Figure 3. Variable importance in prediction of mortality from COVID-19 by LASSO (A) and Random Forest (B).
Final performance of machine learning models in prediction of mortality from COVID-19 in the test set.
| Classifier | AUC | TP/FP/FN/TN | Sensitivity | Specificity | PPV | NPV | Balanced accuracy |
|---|---|---|---|---|---|---|---|
| LASSO | 0.963 (0.946, 0.979) | 68/208/7/2217 | 90.7% (83.3, 97.3) | 91.4% (90.3, 92.5) | 24.6% (19.7, 30.2) | 99.7% (99.4, 99.9) | 91.1% (86.0, 94.3) |
| Linear SVM | 0.962 (0.945, 0.979) | 69/199/6/2226 | 92.0% (85.9, 98.1) | 91.8% (90.7, 92.9) | 25.7% (20.6, 31.4) | 99.7% (99.4, 99.9) | 91.9% (87.0, 95.0) |
| RBF-SVM | 0.958 (0.945, 0.971) | 32/53/43/2372 | 42.7% (31.5, 53.9) | 97.8% (91.9, 1) | 37.6% (27.4, 48.8) | 98.2% (97.6, 98.7) | 70.2% (64.2, 76.5) |
| RF | 0.958 (0.936, 0.981) | 24/30/51/2395 | 32.0% (21.6, 42.8) | 98.8% (98.4, 99.3) | 44.4% (30.9, 58.6) | 97.9% (97.3, 98.4) | 65.4% (60.0, 71.5) |
| KNN | 0.897 (0.856, 0.937) | 61/255/14/2170 | 81.3% (73.1, 90.6) | 89.5% (88.3, 90.7) | 19.3% (15.1, 24.1) | 99.4% (98.9, 99.6) | 85.4% (79.5, 90.1) |
| LASSO | 0.944 (0.921, 0.967) | 44/293/9/2871 | 83.0% (72.9, 91.5) | 90.7% (89.7, 91.9) | 13.1% (9.6, 17.1) | 99.7% (99.4, 99.9) | 86.8% (80.0, 91.8) |
| Linear SVM | 0.941 (0.914, 0.967) | 45/303/8/2861 | 84.9% (75.3, 93.0) | 90.4% (89.4, 91.6) | 12.9% (9.6, 16.9) | 99.7% (99.5, 99.9) | 87.7% (80.8, 92.3) |
| RBF-SVM | 0.919 (0.883, 0.955) | 6/18/47/3146 | 11.3% (0.3, 18.0) | 99.4% (99.1, 99.7) | 25.0% (9.8, 46.7) | 98.5% (98.0, 98.9) | 55.4% (51.7, 61.4) |
| RF | 0.925 (0.893, 0.958) | 12/41/41/3123 | 22.6% (11.3, 32.1) | 98.7% (98.3, 99.2) | 22.6% (12.3, 36.2) | 98.7% (98.2, 99.1) | 60.7% (55.2, 67.7) |
| KNN | 0.772 (0.705, 0.839) | 32/205/21/2959 | 60.4% (47.2, 71.4) | 93.5% (92.6, 94.4) | 13.5% (9.4, 18.5) | 99.3% (98.9, 99.6) | 77.0% (69.3, 84.0) |
| LASSO | 0.953 (0.937, 0.969) | 57/309/7/2791 | 89.1% (81.4, 96.2) | 90.0% (88.9, 91.2) | 15.6% (12.0, 19.7) | 99.7% (99.5, 99.9) | 89.5% (83.8, 93.3) |
| Linear SVM | 0.948 (0.928, 0.968) | 55/324/9/2776 | 85.9% (77.3, 93.8) | 89.5% (88.4, 90.7) | 14.5% (11.1, 18.5) | 99.7% (99.4, 99.9) | 87.7% (81.7, 92.0) |
| RBF-SVM | 0.915 (0.885, 0.944) | 14/34/50/3066 | 21.9% (11.7, 31.3) | 98.9% (98.5, 99.3) | 29.2% (17.0, 44.1) | 98.4% (97.9, 98.8) | 60.4% (55.5, 66.6) |
| RF | 0.946 (0.930, 0.963) | 9/21/55/3079 | 14.1% (6.2, 21.9) | 99.3% (98.9, 99.6) | 30.0% (14.7, 49.4) | 98.2% (97.7, 98.7) | 56.7% (52.8, 62.3) |
| KNN | 0.750 (0.687, 0.813) | 37/247/27/2853 | 57.8% (44.9, 68.3) | 92.0% (91.0, 93.0) | 13.0% (9.3, 17.5) | 99.1% (98.6, 99.4) | 74.9% (67.9, 81.5) |
Values in parentheses are 95% confidence intervals.
AUC area under the receiver operating characteristic curve, TP true positive, FP false positive, FN false negative, TN true negative, PPV positive predictive value, NPV negative predictive value, LASSO least absolute shrinkage and selection operator, SVM support vector machine, RBF radial basis function kernel, RF random forest, KNN k-nearest neighbors.
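The point estimates in the performance table follow directly from the TP/FP/FN/TN counts, so they can be recomputed as a check. The sketch below uses the standard definitions of each metric; the `Confusion` class and its property names are illustrative and not taken from the study's code, and the confidence intervals are not reproduced because the method used to obtain them is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (positive = death)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity.
        return (self.sensitivity + self.specificity) / 2


# Counts copied from the first LASSO and linear SVM rows of the table.
lasso = Confusion(tp=68, fp=208, fn=7, tn=2217)
linear_svm = Confusion(tp=69, fp=199, fn=6, tn=2226)

for name, cm in (("LASSO", lasso), ("Linear SVM", linear_svm)):
    print(
        f"{name}: sens={cm.sensitivity:.1%} spec={cm.specificity:.1%} "
        f"PPV={cm.ppv:.1%} NPV={cm.npv:.1%} BA={cm.balanced_accuracy:.1%}"
    )
```

The low PPV next to a very high NPV reflects the low prevalence of death in the test set (75 of 2500 patients in these rows): even at about 91% specificity, false positives outnumber true positives.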
Figure 4. Flow diagram for study participants.