Ashis Kumar Das, Shiba Mishra, Saji Saraswathy Gopalan.
Abstract
BACKGROUND: The recent CoVID-19 pandemic has emerged as a threat to global health security. Very few prognostic models for CoVID-19 use machine learning.
Keywords: CoVID-19; Decision support; Machine learning; Modelling; Mortality risk prediction
Year: 2020 PMID: 33062451 PMCID: PMC7528809 DOI: 10.7717/peerj.10083
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Sample characteristics.
| Variable | Number | Proportion (%) |
|---|---|---|
| Sex | | |
| Female | 1,940 | 55.1 |
| Male | 1,584 | 45.0 |
| Age group (years) | | |
| Below 10 | 60 | 1.7 |
| 10–19 | 160 | 4.5 |
| 20–29 | 859 | 24.4 |
| 30–39 | 494 | 14.0 |
| 40–49 | 483 | 13.7 |
| 50–59 | 625 | 17.7 |
| 60–69 | 423 | 12.0 |
| 70–79 | 210 | 6.0 |
| 80–89 | 162 | 4.6 |
| 90 and above | 48 | 1.4 |
| Province | | |
| Busan | 144 | 4.1 |
| Chungcheongbuk-do | 52 | 1.5 |
| Chungcheongnam-do | 146 | 4.1 |
| Daegu | 63 | 1.8 |
| Daejeon | 46 | 1.3 |
| Gangwon-do | 52 | 1.5 |
| Gwangju | 30 | 0.9 |
| Gyeonggi-do | 829 | 23.5 |
| Gyeongsangbuk-do | 1,236 | 35.1 |
| Gyeongsangnam-do | 119 | 3.4 |
| Incheon | 92 | 2.6 |
| Jeju-do | 14 | 0.4 |
| Jeollabuk-do | 20 | 0.6 |
| Jeollanam-do | 19 | 0.5 |
| Sejong | 47 | 1.3 |
| Seoul | 563 | 16.0 |
| Ulsan | 52 | 1.5 |
| Exposure | | |
| Nursing home | 46 | 1.3 |
| Hospital | 37 | 1.1 |
| Religious gathering | 160 | 4.5 |
| Call center | 135 | 3.8 |
| Community center, shelter and apartment | 68 | 1.9 |
| Gym facility | 34 | 1.0 |
| Overseas inflow | 612 | 17.4 |
| Contact with patients | 1,049 | 29.8 |
| Others | 1,383 | 39.3 |
| Outcome | | |
| Survived | 3,450 | 97.9 |
| Died | 74 | 2.1 |
| Total | 3,524 | 100 |
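The outcome rows above imply a strong class imbalance, which is the usual motivation for oversampling the minority class before training. The following is a minimal sketch of that arithmetic, using only the counts reported in the table; the variable and function names are illustrative and do not come from the paper.

```python
# Counts taken from the "Outcome" rows of the sample characteristics table.
survived = 3450
died = 74
total = survived + died  # 3,524 patients in the sample


def proportion_pct(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


mortality_pct = proportion_pct(died, total)
survival_pct = proportion_pct(survived, total)

# Survivors per death: a rough measure of how imbalanced the two classes are.
imbalance_ratio = survived / died

print(f"Died: {mortality_pct}%  Survived: {survival_pct}%")
print(f"Survivors per death: {imbalance_ratio:.1f}")
```

With roughly 47 survivors for every death, a classifier that always predicts survival would already reach about 98% accuracy, so accuracy alone says little about how well deaths are identified.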
Figure 1. Relative importance of predictors.
(A) Random forest, (B) Logistic regression.
Performance of the machine learning algorithms.
| Algorithm | Oversampling method | Area under the curve | Matthews correlation coefficient | Brier score | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|
| Logistic regression | SMOTE | 0.830 | 0.433 | 0.036 | 0.692 | 0.968 | 0.965 |
| Logistic regression | ADASYN | 0.823 | 0.376 | 0.049 | 0.692 | 0.955 | 0.968 |
| Support vector machine | SMOTE | 0.825 | 0.393 | 0.045 | 0.692 | 0.959 | 0.970 |
| Support vector machine | ADASYN | 0.786 | 0.345 | 0.048 | 0.615 | 0.958 | 0.971 |
| K nearest neighbor | SMOTE | 0.644 | 0.253 | 0.031 | 0.307 | 0.981 | 0.942 |
| K nearest neighbor | ADASYN | 0.759 | 0.410 | 0.028 | 0.538 | 0.979 | 0.924 |
| Random forest | SMOTE | 0.787 | 0.351 | 0.046 | 0.615 | 0.959 | 0.972 |
| Random forest | ADASYN | 0.787 | 0.351 | 0.046 | 0.615 | 0.959 | 0.971 |
| Gradient boosting | SMOTE | 0.787 | 0.351 | 0.046 | 0.615 | 0.959 | 0.971 |
| Gradient boosting | ADASYN | 0.787 | 0.351 | 0.046 | 0.615 | 0.959 | 0.971 |
Notes:
SMOTE, Synthetic minority oversampling technique.
ADASYN, Adaptive synthetic sampling.
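For readers unfamiliar with the threshold-based metrics in the performance table, the sketch below shows how sensitivity, specificity, accuracy, the Matthews correlation coefficient, and the Brier score can be computed from true labels and predicted probabilities. It is an illustration only: the function name, the 0.5 decision threshold, and the ten-patient toy data are assumptions and are not taken from the paper.

```python
import math


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute threshold-based metrics and the Brier score for binary labels.

    y_true: list of 0/1 labels (1 = died).
    y_prob: predicted probabilities of death.
    threshold: assumed cut-off for calling a prediction positive.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "brier": brier,
    }


# Hypothetical toy data: 2 deaths among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.1, 0.2, 0.7, 0.1, 0.3, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and the Matthews correlation coefficient depend on how the rare positive class is handled, which is why they can differ widely between models whose accuracy is nearly identical. A lower Brier score indicates better-calibrated probabilities.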
Figure 2. CoCoMORP online CoVID-19 community mortality risk prediction tool.