Koffka Khan, Emilie Ramsahai.
Abstract
BACKGROUND: An ongoing outbreak of a novel coronavirus (2019-nCoV) pneumonia continues to affect the whole world, including major countries such as China, the USA, Italy, France and the United Kingdom. We present outcome ('recovered', 'isolated' or 'death') risk estimates of 2019-nCoV over 'early' datasets. A major consideration is the likelihood of death for patients with 2019-nCoV.
Keywords: 2019-nCoV; AdaBoost; Bagging; Classifiers; Death; Disease; Machine learning; Pneumonia; Prediction
Year: 2021 PMID: 34044839 PMCID: PMC8159067 DOI: 10.1186/s12911-021-01537-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Recovered cases, confirmed cases and deaths from the 2019-nCoV virus over time [11, 12]
| Date | Deaths (China) | Deaths (Korea) | Confirmed (China) | Confirmed (Korea) | Recovered (China) | Recovered (Korea) |
|---|---|---|---|---|---|---|
| 1/22/2020 | 17 | 0 | 548 | 1 | 28 | 0 |
| 1/25/2020 | 42 | 0 | 1406 | 2 | 39 | 0 |
| 1/28/2020 | 131 | 0 | 5509 | 4 | 101 | 0 |
| 1/31/2020 | 213 | 0 | 9802 | 11 | 214 | 0 |
| 2/1/2020 | 259 | 0 | 11,891 | 12 | 275 | 0 |
| 2/5/2020 | 563 | 0 | 27,440 | 18 | 1115 | 1 |
| 2/9/2020 | 905 | 0 | 39,829 | 27 | 3219 | 3 |
| 2/13/2020 | 1369 | 0 | 59,895 | 28 | 6217 | 7 |
| 2/17/2020 | 1864 | 0 | 72,434 | 30 | 12,462 | 10 |
| 2/21/2020 | 2238 | 2 | 75,550 | 204 | 18,704 | 17 |
| 2/25/2020 | 2665 | 12 | 77,754 | 977 | 27,676 | 24 |
| 2/29/2020 | 2837 | 17 | 79,356 | 3150 | 39,320 | 28 |
| 3/1/2020 | 2872 | 18 | 79,932 | 3736 | 42,162 | 30 |
| 3/5/2020 | 3015 | 35 | 80,537 | 5766 | 52,292 | 88 |
| 3/9/2020 | 3123 | 51 | 80,860 | 7382 | 58,804 | 166 |
| 3/13/2020 | 3180 | 67 | 80,945 | 7979 | 64,196 | 510 |
| 3/17/2020 | 3230 | 81 | 81,058 | 8320 | 68,798 | 1401 |
| 3/21/2020 | 3259 | 102 | 81,305 | 8799 | 71,857 | 2612 |
| 3/23/2020 | 3274 | 111 | 81,439 | 8961 | 72,814 | 3166 |
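The table above reports cumulative counts only. As a purely illustrative aside (not part of the paper's method), a crude case-fatality ratio can be derived from such counts as cumulative deaths divided by cumulative confirmed cases; the pandas sketch below does this for a few rows transcribed from the table:

```python
# Illustrative only: a minimal pandas sketch (not part of the paper's method)
# computing a crude case-fatality ratio from the cumulative counts tabulated above.
import pandas as pd

# A few rows transcribed from the table: date, deaths and confirmed cases for China and Korea.
rows = [
    ("2020-01-22",   17,   548,   0,    1),
    ("2020-02-13", 1369, 59895,   0,   28),
    ("2020-03-23", 3274, 81439, 111, 8961),
]
df = pd.DataFrame(rows, columns=["date", "deaths_cn", "confirmed_cn", "deaths_kr", "confirmed_kr"])

# Crude case-fatality ratio = cumulative deaths / cumulative confirmed cases.
df["cfr_cn"] = df["deaths_cn"] / df["confirmed_cn"]
df["cfr_kr"] = df["deaths_kr"] / df["confirmed_kr"]
print(df[["date", "cfr_cn", "cfr_kr"]])
```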
Fig. 2 Distribution of patient age for dataset2 (age frequency histogram)
Optimum hyper-parameter settings for experiments
| Setting | AdaBoost | Bagging | Extra-Trees | Decision Tree | k-NN |
|---|---|---|---|---|---|
| Base Estimator | None | None | NA | NA | NA |
| # Estimators | 100 | 10 | 100 | NA | NA |
| Learning rate | 2 | NA | NA | NA | NA |
| Algorithm | SAMME.R | Bagging | Gini | Gini | KDTree |
| Metric | Mean label accuracy | Mean label accuracy | Gini Impurity | Gini Impurity | Euclidean distance |
| Random state | None | Random generation | None | Random generation | NA |
| Max. samples needed to train each base estimator | NA | 1 | NA | NA | NA |
| Out-of-bag samples to estimate generalization error | NA | None | None | NA | NA |
| Use whole ensemble to fit | NA | Yes | Yes | NA | NA |
| # Jobs to run in parallel | NA | 1 | 1 | NA | 1 |
| Random resampling | NA | 3141 | 12 | NA | NA |
| Min. sample to be a leaf | NA | NA | 2 | 2 | NA |
| Sample weighting | NA | NA | All equal, weight of 1 | All equal, weight of 1 | NA |
| # of features for best split | NA | NA | Square root of the # of features | Max. features = # of features | NA |
| Max. number of leaf nodes | NA | NA | Unlimited | NA | NA |
| Split criteria | NA | NA | Impurity level > 0 | NA | NA |
| Reuse previous call to fit and add more estimators to ensemble | NA | No | Yes | NA | NA |
| Number of neighbours | NA | NA | NA | NA | 1 |
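For illustration, the settings listed above correspond approximately to the following scikit-learn configuration. This is a minimal sketch based on our reading of the table, not the authors' published code; the exact parameter mapping (e.g. treating "Max. samples = 1" for Bagging as the full training sample) is an assumption.

```python
# A minimal scikit-learn sketch of classifiers configured roughly as in the table above;
# the parameter mapping is our interpretation, not the authors' code.
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

models = {
    # 100 estimators, learning rate 2, SAMME.R boosting algorithm
    "AdaBoost": AdaBoostClassifier(n_estimators=100, learning_rate=2.0, algorithm="SAMME.R"),
    # 10 base estimators, no out-of-bag scoring; max_samples=1.0 assumed to mean the full sample
    "Bagging": BaggingClassifier(n_estimators=10, max_samples=1.0, oob_score=False,
                                 warm_start=False, n_jobs=1),
    # 100 trees, Gini impurity, sqrt(# features) per split, min. 2 samples per leaf
    "Extra-Trees": ExtraTreesClassifier(n_estimators=100, criterion="gini", max_features="sqrt",
                                        min_samples_leaf=2, warm_start=True, n_jobs=1),
    # Gini impurity, max. features = # of features, min. 2 samples per leaf
    "Decision Tree": DecisionTreeClassifier(criterion="gini", max_features=None, min_samples_leaf=2),
    # single nearest neighbour, KD-tree search, Euclidean distance
    "k-NN": KNeighborsClassifier(n_neighbors=1, algorithm="kd_tree", metric="euclidean"),
}
```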
Fig. 1 Distribution of patient age for dataset1 (age frequency histogram)
Metrics of machine learning models for the two most common outcomes on dataset1
| Outcome | Metric | AdaBoost | Bagging | Extra-Trees | Decision Tree | k-NN |
|---|---|---|---|---|---|---|
| Alive | Precision | 0.95 | 0.95 | 0.94 | 0.95 | 0.95 |
| | Recall | 0.96 | 0.97 | 0.97 | 0.97 | 0.95 |
| | F1-Score | 0.95 | 0.96 | 0.95 | 0.96 | 0.95 |
| Death | Precision | 0.45 | 0.50 | 0.44 | 0.50 | 0.42 |
| | Recall | 0.38 | 0.38 | 0.31 | 0.38 | 0.38 |
| | F1-Score | 0.42 | 0.43 | 0.36 | 0.43 | 0.40 |
| Overall | Accuracy | 0.60 | 0.92 | 0.91 | 0.91 | 0.91 |
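As an illustration of how per-outcome precision, recall, F1-score and overall accuracy of this kind are typically computed, the following sketch uses scikit-learn's classification_report; the labels shown are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch (assuming scikit-learn) of computing per-class precision, recall,
# F1-score and overall accuracy such as those tabulated above.
from sklearn.metrics import classification_report, accuracy_score

# Hypothetical true and predicted outcome labels (not data from the paper).
y_true = ["Alive", "Alive", "Death", "Alive", "Death", "Alive"]
y_pred = ["Alive", "Alive", "Alive", "Alive", "Death", "Death"]

print(classification_report(y_true, y_pred, digits=2))  # per-class precision / recall / F1
print("Accuracy:", accuracy_score(y_true, y_pred))      # overall accuracy
```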
Fig. 3 Major steps outlined in our method
Metrics of machine learning models for the three most common outcomes on dataset1
| Outcome | Metric | AdaBoost | Bagging | Extra-trees | Decision tree | k-NN |
|---|---|---|---|---|---|---|
| Recovered | Precision | 0.29 | 0.44 | 0.47 | 0.38 | 0.34 |
| | Recall | 0.81 | 0.59 | 0.56 | 0.56 | 0.41 |
| | F1-Score | 0.23 | 0.51 | 0.51 | 0.45 | 0.37 |
| Isolated | Precision | 0.82 | 0.85 | 0.84 | 0.84 | 0.81 |
| | Recall | 0.30 | 0.81 | 0.83 | 0.78 | 0.78 |
| | F1-Score | 0.44 | 0.83 | 0.83 | 0.81 | 0.80 |
| Death | Precision | 0.09 | 0.50 | 0.44 | 0.50 | 0.42 |
| | Recall | 0.31 | 0.38 | 0.31 | 0.38 | 0.38 |
| | F1-Score | 0.14 | 0.43 | 0.36 | 0.43 | 0.40 |
| Overall | Accuracy | 0.38 | 0.74 | 0.74 | 0.71 | 0.69 |
Metrics of machine learning models for the three most common outcomes on dataset2
| Outcome | Metric | AdaBoost | Bagging | Extra-trees | Decision tree | k-NN |
|---|---|---|---|---|---|---|
| Recovered | Precision | 0.34 | 0.40 | 0.39 | 0.39 | 0.29 |
| | Recall | 0.12 | 0.18 | 0.12 | 0.12 | 0.31 |
| | F1-Score | 0.18 | 0.25 | 0.19 | 0.19 | 0.30 |
| Isolated | Precision | 0.62 | 0.69 | 0.69 | 0.69 | 0.66 |
| | Recall | 0.50 | 0.88 | 0.91 | 0.91 | 0.64 |
| | F1-Score | 0.55 | 0.77 | 0.78 | 0.78 | 0.65 |
| Death | Precision | 0.02 | 0.33 | 0.33 | 0.33 | 0.11 |
| | Recall | 0.40 | 0.10 | 0.20 | 0.20 | 0.10 |
| | F1-Score | 0.04 | 0.15 | 0.25 | 0.25 | 0.11 |
| Overall | Accuracy | 0.38 | 0.65 | 0.65 | 0.65 | 0.53 |
Metrics of machine learning models for the two most common and 'disease' outcomes on dataset2
| Outcome | Metric | AdaBoost | Bagging | Extra-trees | Decision tree | k-NN |
|---|---|---|---|---|---|---|
| Recovered | Precision | 0.22 | 0.36 | 0.30 | 0.30 | 0.31 |
| | Recall | 0.20 | 0.22 | 0.11 | 0.11 | 0.39 |
| | F1-Score | 0.21 | 0.27 | 0.16 | 0.16 | 0.34 |
| Isolated | Precision | 0.66 | 0.72 | 0.71 | 0.71 | 0.71 |
| | Recall | 0.57 | 0.84 | 0.88 | 0.88 | 0.62 |
| | F1-Score | 0.61 | 0.77 | 0.78 | 0.78 | 0.66 |
| Death | Precision | 0.08 | 0.71 | 0.56 | 0.56 | 0.30 |
| | Recall | 0.86 | 0.71 | 0.71 | 0.71 | 0.43 |
| | F1-Score | 0.15 | 0.71 | 0.63 | 0.63 | 0.35 |
| Overall | Accuracy | 0.47 | 0.66 | 0.66 | 0.66 | 0.55 |
Fig. 4 Binary classification models: ROC curves
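As a hedged illustration of how binary ROC curves like those in Fig. 4 can be drawn from predicted class probabilities, the sketch below assumes scikit-learn and matplotlib; all labels and probabilities are hypothetical values, not the study's results.

```python
# Illustrative sketch (assuming scikit-learn and matplotlib) of plotting a binary ROC curve
# from a classifier's predicted probability of the positive ('death') class.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Hypothetical ground-truth labels (1 = death) and predicted death probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
proba_death = [0.1, 0.3, 0.7, 0.2, 0.9, 0.4, 0.05, 0.8]

fpr, tpr, _ = roc_curve(y_true, proba_death)
plt.plot(fpr, tpr, label=f"example model (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```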