Samad Moslehi, Niloofar Rabiei, Ali Reza Soltanian, Mojgan Mamani.
Abstract
BACKGROUND: Because mortality among COVID-19 patients is high, a classification model of patient mortality that is both highly precise and interpretable could help reduce mortality and allow appropriate action to be taken urgently. In this study, the random forest method was used to select the features that affect COVID-19 mortality, and classification was performed with the logistic model tree (LMT), classification and regression tree (CART), C4.5, and C5.0 algorithms based on the important features.
Keywords: C4.5; C5.0; CART; COVID-19; Classification; Logistic model tree; Machine learning
Year: 2022 PMID: 35871639 PMCID: PMC9308952 DOI: 10.1186/s12911-022-01939-x
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
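The two-stage approach described in the abstract (random forest feature ranking, then a decision tree fitted on the important features) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic, `top_k`, `max_depth`, and `n_estimators` are arbitrary, and scikit-learn's `DecisionTreeClassifier` stands in for CART only (LMT, C4.5, and C5.0 are not available in scikit-learn). The 70/30 split is an assumption chosen to match the train and test sizes in the accuracy table (1730 and 740 patients).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient data: 2470 rows, 25 features (hypothetical values).
X, y = make_classification(
    n_samples=2470, n_features=25, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: rank features by impurity-based random forest importance, as percentages.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
relative_importance = 100 * rf.feature_importances_

# Keep the top_k most important features (top_k is an arbitrary choice here).
top_k = 10
selected = np.argsort(relative_importance)[::-1][:top_k]

# Step 2: fit a CART-style tree on the selected features only.
cart = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0)
cart.fit(X_train[:, selected], y_train)
y_pred = cart.predict(X_test[:, selected])

scores = {
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
}
print(scores)
```

A shallow tree is used so that the fitted model stays readable, which is the interpretability argument made in the abstract.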
Fig. 1 The process of COVID-19 classification mortality
Fig. 2 A typical decision tree diagram
Fig. 3 Relative importance features based on RF
Descriptive statistic and importance of each feature in COVID-19 patients
| Characteristic | Feature name | Category | Discharge, frequency (%) | Dead, frequency (%) | P-value* | Relative importance (%) |
|---|---|---|---|---|---|---|
| Demographic and clinical | Age | ≤ 60 | 957(51.6) | 111(18) | < 0.001 | 6.99 |
| | | > 60 | 896(48.4) | 506(82) | | |
| | Gender | Male | 990(53.4) | 347(56.2) | 0.226 | 6.05 |
| | | Female | 863(46.6) | 270(43.8) | | |
| | Hypertension | Yes | 623(33.6) | 283(45.9) | | 5.26 |
| | | No | 1230(66.4) | 334(54.1) | | |
| | Marital status | Married | 1552(83.8) | 509(82.5) | | 5.07 |
| | | Single | 109(5.9) | 17(2.8) | | |
| | | Divorce | 15(0.8) | 0(0) | | |
| | | Other | 177(9.6) | 91(14.7) | | |
| | Cardiovascular disease | Yes | 288(15.5) | 171(27.7) | | 4.66 |
| | | No | 1565(84.5) | 446(72.3) | | |
| | Diabetes | Yes | 320(17.3) | 153(24.8) | | 4.66 |
| | | No | 1533(82.7) | 464(75.2) | | |
| | Location | Urban | 1541(83.2) | 516(83.6) | 0.804 | 4.34 |
| | | Rural | 312(16.8) | 101(16.4) | | |
| | Lung diseases | Yes | 193(10.4) | 91(14.7) | | 3.65 |
| | | No | 1660(89.6) | 526(85.3) | | |
| | Smoker | Yes | 140(7.6) | 50(8.1) | 0.663 | 2.36 |
| | | No | 1713(92.4) | 567(91.9) | | |
| | Renal insufficiency | Yes | 78(4.2) | 32(5.2) | 0.311 | 1.77 |
| | | No | 1775(95.8) | 585(94.8) | | |
| | Cancer | Yes | 29(1.6) | 26(4.2) | | 1.5 |
| | | No | 1824(98.4) | 591(95.8) | | |
| | Neurological diseases | Yes | 3(0.2) | 11(1.8) | | 0.67 |
| | | No | 1850(99.8) | 606(98.2) | | |
| | Hepatic failure | Yes | 16(0.9) | 5(0.8) | 1 | 0.57 |
| | | No | 1837(99.1) | 612(99.2) | | |
| | Hematologic disorders | Yes | 10(0.5) | 6(1) | 0.252 | 0.29 |
| | | No | 1843(99.5) | 611(99) | | |
| | C.I.S | Yes | 5(0.3) | 1(0.2) | 1 | 0.11 |
| | | No | 1848(99.7) | 616(99.8) | | |
| | Total | – | – | – | – | 47.94 |
| Laboratory examination | BUN | ≤ 20 | 1372(74) | 202(32.7) | | 11.93 |
| | | > 20 | 481(26) | 415(67.3) | | |
| | PTT | 30–40 | 991(53.5) | 300(48.6) | | 6.58 |
| | | Other | 862(46.5) | 317(51.4) | | |
| | SGOT | ≤ 45 | 1378(74.4) | 302(48.9) | | 6.25 |
| | | > 45 | 475(25.6) | 315(51.1) | | |
| | ESR | ≤ 30 | 698(37.7) | 219(35.5) | 0.337 | 6.11 |
| | | > 30 | 1155(62.3) | 398(64.5) | | |
| | Na | 135–145 | 1497(80.8) | 409(66.3) | | 5.11 |
| | | Other | 356(19.2) | 208(33.7) | | |
| | CPK | 25–310 | 1514(81.7) | 413(66.9) | | 4.88 |
| | | Other | 339(18.3) | 204(33.1) | | |
| | Plat | 130–400 | 1578(85.2) | 440(71.3) | | 4.59 |
| | | Other | 275(14.8) | 177(28.7) | | |
| | BS | ≤ 100 | 356(19.2) | 85(13.8) | | 3.79 |
| | | > 100 | 1497(80.8) | 532(86.2) | | |
| | PMN | < 40 | 5(0.3) | 8(1.3) | 0.005 | 2.36 |
| | | 40–60 | 60(3.2) | 26(4.2) | | |
| | | > 60 | 1788(96.5) | 583(94.5) | | |
| | LDH | 100–250 | 24(1.3) | 3(0.5) | 0.117 | 0.46 |
| | | Other | 1829(98.7) | 614(99.5) | | |
| | Total | – | – | – | – | 52.06 |
Significant values are given in bold
*Chi-square test. C.I.S, compromised immune system; ESR, erythrocyte sedimentation rate; BUN, blood urea nitrogen; BS, blood sugar; SGOT, serum glutamic-oxaloacetic transaminase; PTT, partial thromboplastin time; Plat, platelets; PMN, polymorphonuclear; CPK, creatine phosphokinase; Na, sodium; LDH, lactate dehydrogenase
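As an illustration of the chi-square test named in the footnote, the sketch below computes a Pearson chi-square statistic and p-value for a 2×2 table using only the standard library, applied to the Age row (≤ 60: 957 discharged, 111 dead; > 60: 896 discharged, 506 dead). The function name `pearson_chi2_2x2` is hypothetical, and the absence of a continuity correction is an assumption: the source does not say which variant was used, so p-values for other rows may not match the table exactly.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Age row: (discharged, dead) counts for the <= 60 and > 60 groups.
chi2_age, p_age = pearson_chi2_2x2(957, 111, 896, 506)
print(f"chi2 = {chi2_age:.1f}, p < 0.001: {p_age < 0.001}")
```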
Assessing the accuracy of decision tree models in classifying the COVID-19 death
| Model | Subset | TP | FP | FN | TN | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| LMT | Train | 1225 | 299 | 73 | 133 | 0.9437 | 0.8681 | 0.7850 |
| | Test | 514 | 130 | 41 | 55 | 0.9261 | 0.8573 | 0.7689 |
| | Total | 1739 | 429 | 113 | 189 | 0.9383 | 0.8650 | 0.7804 |
| C4.5 | Train | 1217 | 284 | 81 | 148 | 0.9376 | 0.8696 | 0.7891 |
| | Test | 510 | 120 | 45 | 65 | 0.9189 | 0.8608 | 0.7770 |
| | Total | 1728 | 412 | 125 | 205 | 0.9325 | 0.8655 | 0.7826 |
| C5.0 | Train | 1192 | 276 | 106 | 156 | 0.9183 | 0.8619 | 0.7792 |
| | Test | 514 | 130 | 41 | 55 | 0.9261 | 0.8574 | 0.7689 |
| | Total | 1739 | 429 | 114 | 188 | 0.9384 | 0.8649 | 0.7802 |
| CART | Train | 1217 | 284 | 81 | 148 | 0.9376 | 0.8696 | 0.7891 |
| | Test | 530 | 136 | 25 | 49 | | | |
| | Total | 1739 | 421 | 114 | 196 | | | |
Significant values are given in bold
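The evaluation metrics in the table above follow from the confusion matrix counts by the standard definitions of recall, F1-score, and accuracy. A minimal sketch, assuming those standard definitions (the helper name `classification_metrics` is hypothetical), applied to the LMT training row:

```python
def classification_metrics(tp, fp, fn, tn):
    """Recall, F1-score, and accuracy from confusion matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"recall": recall, "f1": f1, "accuracy": accuracy}


# LMT, Train subset: TP = 1225, FP = 299, FN = 73, TN = 133.
lmt_train = classification_metrics(tp=1225, fp=299, fn=73, tn=133)
print({name: round(value, 4) for name, value in lmt_train.items()})
```

For this row the computed values agree with the reported 0.9437, 0.8681, and 0.7850 to about three decimal places; small differences in the last digit are consistent with rounding versus truncation.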
Fig. 4 CART flowchart in classifying the COVID-19 death
Fig. 5 ROC curve for the final CART model