| Literature DB >> 34281922 |
Marcel J H Aries1, Martijn Beudel2, Maarten C Ottenhoff3, Lucas A Ramos4, Wouter Potters2, Marcus L F Janssen5, Deborah Hubers1, Shi Hu6, Egill A Fridgeirsson7, Dan Piña-Fuentes2, Rajat Thomas7, Iwan C C van der Horst1, Christian Herff8, Pieter Kubben9, Paul W G Elbers10, Henk A Marquering4, Max Welling6, Suat Simsek11,12, Martijn D de Kruif13, Tom Dormans14, Lucas M Fleuren15, Michiel Schinkel16, Peter G Noordzij17, Joop P van den Bergh18, Caroline E Wyers18, David T B Buis19, W Joost Wiersinga20,21, Ella H C van den Hout11, Auke C Reidinga22, Daisy Rusch23, Kim C E Sigaloff20, Renee A Douma24, Lianne de Haan24, Niels C Gritters van den Oever25, Roger J M W Rennenberg26, Guido A van Wingen27.
Abstract
OBJECTIVE: Develop and validate models that predict mortality of patients diagnosed with COVID-19 admitted to the hospital.
Keywords: COVID-19; public health; risk management
Year: 2021 PMID: 34281922 PMCID: PMC8290951 DOI: 10.1136/bmjopen-2020-047347
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1 A schematic overview of all steps involved, from data acquisition to model evaluation. The dotted line depicts the step used only during selection of the 10 best features.
Figure 2 Flow diagram of patients excluded from further analysis.
Patient characteristics per outcome group and a selection of features. P values were calculated using t-tests and corrected for multiple comparisons with the Bonferroni correction
| Variables | Missing | Overall | Favourable outcome | Unfavourable outcome | Adjusted p value |
| Total patients | | 2273 | 1758 | 516 | |
| Age, median (Q1, Q3) | 19 | 69.0 (58.0, 78.0) | 65.0 (55.0, 75.1) | 77.1 (71.0, 83.1) | p<0.001*** |
| Gender, n (%) | 0 | | | | |
| Female | | 858 (37.7) | 690 (39.3) | 168 (32.6) | |
| Male | | 1415 (62.3) | 1067 (60.7) | 348 (67.4) | |
| History of hypertension, n (%) | 30 | | | | p<0.001*** |
| No | | 1207 (53.8) | 998 (57.7) | 209 (40.8) | |
| Yes | | 1036 (46.2) | 733 (42.3) | 303 (59.2) | |
| History of diabetes with complications, n (%) | 64 | | | | p<0.001*** |
| No | | 2044 (92.5) | 1608 (94.4) | 436 (86.3) | |
| Yes | | 165 (7.5) | 96 (5.6) | 69 (13.7) | |
| History of diabetes without complications, n (%) | 69 | | | | p<0.001*** |
| No | | 1789 (81.2) | 1412 (83.0) | 377 (75.1) | |
| Yes | | 415 (18.8) | 290 (17.0) | 125 (24.9) | |
| History of asthma, n (%) | 55 | | | | p>0.05 |
| No | | 1988 (89.6) | 1524 (89.0) | 464 (91.7) | |
| Yes | | 230 (10.4) | 188 (11.0) | 42 (8.3) | |
| History of liver disease, n (%) | 57 | | | | p>0.05 |
| No | | 2194 (99.0) | 1693 (99.0) | 501 (99.0) | |
| Yes | | 22 (1.0) | 17 (1.0) | 5 (1.0) | |
| History of rheumatological disorder, n (%) | 43 | | | | p<0.05* |
| No | | 1981 (88.8) | 1549 (89.9) | 432 (85.2) | |
| Yes | | 249 (11.2) | 174 (10.1) | 75 (14.8) | |
| History of autoimmune and/or inflammatory diseases, n (%) | 62 | | | | p<0.05* |
| No | | 2027 (91.7) | 1559 (91.5) | 468 (92.3) | |
| Yes | | 184 (8.3) | 145 (8.5) | 39 (7.7) | |
| History of chronic cardiac disease, n (%) | 36 | | | | p<0.001*** |
| No | | 1539 (68.8) | 1271 (73.6) | 268 (52.4) | |
| Yes | | 698 (31.2) | 455 (26.4) | 243 (47.6) | |
| History of chronic haematological disease, n (%) | 50 | | | | p<0.05* |
| No | | 2133 (96.0) | 1648 (96.0) | 485 (95.7) | |
| Yes | | 90 (4.0) | 68 (4.0) | 22 (4.3) | |
| History of chronic kidney disease, n (%) | 45 | | | | p<0.001*** |
| No | | 1987 (89.2) | 1566 (91.3) | 421 (82.2) | |
| Yes | | 241 (10.8) | 150 (8.7) | 91 (17.8) | |
| History of chronic neurological disorder, n (%) | 45 | | | | p<0.001*** |
| No | | 1921 (86.2) | 1519 (88.4) | 402 (79.0) | |
| Yes | | 307 (13.8) | 200 (11.6) | 107 (21.0) | |
| History of chronic pulmonary disease (not asthma), n (%) | 47 | | | | p<0.001*** |
| No | | 1790 (80.4) | 1419 (82.5) | 371 (73.2) | |
| Yes | | 436 (19.6) | 300 (17.5) | 136 (26.8) | |
***p<0.001, **p<0.01, *p<0.05.
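The Bonferroni-adjusted p values above can be reproduced with a short sketch. The helper name and the dict-of-arrays layout are illustrative, not from the paper; only the statistical procedure (per-variable two-sample t-test, p value multiplied by the number of comparisons) follows the table legend.

```python
import numpy as np
from scipy.stats import ttest_ind

def bonferroni_ttests(favourable, unfavourable):
    """Two-sample t-test per variable, Bonferroni-corrected.

    favourable / unfavourable: dicts mapping a variable name to a 1-D
    array of values for that outcome group (illustrative structure).
    """
    m = len(favourable)  # number of comparisons
    adjusted = {}
    for name in favourable:
        _, p = ttest_ind(favourable[name], unfavourable[name])
        adjusted[name] = min(p * m, 1.0)  # Bonferroni: multiply by m, cap at 1
    return adjusted
```

With a genuinely different variable (such as age here) the adjusted p value stays far below 0.001 even after correction, matching the table.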
Figure 3 (A) Overall performance of both models per feature set. All models perform well above chance level. XGB generally performs better than LR, except on the premorbid feature set, where both models performed equally. The highest performance was achieved by XGB on both all features and the 10 selected features. (B) The confusion matrix of the best performing model, XGB trained on the 10 selected features. The prediction threshold was tuned to the shortest distance to the upper left corner of the ROC plot to create the ‘optimal’ binary prediction. AUC, area under the curve; LR, logistic regression; ROC, receiver operating curve; XGB, extreme gradient boosting.
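The threshold tuning described in the caption (shortest distance to the ROC curve's upper-left corner, i.e. the point (FPR, TPR) = (0, 1)) can be sketched as follows; `optimal_threshold` is an illustrative helper, not code from the paper:

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_threshold(y_true, y_score):
    """Pick the score threshold whose ROC point lies closest to (0, 1)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    dist = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)  # Euclidean distance to (0, 1)
    return thresholds[np.argmin(dist)]
```

Applying this threshold to the predicted probabilities turns them into the binary predictions summarised in the confusion matrix.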
Evaluation metrics for both classifiers for each feature set
| Classifiers | Feature set | AUC | Sensitivity | Specificity | PPV | NPV |
| LR | Premorbid | 0.77 (0.72 to 0.81) | 0.73 (0.61 to 0.84) | 0.71 (0.64 to 0.78) | 0.39 (0.35 to 0.44) | 0.91 (0.88 to 0.95) |
| | Clinical presentation | 0.67 (0.62 to 0.71) | 0.60 (0.51 to 0.68) | 0.63 (0.57 to 0.69) | 0.30 (0.22 to 0.38) | 0.86 (0.83 to 0.90) |
| | Laboratory and radiology | 0.66 (0.59 to 0.73) | 0.65 (0.47 to 0.83) | 0.54 (0.34 to 0.73) | 0.25 (0.16 to 0.34) | 0.83 (0.74 to 0.91) |
| | Premorbid+clinical presentation | 0.79 (0.75 to 0.83) | 0.71 (0.62 to 0.80) | — | 0.38 (0.32 to 0.43) | 0.91 (0.89 to 0.93) |
| | All | 0.71 (0.67 to 0.76) | 0.62 (0.52 to 0.73) | 0.70 (0.62 to 0.78) | 0.36 (0.28 to 0.44) | 0.88 (0.85 to 0.92) |
| | Ten best | — | — | 0.71 (0.65 to 0.77) | — | — |
| XGB | Premorbid | 0.77 (0.73 to 0.81) | 0.68 (0.54 to 0.81) | 0.60 (0.39 to 0.82) | 0.36 (0.29 to 0.43) | 0.68 (0.44 to 0.92) |
| | Clinical presentation | 0.73 (0.71 to 0.74) | 0.69 (0.61 to 0.77) | 0.64 (0.59 to 0.69) | 0.33 (0.26 to 0.40) | 0.89 (0.87 to 0.92) |
| | Laboratory and radiology | 0.72 (0.66 to 0.77) | 0.68 (0.60 to 0.75) | 0.63 (0.57 to 0.68) | 0.31 (0.27 to 0.35) | 0.88 (0.84 to 0.92) |
| | Premorbid+clinical presentation | 0.81 (0.78 to 0.83) | — | 0.62 (0.47 to 0.78) | 0.36 (0.29 to 0.44) | 0.81 (0.62 to 1.00) |
| | All | — | 0.66 (0.54 to 0.78) | 0.77 (0.65 to 0.89) | — | 0.91 (0.88 to 0.95) |
| | Ten best | — | 0.67 (0.57 to 0.77) | — | 0.44 (0.40 to 0.48) | — |
The average and 95% CIs over all leave-one-hospital-out cross-validation iterations are presented. Values in bold represent the best performance for each metric per classifier. The premorbid feature set includes age, gender, occupation and medical history.
AUC, area under the curve; LR, logistic regression; NPV, negative predictive value; PPV, positive predictive value; XGB, extreme gradient boosting.
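The leave-one-hospital-out scheme behind these CIs can be sketched with scikit-learn's `LeaveOneGroupOut`: each hospital is held out in turn, the model is trained on the remaining hospitals, and performance is measured on the held-out one. Logistic regression stands in for the paper's LR model here, and the array layout is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_hospital_out_auc(X, y, hospital_ids):
    """Train on all hospitals but one; evaluate on the held-out hospital."""
    aucs = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=hospital_ids):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return np.mean(aucs), aucs
```

Averaging the per-hospital AUCs (and taking their spread for the CI) yields one row of the table above; the same loop works for sensitivity, specificity, PPV and NPV once a threshold is fixed.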
Figure 4 Confusion matrix per centre as predicted by extreme gradient boosting trained on the 10 selected features. The prediction threshold is optimised by the shortest distance to the upper-left corner in the receiver operating curve plot of the complete dataset. All matrices show comparable distributions, though centre 4 shows relatively many false positives.
Figure 5 Performance per day for extreme gradient boosting (XGB) trained on the 10 selected features. The left y-axis shows the absolute number of correct predictions and the right y-axis the relative number of correct predictions. Relative performance was calculated as correct/(correct+incorrect) and was well above chance level (0.5) for all days. The results indicate robust performance, as the relative performance showed no decrease over time while varying between 0.6 and 0.9. The absolute performance shows that most patients have an outcome (both favourable and unfavourable) within 1 week after admission. A high number of patients is seen on day 21, caused by the aggregation of all patients in the hospital for 21 days or longer. Logistic regression on the 10 best features shows similar performance (figure not shown).
Figure 6 SHAP values of XGB trained on all features. To prevent readability issues, only the top 20 features are shown and the SHAP value range is set from −1.5 to 1.5, visually cutting off a few outliers. The colour of each data point depicts the feature value, where red corresponds to high values and blue to low values. SHAP values above 0 suggest a positive association with the outcome. Given that the outcome is defined as mortality within 21 days, positive SHAP values translate to an association with higher mortality. AST SGOT, aspartate aminotransferase/serum glutamic-oxaloacetic transaminase; LDH, lactate dehydrogenase; SHAP, SHapley Additive exPlanations; XGB, extreme gradient boosting.
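SHAP values are the Shapley values of a game in which the features are the players and the payoff is the model's prediction. The paper uses SHAP's efficient tree algorithm; purely to illustrate what is being computed, here is a brute-force Shapley computation for a toy predictor (exponential in the number of features, so intuition only; all names are illustrative, and 'feature absent' is approximated by the background mean):

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one instance by enumerating feature coalitions."""
    n = len(x)
    base = background.mean(axis=0)  # stand-in values for 'absent' features

    def value(subset):
        z = base.copy()
        z[list(subset)] = x[list(subset)]  # present features keep their value
        return predict(z[None, :])[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for s in itertools.combinations(others, r):
                # Shapley weight |S|! (n-|S|-1)! / n!
                w = math.factorial(len(s)) * math.factorial(n - len(s) - 1) / math.factorial(n)
                phi[i] += w * (value(s + (i,)) - value(s))
    return phi
```

For a linear model with a zero background, the Shapley value of each feature reduces to its coefficient times its value, which is a useful sanity check on the implementation.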
Model performance on (non-)ICU subgroup
| Classifiers | Feature set | AUC—ICU patients | AUC—non-ICU patients |
| LR | Premorbid | — | 0.81 (0.77 to 0.84) |
| | Clinical presentation | 0.51 (0.37 to 0.66) | 0.68 (0.64 to 0.72) |
| | Laboratory and radiology | 0.54 (0.45 to 0.63) | 0.69 (0.61 to 0.76) |
| | Premorbid+clinical presentation | 0.60 (0.42 to 0.78) | 0.83 (0.80 to 0.86) |
| | All | 0.63 (0.50 to 0.76) | 0.75 (0.72 to 0.79) |
| | 10 best | 0.62 (0.44 to 0.80) | — |
| XGB | Premorbid | — | 0.80 (0.76 to 0.83) |
| | Clinical presentation | 0.57 (0.41 to 0.72) | 0.75 (0.72 to 0.77) |
| | Laboratory and radiology | 0.59 (0.52 to 0.66) | 0.76 (0.69 to 0.83) |
| | Premorbid+clinical presentation | — | 0.84 (0.81 to 0.87) |
| | All | 0.68 (0.58 to 0.78) | — |
| | 10 best | 0.68 (0.57 to 0.79) | 0.85 (0.82 to 0.88) |
Values in bold represent the best performance per classifier per subgroup. The premorbid feature set includes age, gender, occupation and medical history.
AUC, area under the curve; ICU, intensive care unit; LR, logistic regression; XGB, extreme gradient boosting.
Figure 7 LR and XGB trained on the 10 selected features compared with two age-based decision rules. Both LR and XGB showed a higher AUC than both age-based rules. Nineteen patients did not have a value for age and were excluded from this analysis. AUC, area under the curve; LR, logistic regression; XGB, extreme gradient boosting.
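Scoring an age-based rule against the models reduces to computing the AUC of age itself as a risk score, dropping patients with missing age as in the figure. A minimal sketch, with the function name assumed:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def age_rule_auc(age, died):
    """AUC of a decision rule that scores patients by age alone.

    Patients without a recorded age are excluded, mirroring the figure.
    """
    keep = ~np.isnan(age)
    return roc_auc_score(died[keep], age[keep])
```

Because AUC is rank-based, any monotone age cut-off rule yields the same curve, so this single number summarises the age-only baseline that LR and XGB are compared against.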