Ashish Verma, Ankit B Patel, Sonu Subudhi, C Corey Hardin, Melin J Khandekar, Hang Lee, Dustin McEvoy, Triantafyllos Stylianopoulos, Lance L Munn, Sayon Dutta, Rakesh K Jain.
Abstract
As predicting the trajectory of COVID-19 is challenging, machine learning models could assist physicians in identifying high-risk individuals. This study compares the performance of 18 machine learning algorithms for predicting ICU admission and mortality among COVID-19 patients. Using COVID-19 patient data from the Mass General Brigham (MGB) Healthcare database, we developed and internally validated models using patients presenting to the Emergency Department (ED) between March and April 2020 (n = 3597) and further validated them using temporally distinct individuals who presented to the ED between May and August 2020 (n = 1711). We show that ensemble-based models perform better than other model types at predicting both 5-day ICU admission and 28-day mortality from COVID-19. CRP, LDH, and O2 saturation were important for ICU admission models, whereas eGFR <60 ml/min/1.73 m², and neutrophil and lymphocyte percentages were the most important variables for predicting mortality. Implementing such models could help in clinical decision-making for future infectious disease outbreaks including COVID-19.
Year: 2021 PMID: 34021235 PMCID: PMC8140139 DOI: 10.1038/s41746-021-00456-x
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 Schematic diagram representing the process of machine learning model development.
a Flow diagram depicting the steps in obtaining the training and temporal validation datasets (with patient numbers at each step). b The process of patient selection, dataset balancing, hyperparameter tuning, cross-validation and temporal validation.
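The temporal split and the F1 metric used to compare models can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`ed_date`, `icu_5d`) and the example rows are hypothetical, and the 1 May 2020 cutoff is inferred only from the date ranges given in the abstract.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Split records into a development set (before cutoff) and a
    temporal validation set (on or after cutoff)."""
    dev = [r for r in records if r["ed_date"] < cutoff]
    val = [r for r in records if r["ed_date"] >= cutoff]
    return dev, val

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical records; field names are assumptions made for illustration.
records = [
    {"ed_date": date(2020, 3, 15), "icu_5d": 1},
    {"ed_date": date(2020, 4, 20), "icu_5d": 0},
    {"ed_date": date(2020, 5, 2), "icu_5d": 0},
    {"ed_date": date(2020, 8, 30), "icu_5d": 1},
]
dev, val = temporal_split(records, cutoff=date(2020, 5, 1))
print(len(dev), len(val))
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Splitting by presentation date rather than at random keeps the validation patients strictly later in time than the development patients, which is what makes the second cohort a temporal validation set.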
Fig. 2 F1 score comparison and variables of importance for ICU admission and mortality prediction models.
a, b Bar plots representing the F1 scores of ICU admission and mortality prediction models. Error bars indicate standard deviation from the mean. Statistical analysis was performed using the two-stage step-up method of Benjamini, Krieger and Yekutieli, which controls the false discovery rate (FDR) during multiple comparisons. p-value thresholds: <0.03 (*), <0.002 (**), <0.0002 (***), <0.0001 (****). c SHAP value summary dot plot and d variable importance of the RandomForest-based ICU admission model. e SHAP value summary dot plot and f variable importance of the RandomForest-based mortality model. SHAP values are calculated by comparing the model's prediction with and without a feature, across every possible way of adding that feature to the model. The bar plot depicts the mean SHAP values, whereas the summary dot plot shows each feature's impact on the model output. The color of a dot represents the value of the feature, and the X-axis depicts the direction and magnitude of the impact. Red dots represent high values of the feature and blue dots represent low values. A positive SHAP value means the feature value increases the likelihood of ICU admission/mortality. Features whose red dots have positive SHAP values are directly related to the outcome of interest, whereas features whose blue dots have positive SHAP values are inversely related to it.
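The "with and without the feature in every possible way" description corresponds to the classical Shapley value, which can be sketched by brute-force enumeration of feature subsets. The sketch below makes several assumptions that are not from the study: a feature that is "absent" is replaced by a baseline value, the scoring function `toy_score` and its coefficients are invented for illustration, and exhaustive enumeration is used in place of whatever optimized SHAP explainer the authors applied to their RandomForest models.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one input x by enumerating all feature subsets.

    Features outside the subset take their baseline value (an assumption about
    how a "missing" feature is represented).
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the subset that does not contain feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_score(z):
    """Hypothetical two-feature score with an interaction term (not from the study)."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[1]

x = [1.0, 2.0]
baseline = [0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
# The contributions add up to the difference between the prediction and the baseline prediction.
print(sum(phi), toy_score(x) - toy_score(baseline))
```

A positive entry in `phi` pushes the score above the baseline and a negative entry pushes it below, which is the sign convention the dot plots use. The cost of this enumeration grows exponentially with the number of features, so it is only practical for small toy cases.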