Literature DB >> 33042536

Mortality prediction model for the triage of COVID-19, pneumonia, and mechanically ventilated ICU patients: A retrospective study.

Logan Ryan1, Carson Lam1, Samson Mataraso1, Angier Allen1, Abigail Green-Saxena1, Emily Pellegrini1, Jana Hoffman1, Christopher Barton1, Andrea McCoy2, Ritankar Das1.   

Abstract

RATIONALE: Prediction of patients at risk for mortality can help triage patients and assist in resource allocation.
OBJECTIVES: Develop and evaluate a machine learning-based algorithm which accurately predicts mortality in COVID-19, pneumonia, and mechanically ventilated patients.
METHODS: Retrospective study of 53,001 total ICU patients, including 9,166 patients with pneumonia and 25,895 mechanically ventilated patients, performed on the MIMIC dataset. An additional retrospective analysis was performed on a community hospital dataset containing 114 patients positive for SARS-CoV-2 by PCR test. The outcome of interest was in-hospital patient mortality.
RESULTS: When trained and tested on the MIMIC dataset, the XGBoost predictor obtained area under the receiver operating characteristic curve (AUROC) values of 0.82, 0.81, 0.77, and 0.75 for mortality prediction on mechanically ventilated patients at 12-, 24-, 48-, and 72-hour windows, respectively, and AUROCs of 0.87, 0.78, 0.77, and 0.73 for mortality prediction on pneumonia patients at 12-, 24-, 48-, and 72-hour windows, respectively. The predictor outperformed the qSOFA, MEWS and CURB-65 risk scores at all prediction windows. When tested on the community hospital dataset, the predictor obtained AUROCs of 0.91, 0.90, 0.86, and 0.87 for mortality prediction on COVID-19 patients at 12-, 24-, 48-, and 72-hour windows, respectively, outperforming the qSOFA, MEWS and CURB-65 risk scores at all prediction windows.
CONCLUSIONS: This machine learning-based algorithm is a useful predictive tool for anticipating patient mortality at clinically useful timepoints, and is capable of accurate mortality prediction for mechanically ventilated patients as well as those diagnosed with pneumonia and COVID-19.
© 2020 IJS Publishing Group Ltd. Published by Elsevier Ltd.

Keywords:  Artificial intelligence; COVID-19; Machine learning; Mortality prediction; SARS-CoV-2

Year:  2020        PMID: 33042536      PMCID: PMC7532803          DOI: 10.1016/j.amsu.2020.09.044

Source DB:  PubMed          Journal:  Ann Med Surg (Lond)        ISSN: 2049-0801


Introduction

Infection prevention and control recommendations from the World Health Organization (WHO) stress that early detection, effective triage, and isolation of potentially infectious patients are essential to prevent unnecessary exposures to COVID-19 [1]. However, the rapid spread of COVID-19 has outpaced US healthcare facilities' ability to administer diagnostic tests to guide the quarantine and triage of COVID-19 patients [[2], [3], [4], [5]]. The outbreak has also significantly affected the availability of necessary hospital resources (e.g. respirators [6] and mechanical ventilators [[7], [8], [9], [10], [11], [12]]). COVID-19 can be lethal, with a variable case fatality rate considered to lie between that of severe acute respiratory syndrome (SARS; 9.5% [13]) and influenza (0.1%) [[14], [15], [16]], and the potential to develop into severe respiratory disease [[17], [18], [19]]. During this period of unprecedented health crisis, clinicians must prioritize care for at-risk individuals to maximize limited resources. Mortality prediction tools aid in triage and resource allocation by providing advance warning of patient deterioration. Our prior work has validated machine-learning (ML) algorithms for their ability to predict mortality and patient stability in a variety of settings and on diverse patient populations [[20], [21], [22], [23], [24]].

Theory

Of particular interest during the COVID-19 pandemic is mortality prediction for COVID-19 patients, as well as for those who have developed respiratory complications such as pneumonia or conditions requiring mechanical ventilation. Some prior studies predicting mortality in the mechanically ventilated subpopulation have used logistic regression models which, when applied on day 21 [25] or day 14 [26,27] of mechanical ventilation, provide a probability of 1-year mortality. These studies were designed to determine the long-term prognosis of patients receiving prolonged mechanical ventilation. Here we present a mortality prediction tool applied to intensive care unit (ICU) patients requiring mechanical ventilation as well as those diagnosed with pneumonia, with mortality prediction windows of 12, 24, 48 and 72 h prior to death. We apply this algorithm with the same mortality prediction windows to COVID-19 patients.

Materials and methods

Data sources

Patient records were collected from the Medical Information Mart for Intensive Care (MIMIC) dataset, an openly available dataset developed by the MIT Lab for Computational Physiology, comprising de-identified health data associated with ~60,000 intensive care unit admissions [28]. It includes demographics, vital signs, laboratory tests, medications, and more. Data collection was passive, with no impact on patient safety. MIMIC data have been de-identified in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Patient records of COVID-19 polymerase chain reaction (PCR) positive patients were collected from a community hospital and formatted in the same manner as the MIMIC dataset; a total of 114 patient encounters were collected between 12 March and 12 April 2020, again with passive data collection and no impact on patient safety. Dascena establishes de-identification by removing all protected health information (PHI) identifiers and by jittering all timestamps (including date of birth (DOB)) randomly either forwards or backwards in time. Studies performed on de-identified patient data constitute non-human subjects research; accordingly, this study was determined by the Pearl Institutional Review Board to be exempt under FDA 21 CFR 56.104 and 45 CFR 46.104(b)(4) (Secondary Research Uses of Data or Specimens), under study number 20-DASC-119.

Data processing

For the MIMIC and community hospital datasets, we included only records for patients aged 18 years or older. We excluded patient records for which there were no raw data or no discharge or death dates. We then filtered for length of stay (LOS) for the different lookahead windows of 12, 24, 48, and 72 h. Table 1 lists the number of patients meeting each inclusion criterion from the MIMIC dataset. Inclusion criteria for the community hospital dataset are listed in Table 2. We minimally processed raw electronic health record (EHR) data to generate features. Following imputation of missing values, measurements were averaged to one value per hour for up to 3 h preceding prediction time. We also calculated differences between the current hour and the prior hour and between the prior hour and the hour before that. We concatenated these values from each measurement into a feature vector. For the MIMIC dataset, pneumonia patients were identified by International Classification of Diseases (ICD) codes, while those requiring mechanical ventilation and their corresponding start times were determined by chart measurements indicative of a mechanical ventilation setting. In the community hospital dataset, COVID-19 patients were identified by positive SARS-CoV-2 PCR tests.
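The feature construction described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the helper name and example values are hypothetical, and only two of the twelve input measurements are shown.

```python
def build_features(hourly):
    """Concatenate, per measurement, the three hourly averages preceding
    prediction time plus the two hour-to-hour differences described in
    the text (current hour minus prior hour, and prior hour minus the
    hour before that)."""
    vec = []
    for name in sorted(hourly):   # fixed ordering keeps feature columns stable
        h2, h1, h0 = hourly[name]  # averages at t-2 h, t-1 h, and prediction time
        vec.extend([h2, h1, h0, h0 - h1, h1 - h2])
    return vec

# Hypothetical example: heart rate and SpO2 over the 3 h before prediction.
features = build_features({"HR": [88.0, 92.0, 97.0], "SpO2": [96.0, 95.0, 93.0]})
```

Each measurement thus contributes five entries (three levels and two differentials) to the final feature vector.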
Table 1

Inclusion criteria for patients in the MIMIC dataset. *Required measurements include Age, Heart Rate, Respiratory Rate, Peripheral Oxygen Saturation (SpO2), Temperature, Systolic Blood Pressure, Diastolic Blood Pressure, White Blood Cell Counts, Platelets, Lactate, Creatinine, and Bilirubin.

Criterion | Encounters
ICU stays in MIMIC | 61,532
ICU stays with patients aged ≥ 18 years, any measurements present* | 53,001
Length of stay filtering for all patients, 12 h | 50,695
Length of stay filtering for all patients, 24 h | 40,959
Length of stay filtering for all patients, 48 h | 26,576
Length of stay filtering for all patients, 72 h | 18,275
Length of stay filtering for mechanically ventilated patients, 12 h | 24,934
Length of stay filtering for mechanically ventilated patients, 24 h | 21,414
Length of stay filtering for mechanically ventilated patients, 48 h | 16,085
Length of stay filtering for mechanically ventilated patients, 72 h | 12,368
Length of stay filtering for pneumonia patients, 12 h | 8,879
Length of stay filtering for pneumonia patients, 24 h | 7,678
Length of stay filtering for pneumonia patients, 48 h | 5,600
Length of stay filtering for pneumonia patients, 72 h | 4,169
Table 2

Inclusion criteria for patients in the community hospital dataset. *Required measurements include Age, Heart Rate, Respiratory Rate, Peripheral Oxygen Saturation (SpO2), Temperature, Systolic Blood Pressure, Diastolic Blood Pressure, White Blood Cell Counts, Platelets, Lactate, Creatinine, and Bilirubin.

Criterion | Encounters
COVID positive stays in community hospital | 114
COVID positive stays with patients aged ≥ 18 years, any measurements present* | 114
Length of stay filtering for all COVID positive patients, 12 h | 114
Length of stay filtering for all COVID positive patients, 24 h | 112
Length of stay filtering for all COVID positive patients, 48 h | 110
Length of stay filtering for all COVID positive patients, 72 h | 103
Data were discretized into 1 h intervals, beginning at the time of the first recorded patient measurement, and hourly measurements were required for each input variable. Measurements were averaged to produce a single value in cases when multiple observations of the same patient measurement were taken within a given hour. This ensured that the measurement rate was the same across patients and across time. Missing values were imputed by carrying forward the most recent past measurement in cases where no measurement of a clinical variable was available for a given hour. For some patients with infrequent measurements of one or more vital signs, this simple imputation resulted in many consecutive hours with identical values. Our publication on the use of gradient boosted trees for sepsis detection and prediction describes the data processing in detail [29]. Predictions were generated for all experiments using the following variables: Age, Heart Rate, Respiratory Rate, Peripheral Oxygen Saturation (SpO2), Temperature, Systolic Blood Pressure, Diastolic Blood Pressure, White Blood Cell Counts, Platelets, Lactate, Creatinine, and Bilirubin, over an interval of 3 h, together with their corresponding differentials in that interval.
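The hourly discretization and carry-forward imputation described above can be sketched in a few lines. This is our illustrative reconstruction, not the study's code, and the example readings are hypothetical.

```python
def hourly_series(observations, n_hours):
    """Discretize raw (hour, value) observations into one value per hour:
    observations within the same hour are averaged, and hours with no
    observation carry forward the most recent past value, which is why
    sparsely measured vitals can yield runs of identical imputed values."""
    by_hour = {}
    for hour, value in observations:
        by_hour.setdefault(hour, []).append(value)
    series, last = [], None
    for h in range(n_hours):
        if h in by_hour:
            last = sum(by_hour[h]) / len(by_hour[h])
        series.append(last)  # stays None until the first measurement arrives
    return series

# Hypothetical example: two readings in hour 0, none in hour 1, one in hour 2.
series = hourly_series([(0, 80.0), (0, 84.0), (2, 90.0)], 4)
# -> [82.0, 82.0, 90.0, 90.0]
```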

Gold standard

The outcome of interest was in-hospital patient mortality, determined retrospectively for each patient. In the MIMIC dataset, we used the expire_flag field of each patient's final stay to identify in-hospital death. Similarly, the community hospital dataset contains a boolean deceased flag used to determine mortality.

The machine learning algorithm

The classifier was created using the XGBoost method for fitting “boosted” decision trees. We applied the XGBoost package for Python to the patient age and vital sign measurements and their temporal changes, where temporal changes included hourly differences between each measurement beginning 3 h before prediction time. Gradient boosting, which XGBoost implements, is an ensemble learning technique that combines results from multiple decision trees to create prediction scores. Each tree successively splits the patient population into smaller and smaller groups. Each branch splits the patients who enter it into two groups based on whether their value of some covariate is above or below some threshold; for instance, a branch might divide patients according to whether their temperature is above or below 100 °F. After some number of branches, the tree ends in a set of “leaves.” Each patient falls into exactly one leaf according to the values of his or her measurements, and every patient in a given leaf is assigned the same predicted risk of mortality. The covariate involved in each split and the threshold value are selected by an algorithm designed to trade off fit to the training data against accuracy on out-of-sample data, using cross-validation to avoid over-fitting. We restricted tree depth to a maximum of six branching levels, set the learning rate parameter of XGBoost to 0.1, and restricted the tree ensembles to 1000 trees to limit the computational burden. Hyperparameter optimization was performed using cross-validated grid search. We included a hyperparameter for the early stopping of the iterative tree-addition procedure to prevent overfitting of the model on the training data and optimized across this hyperparameter using fivefold cross-validation. Due to computational and time constraints, hyperparameter optimization was performed across a sparse parameter grid, where the candidate hyperparameter values were chosen to span large ranges of viable parameter space.
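The branch-and-leaf structure described above can be illustrated with a minimal sketch. The tree encoding and risk values here are hypothetical, not the study's fitted model; in XGBoost, leaf scores from many such trees are summed and then converted to a probability.

```python
def tree_predict(x, node):
    """Walk a single decision tree: each internal node routes a patient
    left or right by comparing one covariate against a threshold; each
    leaf holds a risk score shared by every patient who lands in it."""
    while isinstance(node, dict):
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node

# A depth-1 tree mirroring the text's example: split on temperature at 100 F.
tree = {"feature": "Temp", "threshold": 100.0, "left": -0.4, "right": 0.7}
score = tree_predict({"Temp": 98.6}, tree)  # this patient lands in the low-risk leaf
```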
Cross-validated grid search was conducted to determine the optimal combination of candidate hyperparameters. While XGBoost has a large number of trainable parameters, computational and time constraints limited the set of tuned parameters to those with the largest impact on performance on the training data and the greatest relevance to the prediction task. To validate the boosted tree predictor when training and testing were performed on data from the same institution, we used fivefold cross-validation: for each model, four-fifths of the patients were randomly selected to train the model and the remaining one-fifth were used as a hold-out set to test the predictions. To account for the random selection of the training set, reported performance metrics are the average performance of the five separately trained models arising from fivefold cross-validation. For AUROC, we also reported the standard deviation of the five AUROC values obtained from cross-validation. For patients who died, we modeled mortality 12, 24, 48, and 72 h before death to evaluate performance with a variety of lead times. For mechanically ventilated encounters, the prediction time point was the start of ventilation for both the positive and negative classes. Predictors were trained independently for each distinct lookahead time. For 12-, 24-, 48- and 72-h lookahead predictions following a 3-h window of measurements, patients must have data for 15, 27, 51, or 75 h, respectively, preceding the time of in-hospital mortality or the time of discharge. Accordingly, we selected patients with stays of the appropriate length for the training and testing of each lookahead.
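The AUROC metric and the fold-averaging described above can be sketched as follows; this is an illustrative reimplementation with hypothetical per-fold values, not the study's code.

```python
import statistics

def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case,
    counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical AUROCs from the five cross-validation folds: report mean and SD,
# as done for the tables in this study.
fold_aurocs = [0.86, 0.87, 0.85, 0.88, 0.86]
mean_auroc = statistics.mean(fold_aurocs)
sd_auroc = statistics.stdev(fold_aurocs)
```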

Comparison to rule-based methods

To calculate the AUROC for rule-based predictors of mortality, we calculated quick Sepsis Related Organ Failure Assessment (qSOFA), Modified Early Warning Score (MEWS) and CURB-65 scores for patients in the MIMIC database. qSOFA has also been used to predict poor outcomes in pneumonia patients, including the need for mechanical ventilation, and has been shown to either match or outperform other outcome predictors such as SOFA, CRB, CRB-65 and the pneumonia severity index (PSI) [30,31]. Among more generally used mortality prediction scores, qSOFA has been shown to have predictive performance similar to that of the Acute Physiology, Age, Chronic Health Evaluation (APACHE) II or SOFA scores, as evidenced by a lack of statistically significant differences between AUROCs [32]. The MEWS and CURB-65 scores have also been validated for mortality prediction in general patient populations [33,34] and in those with community-acquired pneumonia [35] or COVID-19 [36], respectively. Scores were calculated using the entire dataset. We calculated the qSOFA score using systolic blood pressure, respiratory rate, and Glasgow Coma Scale (GCS) from EHR data. MEWS was calculated using systolic blood pressure, heart rate, respiratory rate, and temperature, with GCS used as a proxy for evaluating AVPU. CURB-65 scores were computed using age, BUN, respiratory rate, and systolic and diastolic blood pressure, with a GCS of 14 or less used as a proxy for confusion. Comparator score calculations for patients in the community hospital dataset were modified based on available data.
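As a concrete illustration, the published qSOFA criteria (one point each for systolic blood pressure ≤ 100 mmHg, respiratory rate ≥ 22/min, and altered mentation, here proxied by GCS < 15 as in the text) can be computed as follows. This sketch is ours, not the study's code.

```python
def qsofa(sys_bp, resp_rate, gcs):
    """quick SOFA score (0-3): one point each for low systolic blood
    pressure (<= 100 mmHg), elevated respiratory rate (>= 22 /min), and
    altered mentation (GCS < 15 used as the proxy, as in the text)."""
    return int(sys_bp <= 100) + int(resp_rate >= 22) + int(gcs < 15)

score = qsofa(sys_bp=95, resp_rate=24, gcs=15)  # 2 points: low BP, fast breathing
```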

Results

XGBoost model training and testing was performed on the MIMIC dataset. Patient demographic information for all ICU encounters as well as each subpopulation are presented in Table 3, Table 4, Table 5. Patient demographic information for all encounters from the community hospital data set are listed in Table 6.
Table 3

Patient demographic information for MIMIC dataset for all encounters (53,001).

Characteristic | MIMIC (%)
Age, 18–29 | 4.70
Age, 30–39 | 5.25
Age, 40–49 | 10.65
Age, 50–59 | 17.52
Age, 60–69 | 20.99
Age, >70 | 40.90
Gender, Male | 43.68
Gender, Female | 56.32
In-hospital Death, Yes | 9.59
In-hospital Death, No | 90.41
Table 4

Patient demographic information for MIMIC dataset for all pneumonia encounters (9,166).

Characteristic | MIMIC (%)
Age, 18–29 | 2.97
Age, 30–39 | 4.34
Age, 40–49 | 9.33
Age, 50–59 | 17.17
Age, 60–69 | 21.01
Age, >70 | 45.18
Gender, Male | 53.99
Gender, Female | 46.01
In-hospital Death, Yes | 11.41
In-hospital Death, No | 88.59
Table 5

Patient demographic information for MIMIC dataset for all mechanically ventilated encounters (25,895).

Characteristic | MIMIC (%)
Age, 18–29 | 4.50
Age, 30–39 | 4.38
Age, 40–49 | 10.16
Age, 50–59 | 17.91
Age, 60–69 | 22.39
Age, >70 | 40.66
Gender, Male | 59.1
Gender, Female | 40.9
In-hospital Death, Yes | 15.73
In-hospital Death, No | 84.27
Table 6

Patient demographic information for community hospital dataset encounters.

Characteristic | Community Hospital (%)
Age, 18–29 | 7.00
Age, 30–39 | 10.5
Age, 40–49 | 8.77
Age, 50–59 | 15.79
Age, 60–69 | 23.68
Age, >70 | 34.21
Gender, Male | 58.77
Gender, Female | 41.33
In-hospital Death, Yes | 21.0
In-hospital Death, No | 79.0
The XGBoost ML algorithm predicted mortality in all ICU patients as well as mechanically ventilated and pneumonia patients more accurately than qSOFA, MEWS and CURB-65 at all prediction windows (Table 7, Table 8 and Supplementary Table S5). When trained and tested on the MIMIC dataset, the XGBoost predictor obtained AUROCs of 0.82, 0.81, 0.77, and 0.75 for mortality prediction on mechanically ventilated patients at 12-, 24-, 48-, and 72-hour windows, respectively, and AUROCs of 0.87, 0.78, 0.77, and 0.73 for mortality prediction on pneumonia patients at 12-, 24-, 48-, and 72-hour windows, respectively (Fig. 1). Feature importance statistics are listed in Table S1, Table S2, Table S3, Table S4.
Table 7

Comparison of AUROC, average precision (APR), sensitivity, specificity, F1, diagnostic odds ratio (DOR), positive and negative likelihood ratios (LR+ and LR‒), accuracy and recall obtained by the machine learning algorithm (MLA) and the qSOFA, MEWS and CURB-65 scores for mortality prediction at 12-, 24-, 48-, and 72-hour windows on pneumonia patients using the MIMIC dataset. Standard deviations are listed in parentheses. For metrics other than AUROC and APR, the operating point was set near a sensitivity of 0.800.

Window | Metric | MLA: Pneumonia | qSOFA: Pneumonia | MEWS: Pneumonia | CURB-65: Pneumonia
12 h | AUROC | 0.865 (0.0027) | 0.719 | 0.792 | 0.595
 | APR | 0.594 (0.0075) | 0.242 | 0.405 | 0.154
 | Sensitivity | 0.800 (0.0000) | 0.933 | 0.884 | 0.973
 | Specificity | 0.761 (0.0101) | 0.304 | 0.472 | 0.169
 | F1 | 0.467 (0.0094) | 0.280 | 0.323 | 0.255
 | DOR | 12.74 (0.715) | 6.126 | 6.831 | 7.397
 | LR+ | 3.35 (0.143) | 1.342 | 1.674 | 1.171
 | LR- | 3.35 (0.143) | 1.342 | 1.674 | 1.171
 | Accuracy | 0.766 (0.0088) | 0.385 | 0.524 | 0.272
 | Recall | 0.804 (0.0000) | 0.933 | 0.884 | 0.973
24 h | AUROC | 0.783 (0.0017) | 0.721 | 0.779 | 0.612
 | APR | 0.442 (0.0070) | 0.267 | 0.346 | 0.159
 | Sensitivity | 0.802 (0.0000) | 0.932 | 0.906 | 0.974
 | Specificity | 0.594 (0.0324) | 0.285 | 0.439 | 0.142
 | F1 | 0.349 (0.0160) | 0.271 | 0.313 | 0.247
 | DOR | 5.99 (0.727) | 5.484 | 7.552 | 6.211
 | LR+ | 1.99 (0.144) | 1.304 | 1.614 | 1.136
 | LR- | 1.99 (0.144) | 1.304 | 1.614 | 1.136
 | Accuracy | 0.621 (0.0283) | 0.367 | 0.498 | 0.248
 | Recall | 0.802 (0.0000) | 0.932 | 0.906 | 0.974
48 h | AUROC | 0.769 (0.0074) | 0.681 | 0.747 | 0.606
 | APR | 0.407 (0.0099) | 0.264 | 0.334 | 0.178
 | Sensitivity | 0.803 (0.0000) | 0.917 | 0.866 | 0.975
 | Specificity | 0.580 (0.0308) | 0.238 | 0.394 | 0.122
 | F1 | 0.374 (0.0158) | 0.284 | 0.317 | 0.271
 | DOR | 5.67 (0.701) | 3.454 | 4.211 | 5.318
 | LR+ | 1.92 (0.138) | 1.203 | 1.429 | 1.110
 | LR- | 1.92 (0.138) | 1.203 | 1.429 | 1.110
 | Accuracy | 0.612 (0.0264) | 0.335 | 0.462 | 0.245
 | Recall | 0.803 (0.0000) | 0.917 | 0.866 | 0.975
72 h | AUROC | 0.726 (0.0047) | 0.645 | 0.668 | 0.592
 | APR | 0.333 (0.0168) | 0.227 | 0.275 | 0.185
 | Sensitivity | 0.801 (0.0030) | 0.933 | 0.867 | 0.970
 | Specificity | 0.507 (0.0137) | 0.202 | 0.333 | 0.098
 | F1 | 0.357 (0.0070) | 0.296 | 0.315 | 0.281
 | DOR | 4.16 (0.307) | 3.542 | 3.250 | 3.541
 | LR+ | 1.63 (0.052) | 1.169 | 1.300 | 1.075
 | LR- | 1.63 (0.052) | 1.169 | 1.300 | 1.075
 | Accuracy | 0.553 (0.0120) | 0.315 | 0.416 | 0.233
 | Recall | 0.807 (0.0000) | 0.933 | 0.867 | 0.970
Table 8

Comparison of AUROC, average precision (APR), sensitivity, specificity, F1, diagnostic odds ratio (DOR), positive and negative likelihood ratios (LR+ and LR‒), accuracy and recall obtained by the machine learning algorithm (MLA) and the qSOFA, MEWS and CURB-65 scores for mortality prediction at 12-, 24-, 48-, and 72-hour windows on mechanically ventilated patients using the MIMIC dataset. Standard deviations are listed in parentheses. For metrics other than AUROC and APR, the operating point was set near a sensitivity of 0.800.

Window | Metric | MLA: Mechanically Ventilated | qSOFA: Mechanically Ventilated | MEWS: Mechanically Ventilated | CURB-65: Mechanically Ventilated
12 h | AUROC | 0.815 (0.0030) | 0.731 | 0.808 | 0.620
 | APR | 0.598 (0.0055) | 0.276 | 0.417 | 0.173
 | Sensitivity | 0.803 (0.0016) | 0.969 | 0.845 | 0.988
 | Specificity | 0.647 (0.0241) | 0.232 | 0.630 | 0.098
 | F1 | 0.394 (0.0159) | 0.280 | 0.400 | 0.253
 | DOR | 7.54 (0.902) | 9.414 | 9.287 | 9.237
 | LR+ | 2.29 (0.167) | 1.261 | 2.285 | 1.096
 | LR- | 2.29 (0.167) | 1.261 | 2.285 | 1.096
 | Accuracy | 0.668 (0.0211) | 0.331 | 0.659 | 0.218
 | Recall | 0.802 (0.0000) | 0.969 | 0.845 | 0.988
24 h | AUROC | 0.806 (0.0030) | 0.729 | 0.789 | 0.626
 | APR | 0.506 (0.0098) | 0.274 | 0.357 | 0.179
 | Sensitivity | 0.803 (0.0017) | 0.970 | 0.810 | 0.987
 | Specificity | 0.634 (0.0101) | 0.244 | 0.611 | 0.109
 | F1 | 0.392 (0.0060) | 0.289 | 0.381 | 0.260
 | DOR | 7.06 (0.299) | 10.384 | 6.714 | 9.343
 | LR+ | 2.20 (0.060) | 1.283 | 2.084 | 1.108
 | LR- | 2.20 (0.060) | 1.283 | 2.084 | 1.108
 | Accuracy | 0.658 (0.0087) | 0.344 | 0.638 | 0.230
 | Recall | 0.802 (0.0000) | 0.970 | 0.810 | 0.987
48 h | AUROC | 0.768 (0.0034) | 0.715 | 0.753 | 0.611
 | APR | 0.488 (0.0048) | 0.312 | 0.357 | 0.209
 | Sensitivity | 0.804 (0.0019) | 0.977 | 0.826 | 0.977
 | Specificity | 0.553 (0.0091) | 0.182 | 0.546 | 0.085
 | F1 | 0.398 (0.0046) | 0.322 | 0.403 | 0.298
 | DOR | 5.08 (0.198) | 9.266 | 5.710 | 3.840
 | LR+ | 1.80 (0.038) | 1.194 | 1.818 | 1.067
 | LR- | 1.80 (0.038) | 1.194 | 1.818 | 1.067
 | Accuracy | 0.595 (0.0076) | 0.315 | 0.592 | 0.233
 | Recall | 0.803 (0.0000) | 0.977 | 0.826 | 0.977
72 h | AUROC | 0.749 (0.0053) | 0.657 | 0.665 | 0.601
 | APR | 0.406 (0.0115) | 0.261 | 0.278 | 0.213
 | Sensitivity | 0.805 (0.0000) | 0.977 | 0.943 | 0.983
 | Specificity | 0.558 (0.0135) | 0.153 | 0.261 | 0.070
 | F1 | 0.413 (0.0068) | 0.327 | 0.347 | 0.308
 | DOR | 5.22 (0.288) | 7.683 | 5.799 | 4.322
 | LR+ | 1.82 (0.056) | 1.154 | 1.276 | 1.057
 | LR- | 1.82 (0.056) | 1.154 | 1.276 | 1.057
 | Accuracy | 0.601 (0.0111) | 0.297 | 0.380 | 0.230
 | Recall | 0.805 (0.0000) | 0.977 | 0.943 | 0.983
Table S5

Comparison of AUROC, average precision (APR), sensitivity, specificity, F1, diagnostic odds ratio (DOR), positive and negative likelihood ratios (LR+ and LR‒), accuracy and recall obtained by the machine learning algorithm (MLA) and the qSOFA, MEWS and CURB-65 scores for mortality prediction at 12-, 24-, 48-, and 72-hour windows on all ICU patients using the MIMIC dataset. Standard deviations are listed in parentheses. For metrics other than AUROC and APR, the operating point was set near a sensitivity of 0.800.

Window | Metric | MLA: All ICU | qSOFA: All ICU | MEWS: All ICU | CURB-65: All ICU
12 h | AUROC | 0.862 (0.0012) | 0.760 | 0.833 | 0.652
 | APR | 0.553 (0.0018) | 0.225 | 0.392 | 0.131
 | Sensitivity | 0.801 (0.0000) | 0.949 | 0.897 | 0.984
 | Specificity | 0.750 (0.0046) | 0.373 | 0.559 | 0.185
 | F1 | 0.378 (0.0040) | 0.236 | 0.290 | 0.198
 | DOR | 12.09 (0.297) | 11.085 | 11.054 | 14.088
 | LR+ | 3.21 (0.059) | 1.513 | 2.033 | 1.208
 | LR- | 3.21 (0.059) | 1.513 | 2.033 | 1.208
 | Accuracy | 0.755 (0.0041) | 0.426 | 0.590 | 0.260
 | Recall | 0.801 (0.0000) | 0.949 | 0.897 | 0.984
24 h | AUROC | 0.819 (0.0018) | 0.742 | 0.804 | 0.636
 | APR | 0.432 (0.0052) | 0.223 | 0.339 | 0.136
 | Sensitivity | 0.800 (0.0000) | 0.939 | 0.896 | 0.978
 | Specificity | 0.671 (0.0036) | 0.357 | 0.524 | 0.178
 | F1 | 0.338 (0.0023) | 0.245 | 0.292 | 0.210
 | DOR | 8.18 (0.131) | 8.629 | 9.473 | 9.692
 | LR+ | 2.43 (0.026) | 1.462 | 1.882 | 1.189
 | LR- | 2.43 (0.026) | 1.462 | 1.882 | 1.189
 | Accuracy | 0.684 (0.0032) | 0.416 | 0.562 | 0.258
 | Recall | 0.800 (0.0000) | 0.939 | 0.896 | 0.978
48 h | AUROC | 0.789 (0.0016) | 0.706 | 0.760 | 0.616
 | APR | 0.408 (0.0024) | 0.232 | 0.297 | 0.158
 | Sensitivity | 0.801 (0.0000) | 0.945 | 0.882 | 0.977
 | Specificity | 0.619 (0.0110) | 0.282 | 0.437 | 0.137
 | F1 | 0.356 (0.0063) | 0.269 | 0.301 | 0.242
 | DOR | 6.55 (0.315) | 6.781 | 5.771 | 6.791
 | LR+ | 2.10 (0.063) | 1.316 | 1.565 | 1.132
 | LR- | 2.10 (0.063) | 1.316 | 1.565 | 1.132
 | Accuracy | 0.641 (0.0097) | 0.364 | 0.492 | 0.241
 | Recall | 0.801 (0.0000) | 0.945 | 0.882 | 0.977
72 h | AUROC | 0.746 (0.0026) | 0.655 | 0.685 | 0.603
 | APR | 0.356 (0.0017) | 0.218 | 0.256 | 0.182
 | Sensitivity | 0.802 (0.0015) | 0.929 | 0.872 | 0.963
 | Specificity | 0.546 (0.0049) | 0.247 | 0.388 | 0.111
 | F1 | 0.361 (0.0025) | 0.295 | 0.321 | 0.270
 | DOR | 4.86 (0.114) | 4.299 | 4.296 | 3.216
 | LR+ | 1.77 (0.020) | 1.233 | 1.423 | 1.083
 | LR- | 1.77 (0.020) | 1.233 | 1.423 | 1.083
 | Accuracy | 0.583 (0.0043) | 0.347 | 0.459 | 0.236
 | Recall | 0.801 (0.0000) | 0.929 | 0.872 | 0.963
Fig. 1

Comparison of area under the receiver operating characteristic (AUROC) curves for XGBoost models. AUROCs for the boosted tree predictor are presented for 12-, 24-, 48-, and 72-h mortality prediction with training and testing performed on MIMIC data from (A) all ICU patients as well as subpopulations of (B) mechanically ventilated (vented) ICU patients and (C) pneumonia ICU patients.

Table S1

Feature importance for the 12 h lookahead using the MIMIC dataset.

Rank | Feature | Importance (f_score)
1 | SpO2_-2 | 81
2 | HR | 77
3 | SysABP_diff_-2 | 76
4 | DiasABP_-0 | 68
5 | SysABP_-0 | 68
6 | WBC_-1 | 66
7 | HR_-2 | 64
8 | Platelets_-1 | 64
9 | RespRate_diff_-1 | 63
10 | SpO2_diff_-1 | 62
Table S2

Feature importance for the 24 h lookahead using the MIMIC dataset.

Rank | Feature | Importance (f_score)
1 | SpO2_diff_-1 | 92
2 | DiasABP_-1 | 69
3 | Platelets_-1 | 68
4 | DiasABP_diff_-1 | 64
5 | Temp_-1 | 61
6 | RespRate_-2 | 60
7 | DiasABP_diff_-2 | 60
8 | WBC_-1 | 60
9 | HR_diff_-1 | 57
10 | RespRate_diff_-1 | 56
Table S3

Feature importance for the 48 h lookahead using the MIMIC dataset.

Rank | Feature | Importance (f_score)
1 | WBC_-2 | 48
2 | DiasABP_diff_-2 | 44
3 | HR_-2 | 43
4 | DiasABP_-1 | 41
5 | RespRate_-1 | 41
6 | Temp_diff_-2 | 38
7 | Temp_diff_-1 | 36
8 | Temp_-0 | 35
9 | SpO2_diff_-2 | 35
10 | WBC_-1 | 34
Table S4

Feature importance for the 72 h lookahead using the MIMIC dataset.

Rank | Feature | Importance (f_score)
1 | SysABP_diff_-1 | 61
2 | Platelets_-0 | 46
3 | SpO2_-0 | 46
4 | DiasABP_-2 | 45
5 | Lactate_-0 | 45
6 | RespRate_-0 | 45
7 | SysABP_-1 | 44
8 | Creatinine_-1 | 42
9 | Temp_-2 | 42
10 | RespRate_diff_-1 | 37
Detailed performance metrics for the XGBoost predictor on pneumonia and mechanically ventilated patients are presented in Table 7, Table 8 and on COVID-19 patients in Table 9. Predictor training and testing for Table 7, Table 8 was performed on the MIMIC dataset, and for Table 9 on the community hospital dataset. The diagnostic odds ratio (DOR) is a measure for comparing diagnostic accuracy between tools and is calculated as (True Positive/False Negative)/(False Positive/True Negative).
DOR represents the ratio of the odds of a true positive prediction of mortality in patients who died within a certain prediction window to the odds of a false positive prediction of mortality in patients who did not die within a certain prediction window. For all prediction windows, the XGBoost predictor had a higher DOR than qSOFA.
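The DOR and the likelihood ratios reported in the tables are related through standard formulas (DOR = LR+/LR‒). A small sketch with hypothetical confusion-matrix counts, ours rather than the study's code:

```python
def rates(tp, fn, fp, tn):
    """Sensitivity and specificity from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

def likelihood_ratios(sens, spec):
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec

def diagnostic_odds_ratio(tp, fn, fp, tn):
    """DOR = (TP/FN) / (FP/TN), equivalently LR+ / LR-."""
    return (tp / fn) / (fp / tn)

# Hypothetical counts: 80 true positives, 20 false negatives,
# 30 false positives, 70 true negatives.
sens, spec = rates(80, 20, 30, 70)           # 0.8, 0.7
lr_pos, lr_neg = likelihood_ratios(sens, spec)
dor = diagnostic_odds_ratio(80, 20, 30, 70)  # equals lr_pos / lr_neg
```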
Table 9

Comparison of AUROC, average precision (APR), sensitivity, specificity, F1, diagnostic odds ratio (DOR), positive and negative likelihood ratios (LR+ and LR‒), accuracy and recall obtained by the ML algorithm (MLA) and the qSOFA, MEWS and CURB-65 scores for mortality prediction at 12-, 24-, 48-, and 72-hour windows on 114 COVID-19 PCR positive patients from the community hospital dataset. Standard deviations are listed in parentheses. For metrics other than AUROC and APR, the operating point was set near a sensitivity of 0.800. n/a: not applicable.

Window | Metric | MLA | qSOFA | MEWS | CURB-65
12 h | AUROC | 0.910 (0.0024) | 0.791 | 0.769 | 0.780
 | APR | 0.795 (0.0054) | 0.510 | 0.514 | 0.369
 | Sensitivity | 0.826 (0.0000) | 1.000 | 1.000 | 1.000
 | Specificity | 0.804 (0.0239) | 0.000 | 0.022 | 0.500
 | F1 | 0.638 (0.0228) | 0.338 | 0.343 | 0.505
 | DOR | 19.89 (2.912) | n/a | n/a | n/a
 | LR+ | 4.29 (0.506) | 1.000 | 1.023 | 2.000
 | LR- | 4.29 (0.506) | 1.000 | 1.023 | 2.000
 | Accuracy | 0.809 (0.0191) | 0.204 | 0.221 | 0.602
 | Recall | 0.826 (0.0000) | 1.000 | 1.000 | 1.000
24 h | AUROC | 0.903 (0.0059) | 0.840 | 0.780 | 0.764
 | APR | 0.754 (0.0127) | 0.563 | 0.515 | 0.354
 | Sensitivity | 0.826 (0.0000) | 0.826 | 1.000 | 1.000
 | Specificity | 0.816 (0.0054) | 0.822 | 0.033 | 0.444
 | F1 | 0.649 (0.0054) | 0.655 | 0.346 | 0.479
 | DOR | 21.03 (0.770) | 21.969 | n/a | n/a
 | LR+ | 4.48 (0.134) | 4.647 | 1.034 | 1.800
 | LR- | 4.48 (0.134) | 4.647 | 1.034 | 1.800
 | Accuracy | 0.818 (0.0043) | 0.823 | 0.230 | 0.558
 | Recall | 0.826 (0.0000) | 0.826 | 1.000 | 1.000
48 h | AUROC | 0.862 (0.0088) | 0.792 | 0.724 | 0.802
 | APR | 0.684 (0.0156) | 0.478 | 0.444 | 0.384
 | Sensitivity | 0.818 (0.0000) | 1.000 | 0.955 | 1.000
 | Specificity | 0.773 (0.0334) | 0.000 | 0.022 | 0.522
 | F1 | 0.598 (0.0297) | 0.328 | 0.321 | 0.506
 | DOR | 15.80 (3.051) | n/a | 0.477 | n/a
 | LR+ | 3.69 (0.555) | 1.000 | 0.976 | 2.093
 | LR- | 3.69 (0.555) | 1.000 | 0.976 | 2.093
 | Accuracy | 0.782 (0.0268) | 0.196 | 0.205 | 0.616
 | Recall | 0.818 (0.0000) | 1.000 | 0.955 | 1.000
72 h | AUROC | 0.873 (0.0034) | 0.722 | 0.797 | 0.751
 | APR | 0.649 (0.0209) | 0.364 | 0.452 | 0.320
 | Sensitivity | 0.819 (0.0190) | 1.000 | 0.857 | 1.000
 | Specificity | 0.760 (0.0181) | 0.000 | 0.611 | 0.467
 | F1 | 0.576 (0.0171) | 0.318 | 0.486 | 0.467
 | DOR | 14.64 (2.376) | n/a | 9.429 | n/a
 | LR+ | 3.43 (0.261) | 1.000 | 2.204 | 1.875
 | LR- | 3.43 (0.261) | 1.000 | 2.204 | 1.875
 | Accuracy | 0.771 (0.0146) | 0.189 | 0.658 | 0.568
 | Recall | 0.810 (0.0000) | 1.000 | 0.857 | 1.000
These results suggest that the XGBoost predictor is capable of predicting mortality in pneumonia, mechanically ventilated, and COVID-19 patients and outperforms the qSOFA, MEWS and CURB-65 mortality risk scores.

Discussion

Accurate mortality prediction can assist with the allocation of limited hospital resources and optimize patient management. Additionally, advance mortality prediction can facilitate decision making with family and caregivers. The commonly used MEWS [37], APACHE [38], Simplified Acute Physiology Score (SAPS II) [39], Sepsis-Related Organ Failure Assessment (SOFA) [40], and quick SOFA (qSOFA) [41] scores provide a rough estimate of mortality risk; however, the specificity and sensitivity of these tools are limited for COVID-19 and mechanically ventilated populations [42]. Machine learning (ML) has previously been broadly applied to predictive tasks within the biosciences [[43], [44], [45], [46]]. ML-based tools for mortality prediction have been applied to sepsis [47,48], cardiac arrest [49], coronary artery disease [50], and extubation [51] patient populations, and have been implemented in a broad range of clinical settings, including the emergency department (ED) [48] and the intensive care unit (ICU) [[52], [53], [54], [55]]. Studies of mortality prediction in pneumonia and mechanically ventilated patients are particularly relevant for COVID-19 related lung complications. We have demonstrated that machine learning algorithms are useful predictive tools for anticipating patient mortality at clinically useful windows of 12, 24, 48, and 72 h in advance and have validated mortality prediction accuracy for COVID-19, pneumonia, mechanically ventilated, and all ICU patients (Fig. 1), demonstrating that for all prediction types and windows, our ML algorithm outperforms the qSOFA, MEWS and CURB-65 severity scores (Table 7, Table 8, Table 9). A meta-analysis of studies focusing on predicting mortality in pneumonia patients showed that of the three commonly used prognostic scores which predicted mortality, the Pneumonia Severity Index (PSI) had the highest AUROC of 0.81.
However, this index was designed to predict 30-day mortality specifically among patients with community-acquired pneumonia [56]. When trained and tested on the MIMIC dataset, the XGBoost predictor obtained AUROCs of 0.87, 0.78, 0.77, and 0.73 for mortality prediction on pneumonia patients at 12-, 24-, 48-, and 72-hour windows, respectively (Fig. 1, Table 7). When trained and tested on the community hospital dataset, the XGBoost predictor obtained AUROCs of 0.91, 0.90, 0.86, and 0.87 for mortality prediction on COVID-19 PCR-positive patients at 12-, 24-, 48-, and 72-hour windows, respectively (Table 9). The algorithm outperformed the qSOFA, MEWS and CURB-65 risk scores at all prediction windows (Table 9).

This ML algorithm can automatically monitor patient populations without incurring additional data entry or impeding clinical workflow, and patient alerts can be set to the sensitivity and specificity thresholds appropriate to different care settings. As a clinical decision support tool, the algorithm presented in this study may assist clinicians in navigating the complexities surrounding COVID-19-related resource allocation. During a pandemic, accurate triage of patients is essential for improving patient outcomes, effectively utilizing clinical care teams, and efficiently allocating resources. When implemented in clinical ICU settings, the algorithm could help healthcare providers identify patients at risk of significant COVID-19-related decompensation before they deteriorate, facilitating effective resource allocation and identifying the patients most likely to benefit from increased care.

There are several limitations to our study. The ML algorithm developed on the MIMIC dataset used only data from the ICU; further research is therefore required to evaluate the algorithm's performance in other patient care settings.
Further, because the algorithm used only laboratory data and vital signs as inputs, it did not account for actions taken by the care team. Such actions could signify aggressive treatment or withdrawal of treatment and could change the algorithm's inputs, potentially altering its prediction score. On one hand, incorporating care team actions into the algorithm's inputs could provide useful feedback to the care team, helping to determine whether a given intervention was harmful or beneficial. On the other hand, accounting for care team actions may complicate the interpretation of what it means to “anticipate” mortality, given that the current state of knowledge of the care team is unknown. Finally, because this is a retrospective study, we cannot determine the performance of the mortality prediction algorithm in a prospective clinical setting. Prospective validation is required to determine how clinicians respond to risk predictions and whether those predictions can affect patient outcomes or resource allocation.
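The threshold tuning described above, setting alerts to a desired sensitivity for a given care setting, can be sketched in a few lines. This is an illustrative stdlib-only sketch, not the deployed system; `pick_alert_threshold` and the toy scores are hypothetical:

```python
def pick_alert_threshold(scores, labels, target_sensitivity=0.80):
    """Return the highest score threshold whose sensitivity on held-out
    data meets the target. Lower thresholds alert more often (higher
    sensitivity, lower specificity)."""
    n_pos = sum(labels)
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        if tp / n_pos >= target_sensitivity:
            return thr
    return min(scores)

# Toy held-out risk scores and outcomes (1 = in-hospital death):
scores = [0.91, 0.84, 0.72, 0.61, 0.40, 0.22]
labels = [1, 1, 0, 1, 0, 0]
print(pick_alert_threshold(scores, labels))  # alert when score >= threshold
```

In practice the threshold would be chosen on a validation split per care setting, trading alert burden (specificity) against the fraction of deteriorating patients flagged (sensitivity).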

Conclusion

The ML algorithm presented in this study is a useful predictive tool for anticipating patient mortality at clinically useful windows up to 72 h in advance, and is capable of accurate mortality prediction for COVID-19, pneumonia, and mechanically ventilated patients.

Patient and public involvement statement

Patients and the public were not involved in the design and conduct of the study, choice of outcome measures, or recruitment to the study due to the nature of data collection.

Dissemination declaration

Transparency declaration: RD affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned (and, if relevant, registered) have been explained.

Funding sources

N/A.

Ethical approval

The data have been deidentified and, as such, this study does not constitute human subjects research.

Conflicts of interest

All authors who have affiliations listed with Dascena (San Francisco, California, USA) are employees or contractors of Dascena.

Trial registration

This study has been registered on ClinicalTrials.gov under study number NCT04358510.

Provenance and peer review

Not commissioned, externally peer reviewed.

Guarantor

The guarantor is the person or persons who accept full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.

CRediT authorship contribution statement

Logan Ryan: Methodology, Investigation. Carson Lam: Conceptualization, Methodology, Formal analysis, Writing - original draft, Supervision. Samson Mataraso: Conceptualization, Methodology, Formal analysis, Writing - original draft, Supervision. Angier Allen: Investigation, Writing - original draft. Abigail Green-Saxena: Writing - original draft. Emily Pellegrini: Writing - original draft. Jana Hoffman: Conceptualization, Methodology, Formal analysis, Writing - original draft, Supervision. Christopher Barton: Supervision. Ritankar Das: Conceptualization, Methodology, Formal analysis, Writing - original draft, Supervision; had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
