Development and Prospective Validation of a Deep Learning Algorithm for Predicting Need for Mechanical Ventilation.

Supreeth P Shashikumar¹, Gabriel Wardi², Paulina Paul¹, Morgan Carlile³, Laura N Brenner⁴, Kathryn A Hibbert⁴, Crystal M North⁴, Shibani S Mukerji⁵, Gregory K Robbins⁶, Yu-Ping Shao⁵, M Brandon Westover⁵, Shamim Nemati¹, Atul Malhotra⁷.

Abstract

BACKGROUND: Early, objective identification of hospitalized patients who may require mechanical ventilation (MV), particularly those with novel coronavirus disease 2019 (COVID-19), may aid in delivering timely treatment. RESEARCH QUESTION: Can a transparent deep learning (DL) model predict the need for MV in hospitalized patients and those with COVID-19 up to 24 h in advance? STUDY DESIGN AND METHODS: We trained and externally validated a transparent DL algorithm to predict the future need for MV in hospitalized patients, including those with COVID-19, using commonly available data in electronic health records. Additionally, commonly used clinical criteria (heart rate, oxygen saturation, respiratory rate, Fio2, and pH) were used to assess future need for MV. Performance of the algorithm was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value.
RESULTS: We obtained data from more than 30,000 ICU patients (including more than 700 patients with COVID-19) from two academic medical centers. The performance of the model with a 24-h prediction horizon at the development and validation sites was comparable (AUC, 0.895 vs 0.882, respectively), providing significant improvement over traditional clinical criteria (P < .001). Prospective validation of the algorithm among patients with COVID-19 yielded AUCs in the range of 0.918 to 0.943.
INTERPRETATION: A transparent DL algorithm improves on traditional clinical criteria to predict the need for MV in hospitalized patients, including in those with COVID-19. Such an algorithm may help clinicians to optimize timing of tracheal intubation, to allocate resources and staff better, and to improve patient care.

Keywords: artificial intelligence; artificial respiration; coronavirus; deep learning; lung

Year: 2020 PMID: 33345948 PMCID: PMC8027289 DOI: 10.1016/j.chest.2020.12.009

Source DB: PubMed Journal: Chest ISSN: 0012-3692 Impact factor: 9.410

The novel coronavirus pandemic, caused by severe acute respiratory syndrome coronavirus 2, has strained global health care systems and the supply of mechanical ventilators, because 3% to 79% of hospitalized patients have been reported to require invasive mechanical ventilation (MV).3, 4, 5, 6, 7 There is major concern that the supply of mechanical ventilators may be insufficient in certain regions. Appropriate triage and identification of patients at high risk for respiratory failure may help hospital systems to allocate resources better and to triage patients into treatment cohorts. Additionally, identifying patients who may need intubation allows health care providers to prepare for endotracheal intubation (eg, by moving the patient to a negative pressure room), thereby avoiding an emergent procedure that is inherently high risk and aerosol generating.11, 12, 13, 14 Because of fears of contamination, many providers chose to intubate early, on the assumption that patients eventually would need MV, so as to avoid crash intubation. Others have called for more judicious use of MV and for avoiding high positive end-expiratory pressure (PEEP) in poorly recruitable lungs, where it tends to cause severe hemodynamic impairment and fluid retention. Both patient self-inflicted lung injury and ventilator-associated lung injury could exacerbate lung inflammation and biotrauma. As such, objective and consistent methods are needed to determine who should be intubated and when, how to optimize treatment parameters, and when to extubate patients safely, so as to lower long-term complications and mortality in this very sick patient population. Machine learning (ML) is a subset of artificial intelligence that automates analytical model building, identifying patterns in data to predict outcomes.
In particular, ML algorithms are powerful tools for detecting complex, nonlinear outcomes when traditional statistical methods (eg, linear regression or decision trees) are overwhelmed by a large number of variables. Deep learning (DL) models, a branch of ML, use multiple layers of processing (artificial neural networks) that can capture nonlinearity and complex interactions among clinical variables. In prior studies, DL-based algorithms have been shown to improve diagnostic accuracy and to predict outcomes across a variety of clinical scenarios.19, 20, 21, 22, 23, 24, 25, 26 Such algorithms can interpret, and make useful predictions from, the large and dynamic data available in the electronic health record (EHR). Recently, we showed that ML algorithms outperform traditional metrics in the prediction of sepsis. Current scoring systems that predict respiratory failure and need for MV are limited by small sample sizes and have low predictive power. Frontline providers have called for the urgent development of new warning systems to identify patients in whom conservative management is likely to fail and who will require MV. No reliable models exist to predict the need for MV in patients with coronavirus disease 2019 (COVID-19). We therefore sought to determine whether dynamic EHR data at hourly resolution would provide value over traditional methods such as the ratio of pulse oximetry/Fio2 to respiratory rate (ROX) index or simple regression-based risk scores. In this study, we trained and prospectively validated a DL algorithm that predicts the need for invasive MV in hospitalized patients, including those with known or suspected COVID-19, up to 24 h in advance of tracheal intubation.
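For reference, the ROX index used as a comparator in this study is conventionally computed as SpO2/Fio2 divided by respiratory rate. The minimal sketch below assumes SpO2 in percent, Fio2 as a fraction and respiratory rate in breaths per minute. The function name `rox_index` and its input checks are illustrative choices, not part of the study's code.

```python
def rox_index(spo2_percent: float, fio2_fraction: float, resp_rate: float) -> float:
    """ROX = (SpO2 / FiO2) / respiratory rate.

    Assumes SpO2 in percent (e.g., 92), FiO2 as a fraction (0.21-1.0),
    and respiratory rate in breaths per minute.
    """
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    if resp_rate <= 0:
        raise ValueError("respiratory rate must be positive")
    return (spo2_percent / fio2_fraction) / resp_rate


# A patient with SpO2 92% on FiO2 0.5 breathing 30 times per minute:
print(round(rox_index(92, 0.5, 30), 2))  # 6.13
```

Lower values indicate worse oxygenation relative to breathing effort. Because the index is a single snapshot, it cannot use the trends that an hourly sequential model can draw on.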

Methods

Development and reporting of the prediction model presented in this study were in accordance with the checklist provided by the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) Consortium.

Patient Population and Outcomes

This observational, multicenter cohort study considered all adult patients (≥ 18 years of age) admitted to the ICU between January 1, 2016, and January 15, 2020, at two large urban academic health centers: University of California San Diego Health (UCSD) and Massachusetts General Hospital (MGH). Throughout the article, we refer to these hospital systems as the development and validation sites, respectively. Additionally, both datasets included prospectively collected temporal validation cohorts of patients with known or suspected COVID-19 between February 1 and May 4, 2020. Because ICU care expanded to nontraditional floors, the MGH cohort included all hospitalized patients with COVID-19, regardless of explicit indication of ICU level of care. Patients were excluded if (1) their length of stay was less than 4 h or more than 20 days, (2) invasive MV started before hour 4 of ICU admission (or of hospitalization for the MGH COVID-19 cohort), or (3) they received noninvasive MV. Institutional review board approval of the study was obtained at both sites with a waiver of informed consent (UCSD Identifier: 191098; MGH Identifier: 2013P001024). Data from both sites were abstracted into a clinical data repository (Epic Clarity; Epic Systems) and included vital signs, laboratory values, PEEP, Sequential Organ Failure Assessment scores, Charlson comorbidity index scores, demographics, and length of stay. Data input into VentNet were available to the treating clinician at the time of entry into the EHR. Specific inputs to the model were prespecified and included 40 clinical variables (34 dynamic and six demographic variables), selected based on their availability in the EHRs of both hospitals, similar to our previous work in the PhysioNet 2019 Challenge.
These included vital signs (heart rate, pulse oximetry, temperature, systolic BP, mean arterial pressure, diastolic BP, respiratory rate, and end-tidal CO2), laboratory measurements (bicarbonate, base excess, Fio2, pH, Pco2, Po2, aspartate transaminase, BUN, alkaline phosphatase, calcium, chloride, creatinine, direct bilirubin, serum glucose, lactic acid, magnesium, phosphate, potassium, total bilirubin, troponin, hematocrit, hemoglobin, partial thromboplastin time, leukocyte count, fibrinogen, and platelets), and demographic variables (e-Table 1). Additionally, for every vital sign and laboratory variable, the slope of change since its last measurement was included as an additional variable. All variables were organized into nonoverlapping 1-h time bins to accommodate the different sampling frequencies of the available data. Variables sampled more often than once per hour were resampled into these bins by taking the median of the measurements within each bin. Variables were updated hourly when new data became available; otherwise, the most recent value was carried forward (sample-and-hold interpolation). Mean imputation was used to replace all remaining missing values (mainly at the start of each record). To assist in model training, features in the development site training set first underwent normality transformations and then were standardized by subtracting the mean and dividing by the SD. All other datasets were normalized using the mean and SD computed from the development site training set. Use of MV was defined as the first simultaneous recording of Fio2 and PEEP. For prediction purposes, the outcome of interest was continuous MV for at least 24 h or MV followed by death.
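The hourly preprocessing steps described above can be sketched for a single variable as follows. This is a minimal, hypothetical illustration in plain Python, and the function names are invented. It makes three assumptions:

- The unspecified normality transformation is omitted.
- The slope feature is the change per hour between the two most recent observed bins, held until the next observation.
- The training-set statistics (`train_mean`, `train_sd`) have already been computed on the development site training set.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]


def hourly_median_bins(measurements: Sequence[Tuple[float, float]], n_hours: int) -> Series:
    """Median of the (time_in_hours, value) measurements falling in each 1-h bin [h, h + 1)."""
    bins: List[List[float]] = [[] for _ in range(n_hours)]
    for time_h, value in measurements:
        h = math.floor(time_h)
        if 0 <= h < n_hours:
            bins[h].append(value)
    return [statistics.median(b) if b else None for b in bins]


def slope_since_last(binned: Series) -> Series:
    """Change per hour between the two most recent observed bins, held between observations."""
    out: Series = []
    last_h: Optional[int] = None
    last_v: Optional[float] = None
    slope: Optional[float] = None
    for h, v in enumerate(binned):
        if v is not None:
            if last_h is not None and last_v is not None:
                slope = (v - last_v) / (h - last_h)
            last_h, last_v = h, v
        out.append(slope)
    return out


def sample_and_hold(binned: Series) -> Series:
    """Carry the most recent observed value forward until a new one arrives."""
    out: Series = []
    last: Optional[float] = None
    for v in binned:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_and_standardize(series: Series, train_mean: float, train_sd: float) -> List[float]:
    """Fill remaining gaps with the training mean, then z-score with training-set statistics."""
    return [((train_mean if v is None else v) - train_mean) / train_sd for v in series]


heart_rate = [(0.2, 80.0), (0.7, 90.0), (2.5, 100.0)]  # (hours since admission, beats/min)
binned = hourly_median_bins(heart_rate, n_hours=4)      # [85.0, None, 100.0, None]
held = sample_and_hold(binned)                          # [85.0, 85.0, 100.0, 100.0]
delta = slope_since_last(binned)                        # [None, None, 7.5, 7.5]
print(impute_and_standardize(held, train_mean=85.0, train_sd=10.0))  # [0.0, 0.0, 1.5, 1.5]
```

Mean imputation followed by standardization maps every imputed value to 0. Leading gaps therefore sit at the training-set average rather than at an arbitrary sentinel.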
Patients who were placed on a mechanical ventilator within 3 h of admission were excluded because our model makes its first prediction at hour 4 of ICU admission (or hospitalization in the case of the MGH COVID-19 cohort); this allowed for the collection and processing of laboratory samples required by the algorithm to make accurate predictions.
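The cohort rules and hourly outcome labels described above can be sketched as follows, assuming hourly-binned charting flags for Fio2 and PEEP. The `Encounter` record and its field names are hypothetical. The choice to treat ventilation episodes that do not meet the outcome definition as negatives is also an assumption, because the text does not say how such episodes were handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

FIRST_PREDICTION_HOUR = 4   # the model's first prediction is made at hour 4
MAX_LOS_HOURS = 20 * 24     # stays longer than 20 days are excluded


@dataclass
class Encounter:
    """Hypothetical per-encounter record; field names are illustrative only."""
    los_hours: float
    fio2_charted: List[bool]    # per hourly bin: was FiO2 recorded?
    peep_charted: List[bool]    # per hourly bin: was PEEP recorded?
    received_niv: bool          # any noninvasive ventilation
    mv_hours: float = 0.0       # duration of the first MV episode
    died_after_mv: bool = False


def mv_onset_hour(enc: Encounter) -> Optional[int]:
    """First hourly bin with simultaneous FiO2 and PEEP charting (the text's MV definition)."""
    for h, (has_fio2, has_peep) in enumerate(zip(enc.fio2_charted, enc.peep_charted)):
        if has_fio2 and has_peep:
            return h
    return None


def is_eligible(enc: Encounter) -> bool:
    """Exclude LOS < 4 h or > 20 d, MV before hour 4, or noninvasive ventilation."""
    if not FIRST_PREDICTION_HOUR <= enc.los_hours <= MAX_LOS_HOURS:
        return False
    onset = mv_onset_hour(enc)
    if onset is not None and onset < FIRST_PREDICTION_HOUR:
        return False
    return not enc.received_niv


def hourly_labels(enc: Encounter, horizon: int = 24) -> Dict[int, bool]:
    """Label prediction hour t positive if a qualifying MV onset falls in (t, t + horizon].

    Qualifying MV: >= 24 h of continuous MV, or MV followed by death.
    Predictions stop at MV onset, or at the end of the record if never ventilated.
    """
    onset = mv_onset_hour(enc)
    qualifies = onset is not None and (enc.mv_hours >= 24 or enc.died_after_mv)
    last = onset if onset is not None else len(enc.fio2_charted)
    return {t: qualifies and t < onset <= t + horizon
            for t in range(FIRST_PREDICTION_HOUR, last)}


vent = Encounter(los_hours=40, fio2_charted=[False] * 30 + [True] * 10,
                 peep_charted=[False] * 30 + [True] * 10, received_niv=False, mv_hours=48)
labels = hourly_labels(vent)
print(is_eligible(vent), mv_onset_hour(vent), labels[5], labels[6])  # True 30 False True
```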

Model Development and Statistical Analyses

VentNet (a two-layer feedforward neural network with layers of size 40 and 25) was trained to predict the onset of MV 24 h in advance. Predictions were made from hour 4 of admission until the onset of MV or the end of hospitalization. The predictions from VentNet were calibrated using isotonic regression. VentNet was implemented in Python version 2.7 (Python Software Foundation) using TensorFlow version 1.12.0 (Google Brain). The parameters of VentNet were initialized randomly and optimized on the development site training data using gradient descent with L1-L2 regularization to avoid overfitting. Model interpretability was achieved by calculating the relevance score of each input variable for every predicted risk score (e-Appendix 1). The output of VentNet was a probability score between 0 and 1. The decision threshold (0.03) was chosen to yield 80% sensitivity. A score above this threshold meant that the algorithm predicted tracheal intubation within the prediction window; a score below it meant that VentNet did not predict tracheal intubation within that window. Within the development cohort, 10-fold cross-validation (with an 80%-20% training-testing split within each fold) was used for training and testing. We report median and interquartile values of the area under the receiver operating characteristic curve (AUC) and of specificity at 80% sensitivity for the held-out testing sets within the development cohort (details on precision-recall curves are presented in e-Appendix 1). AUCs are reported under an end-user clinical response policy. Under this policy, the model is silenced for 6 h after an alarm fires, and correct alarms fired up to 72 h before the onset of MV are not penalized.
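The end-user response policy can be illustrated with a small alarm-bookkeeping sketch. It assumes hourly risk scores indexed by hour since admission and uses the reported threshold (0.03). It reads "silenced for 6 h" as suppressing the next six hourly predictions after an alarm. That reading, the function names and the example scores are assumptions. The sketch does not reproduce how the alarms were converted into AUC values.

```python
from typing import List, Optional, Tuple


def fire_alarms(scores: List[float], threshold: float = 0.03, silence_h: int = 6,
                first_hour: int = 4) -> List[int]:
    """Hours (indices into hourly `scores`) at which an alarm fires.

    After an alarm, the next `silence_h` hourly predictions are suppressed
    (an assumed reading of "silenced for 6 h").
    """
    alarms: List[int] = []
    next_allowed = first_hour
    for hour in range(first_hour, len(scores)):
        if hour >= next_allowed and scores[hour] > threshold:
            alarms.append(hour)
            next_allowed = hour + silence_h + 1
    return alarms


def split_alarms(alarms: List[int], onset: Optional[int],
                 max_lead_h: int = 72) -> Tuple[List[int], List[int]]:
    """Split alarms into (correct, false); correct means MV onset follows within max_lead_h hours."""
    correct = [a for a in alarms if onset is not None and a < onset <= a + max_lead_h]
    false_alarms = [a for a in alarms if a not in correct]
    return correct, false_alarms


hourly_scores = [0.0] * 4 + [0.01, 0.05, 0.06, 0.02, 0.04, 0.05, 0.01, 0.01, 0.08, 0.09]
alarms = fire_alarms(hourly_scores)
print(alarms)                          # [5, 12]
print(split_alarms(alarms, onset=80))  # ([12], [5])
```

Silencing keeps one sustained high-risk episode from generating a burst of alarms, which matters for alarm fatigue at the bedside. The 72-h allowance avoids counting an early but ultimately correct warning as a false positive.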
The best-performing model from the development site was then fixed and evaluated on the validation cohort and on the prospectively collected cohorts of patients with COVID-19. Receiver operating characteristic curves were compared using DeLong's method. All continuous variables are reported as medians with interquartile ranges (IQRs; 25th-75th percentiles). Binary variables are reported as percentages.
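DeLong's method compares two correlated AUCs computed on the same cases by estimating the covariance of their structural components. The quadratic-time sketch below is a textbook version for illustration, not the study's implementation. It treats each scored instance as independent. That is an assumption, because the study evaluated repeated hourly predictions per patient. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _kernel(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive scores higher, 0.5 for a tie, else 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _components(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 (one per positive) and V01 (one per negative)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    v10 = [sum(_kernel(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_kernel(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (AUC of model A, AUC of model B, two-sided p-value for their difference)."""
    v10_a, v01_a = _components(scores_a, labels)
    v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)  # numbers of positives and negatives
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0:
        raise ValueError("AUC difference has zero estimated variance")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, p_value


labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
model_b = [0.6, 0.4, 0.7, 0.3, 0.5, 0.8, 0.2, 0.1]
auc_a, auc_b, p = delong_test(model_a, model_b, labels)
print(auc_a, auc_b)  # 0.9375 0.625
```

The covariance term is what distinguishes this from comparing two independent AUCs. Scores from two models on the same patients are positively correlated, which shrinks the variance of their difference.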

Results

Patient Characteristics

After applying the exclusion criteria, a total of 18,528 and 3,888 ICU patients were included in the development and validation cohorts, respectively. Patient characteristics including the percentage of ventilated patients before and after application of exclusion criteria are presented in Table 1 and e-Table 2. Additionally, data from 26 COVID-19 patients from the development site (UCSD) and 402 patients from the validation site (MGH) were used for prospective validation (Table 2, e-Table 3).

Table 1

Demographic Comparisons of the UCSD and MGH General ICU Cohorts

Characteristic	UCSD (development site), nonventilated	UCSD (development site), ventilated	MGH (validation site), nonventilated	MGH (validation site), ventilated
Patients	17,723 (95.6)	805 (4.4)	3,602 (92.6)	286 (7.4)
Age, y	61.3 (48.3-72.6)	61.2 (48.6-71.2)	62 (51-72)	64 (53-74)
Male sex	10,421	521	1,948	173
Race
White	9,659	440	2,925	229
Black	1,330	60	191	19
Asian	1,081	43	119	8
ICU LOS, h	48.3 (26.7-95.9)	221.5 (113.8-386.9)	50.9 (27.2-98.0)	183.7 (92.2-309.9)
CCI	3 (2-7)	3 (1-6)	4 (2-6)	4 (2-6)
SOFA score	0.6 (0-1.8)	3.3 (1.9-5.1)	0.9 (0.3-2.1)	4.1 (2.5-6.3)
Inpatient mortality	869	329	223	109
Time from ICU admission to start of ventilation, h	N/A	20 (7.8-45)	N/A	13 (6-33)

Data are presented as No. (%), No., or median (interquartile range), unless otherwise indicated. CCI = Charlson comorbidity index; LOS = length of stay; MGH = Massachusetts General Hospital; N/A = not applicable; SOFA = Sequential Organ Failure Assessment; UCSD = University of California San Diego Health. Patients were excluded if (1) their LOS was less than 4 h or more than 20 d or (2) the start of mechanical ventilation was before hour 4 of ICU admission.

Table 2

Demographic Comparisons of the Prospective Validation Cohorts Consisting of COVID-19 Patients at UCSD and MGH

Characteristic	UCSD COVID-19, nonventilated	UCSD COVID-19, ventilated	MGH COVID-19, nonventilated	MGH COVID-19, ventilated
Patients	16 (61.5)	10 (38.5)	343 (85.3)	59 (14.7)
Age, y	57.6 (45.2-81.6)	52.8 (42.3-65.9)	65 (47-78)	61.5 (50-73)
Male sex	9	7	176	40
Race
White	7	< 5	207	30
Black	< 5	< 5	46	10
Asian	< 5	< 5	13	< 5
ICU LOS, h	51.4 (37.7-128.4)	368.7 (247.0-430.0)	131 (87.5-230)	258.5 (141-396)
CCI	4 (2.8-5.3)	2 (1-4.3)	3 (1-6)	3 (1-5)
SOFA	1.3 (0-2.1)	2.5 (0-5.4)	0.1 (0-0.7)	3.0 (1.6-4.7)
Inpatient mortality	< 5	< 5	24	14
Time from ICU admission to start of ventilation, h	N/A	23 (10-63)	N/A	49.5 (20.6-143)

Data are presented as No. (%), No., or median (interquartile range), unless otherwise indicated. Patients were excluded if (1) their LOS was less than 4 h or more than 20 d or (2) the start of mechanical ventilation was before hour 4 of ICU admission. CCI = Charlson comorbidity index; COVID-19 = coronavirus disease 2019; LOS = length of stay; MGH = Massachusetts General Hospital; N/A = not applicable; SOFA = Sequential Organ Failure Assessment; UCSD = University of California San Diego Health.

Model Performance on General ICU Populations

The median 10-fold cross-validated AUC on the held-out development site testing sets for a prediction horizon of 24 h was 0.886 (IQR, 0.878-0.892), and the specificity at the 80% sensitivity level was 0.824 (IQR, 0.818-0.838). AUC decreased as the prediction horizon increased from 6 h to 48 h (from 0.950 [IQR, 0.948-0.952] to 0.845 [IQR, 0.838-0.869], respectively) (e-Fig 1). Figure 1 compares VentNet with two baseline models: the ROX index and a logistic regression model (baseline model 1) based on commonly used clinical variables (heart rate, oxygen saturation, respiratory rate, and pH). VentNet significantly outperformed the ROX index and baseline model 1 (P < .001) on the development site testing set (AUC, 0.895 vs 0.738 and 0.769, respectively) (Fig 1A). VentNet performed comparably on the external validation cohort (AUC, 0.882 vs 0.782 and 0.773, respectively) (Fig 1B). See e-Figures 1A, 1B, 2A, and 2B for additional information, including precision-recall curves. The calibration plots of VentNet on the development site testing set and the external validation cohort are shown in e-Figure 5A, 5B.
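Reporting specificity at a fixed 80% sensitivity, as above, takes two steps. First, the threshold is chosen from the positive-class scores. Then specificity is measured among the negatives at that threshold. A minimal sketch with hypothetical function names and toy data follows. Here a case is flagged when its score is greater than or equal to the threshold. In the study, the threshold (0.03) was fixed on development data before being applied to the other cohorts.

```python
import math
from typing import Sequence, Tuple


def threshold_at_sensitivity(scores: Sequence[float], labels: Sequence[int],
                             target: float = 0.80) -> float:
    """Highest threshold t such that flagging score >= t catches at least `target` of positives."""
    pos = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not pos:
        raise ValueError("at least one positive example is required")
    # Number of positives that must be flagged; the small tolerance guards against
    # floating-point products landing just above an integer.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[k - 1]


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when cases with score >= threshold are flagged."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    sensitivity = sum(s >= threshold for s in pos) / len(pos)
    specificity = sum(s < threshold for s in neg) / len(neg)
    return sensitivity, specificity


labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.35, 0.05, 0.5, 0.3, 0.2, 0.1, 0.02]
t = threshold_at_sensitivity(scores, labels)
print(t, sensitivity_specificity(scores, labels, t))  # 0.35 (0.8, 0.8)
```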

Figure 1

A-D, Line graphs showing the performance of the proposed and baseline models on the development and validation ICU cohorts and the two COVID-19 prospective validation cohorts. For a prediction horizon of 24 h, the proposed model is compared with two baseline models on the development and validation ICU cohorts (A, B; P < .001) and on the prospective validation cohorts of patients with COVID-19 (C, D; P < .001). Baseline model 1 was a logistic regression model based on commonly used clinical variables (heart rate, oxygen saturation, respiratory rate, and pH). AUC = area under the receiver operating characteristic curve; COVID-19 = coronavirus disease 2019; MGH = Massachusetts General Hospital; ROX = ratio of pulse oximetry/Fio2 to respiratory rate; UCSD = University of California San Diego Health.

Figures 2A and 2B show heatmaps of the top 15 variables contributing to the increase in risk score up to 12 h before intubation for the development and validation cohorts, respectively. Some of the most predictive features included respiratory rate, heart rate, temperature, chloride, oxygen saturation, platelet count, pH, and Fio2, among others. e-Figure 3 shows an illustrative example of the clinical trajectory of a patient in the ICU, along with the corresponding model predictions and top contributing factors. As shown in e-Figure 4, a given risk factor can increase the risk score by taking values either above or below the clinical reference range.
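The population-level heatmaps summarize, for each hour before intubation, the percentage of ventilated patients in whom a variable was an important contributor to the risk score. The minimal sketch below assumes that per-patient relevance scores are already available. The study computed them as described in e-Appendix 1, which is not reproduced here. The sketch defines an "important contributor" as one of the `top_k` variables with the largest positive relevance at that hour. That definition, the data layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical layout: relevance[patient_id][hours_before_onset][variable] -> relevance score
Relevance = Dict[str, Dict[int, Dict[str, float]]]


def contribution_heatmap(relevance: Relevance, top_k: int = 5,
                         max_hours: int = 12) -> Dict[str, List[float]]:
    """Percentage of patients in whom each variable is a top-k positive contributor.

    Entry h - 1 of each row corresponds to h hours before MV onset (h = 1..max_hours).
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * max_hours)
    for per_hour in relevance.values():
        for h in range(1, max_hours + 1):
            scores = per_hour.get(h, {})
            # Only variables that push the risk score up count as contributors.
            positive = sorted(((r, var) for var, r in scores.items() if r > 0), reverse=True)
            for _, var in positive[:top_k]:
                counts[var][h - 1] += 1
    n_patients = len(relevance)
    return {var: [100.0 * c / n_patients for c in row] for var, row in counts.items()}


example = {
    "p1": {1: {"Resp": 0.5, "HR": 0.3, "Temp": -0.2}},
    "p2": {1: {"Resp": 0.1, "HR": 0.4}, 2: {"FiO2": 0.2}},
}
print(contribution_heatmap(example, top_k=1, max_hours=2))
# {'Resp': [50.0, 0.0], 'HR': [50.0, 0.0], 'FiO2': [0.0, 50.0]}
```

Ranking rows by the average of these percentages would give an ordering similar in spirit to the figure. The caption sorts rows by the magnitude of the relevance score, so that ordering is only an approximation.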

Figure 2

A-C, Heatmaps showing the population-level plot of top contributing factors to the increase in model risk score. The x-axis represents hours before onset time of mechanical ventilation. The y-axis represents the top factors (sorted by the magnitude of relevance score) across the patient populations at the development site (A), external validation site (B), and prospective COVID-19 cohort (C). Only dynamically changing variables are shown. Among the static factors, duration of time in hospital (to the current time) and sex (male) consistently were among the top factors. The heatmap shows the percentage of ventilated patients for whom a given variable was an important contributor to the risk score up to 12 h before intubation. See e-Appendix 1 and e-Figure 4 for more details. AST = aspartate transaminase; Δ = slope of change since last measurement; HR = heart rate; O2Sat = oxygen saturation; Resp = respiratory; SaO2 = saturation of arterial oxygen; Temp = temperature.

Model Performance on COVID-19 Populations

VentNet performed even better when applied prospectively to the UCSD and MGH cohorts of patients with COVID-19 (AUC, 0.943 and 0.919, respectively). At the 80% sensitivity level, the corresponding specificities were 88.8% and 84.5%, respectively (Fig 1C, 1D; e-Fig 2C, 2D). In both cohorts, VentNet performed significantly better than the ROX index and baseline model 1 (P < .001) (Fig 1, e-Fig 2). The calibration plots of VentNet on the UCSD and MGH COVID-19 cohorts are shown in e-Figure 5C, 5D. Figure 2C shows a heatmap of the top 15 variables contributing to the increase in risk score up to 12 h before intubation for the COVID-19 cohort at the validation site. In addition to the features listed above, factors frequently contributing to the risk score in the COVID-19 population included total bilirubin, aspartate transaminase, fibrinogen, and phosphate, among others. Figure 3 shows an illustrative example of the clinical trajectory of a patient with COVID-19, along with the corresponding model predictions and top contributing factors.

Figure 3

Illustrative example of a patient's trajectory over a 67-h window preceding intubation. The proposed algorithm crossed the prediction threshold at around hour 45 (red arrow), roughly 24 h before the onset of mechanical ventilation. This 54-year-old woman with a history of hypothyroidism presented with fevers, chills, muscle aches, sore throat, cough, and anosmia. She was admitted to the hospital for hypoxemia and for patchy basilar opacities seen on a chest radiograph in the ED. She later tested positive for COVID-19. Her oxygen requirement and work of breathing increased, with a marked drop in oxygen saturation around hour 50. On the afternoon of the third day of hospitalization (hour 65), she developed rapidly progressive respiratory failure, was intubated, and was diagnosed with ARDS. For clarity, the top relevant features are shown every 5 h under the estimated risk scores. AST = aspartate transaminase; HR = heart rate; MAP = mean arterial pressure; O2Sat = oxygen saturation; Resp = respiratory; Temp = temperature.

Discussion

We demonstrated that a high-performing DL model (AUC > 0.88) can predict the future need for MV 24 h in advance using commonly accessible EHR data. We externally validated all findings in patients from a separate academic center, as well as in two prospective cohorts of patients with COVID-19 (Fig 1). The proposed model can inform health care providers of the features most relevant to the need for MV (Figs 2, 3). It therefore provides an interpretable algorithm to aid clinicians in optimizing the timing of tracheal intubation, allocating resources, and improving patient care. Importantly, the goal of algorithms such as this is not to replace clinical judgment, but rather to complement bedside care by providing predictions that can augment decision-making. The COVID-19 pandemic has placed substantial strain on the health care system, as the surge and long tail of critically ill patients continue to affect resource availability. Despite the United States having the highest number of ventilators and critical care beds per capita among developed countries, MV remains a finite resource there. Frontline providers in the pandemic noted that traditional risk stratification tools such as the Modified Early Warning Score and the Quick Sequential Organ Failure Assessment score are inadequate to predict respiratory failure accurately in patients with COVID-19. Recent data have shown that the ROX index is moderately useful for predicting tracheal intubation in patients with COVID-19. However, VentNet had a significantly higher AUC than the ROX index at all prediction windows. To our knowledge, this is the first study to demonstrate robust performance of a DL algorithm for early prediction of the need for MV in patients hospitalized with COVID-19. We designed VentNet to be implemented in real time to augment clinician decision-making. All data input into VentNet were available to clinicians at the time of entry into the EHR.
Such an algorithm can be implemented in the EHR, and we are actively pursuing this approach at our institutions. ML algorithms have previously been implemented into clinical workflows with improved clinical, statistical, or economic usefulness. Additionally, we evaluated several prediction horizons to show how VentNet performs over different time frames and to suggest potential uses (e-Fig 1). A shorter prediction horizon (eg, 6 h) may provide more clinically actionable information, whereas a longer prediction horizon (eg, 24-72 h) may inform population-level resource allocation. As anticipated, AUC dropped progressively as the prediction horizon increased from 6 to 48 h (from 0.950 to 0.845, respectively). Our findings are important for several reasons. First, we developed and externally validated an interpretable DL algorithm that predicts the need for MV using commonly accessible clinical variables; such an algorithm could facilitate optimal triage, more timely management, and better resource use. Second, our algorithm maintained high predictive performance across different geographic settings in the United States and across varying cohorts. Third, our model used a sequential predictive approach in which ongoing clinical status was assessed to make clinical predictions (see Fig 3 and e-Fig 3 for illustrative examples). Given the dynamic nature of critical illness, this strategy has advantages over a baseline assessment (eg, the Modified Early Warning Score and Quick Sequential Organ Failure Assessment). This approach paves the way for future real-time implementation at the point of care. Fourth, as shown in e-Tables 4 and 5, VentNet's predictions do not rely heavily on any single variable or handful of clinical variables and are therefore more robust to missing data.
Thus, our model has both generalizability and portability and may have an impact not only on the current COVID-19 pandemic, but also in the expected second wave and beyond. For a 24-h prediction horizon, specificity of the model (on the MGH COVID-19 cohort) at 50% sensitivity was 96.5% (positive predictive value, 35.3%), vs 98.9% (positive predictive value, 39.2%) for a 6-h horizon. In terms of model optimization, one could argue for maximizing sensitivity, specificity, or both. In particular, during the COVID-19 pandemic, it has been argued that avoiding emergent procedures is a priority. There is a clear risk of viral transmission to providers, and delays in intubation increase the risk of cardiovascular collapse. Thus, a highly sensitive model may help to minimize the chance of a crash intubation, which leads to poor clinical outcomes and may unnecessarily expose providers to the virus. Conversely, a highly specific model may be used to avoid unnecessary intubation and its associated risks: ventilator-induced lung injury, ventilator-associated pneumonia, and sedation with associated delirium. Despite its many strengths, this study has a number of limitations. First, we defined the need for invasive MV in the EHR database based on the presence of PEEP and Fio2 measurements. We believe that this definition is robust based on considerable experience, but acknowledge that some mislabeling (eg, of noninvasive MV) could occur with any EHR-based criteria. Similarly, noninvasive oxygen delivery provides a variable amount of oxygen depending on inspiratory flow demand and breathing pattern; thus, our model would likely improve with more specific EHR data.
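The modest positive predictive values reported above, despite specificities above 96%, follow from Bayes' rule. PPV depends on how often positive instances occur, not only on sensitivity and specificity. A minimal sketch follows. The 2% positive fraction in the example is hypothetical and is not a value reported in the study. The study's PPVs were also computed under its alarm policy, which this simple formula does not model.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV = sens * p / (sens * p + (1 - spec) * (1 - p)), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no predicted positives: PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Hypothetical: 50% sensitivity, 96.5% specificity, 2% of prediction instances positive.
print(round(positive_predictive_value(0.50, 0.965, 0.02), 3))  # 0.226
```

With the same sensitivity and specificity, a 4% positive fraction would give a PPV of about 0.37. This shows how strongly PPV depends on event frequency, which differs between prediction horizons.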
Nonetheless, we view such misclassification as random and do not expect it to inflate our model's performance artificially. Second, more generally, the proposed algorithm uses EHR data that were not originally collected for the analysis performed in our study. However, the superior performance of our algorithm, even in the presence of missing data, confirms its usefulness in a real-world clinical setting. Third, the COVID-19 pandemic has led to many changes in usual care, including potentially earlier intubation and avoidance of noninvasive ventilation. Thus, one could argue that the need for intubation in these patients may be driven by factors unique to this epidemic. However, our model was trained and validated with historical data from major academic centers before the COVID-19 pandemic. The high AUCs observed in the COVID-19 cohorts therefore speak to the robustness of the model, even in the face of rapid changes in practice patterns. Fourth, one could argue that intubation and the need for MV are somewhat subjective outcomes that could reflect local practices or intrinsic bias in such decisions. However, our ability to predict this clinically important outcome 6 to 24 h in advance suggests the value of this model. Moreover, the traditional clinical parameters used to make intubation decisions (heart rate, respiratory rate, pH, oxygen saturation) performed relatively poorly compared with our DL algorithm (AUC, 0.769 vs 0.895 on the development site testing cohort). Despite these limitations, we view our new findings as robust and likely to lead to important advances in the care of patients with COVID-19. Furthermore, our approach may extend beyond the COVID-19 pandemic, using advanced analytics to guide optimal clinical care in the general ICU population, for example, to determine the timing and selection of appropriate pharmacologic therapies.

Interpretation

In this two-center observational study, we demonstrated that high-performance models can be constructed to predict the future need for MV in hospitalized patients, including those with COVID-19. Because it was built with open-source software, our validated algorithm is readily available for prospective studies of the clinical usefulness of the proposed risk model for optimizing the timing of tracheal intubation, allocating MV resources and staff, and improving patient care.

References (38 in total; 10 listed)

1. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.

Authors: Varun Gulshan; Lily Peng; Marc Coram; Martin C Stumpe; Derek Wu; Arunachalam Narayanaswamy; Subhashini Venugopalan; Kasumi Widner; Tom Madams; Jorge Cuadros; Ramasamy Kim; Rajiv Raman; Philip C Nelson; Jessica L Mega; Dale R Webster
Journal: JAMA Date: 2016-12-13 Impact factor: 56.272

2. Staff safety during emergency airway management for COVID-19 in Hong Kong.

Authors: Jonathan Chun-Hei Cheung; Lap Tin Ho; Justin Vincent Cheng; Esther Yin Kwan Cham; Koon Ngai Lam
Journal: Lancet Respir Med Date: 2020-02-24 Impact factor: 30.700

3. A guide to deep learning in healthcare. (Review)

Authors: Andre Esteva; Alexandre Robicquet; Bharath Ramsundar; Volodymyr Kuleshov; Mark DePristo; Katherine Chou; Claire Cui; Greg Corrado; Sebastian Thrun; Jeff Dean
Journal: Nat Med Date: 2019-01-07 Impact factor: 53.440

4. Scalable and accurate deep learning with electronic health records.

Authors: Alvin Rajkomar; Eyal Oren; Kai Chen; Andrew M Dai; Nissan Hajaj; Michaela Hardt; Peter J Liu; Xiaobing Liu; Jake Marcus; Mimi Sun; Patrik Sundberg; Hector Yee; Kun Zhang; Yi Zhang; Gerardo Flores; Gavin E Duggan; Jamie Irvine; Quoc Le; Kurt Litsch; Alexander Mossin; Justin Tansuwan; James Wexler; Jimbo Wilson; Dana Ludwig; Samuel L Volchenboum; Katherine Chou; Michael Pearson; Srinivasan Madabushi; Nigam H Shah; Atul J Butte; Michael D Howell; Claire Cui; Greg S Corrado; Jeffrey Dean
Journal: NPJ Digit Med Date: 2018-05-08

5. A clinically applicable approach to continuous prediction of future acute kidney injury.

Authors: Trevor Back; Christopher Nielson; Joseph R Ledsam; Shakir Mohamed; Nenad Tomašev; Xavier Glorot; Jack W Rae; Michal Zielinski; Harry Askham; Andre Saraiva; Anne Mottram; Clemens Meyer; Suman Ravuri; Ivan Protsyuk; Alistair Connell; Cían O Hughes; Alan Karthikesalingam; Julien Cornebise; Hugh Montgomery; Geraint Rees; Chris Laing; Clifton R Baker; Kelly Peterson; Ruth Reeves; Demis Hassabis; Dominic King; Mustafa Suleyman
Journal: Nature Date: 2019-07-31 Impact factor: 49.962

6. Beware of the second wave of COVID-19.

Authors: Shunqing Xu; Yuanyuan Li
Journal: Lancet Date: 2020-04-08 Impact factor: 79.321

7. Prediction of outcome of nasal high flow use during COVID-19-related acute hypoxemic respiratory failure.

Authors: Noémie Zucman; Jimmy Mullaert; Damien Roux; Oriol Roca; Jean-Damien Ricard
Journal: Intensive Care Med Date: 2020-07-15 Impact factor: 17.440

8. COVID-19 Does Not Lead to a "Typical" Acute Respiratory Distress Syndrome.

Authors: Luciano Gattinoni; Silvia Coppola; Massimo Cressoni; Mattia Busana; Sandra Rossi; Davide Chiumello
Journal: Am J Respir Crit Care Med Date: 2020-05-15 Impact factor: 21.405

9. Multicenter derivation and validation of an early warning score for acute respiratory failure or death in the hospital.

Authors: Mikhail A Dziadzko; Paul J Novotny; Jeff Sloan; Ognjen Gajic; Vitaly Herasevich; Parsa Mirhaji; Yiyuan Wu; Michelle Ng Gong
Journal: Crit Care Date: 2018-10-30 Impact factor: 9.097

10. COVID-19 pneumonia: different respiratory treatments for different phenotypes?

Authors: Luciano Gattinoni; Davide Chiumello; Pietro Caironi; Mattia Busana; Federica Romitti; Luca Brazzi; Luigi Camporota
Journal: Intensive Care Med Date: 2020-04-14 Impact factor: 17.440
