
Use of Machine Learning to Screen for Acute Respiratory Distress Syndrome Using Raw Ventilator Waveform Data.

Gregory B Rehm¹, Irene Cortés-Puch², Brooks T Kuhn², Jimmy Nguyen³, Sarina A Fazio², Michael A Johnson⁴, Nicholas R Anderson⁵, Chen-Nee Chuah⁶, Jason Y Adams².

Abstract

OBJECTIVES: To develop and characterize a machine learning algorithm to discriminate acute respiratory distress syndrome from other causes of respiratory failure using only ventilator waveform data.
DESIGN: Retrospective, observational cohort study.
SETTING: Academic medical center ICU.
PATIENTS: Adults admitted to the ICU requiring invasive mechanical ventilation, including 50 patients with acute respiratory distress syndrome and 50 patients with primary indications for mechanical ventilation other than hypoxemic respiratory failure.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: Pressure and flow time series data from mechanical ventilation during the first 24 hours after meeting acute respiratory distress syndrome criteria (or the first 24 hours of mechanical ventilation for non-acute respiratory distress syndrome patients) were processed to extract nine physiologic features. A random forest machine learning algorithm was trained to discriminate between the patients with and without acute respiratory distress syndrome. Model performance was assessed using the area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, and negative predictive value. Analyses examined performance when the model was trained using data from the first 24 hours and tested using withheld data from either the first 24 hours (24/24 model) or the first 6 hours (24/6 model). Area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, and negative predictive value were 0.88, 0.90, 0.71, 0.77, and 0.90 (24/24); and 0.89, 0.90, 0.75, 0.83, and 0.83 (24/6).
CONCLUSIONS: Use of machine learning and physiologic information derived from raw ventilator waveform data may enable acute respiratory distress syndrome screening at early time points after intubation. This approach, combined with traditional diagnostic criteria, could improve timely acute respiratory distress syndrome recognition and enable automated clinical decision support, especially in settings with limited availability of conventional diagnostic tests and electronic health records.


Keywords: acute respiratory distress syndrome; classification; critical care; mechanical ventilation; population surveillance; respiratory failure

Year: 2021 PMID: 33458681 PMCID: PMC7803688 DOI: 10.1097/CCE.0000000000000313

Source DB: PubMed Journal: Crit Care Explor ISSN: 2639-8028

Acute respiratory distress syndrome (ARDS) is a severe form of hypoxemic respiratory failure present in up to 10% of ICU admissions and 25% of patients receiving mechanical ventilation (MV) (1). Patients with ARDS experience substantial morbidity and mortality, prolonged MV, high hospital-associated costs, and long-term physical and psychologic dysfunctions (1, 2). Poor outcomes in ARDS are associated with delayed and missed diagnosis, and suboptimal use of evidence-based therapies even by subspecialty-trained clinicians (1–4). Diagnostic criteria require arterial blood gas (ABG) measurement of the ratio of Pao₂ to Fio₂, referred to as “P/F,” and the presence of bilateral opacities on chest imaging, which may contribute to delayed and/or missed diagnosis given decreasing use of ABGs in critical care and poor interrater agreement in chest x-ray (CXR) interpretation (5–9). Even when the aforementioned clinical variables are available and ARDS criteria are met, clinicians identified only 65.3% of moderate and 78.5% of severe ARDS cases at any point during ICU admission, and only 34% of cases on the first day (1).

Challenges associated with early identification of ARDS have led to the development of automated ARDS screening systems. Early examples of these so-called ARDS “sniffer” systems used rule-based processing of keywords extracted from CXR reports and screening of ABG data for qualifying P/F values (10, 11), whereas more recent studies have used natural language processing and machine learning (ML) algorithms (12). Although these studies demonstrated the potential of automated systems to improve the accuracy and timeliness of ARDS diagnosis, each approach was dependent on the availability of laboratory and radiographic data, local radiologist practices, and information technology that may not be present in all healthcare settings. When the generalizability of ARDS sniffers was tested in new patient populations, algorithm specificity declined substantially (13, 14).

To address the limitations of data timeliness and availability, and the challenges of extracting information from imaging reports, we hypothesized that raw ventilator waveform data (VWD) could be used to screen newly intubated patients early in the course of moderate-severe ARDS. VWD is particularly appealing for syndrome surveillance as the data contain quantitative physiologic information and are continuously available from the start of MV. Previous studies using both rule-based and ML-based methods have shown that VWD can be used to automate the assessment of patient-ventilator asynchrony and exposure to excessive tidal volumes (TVs), and that VWD may be useful to monitor progression of ARDS (15–23). However, VWD has not yet been studied in the context of ARDS screening. We thus specifically hypothesized that an ML model, using only physiologic information extracted from raw VWD, would be able to discriminate between patients with and without ARDS at early time points in MV without the need for CXR, ABG, or other electronic health record (EHR)-derived data.

MATERIALS AND METHODS

Cohort Selection

All patient data were obtained as part of a prospective, Institutional Review Board-approved study collecting raw VWD from mechanically ventilated adults admitted to the Medical ICU at the UC Davis Medical Center. Three clinicians (I.C.P., B.T.K., and J.Y.A.) performed retrospective chart review to identify the cause of respiratory failure in subjects from the VWD study cohort enrolled between 2015 and 2019. Subjects were split into two cohorts: 1) patients with confirmed moderate or severe ARDS diagnosed using Berlin consensus criteria within 7 days of intubation (5) and 2) patients with no suspicion of ARDS during their MV course, to avoid phenotypic ambiguity. Patients with chronic obstructive pulmonary disease and/or asthma were excluded from the ARDS patient cohort to minimize the risk of misclassifying ARDS as a result of concurrent non-ARDS acute or nonacute chronic lung disease-associated hypoxemia. Causes of ARDS and indications for MV in the non-ARDS cohort are shown in Table 1, and additional clinical information such as primary ventilator mode, depth of sedation, use of neuromuscular blockade, and rates of two common asynchronies are shown in Supplemental Digital Content Table 2 (http://links.lww.com/CCX/A480) and Supplemental Digital Content Figure 2 (http://links.lww.com/CCX/A479). Both cohorts required at least 1 hour of VWD collected in the first 24 hours after ARDS criteria were first met or after the start of MV (Supplemental Digital Content Table 1, http://links.lww.com/CCX/A480, and Supplemental Digital Content Fig. 1, http://links.lww.com/CCX/A479). All cases meeting study inclusion criteria were reviewed by two clinicians and only cases that were unambiguously considered to have ARDS or to not have ARDS were included in the study cohort.

No sample size calculation was conducted for this study; however, sample size was guided by the range of cohort sizes in previous studies of VWD analysis (15–19) and by the goal of achieving a balanced dataset for ML model development, as standard ML algorithms are biased toward the majority class, resulting in a higher misclassification rate for the minority class (24).

VWD Acquisition and Featurization

We used our ventMAP software platform (18) to extract nine physiologic features from raw VWD, representing pressure and flow, sampled at 50 Hz, obtained from Puritan-Bennett model 840 ventilators (25). The extracted features aimed to capture relevant respiratory pathophysiology while avoiding features that might strongly correlate with ARDS management such as TV or positive end-expiratory pressure (Supplemental Digital Content Table 3, http://links.lww.com/CCX/A480). Observations for each subject were derived by taking the median value of each feature across windows of 100 consecutive VWD breaths (approximately 5 min), with the 100-breath window size based on empirical sensitivity analysis. We processed all available VWD in the 24 hours after Berlin criteria were first met and 24 hours after the start of MV for patients with and without ARDS, respectively (Fig. 1). However, not all patients had 24 hours of data based on variable start times of data acquisition. Each observation feature vector was tagged with a subject identifier and clinical class label of ARDS versus non-ARDS. Observations were excluded if a feature in the observation window met any of the following: 1) not a number or an infinite value, 2) more than 50% of expected breaths in a window were missing, or 3) window start time was prior to charted MV start and end times in the EHR.

Figure 1. Visual overview of data processing and classifier model development. A, Ventilator waveform data from each subject were divided into consecutive 100-breath observation windows. Physiologic features were calculated for each breath in a window, and median values were used to represent the entire window. Each window was labeled as acute respiratory distress syndrome (ARDS) or non-ARDS and tagged with a subject identifier. B, Feature vectors for each labeled window were fed to a supervised machine learning algorithm for training and evaluation. C, Classified windows were aggregated at the patient level to allow threshold-based, patient-level predictions to be made based on the percentage of ARDS and non-ARDS windows within any given time period (e.g., 24 hr).
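
The windowing step described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ventMAP implementation: the function name `window_medians`, the list-of-rows input layout, and the handling of a short trailing window are all hypothetical. The missing-breath rule is approximated here by window length, and any non-finite per-breath value causes the whole window to be dropped, which is one possible reading of the exclusion criteria.

```python
import math
from statistics import median

WINDOW_SIZE = 100  # breaths per observation window, as stated in the text


def window_medians(breath_features, window_size=WINDOW_SIZE, min_fraction=0.5):
    """Collapse per-breath feature rows into one median vector per window.

    breath_features: list of equal-length lists, one row of features per breath.
    Assumption: a window holding fewer than min_fraction * window_size breaths
    is treated as having too many missing breaths and is skipped.
    Assumption: a window containing any NaN or infinite value is skipped.
    """
    observations = []
    for start in range(0, len(breath_features), window_size):
        window = breath_features[start:start + window_size]
        if len(window) < min_fraction * window_size:
            continue
        if not all(math.isfinite(value) for row in window for value in row):
            continue
        # zip(*window) iterates over feature columns within the window.
        observations.append([median(column) for column in zip(*window)])
    return observations
```

Each returned vector would then be tagged with a subject identifier and class label before model training, as the text describes.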

Machine Learning Model Development

We evaluated seven algorithms using the Python scikit-learn software library (Supplemental Digital Content Table 4, http://links.lww.com/CCX/A480) (26). Despite comparable performance across algorithms, we chose the random forest (RF) algorithm for further model development and testing based on its resistance to overfitting and tolerance to outliers (27). Given our small sample size, we evaluated model performance using k-fold cross validation (k = 5), a 70/30 holdout split, and bootstrapping. For the five-fold cross validation, 80 subjects were used for training in each of the five k-folds, and 20 were used for validation, with no overlapping data between the training and validation sets in each fold (Supplemental Digital Content Fig. 3, http://links.lww.com/CCX/A479). For the 70/30 holdout split, we randomly selected 70 subjects for model training, and withheld 30 for final model validation. For bootstrapping, 100 runs were performed. In each run, 80 patients were randomly selected with replacement for training and the remaining patients were used for validation, with performance averaged over 100 bootstraps. We performed feature selection for each model using the chi-square and Gini selection methods. Model hyperparameters were selected using a Python grid search (Supplemental Digital Content Table 5, http://links.lww.com/CCX/A480) (26). Because feature importance was comparable across both methods, we used sequential feature selection to maximize the area under the receiver operating characteristic curve (AUC) with chi-square for all models (Supplemental Digital Content Tables 6–8, http://links.lww.com/CCX/A480 and Supplemental Digital Content Fig. 4, http://links.lww.com/CCX/A479).

ARDS-screening ML models were developed using a two-step process. First, we trained the model to classify all individual 100-breath windows from the training set as either ARDS or non-ARDS (Fig. 1B). We then determined patient-level model performance by attributing all breath window predictions from the validation sets to each subject and assigning the subject class as ARDS or non-ARDS using a specific threshold for the percentage of individual windows classified as ARDS in any given time bin (Fig. 1C).

We examined ML model performance to screen for ARDS using either 24 or 6 hours of VWD. We trained the 24/24 model using the first 24 hours of available VWD in the training set and then validated using the first 24 hours of available VWD from the validation set (n = 100). Our second model, the 24/6 model, was trained using the first 24 hours of available data but was validated using data available in the first 6 hours (n = 70). In both the 24/24 and 24/6 models, all available VWD were used within the specified time frames after Berlin criteria were first met or after the start of MV for ARDS and non-ARDS subjects, respectively. Model performance was assessed using AUC, sensitivity, specificity, positive predictive value, and negative predictive value. Performance was compared using a simple majority voting threshold (e.g., more than 50% of 100-breath windows were classified as ARDS in a given time period) and across a range of voting threshold deciles between 0% and 100%.
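
The subject-level splitting and the second, patient-level voting step can be illustrated with a short Python sketch. It is not the study code: the function names, the round-robin fold assignment, and the `(subject_id, label)` input format are assumptions made for illustration, and the window-level classifier itself (a random forest in the study) is taken as given and represented only by its predicted labels.

```python
from collections import defaultdict


def subject_folds(subject_ids, k=5):
    """Assign each unique subject to one of k folds (round-robin, an assumption).

    Splitting by subject rather than by window keeps every window from one
    patient on the same side of the train/validation boundary.
    """
    unique = sorted(set(subject_ids))
    return {subject: index % k for index, subject in enumerate(unique)}


def ards_vote_fraction(window_predictions):
    """Fraction of each subject's windows predicted as ARDS (label 1).

    window_predictions: iterable of (subject_id, predicted_label) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # subject -> [ARDS votes, total windows]
    for subject, label in window_predictions:
        counts[subject][0] += int(label == 1)
        counts[subject][1] += 1
    return {subject: votes / total for subject, (votes, total) in counts.items()}


def classify_patients(vote_fractions, threshold=0.5):
    """Label a subject ARDS (1) when the vote fraction exceeds the threshold."""
    return {subject: int(fraction > threshold)
            for subject, fraction in vote_fractions.items()}
```

Calling `classify_patients` repeatedly with thresholds from 0.1 to 1.0 would reproduce the kind of decile sweep described above; lowering the threshold trades specificity for sensitivity.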

RESULTS

A total of 100 adult mechanically ventilated patients were included in the study, including 50 with ARDS and 50 without evidence of ARDS during the course of MV. Table 1 provides demographic, clinical, and physiologic characteristics of subjects. We analyzed a median of 21.2 hours of VWD per subject from ARDS patients and 13.3 hours of VWD from non-ARDS patients, representing 19,777 100-breath window observations. The dataset contained a total of 2,020,556 breaths, with 1,331,285 breaths from patients with ARDS and 689,271 from patients without ARDS.

TABLE 1.

Clinical Characteristics of Study Subjects

	ARDS (n = 50)	Non-ARDS (n = 50)
Age (median [IQR])	57 (38–65)	58 (49–67)
Female (n [%])	13 (26)	23 (46)
Body mass index (median [IQR])	26.4 (22.3–33.8)	25.9 (22.1–28.7)
Obstructive lung disease (n [%])
COPD	0 (0)	12 (24)
Asthma	0 (0)	5 (10)
Reason for ICU admission (n [%])
Acute hypoxemic respiratory failure	24 (48)	—
COPD/asthma exacerbation	—	17 (34)
Sepsis	11 (22)	—
Metabolic encephalopathy/drug overdose	2 (4)	15 (30)
Airway edema/anaphylaxis	—	5 (10)
Stroke	—	4 (8)
Cardiac arrest	9 (18)	3 (6)
Heart failure	—	2 (4)
Upper gastrointestinal bleeding	—	2 (4)
Trauma/surgery	3 (6)	2 (4)
Pancreatitis	1 (2)	—
Sequential Organ Failure Assessment score (median [IQR])	13 (10–16)	7.5 (5–10)
Days from intubation to Berlin criteria (median [IQR])	0.1 (0.0–0.2)	—
Median Pao₂/Fio₂ first 24 hr (median [IQR])	176 (134–210)	318 (267–423)
Worst Pao₂/Fio₂ 24 hr (median [IQR])	108 (66–137)	278 (147–385)
ARDS insult type (n [%])
Pneumonia	18 (36)	—
Aspiration	14 (28)	—
Nonpulmonary sepsis	10 (20)	—
Trauma	2 (4)	—
Diffuse alveolar hemorrhage	2 (4)	—
Pancreatitis	1 (2)	—
Other	3 (6)	—
Hospital length of stay (median [IQR])	13.3 (6.6–25.4)	7.0 (4.2–13.4)
Hospital mortality (n [%])	24 (48)	10 (20)
Ventilator-free days in 28 d (median [IQR])	6.6 (0–23.0)	25.3 (10.6–26.9)

ARDS = acute respiratory distress syndrome, COPD = chronic obstructive pulmonary disease, IQR = interquartile range.

Performance of our primary ML model discriminating between ARDS and non-ARDS cases using the first 24 hours of VWD (24/24 model) is shown in Figure 2A and Table 2. For our main analyses, we used a simple majority voting scheme to determine patient-level predictions. Thus, if more than 50% of observations from a patient were classified as ARDS, the patient was classified as ARDS. Using this voting threshold, the 24/24 VWD model was able to discriminate between the ARDS and non-ARDS subjects with a mean AUC across all five k-folds of 0.88 (95% CI, 0.816–0.944). Discriminative performance was similar in the 70/30 holdout and bootstrapping experiments (Supplemental Digital Content Tables 9 and 10, http://links.lww.com/CCX/A480). Figure 2B and Table 3 show how model sensitivity and specificity varied by changing the threshold used to classify ARDS across the range of prediction thresholds from 0% to 100% prediction votes, and at specific threshold deciles from 10% to 100%, respectively.

Our second ML model explored the ability of an ML algorithm trained on the first 24 hours of VWD to differentiate between ARDS and non-ARDS in a validation set using only the first 6 hours of VWD after meeting Berlin criteria or starting MV for the ARDS and the non-ARDS cohorts, respectively (24/6 model). Discriminative performance in this 24/6 model was comparable with the 24/24 model with AUCs of 0.89 (95% CI, 0.817–0.963) and 0.88 (95% CI, 0.816–0.944), respectively, using five-fold cross validation.

Figure 2. Performance characteristics of the train 24-/test 24-hr model (24/24). A, Receiver operating characteristic (ROC) curves for individual k-folds in the 24/24 five-fold cross validation model. Mean area under the ROC (area under the receiver operating characteristic curve [AUC]) across all k-folds is shown in blue (95% CI displayed in figure legend). B, Sensitivity and specificity of acute respiratory distress syndrome (ARDS) detection change as the voting threshold required to classify ARDS in the first 24 hr increases.
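
For readers less familiar with the patient-level statistics reported here, the following Python sketch shows how sensitivity, specificity, positive predictive value, and negative predictive value follow from a two-by-two table of true and predicted classes. The function name `screening_metrics` and the 0/1 label encoding are assumptions for illustration; the study's own evaluation code is not shown in the text.

```python
def screening_metrics(true_labels, predicted_labels):
    """Sensitivity, specificity, PPV, and NPV from binary labels (1 = ARDS).

    Assumes both sequences contain only 0 or 1 and have equal length.
    """
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # An empty denominator (e.g., no predicted positives) has no defined value.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Because the cohort was balanced by design (50 ARDS, 50 non-ARDS), predictive values computed this way reflect a 50% prevalence and would differ in a population with a different ARDS prevalence.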

DISCUSSION

We developed an automated ARDS screening algorithm that can detect potential cases of moderate-severe ARDS early in the course of MV without need for CXR, ABG, or other EHR-derived data. Using ML techniques and physiologic features derived from raw VWD, our model demonstrated robust discriminative performance for detecting ARDS in the first 24 hours after meeting Berlin criteria that was reproducible across a variety of experimental conditions. We further showed that our ARDS detection model could identify potential ARDS cases as early as 6 hours after Berlin criteria were first documented and that our model architecture enabled adjustment of model performance according to desired levels of sensitivity and specificity.

Despite intensive research into the etiology, diagnosis, and treatment of ARDS, multiple studies have shown that bedside providers continue to underrecognize the syndrome. In the largest multinational prospective cohort study to date, recognition of ARDS occurred in only 34% of patients on the first day when Berlin diagnostic criteria were present, and at any time in only 60% of patients. Even when providers were prompted with the question “Did the patient have ARDS at any stage of their ICU stay?,” 34.7% of patients with moderate and 21.5% with severe ARDS were never recognized at any time while in intensive care (1). Reasons for delayed or failed diagnosis remain incompletely understood, but underrecognition is not likely the result of subtle clinical findings, since 88% of patients already met Berlin criteria on day 1 of hypoxemic respiratory failure in LUNG SAFE and 76% met them at the time of intubation in the LOTUS-FRUIT study (1, 28). Underdiagnosis has also been associated with suboptimal care delivery. In this regard, studies have demonstrated repeatedly that clinicians, operating unassisted by decision support, consistently fail to apply evidence-based therapies (1, 4, 28–30), whereas at least one study has shown that clinical decision support driven by automated ARDS screening can decrease the delivery of potentially injurious MV (31). Collectively, these studies demonstrate a clear need for improved ARDS screening strategies.

Our results expand on previous research of automated ARDS screening “sniffer” systems. The original ARDS sniffer systems, developed in parallel at two U.S. institutions, used rule-based algorithms combining keyword searching of CXR reports and processing of ABG data to screen for ARDS and alert clinicians in near real-time (10, 11). These initial studies reported excellent diagnostic performance; however, specificity decreased substantially when they were externally validated at a different institution (13), underscoring the challenges of generalizing algorithms that depend on local practice and documentation patterns. Since the initial ARDS sniffers were developed, at least six additional ARDS detection tools have been described and validated in single institutions, all using EHR-based data and/or imaging (12). Most of these second-generation sniffers have used ML approaches in an attempt to address the challenges of rule-based algorithms. Four have used ML techniques based on natural language processing and text mining of CXR reports (14, 32, 33) or image processing and feature extraction from CXR images (34). Two studies did not incorporate radiographic data and were based only on clinical data extracted from the medical and surgical history, charted vital signs, laboratory results, ventilatory settings, and medication use (35, 36). Most of these recent ARDS sniffer tools reported moderate to excellent diagnostic performance locally; however, none have been validated externally. Potential barriers to widespread usability of existing ARDS detection systems include dependence on local practice patterns of EHR adoption, documentation and ordering, and the requirement that clinicians document accurately and order tests in a timely manner.

To address these limitations, our methods differed from previous research in several notable ways. Our use of ML with VWD-derived features may overcome some limitations of previous feature extraction methods by capturing the physiologic signatures present in waveforms instead of relying on EHR or imaging data alone. Because VWD are generated from the start of MV, our methods enable continuous patient monitoring and may allow for more timely identification of potential ARDS cases, independent of ordering or documentation, which was suggested by our finding that ARDS could be detected as early as 6 hours after Berlin criteria were first met. As access to physiologic waveform data becomes more common, our exclusive use of VWD may also extend automated ARDS screening to resource-constrained care environments such as community and rural hospitals lacking well-developed EHRs, and in developing nations, battlefields and disaster relief zones where tests required to fulfill Berlin criteria may be in short supply or unavailable (37). Use of ventilator waveform analysis may thus improve both the timeliness and the ability to apply automated ARDS screening in diverse settings, which are particularly important since delayed or missed diagnosis is thought to be a major contributor to suboptimal implementation of evidence-based therapies for ARDS (1, 3, 4, 30).

In addition to extending previous research into developing automated ARDS screening systems, our work further demonstrates the potential value of ML in the analysis of large volumes of untapped streaming physiologic waveform data generated from patient-monitoring devices in the ICU. The use of patient-derived physiologic data has gained increasing attention in recent years as the availability of both high-volume, high-sampling rate data types, and advanced computing power has become more commonplace. In this regard, automated processing of VWD has been demonstrated by multiple investigators in the study of patient-ventilator asynchrony (16–18) and several groups have shown the potential to computationally extract physiologic features from VWD including data derived from animal models of ARDS pertaining to airway resistance and respiratory system compliance (21–23). Sottile et al (19) and Rehm et al (20) have further investigated the ability to use ML to detect common types of patient-ventilator asynchrony without the need to explicitly code rule-based, expert systems, illustrating the ability of ML algorithms to learn relevant knowledge from the physiologic information embedded in raw VWD. Our work also fits into a broader context of recent research using ML and sensor-derived physiologic data to develop so-called digital biomarkers to screen for and monitor diseases, improve disease phenotyping, and predict clinical trajectories (38). Recent studies in critical care have demonstrated the potential of digital biomarker signatures including the use of convolutional deep neural networks to process electrocardiogram waveforms to screen for hyperkalemia (39) and detect arrhythmias (40), and the use of continuous electroencephalography waveforms and deep learning to predict neurologic outcome after cardiac arrest (41). Within this framework, our results suggest the potential of ML and VWD to generate digital biomarker signatures of ARDS, either alone or in combination with conventional biomarkers (42), to aid clinicians in early detection, monitoring ARDS progression, and prognostication of patient outcomes.

Our study has a number of limitations that should be addressed in future studies. First, our study was limited to a single academic medical center, which could affect model generalizability despite our exclusive use of quantitative physiologic data (13). Second, our limited sample size may have resulted in model overfitting. We attempted to address this issue by using the RF algorithm, which may be inherently more resistant to overfitting (27), and several different frameworks for model validation. Although most prior studies using VWD have been similarly limited in size (16–19), research on larger cohorts will be necessary to understand the full limitations of waveform-based ARDS screening. Similarly, our subject selection was intentionally biased to ensure phenotypic separation between ARDS and non-ARDS subjects to test the hypothesis that VWD and ML could be used to discriminate between clear phenotypes. Our cohort was composed mostly of moderate-severe persistent ARDS that was present on intubation, and it is unclear how our model would perform in late-onset ARDS, mild ARDS patients, rapid resolvers (43), those with an uncertain diagnosis when Berlin criteria are first met, or in those with preexisting chronic lung disease (6, 36, 44). Third, we focused on clinician-driven feature extraction and one ML algorithm. It is possible that the use of other input features, including nonwaveform EHR-derived features or algorithms capable of end-to-end model development and automated featurization such as deep learning (45), may have improved performance. Fourth, although VWD are ubiquitous at the bedside, widespread access to these data for research purposes remains a challenge at present. Finally, studies aimed at developing ARDS classifiers, including ours, are limited by the inherent imprecision of the Berlin criteria (6, 46). Recognizing this fundamental limitation, our study focused on developing a tunable screening algorithm rather than one aimed at diagnosis. Development of ARDS classifiers that generalize well and are trusted by clinicians will require additional study with larger, more heterogeneous populations, may require improved methods of class assignment, such as advanced imaging and physiologic or digital biomarkers (29, 47, 48), and will ultimately require external validation followed by thoughtful integration into decision support workflows to realize intended patient benefits.

CONCLUSIONS

We report the performance of an automated, ML-based ARDS screening algorithm that can detect ARDS with strong discrimination performance within the first 24 hours after Berlin criteria are first met, without the need for CXR, ABG, or other EHR-derived data. Our focus on feature extraction exclusively from VWD suggests that this approach may enable ARDS screening very early after intubation and nearly continuously, which may result in decreased time to recognition, improved generalization to other centers, and may enable screening in resource-constrained settings where ABG, radiographic testing, and critical care expertise may be unavailable or scarce. Although our results represent a first proof of concept that digital biomarkers derived from physiologic monitoring data can be used for ARDS detection, additional research is needed to determine how broadly such methods can be applied, how best to incorporate them into traditional Berlin criteria-based diagnostic workflows, and how they might complement EHR and biochemical approaches to clinical phenotyping and prognosis.

TABLE 2.

Model Performance Statistics for Both Train 24-/Test 24-hr (24/24) and Train 24-/Test 6-hr (24/6) Models

Model	Train/Test Split (n)	k-Fold Number	Sensitivity	Specificity	Positive Predictive Value	Negative Predictive Value	Area Under the Curve
Train 24/Test 24	80/20	1	1.0	0.73	0.79	1.0	0.98
—	—	2	1.0	0.79	0.83	1.0	0.92
—	—	3	0.70	0.70	0.69	0.70	0.78
—	—	4	0.80	0.91	0.90	0.82	0.94
—	—	5	1.0	0.44	0.64	1.0	0.79
—	Not applicable	Mean of five k-folds	0.90 ± 0.059	0.71 ± 0.089	0.77 ± 0.082	0.90 ± 0.059	0.88 ± 0.064
Train 24/Test 6	80/14	Mean of five k-folds	0.90 ± 0.07	0.75 ± 0.101	0.83 ± 0.088	0.83 ± 0.088	0.89 ± 0.073

Mean (with 95% CIs) performance across all five k-folds is shown for both models, and results of individual k-folds are displayed for the 24/24 model to illustrate the spectrum of performance variability. Note that only 70 subjects had ventilator waveform data available in the first 6 hr, resulting in a smaller sample size for the test cohort in the 24/6 model (see Supplemental Digital Content Table 8, http://links.lww.com/CCX/A480, for individual k-fold results of the 24/6 model).
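
The text does not state how the 95% CIs in Table 2 were computed. As an observation only, the reported half-widths are numerically consistent with a normal-approximation (Wald) interval for a proportion, using n = 100 for the 24/24 model and n = 70 for the 24/6 model. The sketch below shows that arithmetic; the function name `wald_half_width` and the choice of z = 1.96 are assumptions, and this should not be read as the authors' documented method.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion p with n subjects."""
    return z * math.sqrt(p * (1.0 - p) / n)
```

For example, a sensitivity of 0.90 with n = 100 gives a half-width of about 0.059, and an AUC of 0.89 with n = 70 gives about 0.073, matching the tabulated values to three decimal places.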

TABLE 3.

Performance Characteristics of the Train 24/Test 24-hr (24/24) Model for Detection of Acute Respiratory Distress Syndrome Across Deciles of Voting Thresholds, Illustrating the Tunable Nature of Our Two-Step Acute Respiratory Distress Syndrome Classification Methodology

% Acute Respiratory Distress Syndrome Votes in First 24 hr	Sensitivity	Specificity	Positive Predictive Value	Negative Predictive Value
10	0.99 ± 0.02	0.40 ± 0.096	0.63 ± 0.095	0.99 ± 0.02
20	0.97 ± 0.033	0.51 ± 0.098	0.68 ± 0.091	0.96 ± 0.038
30	0.96 ± 0.038	0.58 ± 0.097	0.71 ± 0.089	0.96 ± 0.038
40	0.92 ± 0.053	0.63 ± 0.095	0.74 ± 0.086	0.92 ± 0.053
50	0.90 ± 0.059	0.71 ± 0.089	0.77 ± 0.082	0.91 ± 0.056
60	0.87 ± 0.066	0.77 ± 0.082	0.81 ± 0.077	0.87 ± 0.066
70	0.81 ± 0.077	0.81 ± 0.077	0.83 ± 0.074	0.83 ± 0.074
80	0.75 ± 0.085	0.85 ± 0.07	0.85 ± 0.07	0.79 ± 0.08
90	0.68 ± 0.091	0.87 ± 0.066	0.86 ± 0.068	0.74 ± 0.086
100	0.57 ± 0.097	0.91 ± 0.056	0.86 ± 0.068	0.69 ± 0.091
