Preintubation Sequential Organ Failure Assessment Score for Predicting COVID-19 Mortality: External Validation Using Electronic Health Record From 86 U.S. Healthcare Systems to Appraise Current Ventilator Triage Algorithms.

Michael B Keller¹,², Jing Wang³, Martha Nason⁴, Sarah Warner¹, Dean Follmann⁴, Sameer S Kadri¹.

Abstract

OBJECTIVES: Prior research has hypothesized the Sequential Organ Failure Assessment (SOFA) score to be a poor predictor of mortality in mechanically ventilated patients with COVID-19. Yet, several U.S. states have proposed SOFA-based algorithms for ventilator triage during crisis standards of care. Using a large cohort of mechanically ventilated patients with COVID-19, we externally validated the predictive capacity of the preintubation SOFA score for mortality prediction with and without other commonly used algorithm elements.
DESIGN: Multicenter, retrospective cohort study using electronic health record data.
SETTING: Eighty-six U.S. health systems.
PATIENTS: Patients with COVID-19 hospitalized between January 1, 2020, and February 14, 2021, and subsequently initiated on mechanical ventilation.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: Among 15,122 mechanically ventilated patients with COVID-19, SOFA score alone demonstrated poor discriminant accuracy for inhospital mortality in mechanically ventilated patients using the validation cohort (area under the receiver operating characteristic curve [AUC], 0.66; 95% CI, 0.65-0.67). Discriminant accuracy was even poorer using SOFA score categories (AUC, 0.54; 95% CI, 0.54-0.55). Age alone demonstrated greater discriminant accuracy for inhospital mortality than SOFA score (AUC, 0.71; 95% CI, 0.69-0.72). Discriminant accuracy for mortality improved upon addition of age to the continuous SOFA score (AUC, 0.74; 95% CI, 0.73-0.76) and categorized SOFA score (AUC, 0.72; 95% CI, 0.71-0.73) models, respectively. The addition of comorbidities did not substantially increase model discrimination. Of 36 U.S. states with crisis standards of care guidelines containing ventilator triage algorithms, 31 (86%) feature the SOFA score. Of these, 25 (81%) rely heavily on the SOFA score (12 exclusively propose SOFA; 13 place highest weight on SOFA or propose SOFA with one other variable).
CONCLUSIONS: In a U.S. cohort of over 15,000 ventilated patients with COVID-19, the SOFA score displayed poor predictive accuracy for short-term mortality. Our findings warrant reappraisal of the SOFA score's implementation and weightage in existing ventilator triage pathways in current U.S. crisis standards of care guidelines.

Year: 2022 PMID: 35302957 PMCID: PMC9196924 DOI: 10.1097/CCM.0000000000005534

Source DB: PubMed Journal: Crit Care Med ISSN: 0090-3493 Impact factor: 9.296

The COVID-19 pandemic has caused surges in hospital caseloads worldwide, placing strain on affected healthcare systems (1). Patient caseloads have often exceeded a hospital’s capacity to provide standard of care, necessitating contingency standards and, in extreme situations, crisis standards of care (CSC). The latter may result in scenarios in which parsimonious allocation of life-saving resources becomes pivotal (2). Methods to adequately predict and maximize survival are paramount to inform CSC triage guidelines. Several guidelines have been developed to guide resource allocation under such circumstances (3–8). In addition to elements intended to predict survival, these guidelines include components intended to identify those at risk for high resource consumption, such as from poor functional outcomes or prolonged mechanical ventilation. However, there is considerable variation in the elements included in ventilator triage algorithms across state CSC guidelines, as well as in the quality of evidence underpinning their inclusion, raising ethical concerns around the adequacy of ventilator allocation offered by current algorithms (9). Many mechanical ventilator triage protocols in the United States include the Sequential Organ Failure Assessment (SOFA) score to predict short-term survival (9–11), including two U.S. states that recently declared CSC due to COVID-19 surges (12, 13). However, the degree to which ventilator triage decisions would hinge on the score has received less attention. Two prior studies in cohorts of ICU patients with sepsis have reported an area under the receiver operating characteristic curve (AUC) of 0.74 and 0.75 of the SOFA score for predicting survival (14, 15).
The SOFA score assigns equal weightage to its six organ system components; however, respiratory failure tends to be the predominant organ failure among acutely ill patients with COVID-19, and these patients display less variability in SOFA score than those with conditions such as bacterial sepsis (16). Hence, despite its inclusion in several triage protocols nationwide, it is unclear whether the SOFA score adequately predicts mortality in mechanically ventilated patients with COVID-19. A recent hypothesis-generating study suggests that the discriminant accuracy of the SOFA score for predicting inhospital mortality in mechanically ventilated COVID-19 patients is poor (16). However, the study was relatively small (675 ventilated patients), was regional, and did not assess for model calibration or the predictive capacity of combining SOFA with other relevant predictors featured in existing triage protocols. As suggested by a recent expert consensus panel, there is need for additional, larger studies to validate the predictive accuracy of existing algorithms and formulate better prediction tools (17). Hence, in this study, we: 1) examine implementation and weightage of the SOFA score in State CSC ventilator triage algorithms nationally and 2) leverage a large electronic health record (EHR) database of U.S. hospitals to externally validate the hypothesis that preintubation SOFA score is a poor predictor for inhospital mortality in COVID-19 patients requiring mechanical ventilation.

MATERIALS AND METHODS

Study Design and Data Source

We performed a multicenter, retrospective cohort study using the Cerner COVID-19 Deidentified Data cohort. This repository contains EHR data from 86 U.S. healthcare systems that share data with Cerner (Kansas City, MO) and includes billing records, medication orders, laboratory results, vitals, and other physiologic variables (18). Data were accessed and analyzed on Cerner HealtheIntent (Cerner), a cloud-based management platform, under a data use agreement (no. 1-70WNSGX) between the National Institutes of Health (NIH) and Cerner. Data refreshes were provided quarterly, enabling incorporation of new cases. Downstream curation of study-specific variables and algorithms was performed by NIH-contracted informaticists under the guidance of study investigators (M.K., S.S.K.), with study design feedback offered by all investigators. Given the deidentified nature of the data, the study was deemed exempt from ethics board review based on the policy of the NIH Office of Human Subjects Research Protections.

Study Population

Patients 18 years old or older with COVID-19 who were admitted as inpatients between January 1, 2020, and February 14, 2021, and underwent mechanical ventilation were included. For each patient, one admission was randomly selected for inclusion in the analysis. Patients admitted with COVID-19 were identified by an International Classification of Diseases, 10th Revision (ICD-10) diagnosis code for COVID-19 (U07.1), a positive polymerase chain reaction (PCR) test for severe acute respiratory syndrome coronavirus 2 (SARS-CoV2), or positive serology for COVID-19 antibodies. The ICD-10 diagnosis code for COVID-19 captures patients positive for SARS-CoV2 on PCR with a sensitivity of 98%, specificity of 99%, and a positive predictive value of 92% (19). Encounters prior to March 2020 were identified using a legacy coding strategy that leverages coding for generic coronaviruses (B97.29) (19). Patients who received invasive mechanical ventilation were identified by ICD-10 invasive mechanical ventilation procedure codes and Logical Observation Identifiers Names and Codes. Patients on mechanical ventilation within 24 hours of admission and those with a do-not-resuscitate (DNR) status present at admission were excluded.

Study Variables

The primary outcome was inhospital mortality, defined as death during hospitalization or discharge to hospice. The highest SOFA score was calculated within 24 hours prior to initiation of mechanical ventilation, signifying a time point at which ventilator triage is likely to occur based on current CSC protocols (3, 5, 7). Cerner HealtheIntent contains all components necessary to compute the SOFA score except urine output and vasopressor dose. Therefore, as previously described, we used creatinine levels to assign points for renal dysfunction and the number of vasopressors to assign points for cardiovascular dysfunction (20). If values for Po2 in arterial blood to fractional concentration of oxygen ratio (Pao2:Fio2) were missing, we used the saturation of blood oxygen to Fio2 ratio (Sao2:Fio2) to assign points for respiratory dysfunction (21). Daily SOFA score was computed using the worst scoring criteria for each component on each day. If no values were available on a day, we used the closest value within 5 days looking backward (20). If there was no value within the prior 5 days, we assigned the SOFA score component as 0 (missing-as-normal), as previously described (15, 22, 23). For Glasgow Coma Scale (GCS) score, the lowest value was taken for a given day. We then carried that value forward until a new value was present on another day (20). If the first GCS score occurred several days into hospitalization, the GCS score was assigned as 0 each day leading up to that day. Further details regarding calculation of daily SOFA scores are in eMethods (Online Supplement, http://links.lww.com/CCM/H88). We evaluated SOFA score as both a continuous (count) variable ranging from 0 to 24 and a categorized variable, based on strata (<6, 6–8, 9–11, >11) commonly implemented in existing CSC guidelines (3, 5, 7, 12, 13). 
Select patient-level covariates were identified, including age, sex, obesity, diabetes, and hypertension, based on prior data linking these covariates with poor outcomes (24–29). Patient-level comorbidities were identified using respective ICD-10 codes. Aggregate comorbidity burden was assessed using the Elixhauser comorbidity index (30, 31).
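The look-back and missing-as-normal rules described above, together with the guideline strata, can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code (the authors' own code is linked in the Online Supplement): the function names `impute_component` and `sofa_stratum` and the input layout (a dict from hospital day to the worst component score observed that day) are hypothetical. The separate carry-forward rule for the Glasgow Coma Scale is not covered.

```python
from typing import Dict, Optional

LOOKBACK_DAYS = 5  # per the text: closest value within 5 days looking backward


def impute_component(daily: Dict[int, Optional[int]], day: int) -> int:
    """Return the SOFA component score (0-4) to use on `day`.

    `daily` maps hospital day -> worst observed component score, or None
    when nothing was recorded (hypothetical layout for this sketch).
    """
    # Try the same day first, then the closest earlier day within the window.
    for back in range(0, LOOKBACK_DAYS + 1):
        value = daily.get(day - back)
        if value is not None:
            return value
    # Nothing within the window: treat the component as normal (0 points).
    return 0


def sofa_stratum(total: int) -> str:
    """Map a total SOFA score (0-24) to the strata <6, 6-8, 9-11, >11."""
    if not 0 <= total <= 24:
        raise ValueError("SOFA score must be between 0 and 24")
    if total < 6:
        return "<6"
    if total <= 8:
        return "6-8"
    if total <= 11:
        return "9-11"
    return ">11"


# Toy example: respiratory score recorded on day 2 only; renal never recorded.
respiratory = {2: 3}
renal: Dict[int, Optional[int]] = {}
total_day5 = impute_component(respiratory, 5) + impute_component(renal, 5)
print(total_day5, sofa_stratum(total_day5))
```

A value recorded more than 5 days earlier is ignored, so the same respiratory record contributes 0 points on day 8 in this sketch.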

Evaluation of U.S. State-Adopted CSC Guidelines

We next performed a cross-sectional analysis of state-adopted CSC guidelines to examine the prevalence of SOFA score utilization and its degree of representation in current CSC models. One study investigator (M.K.) performed a search on three separate dates between October 1, 2021, and October 14, 2021, for state-adopted CSC guidelines providing guidance on triage of mechanical ventilation or scarce resources, as previously described (32). State-adopted CSC guidelines were identified as those written by or in coordination with the state’s department of public health. CSC guidelines that were revoked or not written in coordination with the state’s department of health (33, 34) were excluded. Guidelines that directly mention COVID-19 or were written after March 1, 2020, were deemed “COVID-19 specific” (further details of the search methods are outlined in the eMethods, Online Supplement, http://links.lww.com/CCM/H88). We categorized each CSC guideline’s level of reliance on the SOFA score as follows: 1) no reliance; 2) low reliance: SOFA score is mentioned but not directly involved in triage of mechanical ventilation; and 3) heavy reliance: SOFA score is used alone or as a major component (SOFA indicated as holding the greatest weight or used with one other variable) in assigning patients to priority tiers for receipt of mechanical ventilation. We calculated the prevalence of each category of SOFA score reliance for state-adopted CSC protocols in the United States.

Statistical Analysis

We calculated descriptive statistics of patient characteristics overall or by patient groups. All characteristics were reported at admission to hospital except for preintubation SOFA score. To assess the difference between patient groups, Mann-Whitney nonparametric tests were used for continuous variables, and chi-square or Fisher exact tests were used for categorical variables. To investigate preintubation SOFA score and/or age as predictors of inhospital mortality, with or without adjusting for other covariates, logistic regression models and conditional classification trees with Bonferroni adjustments (35) were fit using the derivation set, which comprised two thirds of the entire dataset. The remaining third of the data was reserved to validate selected models. Derivation/validation cohort splitting was stratified by hospital. To evaluate discriminant accuracy, we computed the AUC with 95% CIs and performed the DeLong test to compare AUCs. We considered an AUC below 0.7 to be poor accuracy, AUC 0.7–0.8 moderate, 0.8–0.9 good, and greater than 0.9 excellent (36). We also generated calibration belts and conducted the Hosmer-Lemeshow test to assess calibration (37–39). We conducted sensitivity analyses: 1) excluding patients with chronic kidney disease (CKD) and end-stage renal disease (ESRD) (as the renal score component of SOFA used creatinine rather than urine output, thus potentially affecting model performance), 2) excluding patients with missing SOFA score values after our substitution method (imputed as normal, i.e., 0), 3) excluding patients who had an ICD-10 Major Operating Room procedure code on the same day as intubation (to account for potential preoperative rather than critical illness-related intubation), and 4) using tree-based instead of logistic regression models.
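One plausible reading of a two-thirds/one-third split "stratified by hospital" is that patients are shuffled and divided within each hospital, so every hospital contributes to both cohorts in roughly the same proportion. The Python sketch below illustrates that reading only; the function name `stratified_split`, the patient-to-hospital mapping, and the fixed seed are assumptions for the example, and the study's actual procedure may differ.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def stratified_split(
    hospital_of: Dict[Hashable, Hashable],
    derivation_fraction: float = 2 / 3,
    seed: int = 0,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split patient ids into derivation/validation sets within each hospital."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_hospital = defaultdict(list)
    for patient_id, hospital in hospital_of.items():
        by_hospital[hospital].append(patient_id)

    derivation: List[Hashable] = []
    validation: List[Hashable] = []
    for hospital in sorted(by_hospital, key=str):
        patients = sorted(by_hospital[hospital], key=str)
        rng.shuffle(patients)
        # Number of this hospital's patients assigned to the derivation set.
        cut = round(len(patients) * derivation_fraction)
        derivation.extend(patients[:cut])
        validation.extend(patients[cut:])
    return derivation, validation


# Toy example: three hypothetical hospitals with 6, 9, and 3 patients.
toy = {f"p{i}": "A" for i in range(6)}
toy.update({f"q{i}": "B" for i in range(9)})
toy.update({f"r{i}": "C" for i in range(3)})
deriv, valid = stratified_split(toy)
print(len(deriv), len(valid))
```

Splitting within hospitals keeps a single large health system from dominating either cohort, at the cost of the two cohorts sharing the same hospitals.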
Among survivors, logistic regression was performed to investigate preintubation SOFA score as a predictor of discharge to long-term acute care (LTAC) (secondary outcome) facilities, as these patients may represent a population at risk for high resource utilization. Analyses were conducted using R 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria). A p value of less than 0.05 was considered statistically significant. A link to statistical code is included in the Online Supplement (page 8, http://links.lww.com/CCM/H88).
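For readers less familiar with the AUC, it can be read as the probability that a randomly chosen non-survivor has a higher score than a randomly chosen survivor, with ties counted as one half. The Python sketch below computes it that way and applies the interpretation bands quoted above. It is illustrative only and is not the study's code (which was written in R); the function names are hypothetical, the toy data are invented, and the handling of values falling exactly on 0.8 or 0.9 is an assumption because the text does not specify it.

```python
from typing import Sequence


def auc(scores: Sequence[float], died: Sequence[int]) -> float:
    """AUC via pairwise comparison of non-survivors (1) and survivors (0)."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # non-survivor ranked above survivor
            elif p == n:
                wins += 0.5  # ties count as one half
    return wins / (len(pos) * len(neg))


def accuracy_band(value: float) -> str:
    """Label an AUC: <0.7 poor, 0.7-0.8 moderate, 0.8-0.9 good, >0.9 excellent.

    Assumption: exactly 0.8 is labeled "good" and exactly 0.9 is labeled "good".
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if value < 0.7:
        return "poor"
    if value < 0.8:
        return "moderate"
    if value <= 0.9:
        return "good"
    return "excellent"


# Invented toy data: six patients with a score and an outcome (1 = died).
toy_scores = [1, 2, 2, 3, 4, 5]
toy_died = [0, 0, 1, 0, 1, 1]
toy_auc = auc(toy_scores, toy_died)
print(round(toy_auc, 3), accuracy_band(toy_auc))
```

The pairwise loop is quadratic in cohort size, which is acceptable for an illustration but not for a cohort of thousands of patients; rank-based formulas give the same value more efficiently.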

RESULTS

Between January 1, 2020, and February 14, 2021, 101,985 patients (109,285 inpatient encounters) with COVID-19 were admitted to 86 U.S. healthcare systems. Only 930 encounters (0.85%) had neither an ICD diagnosis code nor a positive PCR and were selected based on positive serology. Of the 101,985 patients, 24,908 (24%) were mechanically ventilated; 9,043 patients who were mechanically ventilated within 24 hours of admission and 743 patients who were DNR at admission were excluded, leaving 15,122 patients in the final analysis, divided into a derivation (n = 10,085) and a validation (n = 5,037) cohort (Fig. 1). Of the 15,122 ventilated patients, 7,568 (50.0%) died or were discharged to hospice; among the 7,554 survivors, 501 (6.6%) were discharged to LTAC. The mean preintubation SOFA score was 2.77 (sd, 1.91). The respiratory SOFA subscore had a mean of 1.39 compared with 0.53 for hepatic, 0.29 for cardiovascular, 0.24 for neurologic, 0.23 for coagulation, and 0.09 for renal. A density plot illustrating SOFA scores for re-admissions is presented in Supplementary Figure 1 (http://links.lww.com/CCM/H88). Patients who died or were discharged to hospice tended to be older and male and to have higher SOFA and Elixhauser scores (Table 1).

Figure 1.

Study flowchart depicting the exclusion of patients with do-not-resuscitate (DNR)/do-not-intubate (DNI) status and those on mechanical ventilation within 24 hr of admission; 15,122 patients were included in the final analysis.
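The cohort flow can be checked with simple arithmetic. The short Python sketch below uses only the counts reported in the Results text; the variable names are illustrative, and treating the two exclusion groups as non-overlapping is an assumption (it is consistent with the reported totals).

```python
# Counts as reported in the Results text.
admitted = 101_985
ventilated = 24_908
ventilated_within_24h = 9_043   # excluded
dnr_at_admission = 743          # excluded
derivation_n, validation_n = 10_085, 5_037
died_or_hospice = 7_568
discharged_to_ltac = 501

# Assumption: the two exclusion groups do not overlap.
final_cohort = ventilated - ventilated_within_24h - dnr_at_admission
survivors = final_cohort - died_or_hospice

pct_ventilated = 100 * ventilated / admitted
pct_died = 100 * died_or_hospice / final_cohort
pct_ltac = 100 * discharged_to_ltac / survivors

print(final_cohort, survivors)
print(round(pct_ventilated), round(pct_died, 1), round(pct_ltac, 1))
```

The derivation and validation cohort sizes also sum to the final cohort, which the assertion `derivation_n + validation_n == final_cohort` would confirm.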

TABLE 1.

Patient Characteristics

Characteristics	Overall (n = 15,122)	Alive (n = 7,554)	Dead (n = 7,568)	P
Age, mean (sd)	64.47 (15.28)	58.89 (15.74)	70.05 (12.54)	< 0.001
< 65 (%)	6,914 (46)	4,599 (61)	2,315 (31)	< 0.001
≥ 65 (%)	8,208 (54)	2,955 (39)	5,253 (69)
Diabetes mellitus (%)	3,292 (22)	1,564 (21)	1,728 (23)	0.002
Hypertension (%)	3,813 (25)	1,854 (25)	1,959 (26)	0.06
Obesity (%)	1,455 (9.6)	908 (12)	547 (7.2)	< 0.001
Sex, female %	10,010 (40)	5,129 (41)	4,881 (39)	< 0.001
Preintubation SOFA score, mean (sd)	2.77 (1.91)	2.26 (1.63)	3.29 (2.03)	< 0.001
SOFA subscores, mean (sd)
Respiratory	1.39 (0.82)	1.27 (0.86)	1.51 (0.76)	< 0.001
Coagulation	0.23 (0.56)	0.17 (0.47)	0.29 (0.64)	< 0.001
Hepatic	0.53 (1.01)	0.36 (0.86)	0.70 (1.11)	< 0.001
Cardiovascular	0.29 (0.57)	0.26 (0.55)	0.32 (0.60)	< 0.001
Neurologic	0.24 (0.61)	0.14 (0.46)	0.34 (0.72)	< 0.001
Renal	0.09 (0.39)	0.06 (0.30)	0.12 (0.46)	< 0.001
Elixhauser score mean (sd)	1.39 (1.65)	1.30 (1.60)	1.48 (1.70)	0.001

SOFA = Sequential Organ Failure Assessment.

Discrimination of Mortality Risk by SOFA Score

Using logistic regression models, the SOFA score demonstrated poor discriminant accuracy for mortality in mechanically ventilated patients (AUC, 0.66; 95% CI, 0.64–0.67). Discriminant accuracy was even poorer using categorized SOFA scores (AUC, 0.54; 95% CI, 0.54–0.55) (Table 2) and SOFA as a dichotomous variable with score greater than 11 (AUC, 0.50; 95% CI, 0.50–0.50). Discrimination of the respiratory SOFA subscore alone for mortality was also poor (AUC, 0.67; 95% CI, 0.65–0.68), as was that of each of the other SOFA subscores (Supplementary Index Table 1, http://links.lww.com/CCM/H88). In addition, the discriminant accuracy of SOFA score at admission was poor (AUC, 0.61; 95% CI, 0.60–0.62), as was that of the change in SOFA score from admission to intubation (AUC, 0.54; 95% CI, 0.52–0.55; Supplementary Index Table 1, http://links.lww.com/CCM/H88). Results were similar across the derivation and validation cohorts.

TABLE 2.

Area Under the Receiver Operating Characteristic Curve for Prediction Models on Both Derivation and Validation Cohorts

Model	Variable	AUC (95% CI), Derivation Cohort (n = 10,085)	AUC (95% CI), Validation Cohort (n = 5,037)	Logistic Regression, OR (95% CI)[a]
SOFA[b]	SOFA	0.66 (0.65–0.67)	0.66 (0.64–0.67)	1.39 (1.35–1.42)
SOFA categories[c]	≥ 6 and < 9	0.55 (0.54–0.55)	0.54 (0.54–0.55)	3.42 (2.89–4.06)
	≥ 9 and < 12			4.73 (3–7.82)
	≥ 12			5.89 (1.96–25.32)
Age	Age	0.71 (0.7–0.72)	0.71 (0.69–0.72)	1.06 (1.06–1.06)
Age + SOFA categories	Age	0.73 (0.72–0.74)	0.72 (0.71–0.73)	1.06 (1.05–1.06)
	≥ 6 and < 9			3.18 (2.67–3.82)
	≥ 9 and < 12			4.92 (3.05–8.3)
	≥ 12			6.66 (2.13–29.31)
Age + SOFA	Age	0.75 (0.74–0.76)	0.74 (0.73–0.76)	1.06 (1.05–1.06)
	SOFA			1.33 (1.3–1.36)
SOFA + age + covariates[d]	SOFA	0.75 (0.74–0.76)	0.74 (0.73–0.76)	1.33 (1.29–1.36)
	Age			1.06 (1.05–1.06)
	Gender (male vs female)			1.15 (1.05–1.25)
	Obesity			0.92 (0.79–1.08)
	Diabetes			1.18 (1.05–1.32)
	Hypertension			0.84 (0.75–0.93)
SOFA + age + Elixhauser score	Age	0.75 (0.74–0.76)	0.74 (0.73–0.76)	1.06 (1.05–1.06)
	Elixhauser score			1 (0.97–1.03)
	SOFA			1.33 (1.3–1.36)
SOFA + Elixhauser score	SOFA	0.66 (0.65–0.67)	0.66 (0.65–0.68)	1.38 (1.35–1.42)
	Elixhauser score			1.04 (1.01–1.07)
Categories SOFA + Elixhauser score	≥ 6 and < 9	0.57 (0.56–0.58)	0.56 (0.55–0.58)	3.33 (2.82–3.96)
	≥ 9 and < 12			4.59 (2.91–7.59)
	≥ 12			5.75 (1.91–24.76)
	Elixhauser score			1.06 (1.03–1.09)

AUC = area under the receiver operating curve, OR = odds ratio, SOFA = Sequential Organ Failure Assessment.

[a] Outcome variable has two levels: deceased and discharged, with discharged serving as the reference level.

[b] SOFA and all components are the scores recorded within the 24 hr prior to the start of ventilation.

[c] The SOFA category variable is derived from the SOFA score with cutoffs of < 6, ≥ 6 and < 9, ≥ 9 and < 12, and ≥ 12. In the logistic regression model, SOFA < 6 serves as the reference group.

[d] Covariates include age, gender, obesity, hypertension, and diabetes.
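The odds ratios in Table 2 for continuous predictors are per one-unit change (one SOFA point, one year of age). Because log-odds are linear in the predictor in a logistic regression model, the odds ratio for a larger change compounds multiplicatively. The Python sketch below illustrates this with the rounded single-variable estimates from the table; the function name `scaled_odds_ratio` and the chosen increments (3 SOFA points, 10 years) are illustrative assumptions, and results inherit the rounding of the published values.

```python
import math


def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, given the per-unit odds ratio.

    Equivalent to or_per_unit ** units, written via the log-odds to make
    the linearity assumption of logistic regression explicit.
    """
    if or_per_unit <= 0:
        raise ValueError("odds ratio must be positive")
    return math.exp(units * math.log(or_per_unit))


# Rounded per-unit odds ratios from the single-variable models in Table 2.
sofa_3_points = scaled_odds_ratio(1.39, 3)   # 3-point higher SOFA score
age_10_years = scaled_odds_ratio(1.06, 10)   # 10 years older

print(round(sofa_3_points, 2), round(age_10_years, 2))
```

Under these assumptions, a 3-point difference in SOFA score corresponds to roughly 2.7-fold higher odds of death, and a 10-year difference in age to roughly 1.8-fold higher odds.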

Discrimination by Age and Comorbidities, Alone and in Combination With SOFA Score

Age alone demonstrated better discriminant accuracy for mortality than SOFA score (AUC, 0.71; 95% CI, 0.69–0.72). Discriminant accuracy for mortality improved upon addition of age to both the continuous SOFA score model (AUC, 0.74; 95% CI, 0.73–0.76) and the categorized SOFA score model (AUC, 0.72; 95% CI, 0.71–0.73). The addition of other covariates (gender, obesity, diabetes, hypertension, or Elixhauser score) did not meaningfully improve discrimination beyond that offered by SOFA + age. Models without age, that is, those utilizing SOFA and comorbidities (SOFA + Elixhauser score), had poor discriminant accuracy for both continuous SOFA (AUC, 0.66; 95% CI, 0.65–0.68) and categorized SOFA scores (AUC, 0.56; 95% CI, 0.55–0.58) (Table 2).

Sensitivity Analysis

Sensitivity analyses separately excluding patients with CKD and ESRD, patients with missing SOFA score values after our substitution method (imputed as normal, i.e., 0), and those who had procedure codes for a major operating room procedure and intubation on the same day all yielded results consistent with our primary analysis. Using tree-based models as a sensitivity analysis generated similar results, with slightly poorer discriminant accuracy than logistic regression models (Supplementary Index Tables 2–5, http://links.lww.com/CCM/H88).

Secondary Outcome

Using logistic regression in the validation cohort, preintubation SOFA displayed an AUC of 0.53 (95% CI, 0.49–0.57; odds ratio, 0.99 [95% CI, 0.93–1.04]) for predicting disposition to LTAC.

Model Calibration

The calibration belt for continuous SOFA score (Fig. 2A) demonstrated significant miscalibration, overestimating mortality risk for patients with observed mortality of 81–95%. The SOFA + age model was well calibrated (Fig. 2B). All other models also showed good calibration (Supplementary Fig. 2, http://links.lww.com/CCM/H88).
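Calibration asks whether predicted risks match observed event rates. The Python sketch below computes the Hosmer-Lemeshow statistic, one of the two calibration checks named in the Methods, by grouping patients on predicted risk and comparing expected with observed deaths in each group. It is a simplified illustration, not the study's code: the function name `hosmer_lemeshow`, the grouping rule, and the toy data are assumptions, the calibration belt itself is not implemented, and converting the statistic to a p value (conventionally against a chi-square distribution with groups minus 2 degrees of freedom) is left out because it needs a statistics library.

```python
from typing import List, Sequence, Tuple


def hosmer_lemeshow(
    predicted: Sequence[float], died: Sequence[int], groups: int = 10
) -> Tuple[float, List[Tuple[float, float, int]]]:
    """Return the Hosmer-Lemeshow statistic and, per risk group,
    (expected deaths, observed deaths, group size)."""
    pairs = sorted(zip(predicted, died))  # order patients by predicted risk
    n = len(pairs)
    if groups < 1 or n < groups:
        raise ValueError("need at least one patient per group")
    statistic = 0.0
    table: List[Tuple[float, float, int]] = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = float(sum(d for _, d in chunk))
        mean_p = expected / size
        # Skip groups whose mean prediction is exactly 0 or 1 (zero variance).
        if 0.0 < mean_p < 1.0:
            statistic += (observed - expected) ** 2 / (
                size * mean_p * (1.0 - mean_p)
            )
        table.append((expected, observed, size))
    return statistic, table


# Invented toy data: a well-calibrated model (20% and 80% risk groups).
well_pred = [0.2] * 5 + [0.8] * 5
well_died = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
well_stat, _ = hosmer_lemeshow(well_pred, well_died, groups=2)

# A model that overestimates risk: predicts 80% but only 2 of 5 die.
over_stat, _ = hosmer_lemeshow([0.8] * 5, [1, 1, 0, 0, 0], groups=1)
print(well_stat, over_stat)
```

A statistic near zero indicates close agreement between expected and observed deaths; larger values indicate miscalibration, such as the overestimation in the second toy example.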

Figure 2.

Calibration belts for mortality prediction scores. A, Continuous Sequential Organ Failure Assessment (SOFA) score. B, SOFA score + age. The range of values for which the predicted mortality overestimates mortality (the observed mortality values are significantly under the bisector) or underestimates mortality (observed mortality lies above the bisector) based on the shaded 95% CI is reported at the bottom of each graph.

Nationwide Distribution of SOFA Score Implementation and Weightage in State CSC Ventilation Triage Algorithms

Our search revealed 36 states with state-adopted CSC guidelines in place (Fig. 3). Twenty-six of these guidelines are COVID-19-specific, and 10 are adopted from prior influenza pandemics. The SOFA score features in the triage protocols of 31 of these 36 CSC guidelines (86%). Of the 31 guidelines that feature the SOFA score, 25 (81%) are heavily reliant on SOFA, with all utilizing the categorized SOFA score (12 of the protocols propose categorized SOFA as the only variable for ventilator triage). Six CSC guidelines place low reliance on the SOFA score, excluding it as a main focus of their triage algorithm in favor of greater emphasis on clinical judgment (Supplementary Index Table 6, http://links.lww.com/CCM/H88).

Figure 3.

Heat map illustrating the availability of crisis standards of care (CSC) protocols in the United States by state and degree of reliance on Sequential Organ Failure Assessment (SOFA) score to guide scarce resource allocation. States with COVID-specific guidelines are underlined.
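The prevalence figures in this section can be reproduced from the reported counts. The Python sketch below is a simple consistency check; the variable names are illustrative, and the count of 13 guidelines that place highest weight on SOFA or combine it with one other variable is taken from the abstract.

```python
# Counts as reported in the abstract and in this section.
states_with_guidelines = 36
feature_sofa = 31
heavy_reliance = 25
sofa_only = 12
sofa_highest_weight_or_plus_one = 13  # from the abstract
low_reliance = 6
covid_specific, pre_covid = 26, 10


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


no_reliance = states_with_guidelines - feature_sofa

print(pct(feature_sofa, states_with_guidelines))  # share featuring SOFA
print(pct(heavy_reliance, feature_sofa))          # share heavily reliant
print(no_reliance)                                # guidelines without SOFA
```

The heavy-reliance and low-reliance counts together account for all 31 guidelines that feature the SOFA score, leaving 5 guidelines that do not use it.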

DISCUSSION

External validation of prediction models is a vital step that ensures models are reproducible, generalizable, and reliable for application in real-world decision-making. Using a cohort of over 15,000 mechanically ventilated COVID-19 patients at 86 U.S. healthcare systems, our study externally validates findings from a prior hypothesis-generating study of 675 ventilated COVID-19 patients and offers further confirmation that preintubation SOFA score is a poor predictor of inhospital mortality in COVID-19 patients. Despite a preponderance of respiratory failure in COVID-19, our study also found that the discriminant accuracy for inhospital mortality remains poor when considering the respiratory subscore alone. The combination of SOFA score and age provided moderate mortality prediction; however, the addition of select common comorbidities or aggregate comorbidity burden did not meaningfully improve the predictive accuracy. Poor prediction of mortality was consistently observed across various parameterizations of the SOFA score (including as a continuous variable, categorical variable, component score, score change, and across different time periods) and in multiple sensitivity analyses. Even among ventilated patients with COVID-19 who survived hospitalization, SOFA score was a poor predictor of requiring LTAC. Furthermore, we demonstrated that 81% of current state CSC triage algorithms using SOFA score propose heavy reliance on the categorized SOFA score to assign patients into priority tiers for receipt of mechanical ventilation despite its poor predictive accuracy (AUC, 0.54). Our findings build upon observations of previous studies by utilizing expansive, well-harmonized EHR data from a large cohort of U.S. hospitals and suggest that significant reliance on the SOFA score in ventilator triage protocols warrants reappraisal (16).
The SOFA score, as acknowledged by its developers, was not intended to predict outcomes but to provide a quantitative description of the degree of organ dysfunction in critically ill patients (40, 41). As the degree of organ dysfunction is associated with mortality, several studies have validated its predictive accuracy for mortality in critically ill patients (14, 15, 42, 43). However, these populations may not be generalizable to patients with COVID-19, which primarily presents with respiratory failure and less variation in SOFA scores (16). In addition, many of these studies used distinct SOFA criteria often not represented in triage protocols, such as a change in SOFA score, that were assessed at snapshots in time (not necessarily at the point of triage). A recent study evaluating the predictive accuracy for inhospital mortality of the SOFA score in patients with sepsis and acute respiratory failure prior to the COVID-19 pandemic also revealed poor discriminant accuracy (44). Concerningly, this study also found that using the SOFA score for prognostic evaluation may lead to racial disparities in resource allocation by overestimating mortality for Black patients, potentially diverting scarce resources from this population without warrant. We recognize the complexity that underlies decisions around continued use of current CSC models. We also recognize that these decisions are often not solely predicated on mortality but take into consideration other outcomes including poor long-term functional status and risk of high resource utilization. Nonetheless, given the limitations and uncertainty surrounding the adequacy of the SOFA score for ventilator triage decisions, more robust and reliable strategies for prognostication are needed. 
Ventilator triage has yet to be performed on a significant scale in the United States, even in overwhelmed hospital systems. However, with the recent surge of the Omicron variant, hospitalizations are currently the highest they have been during the pandemic, and ICUs are nearing capacity in several U.S. states (45). As such, we must be adequately prepared for the possibility of needing ventilator triage in the future, especially if we are met with a variant with high transmissibility and a propensity for both evading immunity and causing severe disease. Twelve U.S. states rely exclusively on categorical SOFA score for ventilator triage. Our findings raise questions about the appropriateness of ventilator triage decisions that rely solely on the SOFA score to gauge short-term survivability. Our results indicate that the combination of SOFA and age provides better prediction of mortality than SOFA alone. However, the use of age in CSC triage algorithms as a primary or even as a tie-breaker element has been controversial; on the one hand, its inclusion may bias against the elderly (46), and on the other hand, it may offer everyone an equal opportunity to achieve a normal lifespan (47). Comprehensive evaluation of the magnitude of implicit triage decisions prevailing during a surge and their impact on the observed outcome in the elderly will further inform this decision. Some CSC guidelines do acknowledge the limitations of the SOFA score and place less emphasis on its role in the allocation of scarce resources (4, 48). Future studies should aim to develop more accurate and pragmatic triage protocols that incorporate novel predictors of mortality in patients with COVID-19. These protocols should also aim to balance maximizing survival with the equitable distribution of resources across society to mitigate disparities in health outcomes (49, 50).
In addition, it will be important to continue to elicit informed public opinion and engage in surveys and focus groups to ensure that triage decisions remain patient-centered. This study has several strengths. We have externally validated the model used in a smaller, retrospective cohort study. The use of derivation and validation cohorts allowed for proper internal validation of our models. Model performance was further evaluated by assessing model calibration. The results of this study were robust to sensitivity analyses utilizing tree-based models, different parameterizations of SOFA score, and excluding patients in whom chronically elevated creatinine might bias model performance. These facets are congruent with recent best practice statements concerning the development and reporting of predictive models (51). Although prior studies have demonstrated the poor predictive accuracy of SOFA for mortality, our study is directly applicable to a time period and population more relevant to triage scenarios. Because ventilator triage was not sizably conducted in the United States during the time of our study, our findings are unlikely to be influenced by actual triage-based allocation of mechanical ventilation. There are limitations to this study. Our cohort might not be nationally representative, limiting the generalizability of our findings. As this is a retrospective study evaluating EHR data, the exact time when mechanical ventilation was initiated on a given day cannot be certain. The accuracy and intensity of preexisting comorbid conditions may have been limited using diagnosis codes to identify them. Patient outcomes beyond hospital discharge were not assessed. About 40% of patients had a component of the SOFA score imputed as missing-as-normal. However, a sensitivity analysis excluding these patients remained consistent with our primary analysis, and a recent study demonstrating this technique provides similar results to other imputation techniques (52). 
More complex models with additional variables (including inflammatory biomarkers) may have provided better predictive capacity; however, our objective was to examine the adequacy of select elements commonly found in existing CSC triage algorithms.
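
The missing-as-normal imputation and the accompanying sensitivity analysis described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the component labels, and the six-patient toy cohort are hypothetical and invented purely for illustration. It assumes only the standard structure of the SOFA score (six organ-system subscores of 0 to 4 each) and computes the AUC as the probability that a randomly chosen non-survivor has a higher score than a randomly chosen survivor, with ties counted as one half.

```python
from typing import Dict, Optional, Sequence, Tuple

# The six SOFA organ systems; each subscore ranges from 0 (normal) to 4.
# The labels are illustrative names, not identifiers from the study.
SOFA_COMPONENTS = ("respiratory", "coagulation", "liver",
                   "cardiovascular", "cns", "renal")


def sofa_missing_as_normal(subscores: Dict[str, Optional[int]]) -> Tuple[int, bool]:
    """Return (total SOFA score, whether any component was imputed).

    A missing component (absent or None) is treated as normal, i.e. 0.
    """
    total, imputed = 0, False
    for name in SOFA_COMPONENTS:
        value = subscores.get(name)
        if value is None:
            value, imputed = 0, True
        total += value
    return total, imputed


def auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """AUC via pairwise comparison of non-survivors against survivors."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: subscores in SOFA_COMPONENTS order, then outcome.
# None marks a component that was never measured. These are NOT study data.
toy_cohort = [
    ((3, 1, 0, 1, 0, 2), True),
    ((2, 0, None, 0, 0, 1), False),
    ((4, 2, 1, 3, 1, 2), True),
    ((2, None, None, 1, 0, 0), True),
    ((3, 0, 0, 0, 0, 0), False),
    ((2, 1, 0, 1, 0, 1), False),
]

rows = []
for values, outcome in toy_cohort:
    total, imputed = sofa_missing_as_normal(dict(zip(SOFA_COMPONENTS, values)))
    rows.append((total, imputed, outcome))

# Primary analysis: all patients, missing components imputed as normal.
primary_auc = auc([t for t, _, _ in rows], [d for _, _, d in rows])

# Sensitivity analysis: exclude patients with any imputed component.
complete = [(t, d) for t, imputed, d in rows if not imputed]
sensitivity_auc = auc([t for t, _ in complete], [d for _, d in complete])

print(f"primary AUC: {primary_auc:.2f}, complete-case AUC: {sensitivity_auc:.2f}")
```

Comparing the two AUC values mirrors the logic of the sensitivity analysis: if excluding patients with imputed components leaves discrimination essentially unchanged, the imputation is unlikely to be driving the result.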

CONCLUSIONS

Among hospitalized patients with COVID-19, the SOFA score measured within 24 hours before intubation shows inadequate discriminant accuracy for inhospital mortality. Caution should be taken in implementing the SOFA score in mechanical ventilator triage protocols for COVID-19, especially as the sole or most heavily weighted determinant, as it is in CSC guidelines currently endorsed by many U.S. states. More research is required to develop practical, accurate, and patient-centered scoring systems for inclusion in mechanical ventilator triage protocols for patients with COVID-19.

ACKNOWLEDGMENTS

We thank Mariam Noorulhuda, Christine Grady, and David Wendler of the National Institutes of Health (NIH) Bioethics Consultation Service for their thoughtful insight and guidance with the development of this article. This work used the computational resources of the NIH High Performance Computing Biowulf cluster (http://hpc.nih.gov).

37 in total

1. Prognostic Accuracy of the SOFA Score, SIRS Criteria, and qSOFA Score for In-Hospital Mortality Among Adults With Suspected Infection Admitted to the Intensive Care Unit.

Authors: Eamon P Raith; Andrew A Udy; Michael Bailey; Steven McGloughlin; Christopher MacIsaac; Rinaldo Bellomo; David V Pilcher
Journal: JAMA Date: 2017-01-17 Impact factor: 56.272

2. US State Government Crisis Standards of Care Guidelines: Implications for Patients With Cancer.

Authors: Andrew Hantel; Jonathan M Marron; Michael Casey; Sharyn Kurtz; Emily Magnavita; Gregory A Abel
Journal: JAMA Oncol Date: 2021-02-01 Impact factor: 31.777

3. Comparison of the SpO2/FIO2 ratio and the PaO2/FIO2 ratio in patients with acute lung injury or ARDS.

Authors: Todd W Rice; Arthur P Wheeler; Gordon R Bernard; Douglas L Hayden; David A Schoenfeld; Lorraine B Ware
Journal: Chest Date: 2007-06-15 Impact factor: 9.410

4. Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19.

Authors: Wenhua Liang; Hengrui Liang; Limin Ou; Binfeng Chen; Ailan Chen; Caichen Li; Yimin Li; Weijie Guan; Ling Sang; Jiatao Lu; Yuanda Xu; Guoqiang Chen; Haiyan Guo; Jun Guo; Zisheng Chen; Yi Zhao; Shiyue Li; Nuofu Zhang; Nanshan Zhong; Jianxing He
Journal: JAMA Intern Med Date: 2020-08-01 Impact factor: 21.873

5. A consequentialist argument for considering age in triage decisions during the coronavirus pandemic.

Authors: Matthew C Altman
Journal: Bioethics Date: 2021-03-08 Impact factor: 1.898

6. Describing organ dysfunction in the intensive care unit: a cohort study of 20,000 patients.

Authors: Andrea Soo; Danny J Zuege; Gordon H Fick; Daniel J Niven; Luc R Berthiaume; Henry T Stelfox; Christopher J Doig
Journal: Crit Care Date: 2019-05-23 Impact factor: 9.097

7. Predicting COVID-19 mortality with electronic medical records.

Authors: Hossein Estiri; Zachary H Strasser; Jeffy G Klann; Pourandokht Naseri; Kavishwar B Wagholikar; Shawn N Murphy
Journal: NPJ Digit Med Date: 2021-02-04

8. Prognostic model to identify and quantify risk factors for mortality among hospitalised patients with COVID-19 in the USA.

Authors: Devin Incerti; Shemra Rizzo; Xiao Li; Lisa Lindsay; Vincent Yau; Dan Keebler; Jenny Chia; Larry Tsai
Journal: BMJ Open Date: 2021-04-07 Impact factor: 2.692

9. Association Between Caseload Surge and COVID-19 Survival in 558 U.S. Hospitals, March to August 2020.

Authors: Sameer S Kadri; Junfeng Sun; Alexander Lawandi; Jeffrey R Strich; Lindsay M Busch; Michael Keller; Ahmed Babiker; Christina Yek; Seidu Malik; Janell Krack; John P Dekker; Alicen B Spaulding; Emily Ricotta; John H Powers III; Chanu Rhee; Michael Klompas; Janhavi Athale; Tegan K Boehmer; Adi V Gundlapalli; William Bentley; S Deblina Datta; Robert L Danner; Cumhur Y Demirkale; Sarah Warner
Journal: Ann Intern Med Date: 2021-07-06 Impact factor: 25.391

10. Development and Reporting of Prediction Models: Guidance for Authors From Editors of Respiratory, Sleep, and Critical Care Journals.

Authors: Daniel E Leisman; Michael O Harhay; David J Lederer; Michael Abramson; Alex A Adjei; Jan Bakker; Zuhair K Ballas; Esther Barreiro; Scott C Bell; Rinaldo Bellomo; Jonathan A Bernstein; Richard D Branson; Vito Brusasco; James D Chalmers; Sudhansu Chokroverty; Giuseppe Citerio; Nancy A Collop; Colin R Cooke; James D Crapo; Gavin Donaldson; Dominic A Fitzgerald; Emma Grainger; Lauren Hale; Felix J Herth; Patrick M Kochanek; Guy Marks; J Randall Moorman; David E Ost; Michael Schatz; Aziz Sheikh; Alan R Smyth; Iain Stewart; Paul W Stewart; Erik R Swenson; Ronald Szymusiak; Jean-Louis Teboul; Jean-Louis Vincent; Jadwiga A Wedzicha; David M Maslove
Journal: Crit Care Med Date: 2020-05 Impact factor: 7.598

2 in total

1. Predictive Algorithms for a Crisis.

Authors: Claudia L Sotillo; Idalid Franco; Alexander F Arriaga
Journal: Crit Care Med Date: 2022-06-13 Impact factor: 9.296

2. Development and Internal Validation of a New Prognostic Model Powered to Predict 28-Day All-Cause Mortality in ICU COVID-19 Patients-The COVID-SOFA Score.

Authors: Emanuel Moisa; Dan Corneci; Mihai Ionut Negutu; Cristina Raluca Filimon; Andreea Serbu; Mihai Popescu; Silvius Negoita; Ioana Marina Grintescu
Journal: J Clin Med Date: 2022-07-18 Impact factor: 4.964
