Between-trial heterogeneity in ARDS research.

J Juschten, P R Tuinman, T Guo, N P Juffermans, M J Schultz, S A Loer, A R J Girbes, H J de Grooth.

Abstract

PURPOSE: Most randomized controlled trials (RCTs) in patients with acute respiratory distress syndrome (ARDS) revealed indeterminate or conflicting study results. We aimed to systematically evaluate between-trial heterogeneity in reporting standards and trial outcome.
METHODS: A systematic review of RCTs published between 2000 and 2019 was performed including adult ARDS patients receiving lung-protective ventilation. A random-effects meta-regression model was applied to quantify heterogeneity (non-random variability) and to evaluate trial and patient characteristics as sources of heterogeneity.
RESULTS: In total, 67 RCTs were included. The 28-day control-group mortality rate ranged from 10 to 67% with large non-random heterogeneity (I2 = 88%, p < 0.0001). Reported baseline patient characteristics explained some of the outcome heterogeneity, but only six trials (9%) reported all four independently predictive variables (mean age, mean lung injury score, mean plateau pressure and mean arterial pH). The 28-day control group mortality adjusted for patient characteristics (i.e. the residual heterogeneity) ranged from 18 to 45%. Trials with significant benefit in the primary outcome reported a higher control group mortality than trials with an indeterminate outcome or harm (mean 28-day control group mortality: 44% vs. 28%; p = 0.001).
CONCLUSION: Among ARDS RCTs in the lung-protective ventilation era, there was large variability in the description of baseline characteristics and significant unexplainable heterogeneity in 28-day control group mortality. These findings signify problems with the generalizability of ARDS research and underline the urgent need for standardized reporting of trial and baseline characteristics.

Keywords: ARDS; Critical Care Research; Heterogeneity

Year: 2021 PMID: 33713156 PMCID: PMC7955690 DOI: 10.1007/s00134-021-06370-w

Source DB: PubMed Journal: Intensive Care Med ISSN: 0342-4642 Impact factor: 17.440

Introduction

The Acute Respiratory Distress Syndrome (ARDS) is a clinically and biologically heterogeneous syndrome that contributes significantly to morbidity and mortality in critically ill patients [1-3]. ARDS has long been a focal point of critical care research, but hundreds of randomized controlled trials (RCTs) have led to merely two guideline recommendations supported by high-level evidence: low tidal volume ventilation and prone positioning in patients with severe ARDS [4, 5]. The paucity of high-level evidence is due to indeterminate and conflicting trial results. Many RCTs in the ARDS population report an indeterminate outcome—detecting neither significant benefit nor harm of investigated therapeutic strategies [6]. Several other large RCTs demonstrated contradictory results, with seemingly beneficial therapies being found ineffective in subsequent trials [7, 8]. It has become clear that treatment effects of interventions in ARDS are highly dependent on the details of the intervention, as small variations of the same treatment have led to disparate results. For example, different definitions of ‘low’ and ‘high’ tidal volumes [9-13] or differences in neuromuscular blockade and sedation [7, 8, 14] have led to different trial outcomes. On the other hand, it is much less clear how differences between methodological trial characteristics and patient characteristics affect study outcomes. Between-trial heterogeneity refers to the non-random variation in treatment effect of an intervention due to methodological or clinical differences between patient populations. Unmeasured or unexplainable heterogeneity—both among patients in a single trial and between trial populations—may adversely affect the validity and generalizability of study results (see: ‘Panel: A practical example of the problem with unexplainable between-trial heterogeneity’) [15, 16]. 
In this study, we set out to quantify the consistency of reporting baseline characteristics and to measure between-trial heterogeneity in 28-day control group mortality among all ARDS RCTs in the lung-protective ventilation era. Our aim was to determine to what extent between-trial differences in control group mortality could be explained by differences in trial and patient characteristics. We hypothesized that between-trial heterogeneity would be large and trial populations often poorly characterized, leading to a discrepancy between inclusion criteria and patient characteristics on the one hand, and control group outcomes on the other hand.

Panel: A practical example of the problem with unexplainable between-trial heterogeneity

We note two high-profile trials published in the same journal issue [17, 18]. Both trials investigated high-frequency oscillatory ventilation in the same target population of moderate to severe ARDS patients, but they reported a different effect on mortality. Judging by the control group patient characteristics, there were clinically meaningful differences between the trial populations: there was a 32% relative difference in baseline Acute Physiology and Chronic Health Evaluation (APACHE II) scores (22 vs. 29 points) and there was a 41% relative difference in control group mortality (41% vs 29% at 30 days). But, paradoxically, the trial with the lowest baseline mean APACHE II score had the highest control group mortality. This makes the interpretation of the conflicting trial results exceedingly difficult. One trial demonstrated significant harm from the intervention while the other trial found no effect. Was the difference in treatment effect due to subtle unreported differences in the intervention, due to unreported differences in the patient populations, or due to differences in the standard of care? Were the patients more severely ill at baseline in the trial with the highest APACHE II score or in the trial with the highest control group mortality rate? It is clear that unexplainable outcome heterogeneity reduces the generalizability of intervention effects to the global ARDS population [19].
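The relative differences quoted in the panel can be checked with a few lines of arithmetic. The sketch below assumes that "relative difference" means the difference divided by the lower of the two values; the helper name `relative_difference` is illustrative and does not come from the paper.

```python
def relative_difference(higher: float, lower: float) -> float:
    """Difference between two values, expressed relative to the lower one."""
    return (higher - lower) / lower


# Values quoted in the panel text (assumed convention: relative to the lower value).
apache_rel = relative_difference(29, 22)     # baseline APACHE II, points
mortality_rel = relative_difference(41, 29)  # control group mortality, %

print(f"APACHE II: {apache_rel:.0%}, control group mortality: {mortality_rel:.0%}")
```

Under this convention both figures in the panel are reproduced, while the trial with the lower APACHE II score is the one with the higher control group mortality.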

Methods

This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA). The study protocol and statistical analysis plan were registered online at the International Prospective Register of Systematic Reviews (PROSPERO, registration number: CRD42020161809).

Systematic search

A comprehensive search was conducted in MEDLINE, Embase and Scopus for randomized clinical trials including adult ARDS patients published from January 1st, 2000 until January 31st, 2020. Eligible studies included a) adult ARDS patients diagnosed according to the AECC guidelines from 1994 [20] or the Berlin definition from 2012 [21], subjected to b) invasive lung-protective mechanical ventilation according to the ARDSnet protocol [11], or reporting a tidal volume of ≤ 8 ml/kg. Included studies were c) randomized clinical trials reporting on d) 28-day, hospital, intensive care unit (ICU) or 60-day mortality. There were no restrictions with regard to the intervention or phase of the study. More details about the review process are provided in the supplementary appendix, Sect. 1.1 and 1.2.

Outcome measures

For each study, we recorded trial characteristics, intervention, inclusion and exclusion criteria, mean patient baseline characteristics and mortality outcomes. Primary outcome was the between-trial heterogeneity based on the 28-day control group mortality rate (I2). The 28-day control group mortality rate reflects the baseline risk of death of a patient population of an individual trial. Secondary outcomes included associations between 28-day control group mortality and characteristics of trial design and outcome, inclusion- and exclusion criteria, as well as baseline characteristics.

Estimation of 28-day control group and intervention group mortality

All analyses investigating heterogeneity were conducted using the 28-day control group mortality rate. For trials reporting solely hospital, ICU or 60-day mortality, 28-day control group mortality was estimated with linear regression using data from trials reporting both 28-day mortality and any of the other mortality outcomes [22]. The 28-day intervention group mortality was estimated in the same manner for analyses investigating differences between control and intervention group mortality. A sensitivity analysis was conducted using only the trials reporting 28-day control group mortality.
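The estimation step can be pictured as an ordinary least-squares line fitted on trials that report both outcomes, then applied to a trial that reports only the other outcome. The sketch below is illustrative only: the paired mortality rates are hypothetical, and the authors' actual equations are the ones given in their eTable 1.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL trials reporting both hospital and 28-day control group mortality.
hospital = [0.25, 0.32, 0.40, 0.48, 0.55]
day28 = [0.22, 0.29, 0.36, 0.43, 0.50]

intercept, slope = fit_line(hospital, day28)

# Estimate 28-day mortality for a (hypothetical) trial reporting only hospital mortality.
estimated_28d = intercept + slope * 0.38
print(f"estimated 28-day mortality: {estimated_28d:.3f}")
```

A separate line would be fitted for each alternative outcome (hospital, ICU, 60-day), which is how the text describes the procedure.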

Estimation and quantification of between-trial heterogeneity

The 28-day control group mortality rates across studies were analyzed using a random-effects meta-regression model with the log odds of mortality as the dependent variable. Each individual trial was weighted by the inverse of the sampling variance of the mortality rates. A maximum likelihood estimator was applied to estimate the mean mortality (random-effects pooled estimate), the between-study standard deviation due to heterogeneity (τ), and heterogeneity (the percentage of variation in control group mortality due to heterogeneity rather than chance, I2). To make heterogeneity interpretable in a clinically meaningful manner, we calculated the 95% prediction interval. The prediction interval represents the distribution of estimated underlying mortality after correction for random chance and predictive covariates (for further details, see the methods section of reference [22]). This model and its corresponding outcomes were used to present the distribution of 28-day control group mortality between all trials, and to investigate differences between individual trial characteristics.
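To make these quantities concrete, the following sketch pools control group mortality on the log-odds scale and derives τ, I2 and an approximate 95% prediction interval. It is not the authors' code: it swaps in the simpler DerSimonian-Laird moment estimator instead of maximum likelihood, uses a normal quantile (1.96) rather than a t quantile, and runs on hypothetical trial counts.

```python
import math


def expit(x):
    """Inverse of the log-odds (logit) transform."""
    return 1.0 / (1.0 + math.exp(-x))


def random_effects_summary(deaths, totals):
    """Random-effects pooling of mortality on the log-odds scale (DerSimonian-Laird)."""
    y = [math.log(d / (n - d)) for d, n in zip(deaths, totals)]    # log odds per trial
    v = [1.0 / d + 1.0 / (n - d) for d, n in zip(deaths, totals)]  # sampling variances
    w = [1.0 / vi for vi in v]                                     # inverse-variance weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))      # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                                  # between-trial variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0          # % non-random variation
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    half_width = 1.96 * math.sqrt(tau2 + se_mu ** 2)               # normal approximation
    return {
        "pooled": expit(mu),
        "tau": math.sqrt(tau2),
        "i2": i2,
        "pi_low": expit(mu - half_width),
        "pi_high": expit(mu + half_width),
    }


# HYPOTHETICAL control groups: deaths and patients per trial.
summary = random_effects_summary(deaths=[12, 30, 45, 20, 60],
                                 totals=[100, 120, 100, 50, 200])
print({k: round(val, 3) for k, val in summary.items()})
```

Because the prediction interval is built from τ as well as the standard error of the pooled mean, it stays wide even when many trials are pooled, which is why it is the more clinically interpretable summary of heterogeneity.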

Associations between patient characteristics and mortality rates

The associations between 28-day control group mortality and reported patient characteristics were estimated by adding each individual covariate separately to the random-effects model as a moderator in univariate analysis. The goodness-of-fit of the log-linear, quadratic and power models was compared, and the model with the lowest Akaike information criterion (AIC) was selected [23]. For each model, the regression coefficient (b) and unadjusted R2 were reported. R2 represents the proportion of between-trial heterogeneity in 28-day control group mortality explained by the individual baseline characteristic, among the n trials reporting the covariate.
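For reference, the two quantities used in this step are conventionally defined as below. The source does not spell out its formulas, so this assumes the standard AIC and the pseudo-R2 commonly reported by meta-regression software; the symbols k, L̂, τ̂²₀ and τ̂²_mod are notation introduced here.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
R^2 = \frac{\hat{\tau}^2_{0} - \hat{\tau}^2_{\mathrm{mod}}}{\hat{\tau}^2_{0}}
```

Here k is the number of estimated parameters, L̂ the maximized likelihood, τ̂²₀ the between-trial variance without the covariate and τ̂²_mod the residual between-trial variance once the covariate is added as a moderator, both estimated on the same n trials.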

Prediction of control group mortality based on significant patient characteristics

To predict between-trial differences in mortality based on patient characteristics, a comprehensive multivariate logistic regression model was constructed. Missing observations were imputed using multiple imputation generating 20 datasets with predictive mean matching. For a detailed description of the process, we refer to the supplementary appendix, Sect. 2.5. Significant baseline characteristics reported in at least 25% of all trials with a univariate regression R2 ≥ 0.10 were eligible for the model. The threshold R2 of 0.10 was a compromise between the number of variables and the limited number of observations, as described before [22]. A stepwise backward selection procedure was applied removing regressors if p ≥ 0.05 for the final model. To facilitate comparisons between the individual covariates the standardized regression coefficient (β) and the standardized standard error (SSE) were reported in the supplementary appendix, Sect. 2.5.
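The eligibility rule for the multivariable model can be applied mechanically to the univariate results. The sketch below uses a subset of rows transcribed from Table 2 (trials reporting, R2, p-value) and assumes that "reported in at least 25% of all trials" is evaluated against the 67 included trials; the authors' actual candidate set may differ.

```python
N_TRIALS = 67  # number of included RCTs

# (covariate, trials reporting, univariate R2, p-value), transcribed from Table 2.
univariate = [
    ("Age", 65, 0.16, 0.011),
    ("BMI", 13, 0.51, 0.004),
    ("SOFA", 24, 0.08, 0.351),
    ("Lung injury score", 22, 0.31, 0.023),
    ("Oxygenation index", 14, 0.35, 0.044),
    ("Plateau pressure", 45, 0.27, 0.001),
    ("PaO2/FiO2 ratio", 60, 0.21, 0.011),
    ("Arterial pH", 26, 0.50, 0.003),
]


def is_eligible(n_reporting, r2, p_value):
    """Significant, reported in >= 25% of trials, and univariate R2 >= 0.10."""
    return n_reporting / N_TRIALS >= 0.25 and r2 >= 0.10 and p_value < 0.05


eligible = [name for name, n, r2, p in univariate if is_eligible(n, r2, p)]
print(eligible)
```

BMI and oxygenation index drop out despite high R2 because too few trials report them, which illustrates how incomplete reporting limits the adjustment that is possible.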

Control group mortality differences between trials demonstrating benefit vs. no benefit

A trial demonstrating significant benefit was defined as a reported p value of < 0.05 for the primary endpoint (as defined by authors) in favor of the intervention group. Comparisons between trials demonstrating significant benefit and trials with an indeterminate outcome or harm were performed using the Mann–Whitney U test. Linear mixed-effects and regression models were applied to estimate the probability of a significant trial outcome based on the observed control group mortality and intervention group mortality, respectively.
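As a rough illustration of the comparison, the sketch below computes the Mann-Whitney U statistics for two groups of control group mortality rates. The mortality values are hypothetical, and the code stops at the U statistic; deriving a p-value requires the U distribution or a normal approximation, which statistical packages provide.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from the rank sum of group_a."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# HYPOTHETICAL control group mortality rates per trial.
benefit = [0.40, 0.45, 0.50]        # trials showing significant benefit
other = [0.20, 0.25, 0.30, 0.42]    # indeterminate outcome or harm

u_benefit, u_other = mann_whitney_u(benefit, other)
print(u_benefit, u_other)
```

U_a counts the pairs in which a trial from the first group has the higher mortality (ties counting one half), so a value close to n_a × n_b indicates near-complete separation of the two groups.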

Statistical analyses

A p-value of < 0.05 was considered statistically significant. Statistical analyses were performed with the RStudio interface (Version 1.1.447. R core team. R: A Language and Environment for Statistical Computing. 2013. http://www.r-project.org/) using the packages ‘tidyverse’, ‘dplyr’, ‘metafor’, ‘mice’, ‘Hmisc’, ‘wCorr’, ‘data.table’, ‘MASS’ and ‘ggplot2’.

Results

The literature search yielded 3479 results. A total of 67 RCTs met all inclusion and exclusion criteria and were included in the analyses (eFigure 1) [7, 8, 17, 18, 24–86]. Table 1 provides an overview of the included trial characteristics.

Table 1

Characteristics of included randomized clinical trials

	Number (%) or median (IQR)
Number of included trials	67
Control group sample size	55 (26 – 169)
Multicenter trials	41 (61)
Trial site*:
Europe	30 (45)
North America	23 (34)
Middle- and South America	2 (3)
Asia	22 (33)
Australia / New Zealand	6 (9)
Language:
English	63 (94)
Chinese	4 (6)
Intervention:
Ventilation strategy	23 (34)
Drug	21 (31)
Prone positioning	4 (9)
Neuromuscular blockage	5 (7)
Nutrition	5 (7)
ECMO	1 (1)
Others	8 (11)
Explicitly stated primary endpoint:	59 (88)
Mortality	23 (34)
Ventilator-free days at day 28	10 (15)
PaO₂/FiO₂ ratio	9 (13)
Others	18 (26)
Power analysis performed	50 (75)
Early termination of trial	19 (28)
Trials demonstrating significant benefit**	11 (16)
Trials with an indeterminate outcome***	40 (54)
Trials demonstrating significant harm****	4 (6)
Jadad score	2 (1–4)
Composites of the Jadad score:
Randomization	67 (100)
Randomization adequate	60 (93)
Blinding	28 (42)
Blinding adequate	23 (34)
Description of withdrawals and lost to follow-up	36 (54)

Data are presented as numbers and percentages (%) or median and interquartile range (IQR) according to the type of variable. Legend: *: more than one option possible. **: p < 0.05 for the primary outcome (as defined and reported by authors) for the intervention group. ***: p > 0.05 for the primary outcome (as defined and reported by authors). ****: p < 0.05 for the primary outcome (as defined and reported by authors) for the control group. ECMO extracorporeal membrane oxygenation

The 28-day control group mortality was reported in 45 trials. For trials reporting another mortality timeframe, 28-day mortality could be reliably estimated (adjusted R2 ≥ 0.98 for all estimation models). Linear equations and corresponding regression plots are shown in eTable 1 and eFigure 2. For the sensitivity analysis for trials reporting solely on 28-day mortality we refer to the supplementary appendix, Sect. 2.3. The 28-day control group mortality ranged from 9.7 to 66.7% with a weighted mean mortality rate of 30.9%. Between-trial heterogeneity was large with 87.5% of the differences between control group mortality rates not explained by chance (I2 = 87.5%, τ = 0.509, p < 0.0001). The 95% prediction interval (the estimated range of mortality rates, corrected for small trials and random error) was 14 to 55%. The mean mortality and the magnitude of heterogeneity were similar for all subgroups of trial characteristics (Fig. 1). The exclusion criteria were too manifold for valid analyses. However, the number of reported exclusion criteria was associated with a lower 28-day control group mortality (p = 0.04, eFigure 3).
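The reported prediction interval can be approximately recovered from the two summary numbers in the text. The sketch below is a back-of-the-envelope check, not the authors' calculation: it takes the weighted mean mortality (30.9%) as the center on the log-odds scale, uses τ = 0.509 with a normal quantile of 1.96, and ignores the uncertainty of the pooled mean itself.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


pooled_mortality = 0.309  # weighted mean 28-day control group mortality (from the text)
tau = 0.509               # between-trial SD on the log-odds scale (from the text)
z = 1.96                  # ASSUMPTION: normal quantile instead of a t quantile

center = logit(pooled_mortality)
lower = expit(center - z * tau)
upper = expit(center + z * tau)

print(f"approximate 95% prediction interval: {lower:.0%} to {upper:.0%}")
```

Under these simplifications the interval comes out at roughly 14% to 55%, in line with the reported range, which shows that the width of the interval is driven almost entirely by τ rather than by sampling error.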

Fig. 1

28-day control group mortality for individual trial characteristics. The diamond represents the mean mortality rate (peak) with the corresponding 95% confidence interval (length of diamond). The black line denotes the 95% prediction interval, which is the estimated between-trial variability in mortality rates after adjusting for random chance and sample size, i.e. the between-trial heterogeneity. I2 represents the proportion of between-trial variability that cannot be explained by chance.

Table 2 presents the most-reported baseline characteristics and their associations with 28-day control group mortality. For the goodness-of-fit statistics of the individual associations, we refer to eTable 2 in the supplementary appendix. Figure 2 provides an overview of between-trial differences in control group mortality and patient characteristics. We observed an association between 28-day control group mortality and the following patient characteristics in univariate analyses: mean age; mean body mass index; mean APACHE II score; the proportion of patients treated with vasopressors; the proportion of patients presenting with shock at baseline; mean lung injury score; mean oxygenation index; mean plateau pressure; mean PaO2/FiO2 ratio and mean arterial pH. Individual regression plots are shown in eFigure 3.

Table 2

Univariate associations between 28-day control group mortality and commonly reported mean baseline patient characteristics

Covariate	Number of trials reporting on variable (%)	Mean (SD)	Regression coefficient	R²	p-value
Publication (year)	67 (100)	2012 (4.6)	− 0.01	0.00	0.429
Age (years)	65 (97)	54.9 (4.3)	0.00	0.16	0.011*
Male gender (%)	60 (90)	59.7 (7.5)	0.76	0.04	0.320
BMI	13 (19)	28.5 (2.3)	− 0.00	0.51	0.004**
Illness severity scores:
APACHE II	37 (55)	22.6 (3.6)	0.05	0.13	0.036*
APACHE III	10 (15)	95 (8.9)	0.01	0.14	0.398
APACHE IV	1 (2)	–	–	–	–
SAPS II	16 (24)	48.5 (3.3)	0.01	0.00	0.894
SAPS III	1 (2)	–	–	–	–
SOFA	24 (36)	9.5 (1.2)	0.07	0.08	0.351
Use of vasopressors (%)	11 (16)	51.1 (17)	0.02	0.058	0.003**
Shock at baseline (%)	4 (6)	56.0 (9.8)	0.00	0.74	0.036*
Risk factors for ARDS (%):
Pneumonia	44 (66)	51.6 (11.7)	0.01	0.00	0.990
Aspiration	34 (51)	13.1 (5.5)	-0.74	0.00	0.634
Sepsis	31 (46)	31.4 (16.7)	0.13	0.00	0.827
Trauma	24 (37)	6.9 (7.9)	-0.41	0.00	0.746
Transfusion	15 (22)	3.4 (2.9)	6.02	0.08	0.154
Pancreatitis	11 (16)	5.3 (3.2)	0.68	0.00	0.680
Pulmonary severity scores:
Lung injury score (LIS)	22 (33)	2.7 (0.3)	1.23	0.31	0.023*
Oxygenation index	14 (21)	13.8 (3.2)	0.00	0.35	0.044*
Mechanical ventilation:
Tidal volume (ml/kg)	50 (75)	7.1 (0.9)	− 0.11	0.05	0.250
Plateau pressure (cmH2O)	45 (67)	25.8 (2.4)	0.09	0.27	0.001**
Minute ventilation (L/min)	21 (31)	10.5 (1.1)	− 0.11	0.10	0.355
Driving pressure (cmH₂O)	9 (13)	13.6 (1.9)	0.08	0.16	0.240
PEEP (cm H₂O)	51 (76)	10.8 (1.8)	0.04	0.04	0.364
FiO₂ (%)	21 (31)	0.72 (0.1)	1.95	0.15	0.109
Respiratory rate (/min)	27 (40)	24.6 (2.6)	0.05	0.11	0.120
PaO₂/FiO₂ ratio	60 (90)	134.2 (29.7)	0.00	0.21	0.011*
Arterial pH	26 (39)	7.33 (0.04)	− 5.67	0.50	0.003**
PaCO₂ (mmHg)	26 (39)	44.9 (4.3)	− 0.00	0.08	0.097

Associations were estimated using a weighted random-effects model with mortality on the log-odds scale. Some baseline characteristics were reported by a minority of trials, which resulted in low power to detect significant associations. R2 can be interpreted as the proportion of heterogeneity that is explained by the population characteristic for the n trials reporting that characteristic

APACHE Acute Physiology and Chronic Health Evaluation, BMI body mass index, FiO2 fraction of inspired oxygen, PaCO2 partial pressure of carbon dioxide in arterial blood, PaO2 partial pressure of oxygen in arterial blood, SAPS Simplified Acute Physiology Score, SD standard deviation, SOFA Sequential Organ Failure Assessment Score, PEEP positive end-expiratory pressure

Fig. 2

Heatmap of control group outcomes and baseline characteristics. On the y-axis, all included trials are ordered from highest to lowest (estimated) 28-day mortality rate. The color of a tile represents whether, for a specific trial, a reported variable was lowest (blue) or highest (red) among all trials that reported the variable. A white tile represents a variable not reported by a specific trial. The X-axis depicts the most reported baseline characteristics. Some show a concordant pattern (e.g. age) with 28-day mortality while others do not (e.g. SAPS II score, SOFA score). Most importantly, the distribution of white tiles demonstrates the large variability in the reporting of baseline characteristics.

A detailed description of variable selection and construction of the multivariate logistic regression model is available in the supplementary appendix, Sect. 2.5. Significant variables for the final logistic regression model were: mean age (p < 0.0001), mean LIS (p = 0.0099), mean plateau pressure (p = 0.0078) and mean arterial pH (p = 0.0119).
The residual 95% prediction interval adjusted for the significant predictors was 18 to 45%. Six trials reported all four variables in the final model. As shown in Fig. 3a, trials demonstrating significant benefit reported a higher 28-day control group mortality compared to trials with an indeterminate outcome or harm (mean 28-day control group mortality rate: 0.439 vs. 0.275; p = 0.001). Figure 3b demonstrates that trials with higher control group mortality were more likely to demonstrate significant benefit. Conversely, Fig. 3c shows that intervention group mortality did not differ between trials demonstrating significant benefit and trials with an indeterminate outcome or harm (mean 28-day intervention group mortality rate: 0.271 vs. 0.301; p = 0.697). Figure 3d demonstrates that trials with higher or lower intervention group mortality rates were not more or less likely to demonstrate significant benefit.

Fig. 3

Differences in 28-day control group and intervention group mortality between significant and indeterminate trials, and the corresponding probability of a significant treatment effect. a Mean 28-day control group mortality was 43.9% in trials with a beneficial outcome versus 27.5% in trials with an indeterminate outcome or significant harm (p = 0.001). b The higher the control group mortality, the higher the probability of obtaining a beneficial trial outcome for the intervention group (p = 0.012). c Mean 28-day intervention group mortality did not differ between trials with significant benefit and trials with an indeterminate outcome or significant harm (27.1% vs. 30.1%; p = 0.697). d The probability of obtaining a beneficial trial outcome was not affected by intervention group mortality (p = 0.410)

Discussion

This systematic analysis of 67 ARDS RCTs in the lung-protective ventilation era revealed a statistically significant and clinically relevant amount of between-trial heterogeneity in reporting and outcome. The description of patient characteristics was variable and often incomplete. Basic ventilation characteristics such as mean respiratory rate, FiO2, pH and PaCO2 were reported in a minority of trials. The estimated range of 28-day control group mortality corrected for small trials and random error was 14 to 55%. This between-trial heterogeneity in control group outcomes could not be explained by differences in trial characteristics. Reported baseline patient characteristics explained some of the outcome heterogeneity, but the residual (unexplainable) range in control group mortality was still 18 to 45% after adjusting for the most predictive baseline characteristics. There was no secular trend in mortality outcomes over the period 2000–2019. Notably, trials with higher control group mortality were more likely to report a significant benefit, also after adjustment for baseline mean severity of illness.

Relevance for clinicians

To assess the applicability of a trial’s result to individual patients, clinicians need a clear description of the study population and concomitant treatment from trial reports. In the present study, we identified important problems in this respect. The variation in reporting of baseline variables was considerable, with ≥ 90% of all studies reporting on age and gender, but only 75% describing observed tidal volumes and PEEP, down to only a third of all trials reporting on lung injury scores or results from blood gas analyses. After adjustment for significant baseline characteristics (mean age, mean LIS, mean plateau pressure, mean arterial pH), the residual (unexplainable) range in control group mortality was 18 to 45%. In other words, among trials with comparable inclusion criteria and comparable baseline patient characteristics, there were inexplicable 2.5-fold mortality differences (45%/18%). This indicates that there are very important yet unreported differences in ARDS populations, and possibly also differences in co-interventions and standard care. This silent heterogeneity between trials makes it nearly impossible to evaluate whether RCT results are valid outside of the immediate trial context (i.e., the exact population in the actual participating centers). At its most extreme, this can be thought of as a generalizability crisis in ARDS research: we cannot know which trial results are transportable to which patients outside the trial [87, 88]. The generalizability crisis comes into clear focus when different RCTs show conflicting and statistically mutually exclusive results (benefit vs. no benefit or harm) of the same intervention. Conflicting study results are often ascribed to subtle differences in the intervention, while, in fact, it may be important yet unreported differences in the population or standard of care that are driving conflicting outcomes. 
The finding that the number of exclusion criteria is inversely associated with the control group mortality rate only exacerbates the generalizability problem. It means that trial populations, especially those with a large number of exclusion criteria, likely differ from the intended (broader) target population. Clinicians trying to gauge the applicability of a trial’s result should carefully review not only inclusion criteria and baseline characteristics, but also whether the control group mortality fits the apparent patient characteristics.

Relevance for ARDS researchers

What could account for the large unexplainable heterogeneity? Possible factors may be found in biology, standards of care and co-interventions, or measurement variability. Statistical cluster analyses of various biological and clinical characteristics led to the identification of distinct ARDS subphenotypes, each associated with a different mortality risk, a different biochemical inflammatory profile and importantly, differential responses to treatments such as PEEP, fluids, low-dose macrolide therapy or simvastatin [89-94]. We are only just beginning to appreciate the extraordinary biological heterogeneity of ARDS [95], which is undoubtedly one of the causes of between-trial variability in outcomes. It is currently not clear whether larger more pragmatic trials or smaller high-adherence trials in more selected populations will provide more useful clinical information in the future. Variability in standards of care and co-interventions may be another likely cause of between-trial heterogeneity. The LUNG-SAFE study revealed that many ARDS patients are undertreated or not treated according to the current best practices [1]. We cannot know the implications for interpreting RCTs, because standard care and co-interventions are almost never described in trial papers. Historically, this was due to restrictions in space and words allotted by scientific journals. However, there is an urgent need for future studies to report these details, nowadays enabled by the use of supplementary materials and public data repositories. A final likely cause of between-trial heterogeneity may be measurement variability of baseline characteristics and clinical outcomes. Severity of illness scores, for example, are notoriously dependent on small variations in measurement definitions [96, 97]. The call for standardization of baseline and outcome measures in ARDS trials is not new, but the current study underlines its urgency and importance [98, 99]. 
Consequently, comprehensive reporting in a standardized manner on patient characteristics, standard care and concomitant treatments may be one step towards solving the generalizability crisis in ARDS research. Clearly, heterogeneity is not the only cause of a large number of indeterminate and conflicting trial results. Statistical shortcomings, such as underpowered studies [100] or overestimations of effect size [101], equally contribute to indeterminate trial outcomes. Moreover, qualifying studies as ‘indeterminate’ is the consequence of the frequentist statistical paradigm, while Bayesian analyses offer another perspective providing often useful information about trial results [102]. Because we were limited to trial-level data in this study, we should be careful to avoid the ‘ecological fallacy’: individual-level relationships cannot be inferred from group-level data. This important limitation of the present study is also an important message: we will continue to fail to understand outcome heterogeneity between ARDS trials as long as we must rely on aggregated study-level data. Sharing (anonymized) individual patient data is likely to provide a path forward and provides an immense opportunity to stratify and subphenotype patients, to detect treatment benefit and harm for specific patient groups, and to find valuable therapeutic strategies in an inherently heterogeneous syndrome. Complete standardization of reporting is unwarranted and can even be detrimental. Reported characteristics and outcomes should be tailored to the research question at hand. But the results in this study indicate that different trials lack sufficient common ground to validly compare trial populations. Creating this common ground for between-trial comparisons requires the reporting of a ‘core baseline set’: a commonly agreed-upon minimum set of descriptors to characterize the patient population of a trial (including standards of care and co-interventions). 
Developing this core population characteristics set requires meta-epidemiological data (which this study provides) and clinical domain expertise from a diverse sample of ARDS researchers.

Conclusion

Randomized controlled ARDS trials in the lung-protective ventilation era present a statistically significant and clinically relevant amount of heterogeneity in reporting and mortality outcomes. Differences in baseline characteristics partly explained the variability in outcome, but large unexplainable heterogeneity remained after extensive statistical adjustments. This study underlines the urgent need for standardized and comprehensive reporting of trial and baseline characteristics to diminish between-trial heterogeneity and to support the transportability of study results across populations.
