
Unexplained mortality differences between septic shock trials: a systematic analysis of population characteristics and control-group mortality rates.

Harm-Jan de Grooth¹,², Jonne Postema³, Stephan A Loer³, Jean-Jacques Parienti⁴,⁵, Heleen M Oudemans-van Straaten⁶, Armand R Girbes⁶.

Abstract

PURPOSE: Although the definition of septic shock has been standardized, some variation in mortality rates among clinical trials is expected. Insights into the sources of heterogeneity may influence the design and interpretation of septic shock studies. We set out to identify inclusion criteria and baseline characteristics associated with between-trial differences in control group mortality rates.
METHODS: We conducted a systematic review of RCTs published between 2006 and 2018 that included patients with septic shock. The percentage of variance in control-group mortality attributable to study heterogeneity rather than chance was measured by I2. The association between control-group mortality and population characteristics was estimated using linear mixed models and a recursive partitioning algorithm.
RESULTS: Sixty-five septic shock RCTs were included. Overall control-group mortality was 38.6%, with significant heterogeneity (I2 = 93%, P < 0.0001) and a 95% prediction interval of 13.5-71.7%. The mean mortality rate did not differ between trials with different definitions of hypotension, infection or vasopressor or mechanical ventilation inclusion criteria. Population characteristics univariately associated with mortality rates were mean Sequential Organ Failure Assessment score (standardized regression coefficient (β) = 0.57, P = 0.007), mean serum creatinine (β = 0.48, P = 0.007), the proportion of patients on mechanical ventilation (β = 0.61, P < 0.001), and the proportion with vasopressors (β = 0.57, P = 0.002). Combinations of population characteristics selected with a linear model and recursive partitioning explained 41 and 42%, respectively, of the heterogeneity in mortality rates.
CONCLUSIONS: Among 65 septic shock trials, there was a clinically relevant amount of heterogeneity in control group mortality rates which was explained only partly by differences in inclusion criteria and reported baseline characteristics.


Keywords: Clinical trials; Heterogeneity; Machine learning; Meta-research; Methodology; Septic shock


Substances:
Vasoconstrictor Agents

Year: 2018 PMID: 29546535 PMCID: PMC5861172 DOI: 10.1007/s00134-018-5134-8

Source DB: PubMed Journal: Intensive Care Med ISSN: 0342-4642 Impact factor: 17.440

Introduction

The fundamental criteria from the consensus definitions of septic shock are used to select patients for inclusion in clinical studies [1-4]. While the mortality rate of septic shock was found to be 46% (95% confidence interval (CI) 43–50%) in a meta-analysis of observational cohorts [5], randomized controlled trials report more diverse numbers. For example, two high-profile septic shock trials published a year apart reported control group mortality rates as disparate as 16% [6] and 80% [7]. Despite the seemingly wide range of mortality rates there has not yet been a systematic inquiry into its patterns and possible causes. Identifying the correct patient population to benefit from a specific therapy has been recognized as an essential condition for improving critical care research [8-10]. Yet large unexplained mortality differences among trials that all aim to include septic shock patients may hamper reproducibility and generalizability. Insights into the magnitude and sources of between-trial heterogeneity are therefore valuable in the design, reporting, and interpretation of septic shock trials. For example, incorrect prediction of baseline mortality rates has been identified as a major reason for negative critical care trials, as a discrepancy between expected and observed event rates often leads to underpowered studies [11]. We sought to quantify between-trial heterogeneity and identify inclusion criteria and population characteristics associated with differences in control group mortality rates.

Methods

After a systematic search to identify all trials published in the past decade that aimed to include patients with septic shock, we used linear mixed models to estimate the total heterogeneity in control group mortality rates and its association with reported baseline characteristics. Using both a multivariate linear model and a machine learning algorithm, we estimated the proportion of heterogeneity that can be explained by population characteristics. The review protocol was prospectively registered [12] and adheres to the PRISMA checklist [13], which is included in the electronic supplementary material (ESM). Study screening, application of the inclusion and exclusion criteria, and data extraction were performed independently by two reviewers (HJdG and JP). Conflicting entries were resolved by consensus.

Inclusion criteria and search strategy

PubMed, Embase, and the Cochrane Central Register of Controlled Trials were queried using the search term [“septic shock” AND (random* or rct)]. Embase was additionally queried using the search term “septic shock” with the randomized controlled trial filter activated. The queries were restricted to publications from 1 January 2006 onward and were last run on 20 January 2018. We limited the search to trials published between 2006 and 2018 as a compromise between the number of eligible studies and secular trends in clinical practice, research practice, and reporting standards. Publications from 2006 and later had sufficient lead time to incorporate the 2004 update of the Surviving Sepsis Campaign guidelines [4]. Eligible for inclusion were parallel-group randomized controlled trials with adult patients in septic shock according to the published consensus definitions or Surviving Sepsis Campaign guidelines [1, 2, 4]. Trials were excluded if the report was not written in English, if it was only available in abstract, if no baseline characteristics were reported, or if no mortality outcome was reported. Trials that aimed to include a specific subcategory of septic shock patients (e.g. “septic shock patients requiring renal replacement therapy”) were also excluded, as these would be a major source of between-trial heterogeneity.

Identification of the control group and variables of interest

Because the nature of the randomized intervention could contribute to heterogeneity, we focused on the control groups. For each trial, we identified the control group as defined by the authors as ‘control group’, ‘usual care group’, or a variation thereof. When no control group could be identified (in a comparison of two usual care therapies), we defined the control group as the means of the two groups in terms of sample size, mortality, and baseline characteristics. A sensitivity analysis of this construct was performed by testing whether trials with and without specifically defined control groups differed in mean mortality or in the amount of between-trial heterogeneity. For each trial, we recorded the type of intervention, single- or multicenter design, and the primary endpoint. Trials were graded according to the Jadad scale [14]. For the control group in each trial, we recorded the sample size, the reported baseline characteristics, and the mortality rates.

Estimation of heterogeneity in mortality rates and associations with population characteristics

We used 28-day mortality throughout all analyses. For trials that did not report this outcome, we estimated 28-day mortality from the reported hospital, ICU, or 90-day mortality using linear regression on data from trials that reported both 28-day mortality and another mortality measure. To analyze mortality rates across trials, we used a random-effects meta-regression model with the log odds of mortality as the dependent variable and a random intercept for each study. Each trial was weighted by the inverse of the sampling variance of its mortality rate. A maximum likelihood estimator was used to estimate the mean mortality (random-effects pooled estimate), the between-study standard deviation due to heterogeneity (τ), and the percentage of variation due to heterogeneity rather than chance (I2). To quantify between-trial heterogeneity, we report the 95% prediction interval (pooled log odds of mortality ± 1.96 τ, back-transformed to a proportion), which represents the distribution of estimated future mortality rates based on the observed mortality rates weighted by sampling variance (trial size) and corrected for random chance [15]. In the absence of between-study heterogeneity, the 95% prediction interval is equal to the 95% confidence interval, but when significant heterogeneity is present, the prediction interval estimates the range of mortality rates to be expected in similar studies [15, 16]. In other words, the 95% prediction interval can be thought of as an estimate of the true between-study distribution of mortality rates. The prediction interval can therefore be used to guide power calculations for future studies [16]. The between-trial heterogeneity in mortality rates was calculated for subcategories of trials employing different inclusion criteria: confirmed or suspected infection; confirmed infection only; different definitions of hypotension; mandatory hyperlactatemia; mandatory vasopressor therapy; and mandatory mechanical ventilation.
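The heterogeneity quantities described above (pooled log odds, τ and I2) can be illustrated with a short Python sketch. It is a simplified stand-in, not the authors' analysis: it uses the closed-form DerSimonian-Laird moment estimator of τ² instead of the maximum likelihood estimator fitted with metafor, the 0.5 continuity correction for groups with no deaths or no survivors is an assumption, and the control-group counts in the usage line are hypothetical.

```python
import math


def logit_with_variance(deaths, n):
    """Log odds of death in one control group and its approximate sampling variance."""
    if deaths == 0 or deaths == n:
        # Assumption: add 0.5 to each cell when a group has no deaths or no survivors
        deaths, n = deaths + 0.5, n + 1.0
    survivors = n - deaths
    return math.log(deaths / survivors), 1.0 / deaths + 1.0 / survivors


def dersimonian_laird(deaths, sizes):
    """Return (pooled log odds, tau, I2 in %) for control-group mortality; needs >= 2 trials."""
    ys, vs = zip(*(logit_with_variance(d, n) for d, n in zip(deaths, sizes)))
    w = [1.0 / v for v in vs]                        # inverse-variance weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                    # between-trial variance (moment estimator)
    w_re = [1.0 / (v + tau2) for v in vs]            # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    typical_v = df / c                               # "typical" within-trial sampling variance
    i2 = 100.0 * tau2 / (tau2 + typical_v)           # % of variation due to heterogeneity
    return mu, math.sqrt(tau2), i2


# Hypothetical control groups: deaths and sample sizes
mu, tau, i2 = dersimonian_laird([10, 40, 25, 70], [60, 100, 50, 90])
print(f"pooled mortality {1 / (1 + math.exp(-mu)):.1%}, tau {tau:.2f}, I2 {i2:.0f}%")
```

With identical mortality proportions in every trial, Q falls below its degrees of freedom, τ is truncated at zero and I2 is 0%; widely differing proportions push I2 towards 100%.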
Differences in mortality rates between subcategories were estimated by adding dummy variables to the mixed-effects model. To estimate the association between study and population characteristics and mortality, these variables were added to the model as covariates. Residuals were checked for normality with Q–Q plots, and the goodness of fit of the log-linear model was compared with that of quadratic and power models by selecting the model with the lowest Akaike information criterion (AIC). To facilitate comparisons between variables, we report standardized regression coefficients (β) and the proportion of between-trial variability in mortality explained by the population variable (unadjusted R2) for all univariate analyses.

Predicting mortality rates using a linear model and recursive partitioning

We then constructed a comprehensive model to predict between-study differences in mortality. Population variables that were reported by at least 25% of the included trials and had a univariate regression R2 ≥ 0.10 were included as regressors in a multivariate model and removed stepwise if their P value was ≥ 0.05. The R2 threshold of 0.10 was a compromise between the number of variables and the limited number of observations. This model selection process was not prospectively protocolized, as the number of eligible variables could not be estimated a priori. Missing observations (i.e., missing population characteristics) were handled by multiple imputation with predictive mean matching (20 imputed datasets). The imputation methods are further described in section 7 of the ESM. As a complementary approach to predicting 28-day mortality rates from population characteristics, we constructed a regression tree model based on recursive partitioning (a machine learning algorithm) [17, 18], chosen for its ability to handle partially missing observations (obviating the need for imputation) and its robustness to nonlinear relations. The model was set up to predict 28-day mortality from all inclusion criteria and population characteristics. In short, the recursive partitioning algorithm selected the most informative variable, which was then ‘split’ at the value that best differentiated low from high mortality. The algorithm then selected the most informative variable within each of the two resulting subgroups and split it again. When a splitting variable was missing for a specific trial, a surrogate variable (the variable most closely correlated with the splitting variable) was used. After multiple splits, this recursive partitioning produced a regression tree (similar to a decision tree) with subgroups of trials ranked from low to high expected mortality. R2 represents the proportion of variance in mortality explained by the regression tree. Overfitting was examined using the cross-validated error.
For all analyses, P < 0.05 was considered significant. The analyses were performed in R version 3.4.2 using the metafor, mice and rpart packages [19-21].
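The recursive partitioning steps described above can be illustrated with a minimal Python sketch. It is a simplified stand-in for rpart, not a reimplementation: splits minimize the within-subgroup sum of squares, each candidate split is evaluated only on the trials that report the variable, and, instead of surrogate variables, trials missing the split variable simply follow the majority direction. The trial data, the variable names (mv_pct, age, mort), and the depth and leaf-size limits are hypothetical, and no cross-validation or pruning is performed.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the subgroup mean (regression-tree impurity)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, features, min_leaf):
    """Return (feature, cut, gain) of the split that most reduces SSE, or None."""
    best = None
    for f in features:
        observed = [r for r in rows if r.get(f) is not None]  # ignore trials missing f
        parent = sse([r[target] for r in observed])
        xs = sorted({r[f] for r in observed})
        for lo, hi in zip(xs, xs[1:]):
            cut = (lo + hi) / 2
            left = [r[target] for r in observed if r[f] < cut]
            right = [r[target] for r in observed if r[f] >= cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[2]:
                best = (f, cut, gain)
    return best


def grow(rows, target, features, depth=0, max_depth=2, min_leaf=3):
    """Recursively split trials into subgroups with more homogeneous mortality."""
    split = best_split(rows, target, features, min_leaf) if depth < max_depth else None
    if split is None:
        return {"mean": fmean(r[target] for r in rows), "n": len(rows)}
    f, cut, _ = split
    left = [r for r in rows if r.get(f) is not None and r[f] < cut]
    right = [r for r in rows if r.get(f) is not None and r[f] >= cut]
    missing_left = len(left) >= len(right)
    # Simplification: no surrogate variables; trials missing f join the larger subgroup
    (left if missing_left else right).extend(r for r in rows if r.get(f) is None)
    return {"feature": f, "cut": cut, "missing_left": missing_left,
            "left": grow(left, target, features, depth + 1, max_depth, min_leaf),
            "right": grow(right, target, features, depth + 1, max_depth, min_leaf)}


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean mortality."""
    while "feature" in tree:
        x = row.get(tree["feature"])
        go_left = tree["missing_left"] if x is None else x < tree["cut"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["mean"]


# Hypothetical trials: % mechanically ventilated, mean age, 28-day mortality
trials = [
    {"mv_pct": 50, "age": 60, "mort": 0.25}, {"mv_pct": 55, "age": 66, "mort": 0.27},
    {"mv_pct": 60, "age": 62, "mort": 0.24}, {"mv_pct": 65, "age": 64, "mort": 0.26},
    {"mv_pct": 80, "age": 61, "mort": 0.50}, {"mv_pct": 85, "age": 65, "mort": 0.52},
    {"mv_pct": 90, "age": 63, "mort": 0.48}, {"mv_pct": None, "age": 67, "mort": 0.49},
]
tree = grow(trials, "mort", ["mv_pct", "age"])
print(tree["feature"], tree["cut"])
print(round(predict(tree, {"mv_pct": 88, "age": 70}), 2))
```

In this toy example the only informative split falls on mechanical ventilation, and the trial with a missing value is sent to the larger, low-mortality subgroup even though its own mortality is high. This shows what crude handling of missing data costs; rpart's surrogate splits are intended to reduce that cost.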

Results

Characteristics of the included trials

The search resulted in 65 trials that met all inclusion and exclusion criteria (eFigure 1 in the ESM), representing a total of 8634 control group patients [6, 7, 22–84]. A list of excluded trials is available in the ESM. The trial characteristics are presented in Table 1.

Table 1

Characteristics of included trials

Characteristic	No. (%) or median (IQR)
Number of included trials	65
Control group sample size: median (IQR)	34 (20–100)
Multicenter trials: n (%)	28 (43)
Trial country: n (%)
France	12 (18)
China	9 (14)
Italy	8 (12)
USA	6 (9)
India	3 (5)
The Netherlands	3 (5)
UK	3 (5)
Other countries (1 each)	13 (20)
Multinational trials	9 (14)
Trial intervention: n (%)
Drug	44 (68)
Treatment bundle	14 (21)
Device	7 (11)
Primary endpoint: n (%)
Mortality	21 (32)
Other	32 (49)
Not specified	12 (18)
Jadad scale: median (IQR)	3 (2–4)
Jadad scale components: n (%)
Randomization	65 (100)
Randomization appropriate	45 (69)
Blinding	23 (35)
Blinding appropriate	19 (29)
Description of withdrawals and dropouts	42 (65)

IQR Interquartile range

Twenty trials (31%) did not report 28-day mortality but only hospital mortality, ICU mortality, or 90-day mortality. Using trials that reported multiple mortality measures, 28-day mortality was estimated as a linear function of hospital mortality, ICU mortality, or 90-day mortality (R2 values 0.99, 0.98, and 0.98, respectively). The estimates and validation plots are presented in eTable 1 and eFigure 2 of the ESM. In 14 trials (21%) the control group could not be identified because two usual care therapies were compared. For these trials, the control group characteristics and mortality rates were defined as the means of the two treatment groups. None of these 14 trials reported significant mortality differences between the treatment groups.
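Converting another mortality measure to 28-day mortality amounts to a least-squares line fitted on the trials that report both outcomes. The Python sketch below assumes the regression is done on the proportion scale and uses hypothetical numbers and a hypothetical function name. The authors' fitted coefficients are given in eTable 1 of the ESM and are not reproduced here. One such line would be fitted for each alternative outcome (hospital, ICU, or 90-day mortality).

```python
def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, R2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical trials reporting both hospital and 28-day mortality (proportions)
hospital = [0.30, 0.40, 0.50, 0.60]
day28 = [0.25, 0.34, 0.43, 0.52]
a, b, r2 = fit_line(hospital, day28)

# Estimate 28-day mortality for a trial that reported only hospital mortality (45%)
estimated_day28 = a + b * 0.45
print(f"28-day = {a:.3f} + {b:.3f} x hospital (R2 = {r2:.2f}); estimate {estimated_day28:.1%}")
```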

The distribution of mortality rates

The control group mortality rates ranged between 13.8 and 84.6%, with a random-effects estimated mean mortality rate of 38.6%. There was significant heterogeneity among trials (I2 = 93%, τ = 0.710, P < 0.0001), and the 95% prediction interval was 13.5–71.7%. Figure 1 shows the mortality rates of trials categorized by inclusion criteria. The mean mortality rate did not differ between trials with different definitions of hypotension, infection (confirmed vs. suspected), or vasopressor or mechanical ventilation inclusion criteria. There were no significant differences in mean mortality rate or in heterogeneity between large vs. small trials, monocenter vs. multicenter trials, unblinded vs. blinded trials, high-quality vs. low-quality trials, or trials with vs. without a specifically defined control group (eTable 2 in the ESM).
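As a check on these numbers, the prediction interval can be recomputed from the reported pooled mortality (38.6%) and τ (0.710). The calculation assumes, as the Methods describe, that the interval is the pooled log odds ± 1.96 τ, back-transformed to a proportion. The function names are illustrative. Because the inputs are rounded, the reproduction is approximate. Some software (for example the predict method in metafor) also adds the squared standard error of the pooled estimate to τ² before taking the square root, which would widen the interval slightly.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def prediction_interval(mean_mortality, tau, z=1.96):
    """95% prediction interval: logit(mean) +/- z * tau, back-transformed to proportions."""
    centre = logit(mean_mortality)
    return expit(centre - z * tau), expit(centre + z * tau)


lo, hi = prediction_interval(0.386, 0.710)
print(f"{lo:.1%}-{hi:.1%}")  # prints 13.5%-71.7%
```

The interval is asymmetric around 38.6% on the proportion scale because it is symmetric on the log-odds scale.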

Fig. 1

Control-group mortality rates categorized by trial inclusion criteria. The diamonds represent the mean mortality rates and 95% confidence intervals. The 95% prediction intervals (dashed lines) represent the estimated between-trial variability in mortality rates after adjusting for random chance and sample size. I2 represents the proportion of between-trial variability that cannot be explained by chance. There were no significant differences in mean mortality rates between inclusion criteria. MAP mean arterial pressure, SBP systolic blood pressure

The exclusion criteria employed in the trials were too diverse for statistical analysis, but the total number of exclusion criteria (ranging from 0 to 30) was inversely associated with the mortality rate (β = −0.375, R2 = 0.14, P = 0.007). The heatmap in Fig. 2 provides an overview of the between-trial differences in mortality rates and population characteristics. The log-linear associations between the mortality rate and reported control group baseline characteristics are presented in Table 2 (goodness-of-fit statistics are reported in eTable 3 in the ESM). There was no significant decrease in mortality over the period 2006–2018; the year of publication explained only 4% (R2) of the heterogeneity (Table 2, eFigure 3). Baseline variables that were univariately associated with mortality were mean Sequential Organ Failure Assessment (SOFA) score, the proportion of patients on mechanical ventilation, the proportion of patients on vasopressors, and mean serum creatinine. Regression plots of selected associations are shown in eFigure 3 of the ESM.

Fig. 2

Table 2

Univariate associations between mortality rates and reported mean or median population characteristics

Characteristic	Trials reporting variable, n (% of 65)	Mean (SD)	Standardized regression coefficient β (R²)	P value
Publication year	65 (100)	2013.3 (3.58)	− 0.19 (0.04)	0.197
Age, years	64 (98)	62.9 (3.80)	0.18 (0.03)	0.160
Male patients %	63 (97)	60.5 (5.80)	0.02 (0.00)	0.927
Comorbidity characteristics
Charlson Comorbidity Index	5 (8)	1.90 (1.11)	0.52 (0.27)	0.183
From long-term care facility %	6 (9)	5.8 (5.6)	0.44 (0.20)	0.312
McCabe class I %	6 (9)	34.1 (15.2)	− 0.40 (0.16)	0.374
McCabe class II %	6 (9)	14.7 (12.9)	0.02 (0.00)	0.948
McCabe class III %	4 (6)	16.2 (15.0)	0.71 (0.50)	0.120
Diabetes mellitus %	23 (36)	24.4 (6.88)	0.01 (0.00)	0.856
Heart failure or coronary disease %	26 (40)	20.7 (8.7)	0.33 (0.11)	0.133
Chronic obstructive pulmonary disease %	25 (39)	15.1 (6.3)	0.04 (0.00)	0.911
Chronic renal disease %	21 (33)	7.6 (5.0)	0.06 (0.00)	0.773
Chronic liver disease %	17 (26)	5.5 (2.8)	0.25 (0.06)	0.320
Cancer %	20 (31)	21.2 (8.1)	0.19 (0.03)	0.426
Severity of illness scores
APACHE II score	33 (51)	22.5 (3.65)	0.21 (0.05)	0.376
APACHE III score	1 (2)	–	–	–
APACHE IV score	1 (2)	–	–	–
SAPS II score	24 (37)	55.7 (4.42)	0.36 (0.13)	0.079
SAPS III score	3 (4)	77.6 (1.91)	0.01 (0.00)	0.644
SOFA score	37 (58)	9.59 (2.47)	0.57 (0.33)	0.007**
Characteristics of acute illness
Medical (non-surgical) %	22 (34)	69.7 (13.1)	0.26 (0.07)	0.314
Time from diagnosis to randomization, hours	13 (20)	13.77 (8.84)	0.47 (0.22)	0.069
Mechanical ventilation %	33 (51)	78.1 (28.3)	0.61 (0.38)	0.0005***
Heart rate, 1/min	39 (60)	104 (8.8)	0.13 (0.02)	0.435
Mean arterial pressure, mmHg	43 (66)	70.7 (6.65)	0.06 (0.00)	0.561
Central venous pressure, mmHg	22 (34)	11.2 (2.21)	0.17 (0.03)	0.425
Vasopressor support %	38 (58)	84.6 (30.0)	0.57 (0.32)	0.0019**
Serum lactate, mmol/l	52 (80)	4.00 (1.28)	− 0.13 (0.02)	0.389
Serum creatinine, µmol/l	26 (40)	168 (31.1)	0.48 (0.23)	0.007**
Fluids before randomization, ml	19 (30)	3209 (1637)	0.31 (0.10)	0.194
Infection site characteristics
Respiratory %	53 (82)	42.6 (13.7)	0.27 (0.08)	0.087
Abdominal %	51 (78)	24.0 (15.0)	0.06 (0.00)	0.686
Urogenital %	41 (63)	11.3 (5.7)	− 0.27 (0.07)	0.094
Central nervous system %	19 (30)	1.2 (1.6)	0.03 (0.00)	0.885
Skin and soft tissue %	28 (43)	6.8 (3.6)	− 0.09 (0.01)	0.803
Bloodstream %	32 (49)	12.9 (8.2)	− 0.11 (0.01)	0.487
Pathogen characteristics
Gram-negative %	25 (39)	32.0 (16.1)	0.41 (0.17)	0.0573
Gram-positive %	22 (34)	24.6 (7.12)	− 0.41 (0.17)	0.083
Other pathogen %	22 (34)	44.0 (23.3)	− 0.13 (0.02)	0.473
Culture negative %	18 (28)	29.4 (8.3)	− 0.38 (0.14)	0.085

Univariate associations between control group mortality rate and commonly reported mean baseline characteristics. Associations were estimated using a weighted random-effects model with mortality on the log-odds scale. Some baseline characteristics were reported by a minority of trials, which resulted in low power to detect a significant association. R2 can be interpreted as the proportion of heterogeneity that is explained by the population characteristic for the n trials that report that characteristic.

APACHE Acute Physiology and Chronic Health Evaluation score, SAPS Simplified Acute Physiology score, SOFA Sequential Organ Failure Assessment score
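The screening rule from the Methods (reported by at least 25% of the 65 trials and univariate R2 ≥ 0.10) can be applied mechanically to Table 2. The Python sketch below uses a hand-copied subset of the table's rows, with the rounded values as printed, and a hypothetical function name. It only illustrates the filter. The candidate list the authors actually used, and the subsequent stepwise elimination that retained SOFA score, mechanical ventilation, and creatinine, are documented in section 7 of the ESM.

```python
# Subset of Table 2: variable -> (number of trials reporting it, univariate R2)
TABLE2_SUBSET = {
    "Charlson Comorbidity Index": (5, 0.27),
    "Heart failure or coronary disease %": (26, 0.11),
    "APACHE II score": (33, 0.05),
    "SAPS II score": (24, 0.13),
    "SOFA score": (37, 0.33),
    "Time from diagnosis to randomization, hours": (13, 0.22),
    "Mechanical ventilation %": (33, 0.38),
    "Vasopressor support %": (38, 0.32),
    "Serum lactate, mmol/l": (52, 0.02),
    "Serum creatinine, µmol/l": (26, 0.23),
    "Fluids before randomization, ml": (19, 0.10),
    "Gram-negative %": (25, 0.17),
    "Culture negative %": (18, 0.14),
}


def screen_candidates(table, n_trials=65, min_coverage=0.25, min_r2=0.10):
    """Keep variables reported by >= min_coverage of trials with univariate R2 >= min_r2."""
    return [name for name, (n_reporting, r2) in table.items()
            if n_reporting / n_trials >= min_coverage and r2 >= min_r2]


for name in screen_candidates(TABLE2_SUBSET):
    print(name)
```

For example, time from diagnosis to randomization explains 22% of the heterogeneity but is reported by only 13 of 65 trials (20%), so it fails the coverage criterion. APACHE II is widely reported but fails the R2 criterion.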

Heatmap of included trials (n = 65) and associated baseline characteristics, ranked by decreasing mortality rate. White tiles represent the mean value across trials, while red and blue tiles indicate higher and lower than average values, respectively. Gray tiles (N/A) are variables that were not reported. The 28-day mortality rate ranged between 13.8 and 84.6%, with a mean of 38.6%. APACHE Acute Physiology and Chronic Health Evaluation, SAPS Simplified Acute Physiology Score, SOFA Sequential Organ Failure Assessment score, MAP mean arterial pressure, CVP central venous pressure, CNS central nervous system. Asterisks (*) mark variables with a significant univariate association with 28-day mortality.

Predicting mortality rates from population characteristics

Details of the variable selection process for the multivariate model are available in section 7 of the ESM. Significant independent variables in the final multivariate model were: baseline mean SOFA score (β = 0.39, standardized standard error (SSE) = 0.17, P = 0.019), the proportion of patients on mechanical ventilation (β = 0.42, SSE = 0.18, P = 0.019), and mean serum creatinine (β = 0.31, SSE = 0.10, P = 0.0015). The multivariate model R2 was 0.41 with significant residual heterogeneity (I2 = 82%, τ = 0.544, P < 0.0001). Figure 3 shows the predicted and actual mortality rates of the included trials.
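The multivariate R2 can be related to the reported τ values, assuming R2 is defined as the proportional reduction in the between-trial variance τ² relative to the model without covariates (the pseudo-R2 that metafor reports for mixed-effects meta-regression). Plugging in τ = 0.710 without covariates and the residual τ = 0.544 gives a value consistent with the reported R2 of 0.41. The function name is hypothetical.

```python
def heterogeneity_r2(tau_null, tau_model):
    """Proportional reduction in between-trial variance tau^2 after adding covariates."""
    return max(0.0, (tau_null ** 2 - tau_model ** 2) / tau_null ** 2)


# tau without covariates (0.710) and residual tau of the multivariate model (0.544)
print(round(heterogeneity_r2(0.710, 0.544), 2))  # prints 0.41
```

Truncating at zero mirrors the convention that adding covariates cannot explain a negative share of the heterogeneity.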

Fig. 3

Included trials ordered by predicted control group mortality rate (diamonds). The predicted mortality rates were based on a multivariate weighted random-effects regression model with baseline mean Sequential Organ Failure Assessment (SOFA) score, the proportion of patients on mechanical ventilation, and mean serum creatinine as significant independent variables. The squares and brackets are the observed control-group mortality rates with 95% confidence intervals. The figure illustrates that the model explained (R2) 41% of the variability in mortality rates, with significant residual heterogeneity (P < 0.0001). The red dots are the reported a priori expected mortality rates used for sample size calculations.

The recursive partitioning algorithm resulted in a regression tree with the following variables as informative determinants of the mortality rate: mean age (split at 64.8 years); the proportion of patients with a respiratory infection (split at 54.5%); the proportion of patients on mechanical ventilation (split at 74.3%); and the proportion of male patients (splits at 63.8 and 53.8%). The R2 value of the regression tree was 0.42. The cross-validated relative error decreased to below the root (split 0) value, which indicates that the tree was not overfitted. The results from the regression tree analysis are further described in eFigures 4 and 5 of the ESM (section 7).

Discussion

In this analysis of 65 septic shock trials published in the past decade, we found a statistically significant and clinically relevant amount of heterogeneity in control group mortality rates. The mean mortality rate was 38.6% with estimated 95% prediction limits of 13.5–71.7%, revealing a wide range in underlying mortality rates after discounting the effects of random chance and small trials. In contrast to findings from large observational studies that the mortality of sepsis has decreased in the past decade, we found only a small nonsignificant decline in the period 2006–2018 [85, 86]. Different inclusion definitions of septic shock did not affect mean mortality rates, but a higher total number of exclusion criteria was associated with lower mortality. We used three statistical methods to analyze the association between population characteristics and mortality. The univariate associations reflect how the reader of a trial report could interpret the population characteristics in relation to the mortality rate, and show that the proportion of ventilated patients, mean SOFA score, and the proportion of patients on vasopressor support were most informative (i.e., had the highest standardized regression coefficients). The multivariate linear model (with missing observations imputed) shows which combinations of characteristics would be predictive of mortality if all trials hypothetically reported the same variables. A combination of three independently significant characteristics (mean SOFA score, proportion of ventilated patients, and mean creatinine) explained only 41% of the heterogeneity in mortality rates across trials. The recursive partitioning algorithm, which depends neither on multiple imputation nor on the assumption of linearity, shows which characteristics were most informative given that different trials report different characteristics. The resulting regression tree explained only 42% of the heterogeneity in mortality.
The linear model and the regression tree arrived at different predictor variables because the linear model is biased towards more informative linear associations, while the regression tree allows for nonlinear relations and is biased towards variables with less missing data. In all, these results indicate that there are clinically significant between-trial differences in control group mortality rates, and that these differences are not associated with differences in inclusion criteria and only weakly associated with reported baseline characteristics. Visual inspection of the heatmap (Fig. 2) shows that there are no unambiguous patterns in the relation between population characteristics and mortality rates. This heterogeneity is reflected in our finding that different statistical methods result in different predictive variables.

Possible sources of residual heterogeneity

Residual heterogeneity among trials may be caused by population differences in nutrition and socio-economic status, heterogeneous exclusion criteria, incomplete reporting, between-trial differences in variable definitions, the timing of randomization, and differences in post-randomization co-interventions and standards of care. We found that no single measure of chronic comorbidity was reported in more than 40% of the included trials and that characteristics of causative pathogens were reported in only 28–39% of trials. This compromised the power of our analysis to detect associations across all trials, but, more importantly, it also prevents readers of trial reports from evaluating and comparing populations among trials and from judging to what extent a trial population corresponds to the population under their care. Another source of heterogeneity is the imprecise definition of many variables. It is unclear whether a variable like ‘pre-existing kidney disease’ in one trial has the same meaning as ‘chronic renal insufficiency’ in another trial. Minor variations in variable definitions and data capture methods have been shown to lead to significantly different septic shock populations and to inter-observer variability in severity-of-illness scoring systems [5, 87, 88]. The importance of this ‘fine print’ in defining a population does not receive due attention in the methods section of most trials. The time of inclusion may be an additional source of heterogeneity. Patients recruited later after the diagnosis of septic shock have not responded to treatment in an earlier phase and are therefore likely to have a worse prognosis. Only 13 trials reported the time from diagnosis to randomization, and for those trials it explained 22% of the heterogeneity. While we have focused on inclusion criteria and baseline characteristics, the prognosis of septic shock may be largely influenced by post-randomization standards of care and co-interventions.
Unfortunately, co-interventions and (control group) treatment standards are often described as ‘according to the Surviving Sepsis Campaign guidelines’ or not discussed at all in trial reports. Variables describing important post-randomization interventions, such as red blood cell transfusions, vasopressor dose, or fluid balance, were recently found to be reported in only 33, 17, and 13% of large septic shock trials, respectively [89]. We did not analyze the association between trial countries and the mortality rate because many countries are represented by a single trial in the present sample. Nevertheless, between-country differences in standards of care or access to early healthcare may account for part of the residual heterogeneity. Large international observational studies are a more appropriate instrument for the investigation of differences in mortality rates among countries.

Implications for investigators and clinicians

Clinicians expect clinical trials to be relevant, reproducible, and generalizable to a clearly defined patient population. The results of this study indicate that many of the baseline characteristics upon which clinicians rely to gauge the applicability of trial results to their practice are in fact only weakly or not at all associated with mortality outcomes across trials. The association between the number of exclusion criteria and mortality suggests that many seemingly inconsequential criteria together may have a significant effect on the composition of a trial population. Investigators should therefore be aware of this phenomenon in the design phase of a trial, as it affects the generalizability and external validity of trial results. The wide prediction limits of control-group mortality have consequences for sample size calculations. Detecting a relative risk reduction of 25% with 80% power requires 245 patients if control group mortality is 71.7%, 795 patients if it is 38.6%, and 2980 patients if it is 13.5%. In practice, misestimation of the mortality rate by more than 7.5% occurred in 65% of critical care trials [11]. We therefore suggest that sample size calculations should not be based on the mean of reported control-group mortality rates in the literature but should be robust to a wider range of expected event rates. Reproducibility and generalizability also require a common phenomenological structure with respect to diagnostic definitions, inclusion criteria, patient characteristics, concomitant treatment, and outcomes. A recent review of large septic shock trials found that only half of the information deemed necessary for evaluation of the control group was reported in the investigated trials [89]. In the present study, we found that many of the reported characteristics are not associated with control-group mortality rates, possibly due to variations in variable definitions.
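The dependence of trial size on the assumed control-group mortality can be sketched with the standard normal-approximation formula for comparing two proportions. The sketch assumes a two-sided α of 0.05, equal allocation, and the Fleiss continuity correction, and the function name is hypothetical. The paper does not state which method produced the figures quoted above, so the totals printed here are of the same order but are not expected to match them exactly.

```python
import math
from statistics import NormalDist


def total_sample_size(p_control, rrr=0.25, alpha=0.05, power=0.80, continuity=True):
    """Total patients (two equal arms) to detect a relative risk reduction in mortality."""
    p_treat = p_control * (1.0 - rrr)
    diff = p_control - p_treat
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2.0
    n_arm = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
             + z_beta * math.sqrt(p_control * (1.0 - p_control)
                                  + p_treat * (1.0 - p_treat))) ** 2 / diff ** 2
    if continuity:
        # Fleiss continuity correction for two groups of equal size
        n_arm = n_arm / 4.0 * (1.0 + math.sqrt(1.0 + 4.0 / (n_arm * diff))) ** 2
    return 2 * math.ceil(n_arm)


# Upper prediction limit, pooled mean, and lower prediction limit of control mortality
for p in (0.717, 0.386, 0.135):
    print(f"control mortality {p:.1%}: {total_sample_size(p)} patients")
```

Evaluating the calculation across the whole prediction interval, rather than only at the pooled mean, is one way to make a power calculation robust to the range of plausible event rates.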
The third consensus definitions for sepsis and septic shock were partly developed to harmonize the inclusion criteria for clinical studies [3]. We were unable to analyze a subset of trials with populations that might fit the Sepsis-3 septic shock definition, as none of the included trials employed both delta SOFA score and vasopressor inclusion criteria. We do note that SOFA score is independently associated with mortality rates, although baseline SOFA explains only 33% (R2) of the variation in mortality rates in the 37 trials that report it. Furthermore, we found significant heterogeneity within subsets of trials employing similar inclusion criteria (Fig. 1). We suggest that an international consensus is necessary to standardize variable definitions, data collection, and reporting of patient characteristics and outcomes for sepsis trials, as has been proposed before [89-92]. The feasibility of harmonizing study protocols has been demonstrated in three large trials investigating early goal-directed therapy [93]. The present results indicate that SOFA score, the proportion of ventilated patients, and creatinine independently reflect baseline risk across trials and should therefore be reported for each trial. The results from this study also support the practice of data sharing, as we have shown that aggregated population characteristics are less informative than expected. Sharing individual patient data will not only increase the power to detect treatment effects across multiple studies but can also be used to test the generalizability of trial results vis-à-vis large cohorts with septic shock.

Strengths and limitations

This study was performed according to a prospectively registered protocol and analysis plan. We included only trials published between 2006 and 2018 to minimize the influence of long-term secular trends in septic shock diagnosis, treatment, and mortality [94, 95]. The search strategy was broad and comprehensive, but we excluded 40 trial reports not written in English, which reduced power and compromised generalizability. We excluded trials that recruited only septic shock patients with a specific organ dysfunction (such as kidney or liver failure) to eliminate this source of between-trial heterogeneity. For 20 trials, 28-day mortality was estimated from another reported mortality rate. Although the prediction equations were very precise (R² values ≥ 0.98), we cannot rule out that this estimation influenced the results; excluding these 20 trials, however, would have eroded the power of the study. Importantly, because we used study-level data, we cannot make inferences about predictive characteristics at the individual patient level without committing the ecological fallacy, even though several predictor variables are known to be individually associated with mortality (e.g. a high SOFA score as a risk factor [96, 97]). The substantial variation in the reporting of baseline variables was an important finding in itself, but it also limited our power to detect associations across trials. A more in-depth investigation of heterogeneity among trial populations would require individual patient data, but we think that obtaining such data would lead to significant selection bias.

Conclusion

Septic shock is a syndrome with various etiologies, biochemical characteristics, and phenotypes [9, 98]. Onto this inherently heterogeneous syndrome, a layer of investigator-induced heterogeneity is added when trials employ different inclusion criteria, report different variables, and use different variable definitions. This compounded complexity causes heterogeneity among trial populations that may go unnoticed. We have shown that control-group mortality rates are very dissimilar across trials, and that most of this heterogeneity remains unexplained after accounting for reported population characteristics. The lack of standardized reporting limits the usefulness of the variables that we found to explain mortality differences. In all, the substantial between-trial heterogeneity limits the reproducibility and generalizability of septic shock research and may inhibit the discovery of beneficial therapies for specific (sub)populations. The findings of this study therefore strongly support thorough standardization and harmonization of septic shock trial reporting, as well as data-sharing policies to test the external validity of trial populations.

Electronic supplementary material: Supplementary material 1 (PDF 1165 kb)

References (90 in total)

1. Targeted Fluid Minimization Following Initial Resuscitation in Septic Shock: A Pilot Study.

Authors: Catherine Chen; Marin H Kollef
Journal: Chest Date: 2015-12

2. Improved sepsis bundles in the treatment of septic shock: a prospective clinical study.

Authors: Nian-Fang Lu; Rui-Qiang Zheng; Hua Lin; Jun Shao; Jiang-Quan Yu; De-Gang Yang
Journal: Am J Emerg Med Date: 2015-04-25

3. Randomized, double-blind, placebo-controlled trial of granulocyte colony-stimulating factor in patients with septic shock.

Authors: Dianne P Stephens; Jane H Thomas; Alisa Higgins; Michael Bailey; Nicholas M Anstey; Bart J Currie; Allen C Cheng
Journal: Crit Care Med Date: 2008-02

4. Norepinephrine plus dobutamine versus epinephrine alone for management of septic shock: a randomised trial.

Authors: Djillali Annane; Philippe Vignon; Alain Renault; Pierre-Edouard Bollaert; Claire Charpentier; Claude Martin; Gilles Troché; Jean-Damien Ricard; Gérard Nitenberg; Laurent Papazian; Elie Azoulay; Eric Bellissant
Journal: Lancet Date: 2007-08-25

5. Surviving Sepsis Campaign guidelines for management of severe sepsis and septic shock.

Authors: R Phillip Dellinger; Jean M Carlet; Henry Masur; Herwig Gerlach; Thierry Calandra; Jonathan Cohen; Juan Gea-Banacloche; Didier Keh; John C Marshall; Margaret M Parker; Graham Ramsay; Janice L Zimmerman; Jean-Louis Vincent; M M Levy
Journal: Intensive Care Med Date: 2004-03-03

6. Glibenclamide dose response in patients with septic shock: effects on norepinephrine requirements, cardiopulmonary performance, and global oxygen transport.

Authors: Andrea Morelli; Matthias Lange; Christian Ertmer; Katrin Broeking; Hugo Van Aken; Alessandra Orecchioni; Monica Rocco; Alessandra Bachetoni; Daniel L Traber; Giovanni Landoni; Paolo Pietropaoli; Martin Westphal
Journal: Shock Date: 2007-11

7. Vasopressin versus norepinephrine infusion in patients with septic shock.

Authors: James A Russell; Keith R Walley; Joel Singer; Anthony C Gordon; Paul C Hébert; D James Cooper; Cheryl L Holmes; Sangeeta Mehta; John T Granton; Michelle M Storms; Deborah J Cook; Jeffrey J Presneill; Dieter Ayers
Journal: N Engl J Med Date: 2008-02-28

8. Adjunctive Glucocorticoid Therapy in Patients with Septic Shock.

Authors: Balasubramanian Venkatesh; Simon Finfer; Jeremy Cohen; Dorrilyn Rajbhandari; Yaseen Arabi; Rinaldo Bellomo; Laurent Billot; Maryam Correa; Parisa Glass; Meg Harward; Christopher Joyce; Qiang Li; Colin McArthur; Anders Perner; Andrew Rhodes; Kelly Thompson; Steve Webb; John Myburgh
Journal: N Engl J Med Date: 2018-01-19

9. Corticosteroid treatment and intensive insulin therapy for septic shock in adults: a randomized controlled trial.

Authors: Djillali Annane; Alain Cariou; Virginie Maxime; Elie Azoulay; Gilles D'honneur; Jean François Timsit; Yves Cohen; Michel Wolf; Muriel Fartoukh; Christophe Adrie; Charles Santré; Pierre Edouard Bollaert; Armelle Mathonet; Roland Amathieu; Alexis Tabah; Christophe Clec'h; Julien Mayaux; Julie Lejeune; Sylvie Chevret
Journal: JAMA Date: 2010-01-27

10. Efficacy of coupled plasma filtration adsorption (CPFA) in patients with septic shock: a multicenter randomised controlled clinical trial.

Authors: Sergio Livigni; Guido Bertolini; Carlotta Rossi; Fiorenza Ferrari; Michele Giardino; Marco Pozzato; Giuseppe Remuzzi
Journal: BMJ Open Date: 2014-01-08
