Randomised clinical trials in critical care: past, present and future.

Anders Granholm¹, Waleed Alhazzani^2,3, Lennie P G Derde^4,5, Derek C Angus⁶, Fernando G Zampieri⁷, Naomi E Hammond^8,9, Rob Mac Sweeney¹⁰, Sheila N Myatra¹¹, Elie Azoulay¹², Kathryn Rowan¹³, Paul J Young^14,15,16,17, Anders Perner¹⁸, Morten Hylander Møller¹⁸.

Abstract

Randomised clinical trials (RCTs) are the gold standard for providing unbiased evidence of intervention effects. Here, we provide an overview of the history of RCTs and discuss the major challenges and limitations of current critical care RCTs, including overly optimistic effect sizes; unnuanced conclusions based on dichotomization of results; limited focus on patient-centred outcomes other than mortality; lack of flexibility and ability to adapt, increasing the risk of inconclusive results and limiting knowledge gains before trial completion; and inefficiency due to lack of re-use of trial infrastructure. We discuss recent developments in critical care RCTs and novel methods that may provide solutions to some of these challenges, including a research programme approach (consecutive, complementary studies of multiple types rather than individual, independent studies), and novel design and analysis methods. These include standardization of trial protocols; alternative outcome choices and use of core outcome sets; increased acceptance of uncertainty, probabilistic interpretations and use of Bayesian statistics; novel approaches to assessing heterogeneity of treatment effects; adaptation and platform trials; and increased integration between clinical trials and clinical practice. We outline the advantages and discuss the potential methodological and practical disadvantages with these approaches. With this review, we aim to inform clinicians and researchers about conventional and novel RCTs, including the rationale for choosing one or the other methodological approach based on a thorough discussion of pros and cons. Importantly, the most central feature remains the randomisation, which provides unparalleled restriction of confounding compared to non-randomised designs by reducing confounding to chance.

Keywords: Clinical trials; Critical care; Intensive care; Randomized clinical trials

Year: 2021 PMID: 34853905 PMCID: PMC8636283 DOI: 10.1007/s00134-021-06587-9

Source DB: PubMed Journal: Intensive Care Med ISSN: 0342-4642 Impact factor: 41.787

Introduction

Randomised clinical trials (RCTs) fundamentally changed the practice of medicine, and randomisation is the gold standard for providing unbiased estimates of intervention effects [1]. Clinical trials have evolved substantially, from the first described systematic comparison of dietary regimens 2500 years ago in Babylon, to the 1747 scurvy trial, the first double-blinded trial of patulin for the common cold conducted in the 1940s, and the establishment of modern ethical standards and regulatory frameworks following World War II (Fig. 1) [2, 3].

Fig. 1

Timeline of important milestones in the general history of clinical trials based on references [2, 3]. A historical timeline of key critical care studies and RCTs is available elsewhere [6]

While the fundamental concept of RCTs has remained relatively unchanged since then, the degree of collaboration has increased, and the largest trials have become larger [4]. Additionally, smaller RCTs assessing efficacy in narrow populations in highly controlled settings have been complemented with larger, more pragmatic RCTs in broader populations with less protocolisation of concomitant interventions, more closely resembling clinical practice [5]. Similarly, per-protocol analyses assessing efficacy (i.e., effects of an intervention under ideal circumstances in patients with complete protocol adherence) have been complemented with intention-to-treat analyses, assessing effectiveness under pragmatic circumstances in all randomised patients, regardless of protocol adherence (which may be affected by the intervention itself). This provides a better estimate of the actual effects of choosing one intervention over another in clinical practice [5]. A discussion and historical timeline of key critical care studies and RCTs is available elsewhere [6]. In this review, we outline the characteristics and common challenges of conventional RCTs in critical care, and discuss potential improvements and novel design features, followed by a discussion of their potential limitations.

Common limitations and challenges of RCTs in critical care

RCTs are not without limitations, some related to the conventional design (i.e., a parallel, two-group, fixed-allocation-ratio RCT analysed with frequentist methods) and several to how many RCTs are designed and conducted. First, most critical care RCTs compare two interventions; while appropriate if only two interventions are truly of interest, oversimplifications may occur when two interventions, doses, or durations are chosen primarily to simplify trials. Second, sample size estimations for most RCTs enrolling critically ill patients in the intensive care unit (ICU) use overly optimistic effect sizes [7-10], leading to RCTs capable of providing firm evidence for very large effects, but unable to confirm or refute smaller, yet clinically relevant effects. Consequently, critical care RCTs are frequently inconclusive from a clinical perspective, and errors of interpretation in which absence of evidence is taken as evidence of absence [11] are common when RCTs are analysed using frequentist statistical methods and interpreted according to whether ‘statistical significance’ has been reached [8, 11]. Ultimately, this may lead to beneficial interventions being prematurely abandoned, and it has been argued that the conduct of clearly underpowered RCTs is unethical [12]. Third, critical care RCTs frequently focus on mortality [8]; while mortality is patient-important [13] and capable of capturing both desirable and undesirable effects, it conveys limited statistical information [14], thus requiring large samples. Interventions may reduce morbidity or mortality due to one cause, but if assessed in patients at substantial risk of dying from other causes, differences may be difficult to detect [15]. Similarly, interventions may lead to negative intermediate outcomes and prolonged admission or increased treatment intensity, but not necessarily death [15]. Fourth, conventional RCTs are inflexible. While one or a few interim analyses may be conducted, they often rely on hard criteria for stopping [16]. Benefit or harm can, therefore, only be detected early if the effect is very large. Fifth, planning and initiating RCTs usually takes substantial time and funding, re-use of trial infrastructure is limited, data collection is mostly manual and requires substantial resources, and between-trial coordination is usually absent, increasing the risk of competing trials. Finally, even for conclusive RCTs, disseminating and implementing results into clinical practice requires substantial effort and time [17].
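
To illustrate why overly optimistic effect sizes lead to trials that cannot confirm or refute smaller effects, the following minimal sketch uses the standard normal-approximation formula for comparing two proportions. The function name, the 30% control-group mortality, the 5% two-sided alpha and the 90% power are illustrative assumptions and are not taken from any of the cited trials.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, arr, alpha=0.05, power=0.90):
    """Approximate patients per group for a two-sided comparison of two proportions.

    p_control: assumed control-group mortality (hypothetical value).
    arr: assumed absolute risk reduction with the intervention.
    """
    p_intervention = p_control - arr
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    return ceil((z_alpha + z_beta) ** 2 * variance / arr ** 2)


# The required sample size grows roughly with 1 / arr^2 as the assumed effect shrinks.
for arr in (0.15, 0.10, 0.05, 0.02):
    print(f"ARR {arr:.0%}: {n_per_group(0.30, arr)} patients per group")
```

Under these assumptions, a trial sized for a large absolute risk reduction enrols only a small fraction of the patients needed to detect a risk reduction of a few percentage points.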

Larger trials, standardisation, meta-analyses and research programmes

The simple solution to inconclusive, underpowered RCTs is enrolling more patients, which requires more resources and international collaboration, while increasing external validity. Fewer, larger RCTs are more likely to produce conclusive evidence regarding important clinical questions than multiple, smaller trials, and are better suited to assessing safety (including rare adverse events) if properly monitored. Thus, there may be a rationale for focussing on widely used interventions, such as the Mega-ROX RCT [18] that aims to compare oxygenation targets in 40,000 ICU patients to provide conclusive evidence for smaller effects than what previous RCTs have been able to confirm or reject [19-21]. An alternative to very large RCTs, which are challenging for logistic, economic and administrative reasons, is standardisation or harmonisation of RCT protocols, followed by pre-planned, prospective meta-analyses [22], which may also limit competition between trials. This was done for three large RCTs of early goal-directed therapy for septic shock, with results included in a conventional, trial-level meta-analysis with other RCTs and a prospectively planned individual patient-data meta-analysis [23, 24]. Other examples include prospective meta-analyses on systemic corticosteroids and interleukin-6-receptor antagonists for critically ill patients with coronavirus disease 2019 (COVID-19) [25, 26], and a prospective meta-trial including six RCTs of awake prone positioning for patients with COVID-19 and hypoxia, with separate logistics and infrastructure, but harmonised protocols and prospective analysis of combined individual participant-data [27]. Importantly, RCTs should ideally be conducted as part of complete research programmes (Fig. 2), with pre-clinical studies (e.g., in-vitro and animal studies), systematic reviews, non-randomised studies and pilot/feasibility RCTs informing RCT designs, including selection of appropriate research questions, populations, interventions and comparators, outcomes and realistic effect sizes. When RCTs are completed, results should be incorporated in updated systematic reviews and clinical practice guidelines to ease implementation [28], all considering relevant patient differences and effects of concomitant interventions. For example, the SUP-ICU programme included topical and systematic reviews summarising existing evidence, a survey describing preferences and indications for stress ulcer prophylaxis, and a cohort assessing prevalence, risk factors and outcomes of patients with gastrointestinal bleeding before the RCT was designed [29-33]. Following the RCT, results were incorporated in updated systematic reviews and clinical practice guidelines [33-35].

Fig. 2

Overview of different study types and their role in clinical research programmes. In general, pre-clinical studies can provide necessary background or laboratory knowledge that may be used to generate hypotheses later assessed in clinical trials. Summarising existing evidence prior to start of clinical studies is sensible, to identify knowledge gaps, avoid duplication of efforts, and inform further clinical studies. Surveys may identify existing beliefs, practices and attitudes towards further studies; cross-sectional studies and cohort studies can describe prevalence, outcomes, predictors/risk factors and current practice. Randomised clinical trials remain the gold standard for intervention comparisons but may also provide data for secondary studies not necessarily focussing on the randomised intervention comparison. Before randomised clinical trials aimed at assessing efficacy or effectiveness of an intervention are conducted, pilot/feasibility trials may be conducted to prepare larger trials and assess protocol delivery and feasibility. Following the conduct of a randomised clinical trial, relevant systematic reviews and clinical practice guidelines should be updated as necessary, to ease implementation of trial results into clinical practice. Of note, the process is not always linear and unidirectional, and different study types may be conducted at different temporal stages during a research programme. Translational research may incorporate pre-clinical and laboratory studies and clinical studies, including non-randomised cohort studies and randomised clinical trials. Similarly, clinical studies may be used to collect data or samples that are further analysed outside the clinical setting

Outcome selection

Historically, most RCTs in critically ill patients have focussed on all-cause landmark mortality assessed at a single time-point [8]. As mortality in critically ill patients is high, it needs to be considered regardless of the outcome chosen. However, mortality conveys limited statistical information compared to more granular outcomes, as it only contains two possible values, i.e., dead or alive regardless of health state [14, 36], and is thus insensitive to changes leading to other clinical improvements, e.g., quicker disease resolution or better functional outcomes in survivors. Thus, mortality requires large samples, and RCTs focussing on mortality are less frequently ‘statistically significant’ compared to RCTs focussing on other outcomes [8]. While mortality may be the most appropriate outcome in some trials, other outcomes should thus be considered [37]. During the COVID-19 pandemic, multiple RCTs focussed on more granular, higher-information outcomes such as days alive without life support or mechanical ventilation [38-40], which capture mortality, resource use and illness duration. However, these outcomes are challenging due to different definitions, different handling of death, potentially opposing effects on mortality and the duration of life support in survivors, possibly greater risk of bias in unblinded trials, and difficult statistical analysis [41-43]. Use of composite outcomes may increase power due to overall more events, but hampers interpretability, as components of different importance to patients are weighted equally and interventions may affect individual components differently (e.g., increase intubation rates but decrease mortality) [44]. Finally, the development of core outcome sets may help in prioritising and standardising outcome selection, allowing easier comparison and synthesis of RCT results [45].

Avoiding dichotomisation and embracing uncertainty

Most RCTs are planned and analysed using frequentist statistical methods, with results dichotomised as ‘statistically significant’ or not. Non-significant results are misinterpreted as evidence for no difference in approximately half of journal articles [46], and avoiding dichotomisations and abandoning the concept of statistical significance have been repeatedly discussed and recommended [46-49]. P values are calculated assuming that the null hypothesis is true (i.e., that there is exactly no difference, which is often implausible), and as they are indirect probabilities, they are hard to interpret (Fig. 3) [50]. As P values depend on both effect sizes and sample sizes, they will generally be small in large samples and large in small samples, regardless of the potential clinical importance of effects; thus, estimating effect sizes with uncertainty measures [i.e., confidence intervals (CIs)] may be preferable [51], although CIs are frequently misinterpreted, too [50]. As misinterpretations are common [46, 50], increased education of clinicians and researchers is likely needed [48].
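
The dependence of P values on sample size can be shown with a small sketch using a pooled two-proportion z-test. The observed risks (27% vs. 30%) and group sizes are hypothetical, and the z-test is only one of several tests a trial might use.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(deaths_a, n_a, deaths_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test."""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Identical observed risks (27% vs. 30%) at increasing hypothetical sample sizes.
for n in (200, 2000, 20000):
    p = two_proportion_p_value(round(0.27 * n), n, round(0.30 * n), n)
    print(f"n = {n} per group: P = {p:.4f}")
```

The observed difference of 3 percentage points is the same in all three scenarios; only the P value changes, which is why reporting the effect estimate with its uncertainty is more informative than the P value alone.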

Fig. 3

Direction of probabilities in frequentist (A) and Bayesian (B) analyses. This figure illustrates the direction of probabilities in frequentist (conventional) and Bayesian statistical analyses. A Frequentist P values, Pr(data | H0): probability of obtaining data (illustrated with a spreadsheet) at least as extreme as what was observed given the assumption that the null hypothesis (illustrated with a light bulb with 0 next to it) is correct. This means that frequentist statistical tests assume that the null hypothesis (generally, that there is exactly no difference between interventions) is true. They then calculate the probability of obtaining a result at least as extreme (i.e., a difference that is at least as large as what was observed) under the assumption that there is no difference. Low P values thus provide direct evidence against the null hypothesis, but only indirect evidence related to the hypothesis of interest (i.e., that there is a difference), which makes them difficult to interpret. With more frequent analyses, there is an increased risk of obtaining results that would be surprising if the null hypothesis is true, and thus, with more tests or interim analyses, the risk of rejecting the null hypothesis due to chance (a type I error) increases. B Bayesian probabilities, Pr(H | data): the probability of any hypothesis of interest (illustrated with a light bulb; e.g., that there is benefit with the intervention) given the data collected. Bayesian probabilities thus provide direct evidence for any hypothesis of interest, and the probabilities for multiple hypotheses, e.g. any benefit, clinically important benefit, or a difference smaller than what is considered clinically important, can be calculated from the same posterior distribution without any additional analyses or multiplicity issues. If further data are collected, the posterior probability distribution is updated and replaces the old posterior probability distribution. For both frequentist and Bayesian models, these probabilities are calculated according to a defined model and all its included assumptions (and, for Bayesian analyses, a defined prior probability distribution), all of which are assumed to be correct or appropriate for the results to be trusted. Abbreviations and explanations: data: the results/difference observed; H: a hypothesis of interest; H0: a null hypothesis (i.e., that there is no difference); Pr: probability; |: should be read as “given”.

The issue with dichotomising results received attention following the publication of several important critical care RCTs with apparent discrepancies between statistical significance and clinical importance. The EOLIA RCT of extracorporeal membrane oxygenation (ECMO) in patients with severe acute respiratory distress syndrome (ARDS) concluded that “60-day mortality was not significantly lower with ECMO than with a strategy of conventional mechanical ventilation that included ECMO as rescue therapy.” [52]. While technically correct, it may be considered overly reductionistic, as the conclusion was based on 60-day mortality rates of 35% (ECMO) vs. 46% (control) and a P value of 0.09 following a sample size calculation based on an absolute risk reduction of 20 percentage points [52]. Similarly, the ANDROMEDA-SHOCK RCT conducted in septic shock patients concluded that “a resuscitation strategy targeting normalization of capillary refill time, compared with a strategy targeting serum lactate levels, did not reduce all-cause 28-day mortality.”, based on 28-day mortality rates of 34.9% vs. 43.4% and a P value of 0.06, following a sample size calculation based on a 15 percentage points absolute risk reduction [53]. Arguably, smaller effect sizes are clinically relevant in both cases. 
There has been increased interest in supplementing or replacing conventional analyses with Bayesian statistical methods [37, 54, 55], which start with probability distributions expressing prior beliefs. Once data have been collected, these are updated to posterior probability distributions [56, 57]. Different prior distributions can be used, including uninformative-, vaguely informative-, evidence-based-, sceptic-, positive- or negative priors [58]. The choice of prior may be difficult and may potentially be abused to get the ‘desired’ results; typically, however, weakly informative, neutral priors, with minimal influence on the results are used for the primary Bayesian analyses of critical care RCTs, with sensitivity analyses assessing the influence of other priors [59-63]. If priors are transparently reported (and ideally pre-specified), assessing whether they are reasonable is fairly easy. Posterior probability distributions can be summarised in multiple ways. Credible intervals (CrIs) directly represent the most probable values (which is how frequentist CIs are often erroneously interpreted) [50, 57], and direct probabilities of any effect size can be calculated, i.e., the probability of any benefit (relative risk < 1.00), clinically important benefit (e.g., absolute risk difference > 2 percentage points) or practical equivalence (e.g., absolute risk difference between −2 and 2 percentage points) (Fig. 3). In Bayesian re-analyses of EOLIA and ANDROMEDA-SHOCK, there were 96% and 98% probabilities of benefit with the interventions, respectively, using minimally informative or neutral priors [59, 60]; while thresholds for adopting interventions may vary depending on resources/availability, preferences, and cost, these re-analyses led to more nuanced interpretations, with the use of multiple priors allowing readers to form their own context-dependent conclusions. 
Several Bayesian analyses have been conducted post hoc [59, 60, 64–66], sometimes motivated by apparently clinically important effect sizes that did not reach statistical significance, while others have been pre-specified [61, 62, 67], which is preferable as selection driven by trial results is thus avoided. Nuanced interpretations avoiding dichotomisations are also possible using conventional, frequentist statistics [46-49]; however, assessments of statistical significance may be so ingrained in many clinicians, researchers and journal editors that more nuanced interpretation may be more easily facilitated with alternative statistical approaches. Different evidence thresholds may be appropriate depending on the intervention, i.e., less certain evidence may be required when comparing commonly used and well-known interventions with similar costs and disadvantages, and more certain evidence may be required before implementing new, costly or burdensome interventions [68]. This is similar to how clinical practice guidelines consider the entire evidence base and the nature of the interventions being compared, including costs, burden of implementation and patient preferences [1]. While claims of “no difference” based solely on lack of statistical significance should be avoided, clearly pre-defined thresholds may still be required for approving new interventions, for declaring trials “successful” and for limiting the risk of “spin” in conclusions. Thus, a nuanced set of standardised policy responses to more nuanced evidence summaries may be warranted to ensure some standardisation of interpretation and implementation, while still considering differences in patient characteristics and preferences.
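
The direct probabilities described in this section (any benefit, clinically important benefit, practical equivalence) can be sketched with a simple beta-binomial model and Monte Carlo sampling. This is a minimal illustration only: the event counts are hypothetical, the independent flat Beta(1, 1) priors and the 2 percentage point threshold are assumptions, and the published re-analyses used their own models and priors.

```python
import random


def posterior_summaries(deaths_int, n_int, deaths_ctl, n_ctl,
                        draws=50_000, seed=1):
    """Monte Carlo posterior summaries with independent Beta(1, 1) priors per arm."""
    rng = random.Random(seed)
    any_benefit = important_benefit = equivalence = 0
    for _ in range(draws):
        # Beta posterior: Beta(1, 1) prior updated with deaths and survivors.
        risk_int = rng.betavariate(1 + deaths_int, 1 + n_int - deaths_int)
        risk_ctl = rng.betavariate(1 + deaths_ctl, 1 + n_ctl - deaths_ctl)
        diff = risk_ctl - risk_int         # absolute risk reduction
        any_benefit += diff > 0            # same event as relative risk < 1
        important_benefit += diff > 0.02   # more than 2 percentage points
        equivalence += abs(diff) <= 0.02   # within +/- 2 percentage points
    return {
        "any_benefit": any_benefit / draws,
        "important_benefit": important_benefit / draws,
        "practical_equivalence": equivalence / draws,
    }


# Hypothetical counts: 44/125 deaths with the intervention vs. 58/125 with control.
for name, prob in posterior_summaries(44, 125, 58, 125).items():
    print(f"Pr({name}) = {prob:.3f}")
```

All three probabilities come from the same posterior draws, so no additional analyses are needed for each hypothesis; re-running the sketch with sceptical or informative priors would serve as a sensitivity analysis.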

Average and heterogeneity of treatment effects

The primary RCT results generally represent the average treatment effects across all included patients; however, heterogeneity of treatment effects (HTE) [69, 70] across subpopulations is likely and, despite being difficult to prove, has been suggested in multiple previous critical care RCTs [33, 64, 71–74]. A neutral average effect may represent benefit in some patients and harm in others (Fig. 4), and a beneficial average effect may differ in magnitude across subgroups, which could influence decisions to use the intervention [1, 75, 76]. It is sometimes assumed that the risk of adverse events is similar for patients at different risk of the primary outcome [70], which may affect the balance between benefits and harms of a treatment according to baseline risk, although this assumption may not always hold [77].

Fig. 4

Heterogeneity of treatment effects in a clinical trial. Forest plot illustrating a fictive clinical trial enrolling 4603 patients. In this trial, the average treatment effect may be considered neutral with a relative risk (RR) of 0.96 and 95% confidence interval of 0.90–1.04 (or inconclusive, if this interval included clinically relevant effects). The trial population consists of three fictive subgroups with heterogeneity of treatment effects: A, with an intervention effect that is neutral (or inconclusive), similarly to the pooled result; B, with substantial benefit from the intervention; and C, with substantial harm from the intervention. If only the average intervention effect is assessed, it may be concluded, based on the apparent neutral overall result, that whether the intervention or control is used has little influence on patient outcomes, and it may be missed that the intervention provides substantial benefit in some patients and substantial harm in others. Similarly, an intervention with an overall beneficial effect may be more beneficial in some subgroups than others and may provide harm in some patients, and vice versa.

While large, pragmatic RCTs may be preferred for detecting clinically relevant average treatment effects, guiding overall clinical practice recommendations and for public healthcare, they have been criticised for including too heterogeneous populations, often due to inclusion of general acutely ill ICU patients or ICU patients with broad syndromic conditions, e.g., sepsis or ARDS [78]. Even if present, HTE may be of limited importance if some patients benefit while others are mostly unaffected, if cost or burden of implementation is limited, or if some patients are harmed while others are mostly unaffected. Most RCTs assess potential HTE by conducting conventional subgroup analyses despite important limitations [79]. 
As substantially more patients are required to assess subgroup differences than for primary analyses, most subgroup analyses are substantially underpowered and may miss clinically relevant differences [79]. In addition, larger numbers of subgroup analyses increase the risk of chance findings [79]. Conventional subgroup analyses assess one characteristic at a time, which may not reflect biology or clinical practice where multiple risk factors are often synergistic or additive [79], or where effect modifiers may be dynamic and change during illness course. Finally, conventional subgroup analyses frequently dichotomise continuous variables, which limits power [80] and makes assessment of gradual changes in responses difficult. Alternative and better solutions for assessing HTE include predictive HTE analysis, where a prediction model incorporating multiple relevant clinical variables predictive of either the outcome or the change in outcomes with the intervention is used [77]; use of clustering algorithms and clinical knowledge to identify subgroups and distinct clinical pheno-/endotypes for syndromic conditions [64, 81, 82]; assessments of interactions with continuous variables without categorisation [63-65]; use of Bayesian hierarchical models, where subgroup effect estimates are partially pooled, limiting the risk of chance findings in smaller subgroups [63-65]; and adaptive enrichment [83, 84], discussed below. Improved and more granular analyses seem the most realistic way towards “personalised” medicine [77], but require more data and thus overall larger RCTs. Regardless of the approach, appropriate caution should always be employed when interpreting subgroup and HTE analyses.
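
The increased risk of chance findings with more subgroup analyses can be quantified with a short calculation. This sketch assumes independent tests, a true null effect in every subgroup, and an unadjusted significance threshold of 0.05; real subgroup analyses are rarely fully independent, so the numbers are indicative only.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Probability of at least one 'significant' result among independent null tests."""
    return 1 - (1 - alpha) ** n_tests


# Family-wise chance of a spurious subgroup finding for increasing numbers of tests.
for n_tests in (1, 5, 10, 20):
    print(f"{n_tests:2d} subgroup tests: {prob_any_false_positive(n_tests):.0%}")
```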

Adaptation

Adaptive trials are more flexible and can be more efficient than conventional RCTs [85], while being designed to have similar error rates. Adaptive trials often, but not always, use Bayesian statistical methods, which are well suited for continuous assessment of accumulating evidence [83, 86]. Adaptive trials can be adaptive in multiple ways [87]. First, pre-specified decision rules (for stopping for inferiority/superiority/equivalence/futility) allow trials to run without pre-specified sample sizes or to revise target sample sizes, thus allowing trials to run until just enough data have been accumulated. Expected sample sizes are estimated using simulation; if the expected baseline risks and effect sizes are incorrect, the final sample sizes will differ from expectations, but adaptive trials are still able to continue until sufficient evidence is obtained. Further, adaptive sample sizes are better suited for new diseases, where no or limited existing knowledge complicates sample size calculations. For example, conducting the OSCAR RCT assessing high-frequency oscillation in ARDS using a Bayesian adaptive design could have reduced the number of patients and total deaths by > 15% [88]. Second, trials may be adaptive regarding the interventions assessed; multiple interventions or doses may be studied simultaneously or in succession, and the least promising may be dropped while assessment of better performing interventions continues until conclusive evidence has been obtained [83, 86]. This has been used for dose-finding trials, e.g., the SEPSIS-ACT RCT initially compared three selepressin doses to placebo, followed by selection of the best dose for further comparison [67], and the ongoing adaptive phase II/III Revolution trial [89], comparing antiviral drugs and placebo focussing on reducing viral loads in its first phases and increasing the number of days without respiratory support in the third phase. 
Similarly, interventions may be added during the trial, as in platform trials discussed below. Third, trials may use response-adaptive randomisation to update allocation ratios based on accumulating evidence, thereby increasing the chance that patients will be allocated to more promising interventions, despite not having reached conclusiveness yet. This can increase efficiency in some situations, but also decrease it, as in two-armed RCTs and some multi-armed RCTs [90, 91]. Thus, it has been argued that while response-adaptive randomisation may benefit internal patients, it may not always be preferable, as it can lead to slower discovery of interventions that can benefit patients external to the trial in some cases [91, 92]. Finally, trials may use adaptive enrichment to adapt/restrict inclusion criteria to focus on patients more likely to benefit, or use different allocation ratios for different subpopulations [84, 87].
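
The idea of pre-specified stopping rules evaluated on accumulating data can be sketched as a small simulation. Everything here is an assumption made for illustration: the two-arm design with fixed 1:1 allocation, the Beta(1, 1) priors, the interim analysis every 200 patients, the maximum of 2000 patients, and the 0.99/0.01 probability thresholds. In a real adaptive trial such thresholds would be calibrated by simulation to achieve the desired error rates.

```python
import random


def prob_intervention_better(d_int, n_int, d_ctl, n_ctl, rng, draws=2000):
    """Posterior probability of lower mortality with the intervention (Beta(1, 1) priors)."""
    wins = 0
    for _ in range(draws):
        r_int = rng.betavariate(1 + d_int, 1 + n_int - d_int)
        r_ctl = rng.betavariate(1 + d_ctl, 1 + n_ctl - d_ctl)
        wins += r_int < r_ctl
    return wins / draws


def simulate_trial(risk_int, risk_ctl, look_every=200, max_n=2000,
                   superiority=0.99, inferiority=0.01, seed=1):
    """Simulate one trial; return (decision, number of patients enrolled)."""
    rng = random.Random(seed)
    d_int = n_int = d_ctl = n_ctl = 0
    for patient in range(1, max_n + 1):
        if rng.random() < 0.5:               # fixed 1:1 allocation
            n_int += 1
            d_int += rng.random() < risk_int
        else:
            n_ctl += 1
            d_ctl += rng.random() < risk_ctl
        if patient % look_every == 0:        # pre-specified interim analysis
            p = prob_intervention_better(d_int, n_int, d_ctl, n_ctl, rng)
            if p > superiority:
                return "superiority", patient
            if p < inferiority:
                return "inferiority", patient
    return "inconclusive", max_n


# Ten simulated trials with hypothetical true mortality of 30% vs. 40%.
results = [simulate_trial(0.30, 0.40, seed=s) for s in range(1, 11)]
mean_n = sum(n for _, n in results) / len(results)
print(f"Mean sample size at stopping: {mean_n:.0f}")
print({d: sum(1 for r, _ in results if r == d) for d, _ in results})
```

Repeating such simulations across plausible baseline risks and effect sizes is how expected sample sizes and error rates are typically estimated when planning an adaptive design.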

Platform trials

Platform trials are RCTs that, instead of focussing on single intervention comparisons, focus on a disease or condition and assess multiple interventions according to a master protocol [83, 93]. Platform trials may run perpetually, with interventions added or dropped continuously [83, 94], and often employ multiple adaptive features and probabilistic decision rules [83, 93]. Interventions assessed can be nested in multiple domains, e.g., REMAP-CAP assesses interventions in patients with severe community-acquired pneumonia in several domains including antibiotics, corticosteroids, and immune-modulating therapies. By assessing multiple interventions simultaneously and by re-using controls for comparisons with multiple interventions, platform trials can be more efficient than both sequential two-armed comparisons and simpler adaptive trials [94, 95]. Adaptive platform trials are capable of “learning while doing”, and potentially allow tighter integration of clinical research and clinical practice, i.e., a better exploration–exploitation trade-off (learning versus doing based on existing knowledge) [83, 96]. If response-adaptive randomisation is used, probabilities of allocation to potentially superior interventions increase as evidence accumulates, and interventions that are deemed superior may immediately become implemented as standard of care by becoming the new control group [83]. Thus, implementation of results into practice, at least in participating centres, may become substantially faster. While platform trials have only recently been used in critically ill patients, the RECOVERY and REMAP-CAP trials have led to substantial improvements in the treatment of patients with COVID-19 within a short time-frame [38, 97–99], although this may not only be explained by the platform design, but also the case load and urgency of the situation. 
Comparable to how data from multiple conventional RCTs may be prospectively planned to be analysed together, data from multiple platform trials may be combined in multiplatform trials with similar benefits and challenges as individual platform trials and standardisation across individual, conventional RCTs [100].
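
The efficiency gain from re-using controls can be illustrated with simple arithmetic. This sketch assumes equal arm sizes, concurrent enrolment of all arms, and no adjustment for multiple comparisons; the function names and the figure of 1000 patients per arm are hypothetical.

```python
def total_patients_separate(n_interventions, n_per_arm):
    """Total patients if each intervention gets its own two-armed trial."""
    return 2 * n_interventions * n_per_arm


def total_patients_shared_control(n_interventions, n_per_arm):
    """Total patients if all interventions share a single control arm."""
    return (n_interventions + 1) * n_per_arm


for k in (2, 3, 5):
    separate = total_patients_separate(k, 1000)
    shared = total_patients_shared_control(k, 1000)
    print(f"{k} interventions: {separate} vs. {shared} patients "
          f"({1 - shared / separate:.0%} fewer)")
```

In practice the saving depends on the allocation ratios used and on whether controls enrolled at different times can validly be compared with newer intervention arms.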

Further embedding of RCTs into clinical practice

In addition to the possible tighter integration between research and clinical practice that may come with adaptive platform trials and ultimately may lead to learning healthcare systems [83], integration may be increased in other ways. Trials may be embedded in electronic health records, where automated data collection, built-in randomisation modules, and alerts about potentially eligible patients may substantially improve logistics and facilitate closer integration between research and practice [61, 83]. Similarly, RCTs may use data already collected in registers or clinical databases, substantially decreasing the data-collection burden, as has been done in, e.g., the PEPTIC cluster-randomised register-embedded trial [74]. Finally, fostering an environment where clinical practice and clinical research are tightly integrated and where enrolment in clinical trials is considered an integral part of clinical practice in individual centres by clinicians, patients and relatives may lead to faster improvements of care for all patients.

Limitations and challenges

While the methods discussed may mitigate some challenges of conventional RCTs, they are not without limitations (Table 1). First, larger trials come with challenges regarding logistics, regulatory requirements (including approvals, consent procedures, and requirements for reporting adverse events), economy, collaboration, between-centre heterogeneity in other interventions administered, and potential challenges related to academic merits. Second, standardisation and meta-analyses may require compromises or increased data-collection burden in some centres or may not be possible due to between-trial differences. Third, while complete research programmes may lead to better RCTs, they may not be possible in, e.g., emergency situations such as pandemics caused by new diseases. Fourth, using outcomes other than mortality comes with difficulties relating to statistical analysis, how death is handled and possibly interpretation, and mortality should not be abandoned for outcomes that are not important to patients. Fifth, while avoiding dichotomisation of results and using Bayesian methods has some advantages, it may lead to larger differences in how evidence is interpreted and possibly lower thresholds for accepting new evidence if adequate caution is not employed. In addition, switching to Bayesian methods requires additional education of clinicians, researchers and statisticians, and specification of priors and estimating required sample sizes adds complexity. Sixth, while improved analyses of HTE have benefits compared to conventional subgroup analyses, the risk of chance findings and lack of power remains. Finally, adaptive and platform trials come with logistic and practical challenges as listed in Table 1 and discussed below.
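The added complexity of prior specification can be made concrete with a small sketch. The example below compares the posterior probability of benefit under a flat prior and under a strongly informative prior shared by both arms, which pulls both risks towards 30% and so acts as a crude stand-in for a sceptical prior. Everything here is an assumption made for illustration: the trial counts, the Beta priors, and the function name prob_benefit are hypothetical, and real analyses typically place priors on the treatment effect itself rather than on each arm.

```python
import random


def prob_benefit(deaths_int, n_int, deaths_ctrl, n_ctrl, prior,
                 draws=20000, seed=7):
    """Posterior probability that mortality is lower with the intervention.

    `prior` is the (a, b) pair of a Beta prior applied to both arms; the
    probability is estimated by Monte Carlo sampling from the two posteriors.
    """
    a, b = prior
    rng = random.Random(seed)
    better = 0
    for _ in range(draws):
        risk_int = rng.betavariate(a + deaths_int, b + n_int - deaths_int)
        risk_ctrl = rng.betavariate(a + deaths_ctrl, b + n_ctrl - deaths_ctrl)
        better += risk_int < risk_ctrl
    return better / draws


# Hypothetical trial: 48/200 deaths with the intervention vs 60/200 with control.
data = dict(deaths_int=48, n_int=200, deaths_ctrl=60, n_ctrl=200)

p_weak = prob_benefit(**data, prior=(1, 1))          # flat prior
p_sceptical = prob_benefit(**data, prior=(60, 140))  # strong shared prior (30% risk)

print(f"P(benefit), flat prior:          {p_weak:.3f}")
print(f"P(benefit), strong shared prior: {p_sceptical:.3f}")
```

The same data yield a lower probability of benefit under the stronger prior, which is why transparent reporting of priors and sensitivity analyses matter when thresholds for accepting evidence are not fixed.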

Table 1

Methodological features that may improve clinical trials: benefits and challenges

Feature	Benefits and advantages	Challenges and disadvantages
Larger trials (increased sample sizes)	• Decreased uncertainty, increased precision • Easier to detect potential subgroup differences • Less chance of inconclusive results (i.e., greater precision and less uncertainty); results from fewer large RCTs are easier to compare than results from many smaller RCTs • Easier to address safety concerns if properly monitored, as larger trials have higher chances of detecting rare adverse events • Increased generalisability/external validity in multicentre/international RCTs	• Economic: trial cost and optimal use of overall research resources • Collaboration: increased coordination workload, different regulatory requirements including different handling of consent procedures and reporting of adverse events, challenges with coordination due to language and time zone differences • Comparability: potential differences in standard of care/available resources in international trials • Academic challenges: fewer individually led projects due to increased collaboration – group authorships may be less attractive in settings where individual author positions are valued (e.g., grant applications)
Standardisation and meta-analyses	• Increased comparability/less heterogeneity • Less competition between trials • Meta-analysis may be more sensible – less statistical inconsistency may lead to more precise results • Prospective meta-analyses or meta-trials may provide quicker answers than individual conventional RCTs and meta-analyses, especially if trialists share data earlier, and adequate certainty is obtained before individual trials finish	• Agreement between investigators on the design and variables could be challenging and time-consuming; compromises may be necessary for standardisations; core outcome sets may improve this • Data not routinely collected in one setting may be required due to standardisation, potentially increasing workload in some centres • If adequate standardisation is not possible, comparisons in meta-analyses may be difficult • Differences in populations, interventions, comparators, outcomes, concurrent treatments and changes over time may hamper interpretation of meta-analyses
Research programmes	• Complete research programmes including multiple study types may lead to better RCTs focussing on more relevant questions • Evidence synthesis prior to trial conduct puts trials into context and may help identify the largest knowledge gaps or where new trials are not necessary	• Research programmes may require substantial resources and time until an eventual trial can start; in most situations this will be sensible, but may not be possible during pandemics or emergencies and may require additional resources and funding
Outcome choices	• Choosing non-dichotomous or non-mortality outcomes carrying more information may lead to more efficient or conclusive trials and smaller sample size requirements • Outcomes with more levels than just dead/alive may convey important information on how well survivors fare	• Definition and handling of death is challenging, including appropriate “weighting” of death, and clinical interpretation if mortality is treated in a special manner (e.g., if days alive without life support is analysed as an ordinal variable with death treated as worse than 0 days) • Many non-mortality patient-important outcomes have skewed distributions complicating many common statistical (parametric) analyses and estimations of differences on an interpretable scale (including in meta-analyses) • Difficulties in interpretation if effects on mortality and other parts of the outcome are in different directions (for composite outcomes and days alive without life support and similar outcomes) • Risk of choosing less patient-important outcomes or surrogate outcomes
Avoiding dichotomisation of results, probabilistic interpretations	• Nuanced conclusions; assessing evidence as a continuum avoids risk of incorrect “absence of evidence interpreted as evidence of absence” errors • Using Bayesian methods allows incorporation of previous results or scepticism and easier propagation of uncertainty to subsequent calculations • The same level of evidence may not be required to change clinical practice for all interventions: this depends on price, risk of adverse events, availability, character of intervention, invasiveness, etc.; these considerations apply to both trials and clinical practice guidelines	• Probabilistic interpretations do not solve the primary issues of many trials; lack of dichotomisation does not in itself increase the certainty of evidence • While conventional significance thresholds are arbitrary, they are widely used; changing methods may lead some researchers to opt for less strict thresholds or allow increased “spin” in conclusions. Disagreements in interpretation may increase if there is no standard threshold, and pre-specified criteria for success (e.g., for approving new interventions) and standardised policy responses may be warranted • Non-dichotomous and more detailed interpretations of trial results may be more difficult to communicate to non-researchers and non-experts • Prior selection in Bayesian analyses adds additional complexity; results may be unduly influenced by strong(er) priors not shared by other researchers. Sensible priors that are transparently reported, ideally pre-specified, and accompanied by adequate sensitivity analyses are warranted (non- or weakly informative priors are often used in the primary Bayesian analyses of critical care trials)
Improved HTE analyses	• Predictive HTE analyses and other approaches considering multiple patient characteristics simultaneously or overall risk may better reflect clinical reality than one-variable-at-a-time subgroup analyses • Hierarchical models may limit the risk of exaggerated results and chance findings in smaller subgroups and increase precision due to borrowing of information • Assessment of HTE according to variables of interest on the continuous scale may better detect dose–response relationships than categorised subgroup analyses	• Subgroup or HTE analyses, regardless of approach, generally require more patients; trials may still be underpowered to detect differences • The more analyses conducted, the greater the risk of chance findings; this may be mitigated, but not completely solved, by the discussed approaches • Requires careful consideration of whether HTE analyses should be conducted on the absolute or relative scales; when the baseline risk differs between groups, there will always be HTE on either the relative or the absolute scale (often, intervention effects are most consistent on the relative scale)
Adaptation	• Adaptive sample sizes/stopping rules may lead to optimally sized trials, more likely to reach conclusive evidence • Adaptive arm adding/dropping may increase overall trial efficiency • Adaptive randomisation may increase chance of getting better interventions in some situations, which may make trial participation more attractive to patients • Adaptive enrichment may enable trials to better detect differences in responses and tailor interventions to different subpopulations or phenotypes	• Logistic and economic challenges in planning and funding trials without fixed sample sizes; alternative financing models may be necessary • Planning may be more difficult; instead of simple sample size calculations, advanced statistical simulation may be necessary to estimate required sizes and risk of random errors, requiring increased collaboration with statisticians and increased training of clinician-researchers • Pre-specified criteria for stopping/adaptation necessary; may be difficult to define • Adaptation requires more real-time data collection and verification, increasing data registration burden on individual sites • Adaptations may be complex to implement and communicate • Outcomes with longer follow-up lead to slower adaptation compared to shorter-term outcomes, which may add additional complexity. Consequently, the use of shorter-term outcomes to guide adaptive trials instead of the outcome of primary interest may be considered in some situations • Risk of adaptations based on chance findings/fluctuations may require restraints of adaptation to avoid random errors, which is difficult to plan and handle • While adaptive trials may be more likely to reach conclusive evidence if continued until a stopping rule is reached, they may need to be substantially larger to confirm or refute all clinically relevant effects (as is the case for conventional trials, too)
Adaptive platform trials	• Increased efficiency, and potentially similar advantages as for adaptive conventional trials • May decrease time to clinical adaptation and enable “learning while doing” • Reuse of trial infrastructure and embedding in electronic health records and clinical practice may increase efficiency and decrease cost • Potential improvement of informed consent procedures compared to consent when co-enrolment in multiple trials occurs • Familiarity and consistency with a common platform design may be easier in practice than repeated conduct of independent RCTs	• Same challenges as adaptive trials in general • Potential regulatory issues; less well-known design may complicate approvals • May take longer to set up and implement than regular trials • More complex – may be more difficult to implement and train staff, more difficult to explain to patients, relatives and other stakeholders (potentially complicating consent procedures), and more difficult to work with for non-researcher clinicians • Standards for conducting and reporting less developed; may be more difficult to report and explain results • Additional complexity with time drift/temporal variation and response-adaptive randomisation and potential re-use of non-concurrent controls requires adequate statistical handling to avoid bias • Potential challenges with workload/stress of perpetual trials
Embedding trials in clinical practice and registers	• Tighter integration of clinical practice and clinical trials may lead to faster improvements in patient care • Embedding clinical trials in electronic health records may reduce data-collection burden and cost and alert clinicians and researchers of eligible patients and clinical events • Register-based trials (including register-based cluster-randomised trials) may reduce data-collection burden and trial cost by using clinical registers already in place	• Register-based data-collection may not be as easily standardised without changing individual registers; compromises based on availability in registers may be necessary • Embedding trials in registers or electronic health records poses additional challenges with different electronic health record software and across borders • Data quality and completeness in registers may not be as good as when data are prospectively collected for all variables • Limited long-term outcome data generally available in registers due to additional complexity of data collection

HTE heterogeneity of treatment effects; RCT randomised clinical trial
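The table's point that differing baseline risks always produce HTE on either the relative or the absolute scale can be checked with simple arithmetic. The sketch below uses hypothetical baseline risks (10% and 40%) and hypothetical effect sizes; the function name relative_and_absolute is illustrative only.

```python
def relative_and_absolute(control_risk, treated_risk):
    """Return (risk ratio, absolute risk reduction) for one subgroup."""
    risk_ratio = treated_risk / control_risk
    risk_difference = control_risk - treated_risk
    return risk_ratio, risk_difference


# Hypothetical baseline (control) risks in two subgroups.
subgroups = {"low risk": 0.10, "high risk": 0.40}

# Scenario 1: the same relative effect (risk ratio 0.80) in both subgroups.
constant_rr = {name: relative_and_absolute(c, c * 0.80)
               for name, c in subgroups.items()}

# Scenario 2: the same absolute effect (2 percentage points) in both subgroups.
constant_rd = {name: relative_and_absolute(c, c - 0.02)
               for name, c in subgroups.items()}

for label, results in [("constant relative effect", constant_rr),
                       ("constant absolute effect", constant_rd)]:
    for name, (rr, rd) in results.items():
        print(f"{label}, {name}: risk ratio {rr:.2f}, "
              f"absolute risk reduction {rd * 100:.1f} percentage points")
```

With a constant risk ratio, the absolute risk reduction is four times larger in the high-risk subgroup; with a constant absolute reduction, the risk ratios differ instead. The effect can be homogeneous on one scale at most.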

As adaptive and platform trials are substantially less common than more conventional RCTs, there is less methodological guidance and interpretation may be more difficult for readers. Fortunately, several successful platform trials have received substantial coverage in the critical care community [64, 99], and an extension for the Consolidated Standards of Reporting Trials (CONSORT) statement for adaptive trials was recently published [101]. Planning adaptive and platform trials comes with additional logistic and financial challenges related to the current project-based funding model, which is better suited for fixed-size RCTs [83, 85, 93]. While adaptive trials are more flexible, large samples may still be required to firmly assess all clinically relevant effect sizes, which may not always be feasible. In addition, statistical simulation is required instead of simple sample size estimations [83, 94].
Further, the regulatory framework for adaptive and platform trials is less well-developed than for conventional RCTs, and regulatory approvals may thus be more complex and time-consuming [83]. There are also challenges with the adaptive features, and careful planning is necessary to avoid aggressive adaptations to random, early fluctuations. Initial “burn-in” phases, during which interventions are not compared until a sufficient number of patients have been enrolled, can be used, as can more restrictive rules for response-adaptive randomisation and arm dropping early in the trial [94]. Simulation may be required to ensure that the risk of stopping due to chance is kept at an acceptable level, analogous to alpha-spending functions in conventional, frequentist trials [102]. Temporal changes in case-mix or concomitant interventions may influence results in all RCTs, but this is complicated further if adaptive randomisation or arm dropping/adding is used, thus requiring additional consideration, especially if patients randomised at earlier stages are re-used for comparisons with more recently introduced interventions [83]. Moreover, comparisons with non-concurrent controls may affect interpretation and introduce bias if inappropriately handled [103]. Adaptations require continuous protocol amendments and additional resources to implement and communicate, and may require additional training when new interventions are added [104]. Finally, while platform trials come with potential logistic and efficiency benefits, they may be more time-consuming initially, and the lack of a clear-cut “finish-line” may stress involved personnel [105], although familiarity and consistency may also have the opposite effect once implemented, compared with repeated initiation, running and closure of consecutive, independent RCTs.
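
To illustrate why simulation is needed when planning adaptive features, the following is a minimal sketch, not a method taken from the trials or guidance cited above. It simulates a two-arm trial with a binary outcome and no true difference between arms, and estimates how often the trial would declare a difference purely by chance under different schedules of interim analyses. All names (`two_sided_p`, `simulate_trial`, `false_positive_rate`) and all design values (30% event risk, 500 patients per arm, a burn-in of 50 patients per arm, analyses every 50 patients) are hypothetical choices made for illustration. The stopping rule is a simple frequentist z-test with a constant threshold, used here as a stand-in for the Bayesian or alpha-spending rules a real trial would pre-specify.

```python
import math
import random


def two_sided_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no events (or only events) in both arms: no evidence of a difference
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_trial(rng, risk_a, risk_b, max_per_arm, burn_in, look_every, threshold):
    """Simulate one trial; return True if any analysis declares a difference."""
    events_a = events_b = 0
    for n in range(1, max_per_arm + 1):
        # Enrol one patient per arm (1:1 allocation, assumed for simplicity).
        events_a += rng.random() < risk_a
        events_b += rng.random() < risk_b
        # No comparisons during the burn-in; afterwards analyse at fixed
        # intervals and always at the maximum sample size.
        is_look = n >= burn_in and ((n - burn_in) % look_every == 0 or n == max_per_arm)
        if is_look and two_sided_p(events_a, n, events_b, n) < threshold:
            return True
    return False


def false_positive_rate(n_sims, seed=2021, **design):
    """Proportion of simulated null trials that declare a difference."""
    rng = random.Random(seed)
    hits = sum(simulate_trial(rng, **design) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    null = dict(risk_a=0.30, risk_b=0.30, max_per_arm=500)  # no true effect
    designs = {
        "single final analysis, threshold 0.05": dict(burn_in=500, look_every=500, threshold=0.05),
        "analysis every 50 patients, threshold 0.05": dict(burn_in=50, look_every=50, threshold=0.05),
        "analysis every 50 patients, threshold 0.01": dict(burn_in=50, look_every=50, threshold=0.01),
    }
    for label, design in designs.items():
        rate = false_positive_rate(1000, **null, **design)
        print(f"{label}: estimated risk of a chance finding = {rate:.3f}")
```

Under these assumptions, repeated analyses with an unadjusted threshold should stop for chance findings noticeably more often than a single final analysis, while a stricter threshold brings the risk back down. The same approach can be extended with non-null scenarios to estimate power and expected sample sizes for a candidate design.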

Future directions

We expect that the discussed methodological features will become more common in future critical care RCTs, and that this will improve efficiency and flexibility, and may help answer more complex questions. These methods come with challenges, though, and conventional RCTs may be preferred for simple, straightforward comparisons. Some challenges may be mitigated as these designs become more familiar to clinicians and researchers, and as additional methodological guidance is developed. We expect the future critical care RCT landscape to be a mix of relatively conventional RCTs and more advanced, adaptive trials. We propose that researchers consider the optimal methodological approach carefully when planning new RCTs. While different designs may be preferable in different situations, the choice should be based on careful thought instead of convenience or tradition, and more advanced approaches may be necessary in some situations to move critical care RCTs and practice forward.

Conclusion

In this review, we have discussed challenges and limitations of conventional RCTs, along with recent developments, novel methodological approaches and their advantages and potential disadvantages. We expect critical care RCTs to evolve and improve in the coming years. At its core, however, the most central feature of any RCT remains the randomisation itself, which provides unparalleled protection against confounding. Consequently, the RCT remains the gold standard for comparing different interventions in critical care and beyond.

In this review, the primary challenges of conventional randomised clinical trials in critical care are discussed. This is followed by discussion of potential solutions and novel trial methods, including the challenges and potential disadvantages of using these methods.
