
Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey.

Lars G Hemkens¹, Despina G Contopoulos-Ioannidis², John P A Ioannidis³.

Abstract

OBJECTIVE: To assess differences in estimated treatment effects for mortality between observational studies with routinely collected health data (RCD; that are published before trials are available) and subsequent evidence from randomized controlled trials on the same clinical question.
DESIGN: Meta-epidemiological survey.
DATA SOURCES: PubMed searched up to November 2014.
METHODS: Eligible RCD studies were published up to 2010 that used propensity scores to address confounding bias and reported comparative effects of interventions for mortality. The analysis included only RCD studies conducted before any trial was published on the same topic. The direction of treatment effects, confidence intervals, and effect sizes (odds ratios) were compared between RCD studies and randomized controlled trials. The relative odds ratio (that is, the summary odds ratio of trial(s) divided by the RCD study estimate) and the summary relative odds ratio were calculated across all pairs of RCD studies and trials. A summary relative odds ratio greater than one indicates that RCD studies gave more favorable mortality results.
RESULTS: The evaluation included 16 eligible RCD studies, and 36 subsequent published randomized controlled trials investigating the same clinical questions (with 17,275 patients and 835 deaths). Trials were published a median of three years after the corresponding RCD study. For five (31%) of the 16 clinical questions, the direction of treatment effects differed between RCD studies and trials. Confidence intervals in nine (56%) RCD studies did not include the RCT effect estimate. Overall, RCD studies showed significantly more favorable mortality estimates by 31% than subsequent trials (summary relative odds ratio 1.31 (95% confidence interval 1.03 to 1.65; I²=0%)).
CONCLUSIONS: Studies of routinely collected health data could give different answers from subsequent randomized controlled trials on the same clinical questions, and may substantially overestimate treatment effects. Caution is needed to prevent misguided clinical decision making. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.


Year: 2016 PMID: 26858277 PMCID: PMC4772787 DOI: 10.1136/bmj.i493

Source DB: PubMed Journal: BMJ ISSN: 0959-8138

Introduction

Routinely collected health data (RCD), such as electronic health records or patient registries, are proposed to assess comparative treatment effects of medical interventions. In theory, observational studies collecting this type of data could complement randomized controlled trials.1 The most important limitation of RCD studies is their inherent risk of bias due to confounding by indication. While only proper randomization can pre-emptively eliminate such bias, approaches such as propensity scores are frequently used to deal with bias in observational research. The propensity score reflects the probability that a patient will be selected for a treatment and is estimated by use of information on known factors affecting the treatment choice, for example, disease severity.2 3 Many other methods are increasingly used, but propensity scores are probably the most popular method used to inform healthcare decisions.3 4 5

Studies using data not collected for the purpose of a specific research project face many challenges and are prone to various specific biases related to the very nature of these data.1 A major challenge is the accuracy and reliability of the collected data, which are typically lower than in many clinical trials with standardized and predefined outcome assessments. This might be less problematic for mortality, because it is an unambiguous outcome and less prone to data accuracy problems.

Although their limitations should not be underestimated,1 RCD studies could provide the best available evidence to inform healthcare decisions when randomized controlled trials are not available. However, it is unknown whether such studies offer highly reliable answers on vital clinical questions, for example, whether the estimated treatment effects from RCD studies agree with effects demonstrated in subsequent randomized controlled trials. Most RCD studies are published on questions where there is already available evidence from trials. For example, a 2010 survey showed that almost 70% of 337 RCD studies based on propensity scores already had randomized controlled trials published on the same question.6 It is likely that the authors of these RCD studies are consciously or unconsciously influenced by the already available results of the respective trials. To directly assess whether RCD studies can predict the results of subsequent randomized controlled trials, one needs to focus on topics where no prior trial evidence is available to influence what might be considered as reasonable effects to report by the RCD studies.

We therefore aimed to obtain insights on the concordance between RCD studies and randomized controlled trials with a comprehensive meta-epidemiological study. The present study used RCD studies that analyzed a critical healthcare question, used propensity scores to deal with bias, and evaluated effects on mortality. We systematically compared the findings from such studies on various clinical questions (which had never been addressed in trials before) with the findings from subsequent randomized controlled trials.

Methods

Eligibility criteria and identification of routine data studies

Eligible RCD studies compared one treatment with another or no intervention, usual care, or standard treatment; were performed before any randomized controlled trial on the same clinical question; assessed mortality effects; and used propensity score based analyses for mortality. We considered studies that used only data that were routinely collected. Any type of such data was considered eligible,7 8 including those from health insurance claims, electronic health or medical records, and registries (even if registries also comprised some actively collected data for the purpose of the registry rather than only passive, routine data collection).9 We considered studies evaluating drugs, biologics, dietary supplements, devices, diagnostic procedures, surgeries, or radiotherapies in any patient population with any condition, and mortality outcome (all cause or cause specific) that were published in English. We included studies published up to 2010 to ensure sufficient time for randomized evidence, if any, to appear. We searched PubMed (last search November 2014) combining terms for RCD (such as “routine*”, “database*”, “claim*”, “health record*”, “registr*”, and covering all terms used in the National Library of Medicine search strategy for electronic health or medical records10), with terms for mortality and propensity scores. For further details on inclusion criteria, definitions, and search strategies, see reference 6. One reviewer (LGH) screened titles and abstracts, obtained full texts of potentially relevant articles, and determined eligibility.

Data extraction from RCD studies

For each eligible study, we extracted all clinical questions reported in the abstract following the PICO structure (patient, intervention, comparison, outcome).11 We formulated separate clinical questions for each combination of patients and compared interventions (experimental and comparator) for which any result was reported in the abstract. We considered clinically relevant variations of treatment characteristics (such as timing or dose) or patient conditions (eg, comorbidities) as separate PICO clinical questions. We also considered specific subquestions separately—such as when the main comparison looked at coronary stenting versus no stenting, and subanalyses compared drug eluting stents with bare metal stents separately. We did not consider separately specific age subgroups within adult populations and demographic subpopulations (sex, race, or ethnicity). For each clinical question, we searched the complete article for a comparative effect between the compared interventions on mortality outcomes based on analyses that used propensity scores in any way (adjustment, selection of compared populations, both, or other). If we identified such an effect estimate, we screened the full text and references for randomized evidence on the same clinical question (not necessarily evaluating mortality outcomes). We excluded any clinical questions with existing prior trial evidence. We then extracted data on RCD study characteristics and the mortality effect estimate with 95% confidence intervals. If a study reported multiple estimates, we used the analysis with results first mentioned in the abstract (as a prespecified rule to avoid subjectivity in the selection of effects). One reviewer (LGH) extracted the data and screened the articles.

Eligibility criteria and identification of randomized controlled trials

For each eligible clinical question, we systematically searched PubMed (to November 2014) for randomized controlled trials or systematic reviews or meta-analyses of trials that also addressed this question and reported any mortality outcome. We created standardized search strategies for each topic by combining search terms for the intervention, comparator, and condition. We used the PubMed standard filters for study design, limited results to the English language, and added terms for mortality to increase specificity when we searched for trials and diagnostic topics (web appendix 1 and reference 6). For RCD studies published up to 2007, we also searched all relevant modules of the Cochrane Library, but found no pertinent randomized controlled trial that was not also identified via PubMed; thus for newer RCD studies, we searched only PubMed. We screened titles and abstracts, obtained full texts of potentially relevant articles, and determined eligibility. The randomized controlled trials derived from these searches were considered for further analyses. We tested the completeness of our search by using the related articles function in PubMed for each eligible trial (screening the first 20 related articles), and in no case did we find an additional trial. These processes were all done by one reviewer (LGH), who marked studies if he was uncertain about eligibility. These studies were discussed with a second reviewer (DCI), who also confirmed the eligibility of all identified pertinent trials and spot checked all excluded full texts for verification. Discrepancies were discussed to reach consensus. We excluded from further analyses any clinical questions for which preceding trials (that is, published up until the year before the RCD study was published) were identified with the above searches.

Data extraction from randomized controlled trials

For each eligible trial, we extracted the number of randomized patients and deaths per treatment group (we preferred intention to treat data wherever possible). If a trial had multiple mortality endpoints, we preferred the same type of outcome definition as in the RCD study (all cause or cause specific mortality) and the most similar follow-up period (eg, inhospital and 30 day mortality). We extracted the proportions of patients not initiating the randomized treatment and patients switching to the non-allocated treatment during the study (treatment crossover). Data extraction was performed by one reviewer (LGH).
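
The per-group patient and death counts extracted here are the inputs for the trial odds ratios described under Statistical analysis, where Peto's approach is used for event rates below 1%. The following is a minimal sketch of that step, assuming "Peto's approach" refers to the Peto one-step odds ratio (the source only cites reference 15). The function name `peto_odds_ratio` and the trial counts are hypothetical and not taken from the study.

```python
import math

def peto_odds_ratio(trials):
    """Peto one-step odds ratio pooled over 2x2 tables (assumed method).

    `trials` is a list of (deaths_treat, n_treat, deaths_ctrl, n_ctrl).
    Returns (odds ratio, 95% CI lower bound, 95% CI upper bound).
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for d_t, n_t, d_c, n_c in trials:
        n = n_t + n_c                      # all randomized patients
        d = d_t + d_c                      # all deaths
        expected = n_t * d / n             # expected deaths in the treatment group
        variance = n_t * n_c * d * (n - d) / (n**2 * (n - 1))  # hypergeometric variance
        sum_o_minus_e += d_t - expected
        sum_v += variance
    log_or = sum_o_minus_e / sum_v
    se = 1.0 / math.sqrt(sum_v)
    z = 1.959964                           # two-sided 95% normal quantile
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical single trial: 3/500 deaths v 6/500 deaths (overall event rate 0.9%)
or_peto, peto_low, peto_high = peto_odds_ratio([(3, 500, 6, 500)])
print(f"Peto OR {or_peto:.2f} (95% CI {peto_low:.2f} to {peto_high:.2f})")
```

Passing several tuples pools the trials by summing the observed-minus-expected deaths and the variances across tables.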

Risk of bias assessment

We assessed the risk of bias for RCD studies (DCI, JPAI) and randomized controlled trials (LGH, and an external researcher experienced in systematic reviewing), using the Cochrane risk of bias tools.12 13 Discrepancies were discussed to reach consensus.

Statistical analysis

For consistency, we inverted the RCD effect estimates where necessary so that each RCD study indicated an odds ratio less than 1 (that is, swapping the study groups so that the first study group has lower mortality risk than the second). We assumed that reported relative risks or hazard ratios were approximations to the odds ratio, a reasonable assumption because death was a relatively uncommon event (median across treatment comparisons 3% (interquartile range 2-9%)). For each clinical question, we also calculated the odds ratio for mortality using data from randomized controlled trials for the same clinical question. Multiple trials were meta-analytically combined with random effects models to obtain a summary odds ratio.14 We used Peto’s approach for event rates less than 1%.15 We recorded how frequently the treatment effect estimates from RCD studies and randomized controlled trials were in the opposite direction, how often the confidence intervals did not overlap, and how often the RCD study’s confidence interval did not include the effect estimate demonstrated by later available trials. We also calculated for each clinical question the relative odds ratio (ratio of odds ratios) by dividing the summary odds ratio of all subsequent randomized controlled trials by the estimated odds ratio in the RCD study. Confidence intervals of relative odds ratios were calculated by use of the sum of the variances of the trial summary odds ratio and of the RCD study odds ratio estimate. We then combined the individual relative odds ratios across all questions to calculate the summary value. A summary relative odds ratio greater than 1 indicates that the RCD study found more favorable mortality outcomes than subsequent trials. Calculations were done after log-transformation. 
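
The relative odds ratio calculation described above can be sketched as follows. This is an illustration rather than the Stata code used in the study: the function names are hypothetical, standard errors are recovered from the rounded 95% confidence intervals (an approximation), and the inputs are the clopidogrel duration estimates quoted in the Discussion (RCD study odds ratio 0.59 (0.35 to 0.99); trials 1.11 (0.85 to 1.45)).

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile

def log_or_and_se(or_est, ci_low, ci_high):
    """Return (log odds ratio, standard error) recovered from a reported 95% CI."""
    log_or = math.log(or_est)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return log_or, se

def relative_odds_ratio(trial, rcd):
    """ROR = trial summary OR divided by RCD study OR, with a 95% CI.

    `trial` and `rcd` are (estimate, ci_low, ci_high) tuples. The variance of
    log(ROR) is the sum of the two log odds ratio variances, which treats the
    two estimates as independent.
    """
    log_t, se_t = log_or_and_se(*trial)
    log_r, se_r = log_or_and_se(*rcd)
    log_ror = log_t - log_r                    # division becomes subtraction on the log scale
    se_ror = math.sqrt(se_t**2 + se_r**2)
    return (math.exp(log_ror),
            math.exp(log_ror - Z95 * se_ror),
            math.exp(log_ror + Z95 * se_ror))

ror, ror_low, ror_high = relative_odds_ratio(trial=(1.11, 0.85, 1.45),
                                             rcd=(0.59, 0.35, 0.99))
print(f"ROR {ror:.2f} (95% CI {ror_low:.2f} to {ror_high:.2f})")
```

A value above 1 means the RCD study estimate was more favorable than the trial evidence. Because the inputs are rounded, the output may differ slightly from the value plotted in fig 4.
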
We conducted several sensitivity analyses:
• Used fixed effect models instead of random effects models to combine effect sizes from randomized controlled trials14
• Excluded trials with a high risk of bias
• Excluded trials reporting high treatment crossover rates (>20% in any group) or asymmetric crossover (between group difference >10%)
• Included only trials clearly reporting low treatment crossover rates (<10% in all groups)
• Excluded trials with frequent non-initiation of randomized treatment (>10% in any group)
• Excluded trials in which the median age differed by more than two standard deviations from the median age in the RCD study
• Used the effect estimates from two mutually exclusive patient subgroups instead of the main effect from one RCD study16 and compared them with the summary odds ratio for the trials representing effects specifically for these subgroups
• Excluded one clinical question where all pertinent trials were already used for another treatment comparison17 18
• Used only trials identified by search strategies of existing systematic reviews
• Included only RCD studies with low risk of bias for all assessed domains (with the exception of “bias due to confounding,” which was deemed moderate for all RCD studies).

We used Stata 13.1 (Stata Corp) for all analyses and reported 95% confidence intervals. All P values were two tailed.
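
The pooling step can be sketched as below, assuming a DerSimonian-Laird random effects estimator. The source cites reference 14 for its random effects models without naming the estimator, so this choice is an assumption, as are the function name and the example inputs (hypothetical log relative odds ratios and standard errors, not data from the study).

```python
import math

def dersimonian_laird(log_effects, variances):
    """Pool log-scale effects with a DerSimonian-Laird random effects model.

    Returns (pooled log effect, its standard error, I^2 in percent).
    """
    k = len(log_effects)
    w = [1.0 / v for v in variances]                                   # inverse variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_effects)) / sum(w)      # fixed effect estimate
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_effects))    # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0               # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]                       # random effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, se, i2

# Hypothetical inputs: three relative odds ratios with their log-scale standard errors
log_rors = [math.log(1.5), math.log(1.1), math.log(1.3)]
variances = [0.30**2, 0.25**2, 0.40**2]

pooled, se, i2 = dersimonian_laird(log_rors, variances)
print(f"summary ROR {math.exp(pooled):.2f}, I2 = {i2:.0f}%")
```

When the estimated between-study variance is zero, as in this example, the random effects result coincides with the fixed effect result, which is the model used in the first sensitivity analysis.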

Patient involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for design or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.

Results

In the search for RCD studies, we identified 929 records and evaluated 420 in full text (fig 1). We found preceding randomized evidence on all evaluated clinical questions in 231 studies, did not find any subsequent randomized controlled trials in 90 studies, and excluded 83 studies for different reasons (fig 1). We eventually analyzed 16 RCD studies on clinical questions that did not have preceding trials and for which subsequent pertinent trials were identified (table 1). One study reported on three clinical questions with one primary result (which we included in our main analysis) and two subgroup effects (included alternatively in sensitivity analyses).16

Fig 1 Study flow diagram. RCT=randomized controlled trial

Table 1

Description of analyzed treatment comparisons in routinely collected data studies

RCD study	Treatment comparison	Condition or disease	Total no of patients (groups 1/2)	Follow-up	Data source
Holman 2000*¹⁹	Preincision prophylactic intra-aortic balloon pump v no pump	CABG; stable, high risk patients	7581 (592/6989)	Unclear	Administrative data: “Alabama CABG Cooperative Project database,” USA
Shavelle 2002²⁰	Very early angiography (within 6h) v early conservative treatment (no or later angiography)	NSTEMI	2810 (1405/1405)	In hospital	National Registry of Myocardial Infarction 2 database, USA
Winkelmayer 2002²¹	Hemodialysis v peritoneal dialysis	Renal replacement therapy	2503 (1966/537)	3 months	Medicare/Medicaid, USA
Karthik 2003²²	Off pump v on pump CABG	CABG, non-elective	828 (417/411)	In hospital	Hospital database: “cardiac surgery registry,” UK
Guru 2006²³	2 or 3 arterial grafts v 1 arterial graft	CABG	10 982 (5491/5491)	4-6 years (average)	Linked clinical and administrative data: “Cardiac Care Network Database,” Canada
Wu 2008²⁴	CABG v drug eluting stent	Unprotected, left main coronary artery stenosis	112 (56/56)	8 months	Registries: “New York State Cardiac Surgery Reporting System” and “Percutaneous Coronary Intervention Reporting System,” USA
Ascione 2003²⁵	On pump v off pump CABG	CABG, severe left ventricular dysfunction	250 (176/74)	In hospital	Hospital database: “Patient Analysis and Tracking System,” UK
Polkinghorne 2004²⁶	Native arteriovenous fistula v arteriovenous grafts	Hemodialysis	2632 (2261/371)	1.07 years	Australian and New Zealand Dialysis and Transplant Association Registry, Australia and New Zealand
Gnerlich 2007²⁷	Surgical removal of the primary tumor v no surgical removal	Metastatic breast cancer (stage IV)	9734 (4578/5156)	≤15 years	Registry: “Surveillance, Epidemiology, and End Results,” USA
Lindenauer 2004²⁸	Early perioperative lipid lowering treatment (during first 2 hospital days) v no or later treatment	Major non-cardiac surgery	204 885 (73 050/131 835)	In hospital	Hospital discharge and pharmacy records of 329 hospitals, USA
Butler 2009¹⁸	Clopidogrel treatment duration ≥12 months v ≤6 months	Drug eluting stenting	1669 (1022/647)	1 year	Melbourne Interventional Group Registry, Australia
Cabell 2005†²⁹	Surgery v non surgical treatment	Infective endocarditis, native valve	299 (94/205)	In hospital	Registry: “International Collaboration on Endocarditis Merged Database,” Europe and USA
Kim 2009¹⁶	Clopidogrel initiation after CABG v no clopidogrel	CABG, preoperative treatment with aspirin	15 067 (3268/11 799)	In hospital	Clinical, administrative and financial data: “University Health System Database,” USA
Moss 2003³⁰	Mitral valve repair v mitral valve replacement	Mitral valve surgery	644 (322/322)	3.4 years	British Columbia Cardiac Registry, Canada
Fonarow 2008³¹	Beta blocker continuation v beta blocker withdrawal	Heart failure, decompensated	1429 (79/1350)	3 month post-discharge	Registry: OPTIMIZE/HF program, USA
Hahn 2010‡¹⁷	Clopidogrel treatment duration 3 month v >3 months	Drug eluting stenting	823 (661/151)	1 year	Duration of Dual Antiplatelet Therapy After Implantation of Endeavour Stent Registry, Republic of Korea

CABG=coronary artery bypass graft; HF=heart failure; NSTEMI=non-ST elevation myocardial infarction; RCD=routinely collected health data.

*We used the only available mortality effect reported as relative effects measure with confidence intervals because no mortality results were reported in the abstract.

†We used mortality rates reported for the median propensity stratum and calculated odds ratios because no mortality results were reported in the abstract.

‡For this treatment comparison, both identified subsequent randomized controlled trials are also relevant for another treatment comparison (Butler 2009).

RCD studies were published between 2000 and 2010 and used diverse types of routine data including registries, hospital databases, and administrative data. Most studies were relevant to cardiology (12 (75%) of 16), and 11 (69%) compared two active interventions. All RCD studies assessed all cause mortality, and comparative effect estimates were based on a median of 2086 patients per analysis (interquartile range 734-8658; table 1). While we deemed the risk of bias due to confounding moderate for all studies, most had a low risk of bias for other types of bias. The overall risk of bias was therefore low to moderate for all RCD studies (web appendix 2). We identified 36 subsequent randomized controlled trials32-67 with 17 275 patients and 835 deaths overall, addressing the same clinical question as the RCD studies. All trials reported all cause mortality, and were published between 2003 and 2014, a median of three years after the RCD study. For each clinical question, we included a median of 985 randomized patients (interquartile range 287-1696; fig 2 and fig 3).
We deemed the risk of bias high for 10 trials, mainly due to lack of blinding (web appendix 3).
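
The median and interquartile range of RCD study sizes quoted above can be reproduced from the patient totals listed in table 1. The sketch below assumes quartiles are taken as the medians of the lower and upper halves of the sorted values; the source does not state which quartile rule was used.

```python
from statistics import median

# Total patients per RCD study, in the order listed in table 1
totals = [7581, 2810, 2503, 828, 10982, 112, 250, 2632,
          9734, 204885, 1669, 299, 15067, 644, 1429, 823]

ordered = sorted(totals)
half = len(ordered) // 2
q1 = median(ordered[:half])   # median of the lower eight values
q3 = median(ordered[half:])   # median of the upper eight values

print(median(ordered), q1, q3)  # 2086.0 733.5 8657.5
```

Rounded to whole patients, these values match the reported median of 2086 and interquartile range of 734-8658.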

Fig 2 Meta-analyses of comparative effects of medical interventions on mortality reported in randomized controlled trials published after the same clinical question was investigated in RCD studies (part one). For each clinical question investigated in a RCD study, the trials published subsequently are shown. Diamonds=result of meta-analyses combining these subsequent trials as summary odds ratios (using random effects models)

Fig 3 Meta-analyses of comparative effects of medical interventions on mortality reported in randomized controlled trials published after the same clinical question was investigated in RCD studies (part two). For each clinical question investigated in a RCD study, the trials published subsequently are shown. Diamonds=result of meta-analyses combining these subsequent trials as summary odds ratios (using random effects models)

Agreement of treatment effects

Across 16 clinical questions, eight RCD studies found significant treatment effects (fig 4). Confidence intervals were wide and overlapped between RCD studies and randomized controlled trials in all 16 treatment comparisons. However, in more than half of cases (nine of 16; 56%), the confidence intervals of the RCD based estimate did not include the mortality effect found in subsequent randomized trial evidence. For five (31%) of 16 clinical questions, treatment effects from randomized evidence were in the opposite direction to the RCD study estimate. None of these five trial estimates was significant, and one RCD study estimate was significant.

Fig 4 Treatment effects on mortality in RCD studies and randomized controlled trials. Left panel shows comparative effects of medical interventions on mortality reported in RCD studies and results of subsequently published trials on the same treatment comparisons. White circles=effect estimates reported in RCD studies; blue circles=pooled summary effects from subsequent trials (corresponding meta-analyses are shown in fig 2 and fig 3); lines=95% confidence intervals. Right panel shows for each clinical question the ratio of mortality effects reported in trial evidence versus RCD study effects (as relative odds ratios). Blue squares (lines)=relative odds ratio (95% confidence intervals); blue diamond=pooled summary relative odds ratio (meta-analysis of relative odds ratio) across all clinical questions. A relative odds ratio greater than 1 indicates more favorable mortality outcomes in RCD studies than in subsequent trials

When data were synthesized, RCD studies showed significantly inflated results compared with randomized controlled trials, with an average overestimation of mortality benefits by 31% (summary relative odds ratio 1.31 (95% confidence interval 1.03 to 1.65); table 2, fig 4). There was no heterogeneity between topics (I²=0% (0% to 45%)). The results were quite similar in all sensitivity analyses (table 2), with estimates of summary relative odds ratios ranging between 1.20 and 1.34 and their 95% confidence intervals excluding the null in six of the 10 sensitivity analyses. We found the smallest estimate of a difference between RCD studies and trials (summary relative odds ratio 1.20) when we considered only RCD studies with a low risk of bias on all dimensions (except for confounding bias, where a moderate risk is probably the best one can expect for this type of study design).

Table 2

Agreement of treatment effects reported in RCD studies and subsequent randomized trial evidence

Analysis	No of treatment comparisons	Summary ROR (95% CI)	I² (%; 95% CI)
Main analysis
Random effect models to combine RCTs	16	1.31 (1.03 to 1.65)	0 (0 to 45)
Sensitivity analyses
Fixed effect models to combine RCTs	16	1.34 (1.09 to 1.63)	0 (0 to 45)
Exclusion of RCTs with high risk of bias	14	1.21 (0.92 to 1.59)	0 (0 to 47)
Exclusion of RCTs with high treatment crossover rates or with asymmetric crossover	15	1.34 (1.05 to 1.70)	0 (0 to 46)
RCTs that had low treatment crossover rates	12	1.27 (0.92 to 1.76)	0 (0 to 51)
Exclusion of RCTs with frequent non-initiation of randomized treatment	16	1.31 (1.04 to 1.65)	0 (0 to 45)
Exclusion of RCTs with age differences >2 SD	16	1.28 (1.01 to 1.62)	0 (0 to 45)
Two subquestions instead of main question	15	1.33 (1.05 to 1.68)	0 (0 to 46)
Exclusion of treatment comparison with RCTs that are also pertinent for another treatment comparison	15	1.28 (1.01 to 1.62)	0 (0 to 46)
RCTs that were identified in available systematic reviews	10	1.21 (0.88 to 1.65)	0 (0 to 53)
RCD studies with low risk of bias except for “bias due to confounding”*	11	1.20 (0.89 to 1.62)	0 (0 to 51)

RCD=routinely collected health data; RCT=randomized controlled trial; ROR=relative odds ratio; SD=standard deviation.

*All RCD studies were deemed to have moderate risk of bias due to confounding.


Discussion

Principal findings

In our comprehensive analysis of various clinical questions on topics never evaluated in randomized controlled trials before, we found that studies using routinely collected health data frequently do not agree with subsequent randomized trials. We analyzed 16 clinical questions with 36 corresponding subsequent trials published a median of three years later. Although our results need to be interpreted cautiously given the relatively small numbers of studies, the emerging pattern was that RCD studies systematically and substantially overestimated the mortality benefits of medical treatments compared with subsequent trials investigating the same question.

The overall findings suggest that results from RCD studies in the absence of randomized controlled trials need to be seen with substantial caution. RCD studies might not necessarily provide reliable answers on how to best treat patients. As an example, the clinical consequences might be illustrated by the clinical question in our analysis with the largest body of randomized evidence, that is, on the duration of clopidogrel treatment after use of drug eluting stents.18 Here, the RCD based estimate suggested substantial and significant reductions in mortality (odds ratio 0.59 (95% confidence interval 0.35 to 0.99)), leading the study authors to conclude that “longer (≥12 months) planned duration of clopidogrel results in reduced 12-month mortality . . . Randomized studies are urgently needed to address this issue.”18 However, later trial evidence showed no benefit of longer clopidogrel treatment and rather indicated harm, and the confidence intervals were not compatible with the early findings in the RCD study (odds ratio 1.11 (95% confidence interval 0.85 to 1.45)). This shows that RCD studies have a substantial risk of misguiding patient care.1

Comparison with other studies

A recent Cochrane review identified 14 previous meta-epidemiological studies comparing randomized and observational study results.68 Most focused on traditional observational epidemiology rather than on RCD studies, and only two meta-epidemiological analyses compared propensity score analyses with randomized controlled trials.69 70 A further empirical evaluation was excluded from the Cochrane review.71 In their analysis of mortality effects across 22 clinical questions in the field of surgery, Lonjon and colleagues found a point estimate of a summary relative odds ratio that was similar to our analysis (1.20, 95% confidence interval 0.96 to 1.54; original results inverted to allow comparison with this study).69 For subjective outcomes, they found a summary relative odds ratio close to 1 (0.93, 95% confidence interval 0.75 to 1.15). The authors interpreted the lack of statistical difference between study designs as evidence for equivalent effects. However, 20-30% relative changes in the odds of mortality are substantial, because most differences in mortality with treatments across medicine are of this magnitude or even smaller.72 Kuss and colleagues analyzed only one treatment comparison (off pump v on pump cardiac bypass surgery) and similarly interpreted lack of statistical difference as signaling equivalence.70 Dahabreh and colleagues analyzed mortality effects of treatments in the setting of acute coronary syndrome.71 They also found that propensity score analyses gave significantly larger effect sizes than RCTs.71

Strengths and limitations of study

All these previous empirical evaluations were restricted to specific topics and none evaluated clinical questions where all the data from randomized controlled trials were published subsequently to the RCD studies. However, many RCD studies are specifically undertaken to explore whether trials results can be replicated in the real world.6 In such cases, the trial evidence provides some prior knowledge that could inhibit the publication of findings that deviate greatly from the trial experience. Thus, our approach provides a more clean assessment of the ability of RCD results to predict the results of trials. Some caveats should be considered in our study. Although we screened many RCD studies using propensity scores, only a fraction of the entire RCD literature was eligible for our analyses. This was largely due to the high number of clinical questions that were already addressed by some randomized trials, as we have previously discussed.1 6 However, we followed a systematic approach to derive a reproducible sample of RCD studies that covers a wide range of diverse healthcare questions. Although many relate to cardiovascular conditions, they represent various types of interventions, including surgery, devices, drug treatment, or treatment concepts. The generalizability to other conditions and diseases might also need to be assessed in the future. The RCD studies included in our sample encompass a wide spectrum of data sources, from administrative hospital databases to committed registries. These data sources might differ with regard to their granularity, validation processes, and completeness. The sample was too small to allow a meaningful evaluation of differences across different subgroups of routine data sources. We have no detailed information on the accuracy of the key information of interest for our analyses (mortality and treatment allocation). 
Although we assume high data accuracy given the type of outcome (death is difficult to err on) and the clinical prominence of the assessed interventions, we cannot rule out that accuracy problems further reduce the reliability of such research. Our PubMed search strategy for subsequent randomized controlled trials was relatively specific. It would be difficult to conduct thorough systematic reviews from scratch with highly sensitive search strategies for all the 106 RCD studies without preceding trials that we evaluated. Instead, we used a standardized search approach, systematically integrated existing systematic reviews, and validated the search results with an alternative identification algorithm, that is, the related article function in PubMed. Although the number of included clinical questions with pairs of RCD study and trials could have been higher with a more sensitive strategy for subsequent trials, we had similar results in sensitivity analyses restricted to trial results obtained from search strategies of existing systematic reviews. We assessed only mortality effects. Other more subjective clinical outcomes would probably be collected less accurately in the routinely collected datasets. This might further reduce the validity of treatment effect estimates and further limit the reliability of RCD studies to guide clinical decision making. Conversely, some other types of outcomes might have much larger treatment effects than mortality, and thus might be easier to separate from noise due to bias in RCD studies. However, treatment benefits for other outcomes (eg, hospital admission) might not necessarily translate to benefits for mortality or other hard benefits.73 We compared the RCD effects with early evidence from subsequent randomized trials, which sometimes overestimates treatment effects.74 Thus, our results might even be conservative and we may have underestimated the inflated and optimistic effects from RCD studies.
Randomized controlled trials are not necessarily a perfect gold standard. When their results differ from those of observational studies on the same question,75 76 it may not be certain that the trials are correct and the observational data are wrong. We explored the potential effect of risk of bias in the randomized and non-randomized studies. None of the RCD studies and only a few trials were deemed to have high risk of bias. When we compared only the effects from studies without high bias potential, we found similar effects as in the main analysis. We used intention to treat effects for our comparison with RCD studies, because this is the most robust approach against bias. Such effects could be conservative in trials without active controls, or with low adherence to the allocated treatment or high dropout rates. However, most trials compared active treatments, most had only very few patients not starting the allocated treatment or switching to the other treatment during the study, and none had a high risk of bias due to missing outcome information (dropouts). In various sensitivity analyses, we found no indication that use of intention to treat effects affected our main findings. For RCD studies, the assessment of the risk of bias is not straightforward. Use of propensity score methods helps to reduce confounding, but it is unlikely that confounding can be eliminated. It is difficult even to judge to what exact extent confounding has been reduced with different propensity adjustments or other approaches. For other dimensions of potential bias beyond confounding, our selected studies might have been at lower risk for bias than many other RCD studies that look at outcomes other than mortality. For non-mortality outcomes, missing information, measurement errors, and availability of diverse definitions and analyses could be more prominent than for death.
Our results remained largely similar in different sensitivity analyses, although we did see the lowest estimate for a summary relative odds ratio (indicating closest convergence of results from randomized controlled trials and RCD studies) when we considered RCD studies with low risk of bias in all dimensions (other than confounding). We cannot exclude the possibility that RCD studies become better at predicting trial results when bias is minimized, although much more data are needed to make a conclusive statement about this. Genuine differences in estimated effect sizes could still exist between the two methods. Nevertheless, we tried to make the PICO structure highly comparable in the juxtaposed RCD studies and randomized controlled trials that we evaluated. It is also unclear whether those questions where subsequent trials were performed are qualitatively different from those where subsequent trials are never performed once an effect has been described in the observational literature. When strong, conclusive effects are seen in RCD studies, a subsequent trial may be less likely to be performed.72 However, it is unlikely that such strong, conclusive effects are commonly seen.

Conclusions and policy implications

Despite the wide and increasing use of routinely collected health data in comparative effectiveness research, the reliability of this approach needs to be questioned, especially when effectiveness outcomes are concerned and randomized controlled trials might be feasible to conduct. Of course, for some outcomes (especially on safety or harms), it may be difficult to obtain definitive evidence from large trials, and RCD data could then offer the best possible guidance. If no randomized trials exist, clinicians and funders of care can still act on the results from observational RCD and other evidence, but they should consider that treatment effects could be more uncertain and substantially smaller than what RCD studies suggest. Therefore, decisions for widespread adoption and reimbursement of expensive interventions with evidence based entirely on RCD may be best withheld until trial evidence becomes available. Large randomized trials might still be needed to address critically important clinical questions for patient relevant outcomes.1 77 78

Observational studies using routinely collected data (RCD studies) are increasingly used to inform healthcare decisions when RCTs are not available.
However, observational studies have an inherent risk of bias due to confounding by indication.
Another difficulty is the accuracy and reliability of routinely collected data.
RCD studies systematically and substantially overestimate mortality benefits of medical treatments compared with subsequent trials investigating the same question.
Observational RCD studies might not necessarily provide very reliable answers on how to best treat patients; caution is needed to prevent misguided clinical decision making.
If no randomized trials exist, clinicians and funders of care should consider that treatment effects are probably more uncertain and substantially smaller than RCD studies suggest; decisions for widespread adoption and reimbursement of expensive interventions might be best withheld until trial evidence becomes available.

69 in total

1. Outcomes in patients with de novo left main disease treated with either percutaneous coronary intervention using paclitaxel-eluting stents or coronary artery bypass graft treatment in the Synergy Between Percutaneous Coronary Intervention with TAXUS and Cardiac Surgery (SYNTAX) trial.

Authors: Marie-Claude Morice; Patrick W Serruys; A Pieter Kappetein; Ted E Feldman; Elisabeth Ståhle; Antonio Colombo; Michael J Mack; David R Holmes; Lucia Torracca; Gerrit-Anne van Es; Katrin Leadley; Keith D Dawkins; Friedrich Mohr
Journal: Circulation Date: 2010-06-07 Impact factor: 29.690

2. Good research practices for comparative effectiveness research: analytic methods to improve causal inference from nonrandomized studies of treatment effects using secondary data sources: the ISPOR Good Research Practices for Retrospective Database Analysis Task Force Report--Part III.

Authors: Michael L Johnson; William Crown; Bradley C Martin; Colin R Dormuth; Uwe Siebert
Journal: Value Health Date: 2009-09-29 Impact factor: 5.725

3. Aspirin and clopidogrel use in the early postoperative period following on-pump and off-pump coronary artery bypass grafting.

Authors: Dae Hyun Kim; Constantine Daskalakis; Scott C Silvestry; Mital P Sheth; Andrew N Lee; Suzanne Adams; Sam Hohmann; Sofia Medvedev; David J Whellan
Journal: J Thorac Cardiovasc Surg Date: 2009-12 Impact factor: 5.209

4. One-year results of total arterial revascularization vs. conventional coronary surgery: CARRPO trial.

Authors: Sune Damgaard; Jørn Wetterslev; Jens T Lund; Nikolaj B Lilleør; Mario J Perko; Henning Kelbaek; Jan K Madsen; Daniel A Steinbrüchel
Journal: Eur Heart J Date: 2009-03-06 Impact factor: 29.983

5. B-CONVINCED: Beta-blocker CONtinuation Vs. INterruption in patients with Congestive heart failure hospitalizED for a decompensation episode.

Authors: Guillaume Jondeau; Yannick Neuder; Jean-Christophe Eicher; Patrick Jourdain; Elodie Fauveau; Michel Galinier; Arnaud Jegou; Fabrice Bauer; Jean Noel Trochu; Anissa Bouzamondo; Marie-Laure Tanguy; Philippe Lechat
Journal: Eur Heart J Date: 2009-08-30 Impact factor: 29.983

6. The effect of intended duration of clopidogrel use on early and late mortality and major adverse cardiac events in patients with drug-eluting stents.

Authors: Michelle J Butler; David Eccleston; David J Clark; Andrew E Ajani; Nick Andrianopoulos; Angela Brennan; Gishel New; Alexander Black; Gregory Szto; Chris M Reid; Bryan P Yan; James A Shaw; Anthony M Dart; Stephen J Duffy
Journal: Am Heart J Date: 2009-05 Impact factor: 4.749

7. Bisoprolol and fluvastatin for the reduction of perioperative cardiac mortality and myocardial infarction in intermediate-risk patients undergoing noncardiovascular surgery: a randomized controlled trial (DECREASE-IV).

Authors: Martin Dunkelgrun; Eric Boersma; Olaf Schouten; Ankie W M M Koopman-van Gemert; Frans van Poorten; Jeroen J Bax; Ian R Thomson; Don Poldermans
Journal: Ann Surg Date: 2009-06 Impact factor: 12.969

8. Off-pump versus on-pump myocardial revascularization in patients with ST-segment elevation myocardial infarction: a randomized trial.

Authors: Khalil Fattouch; Francesco Guccione; Pietro Dioguardi; Roberta Sampognaro; Egle Corrado; Marco Caruso; Giovanni Ruvolo
Journal: J Thorac Cardiovasc Surg Date: 2009-03 Impact factor: 5.209

9. Fluvastatin and perioperative events in patients undergoing vascular surgery.

Authors: Olaf Schouten; Eric Boersma; Sanne E Hoeks; Robbert Benner; Hero van Urk; Marc R H M van Sambeek; Hence J M Verhagen; Nisar A Khan; Martin Dunkelgrun; Jeroen J Bax; Don Poldermans
Journal: N Engl J Med Date: 2009-09-03 Impact factor: 91.245

10. Immediate vs delayed intervention for acute coronary syndromes: a randomized clinical trial.

Authors: Gilles Montalescot; Guillaume Cayla; Jean-Philippe Collet; Simon Elhadad; Farzin Beygui; Hervé Le Breton; Rémi Choussat; Florence Leclercq; Johanne Silvain; François Duclos; Mounir Aout; Jean-Luc Dubois-Randé; Olivier Barthélémy; Grégory Ducrocq; Anne Bellemain-Appaix; Laurent Payot; Philippe-Gabriel Steg; Patrick Henry; Christian Spaulding; Eric Vicaut
Journal: JAMA Date: 2009-09-02 Impact factor: 56.272
