
A systematic literature review on the efficacy-effectiveness gap: comparison of randomized controlled trials and observational studies of glucose-lowering drugs.

Mikkel Z Ankarfeldt¹, Erpur Adalsteinsson², Rolf Hh Groenwold³, M Sanni Ali⁴, Olaf H Klungel³.

Abstract

AIM: To identify a potential efficacy-effectiveness gap and possible explanations (drivers of effectiveness) for differences between results of randomized controlled trials (RCTs) and observational studies investigating glucose-lowering drugs.
METHODS: A systematic literature review was conducted in English language articles published between 1 January, 2000 and 31 January, 2015 describing either RCTs or observational studies comparing glucagon-like peptide-1 analogs (GLP-1) with insulin or comparing dipeptidyl peptidase-4 inhibitors (DPP-4i) with sulfonylurea, all with change in glycated hemoglobin (HbA1c) as outcome. Medline, Embase, Current Content, and Biosis were searched. Information on effect estimates, baseline characteristics of the study population, publication year, study duration, and number of patients, and for observational studies, characteristics related to confounding adjustment and selection- and information bias were extracted.
RESULTS: From 312 hits, 11 RCTs and 7 observational studies comparing GLP-1 with insulin, and from 474 hits, 16 RCTs and 4 observational studies comparing DPP-4i with sulfonylurea were finally included. No differences were observed in baseline characteristics of the study populations (age, sex, body mass index, time since diagnosis of type 2 diabetes mellitus, and HbA1c) or effect sizes across study designs. Mean effect sizes ranged from -0.43 to 0.91 and from -0.80 to 1.13 in RCTs and observational studies, respectively, comparing GLP-1 with insulin, and from -0.13 to 2.70 and -0.20 to 0.30 in RCTs and observational studies, respectively, comparing DPP-4i and sulfonylurea. Generally, the identified observational studies held potential flaws with regard to confounding adjustment and selection- and information bias.
CONCLUSIONS: Neither potential drivers of effectiveness nor an efficacy-effectiveness gap were identified. However, the limited number of studies and potential problems with confounding adjustment, selection- and information bias in the observational studies, may have hidden a true efficacy-effectiveness gap.


Keywords: diabetes mellitus, type 2; efficacy–effectiveness gap; glucose-lowering drugs; hemoglobin A1c; literature review

Year: 2017 PMID: 28176959 PMCID: PMC5271378 DOI: 10.2147/CLEP.S121991

Source DB: PubMed Journal: Clin Epidemiol ISSN: 1179-1349 Impact factor: 4.790

Introduction

The beneficial effects of drugs can be divided into efficacy and effectiveness. The efficacy of a drug describes the biological effect and can be seen as the effect evaluated under optimal conditions in randomized controlled trials (RCTs). The effectiveness of a drug describes the effect under circumstances of routine clinical practice. The efficacy–effectiveness gap refers to the difference between the (in theory) largest possible effect of a drug and its effect in clinical practice.1–4 A comparison of RCTs and observational studies can be used as a model to investigate and better understand the efficacy–effectiveness gap. The population in routine clinical practice may differ from the often highly selected study population included in RCTs,5–10 which could be one possible reason for an efficacy–effectiveness gap. Observational studies usually reflect the population seen in clinical practice. Because they are often based on real-world data, other factors such as the delivery of care, adherence to treatment, and time between treatment and assessment of the outcome are also often closer to ordinary clinical practice than they are in RCTs.11 Discrepancies in the results from RCTs and observational studies may be due to biases in the observational study design,12–15 but may also be explained by an efficacy–effectiveness gap.
An understanding of the efficacy–effectiveness gap is important for patients, health care professionals, payers, regulators, and the pharmaceutical industry to provide effective treatments.3,16 The aim of this literature review is to identify a potential efficacy–effectiveness gap, by comparing RCTs and observational studies investigating glucose-lowering drugs in relation to change in glycated hemoglobin (HbA1c), and if such a gap exists, to investigate whether it can be explained by differences in the baseline characteristics of the study populations or other features that characterize the RCTs and observational studies.

Methods

A systematic literature search was performed to identify RCTs and observational studies that fulfilled the following inclusion criteria: published in English between 1 January 2000 and 31 January 2015, and comparing either glucagon-like peptide-1 analogs (GLP-1) with insulin or dipeptidyl peptidase-4 inhibitors (DPP-4i) with sulfonylurea, all with change in HbA1c as an outcome. These comparator groups were chosen to compare second-line treatments (DPP-4i and sulfonylurea) and third-line treatments (GLP-1 and insulin), respectively.17 Observational studies are especially difficult to identify; therefore, more search terms were used for these studies, covering both prospective and retrospective studies, as well as cohort and case–control studies. The key terms and the combination of these can be found in the supplementary material. The following databases were used: Medline, Embase, Current Content, and Biosis. The search strategy was developed by one of the reviewers (MZA) and a librarian. References of the identified studies were searched to identify additional relevant studies.
The studies identified through the literature search were screened on title and abstract by two reviewers independently. Disagreements were settled through discussion and consensus. Full text was read by a single reviewer, who extracted information on the baseline characteristics of the study population, other features that described the included studies, and the effect estimate from text and tables in the included studies. Some of the hits from the search were abstracts published in relation to scientific conferences. Information from conference abstracts was not included in this review. If a conference abstract seemed relevant, an attempt was made to identify published studies related to it by web search and by contacting the authors of the conference abstract.
Post hoc, it was decided to exclude two groups of studies. Studies comparing DPP-4i with sulfonylurea during Ramadan in Muslim populations (three RCTs and six observational studies) were excluded because we did not want to compare across fasting and nonfasting studies, and studies of fast-acting insulin (five RCTs) were excluded because we did not want to compare across fast-acting and basal insulins. Studies investigating mixed insulin (a combination of fast-acting and intermediate/long-acting insulin) were included. The post hoc exclusion criteria were applied as we gained knowledge while working on the review. Importantly, none of them conflicts with the initial inclusion criteria; they only narrow the inclusion criteria further.
If the identified RCTs and observational studies included treatment arms of other drugs or placebo, only information about the relevant treatment arms was extracted. If several publications were based on the same study population but with different follow-up times, the information on patient characteristics was extracted once, while the effect size at each time point was extracted. If studies included several analyses, for example, intention-to-treat and per protocol, the analysis reported as the primary analysis was extracted. Two RCTs18,19 included a once-daily and a twice-daily insulin group; the comparison of GLP-1 with twice-daily insulin is reported below. Two RCTs20,21 included a high and a low dose of GLP-1 and DPP-4i, respectively; the comparison of the high dose with the comparator is reported below.
Generally, the data extraction protocol was based on the Cochrane Handbook.22 Baseline characteristics were extracted as mean and standard deviation (SD) or proportion. A few studies reported median and interquartile range; in those cases, the SD was derived by dividing the interquartile range by 1.35.22 The reported outcome is the difference in change in HbA1c between treatment groups.
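The interquartile-range conversion described above can be sketched in a few lines of Python. This is an illustration only: the function name and the example quartiles are hypothetical and are not taken from any included study. The divisor 1.35 reflects that, for approximately normally distributed data, the interquartile range spans about 1.35 SDs, so the conversion is less reliable for skewed data.

```python
def sd_from_iqr(q1: float, q3: float) -> float:
    """Approximate the SD from the lower and upper quartiles.

    Assumes roughly normal data, for which IQR is about 1.35 * SD.
    """
    if q3 < q1:
        raise ValueError("q3 must not be smaller than q1")
    return (q3 - q1) / 1.35


# Hypothetical baseline HbA1c quartiles (in %), for illustration only.
approx_sd = sd_from_iqr(7.4, 8.9)
print(f"approximate SD: {approx_sd:.2f}")
```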
When extracting effect estimates, the following prioritization was used: 1) effect estimate and 95% confidence interval (CI) as written in the text; 2) effect estimate and 95% CI as written in a table; 3) if, for example, only a one-sided interval was given, the two-sided 95% CI was calculated; 4) if no effect size with CI was given, these were calculated from the effect estimate and the SD or standard error of the mean (SEM) in each treatment group; 5) if no SD or SEM was given but a p value was, z values were calculated, and from these the SEM and 95% CI; and 6) if only an effect estimate was reported, without a CI or p value, only the point estimate was used. For the observational studies, additional information was extracted: confounding adjustment, analysis of treatment initiators identified by a “wash-out” period, selection bias related to clear and reasonable inclusion criteria or handling of missing data, and information bias related to the assessment of exposure and outcome. Comprehensive methods to assess the quality of observational studies, such as ACROBAT-NRSI,23 were not deemed necessary because the aim was not to obtain an estimate of the overall treatment effect across studies, but rather to look for signals of an efficacy–effectiveness gap and potential drivers of such a gap. Accordingly, pooled estimates of the study characteristics and the effect estimates were not calculated, and the literature search and inclusion of studies did not strive for homogeneous studies suitable for pooling. Instead, baseline characteristics and effect estimates were handled descriptively. Overlap in patient characteristics and in effect estimates was used to assess whether differences were present across studies. A difference of more than 0.4 percentage points is acknowledged as a clinically meaningful difference in HbA1c24 and was used to evaluate an efficacy–effectiveness gap.
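Steps 4 and 5 of the prioritization can be illustrated with a short Python sketch. The review does not state the exact formulas used, so the following are assumptions: a normal approximation, independent treatment groups (so the group SEMs combine in quadrature), and a two-sided p value. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def sem_from_sd(sd: float, n: int) -> float:
    """Standard error of a group mean from its SD and sample size."""
    return sd / sqrt(n)


def ci_from_group_sems(diff: float, sem_a: float, sem_b: float) -> tuple[float, float]:
    """Step 4 (assumed form): 95% CI for a between-group difference,
    treating the two groups as independent."""
    se_diff = sqrt(sem_a ** 2 + sem_b ** 2)
    return diff - Z95 * se_diff, diff + Z95 * se_diff


def ci_from_p_value(diff: float, p_two_sided: float) -> tuple[float, float]:
    """Step 5 (assumed form): recover the standard error from a two-sided
    p value via its z value, then build the 95% CI."""
    if not 0.0 < p_two_sided < 1.0:
        raise ValueError("p value must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    se_diff = abs(diff) / z
    return diff - Z95 * se_diff, diff + Z95 * se_diff


# Hypothetical inputs, for illustration only.
print(ci_from_group_sems(-0.20, sem_from_sd(1.0, 100), sem_from_sd(1.2, 100)))
print(ci_from_p_value(-0.50, 0.05))
```

A p value of exactly 0.05 places one CI bound at zero, which is a convenient sanity check for the second function.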

Results

The search for studies comparing GLP-1 with insulin showed 312 hits, of which 19 publications were included. However, the three publications by Diamant et al25–27 were based on the same RCT, but with different follow-up time, and the study by Thayer et al28 included two cohorts, which were reported separately later. Hence, 13 publications described 11 individual RCTs18–20,25–27,29–35 and 6 publications described 7 individual observational studies28,36–40 (Figure 1). The study duration ranged from 16 to 156 weeks and from 26 to 102 weeks in RCTs and observational studies, respectively, and the number of participants ranged from 69 to 1028 and from 47 to 51,977, respectively. Among the 312 hits, 9 were conference abstracts of observational studies, of which 1 was among the included observational studies as a research article. The authors of the other conference abstracts were contacted; one author replied, and no additional full-text study was identified.

Figure 1

Flow chart.

Notes: (A) Studies comparing glucagon-like peptide-1 with insulin. (B) Studies comparing dipeptidyl peptidase-4 inhibitors with sulfonylurea.

Abbreviation: RCTs, randomized controlled trials.

The search for studies comparing DPP-4i with sulfonylurea showed 474 hits, of which 23 publications were included. However, the publications by Nauck et al41 and Seck et al,42 by Ferrannini et al43 and Matthews et al,44 and the two publications by Göke et al45,46 were each based on the same RCT with different follow-up times. Hence, 19 publications described 16 individual RCTs21,41–58 and 4 publications described 4 individual observational studies59–62 (Figure 1). The study duration ranged from 4 to 104 weeks and from 24 to 52 weeks in RCTs and observational studies, respectively, and the number of participants ranged from 33 to 3118 and from 69 to 16,832, respectively. Among the 474 hits, 4 and 17 were conference abstracts of RCTs and observational studies, respectively, of which 2 were among the included RCTs as research articles. The authors of the other conference abstracts were contacted; none of them replied, and no additional full-text study was identified. More detailed information on the included studies is found in Tables S1–S4. Table 1 holds information on study population characteristics of the 18 individual studies (11 RCTs and 7 observational studies) and the effect estimates from the 19 publications comparing GLP-1 with insulin. Table 2 holds information on study population characteristics of the 20 individual studies (16 RCTs and 4 observational studies) and the effect estimates of the 23 publications comparing DPP-4i with sulfonylurea.

Table 1

Characteristics of RCTs and observational studies comparing glucagon-like peptide-1 with insulin

	Authors	Duration, weeks	N	Age, years	Men, %	Body mass index, kg/m²	Time since diagnosis of type 2 diabetes mellitus, years	Baseline HbA1c, %	Mean effect (95% confidence interval)a
Randomized controlled trials	Barnett et al30	16	138	54.9 (9.1)	47.1	31.1 (4.7)	7.4 (5.9)	9.0 (1.1)	−0.01 (−0.17, 0.15)
	Bergenstal et al18	24	248	53.0 (10.3)	48.0	33.9 (7.3)	9.3 (5.8)	10.3 (1.7)	0.91 (0.59, 1.23)
	Nauck et al20	24	667	57.5 (9.0)	51.7	32.5 (5.3)	9.5 (6.0)	8.4 (0.9)	−0.14 (−0.28, −0.01)
	Davies et al32	26	234	56.5 (9.1)	68.4	34.1 (5.3)	8.7 (4.5)	8.6 (0.7)	0.01 (−0.24, 0.26)
	Davies et al19	26	216	58.5 (10.0)	66.4	33.7 (4.7)	7.5 (5.5)	8.4 (0.9)	−0.42 (−0.63, −0.21)
	Diamant et al27	26	456	58.0 (9.5)	53.5	32.3 (5.1)	7.9 (5.0)	8.3 (1.1)	−0.16 (−0.29, −0.03)
	Heine et al29	26	535	58.9 (9.1)	55.8	31.4 (4.5)	9.6 (5.9)	8.3 (1.0)	0.02 (−0.12, 0.16)
	Inagaki et al33	26	427	56.8 (10.8)	67.9	26.2 (3.9)	9.0 (6.0)	8.5 (0.8)	−0.43 (−0.59, −0.26)
	Russell-Jones et al35	26	466	57.6 (10.0)	58.5	30.4 (5.3)	9.5 (6.1)	8.3 (0.9)	−0.24 (−0.39, −0.08)
	Bunck et al31	52	69	58.4 (8.0)	65.2	30.5 (3.8)	4.9 (4.2)	7.5 (0.6)	−0.10 (−0.54, 0.34)
	Weissman et al34	52	725	55.5 (9.5)	56.1	33.1 (5.5)	8.8 (6.3)	8.1 (0.9)	0.11 (−0.04, 0.27)
	Diamant et al26	84	456	58.0 (9.5)	53.5	32.3 (5.1)	7.9 (5.0)	8.3 (1.1)	−0.18 (−0.33, −0.02)
	Diamant et al25	156	456	58.0 (9.5)	53.5	32.3 (5.1)	7.9 (5.0)	8.3 (1.1)	−0.20 (−0.39, −0.02)
Observational studies	Karagianni et al39	26	47	62.0 (8.6)	34.0	34.4 (5.6)	11.9 (7.1)	8.4 (1.6)	−0.80 (−1.84, 0.24)
	Horton et al36	36	38,678	60.4 (13.3)	46.9	34.4 (9.1)	–	8.6 (2.2)	0.50 (0.46, 0.54)
	Thayer et al28,b	36	861	53.0 (8.9)	55.7	–	–	9.0 (6.1)	0.53 (–)
	Thayer et al28,b	52	1709	55.8 (11.0)	54.7	–	–	8.7 (1.7)	1.13 (–)
	Pawaskar et al37	52	5366	58.0 (–)	46.3	36.7	–	8.1 (–)	−0.20 (–)
	Hall et al38	52	2965	60.7 (11.4)	61.9	33.7 (6.5)	8.8 (5.7)	9.6 (3.8)	0.13 (−0.11, 0.38)
	Bounthavong et al40	102	51,977	64.2 (10.4)	96.8	33.0 (6.7)	–	8.8 (2.0)	−0.32 (−0.47, −0.18)

Notes: Data shown as mean (standard deviation) unless specified otherwise. Diamant et al25–27 are based on the same RCT, but with different follow-ups. – indicates data not reported.

a The difference in change in HbA1c between treatment groups.

b Two cohort studies described in the same publication.

Abbreviations: HbA1c, glycated hemoglobin; RCTs, randomized controlled trials.

Table 2

Characteristics of RCTs and observational studies comparing dipeptidyl peptidase-4 inhibitors with sulfonylurea

	Authors	Duration, weeks	N	Age, years	Men, %	Body mass index, kg/m²	Time since diagnosis of type 2 diabetes mellitus, years	Baseline HbA1c, %	Mean effect (95% confidence interval)a
Randomized controlled trials	Kim et al55	4	33	57.8 (6.7)	58.6	25.5 (2.8)	5.3 (4.7)	7.2 (0.5)	0.00 (–)
	Shimoda et al58	12	50	63.1 (12.4)	31.0	25.1 (3.9)	–	7.4 (0.6)	2.70 (−0.10, 5.50)
	Srivastava et al50	18	50	–	–	25.9 (3.3)	–	8.3 (0.5)	0.54 (0.02, 1.06)
	Derosa et al57	26	167	58.1 (9.4)	49.1	27.8 (1.5)	6.7 (4.1)	7.8 (0.8)	0.00 (–)
	Jeon and Oh49	32	101	54.5 (10.7)	64.7	22.9 (6.0)	5.9 (1.7)	8.1 (1.0)	0.06 (−0.42, 0.54)
	Derosa et al53	52	453	–	49.6	27.3 (2.1)	5.0 (2.0)	8.3 (1.2)	0.20 (−1.73, 2.13)
	Rosenstock et al54	52	441	70.0 (4.3)	44.9	29.8 (4.5)	6.1 (6.3)	7.5 (0.7)	−0.05 (−0.23, 0.13)
	Nauck et al41	52	1172	56.7 (9.6)	59.2	31.3 (5.1)	6.4 (5.8)	7.7 (0.9)	0.00 (–)
	Göke et al45	52	858	57.6 (10.3)	51.8	31.4 (5.9)	5.5 (4.6)	7.7 (0.9)	0.06 (−0.05, 0.16)
	Ferrannini et al43	52	3118	57.5 (9.13)	53.5	31.8 (5.3)	5.7 (5.1)	7.3 (0.7)	0.09 (0.03, 0.15)
	Filozof and Gautier48	52	1007	59.5 (10.0)	52.0	31.0 (5.0)	6.6 (5.2)	8.5 (1.0)	0.04 (−0.11, 0.20)
	Arjona et al52	54	426	64.5 (9.9)	57.0	26.8 (4.8)	10.4 (7.7)	7.8 (0.7)	−0.11 (−0.29, 0.06)
	Arjona et al51	54	129	59.5 (9.5)	59.7	26.8 (5.0)	17.5 (8.9)	7.9 (0.7)	0.15 (−0.18, 0.49)
	Ahrén et al56	104	609	54.4 (9.9)	48.8	32.5 (5.5)	5.9 (4.8)	8.1 (0.8)	0.08 (–)
	Del Prato et al21	104	1759	55.5 (9.7)	50.8	31.2 (5.3)	5.5 (4.8)	7.6 (0.6)	−0.13 (−0.24, −0.02)
	Foley and Sreenan47	104	1092	54.8 (10.5)	55.8	30.7 (5.3)	2.2 (3.7)	8.7 (1.1)	0.13 (−0.06, 0.33)
	Göke et al46	104	858	57.6 (10.3)	51.8	31.4 (5.9)	5.5 (4.6)	7.7 (0.9)	−0.05 (−0.17, 0.06)
	Matthews et al44	104	3118	57.5 (9.13)	53.5	31.8 (5.3)	5.7 (5.1)	7.3 (0.7)	0.00 (0.00, 0.1)
	Seck et al42	104	1172	56.7 (9.6)	59.2	31.3 (5.1)	6.4 (5.8)	7.7 (0.9)	−0.03 (–)
Observational studies	Lee et al60	24	69	52.3 (12.8)	58.0	26.9 (3.9)	0.5 (0.5)	8.1 (0.8)	0.07 (−0.24, 0.37)
	Gitt et al61	52	256	65.2 (11.1)	52.0	–	5.0 (4.2)	7.4 (0.7)	−0.10 (−0.24, 0.04)
	Göke et al62	52	7410	62.6 (11.1)	54.0	30.8 (5.5)	5.8 (4.9)	7.7 (1.2)	−0.20 (−0.22, −0.09)
	Morgan et al59	52	16,832	61.9 (11.4)	59.8	32.1 (5.5)	4.6 (3.8)	8.7 (1.4)	0.30 (–)

Notes: Data shown as mean (standard deviation) unless specified otherwise. Nauck et al41 and Seck et al;42 Göke et al45 and Göke et al;46 and Ferrannini et al43 and Matthews et al44 are based on the same RCTs, but with different follow-ups.

a The difference in change in HbA1c between treatment groups. – indicates data not reported.

Abbreviations: HbA1c, glycated hemoglobin; RCTs, randomized controlled trials.

Characteristics of observational studies

Of the 11 individual observational studies,28,36–40,59–62 4 were prospective39,60–62 and 7 were based on registries.28,36–38,40,59 Information on exposure in the prospective studies was based on doctors’ records of prescription, whereas exposure in registry studies was based on databases with information on prescription36–38,40,59 or claims.28 The outcome in all studies was based on the clinical measure of HbA1c. The inclusion criteria in the studies were primarily based on previous medication, but age and comorbidity were also used in most studies. The observational studies analyzed patients who initiated either GLP-1 or insulin, or DPP-4i or sulfonylurea, respectively. Five of the observational studies excluded patients if information was missing,28,36–38 while the other six studies did not mention how missing data were handled.39,40,59–62 Five of the observational studies used multivariable regression38–40,60 or propensity score matching37 to adjust for potential confounding, although Karagianni et al39 only included body mass index (BMI) and age in the model. Unadjusted effect estimates were reported in the remaining six observational studies.28,36,59,61,62 Generally, the design of the included observational studies was deemed suboptimal regarding confounding adjustment and the potential for selection and information bias. However, two observational studies (one study37 comparing GLP-1 with insulin and another study60 comparing DPP-4i with sulfonylurea) were explicit about the conducted analysis, including confounding adjustment, and gave information about possible selection bias and information bias. Neither the effect estimates nor the patient characteristics of these studies37,60 differed from those of the other observational studies comparing GLP-1 with insulin or DPP-4i with sulfonylurea, respectively.

Characteristics of the study populations across study designs

The study populations did not differ between RCTs and observational studies with regard to age, sex ratio, BMI, time since diagnosis of type 2 diabetes mellitus, or baseline HbA1c, either in the studies that compared GLP-1 with insulin or in the studies that compared DPP-4i with sulfonylurea. Generally, this holds for both means and SDs. One exception is HbA1c among studies of GLP-1 and insulin: the distribution of HbA1c was generally wider in the observational studies than in the RCTs, which indicates that the study populations in the observational studies were more heterogeneous with regard to HbA1c. However, mean HbA1c was of similar magnitude across study designs. A few outliers should also be mentioned. Among studies comparing GLP-1 with insulin, the observational study by Bounthavong et al40 included almost only men, and BMI was low in the RCT by Inagaki et al33 (explained by a Japanese population). Among the studies comparing DPP-4i with sulfonylurea, the RCT by Shimoda et al58 included a higher proportion of women than the other studies. Unfortunately, information on time since diagnosis of type 2 diabetes mellitus was only available in two of the observational studies comparing GLP-1 with insulin.

Effect estimates across study designs

Effect estimates did not differ across RCTs and observational studies, both for studies comparing GLP-1 with insulin (Figure 2) and studies comparing DPP-4i with sulfonylurea (Figure 3). Among studies comparing GLP-1 with insulin, a few studies18,28,36 reported findings outside the 95% CI of the other studies; in the observational study by Horton et al36 and the two cohorts in the observational study by Thayer et al,28 no adjustment for confounding was done. This could explain why the findings differ from those of the confounding adjusted observational studies and the RCTs. It should be noted that Thayer et al28 did not aim for a comparison of effects across treatments. The RCT by Bergenstal et al18 reported results outside the 95% CI of the other RCTs and must be seen as an outlier. Among studies comparing DPP-4i with sulfonylurea, the three observational studies reporting unadjusted effects59,61,62 show effect estimates of similar magnitude to the effect estimates in the confounding adjusted observational study60 and the RCTs.

Figure 2

Effect estimates of studies comparing glucagon-like peptide-1 with insulin.

Notes: Difference in mean change in HbA1c between treatment groups, with 95% confidence interval. Diamant et al25–27 are based on the same RCT, but with different follow-ups. a Two different cohorts analyzed and reported in the same publication. Red circle: RCTs. Blue filled square: observational studies with confounding adjustment. Blue open square: observational studies unadjusted for confounding.

Abbreviations: HbA1c, glycated hemoglobin; RCTs, randomized controlled trials.

Figure 3

Effect estimates of studies comparing dipeptidyl peptidase-4 inhibitors with sulfonylurea.

Notes: Difference in mean change in HbA1c between treatment groups, with 95% confidence interval. Nauck et al41 and Seck et al;42 Göke et al45 and Göke et al;46 and Ferrannini et al43 and Matthews et al44 are based on the same RCTs, but with different follow-ups. Red circle: RCTs. Blue filled square: observational studies with confounding adjustment. Blue open square: observational studies unadjusted for confounding.

Abbreviations: HbA1c, glycated hemoglobin; RCTs, randomized controlled trials.

The numbers from Tables 1 and 2 are presented graphically in Figures S1 and S2. More information on the RCTs and observational studies is found in Tables S1–S4.
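As a cross-check on the tabulated numbers, the ranges of mean effects quoted in the Abstract can be recomputed from the last column of Tables 1 and 2. The Python sketch below is an illustration and not part of the review's methods; the dictionary layout and the function name are hypothetical, while the values are transcribed from the tables (including repeated follow-ups of the same RCT).

```python
# Mean effects (difference in change in HbA1c, %) transcribed from Tables 1 and 2.
EFFECTS = {
    ("GLP-1 vs insulin", "RCT"): [
        -0.01, 0.91, -0.14, 0.01, -0.42, -0.16, 0.02,
        -0.43, -0.24, -0.10, 0.11, -0.18, -0.20,
    ],
    ("GLP-1 vs insulin", "observational"): [
        -0.80, 0.50, 0.53, 1.13, -0.20, 0.13, -0.32,
    ],
    ("DPP-4i vs sulfonylurea", "RCT"): [
        0.00, 2.70, 0.54, 0.00, 0.06, 0.20, -0.05, 0.00, 0.06, 0.09,
        0.04, -0.11, 0.15, 0.08, -0.13, 0.13, -0.05, 0.00, -0.03,
    ],
    ("DPP-4i vs sulfonylurea", "observational"): [0.07, -0.10, -0.20, 0.30],
}


def effect_ranges(effects):
    """Return the (minimum, maximum) mean effect for each comparison and design."""
    return {key: (min(values), max(values)) for key, values in effects.items()}


RANGES = effect_ranges(EFFECTS)
for (comparison, design), (low, high) in RANGES.items():
    print(f"{comparison:24s} {design:14s} {low:+.2f} to {high:+.2f}")
```

The resulting ranges agree with those reported in the Abstract for both drug comparisons and both study designs.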

Discussion

No clear differences in the available baseline characteristics of the study populations or in the effect estimates of the identified RCTs and observational studies were observed in this review. Hence, no efficacy–effectiveness gap was observed and no drivers of effectiveness were identified. Despite examples where results from RCTs and observational studies seem not to agree,12–15 reviews that have systematically compared the results from RCTs and observational studies have found that effect sizes are often similar or do not differ systematically across a range of medical subjects,63,64 which suggests that the theoretical efficacy–effectiveness gap may not be as widespread as often thought. This is in line with the findings in this review.
An efficacy–effectiveness gap with regard to DPP-4i (specifically vildagliptin) and sulfonylurea in relation to change in HbA1c has been investigated elsewhere;65 the effect of the individual drug, that is, the change from baseline for each of the two drugs separately, was compared across five RCTs and one observational study. Ahrén et al65 found that DPP-4i had a similar effect in the RCTs and the observational study, but that an efficacy–effectiveness gap may exist with regard to sulfonylurea because sulfonylurea proved more effective in RCTs than in the observational study. The study by Ahrén et al65 is based on other data than this review: it included RCTs that compared DPP-4i with placebo (only using data on the active arm), and its observational data were based on the full EDGE study,66 which was not included in this review because it reported a comparison of DPP-4i with other oral hypoglycemics and not specifically sulfonylurea. In this review, the German part of the observational EDGE study62 was included. It is also unclear how Ahrén et al65 identified the included studies, as their selection was not based on a systematic literature search as in this review. Finally, this review used the comparison of two drugs as outcome (change with DPP-4i subtracted from change with sulfonylurea) and did not assess the effect of the individual drugs (change for DPP-4i and sulfonylurea, respectively) as done by Ahrén et al.65
Possible biases in this review could work in opposite directions and thus hide an actual efficacy–effectiveness gap; the absence of an identified gap could be the net result of such biases. Possible biases in this review are described in the following points:
1) Unmeasured confounding is always a potential problem in observational studies, and several of the observational studies reported effects not adjusted for potential confounders. Selection bias may also have been a problem in the observational studies because inclusion criteria were only partly clear, and all observational studies either excluded participants with missing information or did not report how missing data were handled. It follows that future observational studies in the area investigated in this review can be designed to avoid biases to a higher degree and can include confounding adjustment in the analyses. A descriptive approach to identify key drivers of bias was used to assess the observational studies. As stated, the aim of this review was not to assess the quality of the studies in detail with a more comprehensive and validated tool. Rather, the descriptive approach was found sufficient to identify potential flaws in the observational studies.
2) The limited number of studies in this review may also have affected the findings. In particular, there were fewer observational studies than RCTs. One could speculate whether the use of a hard end point (e.g., death) would have led to a higher number of available observational studies; however, it would probably limit the number of available RCTs. As to the number of available studies, publication bias may also have affected our results. Publication bias is probably most pronounced among observational studies. However, the effect estimates of the observational studies look fairly symmetric in Figures 2 and 3, which suggests no publication bias, although a specific study on this topic is needed to draw final conclusions. It is important to note that effect estimates from the same RCTs at different follow-up time points are listed in Tables 1 and 2. However, as no overall effect was estimated, these studies were not double counted in any pooled analysis. The descriptive comparison of effect estimates was intended to be complete, and, therefore, all effect estimates were listed.
3) Characteristics of the study populations and other features of the studies may differ in ways not quantified in the data extraction. The assessed characteristics were restricted to the information that was available in both the RCTs and the observational studies. The observational studies often included more information on patient characteristics than the RCTs, for example, the distribution of comorbidities and comedication in the study population. Delivery of care and adherence to treatment are areas where RCTs and observational studies may differ, with a possible impact on treatment effect as, for example, seen in osteoporosis treatment.67 However, such information was not available and, therefore, cannot be compared across study designs. Future studies based on patient-level data rather than systematic reviews may be better suited to investigate the potential drivers of effectiveness not observed in this review, for example comorbidity, comedication, delivery of care, and adherence to treatment. Studies on patient-level data are also useful to investigate effect modification by, for example, drug and patient characteristics, which will give insights into possible drivers of effectiveness.
4) It is possible that the observational studies were designed to be comparable with the RCTs with regard to, for example, the study population. If so, no efficacy–effectiveness gap attributable to differences in the study populations would be expected in this review. However, this was neither explicitly stated in any of the observational studies nor could it be deduced from the listed inclusion criteria.
5) If the RCTs and observational studies had reported similar subgroup analyses, these could have been used to investigate the potential efficacy–effectiveness gap even further. However, few subgroup analyses were conducted in the included studies, and not in a way that could be compared across study designs.
6) The results of this review should be interpreted in the light of GLP-1 and DPP-4i being analyzed at drug class level. It would require many more studies to do subgroup analyses on the individual drugs, and not all observational studies give information on drug names and doses. Tables S1–S4 hold the available information on drug names and doses investigated in the included studies.
In this review, HbA1c was used as outcome measure because it is the common effect measure of glucose-lowering drugs. It is important to note that this review did not aim to do a full evaluation of the included glucose-lowering drugs. Such an evaluation should involve more parameters than solely change in HbA1c, for example cardiovascular events, hypoglycemic events, and weight change. We used this outcome measure as an example to study a potential efficacy–effectiveness gap. As described in the Methods section, pooled analyses were not the aim of this review. Pooled analyses would require more homogeneous studies, for example with regard to study duration, and it is likely that very few studies would be included in such analyses. Instead, the present review gives an insight into the published studies in this area, and by including heterogeneous studies, for example with varying study duration, possible explanations of an efficacy–effectiveness gap could be investigated.
To conclude, no efficacy–effectiveness gap between RCTs and observational studies comparing GLP-1 with insulin or DPP-4i with sulfonylurea was observed. However, the limited number of studies and potential problems with confounding adjustment and with selection and information bias in the observational studies may have hidden a true efficacy–effectiveness gap. Hence, the existence of an efficacy–effectiveness gap cannot be fully excluded. No potential drivers of effectiveness were identified among age, sex, BMI, time since diagnosis of type 2 diabetes mellitus, baseline HbA1c, publication year, duration of study, and number of patients in the study.

