Sunita Rehal, Tim P Morris, Katherine Fielding, James R Carpenter, Patrick P J Phillips.
Abstract
OBJECTIVE: To assess the adequacy of reporting of non-inferiority trials alongside the consistency and utility of current recommended analyses and guidelines.
Keywords: clinical trial; non-inferiority; randomised controlled clinical trials; systematic review
Year: 2016 PMID: 27855102 PMCID: PMC5073571 DOI: 10.1136/bmjopen-2016-012594
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Summary of guidelines
| Guideline | Justification of margin | Who is included in analysis | CI | Missing data | Sensitivity analyses |
|---|---|---|---|---|---|
| CONSORT 2006 | ‘Margin should be specified and preferably justified on clinical grounds’ | ‘Non-ITT analyses might be desirable as a protection from ITTs increase in type I error. There is greater confidence in results when the conclusions are consistent’. | ‘Many non-inferiority trials based their interpretation on the upper limit of a one-sided 97.5% CI, which is the same as the upper limit of a two-sided 95% CI’. ‘Although one-sided and two-sided CIs allow for inferences about non-inferiority, we suggest that two-sided CIs are appropriate in most non-inferiority trials. If a one-sided 5% significance level is deemed acceptable for the non-inferiority hypothesis test (a decision open to question), a 90% two-sided CI could then be used’. | ||
| CONSORT 2012 | | ‘Should be indicated if conclusions are related to PP analysis, ITT analysis or both and if the conclusions are stable between them’. | ‘The two-sided CI provides additional information, in particular for the situation in which the new treatment is superior to the reference treatment’ | | Sensitivity analysis is discussed through an example: ‘Study endpoints were analysed primarily for the PP population and repeated, for sensitivity reasons, for the ITT population’. |
| Draft FDA 2010 | ‘Whether M1 (the effect of the active control arm relative to placebo) is based on a single study or multiple studies, the observed (if there were multiple studies) or anticipated (if there is only one study) statistical variation of the treatment effect size should contribute to the ultimate choice of M1, as should any concerns about constancy. The selection of M2 (the largest clinically acceptable difference of the test treatment compared to the active control) is then based on clinical judgment regarding how much of the M1 active comparator treatment effect can be lost. The exercise of clinical judgment for the determination of M2 should be applied after the determination of M1 has been made based on the historical data and subsequent analysis’ | ‘It is therefore important to conduct both ITT and “as-treated” analyses in non-inferiority studies’. | ‘Typically, the one-sided type I error is set at 0.025, by asking that the upper bound of the 95% CI for control treat be less than the NI margin. If multiple studies provide very homogeneous results for one or more important endpoints, it may be possible to use the 90% lower bound rather than the 95% lower bound of the CI to determine the active control effect size’ | ‘Poor quality can reduce the drug's effect size and undermine the assumption of the effect size of the control agent, giving the study a “bias towards the null”’. | |
| ICH E9 | ‘This margin is the largest difference that can be judged as being clinically acceptable’ | ‘In confirmatory trials, it is usually appropriate to plan to conduct an analysis of the full analysis set and a PP analysis. In an equivalence or non-inferiority trial, use of the full analysis set is generally not conservative and its role should be considered very carefully’. | ‘For non-inferiority trials, a one-sided interval should be used. The choice of type I error should be a consideration separate from the use of a one-sided or two-sided procedure’. | ‘Imputation techniques, ranging from LOCF to the use of complex mathematical models, may be used to compensate for missing data’ | ‘An investigation should be made concerning the sensitivity of the results of analysis to the method of handling missing values, especially if the number of missing values is substantial’. |
| ICH E10 | ‘The determination of the margin in a non-inferiority trial is based on statistical reasoning and clinical judgment’ | ||||
| SPIRIT | | Use an example where ‘non-inferiority would be claimed if ITT and PP analyses show conclusions of NI’. | | ‘Multiple imputation can be used to handle missing data although relies on untestable assumptions’ | ‘Sensitivity analyses are highly recommended to assess the robustness of trial results under different methods of handling missing data’ |
| EMEA 2006 | ‘The choice of delta must always be justified on clinical and statistical grounds’ | | ‘A two-sided 95% CI (or one-sided 97.5% CI) is constructed. The interval should lie entirely on the positive side of the margin. Statistical significance is generally assessed using the two-sided 0.05 level of significance (or one-sided 0.025)’ | | |
| EMEA 2000 | | ‘ITT and PP analyses have equal importance and their use should lead to similar conclusions for robust interpretation’ | ‘A two-sided CI should lie entirely to the right of delta. If one-sided confidence is used then 97.5% should be used’ | | ‘It will be necessary to pay particular attention to demonstrating the sensitivity of the trial by showing similar results for the full analysis set and PP analysis set’ |
ITT, intention to treat; LOCF, last observation carried forward; mITT, modified intention to treat; NI, non-inferiority; PP, per-protocol.
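Several of the guidelines quoted above judge non-inferiority by comparing a confidence limit with the margin, and CONSORT 2006 notes that the upper limit of a one-sided 97.5% CI coincides with the upper limit of a two-sided 95% CI. The following is a minimal sketch of that comparison, not a method taken from any of the guidelines. It assumes a binary failure outcome, a Wald (normal approximation) interval for the risk difference, and a margin expressed as the largest acceptable increase in failure risk. The function names and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_se(events_new, n_new, events_ctrl, n_ctrl):
    """Return (difference, standard error) for failure risk, new minus control."""
    p_new = events_new / n_new
    p_ctrl = events_ctrl / n_ctrl
    se = sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return p_new - p_ctrl, se


def two_sided_ci(diff, se, level=0.95):
    """Two-sided Wald interval at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se


def one_sided_upper(diff, se, level=0.975):
    """Upper limit of a one-sided Wald interval at the given confidence level."""
    return diff + NormalDist().inv_cdf(level) * se


def non_inferior(upper_limit, margin):
    """Non-inferiority is concluded only if the upper limit is below the margin."""
    return upper_limit < margin


# Hypothetical trial: 30/300 failures on the new treatment, 27/300 on control.
diff, se = risk_difference_se(30, 300, 27, 300)
lower, upper = two_sided_ci(diff, se, level=0.95)
same_upper = one_sided_upper(diff, se, level=0.975)
```

With these made-up counts the same upper limit is obtained either way, so the conclusion depends only on the margin chosen, which is why the justification of the margin matters so much.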
Figure 1. Flow chart of eligibility of articles.
General characteristics
| | All articles (n=168) | Including |
|---|---|---|
| Characteristics | n (%) | n (%) |
| Journal | ||
| | 61 (36) | 61 |
| | 64 (38) | |
| | 19 (11) | |
| | 8 (5) | |
| | 5 (3) | |
| | 7 (4) | |
| | 2 (1) | |
| | 2 (1) | |
| Year of publication | ||
| 2010 | 26 (15) | 9 (15) |
| 2011 | 27 (16) | 9 (15) |
| 2012 | 29 (17) | 8 (13) |
| 2013 | 39 (23) | 19 (31) |
| 2014 | 27 (16) | 10 (16) |
| 2015 | 20 (12) | 6 (10) |
| Type of intervention | ||
| Drug | 112 (67) | 44 (72) |
| Surgery | 22 (13) | 7 (11) |
| Other | 34 (20) | 10 (16) |
| Randomisation | ||
| Patient | 163 (97) | 59 (97) |
| Cluster | 5 (3) | 2 (3) |
| Power | ||
| 80% | 61 (36) | 19 (31) |
| 85% | 11 (7) | 5 (8) |
| 90% | 65 (39) | 26 (43) |
| 71–99% (excluding the above) | 21 (12) | 11 (18) |
| Not reported/unclear | 10 (6) | 0 |
| Composite outcome | ||
| Yes | 78 (46) | 37 (61) |
| No | 90 (54) | 24 (39) |
| Disease | ||
| Heart disease | 30 (18) | 13 (21) |
| Blood disorder | 19 (11) | 6 (10) |
| Cancer | 16 (10) | 8 (13) |
| Diabetes | 11 (7) | 2 (3) |
| Thromboembolism | 6 (4) | 6 (10) |
| Skin infection (non-contagious) | 3 (2) | 2 (3) |
| Urinary tract infection | 3 (2) | 0 |
| Arthritis | 3 (2) | 1 (2) |
| Ophthalmology | 3 (2) | 1 (2) |
| Pneumonia | 3 (2) | 1 (2) |
| Complications in pregnancy | 3 (2) | 0 |
| Stroke | 3 (2) | 2 (3) |
| Testing method | 3 (2) | 1 (2) |
| Appendicitis | 2 (1) | 1 (2) |
| Depression | 2 (1) | 0 |
| Other non-infectious disease | 18 (11) | 7 (11) |
| HIV | 18 (11) | 2 (3) |
| Tuberculosis | 6 (4) | 4 (7) |
| Malaria | 4 (2) | 1 (2) |
| Skin infection (contagious) | 2 (1) | 0 |
| Hepatitis C | 2 (1) | 2 (3) |
| Other infectious disease | 8 (5) | 1 (2) |
Justification of choice of margin, total number of patient populations considered for analyses and patient population included in the analysis
| | All articles (N=168) | Including |
|---|---|---|
| | n (%) | n (%) |
| Justification of NI margin | ||
| Made no attempt for justification | 90 (54) | 22 (36) |
| Clinical basis. No evidence for consultation with external expert group, and no reference to previous trials of the control arm | 32 (19) | 11 (18) |
| Preservation of treatment effect based on estimates of control arm effect from previous trials | 13 (8) | 14 (23) |
| Expert group external to the authors. No reference to previous trials of the control arm | 6 (4) | 3 (5) |
| The same margin as was used in other similar trials | 5 (3) | 2 (3) |
| 10–12% recommended by disease-specific FDA guidelines | 4 (2) | 1 (2) |
| General comment that margin was decided according to FDA/regulatory guidance | 4 (2) | 0 |
| Clinical basis and based on previous similar trial. No evidence for consultation with external expert group, and no reference to previous trials of the control arm | 3 (2) | 0 |
| Based on registry/development programme | 0 | 2 (3) |
| Other* | 11 (7) | 6 (10) |
| Number of analyses | ||
| One | 65 (39) | 15 (25) |
| Two | 91 (54) | 38 (62) |
| Three | 10 (6) | 7 (11) |
| Not defined | 2 (1) | 1 (2) |
| Analysis | ||
| ITT | 129 (77) | 44 (72) |
| PP | 90 (54) | 35 (57) |
| mITT | 34 (20) | 17 (28) |
| As-treated | 4 (2) | 6 (10) |
| Other | 20 (12) | 10 (16) |
| Unclear | 2 (1) | 2 (3) |
*See online supplementary material.
ITT, intention to treat; mITT, modified intention to treat; PP, per-protocol.
Consistency of type I error rate with significance levels of CIs over year of publication
| | Year of publication | | | | | | |
|---|---|---|---|---|---|---|---|
| | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | Total |
| All articles (N=168) | | | | | | | |
| Yes | 11 (42%) | 15 (56%) | 15 (52%) | 24 (62%) | 19 (70%) | 11 (55%) | 95 (57%) |
| No | 5 (19%) | 4 (15%) | 4 (14%) | 5 (13%) | 5 (19%) | 3 (15%) | 26 (15%) |
| Not reported | 10 (38%) | 8 (30%) | 10 (34%) | 10 (26%) | 3 (11%) | 6 (30%) | 47 (28%) |
| Including | | | | | | | |
| Yes | 7 (78%) | 6 (67%) | 5 (63%) | 14 (74%) | 8 (80%) | 4 (67%) | 44 (72%) |
| No | 1 (11%) | 2 (22%) | 2 (25%) | 3 (16%) | 2 (20%) | 1 (17%) | 11 (18%) |
| Not reported | 1 (11%) | 1 (11%) | 1 (13%) | 2 (11%) | 0 | 1 (17%) | 6 (10%) |
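The consistency tabulated here follows from simple arithmetic: a one-sided type I error rate of α corresponds to a two-sided CI at level 1 − 2α, or to a one-sided CI at level 1 − α. For example, one-sided 2.5% pairs with a two-sided 95% CI, and one-sided 5% pairs with a two-sided 90% CI, as the CONSORT 2006 guidance quoted earlier points out. The sketch below illustrates that check. It is an assumption about how consistency could be coded, not the review's own procedure, and the function names are hypothetical.

```python
def implied_ci_level(one_sided_alpha, ci_sides):
    """Confidence level that matches a one-sided type I error rate.

    ci_sides is 1 for a one-sided interval and 2 for a two-sided interval.
    """
    if ci_sides == 2:
        return 1 - 2 * one_sided_alpha
    if ci_sides == 1:
        return 1 - one_sided_alpha
    raise ValueError("ci_sides must be 1 or 2")


def is_consistent(one_sided_alpha, ci_level, ci_sides, tol=1e-9):
    """True if the reported CI level matches the stated one-sided alpha."""
    return abs(implied_ci_level(one_sided_alpha, ci_sides) - ci_level) <= tol


# One-sided 2.5% with a two-sided 95% CI: consistent.
ok_95 = is_consistent(0.025, 0.95, ci_sides=2)
# One-sided 5% with a two-sided 95% CI: not consistent (a 90% CI would be).
mismatch = is_consistent(0.05, 0.95, ci_sides=2)
```

A trial that states a one-sided 5% error rate but reports a two-sided 95% CI is therefore testing more stringently than it planned, which is one way an article can land in the "No" row.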
Significance level of (a) type I error rate and (b) CIs for all articles by whether CI was one-sided or two-sided
| | One-sided | Two-sided | Not reported |
|---|---|---|---|
| (a) Type I error rate (%) | |||
| 0.8 | 0 | 1 (1) | 0 |
| 1.25 | 3 (2) | 0 | 0 |
| 2.45 | 1 (1) | 0 | 0 |
| 2.5 | 40 (24) | 2 (1) | 2 (1) |
| 5 | 46 (27) | 29 (17) | 15 (9) |
| 10 | 1 (1) | 2 (1) | 0 |
| Not reported | 3 (2) | 0 | 23 (14) |
| (b) Significance level of CI (%) | |||
| 90 | 1 (1) | 14 (8) | 1 (1) |
| 95 | 14 (8) | 125 (74) | 0 |
| 97.5 | 4 (2) | 7 (4) | 0 |
| Other | 0 | 1 (1) | 0 |
| Not reported | 0 | 0 | 1 (1) |
Reporting of (a) missing data and (b) sensitivity analyses
| | n (%) |
|---|---|
| (a) Imputation performed | |
| Yes | 56 (33) |
| Worst-case scenario | 19 (34) |
| Multiple imputation | 11 (20) |
| Last observation carried forward | 8 (14) |
| Complete case analysis | 6 (11) |
| Best-case scenario | 2 (4) |
| Last observation carried forward and worst-case scenario | 2 (4) |
| Best-case/worst-case scenario | 3 (5) |
| Mean imputation | 1 (2) |
| Complete case analysis, multiple imputation using propensity scores and multiple imputation using regression modelling | 1 (2) |
| Other and worst-case scenario | 1 (2) |
| Other | 1 (2) |
| No | 12 (7) |
| Not reported | 99 (59) |
| Unclear | 1 (1) |
| Including | |
| Yes | 22 (36) |
| No | 7 (11) |
| Not reported | 31 (51) |
| Unclear | 1 (2) |
| (b) Sensitivity analyses performed | |
| Yes | 64 (38) |
| Patient population | 13 (20) |
| Competing risks | 2 (3) |
| Statistical modelling | 2 (3) |
| Adjusted for baseline variables | 1 (2) |
| Excluded protocol violations | 1 (2) |
| On-treatment | 1 (2) |
| Patient population/other | 1 (2) |
| Unclear | 2 (3) |
| Other | 15 (23) |
| Missing data | 27 (42) |
| Best-case/worst-case scenario | 5 |
| Complete case analysis | 3 |
| Imputation of missing values | 3 |
| Multiple imputation | 3 |
| Worst-case scenario | 3 |
| Baseline observation carried forward | 1 |
| Baseline observation carried forward and complete case analysis | 1 |
| Complete case analysis, multiple imputation using propensity scores and multiple imputation using regression modelling | 1 |
| Complete case analysis and missing not at random | 1 |
| Complete case analysis and best-case scenario | 1 |
| Different methods | 1 |
| Last observation carried forward | 1 |
| Modelling | 1 |
| Observed failure | 1 |
| Worst-case scenario and last observation carried forward | 1 |
| No | 103 (61) |
| Unclear | 1 (1) |
| Including | |
| Yes | 38 (62) |
| No | 23 (38) |
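Worst-case and best-case scenarios were among the most commonly reported ways of handling missing data, both for imputation and for sensitivity analysis. The sketch below shows one common reading of these terms for a binary outcome; the articles reviewed may have defined them differently. The assumption is that "worst case" counts missing outcomes as failures on the new treatment and as successes on control, and "best case" does the reverse. All names and counts are hypothetical.

```python
def success_rate(successes, failures, missing, missing_as):
    """Success proportion after imputing every missing outcome the same way."""
    if missing_as not in ("success", "failure"):
        raise ValueError("missing_as must be 'success' or 'failure'")
    total = successes + failures + missing
    imputed_successes = successes + (missing if missing_as == "success" else 0)
    return imputed_successes / total


def scenario_difference(new, ctrl, scenario):
    """Difference in success rate (new minus control) under a named scenario."""
    if scenario == "worst":
        # Least favourable to the new treatment.
        new_as, ctrl_as = "failure", "success"
    elif scenario == "best":
        # Most favourable to the new treatment.
        new_as, ctrl_as = "success", "failure"
    else:
        raise ValueError("scenario must be 'worst' or 'best'")
    return (success_rate(**new, missing_as=new_as)
            - success_rate(**ctrl, missing_as=ctrl_as))


# Hypothetical arms of 100 patients each.
new_arm = {"successes": 80, "failures": 10, "missing": 10}
ctrl_arm = {"successes": 82, "failures": 10, "missing": 8}
worst = scenario_difference(new_arm, ctrl_arm, "worst")
best = scenario_difference(new_arm, ctrl_arm, "best")
```

If the non-inferiority conclusion holds under both extremes, missing data cannot have driven it; if the two scenarios disagree, the result depends on untestable assumptions about the missing outcomes.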
Quality of reporting of trials associated with conclusions of non-inferiority
| | Concluded non-inferiority | | | |
|---|---|---|---|---|
| | Yes (N=132) | No (N=29) | Other (N=7) | Total (N=168) |
| Grade | n (%) | n (%) | n (%) | n (%) |
| Excellent† | 11 (73) | 2 (13) | 2 (13) | 15 |
| Good‡ | 55 (86) | 9 (14) | 0 (0) | 64 |
| Fair§ | 48 (80) | 8 (13) | 4 (7) | 60 |
| Poor¶ | 18 (62) | 10 (34) | 1 (3) | 29 |
*Excluding trials that concluded ‘other’: p=0.05 (Cochran–Armitage test).
†Excellent if margin justified, ≥2 analyses on patient population performed, type I error rate consistent with significance level of CI.
‡Good if fulfilled two of the following: margin justified, ≥2 analyses on patient population performed, type I error rate consistent with significance level of CI.
§Fair if fulfilled one of the following: margin justified, ≥2 analyses on patient population performed, type I error rate consistent with significance level of CI.
¶Poor if margin not justified, <2 analyses on patient population performed, type I error rate not consistent with significance level of CI.
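The grading rules in the footnotes amount to counting how many of three criteria an article meets, and the trend across grades is assessed with a Cochran–Armitage test. The sketch below illustrates both steps using the counts from the table, excluding trials that concluded "other". It is not the authors' code: the equally spaced scores (0 to 3), the variance formula without a finite-sample correction, and the two-sided normal p-value are assumptions, and the function names are hypothetical.

```python
from math import erfc, sqrt


def grade(margin_justified, n_population_analyses, alpha_consistent_with_ci):
    """Map the three reporting criteria to Poor / Fair / Good / Excellent."""
    met = sum([
        bool(margin_justified),
        n_population_analyses >= 2,
        bool(alpha_consistent_with_ci),
    ])
    return ["Poor", "Fair", "Good", "Excellent"][met]


def cochran_armitage(successes, totals, scores=None):
    """Cochran-Armitage test for trend in proportions across ordered groups.

    Returns (z, two-sided p-value) using a normal approximation.
    """
    scores = list(range(len(totals))) if scores is None else list(scores)
    n = sum(totals)
    p_bar = sum(successes) / n
    # Score-weighted sum of observed minus expected successes.
    t = sum(s * (x - m * p_bar) for s, x, m in zip(scores, successes, totals))
    weighted = sum(m * s for s, m in zip(scores, totals))
    weighted_sq = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (weighted_sq - weighted ** 2 / n)
    z = t / sqrt(variance)
    return z, erfc(abs(z) / sqrt(2))


# Concluded non-inferiority (yes) out of yes + no, ordered Excellent to Poor.
concluded_ni = [11, 55, 48, 18]
yes_plus_no = [13, 64, 56, 28]
z, p_value = cochran_armitage(concluded_ni, yes_plus_no)
```

Under these assumptions the two-sided p-value comes out close to the reported 0.05, with a negative z indicating that the proportion concluding non-inferiority falls as reporting quality moves from Excellent to Poor.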