
Systematic bias between blinded independent central review and local assessment: literature review and analyses of 76 phase III randomised controlled trials in 45 688 patients with advanced solid tumour.

Jianrong Zhang^1,2,3,4, Yiyin Zhang^5,6,7, Shiyan Tang^5,8, Long Jiang^1,2,3, Qihua He^1,2,3, Lindsey Tristine Hamblin^1,9, Jiaxi He^1,2,3, Zhiheng Xu^2,3, Jieyu Wu^10,11, Yaoqi Chen^1,2,3, Hengrui Liang⁵, Difei Chen⁵, Yu Huang⁵, Xinyu Wang⁵, Kexin Deng⁵, Shuhan Jiang⁵, Jiaqing Zhou⁵, Jiaxuan Xu⁵, Xuanzuo Chen⁵, Wenhua Liang^1,2,3, Jianxing He^1,2,3.

Abstract

OBJECTIVE: Unbiased assessment of tumour response is crucial in randomised controlled trials (RCTs). Blinded independent central review is usually used to supplement or monitor local assessment but is costly. The aim of this study was to investigate whether systematic bias existed in RCTs by comparing the treatment effects of efficacy endpoints between central and local assessments.
DESIGN: Literature review, pooling analysis and correlation analysis. DATA SOURCES: PubMed, from 1 January 2010 to 30 June 2017. ELIGIBILITY CRITERIA FOR SELECTING STUDIES: Eligible articles are phase III RCTs comparing anticancer agents for advanced solid tumours. Additionally, the articles should report objective response rate (ORR), disease control rate (DCR), progression-free survival (PFS) or time to progression (TTP); the treatment effect of these endpoints, OR or HR, should be based on central and local assessments.
RESULTS: Of 76 included trials involving 45 688 patients, 17 (22%) trials reported their endpoints with statistically inconsistent inferences (p value lower/higher than the probability of type I error) between central and local assessments; among them, 9 (53%) trials had statistically significant inference based on central assessment. Pooling analysis presented no systematic bias when comparing treatment effects of both assessments (ORR: OR=1.02 (95% CI 0.97 to 1.07), p=0.42, I²=0%; DCR: OR=0.97 (95% CI 0.92 to 1.03), p=0.32, I²=0%; PFS: HR=1.01 (95% CI 0.99 to 1.02), p=0.32, I²=0%; TTP: HR=1.04 (95% CI 0.95 to 1.14), p=0.37, I²=0%), regardless of funding source, mask, region, tumour type, study design, number of enrolled patients, response assessment criteria, primary endpoint and trials with statistically consistent/inconsistent inferences. Correlation analysis also presented no sign of systematic bias between central and local assessments (ORR, DCR, PFS: r>0.90, p<0.01; TTP: r=0.90, p=0.29).
CONCLUSIONS: No systematic bias could be found between local and central assessments in phase III RCTs on solid tumours. However, statistically inconsistent inferences could be made in many trials between both assessments. © Author(s) (or their employer(s)) 2018. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords: blind independent central review; local assessment; oncological randomized control trials

Year: 2018 PMID: 30206071 PMCID: PMC6144327 DOI: 10.1136/bmjopen-2017-017240

Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692

To our knowledge, this is the largest literature review and pooling analysis comparing treatment effects between blinded independent central review and local assessment in phase III randomised controlled trials on solid tumours. We performed an exhaustive literature search to include all potential studies fulfilling the inclusion criteria. Data were extracted independently and in a double-blind manner to ensure the accuracy of the data used for further analysis. An analysis using individual patient data could be more robust than our study-level analysis. Because we included only trials reporting both blinded independent central review and local assessment, the findings and conclusions of this research may not be generalisable to all phase III oncological randomised controlled trials: when a trial did not implement or report both assessments, the situation of the missing assessment is unknown.

Introduction

In phase III randomised controlled trials (RCTs), response-related or progression-related endpoints such as objective response rate (ORR), disease control rate (DCR), progression-free survival (PFS) and time to progression (TTP) are key to reflecting the treatment effects of the experimental arm and the control arm for patients with advanced solid tumour.1–3 During trials, tumour response should be assessed accurately and with standardised response assessment criteria (eg, Response Evaluation Criteria in Solid Tumors (RECIST) and WHO). Unlike overall survival, these endpoints assessed by local investigators are more influenced by subjective factors, including variability in tumour measurement, target lesion selection, failure to diagnose new lesions and different interpretations of non-target or immeasurable lesions.4 In open-label trials, investigators' knowledge of treatment assignment could influence their assessment. Even in some double-blind trials, this knowledge may not be completely eliminated because of adverse effects; for example, investigators might be able to tell which treatments are assigned to their patients from the different manifestations of the treatments' adverse effects.5 Treatment effect is one of the main results considered for drug approval. If the aforementioned subjective factors affect the assessment of trial endpoints, the result will overestimate or underestimate the true effect of treatments, which is called systematic bias.6 In order to detect potential bias from local investigators, blinded independent central review is requested by the regulatory authorities (eg, the US Food and Drug Administration (FDA)). During its implementation, all imaging examinations are reviewed by independent radiologists who are blinded to patients' treatment assignments and clinical information.7 However, this mechanism has some drawbacks. 
It increases the burden of time and expenditure on trials. Additionally, it may introduce missing data, information censoring and the neglect of symptomatic progression. These factors could result in different discrepancy rates of central and local assessments and sometimes among central reviewers themselves, which impacts treatment effects and may even cause potential bias.4 7 8 Given the pros and cons of assessment by central reviewers, the FDA Oncology Drugs Advisory Committee discussed how to design a reliable assessment strategy for clinical trials with central review: if there is no strong evidence indicating systematic bias from two assessments, a sample-based central review could be considered in future usage instead of the complete assessment for all patients in the trials.9 This strategy may effectively reduce the complexity and implementation burden, without compromising the reliability of the RCTs.9 Accordingly, in order to understand the reliability of local assessment, as well as the necessity of central review, we conducted this literature review and analyses in order to investigate whether systematic bias existed in previous phase III RCTs on solid tumours.

Method

Search strategy and study selection

In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement,10 a PubMed search was conducted by JRZ using the dates of 1 January 2010 to 30 June 2017. The search strategy is shown in online supplementary etable 1. Inappropriate articles such as reviews, systematic reviews and/or meta analyses, guidelines and commentaries were excluded. Eligible trials were those directly evaluating therapeutic efficacy of anticancer agents in phase III RCTs for patients with advanced solid tumour; additionally, the imaging assessment for tumour response or progression was conducted by both central reviewers and local investigators. As some authors reported their data in more than one article, we used the name and/or National Clinical Trial (NCT) number of eligible RCTs as search terms to re-search PubMed (without the time interval limitation), to find out if there were more available articles on those RCTs. Endnote X7 (Thomson Reuters, New York City, New York, USA) was used in the above process.

Data extraction

Data extraction was carried out independently and double-blindly by three reviewers (JRZ with YYZ and SYT; in blocks of 50 articles allocated at random; discrepancies were resolved by WHL). To ensure consistency between reviewers, we used the same data extraction form, piloted the data extraction on a sample of 16 included trials and held discussions before and during the extraction process to agree on how to extract and interpret the data. The following characteristics of each trial were extracted: author, year, NCT number, funding source (pharmaceutical or academic), mask (open label, single blind or double blind), region (global or intracontinental), tumour type (eg, breast cancer, ovarian cancer, melanoma), study design (superiority, non-inferiority or hybrid; hybrid design includes the design of superiority and non-inferiority), number of enrolled patients, response assessment criteria (RECIST or WHO), primary endpoint (central assessed, local assessed or other) and the statistical inference of the primary endpoint according to whether the p value was lower than the probability of the type I error (positive, negative or indeterminate). We also extracted estimated treatment effects from both central and local assessments, including the OR of experimental arm ORR to control arm ORR, OR of experimental arm DCR to control arm DCR, HR of experimental arm PFS to control arm PFS, and HR of experimental arm TTP to control arm TTP. Where more than one article on the same trial reported overlapping data, we selected the data from the larger analysis or the more recently updated analysis. For PFS and TTP, if treatment effects were available for both the intention-to-treat population (or another method with a larger population) and the per-protocol population, we preferred the former. The risk of bias was evaluated for each trial according to these characteristics (online supplementary efigure 1).

Statistical analysis

First, we investigated whether there were trials with statistically inconsistent inferences between two assessments in primary and secondary endpoints (including ORR, PFS and TTP). If these trials could be identified, we calculated the percentage of these trials among all our eligible trials. Statistically inconsistent inferences are defined as the treatment effect from one of the assessments (eg, central assessment) indicating significant difference (p value is lower than the probability of the type I error or the confidence interval of the treatment effect does not cross 1), but the treatment effect from another assessment (eg, local assessment) indicating non-significant difference (p value is higher than the probability of the type I error, or the confidence interval of the treatment effect crosses 1). Furthermore, to statistically investigate whether systematic bias existed, we made a comparison of treatment effects between central and local assessments, by conducting a pooling analysis with the inverse variance method and fixed-effect model in Review Manager 5.3 (The Cochrane Collaboration, London, England). In this process, if the corresponding p value for heterogeneity was less than 0.05 or the I2 index was over 50%, we used a random-effect model instead of the fixed-effect model in order to reduce the effect of heterogeneity. The pooled OR and HR were the measure of this comparison, expressed as the ratio of central-assessed treatment effects (eg, OR of ORR, OR of DCR, HR of PFS, HR of TTP) to local-assessed treatment effects.11 The OR (of ORR or DCR) greater than 1 indicated that central review overestimated the efficacy of the therapeutic strategy in the experimental arm; while a HR (of PFS or TTP) greater than 1 indicated that central review underestimated the therapeutic efficacy of the experimental arm (compared with local assessment). 
Regardless of whether the ratio was higher or lower than 1, we concluded that there was no sign of a significant systematic bias if: (1) the corresponding p value was higher than 0.05, which means the 95% CI of the pooled ratio (HR, OR) crossed 1; or (2) where the first criterion was not met, the 95% CI of the pooled ratio was extremely tight (<5%). For the above summary synthesis of ORR, DCR, PFS and TTP, a funnel plot was used to estimate publication bias (online supplementary efigure 2). Furthermore, we conducted subgroup analysis based on the trial characteristics: funding source, mask, region, trial design, number of enrolled patients (based on median value of all included trials), tumour type, response assessment criteria, primary endpoint and its outcome, as well as statistical inferences between central and local assessments (consistent/inconsistent). In order to verify the result of the pooling analysis, we conducted correlation analysis for the treatment effects between central and local assessments, using SPSS V.23 (SPSS, Chicago, Illinois, USA). A test for normality was completed first, followed by correlation analysis with a bivariate model: if normal distribution was indicated, we estimated the correlation by the Pearson correlation coefficient; if not, Spearman's correlation was applied. Significant correlation was indicated when the p value was less than 0.05. The correlation between the two assessments was also demonstrated in scatterplots, constructed using Excel 2011 (Microsoft, Seattle, Washington, USA).
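The pooling step described in this section can be illustrated with a minimal Python sketch. It is not the Review Manager procedure itself: the function names are hypothetical, standard errors are recovered from the reported 95% CIs, and the central and local estimates of a trial are assumed to be independent, which the text does not specify. The three input rows are PFS hazard ratios taken from table 2 purely as sample numbers, so the output is not a result of the study.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile


def log_se_from_ci(lower, upper):
    """Standard error of a log ratio, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def log_ratio_central_to_local(central, local):
    """Return (log ratio, SE) of central to local effect for one trial.

    central and local are (estimate, ci_lower, ci_upper) tuples.
    ASSUMPTION: the two estimates are treated as independent, so their
    variances add. They share patients, so this overstates the variance.
    """
    est_c, lo_c, hi_c = central
    est_l, lo_l, hi_l = local
    diff = math.log(est_c) - math.log(est_l)
    var = log_se_from_ci(lo_c, hi_c) ** 2 + log_se_from_ci(lo_l, hi_l) ** 2
    return diff, math.sqrt(var)


def pool_fixed_effect(log_ratios):
    """Inverse-variance fixed-effect pooling of (log ratio, SE) pairs."""
    weights = [1 / se ** 2 for _, se in log_ratios]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, log_ratios)) / total
    se = math.sqrt(1 / total)
    # Cochran's Q and the I-squared heterogeneity index (in percent).
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, log_ratios))
    df = len(log_ratios) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p = 2 * (1 - NormalDist().cdf(abs(pooled / se)))
    return {
        "ratio": math.exp(pooled),
        "ci": (math.exp(pooled - Z95 * se), math.exp(pooled + Z95 * se)),
        "p": p,
        "I2": i2,
    }


# Sample PFS hazard ratios from table 2: (central, local) per trial.
sample_trials = [
    ((0.72, 0.54, 0.94), (0.82, 0.65, 1.02)),
    ((0.79, 0.67, 0.94), (0.88, 0.75, 1.01)),
    ((0.87, 0.71, 1.05), (0.76, 0.64, 0.90)),
]
result = pool_fixed_effect(
    [log_ratio_central_to_local(c, l) for c, l in sample_trials]
)
# For an HR, a pooled ratio above 1 means central review gave the less
# favourable estimate; for an OR of ORR or DCR, the more favourable one.
print(result)
```

An I² above 50% from such a calculation would correspond to the condition under which the authors switched to a random-effect model; the heterogeneity p value is omitted here because it needs a chi-square distribution function that the standard library does not provide.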

Patient and public involvement

Due to the nature of the literature review, we do not have patient and public involvement in this research.

Results

Trial searching and characteristics

Based on article identification and selection (figure 1), we included a total of 76 trials from 100 articles, involving 45 688 randomly assigned patients.12–111

Figure 1

Flow chart of study identification and selection. DCR, disease control rate; NCT, National Clinical Trial; ORR, objective response rate; PFS, progression-free survival; RCT, randomised controlled trial; TTP, time to progression.

Summary and detailed characteristics are presented in table 1 and in online supplementary etable 2. A majority of the 100 articles were published in high-impact journals: Journal of Clinical Oncology (29), Lancet Oncology (24), New England Journal of Medicine (18), Lancet (10), European Journal of Cancer (4), Gynecologic Oncology (4), Annals of Oncology (3), Oncologist (3) and so on. In all 76 included trials, 15 trials13–17 26 27 30 31 41 48 64 67 68 90 97 101 105 109 110 reported both central-assessed and local-assessed treatment effects of ORR and DCR; among them, 14 trials13–17 26 27 30 31 41 64 67 68 90 97 101 105 109 110 had those of ORR, DCR and PFS, including one trial68 with those of ORR, DCR, PFS and TTP. Another 12 trials18 28 29 33 37 51 57 65 79 84 85 91 92 103 with both central and local assessments only contained treatment effects of ORR and PFS.

Table 1

Summary characteristics of included trials

Characteristics	Trial(s) (n=76)	Patients (n=45 688)
Fund source
Pharmaceutical	73	43 557
Academic	3	2131
Mask
Open label	37	21 455
Single blind	1	185
Double blind	38	24 048
Region
Global	62	39 766
Intracontinental	14	5922
Design
Superiority	71	42 213
Other*	5	3475
Number of enrolled patients
Maximum	–	1314
Median	–	542
Minimum	–	81
Tumour type
Breast cancer	17	11 132
NSCLC	14	9327
Renal cell carcinoma	11	6720
Ovarian cancer	6	4536
Melanoma	5	1675
Other†	23	12 298
Response assessment criteria
RECIST	71	42 756
WHO	4	2387
Not given	1	545
Primary endpoint
Central assessed‡	43	26 344
Other§	10	6177
Local assessed¶	23	13 167
Primary endpoint outcome
Positive	51	29 982
Indeterminate**††	2	1106
Negative	23	14 600

*Four non-inferiority, one hybrid design combining superiority and non-inferiority.

†Four gastrointestinal stromal tumour, three pancreatic tumour, three sarcoma, three medullary thyroid cancer, two glioblastoma, two prostate cancer, two neuroendocrine tumour, one colorectal adenocarcinoma, one gastric cancer, one head and neck cancer and one hepatocellular carcinoma.

‡Forty central-assessed PFS, two central-assessed time to progression and one central-assessed ORR.

§Nine overall survival and one unknown-assessed ORR.

¶Twenty-three local-assessed PFS.

**One study used ORR as the primary endpoint, but we were unable to determine whether the central-assessed or the local-assessed ORR was considered the primary endpoint. Because a significant difference was found in central review (p=0.03) but not in local assessment (p=0.05), we considered the outcome of the primary endpoint as indeterminate.48

††Another study considered local-assessed PFS and OS as coprimary endpoints: a significant difference was found in PFS (p<0.01), but was not found in OS (p=0.10). We considered the outcome of the primary endpoint as indeterminate as well.83

NSCLC, non-small-cell lung cancer; ORR, objective response rate; OS, overall survival; PFS, progression-free survival; RECIST, Response Evaluation Criteria in Solid Tumors.


Statistically inconsistent inferences of central and local assessments

From a total of 76 included trials, 17 trials (22%) had statistically inconsistent inferences (significant difference/non-significant difference) of ORR, PFS and/or TTP between central and local assessments.17 29 33 48 57 66 68 69 79 87 97 105 110 Among these 17 trials, 2 trials29 33 had inconsistent inferences in both the primary endpoint and a secondary endpoint. In total, 9 of the 17 trials (53%) showed a significant difference based on central assessment; 5 (56%) of these 9 trials had an open-label design (table 2).
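The definition of a statistically inconsistent inference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, only the confidence-interval form of the definition is encoded, and a bound that is reported as exactly 1.00 after rounding cannot be classified this way (the p value would have to be used instead). The first example pair is the PFS result of one trial in table 2; the second pair is invented to show a consistent case.

```python
def is_significant(ci_lower, ci_upper):
    """A ratio (OR or HR) is significant when its 95% CI excludes 1."""
    return ci_upper < 1.0 or ci_lower > 1.0


def inferences_inconsistent(central_ci, local_ci):
    """True when exactly one of the two assessments is significant."""
    return is_significant(*central_ci) != is_significant(*local_ci)


def percent(part, whole):
    """Percentage rounded to the nearest integer, as in the text."""
    return round(100 * part / whole)


# PFS in one table 2 trial: central 0.54 to 0.94, local 0.65 to 1.02.
print(inferences_inconsistent((0.54, 0.94), (0.65, 1.02)))

# Hypothetical trial in which both intervals exclude 1.
print(inferences_inconsistent((0.60, 0.90), (0.62, 0.95)))

# The proportions quoted in the text: 17/76, 9/17 and 5/9.
print(percent(17, 76), percent(9, 17), percent(5, 9))
```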

Table 2

Trials with statistically inconsistent inferences between central and local assessments

Trial	Endpoint	Mask	Tumour type	Therapy (experimental arm vs control arm)	HR/OR and p value
Primary endpoint
NCT00019682^48	ORR	Single	Melanoma	Exp: gp100:209-217(210M)+Montanide ISA-51+interleukin 2	Central: 2.86 (95% CI 1.05 to 7.82); p=0.03
NCT00019682^48	ORR	Single	Melanoma	Con: Interleukin 2	Local: 2.33 (95% CI 0.98 to 5.56); p=0.05
NCT00471328^66	Central PFS	Open	Gastrointestinal stromal tumour	Exp: Nilotinib	Central: 0.90 (95% CI 0.65 to 1.26); p=0.56
NCT00471328^66	Central PFS	Open	Gastrointestinal stromal tumour	Con: Best supportive care/BSC+imatinib/BSC+sunitinib	Local: 0.58 (95% CI 0.42 to 0.80); p<0.01
NCT00112294^33	Central PFS	Open	Non-small-cell lung cancer	Exp: Cetuximab+taxane (paclitaxel/docetaxel)+carboplatin	Central: 0.90 (95% CI 0.76 to 1.07); p=0.24
NCT00112294^33	Central PFS	Open	Non-small-cell lung cancer	Con: Taxane (paclitaxel/docetaxel)+carboplatin	Local: 0.79 (95% CI 0.67 to 0.93); p<0.01
NCT00703326^69	Local PFS	Double	Breast cancer	Exp: Ramucirumab+docetaxel	Central: 0.79 (95% CI 0.67 to 0.94); p<0.01
NCT00703326^69	Local PFS	Double	Breast cancer	Con: Placebo+docetaxel	Local: 0.88 (95% CI 0.75 to 1.01); p=0.08
NCT00391092^29	Local PFS	Open	Breast cancer	Exp: Bevacizumab+docetaxel+trastuzumab	Central: 0.72 (95% CI 0.54 to 0.94); p=0.02
NCT00391092^29	Local PFS	Open	Breast cancer	Con: Docetaxel+trastuzumab	Local: 0.82 (95% CI 0.65 to 1.02); p=0.08
NCT00494299^87	Central TTP	Double	Hepatocellular carcinoma	Exp: Sorafenib	Central: 0.87 (95% CI 0.70 to 1.09); p=0.25
NCT00494299^87	Central TTP	Double	Hepatocellular carcinoma	Con: Placebo	Local: 0.79 (95% CI 0.62 to 1.00); p=0.049
NCT01007942^12	Local PFS	Double	Breast cancer	Exp: Everolimus+trastuzumab+vinorelbine	Central: 0.88 (95% CI 0.71 to 1.07); p=NG
NCT01007942^12	Local PFS	Double	Breast cancer	Con: Placebo+trastuzumab+vinorelbine	Local: 0.78 (95% CI 0.65 to 0.95); p<0.01
NCT01584648^34,35	Local PFS	Double	Melanoma	Exp: Trametinib+dabrafenib	Central: 0.78 (95% CI 0.59 to 1.04); p=NG
NCT01584648^34,35	Local PFS	Double	Melanoma	Con: Placebo+dabrafenib	Local: 0.75 (95% CI 0.57 to 0.99); p=0.03
NCT00412061^44	Central PFS	Double	Neuroendocrine tumour	Exp: Everolimus+octreotide	Central: 0.77 (95% CI 0.59 to 1.00); p=0.03*
NCT00412061^44	Central PFS	Double	Neuroendocrine tumour	Con: Placebo+octreotide	Local: 0.78 (95% CI 0.62 to 0.98); p=0.02*
NCT00056459^76	Central PFS	Double	Colorectal adenocarcinoma	Exp: Vatalanib+oxaliplatin+fluorouracil+leucovorin	Central: 0.88 (95% CI 0.74 to 1.03); p=0.12
NCT00056459^76	Central PFS	Double	Colorectal adenocarcinoma	Con: Placebo+oxaliplatin+fluorouracil+leucovorin	Local: 0.83 (95% CI 0.70 to 0.98); p=0.03
Secondary endpoint
NCT00391092^29	ORR	Open	Breast cancer	Exp: Bevacizumab+docetaxel+trastuzumab	Central: 1.66 (95% CI 1.08 to 2.54); p=0.02
NCT00391092^29	ORR	Open	Breast cancer	Con: Docetaxel+trastuzumab	Local: 1.25 (95% CI 0.82 to 1.92); p=0.30
NCT00112294^33	ORR	Open	Non-small-cell lung cancer	Exp: Cetuximab+taxane (paclitaxel/docetaxel)+carboplatin	Central: 1.67 (95% CI 1.15 to 2.43); p=0.01
NCT00112294^33	ORR	Open	Non-small-cell lung cancer	Con: Taxane (paclitaxel/docetaxel)+carboplatin	Local: 1.31 (95% CI 0.92 to 1.86); p=0.13
NCT00720941^57	ORR	Open	Renal cell carcinoma	Exp: Pazopanib	Central: 1.35 (95% CI 1.03 to 1.75); p=0.03
NCT00720941^57	ORR	Open	Renal cell carcinoma	Con: Sunitinib	Local: 1.23 (95% CI 0.95 to 1.59); p=0.11
NCT01030783^79	ORR	Open	Renal cell carcinoma	Exp: Tivozanib	Central: 1.62 (95% CI 1.10 to 2.39); p=0.01
NCT01030783^79	ORR	Open	Renal cell carcinoma	Con: Sorafenib	Local: 1.23 (95% CI 0.85 to 1.78); p=0.26
NCT01523587^97	ORR	Open	Non-small-cell lung cancer	Exp: Afatinib	Central: 2.05 (95% CI 0.98 to 4.29); p=0.06
NCT01523587^97	ORR	Open	Non-small-cell lung cancer	Con: Erlotinib	Local: 2.88 (95% CI 1.60 to 5.21); p<0.01
NCT01345682^105	ORR	Open	Head and neck cancer	Exp: Afatinib	Central: 1.90 (95% CI 0.88 to 4.14); p=0.10
NCT01345682^105	ORR	Open	Head and neck cancer	Con: Methotrexate	Local: 3.00 (95% CI 1.3 to 6.9); p=0.01
NCT00785785^110	ORR	Open	Gastrointestinal stromal tumour	Exp: Nilotinib	Central: 0.71 (95% CI 0.52 to 0.96); p=0.03
NCT00785785^110	ORR	Open	Gastrointestinal stromal tumour	Con: Imatinib	Local: 0.78 (95% CI 0.57 to 1.06); p=0.12
NCT00388726^17	PFS	Open	Breast cancer	Exp: Eribulin	Central: 0.87 (95% CI 0.71 to 1.05); p=0.14
NCT00388726^17	PFS	Open	Breast cancer	Con: Treatment of physician’s choice†	Local: 0.76 (95% CI 0.64 to 0.90); p<0.01
NCT00449033^68	PFS	Double	Non-small-cell lung cancer	Exp: Sorafenib+gemcitabine+cisplatin	Central: 0.96 (95% CI 0.77 to 1.21); p=0.37*
NCT00449033^68	PFS	Double	Non-small-cell lung cancer	Con: Placebo+gemcitabine+cisplatin	Local: 0.83 (95% CI 0.71 to 0.97); p<0.01*
NCT00449033^68	TTP	Double	Non-small-cell lung cancer	Exp: Sorafenib+gemcitabine+cisplatin	Central: 0.91 (95% CI 0.67 to 1.23); p=0.26*
NCT00449033^68	TTP	Double	Non-small-cell lung cancer	Con: Placebo+gemcitabine+cisplatin	Local: 0.73 (95% CI 0.60 to 0.88); p<0.01*

*One side.

†Any single-agent chemotherapy or hormonal or biological treatment approved for the treatment of cancer.

BSC, best supportive care; central, central assessed; Con, control arm; Double, double blind; Exp, experimental arm; Local, local assessed; NG, not given; Open, open label; ORR, objective response rate; PFS, progression-free survival; Single, single blind; TTP, time to progression.


Systematic bias between central and local assessments

All comparison results of the pooling analysis are presented in table 3. There was no significant difference in the treatment effects of ORR between central and local assessments (OR: 1.02 (95% CI 0.97 to 1.07), p=0.42; heterogeneity: p=0.91, I²=0%; online supplementary efigure 3). Similarly, no significant difference was found in DCR (OR: 0.97 (95% CI 0.92 to 1.03), p=0.32; heterogeneity: p=0.93, I²=0%; online supplementary efigure 4), PFS (HR: 1.01 (95% CI 0.99 to 1.02), p=0.32; heterogeneity: p=1.00, I²=0%; online supplementary efigure 5) or TTP (HR: 1.04 (95% CI 0.95 to 1.14), p=0.37; heterogeneity: p=0.59, I²=0%; online supplementary efigure 6). Subgroup analysis also showed no significant difference between central and local assessments, and no significant interaction effect between the different elements of each subgroup factor, including open-label or blinded design (table 3).

Table 3

Summary results of comparing treatment effects between central and local assessments

Summary/subgroup	Objective response rate (ORR)					Disease control rate					Progression-free survival
Summary/subgroup	Study (n)	Patient (n)	OR (95% CI)	P values*	I²†‡	Study (n)	Patient (n)	OR (95% CI)	P values*	I²†‡	Study (n)	Patient (n)	HR (95% CI)	P values*	I²†‡
Summary	29	17 949	1.02 (0.97 to 1.07)	0.42	0%	15	9590	0.97 (0.92 to 1.03)	0.32	0%	72	43 695	1.01 (0.99 to 1.02)	0.32	0%
Funding source
Pharmaceutical	28	17 764	1.02 (0.97 to 1.07)	0.43	0%	14	9405	0.97 (0.92 to 1.03)	0.33	0%	69	41 749	1.01 (0.99 to 1.02)	0.38	0%
Academic	1	185	1.09 (0.61 to 1.95)	0.76		1	185	0.98 (0.67 to 1.42)	0.91		3	1946	1.03 (0.95 to 1.11)	0.50
Mask
Open label	23	14 616	1.03 (0.98 to 1.09)	0.23	0%	12	7777	0.96 (0.91 to 1.02)	0.23	0%	36	20 403	1.01 (0.99 to 1.04)	0.13	0%
Single blind	1	185	1.09 (0.61 to 1.95)	0.76		1	185	0.98 (0.67 to 1.42)	0.91		–	–	–	–
Double blind	5	3148	0.95 (0.83 to 1.08)	0.41		2	1628	1.02 (0.89 to 1.17)	0.77		36	23 292	1.00 (0.98 to 1.02)	0.93
Region
Global	23	15 384	1.01 (0.95 to 1.06)	0.81	24%	12	8636	0.96 (0.91 to 1.02)	0.20	37%	61	38 714	1.01 (0.99 to 1.02)	0.37	0%
Intracontinental	6	2565	1.08 (0.97 to 1.21)	0.17		3	954	1.11 (0.90 to 1.36)	0.34		11	4981	1.01 (0.97 to 1.05)	0.68
Trial design
Superiority	27	15 787	1.02 (0.97 to 1.08)	0.48	0%	15	9590	0.97 (0.92 to 1.03)	0.32	–	69	41 570	1.01 (0.99 to 1.02)	0.34	–
Other§	2	2162	1.03 (0.91 to 1.15)	0.67		–	–	–	–		3	2125	1.01 (0.95 to 1.07)	0.79
Number of enrolled patients
>542	17	13 260	1.00 (0.94 to 1.06)	0.94	11%	10	7711	0.97 (0.91 to 1.03)	0.26	0%	37	30 493	1.01 (0.99 to 1.03)	0.40	0%
<542	12	4689	1.06 (0.97 to 1.16)	0.18		5	1879	1.01 (0.88 to 1.15)	0.93		35	13 202	1.01 (0.98 to 1.04)	0.60
Tumour type
Breast cancer	6	4028	1.08 (0.97 to 1.21)	0.18	0%	5	3435	0.97 (0.88 to 1.07)	0.58	0%	15	10 410	1.00 (0.97 to 1.03)	0.88	0%
NSCLC	8	5172	1.01 (0.93 to 1.11)	0.80		3	2063	1.01 (0.89 to 1.13)	0.91		13	8275	1.02 (0.99 to 1.06)	0.19
Renal cell carcinoma	6	3917	1.01 (0.91 to 1.12)	0.81		3	1951	0.95 (0.84 to 1.08)	0.43		11	6720	1.00 (0.96 to 1.04)	0.87
Ovarian cancer	3	1985	0.98 (0.85 to 1.12)	0.72		1	829	0.92 (0.78 to 1.08)	0.31		6	4536	1.03 (0.99 to 1.09)	0.17
Melanoma	2	435	1.35 (0.91 to 2.00)	0.14		1	185	0.98 (0.67 to 1.42)	0.91		4	1490	1.02 (0.92 to 1.12)	0.74
Others¶	4	2412	0.99 (0.87 to 1.12)	0.83		2	1127	1.00 (0.84 to 1.19)	1.00		23	12 264	1.00 (0.97 to 1.03)	0.94
Response assessment criteria
RECIST	27	17 088	1.02 (0.97 to 1.07)	0.55	0%	14	9405	0.97 (0.92 to 1.03)	0.33	0%	68	40 948	1.01 (0.99 to 1.02)	0.41	0%
WHO	2	861	1.11 (0.90 to 1.37)	0.33		1	185	0.98 (0.67 to 1.42)	0.91		3	2202	1.02 (0.97 to 1.08)	0.44
Not given	–	–	–	–		–	–	–	–		1	545	1.00 (0.88 to 1.33)	0.94
Primary endpoint
Central assessed	17	11 151	1.04 (0.98 to 1.10)	0.23	47%	10	6186	0.97 (0.90 to 1.04)	0.37	0%	40	24 536	1.01 (0.99 to 1.03)	0.44	0%
Others**	7	4465	0.93 (0.83 to 1.04)	0.19		4	2680	0.98 (0.88 to 1.08)	0.65		9	5992	1.02 (0.98 to 1.07)	0.26
Local assessed	5	2333	1.08 (0.94 to 1.24)	0.27		1	724	0.99 (0.79 to 1.24)	0.95		23	13 167	1.00 (0.97 to 1.03)	0.96
Primary endpoint outcome
Positive	19	11 811	1.04 (0.98 to 1.11)	0.24	0%	9	5484	0.99 (0.91 to 1.06)	0.70	0%	50	28 930	1.00 (0.99 to 1.02)	0.63	0%
Indeterminate	1	185	1.09 (0.61 to 1.95)	0.76		1	185	0.98 (0.67 to 1.42)	0.91		1	921	0.98 (0.89 to 1.07)	0.65
Negative	9	5953	0.99 (0.92 to 1.07)	0.86		5	3921	0.95 (0.88 to 1.04)	0.29		21	13 844	1.02 (0.99 to 1.04)	0.24
Statistical inferences between central and local assessments
Consistent	18	10 726	1.02 (0.96 to 1.09)	0.53	0%	8	5094	0.95 (0.88 to 1.03)	0.22	0%	56	32 676	1.00 (0.99 to 1.02)	0.69	0%
Inconsistent	11	7223	1.02 (0.95 to 1.10)	0.60	0%	7	4496	0.99 (0.92 to 1.08)	0.89	0%	16	11 019	1.02 (0.99 to 1.05)	0.20	0%

*P value for the comparison between central and local assessments.

†I2 in summary outcome was for heterogeneity of data synthesis.

‡I2 in subgroup was for subgroup difference, representing the interaction effects between the elements of each subgroup factor.

§Four non-inferiority, one hybrid design combining superiority and non-inferiority.

¶Four gastrointestinal stromal tumour, three pancreatic tumour, three sarcoma, three medullary thyroid cancer, two glioblastoma, two prostate cancer, two neuroendocrine tumour, one colorectal adenocarcinoma, one gastric cancer, one head and neck cancer and one hepatocellular carcinoma.

**Nine overall survival and one unknown-assessed ORR.

RECIST, Response Evaluation Criteria in Solid Tumors.

The strength of the correlation between central and local assessments regarding treatment effect of ORR, DCR, PFS and TTP was 0.91 (p<0.01), 0.93 (p<0.01), 0.94 (p<0.01) and 0.90 (p=0.29), respectively (figure 2).
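The correlation procedure behind these coefficients (normality test first, then Pearson or Spearman) can be sketched with the Python standard library. Several choices here are assumptions rather than the authors' SPSS workflow: the article does not name its normality test, so a Jarque-Bera test is swapped in because its p value has a closed form; all function names are hypothetical; and the p value of the correlation itself is not computed. The sample values are central and local PFS hazard ratios from table 2, used only as input, so the printed coefficient is not a study result.

```python
import math
from statistics import mean


def jarque_bera_p(values):
    """p value of a Jarque-Bera normality test (ASSUMED test choice).

    The statistic is chi-square with 2 df, whose survival function
    is exp(-x / 2). The test is only approximate for small samples.
    """
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return math.exp(-jb / 2)


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlate(x, y, alpha=0.05):
    """Pearson if both samples look normal, otherwise Spearman."""
    if jarque_bera_p(x) > alpha and jarque_bera_p(y) > alpha:
        return "pearson", pearson(x, y)
    return "spearman", spearman(x, y)


# Sample input: central and local PFS hazard ratios from table 2.
central = [0.90, 0.90, 0.79, 0.72, 0.88, 0.78, 0.77, 0.88, 0.87, 0.96]
local = [0.58, 0.79, 0.88, 0.82, 0.78, 0.75, 0.78, 0.83, 0.76, 0.83]
print(correlate(central, local))
```

Because table 2 lists only the trials with inconsistent inferences, these sample values are expected to correlate far less strongly than the full set of 76 trials.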

Figure 2

Scatterplot for the correlation of treatment effects between central and local assessments. DCR, disease control rate; ORR, objective response rate; PFS, progression-free survival; TTP, time to progression.

Discussion

To our knowledge, this is the largest literature review with data analyses investigating blinded independent central review and local assessment in phase III RCTs on solid tumours. It is also the first research article to report statistically inconsistent inferences (significant difference or not) for primary and secondary endpoints assessed by central reviewers and local investigators. We found inconsistent inferences between central and local assessments in 22% of trials (17/76). However, our subsequent pooling analysis and correlation analysis based on all 76 trials showed no sign of systematic bias between central and local assessments, regardless of funding source, mask, region, tumour type, study design, number of enrolled patients, response assessment criteria, primary endpoint and outcome, as well as trials with statistically consistent/inconsistent inferences.

Blinded independent central review is used to detect potential bias introduced by the assessment of local investigators. This consideration is based on a common assumption that local investigators might expect superior efficacy of experimental arm treatments compared with control arm treatments, especially in trials with an open-label design. Interestingly, among the 17 trials with statistically inconsistent inferences between central and local assessments, more than half (9/17; 53%) had a statistically significant difference under central assessment; of these 9 trials, 5 (56%) had an open-label design. This suggests that central assessment tends to yield more positive outcomes in favour of experimental treatments in open-label trials, which contradicts the above common assumption.

With respect to statistically inconsistent inferences between central and local assessments, we assume evaluation variability is one factor accounting for these.
As we understand, variability could be affected by many subjective factors, causing measurement errors or uncertainty.8 This situation occurs when one scan reviewer assesses the response status of different individual patients, as well as when several reviewers conduct the scan assessment for one trial, regardless of whether this is a central or local assessment. In this situation, the evaluation variability attenuates the treatment effect and reduces the statistical power of the clinical trials.6 8 This view is supported by an analysis of 21 phase III cancer trials, which demonstrated large variability but no sign of systematic bias between the two assessments.112

Missing data could be another factor. It occurs when some patients do not have complete follow-up to determine progression or death, or when patients stop receiving randomised treatments or use alternative treatments before they have progression.113 In oncological clinical trials, missing data are regarded as censoring. Similar to evaluation variability, the effect of censoring would not contribute to systematic bias but could attenuate the treatment effect.113

In the trials included in our study, we consider that evaluation variability, censoring and other unmentioned factors simultaneously played a role in attenuating the treatment effects, resulting in statistically inconsistent inferences between the two assessments in 17 of the 76 trials. Regardless of what causes statistically inconsistent inferences, the robustness of the trial efficacy outcome needs to be carefully considered when the two assessments present statistically inconsistent inferences, especially for the primary endpoint. Even though this inconsistency does not necessarily reflect a systematic bias, it would be interesting to know how policy-makers consider the approval process for the corresponding anticancer agents for the specific patients with cancer.
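The attenuation described above can be illustrated with a small worked calculation. This is a simplified sketch that models evaluation variability as non-differential misclassification of a binary response (the same error rates in both arms). The response rates, sensitivity and specificity are hypothetical values chosen for the example, and the function names are illustrative only.

```python
def odds_ratio(p_experimental, p_control):
    """Odds ratio of response, experimental versus control arm."""
    odds_exp = p_experimental / (1 - p_experimental)
    odds_ctl = p_control / (1 - p_control)
    return odds_exp / odds_ctl

def observed_rate(true_rate, sensitivity, specificity):
    """Response rate as recorded by an imperfect reader.

    True responders are recorded with probability `sensitivity`;
    non-responders are wrongly recorded as responders with
    probability 1 - `specificity`.
    """
    return sensitivity * true_rate + (1 - specificity) * (1 - true_rate)

# Hypothetical true response rates (illustration only).
true_exp, true_ctl = 0.40, 0.25
# Hypothetical reader accuracy, identical in both arms (non-differential).
sens, spec = 0.85, 0.90

true_or = odds_ratio(true_exp, true_ctl)
observed_or = odds_ratio(observed_rate(true_exp, sens, spec),
                         observed_rate(true_ctl, sens, spec))
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Under these assumptions the observed OR lies between 1 and the true OR: the noise pulls the estimate toward the null without favouring either arm, which is consistent with variability reducing power rather than creating systematic bias.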
Considering statistically inconsistent inferences, we believe that blinded independent central review is still a useful method for controlling the risk of bias from local assessment. However, we also question the necessity of central assessment as a routine assessment method for all patients (complete-case fashion) in clinical trials. According to our research, there was no sign of systematic bias: (1) the 95% CIs of all pooled ratios in ORR, DCR, PFS and TTP crossed 1, indicating no significant difference in treatment effects between central and local assessments; (2) the 95% CIs were also tight (especially in PFS), so the bias is estimated quite precisely and should be negligible. These findings were further supported by our subgroup analysis, even though a small number of the intervals are too wide to be informative because of the limited number of trials (eg, only one trial used a single-blind design, for which the OR of ORR was 1.09 (95% CI 0.61 to 1.95)).

When questioning the necessity of complete central assessment, its drawbacks should be considered as well. First, its implementation in the complete-case fashion is very costly. Second, it is technically hard to conduct a real-time central assessment alongside local assessment in order to determine disease progression independently. In other words, the decision of central reviewers could be affected by local investigators: when the local investigators declare progression, 'progressed' patients may start to receive subsequent-line treatments. Therefore, the progression time of these specific patients is unknown to central reviewers, which is called informative censoring.5 6 9 11 112 Third, based only on imaging information, central reviewers could not conclude progression when patients have symptomatic deterioration.
Both informative censoring and withdrawal of patients with symptomatic progression (because of no radiological progression in central assessment) may potentially cause bias when the final treatment effects of the experimental arm relative to the control arm in RCTs are calculated.5 8 114 Fourth, central assessment shares some drawbacks with local assessment, such as evaluation variability, target-lesion selection and different interpretations of non-target or immeasurable lesions.4 7

In fact, the continued use of the present response assessment criteria, the RECIST and the WHO criteria, has become controversial in the new era of medicine with biomarker-driven therapies, whether for central or local assessment. For instance, when patients are treated with immunotherapies, some tumour lesions might manifest a sign of tumour 'progression' based on the RECIST/WHO criteria before manifesting a sign of tumour shrinkage, which is called pseudoprogression.115 Pseudoprogression was initially reported by Wolchok et al. They found that, by using the immune-related response criteria (irRC), at least 10% of ipilimumab-treated patients whose response status was characterised as progressive disease (PD) based on the WHO criteria could have favourable survival.116 In one case in that study, histopathology showed that the enlarged lesion reflected T-cell infiltration rather than tumour proliferation, although it was considered PD according to the WHO criteria.116 Similar findings have been reported by two other studies that compared the assessment of irRC with RECIST V.1.1, and immune-modified RECIST with RECIST V.1.1, respectively.117 118 Even though in our subgroup analysis the comparison of central versus local assessments did not present a significant difference whether the RECIST or the WHO criteria were used, these criteria warrant improvement for biomarker-driven therapies.

Our research has several limitations.
First, because we used data from RCTs with both assessments, our findings may not generalise to all phase III trials, especially trials that use only one type of assessment. Another situation that needs to be considered is trials that perform two radiological assessments but eventually report outcomes based on only one of them in published articles. In this situation, a statistically positive outcome may be reported for one assessment, whereas the 'not-yet-reported' outcome of the other assessment might be negative. Second, we included trials covering all solid tumours instead of focusing on one specific tumour type, because we assumed that our research outcome would not be strongly affected by tumours' biological characteristics when comparing specific trial processes (eg, central and local assessments) at the study level. Our subgroup analysis based on different tumour types supported this assumption. Furthermore, individual-level data would have been the best option for our research, but we did not have access to them. However, we consider that using study-level data reported in each published article is still a good option because the aim of our research is to investigate study-level issues. Moreover, given that informative censoring might affect the treatment effects of PFS and TTP, we also included another important endpoint, ORR, in order to acquire a more exact understanding of whether the treatment effects of both assessments are consistent. In this circumstance, the effect of informative censoring could be eliminated because, when assessing ORR, central reviewers and local investigators worked independently before local investigators declared progression. Lastly, even though we have done our best to minimise inconsistency during the process of data extraction, it is possible that errors may have occurred.
Nevertheless, all reviewers have tried to ensure consistency in data interpretation.

In conclusion, we estimate that there was essentially no systematic bias between local and central assessments, as evidenced by our precisely estimated pooled ratios of OR in ORR and DCR, as well as the pooled ratios of HR in PFS and TTP. Despite this, we found that statistically inconsistent inferences could be made in many trials depending on whether central or local assessment was used. Considering these findings, we think blinded independent central review is still an irreplaceable method for controlling the risk of bias from local assessment, but its routine use for all patients may be unnecessary in oncological randomised controlled trials.
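To make the notion of a pooled ratio whose 95% CI crosses 1 concrete, the sketch below pools per-trial ratios of treatment effects (central relative to local) on the log scale. It assumes fixed-effect inverse-variance weighting, which may differ from the model used in the study. The ratios and standard errors are hypothetical, and `pool_log_ratios` is an illustrative name rather than a function from any package.

```python
import math

def pool_log_ratios(ratios, log_ses, z=1.96):
    """Fixed-effect inverse-variance pooling of ratios on the log scale.

    `ratios` are per-trial ratios of treatment effects and `log_ses`
    are the standard errors of their logarithms.
    Returns (pooled ratio, lower 95% limit, upper 95% limit).
    """
    weights = [1.0 / se ** 2 for se in log_ses]
    total_weight = sum(weights)
    pooled_log = sum(w * math.log(r)
                     for w, r in zip(weights, ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))

# Hypothetical per-trial ratios of hazard ratios (illustration only).
ratios = [1.03, 0.98, 1.01, 1.00]
log_ses = [0.04, 0.05, 0.03, 0.06]

pooled, lower, upper = pool_log_ratios(ratios, log_ses)
ci_includes_one = lower <= 1.0 <= upper
print(f"pooled ratio = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("CI includes 1:", ci_includes_one)
```

An interval that contains 1 and is also narrow supports the reading in the text: the two assessments do not differ significantly, and any remaining difference is estimated to be small.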

115 in total

1. Everolimus in postmenopausal hormone-receptor-positive advanced breast cancer.

Authors: José Baselga; Mario Campone; Martine Piccart; Howard A Burris; Hope S Rugo; Tarek Sahmoud; Shinzaburo Noguchi; Michael Gnant; Kathleen I Pritchard; Fabienne Lebrun; J Thaddeus Beck; Yoshinori Ito; Denise Yardley; Ines Deleu; Alejandra Perez; Thomas Bachelot; Luc Vittori; Zhiying Xu; Pabak Mukhopadhyay; David Lebwohl; Gabriel N Hortobagyi
Journal: N Engl J Med Date: 2011-12-07 Impact factor: 91.245

Review 2. Blinded independent central review of progression-free survival in phase III clinical trials: important design element or unnecessary expense?

Authors: Lori E Dodd; Edward L Korn; Boris Freidlin; C Carl Jaffe; Lawrence V Rubinstein; Janet Dancey; Margaret M Mooney
Journal: J Clin Oncol Date: 2008-08-01 Impact factor: 44.544

3. Trastuzumab emtansine for HER2-positive advanced breast cancer.

Authors: Sunil Verma; David Miles; Luca Gianni; Ian E Krop; Manfred Welslau; José Baselga; Mark Pegram; Do-Youn Oh; Véronique Diéras; Ellie Guardino; Liang Fang; Michael W Lu; Steven Olsen; Kim Blackwell
Journal: N Engl J Med Date: 2012-10-01 Impact factor: 91.245

4. Primary results of ROSE/TRIO-12, a randomized placebo-controlled phase III trial evaluating the addition of ramucirumab to first-line docetaxel chemotherapy in metastatic breast cancer.

Authors: John R Mackey; Manuel Ramos-Vazquez; Oleg Lipatov; Nicole McCarthy; Dmitriy Krasnozhon; Vladimir Semiglazov; Alexey Manikhas; Karen A Gelmon; Gottfried E Konecny; Marc Webster; Roberto Hegg; Sunil Verma; Vera Gorbunova; Dany Abi Gerges; Francois Thireau; Helena Fung; Lorinda Simms; Marc Buyse; Ayman Ibrahim; Miguel Martin
Journal: J Clin Oncol Date: 2014-09-02 Impact factor: 44.544

5. Progression-free survival by local investigator versus independent central review: comparative analysis of the AGO-OVAR16 Trial.

Authors: Anne Floquet; Ignace Vergote; Nicoletta Colombo; Bent Fiane; Bradley J Monk; Alexander Reinthaller; Paula Calvert; Thomas J Herzog; Werner Meier; Jae-Weon Kim; Josep M del Campo; Michael Friedlander; Carmela Pisano; Seiji Isonishi; Rocco J Crescenzo; Catherine Barrett; Karrie Wang; Ionel Mitrica; Andreas du Bois
Journal: Gynecol Oncol Date: 2014-11-28 Impact factor: 5.482

6. Independent radiologic review of the Gynecologic Oncology Group Study 0218, a phase III trial of bevacizumab in the primary treatment of advanced epithelial ovarian, primary peritoneal, or fallopian tube cancer.

Authors: Robert A Burger; Mark F Brady; Joon Rhee; Mika A Sovak; George Kong; Hoa P Nguyen; Michael A Bookman
Journal: Gynecol Oncol Date: 2013-07-29 Impact factor: 5.482

7. Sunitinib malate for the treatment of pancreatic neuroendocrine tumors.

Authors: Eric Raymond; Laetitia Dahan; Jean-Luc Raoul; Yung-Jue Bang; Ivan Borbath; Catherine Lombard-Bohas; Juan Valle; Peter Metrakos; Denis Smith; Aaron Vinik; Jen-Shi Chen; Dieter Hörsch; Pascal Hammel; Bertram Wiedenmann; Eric Van Cutsem; Shem Patyna; Dongrui Ray Lu; Carolyn Blanckmeister; Richard Chao; Philippe Ruszniewski
Journal: N Engl J Med Date: 2011-02-10 Impact factor: 91.245

8. IMA901, a multipeptide cancer vaccine, plus sunitinib versus sunitinib alone, as first-line therapy for advanced or metastatic renal cell carcinoma (IMPRINT): a multicentre, open-label, randomised, controlled, phase 3 trial.

Authors: Brian I Rini; Arnulf Stenzl; Romauld Zdrojowy; Mikhail Kogan; Mikhail Shkolnik; Stephane Oudard; Steffen Weikert; Sergio Bracarda; Simon J Crabb; Jens Bedke; Joerg Ludwig; Dominik Maurer; Regina Mendrzyk; Claudia Wagner; Andrea Mahr; Jens Fritsche; Toni Weinschenk; Steffen Walter; Alexandra Kirner; Harpreet Singh-Jasuja; Carsten Reinhardt; Tim Eisen
Journal: Lancet Oncol Date: 2016-10-03 Impact factor: 41.316

9. Phase III study of afatinib or cisplatin plus pemetrexed in patients with metastatic lung adenocarcinoma with EGFR mutations.

Authors: Lecia V Sequist; James Chih-Hsin Yang; Nobuyuki Yamamoto; Kenneth O'Byrne; Vera Hirsh; Tony Mok; Sarayut Lucien Geater; Sergey Orlov; Chun-Ming Tsai; Michael Boyer; Wu-Chou Su; Jaafar Bennouna; Terufumi Kato; Vera Gorbunova; Ki Hyeong Lee; Riyaz Shah; Dan Massey; Victoria Zazulina; Mehdi Shahidi; Martin Schuler
Journal: J Clin Oncol Date: 2013-07-01 Impact factor: 44.544

10. Everolimus plus exemestane in postmenopausal patients with HR(+) breast cancer: BOLERO-2 final progression-free survival analysis.

Authors: Denise A Yardley; Shinzaburo Noguchi; Kathleen I Pritchard; Howard A Burris; José Baselga; Michael Gnant; Gabriel N Hortobagyi; Mario Campone; Barbara Pistilli; Martine Piccart; Bohuslav Melichar; Katarina Petrakova; Francis P Arena; Frans Erdkamp; Wael A Harb; Wentao Feng; Ayelet Cahana; Tetiana Taran; David Lebwohl; Hope S Rugo
Journal: Adv Ther Date: 2013-10-25 Impact factor: 3.845

4 in total

Review 1. Local Investigators Significantly Overestimate Overall Response Rates Compared to Blinded Independent Central Reviews in Uncontrolled Oncology Trials: A Comprehensive Review of the Literature.

Authors: Cinzia Dello Russo; Pierluigi Navarra
Journal: Front Pharmacol Date: 2022-05-16 Impact factor: 5.988

2. A randomised, multicentre open-label phase II study to evaluate the efficacy, tolerability and pharmacokinetics of oral vinorelbine plus cisplatin versus intravenous vinorelbine plus cisplatin in Chinese patients with chemotherapy-naive unresectable or metastatic non-small cell lung cancer.

Authors: Yunpeng Yang; Jianhua Chang; Cheng Huang; Yiping Zhang; Jie Wang; Yongqian Shu; Jean Philippe Burillon; Marcello Riggi; Aurélie Petain; Pierre Ferre; Ying Liang; Li Zhang
Journal: J Thorac Dis Date: 2019-08 Impact factor: 2.895

3. Phase 2 study of cemiplimab in patients with metastatic cutaneous squamous cell carcinoma: primary analysis of fixed-dosing, long-term outcome of weight-based dosing.

Authors: Danny Rischin; Michael R Migden; Annette M Lim; Chrysalyne D Schmults; Nikhil I Khushalani; Brett G M Hughes; Dirk Schadendorf; Lara A Dunn; Leonel Hernandez-Aya; Anne Lynn S Chang; Badri Modi; Axel Hauschild; Claas Ulrich; Thomas Eigentler; Brian Stein; Anna C Pavlick; Jessica L Geiger; Ralf Gutzmer; Murad Alam; Emmanuel Okoye; Melissa Mathias; Vladimir Jankovic; Elizabeth Stankevich; Jocelyn Booth; Siyu Li; Israel Lowy; Matthew G Fury; Alexander Guminski
Journal: J Immunother Cancer Date: 2020-06 Impact factor: 13.751

4. Design characteristics, risk of bias, and reporting of randomised controlled trials supporting approvals of cancer drugs by European Medicines Agency, 2014-16: cross sectional analysis.

Authors: Huseyin Naci; Courtney Davis; Jelena Savović; Julian P T Higgins; Jonathan A C Sterne; Bishal Gyawali; Xochitl Romo-Sandoval; Nicola Handley; Christopher M Booth
Journal: BMJ Date: 2019-09-18

4 in total