Biases in study design, implementation, and data analysis that distort the appraisal of clinical benefit and ESMO-Magnitude of Clinical Benefit Scale (ESMO-MCBS) scoring.

B Gyawali¹, E G E de Vries², U Dafni³, T Amaral⁴, J Barriuso⁵, J Bogaerts⁶, A Calles⁷, G Curigliano⁸, C Gomez-Roca⁹, B Kiesewetter¹⁰, S Oosting², A Passaro¹¹, G Pentheroudakis¹², M Piccart¹³, F Roitberg¹⁴, J Tabernero¹⁵, N Tarazona¹⁶, D Trapani¹⁷, R Wester¹⁸, G Zarkavelis¹⁹, C Zielinski²⁰, P Zygoura²¹, N I Cherny²².

Abstract

BACKGROUND: The European Society for Medical Oncology-Magnitude of Clinical Benefit Scale (ESMO-MCBS) is a validated, widely used tool developed to score the clinical benefit from cancer medicines reported in clinical trials. ESMO-MCBS scores assume valid research methodologies and quality trial implementation. Studies incorporating flawed design, implementation, or data analysis may generate outcomes that exaggerate true benefit and are not generalisable. Failure to either indicate or penalise studies with bias undermines the intention and diminishes the integrity of ESMO-MCBS scores. This review aimed to evaluate the adequacy of the ESMO-MCBS to address bias generated by flawed design, implementation, or data analysis and identify shortcomings in need of amendment.
METHODS: As part of a refinement of the ESMO-MCBS, we reviewed trial design, implementation, and data analysis issues that could bias the results. For each issue of concern, we reviewed the ESMO-MCBS v1.1 approach against standards derived from Helsinki guidelines for ethical human research and guidelines from the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use, the Food and Drug Administration, the European Medicines Agency, and European Network for Health Technology Assessment.
RESULTS: Six design, two implementation, and two data analysis and interpretation issues were evaluated and in three, the ESMO-MCBS provided adequate protections. Seven shortcomings in the ability of the ESMO-MCBS to identify and address bias were identified. These related to (i) evaluation of the control arm, (ii) crossover issues, (iii) criteria for non-inferiority, (iv) substandard post-progression treatment, (v) post hoc subgroup findings based on biomarkers, (vi) informative censoring, and (vii) publication bias against quality-of-life data.
CONCLUSION: Interpretation of the ESMO-MCBS scores requires critical appraisal of trials to understand caveats in trial design, implementation, and data analysis that may have biased results and conclusions. These will be addressed in future iterations of the ESMO-MCBS.

Keywords: ESMO-MCBS; bias; clinical trial analysis; clinical trial design; clinical trial implementation; clinical trial reporting

Year: 2021 PMID: 33887690 PMCID: PMC8086024 DOI: 10.1016/j.esmoop.2021.100117

Source DB: PubMed Journal: ESMO Open ISSN: 2059-7029

Introduction

The European Society for Medical Oncology-Magnitude of Clinical Benefit Scale (ESMO-MCBS) was first published in 2015 and revised in 2017. With a growing recognition that many cancer medicines provided modest benefits disproportionate to their high costs, the oncology community needed a tool that could objectively assess the clinical benefit from cancer medicines, assist in comparison with other similar medicines, and guide regulatory and reimbursement decisions. The ESMO-MCBS was established to address these needs. To reduce bias and error in grading, the scale has been developed in close adherence to the principles of ‘accountability for reasonableness’, a standard for ethical public health decision-making processes. The ESMO-MCBS aims to highlight treatments with a substantial level of clinical benefit for patients and distinguish those from studies demonstrating only moderate, minor, or marginal clinical benefit. Within ESMO, the ESMO-MCBS is used in clinical practice guidelines and provides a structured approach to evaluate clinical research data. On its website, ESMO has an open access searchable portal detailing >230 clinical studies (Scorecards) assessed using the ESMO-MCBS. Internationally, a high ESMO-MCBS score is currently valued and adopted by the World Health Organization Essential Medicines List (WHO EML) and Health Technology Assessment bodies worldwide. These global health applications underscore the importance of the ESMO-MCBS commitments to ‘accountability for reasonableness’ and continual efforts to improve the scoring process’s validity. ESMO-MCBS scores assume valid research methodologies and high-quality trial implementation. Studies that incorporate flawed design, implementation, and/or data analysis may generate biased outcomes and conclusions that exaggerate real benefit and are not generalisable.
This subverts the intention of the ESMO-MCBS to give representative grading to the benefit observed in generalisable data and compromises its integrity. Therefore, as part of the ongoing commitment to improving the validity of the scoring process, we undertook a review of trial design, implementation, and analysis issues that could bias the results and reviewed the adequacy of the ESMO-MCBS v1.1 to address these issues and identify shortcomings to redress in future revisions.

Methodology

Based on experience in evaluating the magnitude of benefit in clinical studies, ESMO-MCBS Working Group and Extended Working Group members (all listed in authorship) identified issues in study design, implementation, and data analysis that may influence study outcomes and compromise the veracity of the ESMO-MCBS scores. We conducted a review for each of these issues, including definitions, relevant policy documents derived from regulatory authorities, relevant literature, and illustrative studies. The policy documents included the World Medical Association Helsinki Declaration for Ethical Principles for Human Research, and guidelines from the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) [6-8], the Food and Drug Administration (FDA) [9-11], the European Medicines Agency (EMA) [12-14], and the European Network for Health Technology Assessment [15-19]. For each issue, we reviewed the ESMO-MCBS v1.1 approach to identify shortcomings of the scale to adequately address and document the corresponding sources of bias.

Results

Design issues

Six issues in study design that could bias benefit evaluation were considered (Figure 1).

Figure 1

Issues in study design, implementation, and data analysis that may influence study outcomes and compromise the ESMO-MCBS scores.

HR, hazard ratio; NI, non-inferiority; QoL, quality of life.

Substandard control arm

Rationale

Data derived from studies with a comparator (control) arm inferior to the standard of care (SOC) may bias the outcome by generating a larger benefit than if SOC had been used.

Regulations

According to the Helsinki Declaration, the comparator arm of a randomised clinical trial (RCT) must be ‘the best-proven intervention(s)’. The ICH guidelines emphasise the importance of using appropriate dosing and scheduling of the control. The Helsinki Declaration allows two exceptions: (i) when no proven intervention exists and (ii) when there are compelling and scientifically sound methodological reasons for using a less than best-proven control therapy. The Helsinki Declaration allows the use of placebo, no intervention, or a lesser SOC if deemed necessary to determine an intervention’s efficacy or safety. However, this is only permitted on the condition that subjects receiving the control arm will not be subject to additional risks of serious or irreversible harm. The guidelines add the admonition that ‘extreme care must be taken to avoid abuse of this option.’ For non-inferiority (NI) trials, the ICH emphasises that the control arm should comprise ‘a drug acceptable in the region to which the studies will be submitted (for licensing) for the same indication’. Therefore, it is incumbent upon researchers to demonstrate that the control arm is consistent with the SOC at study initiation or that any deviation is adequately justified. The justification must present compelling and scientifically sound methodological reasons for the deviation and show that participants will not be subject to serious harm. Institutional Review Boards (IRBs) are responsible for ensuring compliance with these conditions. For registration trials, this adjudication is often guided by the regulatory agencies themselves.

Illustrative case

The NEMO study in treatment-naive or pretreated patients with advanced NRAS-mutated melanoma randomised 402 participants in a 2:1 ratio, between August 2013 and April 2015, to receive binimetinib or dacarbazine. Seventy-nine percent of the participants were treatment-naive. Dacarbazine, the control arm for treatment-naive patients, was already proven to be inferior to ipilimumab immunotherapy plus dacarbazine. Ipilimumab monotherapy was subsequently licensed as first-line treatment in 2011 by both the EMA and FDA. Consequently, patients in the control arm were deprived of the best, licensed upfront treatment, and in the first-line setting the marginal benefit of binimetinib was only demonstrated relative to a suboptimal comparator.

ESMO-MCBS v1.1

ESMO-MCBS relies on the integrity of the IRB and regulatory agencies to evaluate the control arm’s adequacy.

Shortcoming

The ESMO-MCBS does not independently evaluate the control arm’s appropriateness, nor does it have a mechanism to either indicate or penalise studies with a substandard control arm.

The predictive reliability of surrogate endpoints

Definitions

Surrogate outcome endpoints provide an indirect measurement when direct measurement of clinical effect is not feasible or practical. While they aim to predict clinical benefits such as prolonged survival or improved quality of life (QoL), the reliability and strength of surrogates’ predictive capacity vary. The effect of an improved surrogate endpoint may not directly benefit the patient. Commonly used surrogate outcomes in cancer trials include a decrease in tumour size [response rate (RR)] and delays in tumour progression [progression-free survival (PFS); disease-free survival (DFS)].

Limitations of surrogate outcomes

The validity of a surrogate outcome depends on its reliability as a predictor of true clinical benefit, i.e. longer survival or improved QoL. Hitherto, no outcome measure in oncology has been found to have absolute surrogacy for true clinical benefit across diseases and treatments [25-29]. As stated by the ICH, there is concern that surrogate endpoints may not reliably predict clinical benefit. Evaluation of DFS as a surrogate for overall survival (OS) in adjuvant therapy studies found that predictive reliability is variable across diseases and, overall, is at best characterised as moderate. Even within the same tumour type, there may be differences in predictive reliability of DFS based on tumour subtypes: for example, DFS is a better surrogate for OS in HER2-positive breast cancer than for other breast cancer subtypes. In studies evaluating therapies in non-curative settings, PFS and time to progression provide information about the biological activity and may indicate the possibility of benefit to patients. However, they are not reliable surrogates for improved OS [31-36] or QoL in all patients. RR and pathological complete response (pCR) rate are also weak predictors of improved OS.

ESMO-MCBS v1.1

ESMO-MCBS v1.1 considers surrogacy in its weighting. Using ESMO-MCBS form 1, DFS scores are only creditable in the adjuvant setting if OS data are immature. If mature OS results do not demonstrate benefit, surrogacy is not confirmed, and the study is considered to not provide evaluable benefit (labelled ‘No evaluable benefit’). Studies showing benefit based on pCR are credited at the lowest level, C, and only if a relatively high threshold marginal benefit is demonstrated. In the non-curative setting, when the primary endpoint is PFS or RR, several stringencies are applied. The preliminary grades are capped: for studies using PFS as primary endpoint at 3 and for RR at 2, with penalties for adverse effects. Furthermore, when PFS is the primary endpoint, a non-significant OS gain at mature follow-up together with a QoL evaluation indicating neither improvement nor delayed deterioration is considered a refutation of surrogacy, and the score is downgraded by one point.

Shortcoming

Hitherto, in v1.1, it was assumed that DFS did not confer patient benefit independent of OS. The approach of ESMO-MCBS v1.1 to the grading of DFS was recently reviewed and considered unreasonable. Patients and other stakeholders appealed that the ESMO-MCBS approach to DFS does not give credit to the benefit of added time without treatment or the burden of disease for a proportion of patients independent of any impact (or lack thereof) on mature OS. This is illustrated by the meta-analysis of trastuzumab in HER2-overexpressing, hormone receptor-negative early breast cancer with fewer than two involved nodes. After a median of 8 years of follow-up, there was a 5.9% gain in DFS, but the OS gain was not significant. The ESMO-MCBS Working Group has concluded that DFS is an intermediate endpoint (i.e. a surrogate endpoint that may also directly have some patient benefits) that is worthy of a lower but persistent credit if OS benefit is not achieved. This consideration is incorporated in the draft revision of the ESMO-MCBS v2, and it is currently undergoing field testing and review.

Crossover

In an RCT, crossover means that patients randomised to the control arm of the trial receive the intervention allocated to the experimental arm upon disease progression. Crossover has methodological and ethical implications, depending on the medicine and line of therapy. When a medicine has already been approved, is the SOC for later lines, and is being evaluated for an earlier line, the trial design should incorporate crossover. This is called appropriate or desirable crossover. In such situations, since the experimental therapy is part of subsequent standard care, the clinical question is whether using the same drug earlier improves OS versus using it later in the disease course. Failure to incorporate crossover in this setting harms participants on the control arm by not ensuring that they receive optimal post-progression therapy and may exaggerate the observed OS benefits. If a medicine that has never been approved for a condition is being tested in a trial, then a crossover design is generally undesirable [41-43]. Since the new medicine’s efficacy is unknown, there is no ethical mandate for the control arm patients to receive the medicine upon relapse. Furthermore, crossover in this setting undermines the ability to determine the impact of the intervention on OS, and if crossover delays initiation of proven subsequent therapies, it may adversely impact patient well-being. For these reasons, crossover in this setting is discouraged by the EMA and FDA.

Illustrative cases

Failure to incorporate appropriate crossover

Abiraterone acetate was approved for use in patients with chemotherapy-naive metastatic castration-resistant prostate cancer (CRPC) in 2012 and has become the SOC in that setting based on the COU-AA-302 trial showing prolonged OS. Between 2013 and 2014, abiraterone was tested versus placebo in chemotherapy-naive patients with castration-sensitive prostate cancer in the LATITUDE trial. In that study, only 11% of patients on the placebo arm received abiraterone upon progression to CRPC. A substantial OS benefit {hazard ratio (HR) 0.66 [95% confidence interval (CI): 0.56-0.78]} generated a high ESMO-MCBS score of 4. However, due to the lack of crossover, we do not know whether using abiraterone earlier while the tumour is castration-sensitive is better than using the same drug while castration-resistant. Furthermore, since abiraterone had improved OS for patients with CRPC, the control arm patients were potentially harmed by not receiving a proven post-progression therapy.

Incorporation of undesirable crossover

In the IMPACT trial, which randomised patients with low volume metastatic CRPC to the autologous dendritic cell therapeutic vaccine sipuleucel-T, or placebo, patients who progressed on the control arm were allowed a frozen version of the vaccine, even though its efficacy had not been proven. Outside the trial, these patients would have immediately received docetaxel chemotherapy that had previously demonstrated survival advantage and improved QoL in this setting. In the study, treatment with sipuleucel-T did not affect RR or PFS compared with placebo, but it was associated with improved OS. The crossover of 64% of patients in the control arm to the frozen vaccine version confounded interpretation of the findings since it was uncertain whether prolonged survival was because of treatment efficacy in the experimental arm or delayed access to docetaxel in the control arm.

ESMO-MCBS v1.1

ESMO-MCBS Scorecards indicate whether crossover is allowed or not allowed.

Shortcoming

The ESMO-MCBS does not have a mechanism to either indicate or penalise studies with inappropriate or inadequate crossover.

Early stopping of clinical trials

Definition

Early stopping rules allow for a study to terminate earlier than planned, with all patients crossing to the superior therapy, because of the result of an interim analysis showing larger than expected benefit or harm of the experimental intervention that adequately undermines equipoise. These stopping boundaries are stringent and based on solid statistical methodology. Cancer drug trials may be stopped early based on an interim analysis of time-to-event probability (DFS, PFS, or OS) when the HR crosses the stopping boundary.

Concern

Under the statistical rules applied, trials that are stopped early may overestimate the magnitude of benefit. The sooner the trial is stopped, the more impressive the HR will look since the stopping criteria are more stringent early in the trial course. Hence, although the medicine is likely effective, the true benefit may be smaller in magnitude. Such overestimations of the treatment effect’s magnitude are particularly important when the primary endpoint is not a definitive endpoint like OS but a surrogate endpoint such as PFS.

ESMO-MCBS v1.1

In solid tumours, PFS is scorable only if the median PFS of the control arm has been reached. Consistent with EMA guidance, there is no extra credit for early stopping based on PFS. If, however, early stopping is triggered by interim analysis of OS gain meeting pre-specified statistical criteria, the gain already credited for PFS in the preliminary score is upgraded by one point.

Shortcoming

None identified.
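
The overestimation concern for early-stopped trials can be illustrated with a small simulation. This is a minimal sketch and not part of the ESMO-MCBS methodology: it assumes 1:1 randomisation, uses the common approximation that the standard error of the log HR is about 2 divided by the square root of the number of events, and applies hypothetical stopping boundaries (a z-value of 3.5 at 50 events and 2.5 at 150 events) to a hypothetical true HR of 0.80. The function name and all parameter values are illustrative assumptions.

```python
import math
import random


def mean_hr_when_stopped(true_hr, interim_events, z_boundary,
                         n_trials=20000, seed=0):
    """Simulate interim log-HR estimates and summarise the trials that stop.

    Returns (fraction of trials stopped early, geometric mean HR among them).
    """
    rng = random.Random(seed)
    # Approximate standard error of the log HR with 1:1 randomisation.
    se = 2.0 / math.sqrt(interim_events)
    stopped = []
    for _ in range(n_trials):
        log_hr_hat = rng.gauss(math.log(true_hr), se)
        # Stop for efficacy only if the test statistic crosses the boundary.
        if log_hr_hat / se <= -z_boundary:
            stopped.append(log_hr_hat)
    if not stopped:
        return 0.0, float("nan")
    return len(stopped) / n_trials, math.exp(sum(stopped) / len(stopped))


# Hypothetical scenario: true HR 0.80, stricter boundary at the earlier look.
early_fraction, early_hr = mean_hr_when_stopped(0.80, interim_events=50, z_boundary=3.5)
late_fraction, late_hr = mean_hr_when_stopped(0.80, interim_events=150, z_boundary=2.5)
print(f"stopped at 50 events:  {early_fraction:.1%} of trials, mean HR {early_hr:.2f}")
print(f"stopped at 150 events: {late_fraction:.1%} of trials, mean HR {late_hr:.2f}")
```

In this sketch, every simulated trial that stops early reports an HR well below the true value of 0.80, and the gap is larger at the earlier look, because only extreme estimates can cross a stringent boundary when few events have accrued.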

Inflated RRs and durations in single-arm trials

In settings where there is no available therapy and where measurable reduction in tumour size meeting the RECIST criteria can be attributed to the tested medicine, regulatory authorities often accept overall RR (ORR) and duration of response (DoR) derived from single-arm studies as adequate evidence supporting accelerated approval and occasionally full (regular) approval.

Limitations of single-arm studies

Studies have shown that ORR and DoR in single-arm trials are higher than the ORR and DoR when the same medicine for the same indication is tested in an RCT. Furthermore, ORR is a poor surrogate for OS or QoL.

ESMO-MCBS v1.1

The scoring of single-arm studies using the ESMO-MCBS form 3 applies two stringencies. The preliminary score for single-arm studies is capped at 3, and penalties are applied for adverse events. The score may be upgraded by one point if the findings are confirmed in a phase IV study or cancelled if accelerated approval is subsequently withdrawn.

Shortcoming

None identified.

NI design trials

In some cases, an investigational product is tested not to show superiority over the SOC but to demonstrate that, for the primary outcome, the new agent is not worse than the active control by more than a pre-specified small amount, known as an NI margin [10-12]. Benefit from the novel agent is demonstrated if it is less burdensome, less expensive, has fewer adverse effects, or is associated with improved QoL. Defining the NI margin is critical. According to ICH standards, the NI margin, expressed by an upper limit of the 95% CI for the relevant endpoint, is the largest difference that can be judged as clinically acceptable. Moreover, it should be less than the gain observed in superiority trials of the active comparator. Non-adherence to the assigned treatment is particularly problematic in NI studies since it will bias the study toward concluding NI. Consequently, monitoring treatment adherence by investigators and by the independent data-monitoring committee is crucial in these studies. Therefore, unlike superiority studies, both an intention-to-treat (ITT) analysis and a per-protocol analysis are required by the FDA and EMA for NI studies.

Concerns regarding NI margin

If the defined NI margin is too lenient, there is a concern that treatments with true inferiority may seem non-inferior. Regretfully, the biostatistical rules for defining NI have not been standardised. A recent analysis showed that cancer medicine trials used an NI threshold as high as 1.33 for the upper limit of the 95% CI for the HR of OS. Consequently, it is plausible that if NI definitions are too lenient, NI may be credited even when substantial differences in the treatment arms exist. If a previous superiority trial has demonstrated gains, a substantial percentage of these gains must be preserved.

ESMO-MCBS v1.1

ESMO-MCBS relies on IRB processes’ integrity to evaluate the validity of the NI thresholds. NI studies can be scored using the ESMO-MCBS form 1 in the adjuvant setting (grade B) and form 2c in the advanced setting (grade 4). The ESMO-MCBS v1.1 only credits NI design trials if NI is confirmed according to pre-specified statistical criteria and if the study demonstrates benefits of reduced costs, adverse effects, or benefits in global QoL. NI alone is not the basis for any credit of benefit.

Shortcoming

ESMO-MCBS does not have rules to determine the validity of the pre-specified NI margin.
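
The effect of the choice of NI margin can be shown with a short calculation. The sketch below is illustrative only: it assumes the usual normal approximation on the log HR scale, and the trial estimate (HR 1.10, standard error of the log HR 0.08) and the stricter margin of 1.15 are hypothetical values, not taken from any study discussed here. Only the 1.33 margin comes from the analysis cited in the text.

```python
import math


def hr_confidence_interval(hr, se_log_hr, z=1.959964):
    """Two-sided 95% CI for a hazard ratio via the normal approximation on log HR."""
    log_hr = math.log(hr)
    return math.exp(log_hr - z * se_log_hr), math.exp(log_hr + z * se_log_hr)


def is_non_inferior(upper_ci_limit, ni_margin):
    """NI is concluded only if the upper CI limit lies below the pre-specified margin."""
    return upper_ci_limit < ni_margin


# Hypothetical trial result: HR 1.10 with a standard error of 0.08 on the log scale.
lower, upper = hr_confidence_interval(1.10, 0.08)
print(f"95% CI for HR: {lower:.2f} to {upper:.2f}")
print("non-inferior with margin 1.33:", is_non_inferior(upper, 1.33))
print("non-inferior with margin 1.15:", is_non_inferior(upper, 1.15))
```

The same hypothetical result is declared non-inferior under the lenient margin but not under the stricter one, which is why the justification of the margin matters as much as the test itself.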

Study implementation issues

Two issues of study implementation and reporting were considered: (1) the impact of post-progression subsequent treatments on OS and (2) the publication bias in the reporting of QoL data (Figure 1).

Post-progression subsequent therapies

Most RCTs involve evaluating a single period of randomisation between a novel treatment and an active control. In studies of first- or second-line therapies in solid tumours, most patients will subsequently receive one or more lines of post-progression treatment, which influences OS. In some settings, such as hormone-responsive breast cancer, it is not uncommon for patients to receive more than five subsequent therapy lines. When patients receive optimal post-progression therapy, any advantage gained by the experimental treatment may be impacted by subsequent therapies. When the PFS gain is maintained or even improved after optimal post-progression therapies and reflected in an OS gain, the benefit is recognised as being important. However, when PFS gains are diluted after optimal post-progression therapies and reflected in no significant OS gain, the benefits may be relatively trivial. This, however, is not the case when patients also derived qualitative benefits such as delayed deterioration or improvement in QoL. The ICH guidelines state that efforts should be made to collect all data pertinent to the relevant outcomes, including the occurrence and timing of intercurrent events. They emphasise that clinical trials are less generalisable if the sponsor tries to avoid or minimise these issues. Post-progression treatments constitute an intercurrent event that is pertinent to OS. While some degree of attrition may be expected post-progression, the acceptable thresholds should be judged based on previous experiences from real-world studies.

Concerns regarding post-progression treatments

Failure to provide optimal post-progression treatment can exaggerate the impact of a PFS gain on OS even when both arms receive the same suboptimal therapies. This underscores the importance of documenting post-progression subsequent treatments until death as part of routine follow-up data.

Illustrative case

The MONALEESA-7 study evaluated hormonal therapy with ribociclib or placebo in the first- or second-line treatment of premenopausal women with estrogen receptor-expressing breast cancer. Patients receiving ribociclib had a PFS gain of 10.8 months. A planned interim analysis of OS at 76% of anticipated deaths showed a large OS gain that met pre-specified significance thresholds. Applying ESMO-MCBS v1.1, the MONALEESA-7 study achieved a preliminary score of 4, which was upgraded to 5 after QoL data demonstrated delayed deterioration in global QoL. The paper indicated that 26.8% of the patients in the control arm and 31.1% of patients in the ribociclib arm received no further subsequent treatments at disease progression after the first line of therapy. Although some degree of attrition is expected with each subsequent line of therapy, nearly one-third of patients not getting any subsequent therapy post first-line is an astoundingly aberrant figure given that most women with estrogen receptor-positive HER2-negative breast cancer routinely survive for >2 years after first progression and generally receive four subsequent lines of therapy or more. This major divergence from SOC for a substantial proportion of patients renders the OS data from this study non-generalisable. Indeed, it is plausible that the failure to provide subsequent standard therapy to more than a quarter of the patients who progressed on the study may have exaggerated the OS gain from ribociclib compared with placebo.

Shortcoming

The ESMO-MCBS does not indicate or penalise studies in which OS benefit may have been exaggerated by substandard post-progression treatment.

Publication bias in the reporting of QoL data

Publication bias occurs when the outcome of an experiment or research study influences the decision to publish or otherwise distribute it.

Publication bias in QoL results

QoL data remain missing for many trials. Most QoL data from trials go unpublished or are substantially delayed, even when the primary study results are positive.

ESMO-MCBS v1.1

When QoL is evaluated as a secondary outcome in clinical studies, the generated results impact ESMO-MCBS scoring. When the QoL benefits are reported in studies applying a valid scale, with an adequately complete dataset and using valid statistical criteria, ESMO-MCBS scores are upgraded one point for evaluations in the non-curative setting. When the primary outcome is PFS with secondary outcomes of OS and QoL, and the subsequent mature OS does not demonstrate any survival advantage, the surrogacy of the PFS finding is dependent on the QoL results. In this scenario, a negative QoL finding without improvement or delayed deterioration in global QoL results in readjusting the score with a one point downgrade. Failure to publish negative QoL results or substantial publication delay subverts this important score adjustment.

Shortcoming

ESMO-MCBS does not address non-publication or delayed publication of QoL data.

Issues related to analysis of trial data

Two issues related to the analysis and interpretation of trial data were considered: (1) conjectural findings from exploratory and unplanned analyses and (2) informative censoring (Figure 1).

Conjectural findings from exploratory and unplanned analyses

A conjecture is an unproven proposition suspected to be true based on preliminary supporting evidence. ‘Conjectural findings’ relate to the evaluation of efficacy based upon incomplete or suboptimal data. These include findings from post hoc subgroup analyses or exploratory analyses outside of the statistical plan. ‘Conjectural findings’ contrast with ‘confirmatory findings’ derived from primary analysis in a study with a pre-specified and justified statistical plan and a significant positive outcome. In many instances, subgroup analyses with appropriate adjustment for multiplicity of testing and alpha splitting are part of the planned confirmatory analysis and are incorporated into the statistical plan. The EMA guideline on the investigation of subgroups in confirmatory clinical trials describes two types of conjectural analyses: (i) when the evidence of benefit in the primary analysis population is statistically significant but of small magnitude, it is of post hoc interest to identify and to distinguish between subgroups more or less likely to derive clinically meaningful benefit, and (ii) when a study fails to establish statistically significant evidence of benefit in the primary analysis population, and there is interest in identifying a subgroup where the treatment may be effective.

Concerns

Conjectural findings increase the probability of false-positive findings, i.e. the magnitude of clinical benefit is falsely concluded to be greater than in the primary analysis population. False-negative conclusions, in which a subgroup is inaccurately identified as being unlikely to benefit, are equally important. The ICH guidelines, endorsed by FDA and EMA, urge that findings from post hoc subgroup analyses be interpreted cautiously. The EMA guideline outlines a structured approach to conjectural evaluation based on (i) external evidence that the subgroup of interest is well defined and clinically relevant, (ii) plausible explanation for different efficacy (or risk–benefit) in a sub-population and its complement, (iii) substantially different results and, when possible, (iv) replication of similar subgroup findings from other relevant trials. In a draft guideline that is not yet ratified, the FDA expresses the concern that investigators’ or sponsors’ incentives can influence the choice of analyses to identify one or more positive findings.

ESMO-MCBS v1.1

The ESMO-MCBS v1.1 distinguishes confirmatory findings, based on the pre-specified endpoints and statistical plan, and conjectural findings, based on post hoc and exploratory analyses. Confirmatory findings of clinical benefit, including pre-specified subgroups, are scored. The ESMO-MCBS v1.1 constrains the number of pre-specified subgroups (no more than 3) and allows separate subgroup grading when adjusted for multiplicity. Conjectural findings based on post hoc subgroup analyses and exploratory endpoints are not eligible for scoring by the ESMO-MCBS v1.1. An exception is made for studies that incorporate tissue sample collection to enable restratification based on plausible new genetic or other biomarkers. When conjectural findings form the basis for regulatory approval, the ESMO Clinical Practice Guidelines and E-Updates’ approach is to present the ITT and planned subgroup data and scoring in the tables. The relevant conjectural data relating to the regulatory approval are discussed in the text and annotated below the ESMO-MCBS tabulations.

Illustrative cases

The APHINITY trial tested adjuvant pertuzumab in patients with HER2-positive breast cancer and showed marginal gains in DFS for the ITT population. The publication, however, reported the findings of 12 post hoc subgroup analyses and highlighted better outcomes among patients who had node-positive disease. In this case, the ESMO-MCBS v1.1 scored only the ITT (score B) results and not the post hoc subgroup findings. More recently, atezolizumab was tested combined with nab-paclitaxel in triple-negative breast cancer in the IMpassion130 trial. The median PFS was improved by 1.7 months in the ITT population and by 2.5 months in patients with programmed death-ligand 1 (PD-L1)-positive tumours compared with nab-paclitaxel alone. There was no difference in OS in the ITT population. The statistical plan incorporated hierarchical testing, which allowed evaluation of OS in the PD-L1-positive subgroup only if there was OS benefit in the ITT population. An exploratory analysis of the PD-L1-positive subgroup found an OS improvement of 10 months. The ESMO-MCBS v1.1 only scored the PFS result of the PD-L1-positive subgroup, since the OS data were derived from an exploratory analysis outside of the statistical plan. Two examples illustrate the importance of the ESMO-MCBS exception for post hoc subgroup findings that enable restratification based on plausible new genetic or other biomarkers. The IPASS trial identified the importance of the EGFR mutation status for treatment with gefitinib, and the PRIME and CRYSTAL studies identified the importance of RAS/RAF status for anti-EGFR therapy in metastatic colorectal cancer.

Shortcoming

The ESMO-MCBS does not explicitly state that the exception for post hoc subgroup findings based on plausible new genetic or other biomarkers is restricted to findings resulting in a modification of the licensed indication.

Informative censoring

In clinical trials, the term ‘censoring’ refers to patients who do not complete the study in full and drop out without further measurements. When dropouts are balanced between the two arms of a comparative superiority study, it is assumed that this does not impact the results. This is called ‘uninformative censoring’. When patients discontinue for reasons related to the study drug, including lack of effect or side-effects, this assumption does not hold, and this is referred to as ‘informative censoring.’

The problem of informative censoring

In studies using the surrogate outcomes of DFS and PFS, patients who stop treatment before documentation of disease progression for reasons other than death are at risk of no longer being evaluated. When censoring is greater in patients receiving the experimental therapy than in the control arm, censoring poorly performing patients may exaggerate the benefit seen in these outcome measures.72, 73, 74 Four approaches to mitigate this bias have been described: (i) encouraging OS rather than surrogates as the primary endpoint, (ii) comparing PFS/DFS gains with differences in time-to-treatment-failure (TTF), which counts discontinuations as failures, (iii) listing the reasons for censoring, and (iv) providing best-case (assuming all censored patients do not have disease progression) and worst-case (assuming all censored patients have progressed) sensitivity analyses.72, 73, 74
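
The best-case and worst-case sensitivity analyses in approach (iv) can be sketched in Python for a single trial arm. The example makes two labelled assumptions that the text does not specify: in the worst case a censored patient is treated as progressing at the time of censoring, and in the best case a censored patient is treated as progression-free until the longest observed follow-up. The patient data are invented for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def km_median(times: List[int], events: List[int]) -> Optional[int]:
    """Kaplan-Meier median: earliest time at which estimated S(t) <= 0.5.

    events[i] is 1 for progression or death and 0 for a censored patient.
    Returns None if the median is not reached.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = Fraction(1)  # exact arithmetic avoids rounding at the 0.5 boundary
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= Fraction(at_risk - n_events, at_risk)
            if surv <= Fraction(1, 2):
                return t
        at_risk -= n_leaving
    return None


def worst_case(times: List[int], events: List[int]) -> Tuple[List[int], List[int]]:
    """Assumption: every censored patient progressed at the censoring time."""
    return list(times), [1] * len(events)


def best_case(times: List[int], events: List[int]) -> Tuple[List[int], List[int]]:
    """Assumption: censored patients stay progression-free to the longest follow-up."""
    horizon = max(times)
    return [t if e else horizon for t, e in zip(times, events)], list(events)


# Invented example arm: months to event (1) or censoring (0).
times = [2, 3, 4, 5, 6, 8, 10, 12]
events = [0, 0, 1, 0, 1, 1, 1, 1]

reported = km_median(times, events)
worst = km_median(*worst_case(times, events))
best = km_median(*best_case(times, events))
print(reported, worst, best)  # 8 5 10
```

A wide gap between the best-case and worst-case medians suggests that the reported estimate depends heavily on what happened to the censored patients.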

Regulatory requirements

The ICH guidelines address this issue, stating that ‘the frequency and type of protocol violations, missing values, and other problems should be documented in the clinical study report and their potential influence on the trial results should be described’.

The BOLERO-2 study of exemestane combined with everolimus or placebo in hormone receptor-positive advanced breast cancer reported a 6.5-month benefit in median PFS with HR 0.36 (95% CI, 0.27-0.47) for patients receiving everolimus. This result was plausibly affected by informative censoring: 19% of patients in the everolimus arm discontinued treatment due to adverse effects versus 4% in the placebo arm, and treatment discontinuation due to adverse effects does not count as a PFS event. When the study data were reanalysed using TTF, which counts progression, discontinuation, and death as events, the median gain in TTF was only 1.1 months, and the difference in OS, which is based on ITT analysis, was not significant.

The ESMO-MCBS v1.1 does not evaluate the causes and rates of censoring when evaluating trials with a DFS or PFS primary endpoint. The draft revision of the ESMO-MCBS v2, currently undergoing field testing and review, incorporates a 1-point downgrade for PFS studies where there is a difference of ≥10% in the prevalence of treatment discontinuations for adverse effects. The ESMO-MCBS does not account for the impact of informative censoring on scores based on DFS.
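
The draft v2 criterion can be written as a simple check, sketched here in Python. The function name is hypothetical, and the sketch assumes that ‘a difference of ≥10%’ means an absolute difference of at least 10 percentage points in either direction between arms; the draft revision may define the comparison differently.

```python
def ae_discontinuation_flag(experimental_pct: float,
                            control_pct: float,
                            threshold_pts: float = 10.0) -> bool:
    """Return True if the between-arm difference in discontinuations
    for adverse effects meets the threshold.

    Assumption: the threshold applies to the absolute difference in
    percentage points, regardless of which arm is higher.
    """
    return abs(experimental_pct - control_pct) >= threshold_pts


# BOLERO-2 figures quoted in the text: 19% (everolimus) vs 4% (placebo).
print(ae_discontinuation_flag(19, 4))  # True: 15 percentage points
```

Under this reading, the BOLERO-2 discontinuation rates would meet the criterion for a downgrade of the PFS-based score.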

Discussion

The ESMO-MCBS scores assume valid research methodologies, high-quality trial implementation, and freedom from publication bias. To promote the integrity of ESMO-MCBS scoring, valid research must be distinguished from biased research. Consequently, new approaches are needed to indicate or penalise studies with deficiencies in their research methodologies, trial implementation, analysis or publication strategy that may contribute to biased outcomes and conclusions. The necessary preconditions for a valid study are outlined in Table 1. The ESMO-MCBS already addresses some of these issues in version 1.1 and its upcoming revisions. The ESMO-MCBS only scores studies with a clinically relevant hypothesis and statistically significant findings consistent with a valid pre-specified statistical plan. When indirect surrogate outcomes are used, the scale incorporates additional precautions and caps to minimise the risk of exaggerated claims of benefit unless surrogacy is verified. Regarding QoL data, the Working Group is collaborating with partners in the European Organization for Research and Treatment of Cancer (EORTC) to refine new strategies to restrict credits to findings based on robust methodology and adequately complete datasets.

Table 1

The necessary preconditions for a valid study

1. Clinically relevant and appropriate hypothesis (primary outcome, targeted magnitude of benefit, secondary outcomes, type I and II errors)

2. Appropriate study design

3. In comparative studies: an adequate control arm that is consistent with the contemporaneous standard of care at the time of trial initiation

4. Inclusion and exclusion criteria that optimise the balance between generalisability and participant safety

5. Completeness of data collection

6. Valid statistical plan and adherence to that plan

7. When overall survival is either a primary or secondary outcome, post-progression treatment demonstrably consistent with the contemporaneous standard of care

8. Analysis of data that clearly distinguishes between confirmatory findings and conjectural conclusions

This review has identified seven shortcomings in the ESMO-MCBS approach to potential sources of bias in clinical studies that will need to be addressed in the future development of the scale:

1. The ESMO-MCBS does not independently evaluate the control arm’s validity, nor does it have a mechanism to either indicate or penalise studies with a substandard control arm. This is relevant to all ESMO-MCBS forms evaluating comparative studies.

2. The ESMO-MCBS does not evaluate crossover: whether it was appropriate and, when appropriate, whether it was adequate. This is relevant to scores derived from OS data using form 2a.

3. The ESMO-MCBS does not have discriminatory rules to determine the validity of the pre-specified NI margin. This is relevant to form 2c.

4. The ESMO-MCBS does not indicate or penalise studies in which OS benefit may have been exaggerated by substandard post-progression treatment. This is relevant to scores derived from OS data using form 2a.

5. The ESMO-MCBS exception for post hoc subgroup findings that rely on restratification by plausible new genetic or other biomarkers is not explicitly restricted to biomarkers that generate a modification in licensed indications. This is relevant to the instructions regarding the use of forms 1 and 2.

6. The ESMO-MCBS does not indicate or penalise trials with differential rates of informative censoring in studies graded based on DFS. This is relevant to form 1.

7. The ESMO-MCBS does not address non-publication or delayed publication of QoL data. This is particularly relevant to form 2b.

These issues will be addressed in future iterations of the ESMO-MCBS. The ESMO-MCBS Working Group will consider all potential options and would appreciate stakeholder feedback in this process. Options include developing a checklist for evaluating these issues, using annotations to indicate flawed studies, or possibly applying a downgrade to ESMO-MCBS scores.
Nevertheless, the appropriate interpretation of the ESMO-MCBS scores requires critical appraisal of trials to understand the issues in trial design, implementation, and data analysis that may have biased the results and conclusions. The ESMO-MCBS facilitates unbiased evaluation of the magnitude of clinical benefit from cancer medicines; however, like all tools, its utility lies in the hands of the user. The ESMO-MCBS does not obviate the need to think critically about cancer medicine trial designs, and users should consider all these issues when appraising and scoring any clinical trial.
