Survival-Inferred Fragility Index of Phase 3 Clinical Trials Evaluating Immune Checkpoint Inhibitors.

David Bomze, Nethanel Asher, Omar Hasan Ali, Lukas Flatz, Daniel Azoulay, Gal Markel, Tomer Meirson.

Abstract

Importance: In science and medical research, extreme and dichotomous conclusions may be drawn based on whether the P value falls above or below the threshold. The fragility index (ie, the minimum number of changes from nonevents to events resulting in loss of statistical significance) captures the vulnerability of statistics in trials with binary outcomes. There are a growing number of clinical trials of immune checkpoint inhibitors (ICIs), as well as expanding eligibility for patients to receive them. The robustness of survival outcomes in randomized clinical trials (RCTs) should be evaluated using the fragility index extended to time-to-event data. Objective: To calculate the fragility of survival data in RCTs evaluating ICIs. Design, Setting, and Participants: In this cross-sectional study, data on phase 3 prospective RCTs investigating ICIs included in PubMed from inception until January 1, 2020, were extracted. Two- or three-group studies reporting results for overall survival were eligible for the survival-inferred fragility index (SIFI) calculation, which is the minimum number of reassignments of the best survivors from the interventional group to the control group resulting in loss of significance (defined as P < .05 by log-rank test). For nonsignificant results, a negative SIFI was calculated by reversing the direction of reassignment (from the control group to the interventional group). Main Outcomes and Measures: Survival-inferred fragility index.
Results: A total of 45 phase 3 prospective RCTs (4 of which had 3 groups, for a total of 49 groups) were identified, of which 6 (13%) investigated anti-cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) agents, 25 (56%) investigated anti-programmed cell death 1 (PD-1) agents, 12 (27%) investigated anti-programmed cell death 1 ligand 1 agents, and 3 (7%) investigated the combination of anti-CTLA-4 and anti-PD-1 agents. The median SIFI was 5 (interquartile range, -4 to 12) for the intention-to-treat analysis; for these trials, the SIFI was 1% or less of the total sample size in 17 of 49 populations (35%). In 25 of the 49 intention-to-treat populations (51%), the SIFI was less than the number of censored patients in the intervention group shortly after randomization (defined as <5% of the follow-up time). Conclusions and Relevance: This study suggests that many phase 3 RCTs evaluating ICI therapies have a low SIFI for overall survival, resulting in uncertainty regarding their potential clinical benefit. Although not a definitive solution for the problems arising from dichotomization, SIFI provides an additional means of assessing and communicating the strength of statistical conclusions.

Year: 2020; PMID: 33095247; PMCID: PMC7584930; DOI: 10.1001/jamanetworkopen.2020.17675

Source DB: PubMed; Journal: JAMA Netw Open; ISSN: 2574-3805


Introduction

Immune checkpoint inhibitors (ICIs) targeting cytotoxic T-lymphocyte–associated protein 4 (CTLA-4) or programmed cell death 1 (PD-1) and programmed cell death 1 ligand 1 (PD-L1) have revolutionized cancer treatment and led to their approval as first-line therapies, either alone or in combination with chemotherapy, for many solid tumors and hematologic malignant neoplasms.[1] However, the clinical benefit associated with ICIs cannot be generalized into a single category, as the therapeutic effectiveness varies widely across different cancer indications.[2,3,4,5,6,7] The number of active clinical trials of ICIs is growing rapidly, along with an increased pace of accelerated approvals by the US Food and Drug Administration (FDA).[8,9] The eligibility criteria for ICI therapy are dynamic, and results of postmarketing studies often lead to label revisions, with more changes expected to follow.[10] Despite the popularity of ICIs and the expanding eligibility for expensive and potentially toxic treatments, the percentage of eligible patients who benefit from ICIs is decreasing.[10,11] This gap between ICI eligibility and clinical benefit is concerning and is not fully understood. 
Since the introduction of the P value almost a century ago, reliance on a fixed cutoff serving as the gatekeeper for establishing significance in clinical trials has caused controversy.[12,13] Statistically significant differences in outcomes using an arbitrary threshold (P < .05) may not be clinically relevant, especially when the estimated outcome does not offer substantial clinical benefit.[14,15] The fragility of statistical inference can be signified by the ease with which a significant P value (P < .05) crosses over the significance threshold (P > .05).[16,17] Johnson et al[18] introduced a method to compute the fragility for survival analysis by iteratively adding artificial patients to the experimental group with events at the mean exposure time of all individuals until significance is lost. Using this method, one study has recently shown that the fragility index of time-to-event data can be used to estimate the level of confidence of positive results reported in randomized clinical trials (RCTs) leading to FDA approval of anticancer drugs.[19] However, this approach that simulates average “virtual” patients might inflate the fragility estimate as patients at the extreme, who contribute the most to the survival curves, are disregarded. Many possible ways could be formulated to estimate the fragility of survival data. Therefore, we aimed to define a simple and intuitive fragility measure for survival analysis, based on real-life conditions, that captures the vulnerability of the data. Hence, we define the survival-inferred fragility index (SIFI) as the minimum number of reassignments of the best survivors (defined as the patients with the longest follow-up time, regardless of having an event or being censored; the worst survivors were defined as the patients with the earliest events) from the experimental group to the control group resulting in loss of significance (Figure 1). 
The purpose of this study is to evaluate the fragility of phase 3 RCTs comparing ICIs with control or standard treatments in a time-aware context.
Figure 1.

Example of Survival-Inferred Fragility Index (SIFI) Calculation of Overall Survival

A, Original reconstructed survival curve. B, Second iteration of the survival curve. C, Third iteration of the survival curve. The SIFI in this example is 2, which is the iterative reassignment of the best survivors (designated by circles at the end of the survival curves) from the experimental group to the control group, until positive significance is lost (defined as α = .05 using log-rank test). HR indicates hazard ratio.

Methods

Study Design

The cross-sectional study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.[20] We searched PubMed from inception until January 1, 2020, for phase 3 RCTs of ICIs (anti–CTLA-4, anti–PD-1, and anti–PD-L1) compared with standard treatment in solid and hematologic malignant neoplasms. Key words for the literature search included randomised, randomized, phase 3, phase III, ipilimumab, nivolumab, pembrolizumab, cemiplimab, durvalumab, avelumab, and atezolizumab. For the fragility analysis, we included 2- or 3-group studies that reported overall survival as a primary or secondary outcome. We excluded retrospective studies, pooled studies, and post hoc subgroup analyses. When duplicate publications for the same trial were identified, we included the most updated publication. We abstracted information on trial design and the number of enrolled patients in the study. According to institutional review board policy, ethical approval is not required because no human data were included and publicly available information was used.

Data Extraction

Overall survival data from 45 trials were extracted from Kaplan-Meier curves in the main text using DigitizeIt software (DigitizeIt) and the method by Wei and Royston[21] using Stata, version 13.0 (StataCorp). This reverse-engineering strategy enabled us to reproduce survival time and censoring status at the individual patient level with minor differences between reconstructed and published data.[19] We excluded publications of trials with raster images in which data extraction could not be performed directly. We separated the populations into 2 cohorts—the intention-to-treat (ITT) populations, which also included modified ITT populations, and subgroup populations.

Statistical Analysis

The SIFI was calculated from Kaplan-Meier curves by the iterative redesignation of the best survivors from the experimental group to the control group until positive significance (defined as P < .05 obtained with a 2-sided log-rank test) was lost. Negative SIFI was calculated similarly, but the direction was opposite—redesignation of the best survivors from the control group to the experimental group. In addition to the default SIFI application (flipping the best survivor from the intervention group to the control group), we defined 3 alternative approaches: flipping the worst survivor from the experimental group to the control group, cloning the best survivor in the experimental group into the control group, and cloning the worst survivor in the control group to the experimental group. P values were calculated with the 2-sided unstratified log-rank test. The follow-up time distribution was calculated using the prodlim package in R (R Foundation for Statistical Computing). All other analyses were performed in R, version 3.5.0. The code used to calculate SIFI is available online.[22] To provide a reference for the ranges of SIFI for various parameters of survival data, we generated synthetic survival data with the survsim package in R.[23] The “simple.surv.sim” function was used with the Weibull distribution for both the time to event and the time to censoring. The cohort size was set to range from 100 to 1200 individuals in intervals of 100 (with a 1:1 allocation). The ancillary parameter for the events was set to 1.5, and the ancillary parameter for the censoring was set to 2, 4, 6, 8, or 10. The covariate for the effect size was set to all values between −1 and 0.2 in increments of 0.05. The β0 parameter for the event distribution was set to 2.0, and the β0 for the censoring distribution was set to 2.01.
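The default procedure described above (flipping the best survivors until the significance status changes) can be sketched as follows. This is a minimal illustration and not the authors' published code[22]: the function names `logrank_p` and `sifi` are hypothetical, ties among the longest follow-up times are broken arbitrarily, and a significant starting result is assumed to favor the experimental group.

```python
import math
from typing import List, Sequence, Tuple

Patient = Tuple[float, bool]  # (follow-up time, True = event, False = censored)


def logrank_p(group_a: Sequence[Patient], group_b: Sequence[Patient]) -> float:
    """Two-sided unstratified log-rank P value for two groups."""
    group_a, group_b = list(group_a), list(group_b)
    event_times = sorted({t for t, e in group_a + group_b if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in B
        d_a = sum(1 for time, e in group_a if e and time == t)
        d_b = sum(1 for time, e in group_b if e and time == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def sifi(experimental: Sequence[Patient], control: Sequence[Patient],
         alpha: float = 0.05) -> int:
    """Number of best-survivor reassignments needed to change significance.

    Positive: flips from experimental to control until P >= alpha.
    Negative: flips from control to experimental until P < alpha.
    Assumption: a significant starting result favors the experimental group.
    """
    exp: List[Patient] = list(experimental)
    ctl: List[Patient] = list(control)
    significant = logrank_p(exp, ctl) < alpha
    donor, receiver = (exp, ctl) if significant else (ctl, exp)
    flips = 0
    while len(donor) > 1:
        best = max(donor, key=lambda p: p[0])  # longest follow-up, event or censored
        donor.remove(best)
        receiver.append(best)
        flips += 1
        if (logrank_p(exp, ctl) < alpha) != significant:
            return flips if significant else -flips
    raise ValueError("significance status never changed")


# Toy data (hypothetical): every control patient has an event, no experimental patient does.
toy_control = [(float(t), True) for t in range(1, 21)]
toy_experimental = [(30.0, False)] * 20
toy_result = sifi(toy_experimental, toy_control)
```

The log-rank statistic is recomputed from scratch after each reassignment, which is slow but keeps the sketch close to the verbal definition; a production implementation would more likely call an established survival library.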

Results

For the period until January 1, 2020, we identified 45 phase 3 RCTs (4 of which had 3 groups, for a total of 49 groups)[2,3,4,5,6,7,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62] evaluating ICI therapies that met the inclusion criteria for survival fragility analysis. All except 2 multiple myeloma trials (4%)[2,47] investigated solid tumors. Six trials (13%) investigated an anti–CTLA-4 agent (ipilimumab),[6,24,25,26,27,28] 25 trials (56%) investigated anti–PD-1 agents (nivolumab and pembrolizumab),[2,3,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51] 12 trials (27%) investigated anti–PD-L1 agents (atezolizumab, avelumab, and durvalumab),[5,7,52,53,54,55,56,57,58,59,60,61] and 3 trials (7%) investigated the combination of anti–CTLA-4 and anti–PD-1 agents (ipilimumab and nivolumab).[4,36,62] We could not calculate the SIFI for 2 trials (CA184-002 and CA184-043)[63,64] because of an incompatible graphical format of the Kaplan-Meier plots. The median sample size for the eligible trials was 559 (interquartile range [IQR], 418-727). The SIFI was calculated for an additional 36 subgroups (eg, PD-L1, ≥1%) in 15 trials with a median sample size of 362 (IQR, 217-486).[4,7,28,31,36,37,41,46,51,52,53,56,57,59,62] Thirty-four of the 49 reconstructed overall survival curves in the ITT population (69%), which includes the modified ITT population, and 26 of the 36 subgroup populations (72%) were significant (P < .05) (Table 1).[2,3,4,5,6,7,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62] The median SIFI for ITT populations was 5 (IQR, –4 to 12) (ie, a median of 5 patients [among best survivors] reassigned to the control group was required to shift the results from significant to nonsignificant). The median SIFI for subgroup populations was 3.5 (IQR, 1-6.3) (eTable in the Supplement). 
In comparison, the fragility method for survival data by Johnson et al[18] cannot estimate fragility for nonsignificant results (negative fragility) and yields higher values, with a median of 29 (IQR, 0-51) for the ITT populations and 29 (IQR, 0-43) for the subgroup populations. The absolute SIFI was less than 1% of the sample size in 17 (35%) of the 49 ITT populations and 10 (28%) of the 36 subgroup populations. Furthermore, in 25 (51%) of the 49 ITT populations and 16 (44%) of the 36 subgroup populations, the SIFI was less than the number of patients censored in the interventional group during only the first ventile (1/20th) of the follow-up time (eFigure 1 in the Supplement).
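The relative SIFI quoted above is simply the absolute SIFI divided by the trial sample size. The sketch below recomputes it from the sample sizes and SIFI values of the 49 ITT populations as transcribed from Table 1; the variable and function names (`ITT`, `relative_sifi`, `fragile`) are illustrative assumptions, not part of the original analysis code.

```python
from statistics import median

# (sample size, SIFI) for the 49 ITT populations, in Table 1 row order
ITT = [
    # Anti-CTLA-4
    (502, 7), (951, 20), (602, -21), (954, -7), (749, -3), (727, 1),
    # Anti-PD-1
    (272, 8), (582, 5), (821, 10), (423, -16), (493, 10), (405, -9),
    (418, 30), (631, 23), (361, 5), (419, 2), (504, 5), (305, 6),
    (557, 15), (555, 14), (495, 3), (395, -1), (616, 40), (559, 10),
    (542, 9), (601, 2), (559, 5), (249, -47), (301, -79), (413, 1),
    (861, 40), (706, -43), (1274, 12),
    # Anti-PD-L1
    (931, 3), (850, 12), (696, 4), (403, 3), (723, 2), (915, -23),
    (846, -4), (180, -12), (273, -9), (371, -13), (529, -6), (713, 15),
    (537, 6),
    # Anti-PD-1 + anti-CTLA-4
    (629, 38), (1096, 18), (1166, 24),
]


def relative_sifi(sample_size: int, sifi: int) -> float:
    """Absolute SIFI as a fraction of the trial sample size."""
    return abs(sifi) / sample_size


# Populations whose absolute SIFI is below 1% of the sample size
fragile = [(n, s) for n, s in ITT if relative_sifi(n, s) < 0.01]

print(f"median SIFI: {median(s for _, s in ITT)}")
print(f"below 1% of sample size: {len(fragile)} of {len(ITT)}")
```

With the values as transcribed here, the count and the median agree with the figures reported in the text (17 of 49 populations; median SIFI of 5).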
Table 1.

SIFI of Overall Survival Calculated for 45 Phase 3 Trials Evaluating Immune Checkpoint Inhibitors in the Intention-to-Treat Populations

| Intervention | Control | Tumor type | Clinical trial | Year | Sample size | HR | P value (a) | SIFI (b) |
|---|---|---|---|---|---|---|---|---|
| Anti–CTLA-4 | | | | | | | | |
| Ipilimumab + dacarbazine | Dacarbazine | Melanoma | CA184-024[24] | 2015 | 502 | 0.72 | .001 | 7 |
| Ipilimumab | Placebo | Melanoma | CA184-029[25] | 2016 | 951 | 0.72 | .002 | 20 |
| Ipilimumab | Placebo | PC | CA184-095[6] | 2017 | 602 | 1.11 | .47 | −21 |
| Ipilimumab + etoposide + platinum | Etoposide + platinum | SCLC | CA184-156[26] (c) | 2016 | 954 | 0.94 | .44 | −7 |
| Ipilimumab + paclitaxel + carboplatin | Paclitaxel + carboplatin | Squamous NSCLC | CA184-104[27] (c) | 2017 | 749 | 0.91 | .17 | −3 |
| Ipilimumab, 10 mg/kg | Ipilimumab, 3 mg/kg | Melanoma | CA184-169[28] | 2017 | 727 | 0.84 | .04 | 1 |
| Anti–PD-1 | | | | | | | | |
| Nivolumab | Docetaxel | Squamous NSCLC | CheckMate 017[29] | 2015 | 272 | 0.59 | .0003 | 8 |
| Nivolumab | Docetaxel | Nonsquamous NSCLC | CheckMate 057[30] | 2015 | 582 | 0.73 | .004 | 5 |
| Nivolumab | Everolimus | RCC | CheckMate 025[31] | 2019 | 821 | 0.74 | .003 | 10 |
| Nivolumab | Platinum | NSCLC | CheckMate 026[32] (d) | 2017 | 423 | 1.02 | .77 | −16 |
| Nivolumab | Placebo | GC or GEJC | ATTRACTION-2[33] | 2017 | 493 | 0.63 | <.0001 | 10 |
| Nivolumab | Dacarbazine or carboplatin + paclitaxel | Melanoma | CheckMate 037[34] | 2018 | 405 | 0.95 | .59 | −9 |
| Nivolumab | Dacarbazine | Melanoma | CheckMate 066[35] | 2019 | 418 | 0.46 | <.0001 | 30 |
| Nivolumab | Ipilimumab | Melanoma | CheckMate 067[36] | 2018 | 631 | 0.65 | <.0001 | 23 |
| Nivolumab | Methotrexate, docetaxel, or cetuximab | HNSCC | CheckMate 141[37] | 2018 | 361 | 0.68 | .001 | 5 |
| Nivolumab | Paclitaxel or docetaxel | ESCC | ATTRACTION-3[38] | 2019 | 419 | 0.77 | .015 | 2 |
| Nivolumab | Docetaxel | NSCLC | CheckMate 078[39] | 2019 | 504 | 0.68 | .003 | 5 |
| Pembrolizumab | Platinum | NSCLC | KEYNOTE-024[40] (e) | 2016 | 305 | 0.60 | .01 | 6 |
| Pembrolizumab, every 2 wk | Ipilimumab | Melanoma | KEYNOTE-006[3] | 2017 | 557 | 0.68 | .001 | 15 |
| Pembrolizumab, every 3 wk | Ipilimumab | Melanoma | KEYNOTE-006[3] | 2017 | 555 | 0.68 | .001 | 14 |
| Pembrolizumab | Methotrexate, docetaxel, or cetuximab | HNSCC | KEYNOTE-040[41] | 2018 | 495 | 0.8 | .02 | 3 |
| Pembrolizumab | Paclitaxel | GC or GEJC | KEYNOTE-061[42] | 2018 | 395 | 0.82 | .06 | −1 |
| Pembrolizumab + pemetrexed + platinum | Pemetrexed + platinum | Nonsquamous NSCLC | KEYNOTE-189[43] | 2018 | 616 | 0.49 | <.0001 | 40 |
| Pembrolizumab + carboplatin + paclitaxel or nab-paclitaxel | Carboplatin + paclitaxel or nab-paclitaxel | Squamous NSCLC | KEYNOTE-407[44] | 2018 | 559 | 0.64 | .002 | 10 |
| Pembrolizumab | Paclitaxel, docetaxel, or vinflunine | UC | KEYNOTE-045[45] | 2019 | 542 | 0.70 | .0005 | 9 |
| Pembrolizumab | Cetuximab + platinum + fluorouracil | HNSCC | KEYNOTE-048[46] | 2019 | 601 | 0.83 | .02 | 2 |
| Pembrolizumab + platinum + fluorouracil | Cetuximab + platinum + fluorouracil | HNSCC | KEYNOTE-048[46] | 2019 | 559 | 0.77 | .005 | 5 |
| Pembrolizumab + pomalidomide + dexamethasone | Pomalidomide + dexamethasone | MM | KEYNOTE-183[47] | 2019 | 249 | 1.61 | .14 | −47 |
| Pembrolizumab + lenalidomide + dexamethasone | Lenalidomide + dexamethasone | MM | KEYNOTE-185[2] | 2019 | 301 | 2.06 | .06 | −79 |
| Pembrolizumab | Placebo | HCC | KEYNOTE-240[48] | 2020 | 413 | 0.78 | .04 | 1 |
| Pembrolizumab + axitinib | Sunitinib | RCC | KEYNOTE-426[49] | 2019 | 861 | 0.53 | .0003 | 40 |
| Pembrolizumab + epacadostat | Pembrolizumab | Melanoma | KEYNOTE-252[50] | 2019 | 706 | 1.13 | .44 | −43 |
| Pembrolizumab | Platinum | NSCLC | KEYNOTE-042[51] (d) | 2019 | 1274 | 0.81 | .002 | 12 |
| Anti–PD-L1 | | | | | | | | |
| Atezolizumab | Paclitaxel, docetaxel, or vinflunine | UC | IMvigor211[52] | 2018 | 931 | 0.85 | .02 | 3 |
| Atezolizumab | Docetaxel | NSCLC | OAK[53] (f) | 2018 | 850 | 0.85 | .0003 | 12 |
| Atezolizumab + bevacizumab + carboplatin + paclitaxel | Bevacizumab + carboplatin + paclitaxel | Nonsquamous NSCLC | IMpower150[54] (g) | 2018 | 696 | 0.78 | .02 | 4 |
| Atezolizumab + carboplatin + etoposide | Carboplatin + etoposide | SCLC | IMpower133[55] | 2018 | 403 | 0.70 | .01 | 3 |
| Atezolizumab + carboplatin + nab-paclitaxel | Carboplatin + nab-paclitaxel | Nonsquamous NSCLC | IMpower130[56] | 2019 | 723 | 0.79 | .03 | 2 |
| Atezolizumab + bevacizumab | Sunitinib | RCC | IMmotion151[57] | 2019 | 915 | 0.93 | .71 | −23 |
| Atezolizumab + nab-paclitaxel | Nab-paclitaxel | BRCA | IMpassion130[7] | 2020 | 846 | 0.86 | .13 | −4 |
| Atezolizumab | Regorafenib | CRC | IMblaze370[5] | 2019 | 180 | 1.19 | .35 | −12 |
| Atezolizumab + cobimetinib | Regorafenib | CRC | IMblaze370[5] | 2019 | 273 | 1.0 | .8 | −9 |
| Avelumab | Paclitaxel or irinotecan | GC or GEJC | JAVELIN Gastric 300[58] | 2018 | 371 | 1.1 | .47 | −13 |
| Avelumab | Docetaxel | NSCLC | JAVELIN Lung 200[59] (d) | 2018 | 529 | 0.9 | .36 | −6 |
| Durvalumab | Placebo | NSCLC | PACIFIC[60] | 2018 | 713 | 0.68 | .001 | 15 |
| Durvalumab + platinum + etoposide | Platinum + etoposide | SCLC | CASPIAN[61] | 2019 | 537 | 0.73 | .003 | 6 |
| Anti–PD-1 + anti–CTLA-4 | | | | | | | | |
| Ipilimumab + nivolumab | Ipilimumab | Melanoma | CheckMate 067[36] | 2018 | 629 | 0.54 | <.0001 | 38 |
| Ipilimumab + nivolumab | Sunitinib | RCC | CheckMate 214[4] | 2019 | 1096 | 0.71 | .003 | 18 |
| Ipilimumab + nivolumab | Platinum doublet | NSCLC | CheckMate 227[62] | 2019 | 1166 | 0.73 | <.0001 | 24 |

Abbreviations: BRCA, breast cancer; CRC, colorectal cancer; CTLA-4, cytotoxic T-lymphocyte–associated protein 4; ESCC, esophageal squamous cell carcinoma; GC, gastric cancer; GEJC, gastroesophageal junction cancer; HCC, hepatocellular carcinoma; HNSCC, head and neck squamous cell carcinoma; HR, hazard ratio; MM, multiple myeloma; NSCLC, non–small cell lung carcinoma; PC, prostate cancer; PD-1, programmed cell death 1; PD-L1, programmed cell death 1 ligand 1; RCC, renal cell carcinoma; SCLC, small cell lung carcinoma; SIFI, survival-inferred fragility index; UC, urothelial carcinoma.

(a) Calculated using 2-sided unstratified log-rank test.

(b) Survival-inferred fragility index associated with the calculated P value (α = .05).

(c) Modified intention-to-treat populations.

(d) PD-L1 ≥ 1%.

(e) PD-L1 ≥ 50%.

(f) Intention-to-treat populations (n = 850).

(g) EGFR or ALK wild-type.

A comparison between positive SIFI levels in different tumor types among ITT populations (Figure 2) showed that non–small cell lung carcinoma, renal cell carcinoma, and melanoma had the highest values and that hepatocellular carcinoma, head and neck squamous cell carcinoma, and small cell lung carcinoma had the lowest values. Examining the association between SIFI and P values (in logarithmic scale) revealed a high correlation in ITT populations (R = 0.70; P < 1 × 10^−7) and subgroup populations (R = 0.82; P < 1 × 10^−9). However, the level of SIFI was not explained entirely by the variation in P values. For example, despite having relatively similar P values, hazard ratios, and sample sizes, the SIFI was 2-fold higher in KEYNOTE-024[40] compared with IMpower133,[55] and in ATTRACTION-2[33] compared with CheckMate 067[36] monotherapy (Table 1,[2,3,4,5,6,7,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62] Figure 3), indicating higher robustness. 
These examples demonstrate that statistical significance depends on the distribution of the longest-surviving patients, with more fragile studies relying on fewer patients to drive the significance, compared with less fragile studies that are associated with a higher “reserve” of patients. Similar associations between SIFI as a proportion of the population and P values are shown in eFigure 2 in the Supplement. To explore the potential association of longer follow-up periods with the SIFI, we identified trials that published overall survival results for earlier follow-up periods. We found that the SIFI is stable and displays only a small variation for trials at different follow-up periods (Table 2),[3,4,24,36,37,45,66,67,68,69,70] including studies with median follow-up time more than twice as long as in the original publication. Furthermore, we explored the operating characteristics of the SIFI, including sample size, censoring rate, and effect size (eFigures 3-5 in the Supplement). Performing simulations using combinations of the parameters resulted in 15 000 synthetic time-to-event data sets. Hazard ratios ranged from 0.13 to 1.95, and the percentage of individuals censored ranged from 17.5% to 50%. The simulated results provide a reference for the ranges of the SIFI for the various parameters of survival data.
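The size of the simulation grid can be checked by enumerating the parameter combinations listed in the Statistical Analysis section. The sketch below only counts combinations; it does not reproduce the survsim simulations, and the variable names are illustrative. The number of replicates per combination is not stated in the text, so the factor of 10 implied by the total of 15 000 data sets is an inference rather than a reported value.

```python
from itertools import product

# Parameter values as listed in the Statistical Analysis section
cohort_sizes = list(range(100, 1300, 100))         # 100 to 1200 in steps of 100
censoring_ancillary = [2, 4, 6, 8, 10]             # ancillary parameter for censoring
effect_sizes = [round(-1 + 0.05 * i, 2) for i in range(25)]  # -1.00 to 0.20 in steps of 0.05

grid = list(product(cohort_sizes, censoring_ancillary, effect_sizes))

# 12 cohort sizes x 5 censoring settings x 25 effect sizes
print(len(grid))

# Inferred, not reported: data sets per combination if the 15 000 are spread evenly
replicates_per_combination = 15_000 // len(grid)
print(replicates_per_combination)
```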
Figure 2.

Survival-Inferred Fragility Index (SIFI) of Overall Survival in Phase 3 Randomized Clinical Trials

Comparison between SIFI levels in different tumor types among the intention-to-treat populations. Trials were grouped and colored by tumor type and sorted by descending order. CRC indicates colorectal cancer; HCC, hepatocellular carcinoma; HNSCC, head and neck squamous cell carcinoma; MM, multiple myeloma; NSCLC, non–small cell lung carcinoma; RCC, renal cell carcinoma; and SCLC, small cell lung carcinoma.

Figure 3.

Survival-Inferred Fragility Index (SIFI) of Overall Survival in Phase 3 Randomized Clinical Trials

A, Correlation between SIFI and P values in a logarithmic scale for the intention-to-treat (ITT) populations. B, Correlation between SIFI and P values in a logarithmic scale for the subgroup populations. Color bars indicate hazard ratios and circle size represents the sample size. Correlation was calculated using Pearson correlation coefficient. Horizontal lines denoting .05 and .001 P value thresholds are shown.

Table 2.

Comparison of SIFI of Overall Survival Calculated for Trials in Different Follow-up Periods

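| Study | Tumor | Sample size | Publication year | Follow-up, median, mo | No. of events | SIFI (a) |
|---|---|---|---|---|---|---|
| CA184-024 | Melanoma | 502 | 2011[65] | 39.0 | 409 | 5 |
| CA184-024 | Melanoma | 502 | 2015[24] | 63.1 | 427 | 7 |
| CheckMate-025 | RCC | 821 | 2015[66] | 22.2 | 397 | 8 |
| CheckMate-025 | RCC | 821 | 2019[4] | 45.6 | 567 | 10 |
| CheckMate-067, combination therapy | Melanoma | 629 | 2017[67] | 40.2 | 346 | 37 |
| CheckMate-067, combination therapy | Melanoma | 629 | 2018[36] | 51.5 | 364 | 38 |
| CheckMate-067, monotherapy | Melanoma | 631 | 2017[67] | 40.2 | 364 | 23 |
| CheckMate-067, monotherapy | Melanoma | 631 | 2018[36] | 51.5 | 384 | 23 |
| CheckMate-141 | HNSCC | 361 | 2018[68] | 17.8 | 288 | 4 |
| CheckMate-141 | HNSCC | 361 | 2018[37] | 30.6 | 321 | 5 |
| KEYNOTE-006, every 2 wk | Melanoma | 557 | 2015[69] | 13.3 | 196 | 18 |
| KEYNOTE-006, every 2 wk | Melanoma | 557 | 2017[3] | 22.5 | 262 | 15 |
| KEYNOTE-006, every 3 wk | Melanoma | 555 | 2015[69] | 13.3 | 203 | 12 |
| KEYNOTE-006, every 3 wk | Melanoma | 555 | 2017[3] | 22.5 | 260 | 14 |
| KEYNOTE-045 | UC | 542 | 2017[70] | 13.6 | 333 | 8 |
| KEYNOTE-045 | UC | 542 | 2019[45] | 28.2 | 423 | 9 |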

Abbreviations: HNSCC, head and neck squamous cell carcinoma; RCC, renal cell carcinoma; SIFI, survival-inferred fragility index; UC, urothelial carcinoma.

(a) SIFI associated with the calculated P value (α = .05).

The fragility for survival data can be calculated in various ways. Overall, we calculated 4 versions of SIFI, which include reassigning patients (flip) or adding patients (clone) to the opposite group using the best survivors from the experimental group or worst survivors from the control group. A comparison of the different SIFI approaches is shown for the ITT populations in eFigure 6 in the Supplement. Compared with the default SIFI (flipping the best survivors to the opposite group) with a magnitude of 9 (IQR, 5-18) for ITT populations, the 3 alternative versions are associated with higher values in most studies. The SIFI magnitudes are 11 (IQR, 8-18) for flipping the worst survivors to the opposite group, 17.5 (IQR, 7-38.3) for cloning the best survivors to the opposite group, and 24 (IQR, 16-35) for cloning the worst survivors to the opposite group. 
These findings suggest that the SIFI using the version that flips the best survivors to the opposite group is the most sensitive approach for detecting the minimum changes required to overturn the conclusions.

Discussion

In our study, we found that the statistical significance of a substantial number of phase 3 trials of ICIs could be lost or gained with a change in assignment of very few of the best surviving patients, often less than 1% of the respective trial sample size. Although this is an arbitrary number and does not reflect a random sampling of the patients, it represents a small fraction of the population that can overturn the statistical conclusions. Also, the change in the number of patients required for fragility is often smaller than the number of patients censored in the experimental group shortly after randomization, adding further uncertainties and raising concerns about the statistical outcomes had these and other patients been assessed to their end point. Eligibility for treatment with ICIs is assessed by concluding whether results of a trial are positive or negative. Our findings demonstrate how unstable these conclusions may be, and explain, in part, the widening gap between eligibility and benefit associated with ICIs. The original fragility index has been applied to RCTs in oncology and other areas of medicine.[17,19,71,72,73,74] However, the original fragility index is based on binary outcomes and the Fisher exact test, which could be misleading for time-to-event data, in which the primary interest is the timing of events.[19] Although descriptions of time-to-event fragility exist,[18,19] to our knowledge, no previous peer-reviewed original investigations have estimated a time-aware fragility index for clinical trials, including oncology trials. Also, to our knowledge, no study has evaluated negative fragility measures for survival analysis. In general, the P value serves as a measure of the compatibility of collected data with a defined statistical model. 
In a testing framework, smaller P values indicate greater evidence against the null hypothesis—a conjecture of no difference between outcomes of the intervention and control groups.[75] Undoubtedly, the P value plays a central role in the clinical testing of new drugs, and since the 1960s, the FDA has relied on significance testing to establish their effectiveness in the approval process.[76] As such, nowhere is this role more important than in clinical trials, where the smallest change in the P value can decisively influence the drug approval process and result in trial success or failure. Consequently, passing the statistical significance threshold has become the ultimate goal, and unless an analysis is adequately prespecified, most research designs allow enough leeway to manipulate the results to claim importance.[77,78,79,80] Therefore, reliance on P values falling to either side of the significance threshold can result in extreme conclusions and be misleading, especially for a low threshold such as P < .05. Recently, an influential commentary published in Nature[12] has even called for the abandonment of the conventional threshold for statistical significance, regardless of the level (eg, P < .05), owing to this imposed dichotomization. However, statistical inferences are unavoidably dichotomous in many scientific fields. Most decisions in medicine are dichotomous, such as a new drug will either be approved or not, and will either be prescribed or not.[77] This study introduces the SIFI as a novel measure that enables us to estimate the vulnerability of the statistical conclusions of clinical trials with time-to-event outcomes. This index transforms the dichotomous conclusion to a discrete variable that provides more perspective regarding the potential benefit associated with ICIs or any other intervention. 
The SIFI provides context to the P value and statistical significance, which may not necessarily be intuitive and are often poorly understood.[77] Therefore, the SIFI translates uncertainty to a specified number that represents actual patients and events and places it on a linear scale that allows for assessment of the robustness of the results. For example, consider 2 comparable studies with similar P values. Although the SIFI is not a measure of effect, a trial with a high SIFI with an acceptable association with the sample size and censoring provides more robustness than a trial with a small SIFI representing a small fraction of the sample size and censoring. The latter relies on fragile evidence with higher uncertainty regarding the incompatibility with the null hypothesis. We did not define criteria for fragile vs nonfragile values, nor do we believe that a measure aimed to address the dichotomization of results by a threshold should be replaced by another. Perhaps trials involving the addition of a costly and a toxic drug to the standard treatment with a small effect size would require a higher level of robustness than trials comparing 2 drugs with similar overall properties. In contrast, concluding that statistically significant results show no real association when the fragility measure is very low is discouraged; it is equally inaccurate to claim that nonsignificant results with very small negative fragility point to an important signal. However, the SIFI allows for putting these 2 scenarios in context, expressing uncertainty and suggesting that the interpretation of their importance should be similar or, de facto, the same. In both cases, and especially for negative fragility measures, small values indicate that the true underlying effects either are negligible or lack statistical power. 
Nevertheless, considerations such as study design, data quality, comprehension of the underlying mechanisms, and other factors may often have more importance than statistical findings[12] such as P values or fragility indices. The default solution for improving the confidence level would be making the barrier more demanding; however, this is a suboptimal option because the chance for false-negative results increases accordingly, and it still fails to address the vulnerability of the statistics. Nevertheless, fragility corresponding to one threshold is not comparable with another, and it is reasonable to expect lower fragility measures for lower P value thresholds, as they are interrelated. Hence, the approach encourages using lower significance thresholds. A trial not meeting a low prespecified significance threshold (eg, P < .0001), with a small negative SIFI (eg, −2), may provide higher confidence in the validity of the results compared with a trial that meets a higher threshold (eg, P < .05) but has a low positive SIFI (eg, 2). The SIFI relative to sample size can be useful to estimate the robustness of the results, but it could be misleading for small sample sizes. Although a SIFI of less than 1% of the sample size in many RCTs could suggest extreme fragility, small trials with fewer than 100 patients cannot achieve a relative SIFI of less than 1%, even when the results are certainly less robust. Therefore, the SIFI relative to sample size, especially for small trials, should not be interpreted alone and must be accompanied by the absolute SIFI.
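The floor on the relative SIFI in small trials follows from simple arithmetic. Here N denotes the total sample size (a symbol introduced only for this illustration), and the absolute SIFI is an integer of at least 1:

```latex
\[
\frac{|\mathrm{SIFI}|}{N} \;\ge\; \frac{1}{N} \;>\; \frac{1}{100} = 1\%
\qquad \text{whenever } |\mathrm{SIFI}| \ge 1 \text{ and } N < 100 .
\]
```

For example, a trial of 80 patients with a SIFI of 1 already has a relative SIFI of 1.25%, so the relative measure alone cannot flag it as fragile under a 1% criterion.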

Limitations

Several limitations of the study should be recognized. We did not address prespecified P value thresholds, which were allocated and controlled differently in every trial and are often much lower than .05. Instead, we used the standard α level of .05 as a common reference; therefore, some trials did not meet their prespecified threshold but still yielded a positive SIFI. Although not a strict rule by the FDA, the standard 2-trial α level is .05 but is smaller for approval based on a single trial.[76] The analysis of overall survival was based on an unstratified log-rank test at a 2-sided significance level as a uniform statistical test for all trials; however, the original studies analyzed the data differently (eg, stratified or weighted log-rank test). Therefore, small differences exist between the published P value and the calculated P value. Furthermore, we found small discrepancies between the numbers of patients at risk published in the original publications and those in the reproduced curves. For 19 of the 49 populations in the trials (39%), there was no discrepancy between the published and estimated number at risk at any time point. At the time points for which a discrepancy existed, we found the difference to be small, with a median of 1 patient (IQR, 1-2).
The SIFI can be calculated in various ways. Our comparison of different implementations of the SIFI demonstrates that, for most trials, reassigning or adding the best survivors to the opposite group provides lower fragility estimates than reassigning or adding the worst survivors. This finding indicates that the longest-surviving patients can tilt the balance between the groups more strongly than the shortest-surviving patients. The influence of the longest survivors on the survival curves is potentially unlimited, as they are constrained only by the follow-up time, whereas the shortest-surviving patients cannot have an event before time zero.
By both removing a long-time survivor from one group and adding them to the other group, the total number of patients required to cross the significance threshold is reduced compared with other techniques. This approach coincides with the essence of fragility: identifying the minimum required changes to overturn the conclusions. Furthermore, we aimed to define a simple and intuitive method that can be recreated using existing routines, is quantifiable in all conditions, and is applicable to real-world practice in which patients are randomly assigned from a pool of eligible patients. Although random variations alone can lead to large disparities in P values, the calculation of the SIFI is not based on random variations in the assignment of patients but on the reassignment of patients at the extreme ends of the scale. However, the random allocation of patients can lead to different proportions of the best (or worst) survivors in the groups, which may affect the outcomes. Therefore, the SIFI serves as a simple and conservative approach to reflect the fragility of the statistics. Alternatively, the mean or median survival time can be used in different ways to quantify the fragility[18,19]; however, this approach can underestimate the fragility if the few patients who cause most of the difference are not captured.
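The reassignment procedure described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names `logrank_p` and `sifi` are hypothetical, the sketch covers only the positive SIFI for a 2-group comparison in which the intervention group is the better-surviving one, and it assumes that "best survivors" means the intervention patients with the longest observed follow-up regardless of censoring status. The P value comes from an unstratified 2-sided log-rank test, using the fact that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def logrank_p(times, events, groups):
    """Two-sided unstratified log-rank P value.

    times: follow-up times; events: 1 = death, 0 = censored;
    groups: 0 = control, 1 = intervention.
    """
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)  # at risk overall
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0  # no information to separate the groups
    chi2 = obs_minus_exp ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df


def sifi(times, events, groups, alpha=0.05):
    """Number of best intervention survivors moved to control before
    the log-rank P value is no longer below alpha.

    Returns None when the starting result is not significant
    (the negative SIFI is not handled in this sketch).
    """
    groups = list(groups)
    if logrank_p(times, events, groups) >= alpha:
        return None
    # Assumption: best survivors = longest follow-up in the intervention group.
    best_first = sorted((i for i, g in enumerate(groups) if g == 1),
                        key=lambda i: times[i], reverse=True)
    for count, i in enumerate(best_first, start=1):
        groups[i] = 0  # reassign this patient to the control group
        if logrank_p(times, events, groups) >= alpha:
            return count
    return None


# Toy data (illustrative only): 20 control deaths at months 1-20,
# 20 intervention deaths at months 21-40.
toy_times = list(range(1, 41))
toy_events = [1] * 40
toy_groups = [0] * 20 + [1] * 20
toy_sifi = sifi(toy_times, toy_events, toy_groups)
```

Because each step moves a patient rather than deleting one, every reassignment both weakens the intervention curve and strengthens the control curve, which is why this variant tends to need fewer changes than the alternatives discussed above.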

Conclusions

The results of this study suggest that many phase 3 RCTs evaluating ICI therapies are fragile, which challenges the confidence in either rejecting or concluding superiority for these drugs compared with standard treatments. Low SIFI values express uncertainty: in such cases there is no appreciable difference in how significant and nonsignificant results should be interpreted. In contrast, high SIFI values indicate robustness and can aid in binary decision-making, especially for treatments associated with high cost and toxic effects that require strong support. Interpretation of any outcome is far more complicated than significance testing alone, and the SIFI as a statistical and communication tool may serve as a better starting point for discerning between science and fiction.
  76 in total

1.  Scientists rise up against statistical significance.

Authors:  Valentin Amrhein; Sander Greenland; Blake McShane
Journal:  Nature       Date:  2019-03       Impact factor: 49.962

2.  Survival Outcomes in Patients With Previously Untreated BRAF Wild-Type Advanced Melanoma Treated With Nivolumab Therapy: Three-Year Follow-up of a Randomized Phase 3 Trial.

Authors:  Paolo A Ascierto; Georgina V Long; Caroline Robert; Benjamin Brady; Caroline Dutriaux; Anna Maria Di Giacomo; Laurent Mortier; Jessica C Hassel; Piotr Rutkowski; Catriona McNeil; Ewa Kalinka-Warzocha; Kerry J Savage; Micaela M Hernberg; Celeste Lebbé; Julie Charles; Catalin Mihalcioiu; Vanna Chiarion-Sileni; Cornelia Mauch; Francesco Cognetti; Lars Ny; Ana Arance; Inge Marie Svane; Dirk Schadendorf; Helen Gogas; Abdel Saci; Joel Jiang; Jasmine Rizzo; Victoria Atkinson
Journal:  JAMA Oncol       Date:  2019-02-01       Impact factor: 31.777

3.  A critique of the fragility index.

Authors:  David Bomze; Tomer Meirson
Journal:  Lancet Oncol       Date:  2019-09-30       Impact factor: 41.316

4.  Atezolizumab plus bevacizumab versus sunitinib in patients with previously untreated metastatic renal cell carcinoma (IMmotion151): a multicentre, open-label, phase 3, randomised controlled trial.

Authors:  Brian I Rini; Thomas Powles; Michael B Atkins; Bernard Escudier; David F McDermott; Cristina Suarez; Sergio Bracarda; Walter M Stadler; Frede Donskov; Jae Lyun Lee; Robert Hawkins; Alain Ravaud; Boris Alekseev; Michael Staehler; Motohide Uemura; Ugo De Giorgi; Begoña Mellado; Camillo Porta; Bohuslav Melichar; Howard Gurney; Jens Bedke; Toni K Choueiri; Francis Parnis; Tarik Khaznadar; Alpa Thobhani; Shi Li; Elisabeth Piault-Louis; Gretchen Frantz; Mahrukh Huseni; Christina Schiff; Marjorie C Green; Robert J Motzer
Journal:  Lancet       Date:  2019-05-09       Impact factor: 79.321

5.  First-Line Atezolizumab plus Chemotherapy in Extensive-Stage Small-Cell Lung Cancer.

Authors:  Leora Horn; Aaron S Mansfield; Aleksandra Szczęsna; Libor Havel; Maciej Krzakowski; Maximilian J Hochmair; Florian Huemer; György Losonczy; Melissa L Johnson; Makoto Nishio; Martin Reck; Tony Mok; Sivuonthanh Lam; David S Shames; Juan Liu; Beiying Ding; Ariel Lopez-Chavez; Fairooz Kabbinavar; Wei Lin; Alan Sandler; Stephen V Liu
Journal:  N Engl J Med       Date:  2018-09-25       Impact factor: 91.245

6.  Nivolumab versus Everolimus in Advanced Renal-Cell Carcinoma.

Authors:  Robert J Motzer; Bernard Escudier; David F McDermott; Saby George; Hans J Hammers; Sandhya Srinivas; Scott S Tykodi; Jeffrey A Sosman; Giuseppe Procopio; Elizabeth R Plimack; Daniel Castellano; Toni K Choueiri; Howard Gurney; Frede Donskov; Petri Bono; John Wagstaff; Thomas C Gauler; Takeshi Ueda; Yoshihiko Tomita; Fabio A Schutz; Christian Kollmannsberger; James Larkin; Alain Ravaud; Jason S Simon; Li-An Xu; Ian M Waxman; Padmanee Sharma
Journal:  N Engl J Med       Date:  2015-09-25       Impact factor: 91.245

7.  Atezolizumab versus chemotherapy in patients with platinum-treated locally advanced or metastatic urothelial carcinoma (IMvigor211): a multicentre, open-label, phase 3 randomised controlled trial.

Authors:  Thomas Powles; Ignacio Durán; Michiel S van der Heijden; Yohann Loriot; Nicholas J Vogelzang; Ugo De Giorgi; Stéphane Oudard; Margitta M Retz; Daniel Castellano; Aristotelis Bamias; Aude Fléchon; Gwenaëlle Gravis; Syed Hussain; Toshimi Takano; Ning Leng; Edward E Kadel; Romain Banchereau; Priti S Hegde; Sanjeev Mariathasan; Na Cui; Xiaodong Shen; Christina L Derleth; Marjorie C Green; Alain Ravaud
Journal:  Lancet       Date:  2017-12-18       Impact factor: 79.321

8.  Overall Survival with Combined Nivolumab and Ipilimumab in Advanced Melanoma.

Authors:  Jedd D Wolchok; Vanna Chiarion-Sileni; Rene Gonzalez; Piotr Rutkowski; Jean-Jacques Grob; C Lance Cowey; Christopher D Lao; John Wagstaff; Dirk Schadendorf; Pier F Ferrucci; Michael Smylie; Reinhard Dummer; Andrew Hill; David Hogg; John Haanen; Matteo S Carlino; Oliver Bechter; Michele Maio; Ivan Marquez-Rodas; Massimo Guidoboni; Grant McArthur; Celeste Lebbé; Paolo A Ascierto; Georgina V Long; Jonathan Cebon; Jeffrey Sosman; Michael A Postow; Margaret K Callahan; Dana Walker; Linda Rollin; Rafia Bhore; F Stephen Hodi; James Larkin
Journal:  N Engl J Med       Date:  2017-09-11       Impact factor: 91.245

9.  Avelumab versus docetaxel in patients with platinum-treated advanced non-small-cell lung cancer (JAVELIN Lung 200): an open-label, randomised, phase 3 study.

Authors:  Fabrice Barlesi; Johan Vansteenkiste; David Spigel; Hidenobu Ishii; Marina Garassino; Filippo de Marinis; Mustafa Özgüroğlu; Aleksandra Szczesna; Andreas Polychronis; Ruchan Uslu; Maciej Krzakowski; Jong-Seok Lee; Luana Calabrò; Osvaldo Arén Frontera; Barbara Ellers-Lenz; Marcis Bajars; Mary Ruisi; Keunchil Park
Journal:  Lancet Oncol       Date:  2018-09-24       Impact factor: 41.316

10.  Atezolizumab for First-Line Treatment of Metastatic Nonsquamous NSCLC.

Authors:  Mark A Socinski; Robert M Jotte; Federico Cappuzzo; Francisco Orlandi; Daniil Stroyakovskiy; Naoyuki Nogami; Delvys Rodríguez-Abreu; Denis Moro-Sibilot; Christian A Thomas; Fabrice Barlesi; Gene Finley; Claudia Kelsch; Anthony Lee; Shelley Coleman; Yu Deng; Yijing Shen; Marcin Kowanetz; Ariel Lopez-Chavez; Alan Sandler; Martin Reck
Journal:  N Engl J Med       Date:  2018-06-04       Impact factor: 91.245

