Rohit Agrawal1, Muhammad Majeed1, Bashar M Attar2, Yazan Abu Omar1, Mbachi Chimezi1, Palak Patel3, Shaheera Kamal4, Melchor Demetria2, Seema Gandhi2. 1. Department of Medicine, Cook County Health and Hospital System, IL, USA (Rohit Agrawal, Muhammad Majeed, Yazan Abu Omar, Mbachi Chimezi). 2. Division of Gastroenterology and Hepatology, Department of Medicine, Cook County Health and Hospital System, County, Chicago, IL, USA (Melchor Demetria, Seema Gandhi). 3. College of Medicine, Midwestern University, IL, USA (Palak Patel). 4. Jinnah Sindh Medical University, Rafiqi H J Rd, Karachi Cantonment Karachi, Pakistan (Shaheera Kamal).
Cirrhosis is the leading cause of ascites in the United States, accounting for around 85% of cases [1]. Ascites is one of the 3 major complications encountered in patients with cirrhosis and is associated with more hospital admissions and a higher cost of care [2,3]. It also carries significant morbidity and approximately 40% mortality within 5 years [3]. There is therefore a clear need for high-quality guidelines to assist physicians in managing ascites.
In 2012, the American Association for the Study of Liver Diseases (AASLD) published the “Management of adult patients with ascites caused by cirrhosis”, which included 49 recommendation statements based on randomized trials, non-randomized trials, meta-analyses, observational studies and consensus opinions from a panel of experts [4]. Recommendations are considered strongest when based on data from randomized controlled trials (RCTs) and their subsequent meta-analyses.
Data are conventionally considered significant when the P-value is ≤0.05. However, P-values have been criticized as overly simplistic and as a poor expression of the true significance of data [5-9]. The fragility index (FI) estimates the minimum number of events that would have to change to turn the results of a particular study from significant to nonsignificant [10]. The FI can thus help to assess the robustness of the trial results that form the foundation for these guideline recommendations. The FI was first shown to be of value in a paper published in 2014, which reported a median FI of 8 for approximately 400 RCTs, and it has since gained traction as a novel metric in several specialties, such as critical care medicine, pediatrics, urology, and spinal surgery [10-14]. For example, an article in Chest examined the strength of the trial outcomes used to create the 2018 CHEST guideline and expert panel report on antithrombotic therapy for venous thromboembolism disease [15].
Likewise, in an article published in JAMA Surgery, Tignanelli et al proposed the routine use of FI for trauma RCTs to assist physicians in making better decisions about trauma patients [16]. It has been proposed that the fragility quotient (FQ), which is the FI divided by the total sample size, should be reported along with the FI, so as to convert an absolute measure (FI) into a relative one (FQ) and give a better understanding of the fragility of trials [17].
FI and FQ have yet to be applied in the field of gastroenterology. The aim of this study was to determine the FI and FQ of the trial outcomes used to create the recommendations for the management of ascites in cirrhotic patients outlined in the AASLD guidelines, in order to gauge the strength of these recommendations.
Materials and methods
Eligibility criteria
We screened all RCTs referenced in the 2012 AASLD guidelines on the management of adult patients with ascites due to cirrhosis. Randomized trials using a 1:1 allocation ratio, parallel 2-group designs, and those with at least one dichotomous outcome qualified for inclusion. Only statistically significant outcomes were further included in the final analysis.
Data collection
The review of the guidelines was followed by a MEDLINE and PubMed database search to acquire the abstracts and full texts of eligible trials. Two independent investigators (RA and MM) screened the abstracts and full texts to identify eligible RCTs and extracted data using a pre-specified, piloted electronic data collection form. In case of any discrepancy, a third investigator (YA) was consulted to reach consensus.
The variables extracted from each RCT were the outcomes reported, total sample size, sample size of each group, the event rates of outcomes in each group, P-value, number lost to follow up, year of publication, number of centers and the trial’s Science Citation Index (SCI). SCI is a tool for identifying a researcher’s publications and the number of times each paper has been cited by other authors [18]. When available, additional data on the type of blinding (unblinded, single-blind or double-blind), RCT type (placebo-controlled or active comparator), type of intervention (pharmaceutical, surgical or other), type of funding (government or private), and whether the intention-to-treat principle was employed were also collected. Primary outcomes were prioritized for the analysis; however, when these were not specified or were not statistically significant, secondary or any other significant dichotomous outcomes were included. For each trial, the intervention effect was calculated and expressed as the number needed to treat (NNT).
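The text does not spell out how the NNT was derived, so the following is only a minimal sketch under the standard definition, NNT = 1 / |absolute risk difference| between the two arms. The function name and the example counts are hypothetical and are not taken from any of the included trials.

```python
import math


def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """NNT = 1 / |risk_control - risk_treated| for a 2-group trial.

    Assumption: the standard definition of NNT; the source does not state
    which formula or rounding rule was applied.
    """
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    absolute_risk_difference = abs(risk_control - risk_treated)
    if absolute_risk_difference == 0:
        return math.inf  # no difference between arms: NNT is unbounded
    return 1 / absolute_risk_difference


# Hypothetical trial: 20/40 events in the control arm, 10/40 in the treated arm.
nnt = number_needed_to_treat(20, 40, 10, 40)
print(nnt)  # 4.0
```

In practice the NNT is usually rounded up to the next whole patient (for example with `math.ceil`); whether that was done here is not stated.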
FI and FQ
FI represents the minimum number of patients whose status needs to be changed from a “nonevent” to an “event” to make the P-value nonsignificant (i.e., exceed 0.05). The lower the FI, the more fragile the result of an RCT [10]. The FQ, on the other hand, is a relative measure of fragility, calculated as the FI divided by the total sample size of a given trial [17]. Both measures can, however, only be applied to RCTs with dichotomous outcomes.
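As a small worked illustration of the FQ definition above, the sketch below divides an FI by the total sample size. The function name is hypothetical; the example values (FI of 17 in a trial of 6632 patients) are those reported for the largest trial in this analysis.

```python
def fragility_quotient(fragility_index, total_sample_size):
    """FQ = FI / total sample size, a relative measure of fragility."""
    if total_sample_size <= 0:
        raise ValueError("total_sample_size must be positive")
    return fragility_index / total_sample_size


# Largest included trial: FI = 17, N = 6632.
print(round(fragility_quotient(17, 6632), 4))  # 0.0026
```

The example shows why the FQ is reported alongside the FI: an FI of 17 looks robust in absolute terms, yet it corresponds to well under 1% of the enrolled patients.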
Statistical analysis
All statistical analyses were performed using SPSS version 23.0 (IBM, Chicago, IL, USA; 2012). Data from each trial were arranged in a 2-by-2 contingency table (Table 1) and the FI was calculated as demonstrated by Walsh et al [10]. Events were added to the group with fewer events and non-events were simultaneously subtracted from that group, so that the patient population remained constant. Fisher’s exact test was used to re-calculate the 2-sided P-value after each added event, until the P-value first exceeded 0.05. The number of additional events required to reach a P-value >0.05 was taken as that trial’s FI. We determined the median FI across the included trials. We analyzed the correlation between FI and sample size, P-value, number lost to follow up, NNT, year of publication, and SCI, expressed as a Spearman correlation coefficient (rs). Two trials, one with a very large sample size and the other with a high FI, were excluded from the correlation analysis, as they would have skewed it in a non-meaningful way, potentially creating false-positive relations. The point-biserial correlation coefficient (rpb) was used to express correlations between dichotomous trial characteristics (blinding vs. no blinding and single- vs. multicenter trials) and FI. Differences in FI between groups were assessed using the Mann-Whitney U test, given the non-parametric distribution of the data. We considered P-values <0.05 to be statistically significant.
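The iterative procedure described above can be sketched as follows. This is an illustrative re-implementation in Python, not the SPSS workflow used in the study. It assumes the common definition of the 2-sided Fisher P-value (the sum of the probabilities of all tables with the same margins that are no more likely than the observed one), and the function names and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2-by-2 table [[a, b], [c, d]].

    Rows are trial arms; columns are events and non-events.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first arm.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of non-events converted to events, in the arm with fewer
    events, before the two-sided Fisher P-value first exceeds alpha."""
    if events_a <= events_b:
        e_small, n_small, e_other, n_other = events_a, n_a, events_b, n_b
    else:
        e_small, n_small, e_other, n_other = events_b, n_b, events_a, n_a
    added = 0
    while True:
        p = fisher_exact_two_sided(e_small + added, n_small - e_small - added,
                                   e_other, n_other - e_other)
        if p > alpha:
            return added
        if e_small + added == n_small:
            raise ValueError("no non-events left to convert")
        added += 1


# Hypothetical toy trial: 0/5 events in one arm versus 5/5 in the other.
print(fragility_index(0, 5, 5, 5))  # 2
```

Under this sketch, a trial whose reported result was significant but whose Fisher P-value already exceeds 0.05 receives an FI of 0, which is one way an FI of 0 can arise.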
Of the 214 references used to develop the guidelines, 57 were RCTs. Twenty-one RCTs had a 1:1 parallel trial design, dichotomous outcomes and significant P-values. Fig. 1 demonstrates the flow of articles through the screening process.
Figure 1
Flow of articles through screening and reasons for exclusion
Characteristics of trial and outcomes
The median sample size was 80 (interquartile range [IQR] 60-106). Twelve trials (57%) reported the number of patients lost to follow up; among these, the median number of patients lost was 2 (IQR 0-6.5). The median number of citations was 335 (IQR 142-417). Nine trials (42%) were blinded, 5 trials (23%) were unblinded, and 7 trials (33%) did not report blinding. Nine trials (42%) were multicenter, 10 (47%) were single-center, and 2 (10%) did not report the number of centers. Twelve trials (57%) reported an intention-to-treat analysis. The median year of publication was 2000 (IQR 1994-2007), with 11 trials (52%) published before 2000.
The median FI was 1 (IQR 0.5-6), with a range of 0-17. A histogram showing the frequencies of FI scores is presented in Fig. 2. Six trials had an FI of 1, while 5 trials had an FI of 0. The median FQ was 0.07 (IQR 0.008-0.166). There was no significant correlation between FI and sample size (rs=0.357), P-value (rs=-0.299), number lost to follow up (rs=0.355), SCI (rs=0.347), year of publication (rs=-0.085), blinding (rpb=-0.18), or number of centers (rpb=-0.10); none of the corresponding P-values was significant. The only significant correlation was between FI and NNT (rs=-0.549; P=0.015). A scatter plot relating FI to sample size is presented in Fig. 3. Table 2 presents the complete FI, FQ and NNT data, along with other characteristics, for all 21 included trials.
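The coefficients reported above were obtained in SPSS. For readers who want to reproduce the same kind of analysis, the sketch below computes a Spearman coefficient as the Pearson correlation of average ranks, and a point-biserial coefficient as the Pearson correlation with a 0/1-coded characteristic. All function names and all data values are invented for illustration and are not the study data.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rs: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example values (NOT the study data).
fi_values = [0, 1, 1, 2, 6, 9]
sample_sizes = [40, 60, 55, 80, 100, 117]
blinded = [0, 0, 1, 1, 0, 1]  # 0/1 coding of a dichotomous characteristic

print(spearman(fi_values, sample_sizes))  # rank correlation, FI vs. sample size
print(pearson(blinded, fi_values))        # point-biserial correlation with FI
```

Testing whether such a coefficient differs significantly from zero requires an additional step (for example a t approximation or a permutation test), which is omitted here.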
Figure 2
Distribution of fragility index values from 21 trials. The median number of patients whose status would have to change from a non-event to an event to change a statistically significant result to a non-significant result was 1 (interquartile range 0.5-6)
Figure 3
Scatterplot of fragility index and sample size
Table 2
List of all included studies (N=21) and basic characteristics
When segregated by sample size, the median FI for the 11 trials (52%) with a sample size of less than 80 was 1 (IQR 0-2), compared to a median FI of 4.5 (IQR 0.75-9.75) for trials with a sample size of 80 and above (P=0.106). For the 11 trials published in 2000 and before, the median FI was 1 (IQR 0-6), compared to a median FI of 2 (IQR 0.75-6.75) for trials published after 2000 (P=0.47). Likewise, the median FI was 1 (IQR 0-2) for the 11 trials cited 334 times or less, compared to 4 (IQR 0.75-7.5) for trials with more than 334 citations (P=0.22). Trials with an NNT of 6 and below had a median FI of 1.5 (IQR 0.75-6), compared to 1 for those with an NNT greater than 6 (P=0.22) (Table 3). However, none of these associations were found to be significant (Table 4).
Table 3
Median levels of the fragility index (FI) according to the characteristics of the randomized clinical trials (N=21 trials included)
Table 4
Correlation of trial characteristics with fragility index
Discussion
Our investigation demonstrated a median FI of 1 and a median FQ of 0.070 across the 21 trials, suggesting that, for a typical trial, a change in the outcome of a single patient would render its significant finding nonsignificant. The median number of patients lost to follow up reported in the 12 trials was 2. It is also worth mentioning that the FI was less than or equal to the number lost to follow up in 6 of the 21 trials. In these trials, outcome data from the patients lost to follow up could by themselves have changed whether the result was statistically significant.
Specific recommendations are influenced by a combination of results from several studies, some with a higher FI than others, but the FI for most of these studies was low. Recommendations 8, 9, 12, 24, 27, 33-37 and 39-41 are based on the RCTs included in our analysis. Recommendation 8, which suggests the use of baclofen, is based on an RCT with an FI of 9 and 12 patients lost to follow up. Likewise, recommendations 24, 27 and 33 hinge on RCTs with an FI lower than the number of patients lost to follow up. These numbers bring into question the strength of the recommendations. On the other hand, our analysis showed that the recommendations concerning the use of antibiotics for spontaneous bacterial peritonitis (SBP) in patients with gastrointestinal bleeding and in those with previous episodes of SBP (Recommendations 34, 35) are supported by slightly stronger RCTs, with FIs of 12 and 3, respectively, and no patients lost to follow up in either study.
In an analysis of RCTs in the field of urology, Narayan et al reported a median FI of 3 (IQR 1-4.5), with the number of patients lost to follow up exceeding the FI in 67.5% of the trials [11]. Likewise, a study examining RCTs in spine surgery reported an FI of 2, with the number of patients lost to follow up exceeding the FI in 25% of the trials [14]. These results are in line with our findings.
If a large number of patients are lost to follow up, the results should be interpreted with caution, as the data are likely to be very fragile. Although the trials we analyzed were restricted to 2-group parallel designs with dichotomous outcomes, our findings still suggest that traditional statistical measures, such as P-values and 95% confidence intervals, are not reliable in isolation for evaluating the effectiveness of interventions in RCTs [16].
Finally, we found no correlation between FI and the other variables included in the analysis, except for NNT. This supports the view that FI is an independent measure of robustness that is not affected by other parameters, some of which can affect the P-value. Studies in the fields of urology and spine surgery have, however, reported a positive correlation between FI and sample size, while other studies have demonstrated a weak positive correlation. A study in pediatrics published in 2017 reported no correlation between FI and sample size [11-15]. The reported relationships between FI and sample size, along with other parameters, are inconsistent across studies, and a concrete understanding is yet to be established. With the exception of one very large trial (N=6632 and FI=17) [22], the remaining trials [19-21,23-39] in our analysis had sample sizes in a narrow range (27-117) that does not allow any assessment of the possible relationship between FI and sample size. The absence of a correlation between FI and the other parameters in our analysis may therefore reflect a true lack of association, or may be attributable to the limited number of trials and their narrow range of sample sizes.
Our methodology included duplicate data extraction by 2 independent investigators. We placed no restrictions on the interventions and outcomes included, and considered either primary or secondary outcomes.
Given the nature of FI analysis, we included only randomized trials with dichotomous outcomes and significant P-values, which restricts the generalization of our results to all the RCTs referenced in the 2012 guidelines. The median sample size of the RCTs in our investigation was 80, which could be the reason why a positive correlation could not be established between FI and other parameters. The research community is yet to establish a specific threshold for interpreting FI, comparable to the P-value threshold of 0.05. Increased use of this metric is therefore needed to better understand the true nature and value of FI.
In conclusion, the results of RCTs conducted in the field of cirrhosis-related ascites hinge on a very small number of events, which makes these studies fragile. The trials become weaker still when the number of patients lost to follow up exceeds the FI, as seen in several of the studies we examined. Based on our investigation, the evidence underlying the guidelines on ascites exhibits high fragility, and this should be taken into account when making clinical decisions. When interpreting the outcomes of RCTs, FI should not be used in isolation, but rather coupled with other statistical measures, such as the P-value, 95% confidence intervals and sample size, to determine the robustness of the trials.
We recommend the incorporation of FI and FQ in gastroenterological trials to better understand the true strength of the data outcomes.
What is already known:
Recommendations from guidelines are considered strongest when based on data from randomized controlled trials (RCTs)
Data are defined as significant on the basis of a P-value, which has been criticized for being simplistic
The fragility index (FI) and fragility quotient (FQ) have been proposed as novel metrics to evaluate the robustness of RCTs
What the new findings are:
When evaluating the strength of a few recommendations of guidelines on the management of cirrhosis, we found that the RCTs hinge on a very small number of events, which makes these recommendations extremely fragile
FI/FQ should be used along with the P-value when evaluating the strength of these recommendations