Overconfident results with the bivariate random effects model for meta-analysis of diagnostic accuracy studies.

Luis Furuya-Kanamori¹, Eletherios Meletis², Chang Xu³, Polychronis Kostoulas², Suhail Ar Doi³.

Keywords: Bayesian; diagnosis; latent class; split component synthesis

Year: 2022; PMID: 35416432; PMCID: PMC9321862; DOI: 10.1111/jebm.12467

Source DB: PubMed; Journal: J Evid Based Med; ISSN: 1756-5391

Meta‐analyses of diagnostic accuracy studies are a fundamental component of evidence‐based medicine and are used extensively in medical imaging and the clinical laboratory. Techniques developed specifically to combine independent studies of diagnostic accuracy and provide pooled estimates of sensitivity (Se), specificity (Sp), and positive (pLR) and negative (nLR) likelihood ratios are relatively new. In 2001, Rutter and Gatsonis proposed the hierarchical summary receiver operating characteristic (HSROC) model, and in 2004 Macaskill described an empirical Bayes approach. Soon after, in 2005, Reitsma et al. proposed the bivariate random effects model, which has been widely adopted and is the most commonly used method for diagnostic meta‐analysis. However, as pointed out by Diaz, the statistical performance of the bivariate model has not been scrutinized. Diaz found that the performance of the bivariate model deteriorates as between‐study heterogeneity increases and the number of studies decreases. Our simulation studies found similar results: with moderate heterogeneity (τ² = 1), the coverage probabilities of Se, Sp, and the diagnostic odds ratio (DOR) with the bivariate model dropped below the nominal level. Diagnostic accuracy studies usually favor sensitivity over specificity, or vice versa, which leads to 2 × 2 tables in which one or more cells have low or zero counts. Extreme DORs are therefore more common in diagnostic than in intervention meta‐analyses, and they produce high levels of heterogeneity despite the wide confidence intervals of the individual studies.
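For reference, the standard definitions of these measures in terms of the 2 × 2 cell counts (true positives TP, false positives FP, false negatives FN, true negatives TN) are summarized below. The DOR line shows why a zero FP or FN cell is troublesome (the ratio becomes undefined), and the last identity is the decomposition of ln(DOR) into logit(Se) and logit(Sp) on which a split component approach can build.

```latex
\begin{aligned}
\mathrm{Se} &= \frac{TP}{TP + FN}, &
\mathrm{Sp} &= \frac{TN}{TN + FP}, \\
\mathrm{pLR} &= \frac{\mathrm{Se}}{1 - \mathrm{Sp}}, &
\mathrm{nLR} &= \frac{1 - \mathrm{Se}}{\mathrm{Sp}}, \\
\mathrm{DOR} &= \frac{\mathrm{pLR}}{\mathrm{nLR}} = \frac{TP \cdot TN}{FP \cdot FN}, &
\ln \mathrm{DOR} &= \operatorname{logit}(\mathrm{Se}) + \operatorname{logit}(\mathrm{Sp}).
\end{aligned}
```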

CASE STUDY: ELISA FOR DETECTING RABIES ANTIBODIES

We report the results of a meta‐analysis of five studies estimating the operating characteristics of an enzyme‐linked immunosorbent assay (ELISA), compared against the reference standard, the rapid fluorescent focus inhibition test (RFFIT), for detecting the immune response (i.e., seropositive or seronegative status) after rabies vaccination. Study sample sizes ranged from 28 to 990. Although no study had ‘extreme’ values (Se range 84.2–100; Sp range 87.1–100) (Table 1), large between‐study heterogeneity (τ² = 8.3) was observed because of the low frequency of false positives and false negatives, including cells with zero counts. The pooled estimates were calculated using four models.

TABLE 1

Data from the five studies included in the meta‐analysis

Author, year	Sample size	TP	FP	FN	TN	Sensitivity, % (95% CI)	Specificity, % (95% CI)
Feyssaguet 2007	655	191	3	2	459	99.0 (96.3–99.9)	99.4 (98.1–99.9)
Muhamuda 2007	990	740	0	0	250	100 (99.5–100)	100 (98.5–100)
Pandit 1991	28	16	1	3	8	84.2 (60.4–96.6)	88.9 (51.8–99.7)
Welch 2009	82	32	2	2	46	94.1 (80.3–99.3)	95.8 (85.8–99.5)
Zhao 2019	428	374	4	23	27	94.2 (91.4–96.3)	87.1 (70.2–96.4)

TP, true positive; FP, false positive; FN, false negative; TN, true negative; CI, confidence interval.
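The per-study intervals in Table 1 appear consistent with exact (Clopper-Pearson) binomial intervals; for example, the lower limit of 99.5 for 740/740 equals 0.025^(1/740). The sketch below recomputes Se = TP/(TP + FN), Sp = TN/(TN + FP) and their 95% intervals from the Table 1 counts under that assumption, using only the Python standard library. The helper names are illustrative and do not come from the article.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def bisect_decreasing(g, target, iters=60):
    """Find p in (0, 1) with g(p) = target, for a function g decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    # Lower limit solves P(X >= x | p) = alpha/2, i.e. P(X <= x - 1 | p) = 1 - alpha/2
    lower = 0.0 if x == 0 else bisect_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else bisect_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Table 1 as (study, TP, FP, FN, TN)
STUDIES = [
    ("Feyssaguet 2007", 191, 3, 2, 459),
    ("Muhamuda 2007", 740, 0, 0, 250),
    ("Pandit 1991", 16, 1, 3, 8),
    ("Welch 2009", 32, 2, 2, 46),
    ("Zhao 2019", 374, 4, 23, 27),
]

results = {}
for name, tp, fp, fn, tn in STUDIES:
    se, sp = tp / (tp + fn), tn / (tn + fp)
    se_ci, sp_ci = clopper_pearson(tp, tp + fn), clopper_pearson(tn, tn + fp)
    results[name] = (se, se_ci, sp, sp_ci)
    print(f"{name}: Se {100 * se:.1f} ({100 * se_ci[0]:.1f}-{100 * se_ci[1]:.1f}), "
          f"Sp {100 * sp:.1f} ({100 * sp_ci[0]:.1f}-{100 * sp_ci[1]:.1f})")
```

A study with no false negatives or false positives, such as Muhamuda 2007, still yields a finite interval whose upper limit is 100, whereas the small Pandit 1991 study produces wide intervals.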

The four models were:
1. The bivariate random effects model.
2. The extension of the bivariate model proposed by Chu and Cole, a generalized linear mixed model that accommodates sparse data and is recommended when cell counts are low.
3. The split component synthesis (SCS) method, which summarizes the study‐specific ln(DOR) using the inverse variance heterogeneity model and then splits the summary ln(DOR) into its component parts (i.e., logit(Se) and logit(Sp)).
4. The hierarchical summary receiver operating characteristic model in a Bayesian latent class meta‐analysis framework (Bayes‐HSROC), which does not assume a perfect reference standard.

The analyses were conducted in Stata MP version 14.1, using the metandi module for the bivariate models and the diagma module for the SCS method. The Bayes‐HSROC model was implemented in the R programming language using the rjags and runjags packages. The Bayes‐HSROC model applies Bayesian inference, in which the posterior distribution of the parameters of interest depends on the likelihood function and on the prior information provided. The likelihood function was specified as a statistical model for the observed data (Supplementary Material S1). Noninformative priors were used, meaning that no external information was provided to the model. Parameter estimates were based on summaries of 500,000 iterations of two chains after a burn‐in phase of 10,000 iterations. Convergence was assessed with time series plots: the two chains converged to the same solution, and the autocorrelation plots dropped off quickly (Supplementary Material S2).

The results of the four models are presented in Table 2. Point estimates and intervals (confidence intervals, or credible intervals for the Bayes‐HSROC model) were more conservative with the SCS method and the Bayes‐HSROC model than with either bivariate model. The Se was 98.4 (95% CI 90.2–99.8) with the bivariate model, compared with 95.6 (95% CI 62.6–99.6) with the SCS method and 92.7 (95% credible interval 67.4–99.8) with the Bayes‐HSROC model; the results for Sp were similar.
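A minimal sketch of the first step of the SCS method, pooling the study-specific ln(DOR) with the inverse variance heterogeneity (IVhet) model, is given below. Several details are assumptions rather than statements from the article: 0.5 is added to every cell of a table that contains a zero, the within-study variance of ln(DOR) is the usual 1/TP + 1/FP + 1/FN + 1/TN, and τ² is the DerSimonian-Laird moment estimate. The IVhet point estimate equals the inverse-variance weighted mean, while its variance adds τ² to each study's variance before applying the squared normalized weights. The subsequent split of the pooled ln(DOR) into logit(Se) and logit(Sp) is not reproduced here.

```python
import math


def log_dor(tp, fp, fn, tn, cc=0.5):
    """ln(DOR) and its variance; cc is added to all cells only if one is zero (assumed)."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    return math.log(tp * tn / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def ivhet(effects, variances, z=1.959964):
    """Inverse variance heterogeneity pooling of study effects.

    Point estimate: inverse-variance weighted mean.
    Variance: sum over studies of (w_i / W)^2 * (v_i + tau^2), with tau^2
    from the DerSimonian-Laird moment estimator.
    """
    w = [1 / v for v in variances]
    total_w = sum(w)
    est = sum(wi * yi for wi, yi in zip(w, effects)) / total_w
    q = sum(wi * (yi - est) ** 2 for wi, yi in zip(w, effects))
    c = total_w - sum(wi ** 2 for wi in w) / total_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    var = sum((wi / total_w) ** 2 * (vi + tau2) for wi, vi in zip(w, variances))
    se = math.sqrt(var)
    return est, se, tau2, (est - z * se, est + z * se)


# Table 1 counts as (TP, FP, FN, TN)
tables = [(191, 3, 2, 459), (740, 0, 0, 250), (16, 1, 3, 8),
          (32, 2, 2, 46), (374, 4, 23, 27)]
effects, variances = zip(*(log_dor(*t) for t in tables))
est, se, tau2, (lo, hi) = ivhet(effects, variances)
print(f"pooled DOR {math.exp(est):.1f} ({math.exp(lo):.1f}-{math.exp(hi):.1f}), "
      f"tau^2 {tau2:.2f}")
```

Run on the Table 1 counts, this sketch gives a pooled DOR of about 451 (95% CI roughly 16.7 to 12,200) and τ² of about 8.25, close to the SCS DOR in Table 2 and the τ² = 8.3 reported for the case study. That agreement suggests, but does not establish, that the diagma module uses similar conventions.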

TABLE 2

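Comparison of pooled estimates (95% confidence interval; 95% credible interval for the Bayesian HSROC) using the bivariate models, split component method, and Bayesian HSROC. Sensitivity and specificity are in %.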

	Bivariate model	Bivariate model (Chu and Cole extension)	Split component method	Bayesian HSROC ^a
Sensitivity	98.4 (90.2–99.8)	98.5 (88.6–99.8)	95.6 (62.6–99.6)	92.7 (67.4–99.8)
Specificity	98.3 (85.4–99.8)	98.5 (82.5–99.9)	95.4 (61.9–99.6)	92.8 (75.5–99.5)
pLR	58.86 (5.93–584.63)	65.60 (4.78–899.62)	20.91 (2.04–214.02)	12.88 (2.75–199.60)
nLR	0.02 (0.00–0.11)	0.02 (0.00–0.13)	0.05 (0.01–0.48)	0.08 (0.01–0.43)
DOR	3608.5 (62.3–208848.8)	4413.1 (42.2–461117.2)	451.1 (16.7–12205.6)	163.7 (6.4–99301.0)
AUC	0.98 (0.89–0.99) ^b	0.99 (0.87–0.99) ^b	0.96 (0.80–0.99)	–

HSROC, hierarchical summary receiver operating characteristic; pLR, positive likelihood ratio; nLR, negative likelihood ratio; DOR, diagnostic odds ratio; AUC, area under the curve.

^a Estimated sensitivity 99.9 (99.5–100) and specificity 99.7 (99.2–100) of the reference standard (RFFIT).

^b AUC estimated from the DOR.
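The likelihood ratios and DOR in Table 2 are tied to the pooled Se and Sp through pLR = Se/(1 − Sp), nLR = (1 − Se)/Sp and DOR = pLR/nLR. The sketch below applies these formulas to the rounded point estimates in Table 2 as an illustrative consistency check; it is not how each model itself derives these measures. Because Sp is close to 1, small rounding differences in Sp move pLR and DOR noticeably.

```python
def derived_measures(se, sp):
    """pLR, nLR and DOR implied by a sensitivity/specificity pair (as proportions)."""
    plr = se / (1 - sp)      # positive likelihood ratio
    nlr = (1 - se) / sp      # negative likelihood ratio
    dor = plr / nlr          # equals [se / (1 - se)] * [sp / (1 - sp)]
    return plr, nlr, dor


# Pooled point estimates from Table 2, converted from % to proportions
TABLE2 = {
    "Bivariate model": (0.984, 0.983),
    "Bivariate model (Chu and Cole extension)": (0.985, 0.985),
    "Split component method": (0.956, 0.954),
    "Bayesian HSROC": (0.927, 0.928),
}

for model, (se, sp) in TABLE2.items():
    plr, nlr, dor = derived_measures(se, sp)
    print(f"{model}: pLR {plr:.2f}, nLR {nlr:.3f}, DOR {dor:.1f}")
```

For the Bayesian HSROC column this gives pLR ≈ 12.9, nLR ≈ 0.08 and DOR ≈ 163.7. For the bivariate model the recomputed DOR is about 3,556 rather than 3,608.5, a gap consistent with Sp having been rounded to one decimal place.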

SIMULATION STUDY

The five studies included in the case study were simulated with the sample size fixed to that of each original study and the true Se and Sp fixed at 0.96 (based on the pooled estimates in Table 2). The numbers of diseased (dis) and nondiseased (ndis) individuals were drawn from a binomial distribution using each study's sample size and its observed prevalence of seropositivity and seronegativity, respectively. The four cell counts (tp, fp, fn, tn) were then derived from dis and ndis and the true Se and Sp. Next, the four counts were divided by a scale parameter f (minimum value = 1), derived from a transformation of a hypothetically imputed bias variance, to introduce systematic error. Both random and systematic error were then introduced by regenerating a simulated Se and Sp from beta distributions with parameters (tp/f, fn/f) and (tn/f, fp/f), respectively. The studies were then generated and meta‐analyzed; 1000 meta‐analyses were simulated in each of 10 runs, with run 1 representing random error alone (f = 1) and runs 2–10 having increasing levels of between‐study heterogeneity. The Stata code for the data generation is provided in Supplementary Material S3.

For each level of heterogeneity, the summary DOR, Se, and Sp estimated by the extension of the bivariate model proposed by Chu and Cole and by the SCS method were compared on the basis of mean absolute estimation error squared (bias squared), mean squared error (MSE), confidence interval width, and coverage probability. The distributions of Se, Sp, and τ² generated in each of the 10 runs are reported in Supplementary Material S4. The bivariate model did not converge in 19% of the simulated meta‐analyses; these were excluded from the performance analyses of both the bivariate model and the SCS method. The simulation study revealed that the SCS method's DOR, Se, and Sp estimates were less biased (Figure 1A) and had smaller MSE (Figure 1B) than the bivariate model estimates.
As heterogeneity increased, the 95% confidence intervals became wider with the bivariate model (Figure 1C), yet they had lower coverage probability than those of the SCS method (Figure 1D). The performance of the models could not be compared once moderate or extensive heterogeneity was introduced, because the bivariate model did not converge in more than 50% of the meta‐analyses.
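The data-generating steps described above can be sketched in Python as follows. This is not the authors' Stata code (Supplementary Material S3); it is an illustration under several assumptions: the number of nondiseased participants is taken as the sample size minus the number diseased, the scale parameter f is supplied directly (the article derives it from an imputed bias variance through a transformation not described here), and each study's final counts are drawn binomially from the regenerated Se and Sp. Study sizes and seropositive prevalences come from Table 1; the names STUDIES and simulate_meta_analysis are hypothetical.

```python
import random

# (sample size, seropositive prevalence) per study from Table 1: prevalence = (TP + FN) / n
STUDIES = [(655, 193 / 655), (990, 740 / 990), (28, 19 / 28), (82, 34 / 82), (428, 397 / 428)]
TRUE_SE = TRUE_SP = 0.96
EPS = 1e-9  # keeps beta parameters strictly positive if a group happens to be empty


def binomial(n, p, rng):
    """Binomial(n, p) draw as a sum of Bernoulli trials (standard library only)."""
    return sum(rng.random() < p for _ in range(n))


def simulate_meta_analysis(f, rng):
    """Simulate one meta-analysis; f >= 1 is the scale parameter (f = 1: random error only).

    Returns one (tp, fp, fn, tn) table per study.
    """
    tables = []
    for n, prevalence in STUDIES:
        dis = binomial(n, prevalence, rng)
        ndis = n - dis  # assumption: nondiseased = remainder of the sample
        # Expected cell counts under the true Se and Sp, divided by f
        a_se, b_se = dis * TRUE_SE / f, dis * (1 - TRUE_SE) / f
        a_sp, b_sp = ndis * TRUE_SP / f, ndis * (1 - TRUE_SP) / f
        # Study-level Se and Sp regenerated from beta distributions
        se = rng.betavariate(max(a_se, EPS), max(b_se, EPS))
        sp = rng.betavariate(max(a_sp, EPS), max(b_sp, EPS))
        # Assumed final step: draw the study's counts from the perturbed Se and Sp
        tp = binomial(dis, se, rng)
        tn = binomial(ndis, sp, rng)
        tables.append((tp, ndis - tn, dis - tp, tn))
    return tables


rng = random.Random(2022)
for f in (1, 10, 50):  # illustrative scale values only
    print(f, simulate_meta_analysis(f, rng))
```

Dividing both beta parameters by f leaves the mean of each beta distribution unchanged but inflates its variance, so larger values of f spread the study-level Se and Sp further around 0.96; this is how successive runs acquire increasing between-study heterogeneity.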

FIGURE 1

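Performance comparison of the diagnostic odds ratio (triangle), sensitivity (circle), and specificity (square) between the split component synthesis method (blue) and the bivariate model (red) at different levels of heterogeneity: (A) bias squared, (B) mean squared error, (C) width of the 95% confidence interval, and (D) coverage probability.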
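The four performance measures shown in Figure 1 can be computed from replicate meta-analyses as in the sketch below. The article does not give formulas, so this assumes the usual definitions: bias squared is the squared difference between the mean estimate and the true value, MSE is the mean squared difference between each estimate and the true value, interval width is the mean of upper minus lower limits, and coverage is the proportion of intervals containing the true value. The scale on which each parameter is evaluated (for example ln(DOR), or proportions for Se and Sp) is left to the caller, and the toy replicates are hypothetical.

```python
def performance(estimates, lowers, uppers, true_value):
    """Squared bias, MSE, mean CI width and coverage over replicate meta-analyses."""
    k = len(estimates)
    mean_est = sum(estimates) / k
    bias_sq = (mean_est - true_value) ** 2
    mse = sum((e - true_value) ** 2 for e in estimates) / k
    width = sum(u - l for l, u in zip(lowers, uppers)) / k
    coverage = sum(l <= true_value <= u for l, u in zip(lowers, uppers)) / k
    return bias_sq, mse, width, coverage


# Three hypothetical replicates of a ln(DOR) estimate with true value 3.0
est = [2.8, 3.1, 3.6]
lo = [2.0, 2.5, 3.2]
hi = [3.6, 3.7, 4.0]
print(performance(est, lo, hi, 3.0))
```

Because MSE equals bias squared plus the variance of the estimates, a method can have little bias yet a large MSE; and wider intervals do not guarantee better coverage when they are centered on a biased estimate, as with the bivariate model in Figure 1C and 1D.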

DISCUSSION

In our case study, with a small number of studies and large heterogeneity, the confidence/credible intervals differed markedly between methods: they were very narrow with the bivariate models and wide with the SCS method and the Bayes‐HSROC model. The simulation study showed a considerable decline in the performance of the bivariate model when heterogeneity was introduced. It is therefore very likely that the bivariate model generated spuriously overconfident results in the case study, and would do so in other studies, because of overdispersion of the data relative to the model. Between‐study heterogeneity is the norm in meta‐analyses of diagnostic accuracy studies: in a methodological review, Dinnes et al. found statistical heterogeneity in 79% of diagnostic meta‐analyses, so pooling methods must maintain their performance when heterogeneity is present. This study therefore suggests that the newer SCS method can resolve the overdispersion problem of the bivariate model and should be prioritized in research. Alternatively, a Bayesian approach can be used, especially when the reference method is imperfect. In conclusion, the bivariate model suffers from the same overdispersion problem as the random effects model in standard meta‐analysis, and the SCS method seems to be a viable alternative. The SCS method also avoids nonconvergence and is not unduly affected by varying implicit thresholds, given that it starts with synthesis of the DOR. Further evaluation is therefore recommended to independently verify these findings, so that the necessary recommendations can be made to the research community.

FUNDING

LFK was supported by Australian National Health and Medical Research Council Early Career Fellowships (APP1158469).

DATA SHARING

The data that support the findings of this study are available in the Supplementary Material of this article.
Supplementary Material S1. R/JAGS code to run the Bayes‐HSROC model
Supplementary Material S2. Time series plots for (A) pooled sensitivity, (B) pooled specificity, (C) reference method sensitivity, (D) reference method specificity
Supplementary Material S3. Stata code for data simulation
Supplementary Material S4. Summary output of the simulation study

REFERENCES (16 in total)

1. Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001 Oct 15.

2. Dinnes J, Deeks J, Kirby J, Roderick P. A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess. 2005 Mar.

3. Begg CB. Systematic reviews of diagnostic accuracy studies require study by study examination: first for heterogeneity, and then for sources of heterogeneity. J Clin Epidemiol. 2005 Sep.

4. Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol. 2006 Sep 28.

5. Toft N, Innocent GT, Gettinby G, Reid SWJ. Assessing the convergence of Markov Chain Monte Carlo methods: an example from evaluation of diagnostic tests in absence of a gold standard. Prev Vet Med. 2007 Feb 9.

6. Survey revealed a lack of clarity about recommended methods for meta-analysis of diagnostic accuracy data.

7. A new method for synthesizing test accuracy data outperformed the bivariate method.

8. Meta-analysis in evidence-based healthcare: a paradigm shift away from random effects is overdue.

9. Advances in the meta-analysis of heterogeneous clinical trials I: The inverse variance heterogeneity model.
