An overview of systematic reviews of diagnostic tests accuracy.

Abstract

According to the Cochrane Collaboration, the Cochrane handbook for diagnostic test accuracy reviews (DTAR) is currently in development. This implies that the methodology of systematic reviews (SR) of diagnostic test accuracy is still a matter of debate. At this point, comparing the SR methodology for interventions with that for diagnostics should help in understanding DTAR.

Keywords: Clinical trial; Diagnostic test; Meta-analysis as topic; Review literature as topic

Year: 2014 PMID: 25209601 PMCID: PMC4183058 DOI: 10.4178/epih/e2014016

Source DB: PubMed Journal: Epidemiol Health ISSN: 2092-7193

INTRODUCTION

Recently, as comparative-effectiveness research (CER) and health-technology assessment (HTA) have become widely implemented, the need for systematic reviews (SR) with meta-analysis has grown [1-3]. In particular, SR methodology for randomized controlled trials (RCT), which compare the effectiveness of drug or procedural interventions, has been established in the Cochrane Handbook for Systematic Reviews of Interventions by Higgins et al. [4]. However, CER and HTA involve analyses of diagnostic test accuracy as well as of interventional trials. In practice, much of modern medicine consists of diagnostics: the right treatment can be administered, and positive results obtained (in survival rates, for instance), only with the help of an accurate diagnosis. Accordingly, SR methodology for diagnostic test accuracy (DTA) is clearly required. However, SR methodology for DTA is currently in development, as indicated on the Cochrane Collaboration website [5,6]. This article therefore gives an overview of DTA-related concepts and issues in the SR methodology proposed by the Cochrane Collaboration.

STEPS OF SYSTEMATIC REVIEWS

Table 1 compares the concepts and indicators at each procedural step of an SR of the effectiveness of interventional trials with those of an SR of the accuracy of diagnostic tests. The sections below discuss these steps in the order of the table.

Table 1.

Comparison of issues related to systematic reviews (SR) of intervention trials and diagnostic tests

Step	Issues	SR of intervention trials	SR of diagnostic tests
Ask	Making questions	PICO	PPP-ICP-TR
Acquire	Main keyword	Intervention	Index test & target disorder
Acquire	Searching	Filtering	No filtering
Assess	Quality level	ROB	QUADAS-2
Assess	Extracting results	Proportion of response (%)	Sensitivity & specificity
Assess	New index	NNT	DOR
Assess	Summary figures	Forest plot	Coupled forest plot & SROC
Analysis	Heterogeneity index	I²	(SROCs by prediction region)
Analysis	On homogeneous	Fixed effect model	(Moses-Littenberg SROC)
Analysis	On heterogeneous	Random effect model	Hierarchical models
Report	Standard for original article	CONSORT	STARD
Report	Standard for summary results	PRISMA	Not available
Report	Publication bias	Funnel plot	Not available

ROB, risk of bias; QUADAS-2, Quality Assessment of Diagnostic Accuracy Studies-2; NNT, number needed to treat; DOR, diagnostic odds ratio; SROC, summary receiver operating characteristic curve; CONSORT, Consolidated Standards of Reporting Trials; STARD, Standards for Reporting of Diagnostic Accuracy; PRISMA, Preferred Reporting Items for Systematic reviews and Meta-Analyses.

Making the answerable questions

The first step in an SR is to convert the problem at hand into answerable questions. The patient or population, intervention, comparator, and outcomes (PICO) method is proposed as a viable tool for this process in SR of interventional trials [7]. An SR of DTA, however, begins by addressing the 8 aspects of ‘PPP-ICP-TR’ [6]. The first ‘P’ refers to patient characteristics and thus coincides with the ‘P’ of PICO. The second ‘P’ refers to presentation, which concerns a patient’s major symptoms, and the third ‘P’ refers to prior tests, which are the tests already used in the patient’s diagnosis. The ‘I’ refers to the index test, which is the test under evaluation in the review, while the ‘C’ refers to the comparator test, which is the conventional procedure against which the index test is compared. Accordingly, the ‘IC’ for DTA may correspond to the ‘IC’ for interventional trials. The last ‘P’ stands for ‘purpose’, which may be divided broadly into 3 types: 1) substituting the index test for the conventional comparator test (replacement), 2) conducting comparator tests on those who tested positive in index tests, so as to obtain a more accurate diagnosis (triage), and 3) conducting index tests on those who tested negative in comparator tests, so as to reduce false-negative results (add-on). The ‘T’ stands for the ‘target disorder’ of a given SR and corresponds to the ‘O’ of the PICO method. The final ‘R’ refers to the ‘reference standard’, that is, the gold standard. A highly diverse range of information must therefore be examined in order to conduct an SR of DTA. Notably, the 4 categories of test (the prior test, index test, comparator test, and reference standard) must be clearly differentiated according to context.
For instance, consider an SR on the choice between breast ultrasonography and breast magnetic resonance imaging (MRI) as an additional examination for diagnosing breast cancer in women with dense breasts on mammography. Here the index test, comparator test, prior test, and reference standard would be breast MRI, breast ultrasonography, mammography, and pathologic results of breast tissue, respectively.
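The eight PPP-ICP-TR elements can be written down as a simple record, which makes it harder to confuse the four kinds of test. The following is only a minimal sketch: the class and field names are hypothetical, the wording of the patient and presentation fields is one reading of the breast-imaging example, and the purpose is left unset because the text does not state it for that example.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class Purpose(Enum):
    """The three roles of an index test listed in the text."""
    REPLACEMENT = "replacement"
    TRIAGE = "triage"
    ADD_ON = "add-on"


@dataclass(frozen=True)
class DtaQuestion:
    """One review question framed with the eight PPP-ICP-TR elements."""
    patients: str                # P: patient characteristics
    presentation: str            # P: presenting symptoms or findings
    prior_tests: str             # P: tests already performed
    index_test: str              # I: the test under evaluation
    comparator_test: str         # C: the conventional alternative
    purpose: Optional[Purpose]   # P: replacement, triage, or add-on
    target_disorder: str         # T: the condition to be detected
    reference_standard: str      # R: the gold standard


# The breast-imaging example from the text. Field wording is illustrative;
# purpose is None because the text does not specify it for this example.
breast_question = DtaQuestion(
    patients="women being evaluated for breast cancer",
    presentation="dense breasts on mammography",
    prior_tests="mammography",
    index_test="breast MRI",
    comparator_test="breast ultrasonography",
    purpose=None,
    target_disorder="breast cancer",
    reference_standard="pathologic results of breast tissue",
)
```

Keeping the prior test, index test, comparator test, and reference standard as separate named fields mirrors the point made above that the four must be distinguished explicitly for each review question.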

Searching literature

Key words for an SR of interventional trials might include the ‘intervention’ (the ‘I’ of the ‘PICO’ method), while those for diagnostic studies might include the ‘index test’ (the ‘I’ of ‘PPP-ICP-TR’) and the target disorder (the ‘T’ of ‘PPP-ICP-TR’). Moreover, in interventional trials, filtering by study design for RCT while focusing on topics concerning the intervention can be an effective search strategy, as most interventional trials use the RCT design. Diagnostic test studies, however, use a diverse range of research designs, such as cross-sectional studies (and not just comparative RCTs), so filtering by research design is not useful when searching the literature on diagnostic tests.
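The difference between the two search strategies can be illustrated with a small query builder. This is a hedged sketch, not the syntax of any particular database: the function name `build_query`, its parameters, and the search terms are all hypothetical, and the output uses only generic Boolean AND/OR operators.

```python
def build_query(index_test_terms, target_disorder_terms, design_filter=None):
    """Join synonyms with OR inside a concept and AND between concepts."""
    def any_of(terms):
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

    parts = [any_of(index_test_terms), any_of(target_disorder_terms)]
    if design_filter:
        # A study-design filter is optional; per the text it suits
        # intervention reviews, not diagnostic test reviews.
        parts.append(any_of(design_filter))
    return " AND ".join(parts)


# Diagnostic review: index test AND target disorder, with no design filter.
dta_query = build_query(
    ["breast MRI", "magnetic resonance imaging"],
    ["breast cancer", "breast neoplasms"],
)
print(dta_query)
# ("breast MRI" OR "magnetic resonance imaging") AND ("breast cancer" OR "breast neoplasms")
```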

Evaluating individual article and extracting information

As tools for evaluating the quality of each article, risk of bias (ROB), as proposed by Higgins et al. [8], and the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) [9] have been developed for interventional trials and diagnostic tests, respectively. QUADAS-2 is the 2011 revision of the 2003 QUADAS and consists of 4 dimensions: patient selection, index test, reference standard, and flow and timing. The first 3 of these each require one of 3 available responses (yes/high, no/low, or unclear) [10]. An accessible Korean adaptation of QUADAS-2 would be both convenient and highly beneficial.

As for data extraction, in SR of interventional trials the response rates (%) of the treatment and control groups should be obtained from the articles chosen for review. For diagnostic tests, however, the sensitivity and specificity of the tests are essential [11]. Diagnostic test studies also report predictive values, but these change with disease prevalence and are thus not appropriate for use in SR of DTA [12]. Sensitivity and specificity are used instead precisely because they are not associated with disease prevalence [13]. However, they do change with the threshold level, and therefore receiver operating characteristic (ROC) curves should accompany their use [14].

When relevant indices are calculated in order to draw new information from the extracted data, the number needed to treat (NNT) in interventional trials is calculated as the reciprocal of the difference between the response rates of the treatment and control groups [15]. In contrast, SR of DTA can calculate a diagnostic odds ratio (DOR) by dividing the product of sensitivity and specificity (true results) by the product of (1 - sensitivity) and (1 - specificity) (false results) [16,17]. This value is referred to as an OR because it has the same form as ad/bc in a 2×2 table: the larger the value, the higher sensitivity and specificity are jointly. In other words, the larger the value, the closer the test approaches the upper left corner of the ROC curve and, consequently, the larger the area under the curve [14].

To display the extracted data clearly, SR of interventional trials employ forest plots [18]. SR of DTA, however, use coupled forest plots to show information on both sensitivity and specificity [19]. In addition, because sensitivity and specificity change according to the threshold level, summary ROC (SROC) curves accompany the plots [20]. The size of each marker may be varied according to the sample size or standard error of the selected articles.
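The indices discussed above can be computed directly from a 2×2 table. The sketch below uses hypothetical function names and made-up counts; it assumes no cell of the table is zero (a zero false-positive or false-negative count would make the DOR undefined, and no correction is applied here).

```python
def accuracy_indices(tp, fp, fn, tn):
    """Sensitivity, specificity, and DOR from a 2x2 table (test vs reference)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)  # the ad/bc form of an odds ratio
    return sensitivity, specificity, dor


def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV by Bayes' rule; unlike sensitivity, it depends on prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def number_needed_to_treat(response_treatment, response_control):
    """Reciprocal of the difference in response rates (as proportions)."""
    return 1 / (response_treatment - response_control)


# Hypothetical study: 100 diseased (90 test positive), 100 healthy (80 negative).
sens, spec, dor = accuracy_indices(tp=90, fp=20, fn=10, tn=80)
print(sens, spec, dor)  # 0.9 0.8 36.0

# The same test yields very different predictive values at 50% and 5% prevalence.
ppv_high = positive_predictive_value(sens, spec, prevalence=0.50)
ppv_low = positive_predictive_value(sens, spec, prevalence=0.05)
print(round(ppv_high, 3), round(ppv_low, 3))
```

With these counts the DOR equals (0.9 × 0.8) / (0.1 × 0.2) = 36, the same value obtained from ad/bc, while the positive predictive value falls sharply as prevalence drops. This is the reason given above for extracting sensitivity and specificity rather than predictive values.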

Meta-analysis

In order to conduct a meta-analysis, heterogeneity among the selected articles must be examined. Currently, SR of interventional trials assess heterogeneity using I² values [21]. Summary statistics may be calculated with fixed-effect models if homogeneity is confirmed, or with random-effect models if heterogeneity is confirmed. However, taking the trade-off between sensitivity and specificity into account, SR of DTA assume heterogeneity except in special cases. In particular, when thresholds change continuously over time, as the criteria for diagnosing hypertension have, a subgroup analysis must be conducted according to the covariate that reflects this change [10]. Thus, no statistical method is designated for assessing heterogeneity in SR of DTA, and additional analyses with hierarchical random-effect models are required in most cases. Two such methods have been developed for this purpose: the bivariate method and the Rutter & Gatsonis hierarchical SROC (HSROC) method. They use different statistical values for calculation [6]: the bivariate method uses sensitivity and specificity, while the HSROC method uses thresholds and DOR [20]. Although RevMan 5.3 supports neither method of analysis directly, RevMan can calculate summary statistics once the statistical estimates from SAS PROC NLMIXED (SAS Institute Inc., Cary, NC, USA) or Stata metandi (StataCorp, College Station, TX, USA) are entered [21]. When only a few studies are available and a fixed threshold was used, the Moses-Littenberg SROC might also be useful for summary statistics.
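Two of the simpler calculations named above can be sketched in a few lines: the I² statistic used for intervention reviews, and the Moses-Littenberg SROC fit. The code is an illustration under stated assumptions, not the hierarchical analysis recommended for DTA: function names and data are hypothetical, the regression is the unweighted least-squares variant, and every sensitivity and specificity is assumed to lie strictly between 0 and 1.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def i_squared(effects, variances):
    """I² (%) from study effect estimates and their variances."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def moses_littenberg(sens_spec_pairs):
    """Fit D = a + b*S by ordinary least squares and return (a, b).

    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio;
    S = logit(TPR) + logit(FPR) acts as a proxy for the threshold.
    Requires at least two studies with different S values.
    """
    d, s = [], []
    for sens, spec in sens_spec_pairs:
        tpr, fpr = sens, 1 - spec
        d.append(logit(tpr) - logit(fpr))
        s.append(logit(tpr) + logit(fpr))
    s_mean, d_mean = sum(s) / len(s), sum(d) / len(d)
    sxx = sum((x - s_mean) ** 2 for x in s)
    b = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d)) / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sensitivity(a, b, fpr):
    """Sensitivity on the fitted SROC curve at a given false-positive rate."""
    logit_tpr = (a + (1 + b) * logit(fpr)) / (1 - b)
    return 1 / (1 + math.exp(-logit_tpr))


# Hypothetical studies sharing one DOR (36) but using different thresholds.
studies = [(0.9, 0.8), (0.8, 0.9), (6 / 7, 6 / 7)]
a, b = moses_littenberg(studies)
curve_point = sroc_sensitivity(a, b, fpr=0.2)
```

In this constructed example the studies differ only in threshold, so the fitted slope b is close to zero and the intercept a is close to ln 36; the SROC curve then passes through each study's (1 - specificity, sensitivity) point. Real data rarely behave this way, which is why the text points to hierarchical models for most DTA reviews.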

Reporting

Reports of interventional trials and diagnostic tests may follow the Consolidated Standards of Reporting Trials (CONSORT) [23] and the Standards for Reporting of Diagnostic Accuracy (STARD) [24] guidelines, respectively. Additionally, while Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) is a guideline for reporting the results of SR of interventional trials [25], no such guideline is yet available for SR of DTA. Moreover, while funnel plots may be used to check indirectly for publication bias in SR of interventional trials, no tool is currently available for such evaluations in SR of DTA.

CONCLUSIONS AND SUGGESTIONS

The fact that the methodology for SR of DTA is still in development implies that several issues remain unaddressed. Experts have yet to reach a consensus, and the complex nature of DTA raises more issues than arise in the case of interventional trials [26]. Furthermore, the content discussed in the present study is open to change at any time. Nevertheless, I have reviewed the methodology for SR of DTA so as to encourage Korean researchers to take an interest and to participate actively in refining this methodology. Finally, I hope that epidemiologists and biostatisticians will attempt several such SR in the near future.

22 in total

1. Quantifying heterogeneity in a meta-analysis.

Authors: Julian P T Higgins; Simon G Thompson
Journal: Stat Med Date: 2002-06-15 Impact factor: 2.373

2. The diagnostic odds ratio: a single indicator of test performance.

Authors: Afina S Glas; Jeroen G Lijmer; Martin H Prins; Gouke J Bonsel; Patrick M M Bossuyt
Journal: J Clin Epidemiol Date: 2003-11 Impact factor: 6.437

3. Key principles for the improved conduct of health technology assessments for resource allocation decisions.

Authors: Michael F Drummond; J Sanford Schwartz; Bengt Jönsson; Bryan R Luce; Peter J Neumann; Uwe Siebert; Sean D Sullivan
Journal: Int J Technol Assess Health Care Date: 2008 Impact factor: 2.188

4. Systematic reviews and meta-analysis: studies of studies.

Authors: Sandra Engberg
Journal: J Wound Ostomy Continence Nurs Date: 2008 May-Jun Impact factor: 1.741

5. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies.

Authors: Holger J Schünemann; Andrew D Oxman; Jan Brozek; Paul Glasziou; Roman Jaeschke; Gunn E Vist; John W Williams; Regina Kunz; Jonathan Craig; Victor M Montori; Patrick Bossuyt; Gordon H Guyatt
Journal: BMJ Date: 2008-05-17

6. The number needed to treat: a clinically useful measure of treatment effect.

Authors: R J Cook; D L Sackett
Journal: BMJ Date: 1995-02-18

7. The STARD statement for reporting diagnostic accuracy studies: application to the history and physical examination.

Authors: David L Simel; Drummond Rennie; Patrick M M Bossuyt
Journal: J Gen Intern Med Date: 2008-03-18 Impact factor: 5.128

8. Galactomannan detection for invasive aspergillosis in immunocompromized patients.

Authors: Mariska M Leeflang; Yvette J Debets-Ossenkopp; Caroline E Visser; Rob J P M Scholten; Lotty Hooft; Henk A Bijlmer; Johannes B Reitsma; Patrick Mm Bossuyt; Christina M Vandenbroucke-Grauls
Journal: Cochrane Database Syst Rev Date: 2008-10-08

9. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies.

Authors: Penny F Whiting; Anne W S Rutjes; Marie E Westwood; Susan Mallett; Jonathan J Deeks; Johannes B Reitsma; Mariska M G Leeflang; Jonathan A C Sterne; Patrick M M Bossuyt
Journal: Ann Intern Med Date: 2011-10-18 Impact factor: 25.391

10. Conducting systematic reviews of diagnostic studies: didactic guidelines.

Authors: Walter L Devillé; Frank Buntinx; Lex M Bouter; Victor M Montori; Henrica C W de Vet; Danielle A W M van der Windt; P Dick Bezemer
Journal: BMC Med Res Methodol Date: 2002-07-03 Impact factor: 4.615
