Yemisi Takwoingi, Christopher Partlett, Richard D Riley, Chris Hyde, Jonathan J Deeks.
Abstract
OBJECTIVE: The objective of this study was to examine the methodological and reporting characteristics of systematic reviews and meta-analyses that compare the diagnostic test accuracy (DTA) of multiple index tests, identify good practice, and develop guidance for better reporting.
Keywords: Comparative accuracy; Diagnostic accuracy; Meta-analysis; Systematic review; Test accuracy; Test comparison
Year: 2019 PMID: 31843693 PMCID: PMC7203546 DOI: 10.1016/j.jclinepi.2019.12.007
Source DB: PubMed Journal: J Clin Epidemiol ISSN: 0895-4356 Impact factor: 6.437
Fig. 1. Flow of reviews through the selection process. *The 82 comparative accuracy reviews met at least one of the following four criteria: (1) clear objective to compare the accuracy of at least two tests; (2) selected only comparative studies; (3) performed statistical analyses comparing the accuracy of all or at least a pair of tests; or (4) performed a direct (head-to-head) comparison of two tests.
Descriptive characteristics of 127 reviews of comparative accuracy and multiple tests
| Characteristic | Comparative reviews, statistical test performed to compare accuracy: yes | Comparative reviews, statistical test performed to compare accuracy: no or unclear | Multiple test reviews | Total |
|---|---|---|---|---|
| Number of reviews | 53 (42) | 29 (23) | 45 (35) | 127 |
| Year of publication | ||||
| 2008 | 14 (26) | 11 (38) | 13 (29) | 38 (30) |
| 2009 | 6 (11) | 10 (34) | 8 (18) | 24 (19) |
| 2010 | 16 (30) | 4 (14) | 11 (24) | 31 (24) |
| 2011 | 13 (25) | 3 (10) | 7 (16) | 23 (18) |
| 2012 | 4 (8) | 1 (3) | 6 (13) | 11 (9) |
| Type of publication | ||||
| Cochrane review | 3 (6) | 1 (3) | 1 (2) | 5 (4) |
| General medical journal | 5 (9) | 5 (17) | 13 (29) | 23 (18) |
| Specialist medical journal | 42 (79) | 22 (76) | 29 (64) | 93 (73) |
| Technology assessment report | 3 (6) | 1 (3) | 2 (4) | 6 (5) |
| Number of tests evaluated | ||||
| 2 | 20 (38) | 14 (48) | 12 (27) | 46 (36) |
| 3 | 12 (23) | 6 (21) | 4 (9) | 22 (17) |
| 4 | 8 (15) | 3 (10) | 4 (9) | 15 (12) |
| ≥5 | 13 (25) | 6 (21) | 25 (56) | 44 (35) |
| Clinical topic (according to ICD-11 Version: 2018) | ||||
| Circulatory system | 9 (17) | 5 (17) | 5 (11) | 19 (15) |
| Digestive system | 3 (6) | 1 (3) | 8 (18) | 12 (9) |
| Infectious and parasitic diseases | 3 (6) | 4 (14) | 9 (20) | 16 (13) |
| Injury, poisoning, and certain other consequences of external causes | 2 (4) | 1 (3) | 2 (4) | 5 (4) |
| Mental, behavioral, or neurodevelopmental disorders | 2 (4) | 1 (3) | 3 (7) | 6 (5) |
| Musculoskeletal system and connective tissue | 1 (2) | 1 (3) | 4 (9) | 6 (5) |
| Neoplasms | 28 (53) | 12 (41) | 7 (16) | 47 (37) |
| Other ICD-11 codes | 5 (9) | 4 (14) | 7 (16) | 16 (13) |
| Type of tests evaluated | ||||
| Biopsy | 0 | 1 (3) | 0 | 1 (1) |
| Clinical and physical examination | 5 (9) | 3 (10) | 15 (33) | 23 (18) |
| Device | 1 (2) | 0 | 0 | 1 (1) |
| Imaging | 32 (60) | 13 (45) | 9 (20) | 54 (43) |
| Laboratory | 8 (15) | 8 (28) | 12 (27) | 28 (22) |
| RDT or POCT | 1 (2) | 0 | 4 (9) | 5 (4) |
| Self-administered questionnaire | 1 (2) | 1 (3) | 0 | 2 (2) |
| Combinations of any of the above | 5 (9) | 3 (10) | 5 (11) | 13 (10) |
| Clinical purpose of the tests | ||||
| Diagnostic | 42 (79) | 23 (79) | 44 (98) | 109 (86) |
| Monitoring | 1 (2) | 1 (3) | 0 | 2 (2) |
| Prognostic/prediction | 0 | 1 (3) | 0 | 1 (1) |
| Response to treatment | 1 (2) | 0 | 0 | 1 (1) |
| Screening | 3 (6) | 4 (14) | 1 (2) | 8 (6) |
| Staging | 6 (11) | 0 | 0 | 6 (5) |
| Number of test accuracy studies in reviews | ||||
| Median (range) | 25 (6–103) | 17 (5–82) | 19 (3–79) | 20 (3–103) |
| Interquartile range | 14–43 | 11–32 | 12–24 | 12–34 |
| Number of comparative studies | ||||
| Median (range) | 7 (0–59) | 6 (0–32) | 4 (0–52) | 6 (0–59) |
| Interquartile range | 4–14 | 1–11 | 2–10 | 3–11 |
| Number of noncomparative studies | ||||
| Median (range) | 17 (0–98) | 6 (0–79) | 10 (0–76) | 14 (0–98) |
| Interquartile range | 6–32 | 0–27 | 5–20 | 3–24 |
Abbreviations: ICD-11, International Classification of Diseases, Eleventh Revision; RDT, rapid diagnostic test; POCT, point of care test.
Numbers in parentheses are column percentages unless otherwise stated. Percentages may not add up to 100% because of rounding.
In 3 reviews, it was unclear whether a statistical comparison of test accuracy was done.
Includes only studies published up to October 2012.
Includes 8 ICD-11 codes that had fewer than 5 reviews across the 3 groups.
Tests evaluated in a review were not of the same type.
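The "n (%)" cells in the table above can be reproduced from the counts alone. The following is a minimal sketch, not the authors' code: the helper name `count_with_pct` and the dictionary keys are hypothetical, and the counts are copied from the "Number of reviews" and "Neoplasms" rows.

```python
def count_with_pct(count, denominator):
    """Format a count as 'n (p)', with p the percentage rounded to an integer."""
    pct = round(100 * count / denominator)
    return f"{count} ({pct})"

# Column denominators, copied from the "Number of reviews" row.
column_totals = {
    "comparative_yes": 53,
    "comparative_no_or_unclear": 29,
    "multiple_test": 45,
    "total": 127,
}

# Counts copied from the "Neoplasms" row.
neoplasms = {
    "comparative_yes": 28,
    "comparative_no_or_unclear": 12,
    "multiple_test": 7,
    "total": 47,
}

# Column percentage: each count divided by the total of its own column.
cells = {key: count_with_pct(n, column_totals[key]) for key, n in neoplasms.items()}
print(cells)
```

Note that the "Number of reviews" row is the exception: its percentages use the overall total of 127 as the denominator (for example, 53 of 127 is 42%).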
Strategies and methods for test comparisons
| Characteristic | Comparative reviews, statistical analyses to compare test accuracy: yes | Comparative reviews, statistical analyses to compare test accuracy: no or unclear | Multiple test reviews | Total |
|---|---|---|---|---|
| Number of reviews | 53 (42) | 29 (23) | 45 (35) | 127 (100) |
| Study type | ||||
| Comparative only | 8 (15) | 8 (28) | 0 | 16 (13) |
| Any study type | 45 (85) | 21 (72) | 45 (100) | 111 (87) |
| Test comparison strategy | ||||
| Direct comparison only | 8 (15) | 8 (28) | 0 | 16 (13) |
| Indirect comparison only—comparative studies available | 26 (49) | 10 (34) | 4 (9) | 40 (32) |
| Indirect comparison only—no comparative studies available | 2 (4) | 6 (21) | 1 (2) | 9 (7) |
| Both direct and indirect comparison | 17 (32) | 5 (17) | 0 | 22 (17) |
| None | 0 | 0 | 40 (89) | 40 (32) |
| Method used for test comparison | ||||
| Meta-regression—hierarchical model | 18 (34) | 0 | 0 | 18 (14) |
| Meta-regression—SROC regression | 2 (4) | 0 | 0 | 2 (2) |
| Meta-regression—ANCOVA | 2 (4) | 0 | 0 | 2 (2) |
| Meta-regression—logistic regression | 1 (2) | 0 | 0 | 1 (1) |
| Univariate pooling of difference in sensitivity and specificity or DORs | 6 (11) | 0 | 0 | 6 (5) |
| Naïve (comparison of pooled estimates from separate meta-analyses) | ||||
| Z-test | 15 (28) | 0 | 0 | 15 (12) |
| Paired t-test | 1 (2) | 0 | 0 | 1 (1) |
| Unpaired t-test | 1 (2) | 0 | 0 | 1 (1) |
| Chi-squared test | 1 (2) | 0 | 0 | 1 (1) |
| Comparison of Q* statistic and their SEs | 1 (2) | 0 | 0 | 1 (1) |
| Overlapping confidence intervals | 0 | 3 (10) | 0 | 3 (2) |
| Narrative | 0 | 9 (31) | 4 (9) | 13 (10) |
| None | 0 | 14 (48) | 40 (89) | 54 (43) |
| Unclear | 5 (9) | 3 (10) | 1 (2) | 9 (7) |
| Relative measures used to summarize differences in test accuracy | 18 (34) | 0 | 0 | 18 (14) |
| Multiple thresholds included | 13 (25) | 12 (41) | 17 (38) | 42 (33) |
| If multiple thresholds included, were they accounted for in the comparative meta-analysis (meta-analysis at each threshold or fitted appropriate model) | ||||
| Yes | 6 (46) | 0 | 0 | 6 (46) |
| No | 4 (31) | 0 | 0 | 4 (31) |
| Unclear | 3 (23) | 0 | 0 | 3 (23) |
Abbreviations: ANCOVA, analysis of covariance; DOR, diagnostic odds ratio; SE, standard error; SROC, summary receiver operating characteristic.
Numbers in parentheses are column percentages unless otherwise stated. Percentages may not add up to 100% because of rounding.
Numbers in parentheses are row percentages.
These methods either involve a comparative meta-analysis or follow-on from a meta-analysis of each test individually.
Moses et al. [11] proposed the Q* statistic as an alternative to the area under the curve. Q* is the point on the SROC curve where sensitivity is equal to specificity, that is, the intersection of the summary curve and the line of symmetry.
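Two of the naïve approaches listed in the table, the Z-test and the comparison of Q* statistics with their SEs, can be illustrated with a short sketch. It assumes the Moses-Littenberg SROC regression D = a + bS, where D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR); at Q*, sensitivity equals specificity, so S = 0 and Q* = 1 / (1 + exp(−a/2)). The Z-test shown treats the two pooled estimates as independent, which is an assumption that does not hold when the same studies contribute to both tests. The function names are hypothetical and this is not the code used in any of the reviews.

```python
import math

def q_star(intercept):
    """Q* from the intercept a of an assumed Moses-Littenberg regression D = a + b*S.

    At Q*, sensitivity = specificity, so S = 0 and logit(Q*) = a / 2.
    """
    return 1.0 / (1.0 + math.exp(-intercept / 2.0))

def naive_z_test(est_a, se_a, est_b, se_b):
    """Two-sided Z-test for the difference between two pooled estimates.

    Assumes the two estimates are independent (the 'naive' assumption).
    Returns (z, p_value).
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical pooled estimates and standard errors for two tests.
z, p = naive_z_test(0.90, 0.03, 0.80, 0.04)
print(f"Q* for a = 3.0: {q_star(3.0):.3f}; z = {z:.2f}, p = {p:.3f}")
```

Because this kind of comparison ignores the pairing of tests within comparative studies and the correlation between sensitivity and specificity, it is best read as an illustration of why the table labels these methods naïve.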
Investigations of heterogeneity in comparative and multiple test reviews
| Characteristic | Comparative reviews, statistical analyses to compare test accuracy: yes | Comparative reviews, statistical analyses to compare test accuracy: no or unclear | Multiple test reviews | Total |
|---|---|---|---|---|
| Number of reviews | 53 (42) | 29 (23) | 45 (35) | 127 (100) |
| Formal investigation performed | ||||
| Yes—meta-regression and subgroup analyses | 5 (9) | 1 (3) | 2 (4) | 8 (6) |
| Yes—meta-regression | 15 (28) | 5 (17) | 4 (9) | 24 (19) |
| Yes—subgroup analyses | 13 (25) | 8 (28) | 14 (31) | 35 (28) |
| No—limited data | 8 (15) | 2 (7) | 1 (2) | 11 (9) |
| No—only tested for heterogeneity | 3 (6) | 8 (28) | 16 (36) | 27 (21) |
| No—nothing reported | 7 (13) | 5 (17) | 8 (18) | 20 (16) |
| Unclear | 2 (4) | 0 | 0 | 2 (2) |
| If yes above, was effect on relative accuracy also investigated? | ||||
| Yes | 5 (15) | 0 | 0 | 5 (15) |
| No | 21 (64) | 0 | 0 | 21 (64) |
| Planned but no data | 1 (3) | 0 | 0 | 1 (3) |
| Unclear | 6 (18) | 0 | 0 | 6 (18) |
Numbers in parentheses are column percentages unless otherwise stated. Percentages may not add up to 100% because of rounding.
Numbers in parentheses are row percentages.
Reporting and presentation characteristics of the reviews
| Characteristic | Comparative reviews, statistical analyses to compare test accuracy: yes | Comparative reviews, statistical analyses to compare test accuracy: no or unclear | Multiple test reviews | Total |
|---|---|---|---|---|
| Number of reviews | 53 (42) | 29 (23) | 45 (35) | 127 (100) |
| Reporting guideline used | 2 (4) | 5 (17) | 6 (13) | 13 (10) |
| Clear comparative objective stated | 45 (85) | 25 (86) | 0 | 70 (55) |
| Role of the tests | ||||
| Add-on | 6 (11) | 3 (10) | 2 (4) | 11 (9) |
| Replacement | 8 (15) | 6 (21) | 6 (13) | 20 (16) |
| Triage | 4 (8) | 1 (3) | 11 (24) | 16 (13) |
| Any two of the above | 4 (8) | 4 (14) | 2 (4) | 10 (8) |
| Unclear | 31 (58) | 15 (52) | 24 (53) | 70 (55) |
| Flow diagram presented | ||||
| Yes—included number of studies per test | 11 (21) | 6 (21) | 8 (18) | 25 (20) |
| Yes—excluded number of studies per test | 21 (40) | 12 (41) | 28 (62) | 61 (48) |
| No | 21 (40) | 11 (38) | 9 (20) | 41 (32) |
| Comparative studies identified | ||||
| Yes | 31 (58) | 9 (31) | 9 (20) | 49 (39) |
| No | 16 (30) | 7 (24) | 27 (60) | 50 (39) |
| No comparative studies in review | 6 (11) | 13 (45) | 9 (20) | 28 (22) |
| Study characteristics presented | 48 (91) | 26 (90) | 43 (96) | 117 (92) |
| Test comparison strategy | ||||
| Yes | 19 (36) | 2 (7) | 1 (2) | 22 (17) |
| No | 32 (60) | 20 (69) | 44 (98) | 96 (76) |
| No—included only comparative studies | 2 (4) | 7 (24) | 0 | 9 (7) |
| Method used for test comparison | ||||
| Yes | 48 (91) | NA | NA | 48 (91) |
| Unclear | 5 (9) | NA | NA | 5 (9) |
| 2 × 2 data for each study | 30 (57) | 10 (34) | 14 (31) | 54 (43) |
| Individual study estimates of test accuracy | 46 (87) | 25 (86) | 36 (80) | 107 (84) |
| Forest plot(s) | 30 (57) | 19 (66) | 16 (36) | 65 (51) |
| SROC plot | ||||
| SROC plot comparing summary points or curves for 2 or more tests | 19 (36) | 7 (24) | 2 (4) | 28 (22) |
| Separate SROC plot per test | 17 (32) | 11 (38) | 19 (42) | 47 (37) |
| No SROC plot | 17 (32) | 11 (38) | 24 (53) | 52 (41) |
| Limitations of indirect comparison acknowledged | ||||
| Yes | 13 (25) | 3 (10) | 2 (4) | 18 (14) |
| No | 30 (57) | 15 (52) | 43 (96) | 88 (69) |
| No but only comparative studies included | 10 (19) | 11 (38) | 0 | 21 (17) |
Abbreviations: NA, not applicable; SROC, summary receiver operating characteristic.
Numbers in parentheses are column percentages unless otherwise stated. Percentages may not add up to 100% because of rounding.
Numbers in parentheses are row percentages.
These reviews included both comparative and noncomparative studies.
These methods either involve a comparative meta-analysis or follow-on from a meta-analysis of each test individually.
Fig. 2. Reporting characteristics of 127 comparative and multiple test reviews. (A) Comparative reviews with statistical analyses performed to compare accuracy; (B) Comparative reviews without statistical analyses to compare accuracy; (C) Multiple test reviews. The colored cells in each row illustrate the reporting of the 10 items in each review. The box to the right of the figure describes the reporting items. Reviews were ordered by year of publication and by the number of missing items within each of the three review categories A to C. None of the multiple test reviews stated a clear comparative objective (this was one of the four criteria used to classify the reviews, as stated in section 2.1).
| Item | Description (PRISMA-DTA items) | Rationale and explanation |
|---|---|---|
| 1 | Role of tests in diagnostic pathway (3, D1) | Test evaluation requires a clear objective and definition of the intended use and role of a test within the context of a clinical pathway for a specific population with the target condition. The intended role of a test guides formulation of the review question and provides a framework for assessing test accuracy, including the choice of a comparator(s) and selection of studies. The role of a test is therefore important for understanding the context in which the tests will be used and the interpretation of the meta-analytic findings. The existing diagnostic pathway and the current or proposed role of the index test(s) in the pathway should be described. A new test may replace an existing one (replacement), be used before the existing test (triage) or after the existing test (add-on) [ |
| 2 | Test comparison strategy [ | Comparative studies are ideal but they are scarce [ |
| 3 | Meta-analytic methods (D2) | Hierarchical models which account for between-study correlation in sensitivity and specificity while also allowing for variability within and between studies are recommended for meta-analysis of test accuracy studies [ |
| 4 | Identification of included studies for each test [ | Review complexity increases with increasing number of tests, target conditions, uses and/or target populations within a single review. Therefore, distinguishing between the different groups of studies that contribute to different analyses in the review enhances clarity. The PRISMA flow diagram can be extended to show the number of included studies for each test or group of tests if inclusion is not limited to comparative studies. The detail shown—individual tests or groups of tests, settings and populations—will depend on the volume of information and the ability of the review team to neatly summarize the information. If such a comprehensive flow diagram is not feasible, the studies contributing to the assessment of each test can be clearly identified in the manuscript in some other way. The source of the evidence should be declared by stating types of included studies. Studies contributing direct evidence should also be clearly identified in the review. |
| 5 | Study characteristics [ | Relevant characteristics for each included study should be provided. This may be summarized in a table and should include elements of study design if eligibility was not restricted to specific design features. Heterogeneity is often observed in test accuracy reviews and differences between tests may be confounded by differences in study characteristics. Confounders can potentially be adjusted for in indirect test comparisons, though this is likely to be unachievable due to small number of studies and/or incomplete information on confounders. The effect of factors that may explain variation in test performance is typically assessed separately for each test. |
| 6 | Study estimates of test performance and graphical summaries e.g., forest plot and/or SROC plot [ | It is desirable to report 2 × 2 data (number of true positives, false positives, false negatives, and true negatives) and summary statistics of test performance from each included study. This may be done graphically (e.g., forest plots) or in tables. Such summaries of the data will inform the reader about the degree to which study-specific estimates deviate from the overall summaries, as well as the size and precision of each study. It is plausible that study results for one test may be more consistent or precise than those of another test in an indirect comparison. In addition to forest plots, reviews may include SROC plots such as those shown in |
| 7 | Limitations of the evidence from indirect comparisons [ | This is only applicable for reviews that include indirect comparisons. Be clear about the quality and strength of the evidence when interpreting the results, including limitations of including noncomparative studies in a test comparison. The results of indirect comparisons should be carefully interpreted taking into account the possibility that differences in test performance may be confounded by clinical and/or methodological factors. This is essential because it is seldom feasible to assess the effect of potential confounders on relative accuracy. |
Related to the PRISMA-DTA item(s) indicated in parentheses.
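Item 6 above recommends reporting the 2 × 2 data and the accuracy estimates for each included study. As a minimal sketch of how those estimates follow from the four cell counts, the function below computes sensitivity, specificity, and the diagnostic odds ratio (DOR). The function name and the example counts are hypothetical, and no continuity correction is applied, so a zero in a denominator raises an error rather than being adjusted.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Return sensitivity, specificity, and DOR from one study's 2 x 2 counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    true negatives. Raises ZeroDivisionError if a denominator is zero.
    """
    sensitivity = tp / (tp + fn)   # proportion of diseased who test positive
    specificity = tn / (tn + fp)   # proportion of non-diseased who test negative
    dor = (tp * tn) / (fp * fn)    # diagnostic odds ratio
    return {"sensitivity": sensitivity, "specificity": specificity, "dor": dor}

# Hypothetical counts for a single study, for illustration only.
result = accuracy_from_2x2(tp=90, fp=20, fn=10, tn=80)
print(result)
```

Reporting the raw counts alongside these estimates lets readers recompute them and rerun the meta-analysis with a different model.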