Sue Mallett1, Jacqueline Dinnes2,3, Yemisi Takwoingi2,3, Lavinia Ferrante de Ruffano2,4.
Abstract
The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (DTA) provides guidance on important aspects of conducting a test accuracy systematic review. In this paper we present TOMAS-R (Template of Multiplicity and Analysis in Systematic Reviews), a structured template to use in conjunction with current Cochrane DTA guidance, to help identify complexities in the review question and to assist planning of data extraction and analysis when clinically important variation and multiplicity are present. Examples of clinically important variation and multiplicity include differences in participants, index tests and test methods, target conditions and the reference standards used to define them, study design and methodological quality. The TOMAS-R template goes beyond the broad topic headings in current guidance that identify sources of potential variation and multiplicity, by providing prompts for common sources of heterogeneity encountered in our experience of authoring over 100 reviews. We provide examples from two reviews to assist users. The TOMAS-R template supplements available guidance for DTA reviews by providing a tool to facilitate discussions between methodologists, clinicians, statisticians and patient/public team members, so that the full breadth of review question complexities is identified early in the process. Using a structured set of prompting questions at the important stage of writing the protocol ensures clinical relevance as a main focus of the review, while allowing identification of key clinical components for data extraction and later analysis, thereby facilitating a more efficient review process.
Keywords: Diagnostic test accuracy; Heterogeneity; Meta-analysis; Methodology; Multiplicity; SROC; Systematic review; Template; forest
Year: 2022 PMID: 36131330 PMCID: PMC9494799 DOI: 10.1186/s41512-022-00131-z
Source DB: PubMed Journal: Diagn Progn Res ISSN: 2397-7523
Summary of TOMAS-R steps
| Step 1 | Summary of review objectives and proposed eligibility criteria. Set out key review objectives with broad definition of study participants, target condition and reference standard, index test(s) and study design. |
| Step 2 | Scoping potential complexities resulting from clinically important variation and multiplicity. Identify and record potential complexities that could ultimately affect how data are extracted, presented and combined. Examples of variation could include differences in participants, index tests and their methods, target conditions and reference standards used to define them, study design and methodological quality. |
| Step 3 | Simplify the review whilst maintaining clinical relevance. For each potential source of variation (complexity) consider whether differences in test accuracy might be observed. Consider whether separate analysis or heterogeneity investigation is appropriate. |
| Step 4 | Planning data extraction. Develop and pilot a standardised data extraction sheet. Define any data or categories of data to be preferentially extracted, e.g. by participant group, by definition of target condition, by index test method or threshold. |
| Step 5 | Planning presentation and analysis of data. Record plan for meta-analysis. Record how data complexity will be presented using graphs, tables, and additional analyses such as investigation of heterogeneity or sensitivity analysis where appropriate and feasible with available data. Recommended graphical presentation includes summary ROC (SROC) plots with individual study data and summary estimates of sensitivity and specificity (summary point) with 95% confidence and prediction regions. |
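As an illustration of the individual study data that Step 5 plots on forest and SROC plots, the sketch below computes sensitivity and specificity from a single 2 × 2 table, each with a 95% Wilson score interval. The function names and the counts are hypothetical and are not taken from the paper. The summary point and its confidence and prediction regions come from a meta-analytic model fitted across studies, which this sketch does not attempt.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def accuracy_from_2x2(tp, fp, fn, tn):
    """Per-study sensitivity and specificity with Wilson intervals.

    tp, fp, fn, tn are the cells of one study's 2 x 2 table of index test
    result against reference standard.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "sens_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "spec_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts for one study, for illustration only.
example = accuracy_from_2x2(tp=80, fp=10, fn=20, tn=90)
print(example)
```

Each study contributes one such pair of estimates per index test, which is why the template asks reviewers to decide in advance which 2 × 2 table to extract when a study reports several.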
Example TOMAS-R template to identify clinically important variation and multiplicity for typhoid review
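To identify which types and brands of commercial test best detect enteric fever. To investigate the sources of heterogeneity between study results. |
Potential source of complexity (with examples) | Step 2: complexities identified | Step 3: simplification decision and reason | Step 4: data extraction plan | Step 5: presentation and analysis plan |
Participants domain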
Are there important differences between participants that could affect test accuracy? Examples • Different clinical pathways or healthcare settings (primary, secondary or tertiary care) • Different prior tests (referral based on different prior tests) • Differences in other conditions likely to be present at the same time • Different geographical settings | Two groupings: • clinically-suspected enteric fever • unselected febrile patients • some studies may include a mixture of patients | Keep as separate groups if possible. Retain studies with mixed or unclear populations. Report grouping based on study inclusion criteria in TOC. Reason: studies could include populations with varying pre-test probabilities of disease, or other concomitantly circulating infectious diseases. | Preferential data extraction in separate groups [b]. Otherwise extract as a mixed population group. | Planned SROC or forest plots with groups indicated for each study. Planned heterogeneity analysis. |
Use two groups for level of disease endemicity to take account of pre-test probability of disease (e.g. medium versus high using classification of Crump 2004). | Keep as separate groups if possible. Report grouping based on study inclusion criteria in table of study characteristics. Use prevalence in study as measure of endemicity if not otherwise reported. Reason: tests have potential for varying performance in endemic and non-endemic regions | If a study includes data from two different endemicity disease levels separately (e.g. different centres or different seasons) preferentially extract data in separate groups. Where not reported, or populations are mixed, use study prevalence as proxy for endemicity. | Planned SROC or forest plots with groups indicated for each study or ordered by individual study prevalence. Planned heterogeneity analysis. | |
Use two groups (sub-Saharan Africa versus the rest of the world). | If sufficient studies then keep separate, otherwise combine. Report country in TOC. Reason: in sub-Saharan Africa non-typhoidal Salmonellae are an important cause of bacteraemia, which may affect the performance of enteric fever RDTs in this region. | Preferential data extraction in separate groups. Otherwise extract as a mixed population group. | Planned SROC or forest plots with groups indicated for each study. Planned heterogeneity analysis. | |
Are there groupings within participants by disease type or severity that could affect test accuracy? Examples • Different severity of disease: patients with mild disease vs patients with severe disease • Different disease state, e.g. active vs past (inactive) disease • Different types of disease, e.g. pigmented vs non-pigmented lesions in skin cancer | Not considered in this review | Not applicable in this review. Reason: diagnosis is presence of typhoid disease. Severity of disease is measured by patient symptoms and signs. | Not applicable in this review | Not applicable in this review |
Are there any important groupings by participant age, gender, ethnicity? Examples of separate groups • Different ages, such as children and adults • Different demographics, such as gender, ethnicity, genetic groups | Two groups by age: • adults (over 16 years) • children (16 years or younger) | If sufficient studies then keep separate, otherwise combine. Reason: test might perform differently in children and adults, in part due to different prevalence of other infectious diseases. | Preferential data extraction in separate groups. Otherwise extract as a mixed population group. | Planned SROC or forest plots with groups indicated for each study. |
Index test domain
Is more than one underlying type of index test included that could affect test accuracy? Examples • Different indications of disease presence, e.g. DNA of infectious agent, antibodies against infectious agent • Different formats of test, e.g. ELISA, PCR, dipstick • Different equipment needed that affects the test, e.g. laboratory test using specialist equipment, point of care test | 3 main commercial tests: 1. Typhidot 2. TUBEX 3. KIT. Other tests include PanBio Multi-test Dip-S-Tick, Mega Salmonella and SD Bioline tests | Separate groups for main tests. Variations within a test are grouped together: 1. Typhidot 2. TUBEX 3. KIT. All other tests are considered separately, including PanBio Multi-test Dip-S-Tick, Mega Salmonella and SD Bioline tests. Reason: 3 tests were identified with sufficient studies to consider meta-analysis. Meta-analysis across other tests using different tests and test approaches is not useful for the review. | Extract all test data from each study. | Separate meta-analysis for each commercial test [a,c]. Where there is an insufficient number of studies for meta-analysis, graphical data presentation with descriptive analysis. Where a sufficient number of studies is available, make comparison between tests. |
Is there more than one method or manufacturer for a test that could affect test accuracy? Also consider if the test might be done by people with different levels of experience or using different approaches to interpretation. Examples • Different versions of tests • Different participant samples used to detect disease, e.g. blood sample, urine sample • Differences in staff, e.g. trained laboratory staff vs nurse point of care test • Different treatment of inconclusive test results • Different approach to assist test interpretation, e.g. algorithms or checklists | 1. Typhidot; Typhidot-M; TyphiRapid Tr-02 - grouping within Typhidot of IgM or IgG Ab detection 2. KIT: dipstick assay; latex agglutination assay; lateral flow immuno-chromatographic test 3. TUBEX: 1 format | Different versions within a test will be combined. Reason: main review question is about accuracy of 3 main test types. | Record test version in TOC. | All test versions combined in meta-analysis as single group [a,b]. |
| Separate by sample type Reason: sample type considered important for test results | Separate data extraction by sample type | Planned heterogeneity analysis if sufficient studies. In review, no heterogeneity analysis as all studies use blood samples | |
| Will combine Typhidot tests regardless of treatment of inconclusive results. Reason: most studies report results for Typhidot such that IgM results can be extracted, so expect data extraction to standardise inconclusive results reporting. | For Typhidot, we extracted IgM in preference to IgG. Reason: IgM indicates recent infections whereas IgG can pick up previously resolved infections. Using Typhidot IgM allowed better comparison with TUBEX and KIT as these tests both detect IgM antibodies. | Individual results will be presented in SROC plot labelled by method of treatment of inconclusive results. Main analysis across all Typhidot [a,c], but with sensitivity analysis limited to those reporting inconclusive results or where test format means there are no inconclusive results. Reason: different treatment of inconclusive results could influence results. | |
Are different thresholds used to define a positive result that could affect test accuracy? Has a clinically relevant index test threshold been identified for this review? Examples • Different test thresholds used to define a positive test result for semi-quantitative or continuous test results | Some KIT tests provide semi-quantitative test results where different thresholds can be used to define positive test results. Other test formats provide qualitative test results without any thresholds. | Keep KIT thresholds separate. Main result for KIT based on threshold of > 1+, which was judged the most meaningful clinically. Reason: reporting results at clinically relevant test threshold(s) is the most important result for clinical practice. Results combined across very different thresholds do not give a result that can be interpreted at any clinically relevant threshold, but correspond to an average result reflecting how often different thresholds are reported. | Results extracted separately for each KIT test threshold. KIT threshold of > 1+ was judged the most meaningful clinically. | Meta-analysis undertaken for the threshold of > 1+ only, as this was judged the most meaningful [a,c]. Individual study results presented in SROC graphs. |
Target condition domain
Are there different target conditions included that could affect test accuracy? Examples • Different causes of disease (e.g. different organisms causing typhoid infection, different causes of trauma injury) • Different types or severity of disease that are treated differently, e.g. malignant and borderline disease in ovarian cancer diagnosis, any melanoma or melanoma with high potential to progress to malignancy | Salmonella typhi or Paratyphi A | Keep as separate groups if possible. Retain studies with mixed or unclear populations. Reason: tests likely to perform differently for different bacteria and bacterial subtypes. | Preferential data extraction in separate groups. Otherwise extract as a mixed population group. | Protocol planned heterogeneity analysis if sufficient number of studies. |
Are different methods used to verify disease presence or absence that could affect test accuracy? Examples • Different methods to detect typhoid infection, measured by detection of bacterial DNA or by bacterial culture | Four methods: bone marrow culture, blood culture, PCR of peripheral blood, combinations of tests. For labelling of studies, two groupings of reference standards are defined. A Grade 1 study was defined as one using both bone marrow culture and peripheral blood culture. A Grade 2 study was defined as one using either peripheral blood culture only, or peripheral blood culture and peripheral blood PCR as the composite reference standard. | Keep reference standards separate. Categorise as grade 1 or grade 2 for TOC. For subsequent analysis it may be important to compare different reference standards within studies where possible. Reason: reference standards are likely to differ in their ability to detect low levels of infection (blood vs bone marrow) and in their ability to detect live untreated bacteria (both culture and PCR), treated bacteria (unlikely with culture but should detect with PCR) and dead bacteria (not with culture but should detect with DNA-based testing). | Extract all test data from each study, so index test data may be extracted for two or more reference standards. | Meta-analysis using most commonly used reference standards as priority [a,c]. Planned SROC or forest plot with groups indicated for each study. Planned heterogeneity analysis if sufficient number of studies. |
Are different criteria or thresholds used to define presence of disease that could affect test accuracy? Examples • Different definitions of fasting blood glucose to define diabetes | Not applicable in this review | Not applicable in this review Reason: infection classified as present or not | Not applicable in this review | Not applicable in this review |
Are there differences in when the reference standard is completed that could affect test accuracy? Examples • Different time points at which reference standard is assessed • Different maximum or minimum time intervals between reference standard and index test | Not applicable in this review | Not applicable in this review. Reason: index test and reference standard are determined at a single time point. | Not applicable in this review | Not applicable in this review |
Study design and quality domain
Are there differences in whether test results are a single test or more than one test result per participant (unit of analysis)? Examples • Test results could refer to individual participants, lesions, organ, clinic visits or imaging scans | Per participant | Not applicable Reason: In this review all results were reported based on disease status of patients | Not applicable | Not applicable |
Based on risk of bias assessed using QUADAS-2/QUADAS-C, are there important differences between studies that could affect test accuracy? Examples • Single signalling question, e.g. specific design criteria (case control vs better design using cohort or nested case control) • Differences in QUADAS-2/QUADAS-C overall domain assessment of bias, e.g. participant domain | Study design: case control, prospective cohort, randomised controlled trial, paired comparative trial | Decision to retain all study designs in main analyses and to graphically present variation in study design. Reason: In this review, test type and test threshold prioritised as sources of variation, due to the limited number of studies. The bias due to case-control design will be presented graphically. Case-control studies were only included where controls consisted of patients with similar clinical presentation. Case-control studies with most extreme bias, due to use of healthy control patients, were excluded from the review. | Only one group of data at study level. | Planned SROC plot with groups indicated for each study. Protocol planned heterogeneity analysis if sufficient number of studies. In review, no heterogeneity analysis but descriptive comment. |
Based on applicability of study results assessed using QUADAS-2, are there important differences between studies that could affect test accuracy? Applicability of participants could depend on several factors and might be best summarised by analysis grouped by applicability of the participant recruitment assessed in QUADAS-2 • Participants • Index tests • Reference standard | Not considered | Not applicable. Reason: main biases from individual QUADAS-2 domains are addressed in presentation and analyses above. Participant domain: case control vs cohort presentation. Index test: threshold bias addressed; interpretation of inconclusive results addressed. Reference standard domain: 3 grades of reference standard addressed. Flow and timing: verification bias not applicable, time intervals not an issue in review, inconclusive results included missing data issues. | Not applicable | Not applicable |
The first section is where the summary of the review is reported (step 1 of TOMAS-R), including the review title, objectives and components of PICO adapted for a diagnostic accuracy review. Steps 2, 3, 4 and 5 are reported in separate columns of the TOMAS-R template for each domain (participants, index test, target condition, and study design and quality). Within each domain the first table column describes and gives examples of potential sources of clinically important complexity commonly found in reviews. This allows a systematic discussion of potential sources and reporting of identified complexities alongside the data extraction and analysis decisions and the rationale used to handle these complexities. The template is filled in for the rapid typhoid review. A blank template is included in the supplementary appendix.
Abbreviations: DNA deoxyribonucleic acid, ELISA enzyme-linked immunosorbent assay, PCR polymerase chain reaction, QUADAS-2 quality assessment of diagnostic accuracy studies, RDTs rapid diagnostic tests, ROC receiver operating characteristic, SROC summary receiver operating characteristic, TOC table of study characteristics
[a] Meta-analysis will only be done if (i) there are four or more studies where results are given in the same format (e.g. 2 × 2 table for diagnosis) and (ii) study results are sufficiently homogeneous, as visualised in forest plots or ROC space, for a meaningful representation by a single summary statistic.
[b] Priority order of data extraction means that not all data will be extracted from published articles.
[c] To avoid over-representing results from a study in meta-analysis results, we will include only one set of results per index test from each study.
Fig. 1 Flowchart of planned analyses: Typhoid review. Different components of variation leading to complexity in review of typhoid rapid tests. Coloured boxes indicate the TOMAS-R domain where complexity was identified: pink boxes, review topic; light blue boxes, participant domain; dark blue boxes, index test domain; light green boxes, target condition domain; dark green boxes, study design and quality domain. Dotted lines separate complexity and allow alignment to the diagnostic accuracy that the analysis would address. Each bullet point follows the question “What is the diagnostic accuracy...” so, for example, if all rapid test results are combined the first bullet point is used and the analysis will answer the question “What is the diagnostic accuracy averaged over all tests and test thresholds?” Yellow stars indicate key complexities identified as requiring separate analyses for the review to have clinical relevance.
Fig. 2 Flowchart of planned analyses: Ovarian cancer serum biomarker review. Different components of variation leading to complexity in review of serum biomarkers in ovarian cancer. Coloured boxes indicate the TOMAS-R domain where complexity was identified: pink boxes, review topic; light blue boxes, participant domain; dark blue boxes, index test domain; light green boxes, target condition domain. Dotted lines separate complexity and allow alignment to the diagnostic accuracy that the analysis would address. Each bullet point follows the question “What is the diagnostic accuracy...” so, for example, if test results are separated by the menopausal status of the women, the second bullet point is used and the analysis will answer the question “What is the diagnostic accuracy averaged separately for each menopausal group?” Yellow stars indicate key complexities identified as requiring separate analyses for the review to have clinical relevance.