P Lina Santaguida, Crystal M Riley, David B Matchar.
Abstract
Assessing methodological quality is a necessary activity for any systematic review, including those evaluating the evidence for studies of medical test performance. Judging the overall quality of an individual study involves examining the size of the study, the direction and degree of findings, the relevance of the study, and the risk of bias in the form of systematic error, internal validity, and other study limitations. In this chapter of the Methods Guide for Medical Test Reviews, we focus on the evaluation of risk of bias in the form of systematic error in an individual study as a distinctly important component of quality in studies of medical test performance, specifically in the context of estimating test performance (sensitivity and specificity). We make the following recommendations to systematic reviewers: 1) when assessing study limitations that are relevant to the test under evaluation, reviewers should select validated criteria that examine the risk of systematic error; 2) categorizing the risk of bias for individual studies as "low," "medium," or "high" is a useful way to proceed; and 3) methods for determining an overall categorization for the study limitations should be established a priori and documented clearly.
Year: 2012 PMID: 22648673 PMCID: PMC3364359 DOI: 10.1007/s11606-012-2030-8
Source DB: PubMed Journal: J Gen Intern Med ISSN: 0884-8734 Impact factor: 5.128
Commonly Reported Sources of Systematic Bias in Studies of Medical Test Performance
| Source of systematic bias | Description |
|---|---|
| Spectrum effect | Tests may perform differently in various samples. Therefore, demographic features or disease severity may lead to variations in estimates of test performance |
| Context bias | Prevalence of the target condition varies according to setting and may affect estimates of test performance. Interpreters may consider test results to be positive more frequently in settings with higher disease prevalence, which may also affect estimates of test performance |
| Selection bias | The selection process determines the composition of the study sample. If the selection process does not aim to include a patient spectrum similar to the population in which the test will be used, the results of the study may not accurately portray the results for the identified target population |
| Variation in test execution | A sufficient description of the execution of the index test and reference standard is important because variation in measures of diagnostic accuracy can result from differences in test execution |
| Variation in test technology | When the characteristics of a medical test change over time as a result of technological improvement or the experience of the operator of the test, estimates of test performance may be affected |
| Treatment paradox | Occurs when treatment is started on the basis of the knowledge of the results of the index test, and the reference standard is applied after treatment has started |
| Disease progression bias | Occurs when the index test is performed an unusually long time before the reference standard, so the disease is at a more advanced stage when the reference standard is performed |
| Inappropriate reference standard | Errors of imperfect reference standard bias the measurement of diagnostic accuracy of the index test |
| Differential verification bias | Part of the index test results is verified by a different reference standard |
| Partial verification bias | Only a selected sample of patients who underwent the index test is verified by the reference standard |
| Review bias | Interpretation of the index test or reference standard is influenced by knowledge of the results of the other test. Diagnostic review bias occurs when the results of the index test are known when the reference standard is interpreted. Test review bias occurs when results of the reference standard are known while the index test is interpreted |
| Clinical review bias | Availability of clinical data such as age, sex, and symptoms, during interpretation of test results may affect estimates of test performance |
| Incorporation bias | The result of the index test is used to establish the final diagnosis |
| Observer variability | The reproducibility of test results is one determinant of the diagnostic accuracy of an index test. Because of variation in laboratory procedures or observers, a test may not consistently yield the same result when repeated. In two or more observations of the same diagnostic study, intraobserver variability occurs when the same person obtains different results, and interobserver variability occurs when two or more people disagree |
| Handling of indeterminate results | A medical test can produce an uninterpretable result with varying frequency depending on the test. These problems are often not reported in test efficacy studies; the uninterpretable results are simply removed from the analysis. This may lead to biased assessment of the test characteristics |
| Arbitrary choice of threshold value | The selection of the threshold value for the index test that maximizes the sensitivity and specificity of the test may lead to over-optimistic measures of test performance. The performance of this cutoff in an independent set of patients may not be the same as in the original study |
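As a rough illustration of how one of the biases in the table above can distort estimates of test performance, the sketch below works through partial verification bias with made-up numbers. The cohort counts and the verification fractions are assumptions chosen only for the arithmetic; they do not come from any study.

```python
from fractions import Fraction


def sensitivity(tp, fn):
    """Proportion of diseased patients with a positive index test."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-diseased patients with a negative index test."""
    return tn / (tn + fp)


# Hypothetical complete cohort (assumed numbers): 200 diseased, 800 non-diseased.
tp, fn, fp, tn = 160, 40, 80, 720

# Assumed verification pattern: every index-test positive receives the
# reference standard, but only 1 in 10 index-test negatives does.
verified_if_positive = Fraction(1, 1)
verified_if_negative = Fraction(1, 10)

v_tp, v_fp = tp * verified_if_positive, fp * verified_if_positive
v_fn, v_tn = fn * verified_if_negative, tn * verified_if_negative

true_sens = sensitivity(tp, fn)
true_spec = specificity(tn, fp)
apparent_sens = float(sensitivity(v_tp, v_fn))
apparent_spec = float(specificity(v_tn, v_fp))

print(f"complete cohort: sensitivity={true_sens:.3f}, specificity={true_spec:.3f}")
print(f"verified only:   sensitivity={apparent_sens:.3f}, specificity={apparent_spec:.3f}")
```

Under these assumptions, restricting the analysis to verified patients raises the apparent sensitivity from 0.80 to about 0.98 and lowers the apparent specificity from 0.90 to about 0.47, because unverified index-test negatives (both false negatives and true negatives) drop out of the 2×2 table.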
QUADAS-2 Questions for Assessing Risk of Bias in Diagnostic Accuracy Studies*
| Signaling question or risk-of-bias judgment (response options) |
|---|
| Was a consecutive or random sample of patients enrolled? (Yes/No/Unclear) |
| Was a case-control design avoided? (Yes/No/Unclear) |
| Did the study avoid inappropriate exclusions? (Yes/No/Unclear) |
| Could the selection of patients have introduced bias? Risk: Low/High/Unclear |
| Were the index test results interpreted without knowledge of the reference standard? (Yes/No/Unclear) |
| If a threshold was used, was it pre-specified? (Yes/No/Unclear) |
| Could the conduct or interpretation of the index test have introduced bias? Risk: Low/High/Unclear |
| Is the reference standard likely to correctly classify the target condition? (Yes/No/Unclear) |
| Were the reference standard results interpreted without knowledge of the results of the index test? (Yes/No/Unclear) |
| Could the reference standard, its conduct, or its interpretation have introduced bias? Risk: Low/High/Unclear |
| Was there an appropriate interval between index test(s) and reference standard? (Yes/No/Unclear) |
| Did all patients receive a reference standard? (Yes/No/Unclear) |
| Did all patients receive the same reference standard? (Yes/No/Unclear) |
| Were all patients included in the analysis? (Yes/No/Unclear) |
| Could the patient flow have introduced bias? Risk: Low/High/Unclear |
*Questions related to assessing applicability were excluded here. See the original reference for the complete scale (ref. 13)
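One way a review team might record answers to the signaling questions above is sketched below. The domain keys follow the four QUADAS-2 risk-of-bias domains (patient selection, index test, reference standard, flow and timing); the short question keys and the function names are hypothetical. The mapping from answers to a risk level is a deliberate simplification: QUADAS-2 leaves the domain-level judgment to reviewers, so the output should be read as a suggestion that flags domains for discussion, not as a final rating.

```python
# Signaling questions grouped by domain (keys are hypothetical abbreviations
# of the questions in the table above).
QUADAS2_DOMAINS = {
    "patient_selection": [
        "consecutive_or_random_sample",
        "case_control_avoided",
        "inappropriate_exclusions_avoided",
    ],
    "index_test": [
        "blinded_to_reference_standard",
        "threshold_prespecified",
    ],
    "reference_standard": [
        "correctly_classifies_condition",
        "blinded_to_index_test",
    ],
    "flow_and_timing": [
        "appropriate_interval",
        "all_received_reference_standard",
        "same_reference_standard",
        "all_included_in_analysis",
    ],
}

VALID_ANSWERS = {"yes", "no", "unclear"}


def suggest_domain_risk(answers):
    """Suggest a risk level for one domain from its signaling answers.

    Assumed rule (a simplification): any "no" flags "high", all "yes"
    gives "low", anything else is "unclear".
    """
    values = [str(a).lower() for a in answers.values()]
    if not values or set(values) - VALID_ANSWERS:
        raise ValueError(f"expected yes/no/unclear answers, got {values}")
    if "no" in values:
        return "high"
    if all(v == "yes" for v in values):
        return "low"
    return "unclear"


def suggest_study_risks(study_answers):
    """Apply the rule to every domain; unanswered questions count as unclear."""
    return {
        domain: suggest_domain_risk(
            {q: study_answers.get(q, "unclear") for q in questions}
        )
        for domain, questions in QUADAS2_DOMAINS.items()
    }


# Hypothetical study: everything reported as "yes" except two items.
example_answers = {q: "yes" for qs in QUADAS2_DOMAINS.values() for q in qs}
example_answers["threshold_prespecified"] = "unclear"
example_answers["all_received_reference_standard"] = "no"

example_risks = suggest_study_risks(example_answers)
print(example_risks)
```

Keeping the raw answers alongside the suggested level makes it easier to document why a domain was rated as it was.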
Categorizing Individual Studies into General Quality Classes*
| Category | Application to randomized controlled trials | Application to medical test performance studies |
|---|---|---|
| Low. No major features that risk biased results | The study avoids problems such as failure to apply true randomization, selection of a population unrepresentative of the target patients, high dropout rates, or failure to analyze by intention-to-treat. Key study features are described clearly, including the population, setting, interventions, comparison groups, outcome measurements, and reasons for dropouts | RCTs are considered a high-quality study design, but studies that include consecutive patients representative of the intended sample for whom diagnostic uncertainty exists may also meet this standard. A “low risk” study avoids the multiple biases to which medical test studies are subject (e.g., use of an inadequate reference standard, verification bias), and key study features are clearly described, including the comparison groups, outcome measurements, and characteristics of patients who failed to have their actual state (diagnosis or prognosis) verified |
| Medium. Susceptible to some bias, but flaws not sufficient to invalidate the results | The study does not meet all the criteria required for a rating of low risk, but no flaw is likely to cause major bias. The study may be missing information, making it difficult to assess limitations and potential problems | Application of this category to medical test performance studies is similar to application to RCTs |
| High. Significant flaws imply biases of various types that may invalidate the results | The study has large amounts of missing information, discrepancies in reporting, or serious errors in design, analysis, and/or reporting | The study has significant biases determined a priori to be major or “fatal” (i.e., likely to make the results either uninterpretable or invalid) |
*Adapted from AHRQ’s General Methods Guide (ref. 1)
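The recommendation that the method for reaching an overall category be established a priori can be made concrete by writing the rule down before any study is rated. The sketch below shows one possible rule, which is an assumption for illustration and not the rule prescribed by the guide: a study is "high" if any domain designated in advance as "fatal" is rated high, "low" if every domain is rated low, and "medium" otherwise. The function name, the domain keys, and the choice of fatal domains are all hypothetical.

```python
VALID_RISKS = {"low", "high", "unclear"}


def overall_risk_category(domain_risks, fatal_domains):
    """Combine domain-level risks into low / medium / high.

    domain_risks:  dict mapping domain name -> "low", "high", or "unclear"
    fatal_domains: domains the review team decided, a priori, would make
                   results uninterpretable or invalid if rated "high"
    """
    if not domain_risks:
        raise ValueError("at least one domain rating is required")
    bad = {r for r in domain_risks.values() if r not in VALID_RISKS}
    if bad:
        raise ValueError(f"unexpected risk values: {sorted(bad)}")
    missing = set(fatal_domains) - set(domain_risks)
    if missing:
        raise ValueError(f"fatal domains without a rating: {sorted(missing)}")

    if any(domain_risks[d] == "high" for d in fatal_domains):
        return "high"
    if all(r == "low" for r in domain_risks.values()):
        return "low"
    return "medium"


# Hypothetical protocol decision, fixed before data extraction.
FATAL_DOMAINS = {"reference_standard", "flow_and_timing"}

study_a = {"patient_selection": "low", "index_test": "low",
           "reference_standard": "low", "flow_and_timing": "low"}
study_b = {"patient_selection": "high", "index_test": "unclear",
           "reference_standard": "low", "flow_and_timing": "low"}
study_c = {"patient_selection": "low", "index_test": "low",
           "reference_standard": "low", "flow_and_timing": "high"}

for name, risks in [("A", study_a), ("B", study_b), ("C", study_c)]:
    print(name, overall_risk_category(risks, FATAL_DOMAINS))
```

Whatever rule is chosen, recording it in the protocol (here, as `FATAL_DOMAINS` plus the function body) lets readers see how each overall rating was reached.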
Interpretation of Partial Verification Bias: the Example of Family History (refs. 17, 18)*
| Modified QUADAS item (Topic/Bias) | Interpretation |
|---|---|
| 5. Did the whole sample or a random selection of the sample receive verification using a reference standard of diagnosis? (Partial verification bias) | This item concerns partial verification bias, a form of selection bias that occurs when not all of the study participants receive the reference standard (in our context, confirmation of the TRUE disease status of the relative). Sometimes only part of the sample receives the reference standard because knowledge of the index test results influences the decision to perform the reference standard. Note that in the context of family history, the reference standard can only be applied to family members or relatives. The self-report by the probands or informants is the “index test” |
| | We consider the whole sample to be ALL relatives for whom the proband or informant provided information (including “don’t know” status) |
| | YES: All relatives that the proband identifies or reports upon represent the whole sample of relatives. As such, some form of verification is attempted for all identified relatives |
| | NO: Not all relatives receive verification via the reference standard. As such, we consider partial verification bias to be present in the following situations: |
| | 1) Knowledge of the index test will determine which relatives are reported to have the disease status. Often UNAFFECTED relatives do not have their disease status verified by any method (the proband/informant report is assumed to be the true disease status); in this case, the disease status is verified in the AFFECTED relatives only. In this situation, the outcomes of sensitivity and specificity cannot be computed |
| | 2) Relatives for whom the proband/informant indicates “don’t know” status are excluded and do not have their disease status verified (no reference standard testing) |
| | 3) Relatives who are DECEASED are excluded from having any verification undertaken (no reference standard testing) |
| | 4) Relatives who are UNABLE TO PARTICIPATE in interviews or further clinical testing are excluded from having any verification method (no reference standard testing) |
| | UNCLEAR: Insufficient information to determine whether partial verification was present |
* See text
Abbreviation: QUADAS = Quality Assessment of Diagnostic Accuracy Studies
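The statement in the family history table that sensitivity and specificity cannot be computed when only relatives reported as AFFECTED are verified follows from the 2×2 table: verification of index-positive relatives yields true and false positives, but no false negatives or true negatives. The sketch below, with a hypothetical function name and made-up counts, returns only the estimates whose cells are available.

```python
def accuracy_estimates(tp=None, fp=None, fn=None, tn=None):
    """Return accuracy estimates; None where a required cell was never verified.

    tp/fp: relatives reported affected, verified as affected / unaffected
    fn/tn: relatives reported unaffected, verified as affected / unaffected
    """
    def proportion(numerator, other):
        if numerator is None or other is None or numerator + other == 0:
            return None
        return numerator / (numerator + other)

    return {
        "sensitivity": proportion(tp, fn),  # needs verified reported-unaffected relatives
        "specificity": proportion(tn, fp),  # needs verified reported-unaffected relatives
        "ppv": proportion(tp, fp),
        "npv": proportion(tn, fn),
    }


# Assumed counts: 50 relatives reported affected were verified, none of the
# relatives reported unaffected were.
affected_only = accuracy_estimates(tp=45, fp=5)
print(affected_only)
```

With these counts only the positive predictive value can be estimated; sensitivity, specificity, and the negative predictive value all come back as `None`.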