Abstract
Diagnostic tests are approaches used in clinical practice to identify with high accuracy the disease of a particular patient and thus to provide early and proper treatment. Reporting high-quality results of diagnostic tests, for both basic and advanced methods, is solely the responsibility of the authors. Despite the existence of recommendations and standards regarding the content and format of statistical reporting, the quality of statistical reporting when a diagnostic test is assessed varies from excellent to very poor. This article briefly reviews the steps in the evaluation of a diagnostic test, from its anatomy, to its role in clinical practice, to the statistical methods used to show its performance. The statistical approaches are linked with the phase, clinical question, and objective, and are accompanied by examples. More details are provided for phase I and II studies, while the statistical treatment of phase III and IV studies is only briefly presented. Several free online resources useful for calculating some of these statistics are also given.
Year: 2019 PMID: 31275427 PMCID: PMC6558629 DOI: 10.1155/2019/1891569
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Anatomy of the phases of a diagnostic test.
| Phase | What? | Design |
|---|---|---|
| I | Determination of normal ranges (pharmacokinetics, pharmacodynamics, and safe doses) | Observational studies on healthy subjects |
| II | Evaluation of diagnostic accuracy | Case-control studies on healthy subjects and subjects with the known (by a |
| III | Evaluation of the clinical consequences (beneficial and harmful effects) of introducing a diagnostic test | Randomized controlled trials; randomization determines whether or not a subject receives the diagnostic test |
| IV | Determination of the long-term consequences of introducing a new diagnostic test into clinical practice | Cohort studies of consecutive participants to evaluate whether the diagnostic accuracy of a test in practice corresponds to predictions from systematic reviews of phase III trials |
Adapted from [7].
Anatomy of the role of a diagnostic test.
| Role | What? | Example (ref.) |
|---|---|---|
| Confirmation/exclusion | Confirm (rule-in) or exclude (rule-out) the disease | Brain natriuretic peptide: diagnostic for left ventricular dysfunction [ |
| Triage | An initial test that can be applied rapidly and has a small number of false-positive results | Renal Doppler resistive index: hemorrhagic shock in polytrauma patients [ |
| Monitoring | A repeated test that allows assessing the efficacy of an intervention | Glycohemoglobin (A1c Hb): overall glycemic control of patients with diabetes [ |
| Prognosis | Assessment of an outcome or the disease progression | PET/CT scan in the identification of distant metastasis in cervical and endometrial cancer [ |
| Screening | Detection of the disease in apparently asymptomatic persons | Cytology test: screening for uterine cervical cancer [ |
Diagnostic test results: types of data.
| Data | Example (ref.) |
|---|---|
| Qualitative dichotomous | Positive/negative or abnormal/normal |
| Qualitative ordinal | (i) Prostate bed after radiation therapy: definitely normal/probably normal/uncertain/probably abnormal/definitely abnormal [ |
| Qualitative nominal | (i) Apolipoprotein E gene (ApoE) genotypes: E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, and E4/E4 [ |
| Quantitative discrete | (i) Number of bacteria in urine or other fluids [ |
| Quantitative continuous | (i) Biomarkers: chitotriosidase [ |
Statistical methods in the assessment of the utility of a diagnostic test.
| Phase | Clinical question | Objective(s) | Statistics for results | Example (ref.) |
|---|---|---|---|---|
| I | What are the normal ranges of values of a diagnostic test? | Determination of the range of values in healthy subjects | Centrality and dispersion (descriptive) metrics: | (i) Levels of hepcidin and prohepcidin in healthy subjects [ |
| I | Is the test reproducible? | Variability: | (i) Agreement analysis: % (95% confidence interval) and agreement coefficients (dichotomous data: Cohen's kappa; ordinal data: weighted kappa; numerical data: Lin's concordance correlation coefficient and the Bland-Altman plot) | (i) Intra- and interobserver variability of uterine measurements [ |
| II | Is the test accurate? What is the performance of the diagnostic test? | Determination of accuracy compared with a gold standard test | (i) Metrics (dichotomous outcome): Se (sensitivity), Sp (specificity), PPV (positive predictive value), NPV (negative predictive value), and DOR (diagnostic odds ratio) | (i) Digital breast tomosynthesis for benign and malignant breast lesions [ |
| III | What are the costs, risks, and acceptability of a diagnostic test? | (i) Evaluation of beneficial and harmful effects | Retrospective or prospective studies: | (i) Computed tomography in children, the associated radiation exposure, and the risk of cancer [ |
| IV | What are the consequences of introducing a new diagnostic test into clinical practice? | (i) Does the test result affect the clinical decision? | (i) Studies of pre- and posttest clinical decision-making | (i) Do interferon-gamma release assays (IGRAs) change the clinical management of patients with latent tuberculosis infection (LTBI)? [ |
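The agreement coefficients listed for phase I reproducibility can be computed directly from paired readings. As a minimal sketch, assuming two observers who each classify the same subjects into categories, unweighted Cohen's kappa compares the observed agreement with the agreement expected by chance from each observer's marginal frequencies. The function name `cohen_kappa` and the readings are hypothetical, and the 95% confidence interval mentioned in the table is not computed here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters classifying the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects placed in the same category
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the two raters' marginal category frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical readings of 10 images by two observers
obs1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
obs2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]
print(round(cohen_kappa(obs1, obs2), 3))  # 0.583
```

Here the observers agree on 8 of 10 images (80%), but because chance alone would produce 52% agreement with these marginals, kappa is only about 0.58.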
Online resources for confidence interval calculation: coefficient of variation (CV).
| What? | Distribution | URL (accessed on August 26, 2018) |
|---|---|---|
| Two-sided confidence interval (CI) for a CV | Normal | |
| One-sided CI | Normal | |
| Two-sided CI for a CV | Lognormal | |
| Ratio of two CVs | Normal | |
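The resources above provide confidence intervals; the CV point estimates themselves are simple to obtain. A minimal sketch, assuming strictly positive measurements: under normality the CV is estimated as SD/mean, while for lognormal data it is commonly estimated from the SD s of the natural-log-transformed values as sqrt(exp(s²) − 1). The function names and data are hypothetical, and the confidence intervals (which require chi-square quantiles) are left to the listed resources.

```python
import math
import statistics


def cv_normal(values):
    """Sample CV = SD / mean, for normally distributed data with a positive mean."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("the CV is only meaningful for a positive mean")
    return statistics.stdev(values) / mean


def cv_lognormal(values):
    """CV for lognormal data: sqrt(exp(s^2) - 1), with s the SD of ln(values)."""
    if any(v <= 0 for v in values):
        raise ValueError("lognormal data must be strictly positive")
    s2 = statistics.variance([math.log(v) for v in values])
    return math.sqrt(math.expm1(s2))  # expm1(x) = exp(x) - 1, accurate for small x


# Hypothetical repeated measurements of one sample
measurements = [4.1, 5.0, 3.8, 4.6, 5.3]
print(round(cv_normal(measurements), 3), round(cv_lognormal(measurements), 3))
```

Both functions need at least two values, since the sample SD and variance are undefined for a single observation.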
Intra- and interclass correlation coefficients and concordance correlation coefficient: an empirical assessment of the strength of agreement.
| Agreement | Continuous measurement | Ultrasound fetal measurements | Semiautomated measurements |
|---|---|---|---|
| Very good | | | |
| Good | 0.95 < | 0.99 < | 0.80 < |
| Moderate | 0.90 < | 0.98 < | 0.65 |
| Poor | 0.70 < | 0.95 < | |
| Very poor | | | |
Source [141, 142].
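Lin's concordance correlation coefficient, to which thresholds like those above are applied, measures how closely paired measurements fall on the line of identity. Unlike Pearson's r, it is therefore penalized by a systematic shift between two methods. A minimal sketch using Lin's moment estimator (1/n denominators), ρc = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²), with a hypothetical function name `lin_ccc` and illustrative data:

```python
import statistics


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with 1/n moments.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx = statistics.pvariance(x, mx)
    vy = statistics.pvariance(y, my)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2 * sxy / denom


method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [v + 1.0 for v in method_a]  # same values shifted by one unit
print(lin_ccc(method_a, method_a))            # 1.0
print(round(lin_ccc(method_a, method_b), 3))  # 0.714
```

Shifting every value by one unit leaves Pearson's r at 1 but lowers the CCC to about 0.71, which is why the CCC is preferred for agreement rather than association.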
Generic 2 × 2 contingency table.
| Diagnostic test result | Disease present | Disease absent | Total |
|---|---|---|---|
| Positive | TP (true positive) | FP (false positive) | TP + FP |
| Negative | FN (false negative) | TN (true negative) | FN + TN |
| Total | TP + FN | FP + TN | TP + FP + FN + TN |
Row totals give the number of subjects with positive and negative test results, respectively; column totals give the number of subjects with (disease present) and without (disease absent) the disease of interest. For ordinal and continuous data, subjects are classified as test positive or test negative using a cutoff value.
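For ordinal or continuous results, the 2 × 2 table is obtained by dichotomizing at a cutoff and cross-classifying against the reference standard. A minimal sketch, assuming a result strictly greater than the cutoff counts as test positive (the convention of the BMI example later in this section, "cutoff > 29.5 kg/m²"). The names `TwoByTwo` and `classify` and the data are hypothetical:

```python
from typing import NamedTuple, Sequence


class TwoByTwo(NamedTuple):
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def classify(values: Sequence[float], diseased: Sequence[bool], cutoff: float) -> TwoByTwo:
    """Cross-classify test results against the reference (gold) standard.

    Assumption: a value strictly greater than `cutoff` is test positive.
    """
    if len(values) != len(diseased):
        raise ValueError("each test result needs a reference-standard status")
    tp = fp = fn = tn = 0
    for value, has_disease in zip(values, diseased):
        positive = value > cutoff
        if positive and has_disease:
            tp += 1
        elif positive:
            fp += 1
        elif has_disease:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical BMI values (kg/m2) and reference-standard status
bmi = [31.0, 27.2, 33.5, 24.1, 30.2, 22.8]
cancer = [True, True, False, False, True, False]
table = classify(bmi, cancer, cutoff=29.5)
print(table)  # TwoByTwo(tp=2, fp=1, fn=1, tn=2)
print("row totals:", table.tp + table.fp, table.fn + table.tn)
```

Whether the cutoff itself counts as positive (> versus ≥) should be stated explicitly when reporting, because it changes the counts for values that fall exactly on the cutoff.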
Standard statistic indicators used to evaluate diagnostic accuracy.
| Statistic (Abb) | Formula | Remarks |
|---|---|---|
| Sensitivity (Se) | TP/(TP + FN) | (i) The higher the Se, the smaller the number of false-negative results |
| Specificity (Sp) | TN/(TN + FP) | (i) The higher the Sp, the smaller the number of false-positive results |
| Accuracy index (AI) | (TP + TN)/(TP + FP + FN + TN) | (i) Gives the proportion of subjects who are correctly diagnosed |
| Youden's index (J) | Se + Sp − 1 | (i) Equals 1 − (false-negative rate + false-positive rate), so it accounts for both types of misclassification; 1 indicates a perfect test and 0 a test with no discriminative ability |
| Positive predictive value (PPV) | TP/(TP + FP) | (i) Answers the question “what is the chance that a person with a positive test truly has the disease?” |
| Negative predictive value (NPV) | TN/(TN + FN) | (i) Answers the question “what is the chance that a person with a negative test truly does not have the disease?” |
| Positive likelihood ratio (PLR/LR+) | Se/(1 − Sp) | (i) Indicates how much the odds of the disease increase when a test is positive (indicator to rule in) |
| Negative likelihood ratio (NLR/LR−) | (1 − Se)/Sp | (i) Indicates how much the odds of the disease decrease when a test is negative (indicator to rule out) |
| Diagnostic odds ratio (DOR) | (TP/FN)/(FP/TN) | (i) A high DOR indicates better diagnostic test performance (ranges from 0 to infinity). A value of 1 indicates a test unable to discriminate between those with and those without the disease |
| Posttest odds (PTO) | Pretest odds (prevalence/(1 − prevalence)) × LR | (i) Gives the odds that the patient has the target disorder after the test is carried out |
All indicators except J are reported with associated 95% confidence intervals; ROC = receiver operating characteristic; patient-centered indicator; TP = true positive; FP = false positive; FN = false negative; and TN = true negative. PPV and NPV depend on the prevalence and should be used only if the proportion of subjects with the disease in the sample is equivalent to the prevalence of the disease in the studied population.
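The formulas in the table translate directly into code. The sketch below assumes that no cell of the 2 × 2 table is zero, so that every ratio is defined. It computes the point estimates and a Wilson score 95% interval for a proportion such as Se or Sp. Wilson is only one of several interval methods (calculators may use others, such as the exact Clopper-Pearson interval), so treat it as an illustrative choice. It also converts a likelihood ratio into a posttest probability via posttest odds = pretest odds × LR. The counts are those of the BMI example at the 29.5 kg/m² cutoff, and the 10% prevalence is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Two-sided 95% normal quantile (about 1.96)
Z95 = NormalDist().inv_cdf(0.975)


def wilson_ci(successes, n, z=Z95):
    """Wilson score interval for a binomial proportion such as Se or Sp."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def accuracy_metrics(tp, fp, fn, tn):
    """Point estimates of the standard indicators; assumes no zero cells."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "Se": se,
        "Sp": sp,
        "AI": (tp + tn) / (tp + fp + fn + tn),
        "J": se + sp - 1,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "PLR": se / (1 - sp),
        "NLR": (1 - se) / sp,
        "DOR": (tp / fn) / (fp / tn),
    }


def posttest_probability(prevalence, lr):
    """Pretest odds x likelihood ratio, converted back to a probability."""
    pretest_odds = prevalence / (1 - prevalence)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


# BMI example counts at the 29.5 kg/m2 cutoff
m = accuracy_metrics(tp=67, fp=24, fn=33, tn=176)
print({k: round(v, 2) for k, v in m.items()})
print("95% CI for Se:", wilson_ci(67, 100))
print("posttest probability:", posttest_probability(0.10, m["PLR"]))
```

Because the posttest probability uses the prevalence explicitly, it avoids the prevalence caveat attached to PPV and NPV computed directly from a case-control sample.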
Other metrics used to evaluate diagnosis accuracy.
| Statistic (Abb) | Formula | Remarks |
|---|---|---|
| Number needed to diagnose (NND) [ | 1/[Se − (1 − Sp)] = 1/J | (i) The number of patients who need to be tested to obtain one correct positive test result |
| Number needed to misdiagnose (NNM) [ | 1/[1 − (TP + TN)/(TP + FP + FN + TN)] | (i) The higher the NNM, the better the diagnostic test |
| Clinical utility index (CUI) [ | CUI+ = Se × PPV; CUI− = Sp × NPV | (i) Gives the degree to which a diagnostic test is useful in clinical practice |
Abb = abbreviation; all indicators except J are reported with associated 95% confidence intervals; TP = true positive; FP = false positive; FN = false negative; and TN = true negative.
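These additional indicators follow from the same four counts. A minimal sketch with a hypothetical `other_metrics` function: NND is taken as 1/J, NNM as 1/(1 − accuracy), CUI+ as Se × PPV, and CUI− = Sp × NPV is assumed (this is consistent with the CUI− row of the BMI example). It is applied here to the BMI counts at the 29.5 kg/m² cutoff.

```python
import math


def other_metrics(tp, fp, fn, tn):
    """NND, NNM and the clinical utility indices from a 2x2 table.

    CUI- = Sp x NPV is an assumption of this sketch.
    """
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    youden = se + sp - 1
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    if youden <= 0:
        raise ValueError("NND is undefined when Se + Sp - 1 <= 0")
    return {
        "NND": 1 / youden,  # 1/[Se - (1 - Sp)]
        # NNM is infinite for a test that never misclassifies
        "NNM": math.inf if accuracy == 1 else 1 / (1 - accuracy),
        "CUI+": se * ppv,
        "CUI-": sp * npv,
    }


print({k: round(v, 2) for k, v in other_metrics(67, 24, 33, 176).items()})
# {'NND': 1.82, 'NNM': 5.26, 'CUI+': 0.49, 'CUI-': 0.74}
```

An NNM of about 5 means that, at this cutoff, roughly one in every five tested subjects is misclassified.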
Metrics for global test accuracy evaluation or comparisons of performances of two tests.
| Statistic (Abb) | Method | Remarks |
|---|---|---|
| Area under the ROC curve (AUC) | (i) Nonparametric (no assumptions): empirical method (the estimated AUC is biased if the curve has only a few points) and smoothed-curve methods such as the kernel density method (not reliable near the extremes of the ROC curve) | (i) AUC = 1 ⟶ perfect diagnostic test (perfect accuracy) |
| Partial area under the curve (pAUC) | (i) Nonparametric (no assumptions) | (i) Looks at the portion of the AUC over a predefined range of interest |
| Diagnostic odds ratio (DOR) | (i) Must use the same fixed cutoff | (i) DOR = 1 ⟶ test unable to discriminate between subjects with and without the disease |
| TP fraction for a given FP fraction (TPFFPF) | (i) Requires the same false-positive fraction | (i) Useful to compare two different tests at a specific FPF (chosen on clinical grounds), especially when the ROC curves cross |
| Comparison of two tests | (i) Comparison of the AUCs of two different tests | (i) Apply the appropriate statistical test; each AUC must be estimated relative to the same “gold-standard” test |
Abb = abbreviation; all indicators are reported with associated 95% confidence intervals; patient-centered indicator; TP = true positive; FP = false positive; FN = false negative; and TN = true negative.
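The empirical (nonparametric) AUC has a simple interpretation: it is the probability that a randomly chosen subject with the disease has a higher test value than a randomly chosen subject without it, with ties counted as one half. This equals the Mann-Whitney U statistic divided by the number of diseased/nondiseased pairs. A minimal sketch, assuming raw values are available for both groups (higher values indicate disease); the function name and data are hypothetical:

```python
def empirical_auc(diseased, healthy):
    """Empirical AUC: share of (diseased, healthy) pairs in which the diseased
    subject has the higher value, ties counting 1/2 (= U / (n1 * n0))."""
    if not diseased or not healthy:
        raise ValueError("both groups must contain at least one value")
    score = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                score += 1.0
            elif d == h:
                score += 0.5
    return score / (len(diseased) * len(healthy))


# Hypothetical marker values, for illustration only
cases = [3.1, 4.0, 5.2, 2.8]
controls = [1.9, 2.8, 2.2, 3.5]
print(round(empirical_auc(cases, controls), 3))  # 0.844
```

The pairwise loop is quadratic in sample size, which is fine for illustration. A rank-based computation of U gives the same value more efficiently for large samples.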
Figure 1: Summary receiver operating characteristic (ROC) curve for BMI as an anthropometric marker to distinguish benign from malignant breast tumors. The red line corresponds to random classification, where the proportion of correctly classified breast cancer samples equals the proportion of incorrectly classified samples without breast cancer. The maximum Youden index, J max = max(Se + Sp − 1), corresponds to Se = 0.67 and Sp = 0.88 at a BMI cutoff > 29.5 kg/m².
Performance metrics for body mass index (BMI) as an anthropometric marker for breast cancer.
| Indicator (BMI cutoff, kg/m²) | 19.5 | 22.5 | 25.5 | 29.5 | 32.5 | 35.5 | 38.5 |
|---|---|---|---|---|---|---|---|
| TP (true positives) | 100 | 96 | 87 | 67 | 43 | 25 | 13 |
| FP (false positives) | 200 | 176 | 117 | 24 | 3 | 1 | 0 |
| TN (true negatives) | 0 | 24 | 83 | 176 | 197 | 199 | 200 |
| FN (false negatives) | 0 | 4 | 13 | 33 | 57 | 75 | 87 |
| Se (sensitivity) | 1 | 0.96 | 0.87 | 0.67 | 0.43 | 0.25 | 0.13 |
| Sp (specificity) | 0 | 0.12 | 0.42 | 0.88 | 0.99 | 0.99 | 1 |
| PPV (positive predictive value) | 0.33 | 0.35 | 0.43 | 0.74 | 0.93 | 0.96 | 1 |
| NPV (negative predictive value) | n.a. | 0.86 | 0.86 | 0.84 | 0.78 | 0.73 | 0.70 |
| PLR (positive likelihood ratio) | 1.00 | 1.09 | 1.49 | 5.58 | 28.7 | 50.0 | n.a. |
| NLR (negative likelihood ratio) | n.a. | 0.33 | 0.31 | 0.38 | 0.58 | 0.75 | 0.87 |
| AI (accuracy index) | 0.33 | 0.40 | 0.57 | 0.81 | 0.80 | 0.75 | 0.71 |
| CUI+ (clinical utility index positive) | 0.33 | 0.34 | 0.37 | 0.49 | 0.40 | 0.24 | 0.13 |
| CUI− (clinical utility index negative) | n.a. | 0.10 | 0.36 | 0.74 | 0.76 | 0.72 | 0.70 |
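Because every indicator in the table derives from the four count rows, the table can be regenerated and the optimal cutoff located programmatically. The sketch below recomputes Se, Sp, PPV, NPV, and Youden's J at each cutoff from the counts in the table, assuming test positive means BMI > cutoff. It reports n.a. (None) where a denominator is zero and then selects the cutoff that maximizes J, matching the J max criterion of Figure 1. The helper names are hypothetical.

```python
# Counts copied from the BMI table; test positive = BMI > cutoff
CUTOFFS = [19.5, 22.5, 25.5, 29.5, 32.5, 35.5, 38.5]
TP = [100, 96, 87, 67, 43, 25, 13]
FP = [200, 176, 117, 24, 3, 1, 0]
FN = [0, 4, 13, 33, 57, 75, 87]
TN = [0, 24, 83, 176, 197, 199, 200]


def ratio(num, den):
    """num/den, or None where the indicator is not defined (n.a. in the table)."""
    return num / den if den else None


def indicators(tp, fp, fn, tn):
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "Se": se,
        "Sp": sp,
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "J": se + sp - 1,  # Youden's index
    }


rows = [(c, indicators(*counts)) for c, *counts in zip(CUTOFFS, TP, FP, FN, TN)]
for cutoff, m in rows:
    print(cutoff, {k: (None if v is None else round(v, 2)) for k, v in m.items()})

# Cutoff with the largest Youden's index (the first one wins on ties)
best_cutoff, best = max(rows, key=lambda row: row[1]["J"])
print(best_cutoff, round(best["Se"], 2), round(best["Sp"], 2))  # 29.5 0.67 0.88
```

Maximizing J weights false negatives and false positives equally. If the clinical context makes one error more costly, a different criterion, such as a fixed minimum Se, may be more appropriate.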
Online applications for diagnostic tests: characteristics.
| Name | Input | Output |
|---|---|---|
| Diagnostic test calculatorᵃ | TP, FP, TN, FN | Prevalence, Se, Sp, PLR, NLR |
| Diagnostic test calculator (evidence-based medicine toolkit)ᵇ | TP, FP, TN, FN | Se, Sp, PPV, NPV, PLR, NLR with associated 95% confidence intervals |
| MedCalc: Bayesian analysis modelᶜ | Prevalence, Se, Sp | PPV, NPV, PLR, NLR, posttest probability |
| MedCalcᵈ | TP, FP, TN, FN | Se, Sp, PPV, NPV, PLR, NLR, prevalence, AI with associated 95% confidence intervals |
| Clinical calculator 1ᵉ | TP, FP, TN, FN | Se, Sp, PPV, NPV, PLR, NLR, prevalence, AI with associated 95% confidence intervals |
| Clinical utility index calculatorᶠ | TP, TN, total number of cases, total number of noncases | Se, Sp, PPV, NPV, PLR, NLR, prevalence, AI with associated 95% confidence intervals |
| DiagnosticTestᵍ | Number of positive and negative gold standard results for each level of the new diagnostic test | Se, Sp, PPV, NPV, PLR, NLR, AI, DOR, Cohen's kappa, entropy reduction, and a bias index; ROC curve for all possible cutoffs if > 2 levels |
| Simple ROC curve analysisʰ | Absolute frequencies of false positives and true positives for up to ten diagnostic levels | Cumulative rates (false positive and true positive) and ROC curve (equation, |
| ROC analysisⁱ | Five different types of input data (an example is provided for each type) | Se, Sp, AI, positive cases missed, negative cases missed, AUC, ROC curve |
| AUSVET: EpiToolsʲ | TP, FP, TN, FN | Various tools, from basic accuracy to the comparison of two diagnostic tests and ROC analysis |
All URLs were retrieved on April 20, 2019. TP = true positive; FP = false positive; FN = false negative; TN = true negative; Se = sensitivity; Sp = specificity; AI = accuracy index; PPV = positive predictive value; NPV = negative predictive value; PLR = positive likelihood ratio; NLR = negative likelihood ratio; DOR = diagnostic odds ratio; ROC = receiver operating characteristic; AUC = area under the ROC curve; ᵃ http://araw.mede.uic.edu/cgi-bin/testcalc.pl; ᵇ https://ebm-tools.knowledgetranslation.net/calculator/diagnostic/; ᶜ http://www.medcalc.com/bayes.html; ᵈ https://www.medcalc.org/calc/diagnostic_test.php; ᵉ http://vassarstats.net/clin1.html; ᶠ http://www.psycho-oncology.info/cui.html; ᵍ http://www.openepi.com/DiagnosticTest/DiagnosticTest.htm; ʰ http://vassarstats.net/roc1.html; ⁱ http://www.rad.jhmi.edu/jeng/javarad/roc/JROCFITi.html; and ʲ http://epitools.ausvet.com.au/content.php?page=TestsHome.