Christof A Bertram, Nikolas Stathonikos, Taryn A Donovan, Alexander Bartel, Andrea Fuchs-Baumgartinger, Karoline Lipnik, Paul J van Diest, Federico Bonsembiante, Robert Klopfleisch.
Abstract
Digital microscopy (DM) is increasingly replacing traditional light microscopy (LM) for performing routine diagnostic and research work in human and veterinary pathology. The DM workflow encompasses specimen preparation, whole-slide image acquisition, slide retrieval, and the workstation, each of which has the potential (depending on the technical parameters) to introduce limitations and artifacts into microscopic examination by pathologists. Performing validation studies according to guidelines established in human pathology ensures that the best-practice approaches for patient care are not compromised by implementing DM. Whereas current publications on validation studies suggest an overall high reliability of DM, each laboratory is encouraged to perform an individual validation study to ensure that the DM workflow performs as expected in the respective clinical or research environment. With the exception of validation guidelines developed by the College of American Pathologists in 2013 and its update in 2021, there is no current review of the application of methods fundamental to validation. We highlight that there is high methodological variation between published validation studies, each having advantages and limitations. The diagnostic concordance rate between DM and LM is the most relevant outcome measure, which is influenced (regardless of the viewing modality used) by different sources of bias including complexity of the cases examined, diagnostic experience of the study pathologists, and case recall. Here, we review 3 general study designs used for previous publications on DM validation as well as different approaches for avoiding bias.
Keywords: accuracy; concordance rate; digital microscopy; digital pathology; noninferiority; review; study design; validation; virtual microscopy; whole-slide images
Year: 2021 PMID: 34433345 PMCID: PMC8761960 DOI: 10.1177/03009858211040476
Source DB: PubMed Journal: Vet Pathol ISSN: 0300-9858 Impact factor: 2.221
Figure 1. Light microscopy and digital microscopy workflow for a validation study, including associated “parameters” (technical aspects of the digital microscopy workflow that can be optimized if needed) and sources of bias (factors that may influence light and digital microscopy). This scheme is modified from Bertram et al.
Definitions of terms used in validation studies.
| Term | Definition |
|---|---|
| Accuracy | Agreement between a study diagnosis and the ground truth diagnosis. |
| Concordance | Agreement between 2 study diagnoses, typically comparing LM versus DM diagnoses of the same case read by the same pathologist (intraobserver concordance). |
| Concordance rate | Number of concordant diagnosis pairs divided by the number of all diagnosis pairs. |
| Consensus diagnosis | Agreement between multiple pathologists on a specific diagnosis for a study case; used as a ground truth diagnosis. |
| Equivalency | Tested by comparison of DM and LM separately to a gold standard (GS). DM versus GS is equivalent or superior to LM versus GS if the diagnostic performance is not significantly lower. |
| Diagnosis pair | Two diagnoses for the same case rendered at 2 different examination time points. Typically, LM versus DM diagnosis using the same pathologist. |
| Discordance | Disagreement between 2 study diagnoses. A validation study should define the type of discrepancy between 2 diagnoses (process, type, grade, secondary diagnosis, severity, terminology, etc) that comprises a discordant diagnosis. May be categorized as minor (eg, no clinical relevance) or major (eg, clinically relevant) discordance. |
| Gold standard (GS) | The best available method for rendering the “correct,” that is, ground truth, diagnosis. A true GS may not be available for histologic specimens. |
| Ground truth diagnosis | Best available estimation of the correct diagnosis using the GS method. |
| Kappa agreement | Level of reliability that is corrected for chance. The coefficient can range from -1 to 1: 1 is the highest degree of reliability, and values of 0 or below indicate agreement no better than expected by chance. |
| Noninferiority | The concordance rate of the test comparison (DM vs LM) is not lower than the concordance rate of the reference comparison (LM vs LM) by significantly more than an acceptable difference (defined by the noninferiority margin). |
| Overall concordance rate (OCR) | |
| Referee pathologist | A pathologist who decides whether the diagnosis pairs from the study pathologist(s) are concordant or discordant. |
| Repeatability | Concordance rate for diagnosis pairs using the same viewing modality (LM vs LM or DM vs DM) by the same pathologist under the same conditions. Repeatability of LM is a suitable benchmark for a validation study. |
| Reproducibility | Concordance rate for diagnosis pairs using the same viewing modality (LM vs LM or DM vs DM) by different pathologists or under different conditions. This value may be used as an estimation for an acceptable diagnostic performance of LM versus DM. |
| Study pathologist | A pathologist who makes diagnoses from the study cases using LM and DM. Study pathologists are blinded to the previously reported diagnoses and to the other study pathologists. |
| Validation of DM | A study with the goal of demonstrating and documenting acceptable performance (concordance rate, noninferiority, or at least equivalency) of the DM workflow for the intended application. |
| Washout period | Time gap between 2 examination time points (typically one with LM and one with DM) of the same case/slide read by the same pathologist in order to reduce recall of the previously rendered diagnosis. |
Abbreviations: DM, digital microscopy; LM, light microscopy; #, number of.
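The concordance rate and kappa agreement defined above can be illustrated with a short calculation. The following Python sketch is not taken from the reviewed studies: the function names, the two-category example data, and the use of exact label equality to decide concordance are assumptions made for illustration. In a real validation study, a referee pathologist would decide whether each diagnosis pair is concordant, and the kappa shown here is the unweighted Cohen's kappa.

```python
from collections import Counter


def concordance_rate(pairs):
    """Fraction of diagnosis pairs in which both diagnoses agree.

    `pairs` is a list of (first_diagnosis, second_diagnosis) tuples,
    for example (LM diagnosis, DM diagnosis) by the same pathologist.
    Agreement is judged by simple equality, which is an assumption here.
    """
    if not pairs:
        raise ValueError("at least one diagnosis pair is required")
    concordant = sum(1 for first, second in pairs if first == second)
    return concordant / len(pairs)


def cohen_kappa(pairs):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(pairs)
    observed = concordance_rate(pairs)
    first_counts = Counter(first for first, _ in pairs)
    second_counts = Counter(second for _, second in pairs)
    # Chance agreement: probability that both reads give the same label
    # if the labels were assigned independently with the observed frequencies.
    expected = sum(
        first_counts[label] * second_counts[label] for label in first_counts
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnosis pairs (LM, DM) for 10 cases; not real study data.
example_pairs = (
    [("malignant", "malignant")] * 4
    + [("malignant", "benign")]
    + [("benign", "benign")] * 4
    + [("benign", "malignant")]
)

print(concordance_rate(example_pairs))       # 0.8
print(round(cohen_kappa(example_pairs), 2))  # 0.6
```

In this example, 8 of 10 pairs agree, but half of that agreement would be expected by chance given the label frequencies, so kappa is lower than the raw concordance rate.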
Figures 2–4. Comparison of different study designs. The diagrams depict the study course, data analysis and interpretation of the concordance rate of a simple modality comparison study (Fig. 2), ground truth study (Fig. 3), and benchmark study (Fig. 4). Raw data of the graphs are taken from previous studies. Δ, difference in the concordance rate.
Comparison of the 3 major validation study designs to compare digital microscopy (DM) and light microscopy (LM).
| Study design | Simple modality comparison | Ground truth | Benchmark |
|---|---|---|---|
| Attributes | | | |
| Study objective | Prove high concordance between DM and LM | Prove equivalency or superiority of DM compared to LM | Prove noninferiority of DM vs LM |
| Examination time points | 2 | 2 (+ ground truth) | 3 or ideally 4 |
| Time investment | + | ++ | +++ |
| Case recall bias | Lower | Lower | Higher |
| Test modality | DM | DM and LM | DM vs LM |
| Reference modality | LM | Independent gold standard | LM vs LM (repeatability) |
| Gold standard dilemma | No/yes | Yes | No |
| Validity of results | +/++ | ++/+++ | +++ |
| Performance measurements and statistical tests | | | |
| Concordance rate | Yes | Yes | Yes |
| Kappa agreement | Yes | Yes | Yes |
| Accuracy | No | Yes | No |
| Fisher’s exact test | No | Yes | (Yes) |
| Noninferiority test | (Yes) | (Yes) | Yes |
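For the benchmark design, the noninferiority decision compares the DM vs LM concordance rate with the LM vs LM repeatability against a predefined margin. The sketch below shows one possible way to do this and is not the method prescribed by the source: it assumes a Wald confidence interval for the difference of two proportions and treats the two concordance rates as independent samples, which is a simplification when the same cases are read at every time point. The function name, the counts, and the margins are hypothetical.

```python
import math
from statistics import NormalDist


def noninferiority_wald(conc_test, n_test, conc_ref, n_ref, margin, alpha=0.025):
    """One-sided Wald noninferiority check for two concordance rates.

    conc_test / n_test: concordant and total diagnosis pairs, DM vs LM.
    conc_ref / n_ref:   concordant and total diagnosis pairs, LM vs LM.
    margin:             noninferiority margin as a proportion (e.g. 0.10).
    Returns (difference, lower confidence bound, noninferior?).
    Assumes independent samples, which is a simplification.
    """
    p_test = conc_test / n_test
    p_ref = conc_ref / n_ref
    diff = p_test - p_ref
    standard_error = math.sqrt(
        p_test * (1 - p_test) / n_test + p_ref * (1 - p_ref) / n_ref
    )
    z = NormalDist().inv_cdf(1 - alpha)
    lower_bound = diff - z * standard_error
    # DM is considered noninferior if even the lower bound of the
    # difference stays above the negative margin.
    return diff, lower_bound, lower_bound > -margin


# Hypothetical counts: 180/200 concordant for DM vs LM,
# 186/200 concordant for LM vs LM (repeatability benchmark).
diff, lower_bound, noninferior = noninferiority_wald(180, 200, 186, 200, margin=0.10)
print(round(diff, 3), round(lower_bound, 3), noninferior)
```

With these made-up numbers, the difference in concordance rate is about -0.03 and the lower bound is about -0.085, so DM would be judged noninferior at a margin of 0.10 but not at a stricter margin of 0.05. The choice of margin therefore has to be fixed before the study is run.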
Figure 5. Proposed quality of different gold standard methods reported in validation studies for defining ground truth diagnoses.