Sean D McGarry1, John D Bukowy2, Kenneth A Iczkowski3, Allison K Lowman2, Michael Brehler2, Samuel Bobholz1, Andrew Nencka2, Alex Barrington2, Kenneth Jacobsohn4, Jackson Unteriner2, Petar Duvnjak2, Michael Griffin2, Mark Hohenwalter2, Tucker Keuter5, Wei Huang6, Tatjana Antic7, Gladell Paner7, Watchareepohn Palangmonthip3,8, Anjishnu Banerjee5, Peter S LaViolette2,9.
Abstract
Purpose: Our study predictively maps epithelium density in magnetic resonance imaging (MRI) space while varying the ground truth labels provided by five pathologists, to quantify the downstream effects of interobserver variability. Approach: Clinical imaging and postsurgical tissue from 48 prospectively recruited patients were used in our study. Tissue was sliced to match the MRI orientation, and whole-mount slides were stained and digitized. Data from 28 patients (n = 33 slides) were sent to five pathologists for annotation. Slides from the remaining 20 patients (n = 123 slides) were annotated by one of the five pathologists. Interpathologist variability was measured using Krippendorff's alpha. Pathologist-specific radiopathomic mapping (RPM) models were trained using partial least-squares regression, with MRI values predicting epithelium density, a known marker for disease severity. An analysis of variance (ANOVA) characterized the intermodel difference in mean epithelium density. A consensus model was created and evaluated using a receiver operating characteristic (ROC) curve classifying high-grade versus low-grade and benign tissue, and was statistically compared to the apparent diffusion coefficient (ADC).
Keywords: machine learning; magnetic resonance imaging; prostate cancer; rad-path
Year: 2020 PMID: 32923510 PMCID: PMC7479263 DOI: 10.1117/1.JMI.7.5.054501
Source DB: PubMed Journal: J Med Imaging (Bellingham) ISSN: 2329-4302
Fig. 1 Whole-mount H&E-stained prostate slide annotated by five pathologists. While all observers have noted lesions in the left and right peripheral zone, the lesion size and grade vary slightly between observers. The annotation style differs between observers; in particular, how fine-grained the annotation is (observer 3 versus observer 4), and whether they have chosen to explicitly define atrophy. HGPIN, high-grade prostatic intraepithelial neoplasia; FG, fused gland; and CG, cribriform gland.
Fig. 2 Experimental setup and outcomes of this study. In experiment 1, whole-mount annotations were compared between pathologists pair-wise using Krippendorff’s alpha. Experiment 2 trains pathologist-specific RPM models and compares the mean epithelium density values on a held-out test set. Experiment 3 combines the pathologist-specific models into a consensus model and compares the area under the ROC curve to ADC, classifying high-grade tumors versus low-grade and benign regions.
Krippendorff’s alpha; each value is calculated considering all observers within that observer’s ROI.
| Observer | Krippendorff’s alpha (95% CI) |
|---|---|
| Observer 1 | 0.46 (0.30 to 0.61) |
| Observer 2 | 0.31 (0.18 to 0.42) |
| Observer 3 | 0.69 (0.55 to 0.80) |
| Observer 4 | 0.49 (0.35 to 0.60) |
| Observer 5 | 0.57 (0.41 to 0.70) |
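The agreement statistic in the table above, Krippendorff's alpha, can be computed for nominal labels from a coincidence matrix of within-unit label pairs. This is a minimal pure-Python sketch for complete data, not the paper's implementation (which restricts comparisons to each observer's ROI).

```python
# Minimal nominal Krippendorff's alpha for complete data:
# alpha = 1 - D_o / D_e, built from a coincidence matrix.
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """units: list of per-unit label lists (one label per observer)."""
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # unpairable unit
        # Each ordered pair of values within a unit contributes 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)
    n = sum(coincidence.values())  # total pairable values
    marginals = Counter()
    for (a, _b), w in coincidence.items():
        marginals[a] += w
    # Observed disagreement: off-diagonal mass of the coincidence matrix.
    d_o = sum(w for (a, b), w in coincidence.items() if a != b) / n
    # Expected disagreement under chance pairing of the marginals.
    d_e = sum(marginals[a] * marginals[b]
              for a in marginals for b in marginals if a != b) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Toy data: four slides, two observers each, one disagreement.
ratings = [["tumor", "tumor"], ["tumor", "tumor"],
           ["benign", "benign"], ["tumor", "benign"]]
alpha = krippendorff_alpha_nominal(ratings)  # 8/15 ≈ 0.533
```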
Fig. 3 ANOVA comparing the mean predicted epithelium density values for each model. Pair-wise comparisons were performed using Tukey’s honest significant difference test.
Fig. 4 (a) Deep annotation showing a single observer’s annotation of atrophy, low-grade, and high-grade prostate cancer overlaid on the T2. (b) Voxel-wise predictions of epithelium density in MRI space in three true positive cases (top) and one true negative case (bottom). Susceptibility to image noise and signal intensity varies across observers. The consensus model is generated by averaging the maps from the five observers.
Fig. 5 The performance of the five RPM models, the consensus model, and ADC was evaluated using an empirical ROC curve. The classification task was identifying high-grade tumors versus low-grade and benign regions. RPM models consistently outperform ADC alone, and the consensus model matches the AUC of the top pathologist. The consensus model statistically outperforms ADC.
Area under the ROC curve and bootstrapped confidence intervals. Results calculated on the SA dataset.
| Condition | AUC (95% CI) |
|---|---|
| Observer 1 | 0.77 (0.66 to 0.86) |
| Observer 2 | 0.79 (0.67 to 0.88) |
| Observer 3 | 0.79 (0.67 to 0.90) |
| Observer 4 | 0.77 (0.65 to 0.88) |
| Observer 5 | 0.80 (0.65 to 0.88) |
| Consensus | 0.80 (0.66 to 0.90) |
| ADC | 0.71 (0.57 to 0.83) |
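The AUC values and bootstrapped confidence intervals in the table above can be reproduced in style with scikit-learn plus a percentile bootstrap. The synthetic labels, scores, and resample count below are illustrative assumptions.

```python
# Sketch: empirical ROC AUC (high-grade vs. low-grade/benign) with a
# percentile-bootstrap 95% CI, in the style of the table above.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Synthetic labels (1 = high-grade) and predicted epithelium density.
y = rng.integers(0, 2, size=400)
scores = 0.4 * y + rng.normal(0.0, 0.5, size=400)

auc = roc_auc_score(y, scores)

# Percentile bootstrap over the held-out samples.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y), size=len(y))
    if len(np.unique(y[idx])) < 2:
        continue  # skip degenerate resamples with one class
    boot.append(roc_auc_score(y[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
```

Note that statistically comparing two correlated AUCs on the same test set (e.g., consensus versus ADC) calls for a paired test such as DeLong's rather than overlapping confidence intervals alone.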