Christof A Bertram, Marc Aubreville, Taryn A Donovan, Alexander Bartel, Frauke Wilm, Christian Marzahl, Charles-Antoine Assenmacher, Kathrin Becker, Mark Bennett, Sarah Corner, Brieuc Cossic, Daniela Denk, Martina Dettwiler, Beatriz Garcia Gonzalez, Corinne Gurtner, Ann-Kathrin Haverkamp, Annabelle Heier, Annika Lehmbecker, Sophie Merz, Erica L Noland, Stephanie Plog, Anja Schmidt, Franziska Sebastian, Dodd G Sledge, Rebecca C Smedley, Marco Tecilla, Tuddow Thaiwong, Andrea Fuchs-Baumgartinger, Donald J Meuten, Katharina Breininger, Matti Kiupel, Andreas Maier, Robert Klopfleisch.
Abstract
The mitotic count (MC) is an important histological parameter for prognostication of malignant neoplasms. However, it is subject to inter- and intraobserver discrepancies due to difficulties in selecting the region of interest (MC-ROI) and in identifying or classifying mitotic figures (MFs). Recent progress in the field of artificial intelligence has allowed the development of high-performance algorithms that may improve standardization of the MC. As algorithmic predictions are not flawless, computer-assisted review by pathologists may ensure reliability. In the present study, we compared partial (MC-ROI preselection) and full (additional visualization of MF candidates and display of algorithmic confidence values) computer-assisted MC analysis to the routine (unaided) MC analysis by 23 pathologists for whole-slide images of 50 canine cutaneous mast cell tumors (ccMCTs). Algorithmic predictions aimed to assist pathologists in detecting mitotic hotspot locations, reducing omission of MFs, and improving classification against imposters. The interobserver consistency for the MC significantly increased with computer assistance (interobserver correlation coefficient, ICC = 0.92) compared to the unaided approach (ICC = 0.70). Classification into prognostic stratifications had a higher accuracy with computer assistance. The algorithmically preselected hotspot MC-ROIs had consistently higher MCs than the manually selected MC-ROIs. Compared to a ground truth (developed with immunohistochemistry for phosphohistone H3), pathologist performance in detecting individual MFs was augmented when using computer assistance (F1-score of 0.68 increased to 0.79) with a reduction in false negatives by 38%. The results of this study demonstrate that computer assistance may lead to more reproducible and accurate MCs in ccMCTs.
Keywords: artificial intelligence; automated image analysis; canine cutaneous mast cell tumors; computer assistance; deep learning; digital pathology; mitotic count; mitotic figures
Year: 2021 PMID: 34965805 PMCID: PMC8928234 DOI: 10.1177/03009858211067478
Source DB: PubMed Journal: Vet Pathol ISSN: 0300-9858 Impact factor: 2.221
Figure 1. Overview of the course of the study with different degrees of computer assistance (red arrows) for the study participants at 3 examination time points (stages 1–3). The deep learning model (concatenation of 2 convolutional neural networks) that was used in this study for computer assistance was developed in a previous study using an independent training and test dataset with different whole-slide images (WSIs) from the same laboratory that provided the study cases. ccMCT, canine cutaneous mast cell tumors; MC-ROI, mitotic count region of interest; MF, mitotic figure.
Figures 2–3. Immunohistochemistry-assisted ground truth. Figure 2. Labeling method of the ground truth dataset. The histological sections (hematoxylin and eosin stain, HE) were destained and relabeled with immunohistochemistry against phosphohistone H3 (pHH3). Subsequently, whole-slide images of both staining methods were aligned on the cellular level via automated image registration and combined to decide whether a tumor cell is a mitotic figure (MF) or not. Ground truth annotations comprised pHH3-positive cells that were recognizable on HE images (green circles) or were not readily identifiable on HE images (blue circles; especially prophase MFs) as well as unambiguous late-phase (especially telophase) MFs that were pHH3-negative (red circles). Here these patterns are displayed in 3 distinct colors, but in the ground truth dataset those structures were labeled as a single label class. Figure 3a. High-magnification image of a pHH3-stained tumor section used for creating the ground truth with 3 positive tumor cells (green circles) and a pHH3-negative mitotic figure imposter (arrow). Figure 3b. Histological image (HE stain) of the same tumor location as in Figure 3a with exemplary annotations by one of the study participants (blue circles). Compared to the pHH3-assisted ground truth, 2 annotations are true positives (TP), 1 annotation is a false positive (FP), and 1 MF was missed (false negative, FN, arrow).
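The TP/FP/FN comparison described for Figure 3b can be sketched as a matching of participant annotations against ground truth positions. The following is a minimal illustration only: the greedy nearest-neighbor matching, the function name `match_annotations`, and the distance tolerance `max_dist` are assumptions for this sketch and are not taken from the study.

```python
import math


def match_annotations(annotations, ground_truth, max_dist=25.0):
    """Count TP, FP and FN for point annotations against ground truth points.

    Hypothetical helper: each annotation is matched greedily to the nearest
    still-unmatched ground truth point within `max_dist` (assumed tolerance,
    in pixels). Each ground truth point can be matched at most once.
    """
    unmatched = list(ground_truth)
    tp = 0
    fp = 0
    for ax, ay in annotations:
        best_index = None
        best_dist = max_dist
        for i, (gx, gy) in enumerate(unmatched):
            dist = math.hypot(ax - gx, ay - gy)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is None:
            fp += 1  # no ground truth cell nearby: false positive
        else:
            unmatched.pop(best_index)  # consume the matched ground truth cell
            tp += 1
    fn = len(unmatched)  # ground truth cells nobody annotated
    return tp, fp, fn


# Toy coordinates mirroring the situation in Figure 3b:
# 3 ground truth cells, 3 annotations, one of which is an imposter.
ground_truth = [(10, 10), (100, 100), (200, 200)]
annotations = [(12, 11), (98, 103), (400, 400)]
tp, fp, fn = match_annotations(annotations, ground_truth)
print(tp, fp, fn)
```

With these made-up coordinates the sketch reproduces the pattern of the caption: 2 true positives, 1 false positive, and 1 false negative.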
Figure 4. Scatterplots of the participants’ mitotic count (MC) values (stages 2 and 3) and the algorithmic (unverified) MC compared with the pHH3-assisted ground truth MC (all obtained in the same mitotic hotspot MC-ROI based on the algorithmic heatmap). The black line in the scatterplots indicates equal values for the ground truth and the pathologists’ or algorithmic MCs.
Accuracy of the 23 study participants (stages 1, 2, and 3) and the deep learning–based algorithm (without pathologist review) to classify mitotic counts (MC) as below (MC < 5) or above (MC ≥ 5) the prognostic cutoff as compared to the pHH3-assisted ground truth MC (GT-MC).
| GT-MC | Number of cases | Accuracy for | Stage 1 (a) | Stage 2 | Stage 3 | Algorithm |
|---|---|---|---|---|---|---|
| 0–4 | 4 | Below cutoff | 75.0% | 50.0% | 50.0% | 50% |
| 5–9 | 7 | Above cutoff | 31.7% | 70.2% | 82.6% | 100% |
| 10–24 | 8 | Above cutoff | 63.0% | 89.1% | 99.5% | 100% |
| 25–49 | 6 | Above cutoff | 85.5% | 93.5% | 99.3% | 100% |
| ≥50 | 15 | Above cutoff | 99.4% | 100% | 100% | 100% |
| All cases | 40 | Below/above cutoff | 75.8% | 86.7% | 91.7% | 95% |
a The GT-MC and the participants’ MC of stage 1 were not determined in the same tumor location. The GT-MC and the MC of stages 2 and 3 were determined in the mitotic hotspot location based on the algorithmic heatmap.
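The accuracy values in the table above rest on a binary decision: an MC is either below (MC < 5) or at or above (MC ≥ 5) the prognostic cutoff, and a participant's value counts as correct when it falls on the same side as the GT-MC. A minimal sketch of that computation follows; the function names and the toy values are hypothetical and do not reproduce the study data.

```python
CUTOFF = 5  # prognostic cutoff from the table caption: MC < 5 vs. MC >= 5


def above_cutoff(mc):
    """Return True if a mitotic count is at or above the prognostic cutoff."""
    return mc >= CUTOFF


def cutoff_accuracy(pairs):
    """Fraction of (observed MC, ground truth MC) pairs on the same side of the cutoff.

    `pairs` is an iterable of (observed_mc, gt_mc) tuples (hypothetical input format).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (observed, ground truth) pair is required")
    agree = sum(above_cutoff(obs) == above_cutoff(gt) for obs, gt in pairs)
    return agree / len(pairs)


# Toy example (made-up values, not study data):
# (3, 4) agree below, (6, 4) disagree, (7, 9) agree above, (4, 5) disagree.
toy_pairs = [(3, 4), (6, 4), (7, 9), (4, 5)]
print(cutoff_accuracy(toy_pairs))
```

Cases with a GT-MC close to the cutoff are the ones where small counting differences flip the classification, which is consistent with the lower accuracies reported for the 0–4 and 5–9 rows.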
Figures 5–8. Approximate location of the mitotic count region of interest (MC-ROI) selected manually by each study participant (represented by the rectangular boxes) in the whole-slide images. The black box with the dashed line represents the algorithmically preselected MC-ROI (algorithmic hotspot). The estimated MC heatmap is visualized by variable opacity of a green overlay (scale on the right side of image) on the histological image (hematoxylin and eosin stain) and is based on algorithmic mitotic figure predictions. Dark green areas represent mitotic hotspots. Figure 5. Case no. 33 with widely distributed MC-ROIs. Figure 6. Case no. 46 with widely distributed MC-ROIs. Figure 7. Case no. 38 with MC-ROIs mostly along the tumor periphery. Figure 8. Case no. 5 with similar MC-ROIs at a site of local tumor invasion.
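One way to picture how a hotspot MC-ROI could be derived from algorithmic mitotic figure predictions is a sliding-window search for the region containing the most detections. The sketch below illustrates that general idea only; the window size, stride, coordinate system, and the function name `find_hotspot` are assumptions, and the study's actual heatmap computation is not described in this record.

```python
def find_hotspot(points, slide_w, slide_h, roi_w, roi_h, stride):
    """Return the top-left corner and count of the densest ROI window.

    Hypothetical helper: slides a `roi_w` x `roi_h` window over the slide
    in steps of `stride` and counts the detections (x, y) inside each window.
    On ties, the first window found (row by row) is kept.
    """
    best_origin = (0, 0)
    best_count = -1
    for y0 in range(0, max(slide_h - roi_h, 0) + 1, stride):
        for x0 in range(0, max(slide_w - roi_w, 0) + 1, stride):
            count = sum(
                1
                for x, y in points
                if x0 <= x < x0 + roi_w and y0 <= y < y0 + roi_h
            )
            if count > best_count:
                best_origin, best_count = (x0, y0), count
    return best_origin, best_count


# Toy detections (made-up coordinates): a pair near the corner and a
# denser cluster of three near the centre of a 100 x 100 "slide".
detections = [(5, 5), (6, 6), (50, 50), (52, 51), (55, 58)]
origin, count = find_hotspot(detections, 100, 100, roi_w=20, roi_h=20, stride=10)
print(origin, count)
```

A brute-force scan like this is quadratic in slide size and is only meant to convey the concept; a practical implementation would more likely aggregate detections on a grid first.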
Performance (macro- and micro-averaged metrics with range or 95% confidence interval [CI]) of the 23 participants (partially or fully computer-assisted of stages 2 and 3, respectively) and the deep learning–based algorithm (unverified predictions) for detecting individual mitotic figures in mitotic hotspot regions of interest compared to a pHH3-assisted ground truth.
| Metric | Precision (a), stage 2 | Precision (a), stage 3 | Recall (a), stage 2 | Recall (a), stage 3 | F1-score (a), stage 2 | F1-score (a), stage 3 |
|---|---|---|---|---|---|---|
| Participants, macro-averaged values (range) | 0.80 (0.56–0.95) | 0.83 (0.58–0.94) | 0.62 (0.37–0.82) | 0.76 (0.54–0.87) | 0.68 (0.53–0.79) | 0.79 (0.67–0.84) |
| Participants, micro-averaged values (95% CI) | 0.74 (0.72–0.76) | 0.79 (0.78–0.81) | 0.62 (0.60–0.63) | 0.76 (0.75–0.78) | 0.63 (0.61–0.64) | 0.75 (0.74–0.77) |
| DL algorithm, macro-averaged value | 0.84 | 0.84 | 0.81 | 0.81 | 0.83 | 0.83 |
| DL algorithm, micro-averaged value (95% CI) | 0.83 (0.80–0.85) | 0.83 (0.80–0.85) | 0.80 (0.76–0.84) | 0.80 (0.76–0.84) | 0.80 (0.77–0.82) | 0.80 (0.77–0.82) |
Abbreviations: DL, deep learning.
a The F1-score is the harmonic mean of precision (also known as positive predictive value) and recall (also known as sensitivity). The performance of the unverified algorithm is the same for stages 2 and 3.
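The metrics in the table above follow from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and the F1-score is their harmonic mean. The sketch below also contrasts macro-averaging (mean of per-unit scores) with micro-averaging (scores from pooled counts). Treating each participant as one averaging unit is an assumption of this sketch, and the counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c):
    """Positive predictive value: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    """Sensitivity: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c):
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_f1(units):
    """Macro-average: compute F1 per unit, then take the unweighted mean."""
    return sum(f1(c) for c in units) / len(units)


def micro_f1(units):
    """Micro-average: pool all counts first, then compute a single F1."""
    pooled = Counts(
        tp=sum(c.tp for c in units),
        fp=sum(c.fp for c in units),
        fn=sum(c.fn for c in units),
    )
    return f1(pooled)


# Made-up counts for two hypothetical participants (not study data).
units = [Counts(tp=8, fp=2, fn=2), Counts(tp=1, fp=1, fn=3)]
print(round(macro_f1(units), 3), round(micro_f1(units), 3))
```

The two averages differ because macro-averaging weights every unit equally, whereas micro-averaging weights units by how many figures they contribute, which is why the table reports both.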
Figures 9–11. Object detection performance (identification and classification of mitotic figures) of the 23 participants and the deep learning–based algorithm in stages 2 and 3. Figure 9. Micro-averaged F1-score (upper graph), precision (middle graph), and recall (lower graph) with their 95% confidence intervals. The difference is considered significant if the intervals do not overlap. Figure 10. Macro-averaged recall and precision for the individual participants in stages 2 and 3 (connected by a black arrow) and the precision-recall curve for the algorithm (at different classification thresholds). For the algorithmic predictions for the present study, a single classification threshold was used that resulted in a recall of 0.81 and precision of 0.84. Figure 11. Macro-averaged F1-scores for the individual participants for stages 2 and 3. The dashed black line represents the F1-score of the algorithmic predictions.