Mengdan Zhu1, Bing Ren2, Ryland Richards2, Matthew Suriawinata1, Naofumi Tomita1, Saeed Hassanpour3,4,5,6.
Abstract
Renal cell carcinoma (RCC) is the most common renal cancer in adults. The histopathologic classification of RCC is essential for the diagnosis, prognosis, and management of patients. Recognition and classification of the complex histologic patterns of RCC on biopsy and surgical resection slides under a microscope remains a heavily specialized, error-prone, and time-consuming task for pathologists. In this study, we developed a deep neural network model that can accurately classify digitized surgical resection slides and biopsy slides into five related classes: clear cell RCC, papillary RCC, chromophobe RCC, renal oncocytoma, and normal. In addition to the whole-slide classification pipeline, we visualized the identified indicative regions and features on slides by reprocessing patch-level classification results to ensure the explainability of our diagnostic model. We evaluated our model on independent test sets of 78 surgical resection whole slides and 79 biopsy slides from our tertiary medical institution, and 917 surgical resection slides from The Cancer Genome Atlas (TCGA) database. The average area under the curve (AUC) of our classifier on the internal resection slides, internal biopsy slides, and external TCGA slides is 0.98 (95% confidence interval (CI): 0.97–1.00), 0.98 (95% CI: 0.96–1.00), and 0.97 (95% CI: 0.96–0.98), respectively. Our results suggest the high generalizability of our approach across different data sources and specimen types. More importantly, our model has the potential to assist pathologists by (1) automatically pre-screening slides to reduce false-negative cases, (2) highlighting regions of importance on digitized slides to accelerate diagnosis, and (3) providing objective and accurate diagnoses as a second opinion.
Year: 2021 PMID: 33782535 PMCID: PMC8007643 DOI: 10.1038/s41598-021-86540-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Model’s performance on 78 surgical resection whole-slide images in our independent test set from DHMC.
| Subtype | Accuracy | Precision | Recall | F1-score | AUROC |
|---|---|---|---|---|---|
| Normal | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) |
| Oncocytoma | 0.97 (0.95–1.00) | 1.00 (1.00–1.00) | 0.80 (0.63–0.95) | 0.89 (0.77–0.97) | 0.97 (0.91–1.00) |
| Chromophobe RCC | 0.94 (0.90–0.97) | 0.93 (0.84–1.00) | 0.78 (0.65–0.89) | 0.85 (0.76–0.92) | 0.98 (0.95–1.00) |
| Clear cell RCC | 0.97 (0.95–1.00) | 0.91 (0.83–0.98) | 1.00 (1.00–1.00) | 0.95 (0.91–0.99) | 0.98 (0.96–1.00) |
| Papillary RCC | 0.96 (0.94–0.99) | 0.87 (0.78–0.95) | 1.00 (1.00–1.00) | 0.93 (0.88–0.97) | 0.99 (0.98–1.00) |
| Average | 0.97 (0.95–0.98) | 0.94 (0.91–0.97) | 0.92 (0.87–0.95) | 0.92 (0.88–0.96) | 0.98 (0.97–1.00) |
The 95% confidence interval is also included for each measure.
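The record does not state how these confidence intervals were computed; a common choice for slide-level metrics is the nonparametric percentile bootstrap. The sketch below is illustrative, not the authors' code: `auroc` uses the rank-sum (Mann–Whitney U) identity and assumes no positive/negative score ties, and `bootstrap_ci` resamples slides with replacement.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity for binary labels.

    Assumes no ties between positive- and negative-class scores; with ties,
    average ranks would be needed for an exact value.
    """
    labels, scores = np.asarray(labels), np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap (1 - alpha) CI for the AUROC."""
    rng = np.random.default_rng(seed)
    labels, scores = np.asarray(labels), np.asarray(scores, dtype=float)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), size=len(labels))
        if labels[idx].min() == labels[idx].max():
            continue  # a resample with only one class has no defined AUROC
        stats.append(auroc(labels[idx], scores[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auroc(labels, scores), (lo, hi)
```

The same resample-and-recompute loop applies to accuracy, precision, recall, and F1 by swapping the metric function.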
Model’s performance metrics and their 95% confidence intervals on 917 surgical resection whole-slide images from the TCGA database.
| Subtype | Accuracy | Precision | Recall | F1-score | AUROC |
|---|---|---|---|---|---|
| Normal | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) | 1.00 (1.00–1.00) |
| Chromophobe RCC | 0.96 (0.96–0.97) | 0.86 (0.82–0.89) | 0.82 (0.78–0.85) | 0.84 (0.81–0.86) | 0.97 (0.95–0.98) |
| Clear cell RCC | 0.91 (0.90–0.92) | 0.98 (0.97–0.98) | 0.86 (0.85–0.87) | 0.91 (0.91–0.92) | 0.95 (0.94–0.97) |
| Papillary RCC | 0.92 (0.92–0.93) | 0.85 (0.83–0.87) | 0.93 (0.92–0.95) | 0.89 (0.87–0.90) | 0.96 (0.95–0.97) |
| Average | 0.95 (0.95–0.96) | 0.92 (0.91–0.94) | 0.90 (0.89–0.92) | 0.91 (0.90–0.92) | 0.97 (0.96–0.98) |
Model’s performance metrics and their 95% confidence intervals on 79 biopsy whole-slide images from DHMC.
| Subtype | Accuracy | Precision | Recall | F1-score | AUROC |
|---|---|---|---|---|---|
| Oncocytoma | 0.96 (0.94–0.99) | 1.00 (1.00–1.00) | 0.87 (0.79–0.95) | 0.93 (0.88–0.98) | 1.00 (1.00–1.00) |
| Clear cell RCC | 0.96 (0.94–0.99) | 1.00 (1.00–1.00) | 0.91 (0.85–0.97) | 0.95 (0.92–0.98) | 0.95 (0.89–1.00) |
| Papillary RCC | 0.97 (0.95–1.00) | 0.91 (0.83–0.98) | 1.00 (1.00–1.00) | 0.95 (0.91–0.99) | 0.99 (0.97–1.00) |
| Average | 0.97 (0.94–0.99) | 0.97 (0.94–0.99) | 0.93 (0.88–0.97) | 0.95 (0.90–0.98) | 0.98 (0.96–1.00) |
Figure 1. Each confusion matrix compares the classification agreement of our model with the pathologists' consensus for each of our three test sets: (a) surgical resection whole-slide images from DHMC, (b) surgical resection whole-slide images from TCGA, and (c) biopsy whole-slide images from DHMC.
Figure 2. Receiver operating characteristic curves for (a) surgical resection whole-slide images from DHMC, (b) surgical resection whole-slide images from TCGA, and (c) biopsy whole-slide images from DHMC.
Figure 3. Examples of visualized slides from our test sets with highlighted regions of interest for the predicted classes. Clear cell RCC and papillary RCC are common to all three test sets and are therefore used for this illustration. Top row: a surgical resection whole-slide image from the DHMC test set. Middle row: a surgical resection whole-slide image from the TCGA test set. Bottom row: a biopsy whole-slide image from DHMC.
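The Figure 3 overlays come from reprocessing patch-level predictions back onto the slide. The rendering details are not given in this record; a minimal sketch (the `coverage_map` helper is hypothetical) that paints each patch's predicted class over the region it covers and resolves overlapping patches by per-pixel majority vote:

```python
import numpy as np

def coverage_map(slide_shape, patch_coords, patch_labels, patch_size, n_classes):
    """Per-pixel class map built from overlapping patch predictions.

    Each patch casts one vote for its predicted class over the square region
    it covers; overlaps are resolved by the per-pixel argmax of the votes.
    Pixels covered by no patch (background) are marked -1.
    """
    h, w = slide_shape
    votes = np.zeros((n_classes, h, w), dtype=np.int32)
    for (y, x), label in zip(patch_coords, patch_labels):
        votes[label, y:y + patch_size, x:x + patch_size] += 1
    mask = votes.argmax(axis=0)
    mask[votes.sum(axis=0) == 0] = -1
    return mask
```

Rendering then reduces to coloring the mask per class and alpha-blending it onto a thumbnail of the slide.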
Distribution of the collected whole-slide images among renal cell carcinoma and benign subtypes.
| Histologic subtype | Training set (DHMC, resection) | Dev set (DHMC, resection) | Test set #1 (DHMC, resection) | Test set #2 (TCGA, resection) | Test set #3 (DHMC, biopsy) |
|---|---|---|---|---|---|
| Normal | 15 | 5 | 10 | 9 | – |
| Renal oncocytoma | 14 | 3 | 10 | – | 24 |
| Chromophobe RCC | 15 | 5 | 18 | 109 | – |
| Clear cell RCC | 285 | 5 | 20 | 505 | 34 |
| Papillary RCC | 56 | 5 | 20 | 294 | 21 |
| Total | 385 | 23 | 78 | 917 | 79 |
“–” indicates the corresponding subtype was not available in the dataset.
Figure 4. Overview of our classification pipeline. Tissue patches are extracted from whole-slide images using a sliding-window method with 1/3 overlap after background removal. Deep neural networks extract histology features of the patches and compute patch-level confidence scores for each of the target classes. The patch-level predictions are filtered by low-confidence thresholding and aggregated by computing the percentage of patches that belong to each class in a whole-slide image. We classify a whole slide using a decision tree based on the computed percentages of each class. Patch predictions are also used for visualization, which illustrates the coverage of each class on slides.
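The window extraction and aggregation steps described in this pipeline can be sketched as follows. The 0.9 confidence threshold is an illustrative value, and the final argmax over class fractions is a simplified stand-in for the paper's trained decision tree, whose learned rules are not given in this record.

```python
import numpy as np

CLASSES = ["normal", "renal oncocytoma", "chromophobe RCC",
           "clear cell RCC", "papillary RCC"]

def sliding_window_coords(height, width, patch_size):
    """Top-left patch coordinates for a sliding window with 1/3 overlap,
    i.e. a stride of 2/3 of the patch size."""
    stride = (2 * patch_size) // 3
    ys = range(0, max(height - patch_size, 0) + 1, stride)
    xs = range(0, max(width - patch_size, 0) + 1, stride)
    return [(y, x) for y in ys for x in xs]

def classify_slide(patch_probs, conf_threshold=0.9):
    """Aggregate patch-level softmax outputs into a slide-level label.

    Low-confidence patches (top probability below `conf_threshold`) are
    discarded; the slide is summarized by the fraction of retained patches
    assigned to each class. The paper feeds these fractions to a decision
    tree; here we simply pick the class with the largest share.
    """
    patch_probs = np.asarray(patch_probs, dtype=float)
    keep = patch_probs.max(axis=1) >= conf_threshold
    confident = patch_probs[keep]
    if len(confident) == 0:
        return None, np.zeros(patch_probs.shape[1])
    votes = confident.argmax(axis=1)
    fractions = np.bincount(votes, minlength=patch_probs.shape[1]) / len(confident)
    return CLASSES[int(fractions.argmax())], fractions
```

In practice, the per-patch probabilities would come from running the trained network over the patches yielded by `sliding_window_coords` after background removal.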