Literature DB >> 33782535

Development and evaluation of a deep neural network for histologic classification of renal cell carcinoma on biopsy and surgical resection slides.

Mengdan Zhu¹, Bing Ren², Ryland Richards², Matthew Suriawinata¹, Naofumi Tomita¹, Saeed Hassanpour^3,4,5,6.

Abstract

Renal cell carcinoma (RCC) is the most common renal cancer in adults. The histopathologic classification of RCC is essential for diagnosis, prognosis, and management of patients. Reorganization and classification of complex histologic patterns of RCC on biopsy and surgical resection slides under a microscope remains a heavily specialized, error-prone, and time-consuming task for pathologists. In this study, we developed a deep neural network model that can accurately classify digitized surgical resection slides and biopsy slides into five related classes: clear cell RCC, papillary RCC, chromophobe RCC, renal oncocytoma, and normal. In addition to the whole-slide classification pipeline, we visualized the identified indicative regions and features on slides for classification by reprocessing patch-level classification results to ensure the explainability of our diagnostic model. We evaluated our model on independent test sets of 78 surgical resection whole slides and 79 biopsy slides from our tertiary medical institution, and 917 surgical resection slides from The Cancer Genome Atlas (TCGA) database. The average area under the curve (AUC) of our classifier on the internal resection slides, internal biopsy slides, and external TCGA slides is 0.98 (95% confidence interval (CI): 0.97-1.00), 0.98 (95% CI: 0.96-1.00) and 0.97 (95% CI: 0.96-0.98), respectively. Our results suggest that the high generalizability of our approach across different data sources and specimen types. More importantly, our model has the potential to assist pathologists by (1) automatically pre-screening slides to reduce false-negative cases, (2) highlighting regions of importance on digitized slides to accelerate diagnosis, and (3) providing objective and accurate diagnosis as the second opinion.

Entities: CellLine Chemical Disease Gene Species

Year: 2021 PMID： 33782535 PMCID： PMC8007643 DOI： 10.1038/s41598-021-86540-4

Source DB: PubMed Journal: Sci Rep ISSN： 2045-2322 Impact factor: 4.379

Introduction

Kidney cancer is among the ten most common cancers worldwide[1,2], and approximately 90% of all kidney cancers are renal cell carcinoma (RCC)[3]. The classification of RCC consists of three major histologic RCC subtypes. Clear cell renal cell carcinoma (ccRCC) is the most common subtype (around 75% of all cases), papillary renal cell carcinoma (pRCC) accounts for about 15–20% of RCC, and chromophobe renal cell carcinoma (chRCC) makes up approximately 5% of RCC[4]. The classic morphologic features of ccRCC include compact, alveolar, tubulocystic or rarely papillary architecture of cells with clear cytoplasm and characteristic network of small, thin-walled vessels[5]. Papillae or tubulopapillary architecture with fibrovascular cores and frequently with foamy macrophages are identified pRCC[6]. Of note, although renal oncocytoma is the most common benign renal tumor type, it still remains difficult to distinguish clinically from renal cell carcinoma including chRCC[7]. There are well-documented disparities in histologic appearances of chRCC and renal oncocytoma: chRCC shows prominent cell border, raisinoid nuclei and perinuclear halo, while oncocytoma displays nested architecture with myxoid or hyalinized stroma and cells with eosinophilic or granular cytoplasm and small round nuclei. However, the eosinophilic variant of chRCC can mimic the histologic features of oncocytoma, given their similar histogenesis[8]. Histological classification of RCC is of great importance in patient’s care, as RCC subtypes have significant implication in the prognosis and treatment of renal tumors[9-11]. Inspection and examination of complex RCC histologic patterns under the microscope, however, remain a time-consuming and demanding task for pathologists. The manual classification of RCC has shown a high rate of inter-observer and intra-observer variability[12], as renal tumors can have varied appearances and combined morphologic features, making classification difficult. With the advent of whole-slide images in digital pathology, automated histopathologic image analysis systems have shown great promise for diagnostic purposes[13-15]. Computerized image analysis has the advantage of providing a more efficient, objective, and consistent evaluation to assist pathologists in their medical decision-making processes. In recent years, significant progress has been made in applying deep learning techniques, especially convolutional neural networks (CNNs), to a wide range of computer vision tasks as well as biomedical imaging analysis applications[16-18]. CNN-based models can automatically process digitized histopathology images and learn to extract cellular patterns associated with the presence of tumors[19-21]. In this study, we developed a CNN-based model for classification of renal cell carcinoma based on surgical resection slides. We used resection slides from our tertiary medical institution, Dartmouth-Hitchcock Medical Center (DHMC), which include rare subtypes, such as normal, renal oncocytomas, and chRCC, for the development of our model. We evaluated this model on 78 independent surgical resection slides from our institution and 917 surgical resection RCC slides from The Cancer Genome Atlas (TCGA) database. Furthermore, we evaluated this model for RCC classification on 79 biopsy slides from our institution. The study presented in this paper utilizes deep neural networks to automatically and accurately differentiate RCC from benign renal tumor cases and classify RCC subtypes on both surgical resection and biopsy slides.

Results

Table 1 summarizes the per-class and average evaluation of our model on the first test set of surgical resection whole-slide images from DHMC. Our model achieved a mean accuracy of 0.97, a mean precision of 0.94, a mean recall of 0.92, a mean F1-score of 0.92, and a mean AUC of 0.98 (95% CI: 0.97–1.00) on this internal test set of resection slides. Table 2 shows the performance summary of our model on whole-slide images from the kidney renal carcinoma collection of the TCGA databases. We achieved high performance on these external resection whole-slide images with a mean accuracy of 0.95, a mean precision of 0.92, a mean recall of 0.90, a mean F1-score of 0.91, and a mean AUC of 0.97 (95% CI: 0.96–0.98). Table 3 presents the per-class and mean performance metrics of our model on 79 biopsy whole-slide images from DHMC. Our model shows great generalizability on the internal biopsy test set, with a mean accuracy of 0.97, a mean precision of 0.97, a mean recall of 0.93, a mean F1-score of 0.95, and a mean AUC of 0.98 (95% CI: 0.96–1.00).

Table 1

Model’s performance on 78 surgical resection whole-slide images in our independent test set from DHMC.

Subtype	Accuracy	Precision	Recall	F1-score	AUROC
Normal	1.00 (1.00–1.00)	1.00 (1.00–1.00)	1.00 (1.00–1.00)	1.00 (1.00–1.00)	1.00 (1.00–1.00)
Oncocytoma	0.97 (0.95–1.00)	1.00 (1.00–1.00)	0.80 (0.63–0.95)	0.89 (0.77–0.97)	0.97 (0.91–1.00)
Chromophobe RCC	0.94 (0.90–0.97)	0.93 (0.84–1.00)	0.78 (0.65–0.89)	0.85 (0.76–0.92)	0.98 (0.95–1.00)
Clear cell RCC	0.97 (0.95–1.00)	0.91 (0.83–0.98)	1.00 (1.00–1.00)	0.95 (0.91–0.99)	0.98 (0.96–1.00)
Papillary RCC	0.96 (0.94–0.99)	0.87 (0.78–0.95)	1.00 (1.00–1.00)	0.93 (0.88–0.97)	0.99 (0.98–1.00)
Average	0.97 (0.95–0.98)	0.94 (0.91–0.97)	0.92 (0.87–0.95)	0.92 (0.88–0.96)	0.98 (0.97–1.00)

Subtype

Accuracy

Precision

Recall

F1-score

AUROC

Normal

1.00

(1.00–1.00)

1.00

(1.00–1.00)

1.00

(1.00–1.00)

1.00

(1.00–1.00)

1.00

(1.00–1.00)

Oncocytoma

0.97

(0.95–1.00)

1.00

(1.00–1.00)

0.80

(0.63–0.95)

0.89

(0.77–0.97)

0.97

(0.91–1.00)

Chromophobe RCC

0.94

(0.90–0.97)

0.93

(0.84–1.00)

0.78

(0.65–0.89)

0.85

(0.76–0.92)

0.98

(0.95–1.00)

Clear cell RCC

0.97

(0.95–1.00)

0.91

(0.83–0.98)

1.00

(1.00–1.00)

0.95

(0.91–0.99)

0.98

(0.96–1.00)

Papillary RCC

0.96

(0.94–0.99)

0.87

(0.78–0.95)

1.00

(1.00–1.00)

0.93

(0.88–0.97)

0.99

(0.98–1.00)

Average

0.97

(0.95–0.98)

0.94

(0.91–0.97)

0.92

(0.87–0.95)

0.92

(0.88–0.96)

0.98

(0.97–1.00)

The 95% confidence interval is also included for each measure.

Table 2

Model’s performance metrics and their 95% confidence intervals on 917 surgical resection whole-slide images from the TCGA database.

Subtype	Accuracy	Precision	Recall	F1–score	AUROC
Normal	1.00 (1.00–1.00)	1.00 (1.00–1.00)	1.00 (1.00–1.00)	1.00 (1.00–1.00)	1.00 (1.00–1.00)
Chromophobe RCC	0.96 (0.96–0.97)	0.86 (0.82–0.89)	0.82 (0.78–0.85)	0.84 (0.81–0.86)	0.97 (0.95–0.98)
Clear cell RCC	0.91 (0.90–0.92)	0.98 (0.97–0.98)	0.86 (0.85–0.87)	0.91 (0.91–0.92)	0.95 (0.94–0.97)
Papillary RCC	0.92 (0.92–0.93)	0.85 (0.83–0.87)	0.93 (0.92–0.95)	0.89 (0.87–0.90)	0.96 (0.95–0.97)
Average	0.95 (0.95–0.96)	0.92 (0.91–0.94)	0.90 (0.89–0.92)	0.91 (0.90–0.92)	0.97 (0.96–0.98)

Subtype

Accuracy

Precision

Recall

F1–score

AUROC

Normal

1.00

(1.00–1.00)

1.00

(1.00–1.00)

1.00

(1.00–1.00)

1.00

(1.00–1.00)

1.00

(1.00–1.00)

Chromophobe RCC

0.96

(0.96–0.97)

0.86

(0.82–0.89)

0.82

(0.78–0.85)

0.84

(0.81–0.86)

0.97

(0.95–0.98)

Clear cell RCC

0.91

(0.90–0.92)

0.98

(0.97–0.98)

0.86

(0.85–0.87)

0.91

(0.91–0.92)

0.95

(0.94–0.97)

Papillary RCC

0.92

(0.92–0.93)

0.85

(0.83–0.87)

0.93

(0.92–0.95)

0.89

(0.87–0.90)

0.96

(0.95–0.97)

Average

0.95

(0.95–0.96)

0.92

(0.91–0.94)

0.90

(0.89–0.92)

0.91

(0.90–0.92)

0.97

(0.96–0.98)

Table 3

Model’s performance metrics and their 95% confidence intervals on 79 biopsy whole-slide images from DHMC.

Subtype	Accuracy	Precision	Recall	F1–score	AUROC
Oncocytoma	0.96 (0.94–0.99)	1.00 (1.00–1.00)	0.87 (0.79–0.95)	0.93 (0.88–0.98)	1.00 (1.00–1.00)
Clear cell RCC	0.96 (0.94–0.99)	1.00 (1.00–1.00)	0.91 (0.85–0.97)	0.95 (0.92–0.98)	0.95 (0.89–1.00)
Papillary RCC	0.97 (0.95–1.00)	0.91 (0.83–0.98)	1.00 (1.00–1.00)	0.95 (0.91–0.99)	0.99 (0.97–1.00)
Average	0.97 (0.94–0.99)	0.97 (0.94–0.99)	0.93 (0.88–0.97)	0.95 (0.90–0.98)	0.98 (0.96–1.00)

Subtype

Accuracy

Precision

Recall

F1–score

AUROC

Oncocytoma

0.96

(0.94–0.99)

1.00

(1.00–1.00)

0.87

(0.79–0.95)

0.93

(0.88–0.98)

1.00

(1.00–1.00)

Clear cell RCC

0.96

(0.94–0.99)

1.00

(1.00–1.00)

0.91

(0.85–0.97)

0.95

(0.92–0.98)

0.95

(0.89–1.00)

Papillary RCC

0.97

(0.95–1.00)

0.91

(0.83–0.98)

1.00

(1.00–1.00)

0.95

(0.91–0.99)

0.99

(0.97–1.00)

Average

0.97

(0.94–0.99)

0.97

(0.94–0.99)

0.93

(0.88–0.97)

0.95

(0.90–0.98)

0.98

(0.96–1.00)

Model’s performance on 78 surgical resection whole-slide images in our independent test set from DHMC. 1.00 (1.00–1.00) 1.00 (1.00–1.00) 1.00 (1.00–1.00) 1.00 (1.00–1.00) 1.00 (1.00–1.00) 0.97 (0.95–1.00) 1.00 (1.00–1.00) 0.80 (0.63–0.95) 0.89 (0.77–0.97) 0.97 (0.91–1.00) 0.94 (0.90–0.97) 0.93 (0.84–1.00) 0.78 (0.65–0.89) 0.85 (0.76–0.92) 0.98 (0.95–1.00) 0.97 (0.95–1.00) 0.91 (0.83–0.98) 1.00 (1.00–1.00) 0.95 (0.91–0.99) 0.98 (0.96–1.00) 0.96 (0.94–0.99) 0.87 (0.78–0.95) 1.00 (1.00–1.00) 0.93 (0.88–0.97) 0.99 (0.98–1.00) 0.97 (0.95–0.98) 0.94 (0.91–0.97) 0.92 (0.87–0.95) 0.92 (0.88–0.96) 0.98 (0.97–1.00) The 95% confidence interval is also included for each measure. Model’s performance metrics and their 95% confidence intervals on 917 surgical resection whole-slide images from the TCGA database. 1.00 (1.00–1.00) 1.00 (1.00–1.00) 1.00 (1.00–1.00) 1.00 (1.00–1.00) 1.00 (1.00–1.00) 0.96 (0.96–0.97) 0.86 (0.82–0.89) 0.82 (0.78–0.85) 0.84 (0.81–0.86) 0.97 (0.95–0.98) 0.91 (0.90–0.92) 0.98 (0.97–0.98) 0.86 (0.85–0.87) 0.91 (0.91–0.92) 0.95 (0.94–0.97) 0.92 (0.92–0.93) 0.85 (0.83–0.87) 0.93 (0.92–0.95) 0.89 (0.87–0.90) 0.96 (0.95–0.97) 0.95 (0.95–0.96) 0.92 (0.91–0.94) 0.90 (0.89–0.92) 0.91 (0.90–0.92) 0.97 (0.96–0.98) Model’s performance metrics and their 95% confidence intervals on 79 biopsy whole-slide images from DHMC. 0.96 (0.94–0.99) 1.00 (1.00–1.00) 0.87 (0.79–0.95) 0.93 (0.88–0.98) 1.00 (1.00–1.00) 0.96 (0.94–0.99) 1.00 (1.00–1.00) 0.91 (0.85–0.97) 0.95 (0.92–0.98) 0.95 (0.89–1.00) 0.97 (0.95–1.00) 0.91 (0.83–0.98) 1.00 (1.00–1.00) 0.95 (0.91–0.99) 0.99 (0.97–1.00) 0.97 (0.94–0.99) 0.97 (0.94–0.99) 0.93 (0.88–0.97) 0.95 (0.90–0.98) 0.98 (0.96–1.00) Of note, our pipeline is the same for resection and biopsy slides and processes each slide using the same method and parameters. Typical biopsy core specimens include 2 to 8 biopsy cores. In this study, the biopsy cores were analyzed and evaluated together at the slide level. Some biopsy cores may only consist of benign renal parenchyma, and some biopsy cores may contain an insufficient amount of lesional tissue; therefore, our approach considered all biopsy cores together at the level of slide. The confusion matrices for each of our three test sets are shown in Fig. 1. Overall, the normal cases could be easily recognized by our model, whereas a minor portion of oncocytoma cases could be misclassified as chromophobe RCC and papillary RCC in both surgical resection cases and biopsy cases. We provide a detailed error analysis in the discussion section. The Receiver Operating Characteristic (ROC) curves of all the test sets are plotted in Fig. 2.

Figure 1

Figure 2

Receiver operating characteristic curves for (a) surgical resection whole-slide images from DHMC, (b) surgical resection whole-slide images from TCGA, and (c) biopsy whole-slide images from DHMC.

Each confusion matrix compares the classification agreement of our model with pathologists’ consensus for each of our three test sets: (a) surgical resection whole-slide images from DHMC, (b) surgical resection whole-slide images from TCGA, and (c) biopsy whole-slide images from DHMC. Receiver operating characteristic curves for (a) surgical resection whole-slide images from DHMC, (b) surgical resection whole-slide images from TCGA, and (c) biopsy whole-slide images from DHMC. We visualize the patches on whole-slide images in our test sets with a color-coded scheme according to the classes predicted by our model. This visualization provides pathologists with insights into the major regions and features that contribute to the classification decisions of our method, to avoid the “black-box” approach toward the outputs. Figure 3 shows a sample visualization for slides from each test set. More visualization examples from the DHMC surgical resection test set are included in Figure S1 in the Supplementary Material. In addition, Figure S3 shows the visualization of patch-level classifications using GradCAM method to enhance the interpretability of our model[22].

Figure 3

Examples of visualized slides from our test sets with highlighted regions of interest for predicted classes using our model. Clear cell RCC and papillary RCC classes are common among the three test sets and thus are used for this illustration. Top row: A surgical resection whole-slide image in the DHMC test set. Middle row: A surgical resection whole-slide image from the TCGA test set. Bottom row: A biopsy whole-slide image from DHMC.

Discussion

Classification of renal cell carcinoma subtype is a clinically important task that enables clinicians to predict prognosis and to choose the optimal management for patients with RCC. Different RCC subtypes may have different prognosis, underlining the importance of differentiation of these subtypes. Clear cell RCC has a worse prognosis compared to chromophobe or papillary RCC at the same stage[23-26]. Of note, the most common benign renal tumor is oncocytoma (3–7% of all renal tumors) and is known for mimicking RCC on histology slides[27]. Therefore, it is very important to recognize different subtypes of RCC as well as benign renal neoplasms such as oncocytoma. This study proposed and evaluated a deep neural network model for automated renal cell carcinoma classification on both surgical resection and biopsy whole-slide images. We chose ResNet-18 architecture as the backbone of our pipeline, which involved a patch-prediction aggregation strategy. Our final model achieved an average F1-score of 0.92, 0.91, and 0.95 for independent resection whole-slide image test sets from DHMC and TCGA databases, and DHMC biopsy whole-slide images, respectively. This study is the first step toward utilizing deep learning methods to automatically classify RCC subtypes and oncocytoma on histopathology images. Previous work on machine learning applications to kidney cancer histopathology is mostly focused on resection slides and three RCC subtypes, without the consideration of benign or oncocytoma classes, with validation on a single test set[17,28-30]. To distinguish between chRCC and oncocytoma classes, which are less common in public datasets, we used region-of-interest annotations to develop a highly accurate patch classifier. Recently, a combination of a convolutional neural network and directed cyclic graph-support vector machine (DAG-SVM) was used for the classification of three RCC subtypes using the TCGA dataset[17]. Another common method in digital pathology is using a weakly supervised approach with a multiple instance learning framework to train a diagnostic model without region-of-interest annotations[31,32]. Our study stands out from this previous study for several reasons: (1) our approach follows a more intuitive methodology based on patch-level confidence scores and achieved an average AUC of > 0.95 on the test sets; (2) our method was evaluated on both DHMC and TCGA datasets to show its generalizability on surgical resection whole-slide images, establishing a strong baseline for future studies in the classification of renal cell carcinoma subtypes; (3) our study includes identification of benign renal neoplasm, oncocytoma, in addition to all major RCC subtypes; and (4) we showed the application and generalizability of our model to a test set of biopsy whole-slide images, which also achieved promising results. Of note, because the TCGA dataset is focused on malignant cancer cases, oncocytoma, a benign subtype, does not exist among the TCGA whole-slide images. Therefore, this subtype was not included in the surgical resection slides in our external test set. Additionally, chromophobe RCC makes up about 5% of RCC occurrences and we could only identify a few chromophobe biopsy slides at DHMC. Similarly, for clinical purposes, only a few normal biopsy slides are stored at DHMC, as more emphasis is put on renal tumor biopsy slides. Considering the prevalence and availability of chromophobe biopsy slides, and the availability of normal slides at our institution, we excluded chromophobe RCC and normal class from our biopsy test set, and evaluated our model on the two major RCC subtypes (i.e., clear cell RCC and papillary RCC) and the major renal benign tumor type (i.e., renal oncocytoma). Notably, the generalizability of our model to biopsy whole-slide images has a wide range of application, as it could assist clinicians with fast and reliable diagnoses and follow-up recommendations for patients. Manual histopathological analysis is a tedious and time-consuming task that could induce errors and variability among different pathologists. Our model addresses this limitation by providing a new technology that has the potential to help pathologists achieve a more efficient, objective, and accurate diagnosis and classification of renal cell carcinoma. In particular, our approach could provide clinical assistance and a second opinion to general surgical pathologists that are not specialized in genitourinary pathology. Our error analysis shown in Figure S2 in the Supplementary Material demonstrates that the misclassifications of our model are mainly due to the atypical morphologic patterns in the histopathologic images. In the DHMC resection test set, chromophobe RCC is misclassified as clear cell RCC because of the substantial clear cytoplasm and thin walled vasculature in the images (Figure S2a). Oncocytoma is misclassified as chromophobe RCC or papillary RCC due to focal tubular growth pattern and less characteristic stroma present (Figure S2b). In the TCGA test set, papillary RCC is misclassified as clear cell RCC due to focal tumor cells with clear cytoplasm and thin walled vasculature (Figure S2c) and clear cell RCC is misclassified as papillary RCC due to focal papillary formation and less clear cytoplasm (Figure S2d). In the DHMC biopsy test set, we observed focal tubular growth pattern and tumor cell cluster formation in oncocytoma cases, and these patterns share overlaying features with pRCC and may be mistakenly recognized as pRCC by our model. In our model’s errors for ccRCC cases, the tumor area consisted less than 5% of whole tissue, which was below the abnormality threshold in our approach and caused misclassification (Figure S3). Additional error analysis at whole-slide level is included in Figure S3 in Supplementary Materials. As a future direction, we plan to expand our dataset and test sets through external collaborations for a more robust and extensive evaluation of RCC subtypes. This extension will include rare subtypes and classes, such as clear-cell papillary renal cell carcinomas. In addition, Table S3 in Supplementary Materials shows the model performance on the TCGA test set, stratified by Fuhrman grade. We plan to investigate and identify salient morphological features in high-grade cases in future work. According to our error analysis, one of our model’s limitations is the misclassification of clear cell RCC as normal in the biopsy slides. To address this limitation, we will pursue developing an adaptive thresholding method that is attentive to differences between biopsy slides and resection slides. With more available data for rare classes, we expect weakly supervised frameworks could potentially remove the thresholding requirement for aggregation; however, aggregation methods such as recurrent neural networks (RNNs) that are used in these weakly supervised approaches could negatively affect the interpretability of these models. Moreover, recent studies suggest that weakly supervised learning frameworks with attention mechanisms are effective for whole-slide classification, which we can utilize in conjunction with our fully-supervised learning method to further improve the accuracy of our model[33,34]. Another direction of improvement is to utilize graph-based approaches, such as Slide Graph[35], to capture the location information of features in whole-slide classification. Of note, our proposed whole-slide inference utilizes aggregation of patch-level predictions, and is permutation invariant and does not consider the location information of patches. Finally, we plan to implement a prospective clinical trial to validate this approach in clinical settings and quantify its impact on the efficiency and accuracy of pathologists' diagnosis of renal cancer.

Materials and methods

Data collection

A total of 486 whole-slide images were collected from patients who underwent renal resection, including 30 normal slides with benign renal parenchyma and no renal neoplasm, from 2015 to 2019 from Dartmouth-Hitchcock Medical Center or DHMC, a tertiary medical institution in New Hampshire, USA. These hematoxylin and eosin (H&E) stained surgical resection slides were digitized by Aperio AT2 scanners (Leica Biosystems, Wetzlar, Germany) at 20× magnification (0.50 µm/pixel). We partitioned these slides into a training set of 385 slides, a development (dev) set of 23 slides, and a test set of 78 slides. Additionally, we collected 79 RCC biopsy slides from 2015 to 2017 from DHMC, as well as 917 whole-slide images of kidney cancer from TCGA for external validation. This study and the use of human participant data in this project were approved by the Dartmouth-Hitchcock Health Institutional Review Board (IRB) with a waiver of informed consent. The conducted research reported in this article is in accordance with this approved Dartmouth-Hitchcock Health IRB protocol and the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research involving Human Subjects[36]. The distribution of whole-slide images that were used in this study is summarized in Table 4. Additional information about inclusion and exclusion criteria for DHMC datasets are included in Supplementary Materials, Appendix A.

Table 4

Distribution of the collected whole-slide images among renal cell carcinoma and benign subtypes.

Histologic subtype	Surgical resection WSIs				Biopsy WSIs
	DHMC			TCGA	DHMC
	Training set	Dev set	Test set #1	Test set #2	Test #3
Normal	15	5	10	9	–
Renal oncocytoma	14	3	10	–	24
Chromophobe RCC	15	5	18	109	–
Clear cell RCC	285	5	20	505	34
Papillary RCC	56	5	20	294	21
Total	385	23	78	917	79

“–” indicates the corresponding subtype was not available in the dataset.

Distribution of the collected whole-slide images among renal cell carcinoma and benign subtypes. “–” indicates the corresponding subtype was not available in the dataset.

Data annotation

Two pathologists (R.R. & B.R.) from the Department of Pathology and Laboratory Medicine at DHMC manually annotated the surgical resection whole-slide images in our training and development sets. In this annotation process, bounding boxes outlining regions of interest (ROIs) for each subtype were generated using Automated Slide Analysis Platform (ASAP), a fast viewer and annotation tool for high-resolution histopathology images[37]. Each ROI was associated and labeled as clear cell RCC, papillary RCC, chromophobe RCC, oncocytoma, or normal. All annotated ROIs were confirmed by one pathologist at a time before being broken into fixed-size patches for our model training and validation steps.

Deep neural network for patch classification

Given the large size of high-resolution histology images and the memory restrictions of currently available computer hardware, it is not feasible to analyze a whole-slide image all at once. Therefore, in this work, we use a computational framework developed by our group that relies on deep neural network image analysis on small fixed-size patches with an overlap of 1/3 from the whole-slide images[38]. These results are then aggregated through a confidence-based inference mechanism to classify the whole-slide images. As a result, this framework allows us to analyze a high-resolution, whole-slide image with a feasible memory requirement. Figure 4 shows the overview of our model in this study.

Figure 4

Overview of our classification pipeline. Tissue patches are extracted from whole-slide images using a sliding-window method with 1/3 overlap after background removal. Deep neural networks extract histology features of the patches and compute patch-level confidence scores for each of the target classes. The patch-level predictions are filtered by low-confidence thresholding and aggregated by computing the percentage of patches that belong to each class in a whole-slide image. We classify a whole slide using a decision tree based on the computed percentages of each class. Patch predictions are also used for visualization, which illustrates the coverage of each class on slides. To do this, we utilized a sliding window approach[38] on the annotated ROIs in our training and development sets to generate fixed-size (i.e., 224 × 224 pixels) patches. To balance the dataset, we randomly selected the same number of 12,240 patches from the training set for each subtype. The distribution of this patch-level dataset is available in Table S1 in the Supplementary Material. We normalized the color intensity of patches and applied standard data augmentation methods, including random horizontal and vertical flips, random 90° rotations, and color jittering. For model training, we tried four variations of residual neural network (ResNet) architecture with different numbers of layers: ResNet-18, ResNet-34, ResNet-50 and ResNet-101. All the networks were initialized using He initialization[39]. These models used the multi-class cross entropy loss function and were trained for 40 epochs with an initial learning rate of 0.001. The learning rate was reduced by a factor of 0.9 every epoch during the training. The trained models assign a label with a confidence score (i.e., a prediction probability between 0 and 1) for each patch. We compared the trained models in our cross-validation process. Among the trained models, we selected a ResNet-18 model, which achieved the best average F1-score of 0.96 on the development set, for further whole-slide inference. The model’s performance on the development set is summarized in Table S2 in the Supplementary Material.

Whole-slide inference

For whole-slide classification, our approach aggregated patch-level predictions based on their confidence scores. For each whole-slide image, we automatically processed the image by removing the white background, breaking down the remaining areas in each whole-slide image into fixed-size (i.e., 224 × 224 pixels) patches, and feeding the patches to our trained deep neural network to generate a pool of patch-level predictions. Of note, to enhance the robustness of our method, we removed all low-confidence patches from this pool so that their confidence scores were less than the threshold of 0.9. We performed a grid search to find the best threshold for the patch-level confidence score on the development set. To aggregate the patch-level predictions, we computed the percentage of patches that belongs to each class in the pool of patches from a whole-slide image. We applied a grid-search optimization on patch-based statistics in the development set to build our inference criteria for whole-slide inference. In our whole-slide image inference criteria, if any of the renal subtypes (i.e., clear cell RCC, papillary RCC, chromophobe RCC, or oncocytoma) accounted for more than 5.0% of the total number of patches, we labeled the whole-slide image as an abnormal class with the greatest number of patches. Otherwise, we classified the whole-slide image as overall normal. The details of our grid search process are included in Supplementary Materials, Appendix B.

Evaluation metrics and statistical analysis

To show the accuracy and generalizability of our approach, we evaluated our method on three different test sets: (1) 78 independent surgical resection whole-slide images from DHMC, (2) 917 surgical resection whole-slide images from the TCGA database, and (3) 79 biopsy whole-slide images from DHMC. In this evaluation, we establish the gold standard for each whole-slide image in our test sets based on the original institutional label and the verification of a pathologist (R.R.) involved in our study. If there is any disagreement, we send the cases to our senior pathologist (B.R.) to resolve the disagreement. For this multi-class classification, we used precision, recall, the F1-score, and the area under the curve (AUC), as well as confusion matrices to show the discriminating performance of our approach for renal cancer classification. In addition, 95% confidence intervals (95% CIs) were computed using the bootstrapping method with 10,000 iterations for all the metrics[40]. Supplementary Information.

35 in total

1. Prognostic impact of histologic subtyping of adult renal epithelial neoplasms: an experience of 405 cases.

Authors: Mahul B Amin; Mitual B Amin; Pheroze Tamboli; Javid Javidan; Hans Stricker; Mariza de-Peralta Venturina; Anita Deshpande; Mani Menon
Journal: Am J Surg Pathol Date: 2002-03 Impact factor: 6.394

2. Renal tumors: diagnostic and prognostic biomarkers.

Authors: Puay Hoon Tan; Liang Cheng; Nathalie Rioux-Leclercq; Maria J Merino; George Netto; Victor E Reuter; Steven S Shen; David J Grignon; Rodolfo Montironi; Lars Egevad; John R Srigley; Brett Delahunt; Holger Moch
Journal: Am J Surg Pathol Date: 2013-10 Impact factor: 6.394

Review 3. Renal cell carcinoma.

Authors: James J Hsieh; Mark P Purdue; Sabina Signoretti; Charles Swanton; Laurence Albiges; Manuela Schmidinger; Daniel Y Heng; James Larkin; Vincenzo Ficarra
Journal: Nat Rev Dis Primers Date: 2017-03-09 Impact factor: 52.329

4. RMDL: Recalibrated multi-instance deep learning for whole slide gastric image classification.

Authors: Shujun Wang; Yaxi Zhu; Lequan Yu; Hao Chen; Huangjing Lin; Xiangbo Wan; Xinjuan Fan; Pheng-Ann Heng
Journal: Med Image Anal Date: 2019-08-30 Impact factor: 8.545

Review 5. 2004 WHO classification of the renal tumors of the adults.

Authors: Antonio Lopez-Beltran; Marina Scarpelli; Rodolfo Montironi; Ziya Kirkali
Journal: Eur Urol Date: 2006-01-17 Impact factor: 20.096

Review 6. Renal cell carcinoma: histological classification and correlation with imaging findings.

Authors: Valdair F Muglia; Adilson Prando
Journal: Radiol Bras Date: 2015 May-Jun

7. Pan-Renal Cell Carcinoma classification and survival prediction from histopathology images using deep learning.

Authors: Sairam Tabibu; P K Vinod; C V Jawahar
Journal: Sci Rep Date: 2019-07-19 Impact factor: 4.379

8. Computer-aided classification of lung nodules on computed tomography images via deep learning technique.

Authors: Kai-Lung Hua; Che-Hao Hsu; Shintami Chusnul Hidayati; Wen-Huang Cheng; Yu-Jen Chen
Journal: Onco Targets Ther Date: 2015-08-04 Impact factor: 4.147

9. Automated grading of renal cell carcinoma using whole slide imaging.

Authors: Fang-Cheng Yeh; Anil V Parwani; Liron Pantanowitz; Chien Ho
Journal: J Pathol Inform Date: 2014-07-30

10. Automated clear cell renal carcinoma grade classification with prognostic significance.

Authors: Katherine Tian; Christopher A Rubadue; Douglas I Lin; Mitko Veta; Michael E Pyle; Humayun Irshad; Yujing J Heng
Journal: PLoS One Date: 2019-10-03 Impact factor: 3.240

5 in total

1. Resolution-based distillation for efficient histology image classification.

Authors: Joseph DiPalma; Arief A Suriawinata; Laura J Tafe; Lorenzo Torresani; Saeed Hassanpour
Journal: Artif Intell Med Date: 2021-08-06 Impact factor: 7.011

Review 2. Precision Medicine: An Optimal Approach to Patient Care in Renal Cell Carcinoma.

Authors: Revati Sharma; George Kannourakis; Prashanth Prithviraj; Nuzhat Ahmed
Journal: Front Med (Lausanne) Date: 2022-06-14

3. Deep Learning Using CT Images to Grade Clear Cell Renal Cell Carcinoma: Development and Validation of a Prediction Model.

Authors: Lifeng Xu; Chun Yang; Feng Zhang; Xuan Cheng; Yi Wei; Shixiao Fan; Minghui Liu; Xiaopeng He; Jiali Deng; Tianshu Xie; Xiaomin Wang; Ming Liu; Bin Song
Journal: Cancers (Basel) Date: 2022-05-24 Impact factor: 6.575

4. Evaluation of an Artificial Intelligence-Augmented Digital System for Histologic Classification of Colorectal Polyps.

Authors: Mustafa Nasir-Moin; Arief A Suriawinata; Bing Ren; Xiaoying Liu; Douglas J Robertson; Srishti Bagchi; Naofumi Tomita; Jason W Wei; Todd A MacKenzie; Judy R Rees; Saeed Hassanpour
Journal: JAMA Netw Open Date: 2021-11-01

Review 5. Artificial Intelligence-Assisted Renal Pathology: Advances and Prospects.

Authors: Yiqin Wang; Qiong Wen; Luhua Jin; Wei Chen
Journal: J Clin Med Date: 2022-08-22 Impact factor: 4.964

5 in total