Wayner Barrios¹, Behnaz Abdollahi², Manu Goyal², Qingyuan Song², Matthew Suriawinata², Ryland Richards³, Bing Ren³, Alan Schned³, John Seigne⁴, Margaret Karagas⁵, Saeed Hassanpour¹,²,⁵.
Abstract
Background: Recent studies indicate that bladder cancer is among the 10 most common cancers in the world (Saginala et al., 2022). Bladder cancer frequently recurs, and prognostic judgments may vary among clinicians. Because a favorable prognosis can support less aggressive treatment plans, accurate classification of histopathology slides is essential for the prognosis and effective treatment of patients with bladder cancer. Automated and accurate histopathology image analysis methods can help pathologists determine the prognosis of these patients.

Materials and methods: In this study, we introduced Bladder4Net, a deep learning pipeline that classifies whole-slide histopathology images of bladder cancer into two classes: low risk (papillary urothelial neoplasm of low malignant potential [PUNLMP] and low-grade tumors) and high risk (high-grade and invasive tumors). The pipeline consists of four convolutional neural network (CNN)-based classifiers designed to address the difficulty of identifying the PUNLMP and invasive classes. We evaluated the pipeline on 182 independent whole-slide images from the New Hampshire Bladder Cancer Study (NHBCS) (Karagas et al., 1998; Sverrisson et al., 2014; Sverrisson et al., 2014), collected from 1994 to 2004, and on 378 external digitized slides from The Cancer Genome Atlas (TCGA) database (https://www.cancer.gov/tcga).
Keywords: Bladder cancer; Computational pathology; Convolutional neural networks
Year: 2022 PMID: 36268091 PMCID: PMC9577122 DOI: 10.1016/j.jpi.2022.100135
Source DB: PubMed Journal: J Pathol Inform
Distribution of collected whole-slide images across the four classes (PUNLMP, low grade, high grade, and invasive urothelial carcinoma [IUC]) and across the low-risk and high-risk groups in our datasets.
| Histologic subtype | Internal (NHBCS) training set | Internal (NHBCS) test set | External (TCGA) test set |
|---|---|---|---|
| Low-risk cases (PUNLMP + Low Grade) | 248 (94 + 154) | 107 (39 + 68) | 11 (11 + 0) |
| High-risk cases (High Grade + IUC) | 177 (144 + 33) | 75 (62 + 13) | 367 (0 + 367) |
| Total | 425 | 182 | 378 |
Fig. 1 Overview of the Bladder4Net pipeline. After background, marker, and stain removal, tissue patches were extracted from the whole-slide images using a sliding window with 1/3 overlap. These patches were then passed forward through four different binary CNN classifiers. The resulting predictions were grouped, and the ratio of patches assigned to each class was computed. These ratios were used as input to a Gaussian process classifier, which determined the final prediction: low risk or high risk.
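The sliding-window step can be sketched as below. With 1/3 overlap, consecutive patches share a third of their side, so the stride is 2/3 of the patch size. This is a minimal sketch, not the authors' code: the function name `sliding_window_coords` and the default `patch_size=224` are hypothetical, and the background, marker, and stain filtering described in the caption is not shown.

```python
def sliding_window_coords(width: int, height: int,
                          patch_size: int = 224,   # hypothetical patch size
                          overlap: float = 1 / 3) -> list[tuple[int, int]]:
    """Return (x, y) top-left corners of square patches tiling a width x height region.

    Consecutive patches overlap by `overlap` of their side length, so the stride
    is patch_size * (1 - overlap). Partial patches at the right/bottom edges are
    skipped in this sketch.
    """
    stride = max(1, round(patch_size * (1 - overlap)))  # 1/3 overlap -> stride of 2/3 patch
    xs = range(0, width - patch_size + 1, stride)
    ys = range(0, height - patch_size + 1, stride)
    return [(x, y) for y in ys for x in xs]  # row-major order
```

For example, a 700 × 500 region with 300-pixel patches gives a 200-pixel stride and six patches, from (0, 0) to (400, 200).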
Fig. 2 The ratio of patch predictions for four sample whole-slide images (one per class). From left to right: the WSI from the PUNLMP class shows a high ratio of patch predictions for its own class. The same behavior occurs for the low-grade and high-grade samples. However, the invasive sample does not follow this pattern because the CNN classifiers cannot correctly identify this class; instead, they predict its patches as high grade.
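The per-slide ratios in this figure amount to the fraction of a slide's patches assigned to each class. A minimal sketch follows; the label strings in `CLASSES` and the function name are hypothetical, and the resulting ratio vector is the kind of slide-level feature that Fig. 1 describes feeding to the Gaussian process classifier (that step is not shown).

```python
from collections import Counter

# Hypothetical names for the four patch classes.
CLASSES = ("PUNLMP", "low_grade", "high_grade", "invasive")


def patch_class_ratios(patch_labels: list[str]) -> dict[str, float]:
    """Fraction of a slide's patches predicted as each class (all zeros if no patches)."""
    counts = Counter(patch_labels)
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        return {c: 0.0 for c in CLASSES}
    return {c: counts[c] / total for c in CLASSES}
```

For a slide with six low-grade, three high-grade, and one PUNLMP patch, this yields ratios of 0.6, 0.3, and 0.1 for those classes and 0.0 for invasive, mirroring the invasive-class behavior the caption describes.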
Fig. 3 Patch prediction aggregation. Each patch's label set should match one of the four presented combinations. Other label combinations, in which a patch could belong to more than one class, indicate a low-confidence patch, and such patches are eliminated. This particular example corresponds to the low-grade class.
Model performance on 182 whole-slide images from the internal test set of NHBCS. The 95% confidence interval is also included for each measure.
| Subtype | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Low-risk case | 0.91 (0.86–0.94) | 0.89 (0.83–0.94) | 0.95 (0.91–0.98) | 0.92 (0.88–0.95) |
| High-risk case | 0.91 (0.86–0.95) | 0.93 (0.88–0.98) | 0.85 (0.78–0.92) | 0.89 (0.84–0.94) |
| Average | 0.91 (0.86–0.94) | 0.91 (0.87–0.95) | 0.91 (0.86–0.94) | 0.91 (0.86–0.94) |
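The F1-score in this table is the harmonic mean of precision $P$ and recall $R$. As a worked check, the point estimates in the low-risk and high-risk rows reproduce the reported F1 values:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
F_1^{\text{low}} = \frac{2 \times 0.89 \times 0.95}{0.89 + 0.95} = \frac{1.691}{1.84} \approx 0.92, \qquad
F_1^{\text{high}} = \frac{2 \times 0.93 \times 0.85}{0.93 + 0.85} = \frac{1.581}{1.78} \approx 0.89
```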
Model performance on 378 whole-slide images from the external TCGA dataset. The 95% confidence interval is also included for each measure.
| Subtype | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Low-risk case | 0.99 (0.98–1.00) | 0.73 (0.46–0.99) | 1.00 (1.00–1.00) | 0.84 (0.63–1.00) |
| High-risk case | 0.99 (0.98–1.00) | 1.00 (1.00–1.00) | 0.99 (0.98–1.00) | 0.99 (0.98–1.00) |
| Average | 0.99 (0.98–1.00) | 0.99 (0.98–1.00) | 0.99 (0.98–1.00) | 0.99 (0.97–1.00) |
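The record does not state how the 95% confidence intervals were computed. One common approach is a percentile bootstrap over slides, sketched below as an assumption rather than the authors' method. The function names are hypothetical and the example data are toy values. With only 11 low-risk TCGA slides, a class-specific metric such as precision varies widely across resamples, which is consistent with the wide interval in the low-risk row.

```python
import random


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of slides whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a slide-level metric (assumed method).

    Slides are resampled with replacement; the metric is recomputed on each
    resample, and the alpha/2 and 1 - alpha/2 percentiles are returned.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample slide indices
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1)
    return stats[lo_idx], stats[hi_idx]
```

Metrics that can divide by zero in some resamples (e.g., precision for a rare class) would need explicit handling before being passed as `metric`.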
Fig. 4 Confusion matrices comparing the model's predictions with the pathologists' ground-truth labels on the (left) internal NHBCS test set and (right) external TCGA dataset.
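Per-class precision, recall, and F1 in the tables above follow directly from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the figure's actual orientation is not stated in the text. The function name and the example counts are hypothetical, not values from this study.

```python
def per_class_metrics(cm: list[list[int]],
                      labels: tuple[str, ...] = ("low-risk", "high-risk")) -> dict:
    """Precision, recall, and F1 per class from a square confusion matrix.

    Assumed convention: cm[i][j] counts slides with true class i predicted as class j.
    """
    results = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        predicted = sum(row[k] for row in cm)  # column sum: slides predicted as k
        actual = sum(cm[k])                     # row sum: slides truly in k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[name] = (precision, recall, f1)
    return results
```

With toy counts `[[8, 2], [1, 9]]`, the low-risk class has precision 8/9, recall 0.8, and F1 16/19 ≈ 0.84.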
Fig. 5 Kaplan–Meier survival curve of patients from the internal NHBCS test set with up to 216 months of follow-up.
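A Kaplan–Meier curve estimates survival as the product, over event times $t_i$, of $(1 - d_i / n_i)$, where $d_i$ is the number of deaths at $t_i$ and $n_i$ is the number of patients still at risk. The minimal sketch below uses toy data; the record does not say which software produced the figure. Patients censored at an event time are counted as at risk at that time, which is the usual convention.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient (e.g., months).
    events: 1 if death was observed, 0 if the patient was censored.
    Returns (t, S(t)) at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored patients leave the risk set
    return curve
```

In practice, one curve would be computed per risk group (predicted or grade-defined) and the curves plotted as step functions.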
Hazard ratios for overall survival using the predicted risk groups versus the tumor grade-defined risk groups for patients in the internal NHBCS test set, with the associated 95% CIs and p-values.
| Predictor | Hazard ratio (95% CI) | p-value |
|---|---|---|
| Predicted risk groups | 1.958 (1.222–3.137) | 0.00523 |
| Tumor grade-defined risk groups | 1.945 (1.218–3.107) | 0.00537 |
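In a Cox proportional hazards model, the hazard ratio is $\exp(\hat\beta)$, and a Wald 95% CI is $\exp(\hat\beta \pm 1.96\,\mathrm{SE})$. Assuming the intervals above were built this way (the record does not say), the log-scale standard error and a two-sided Wald p-value can be recovered from each row. This offers a quick consistency check on the reported numbers. The function name is hypothetical.

```python
import math


def wald_from_hr(hr: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.959963984540054) -> tuple[float, float, float]:
    """Recover log-HR, its SE, and a two-sided Wald p-value from an HR and 95% CI.

    Assumes the CI is symmetric on the log scale: exp(beta +/- z_crit * SE).
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p
```

For the predicted risk groups, this gives a log-HR of about 0.672, an SE of about 0.241, and p of about 0.0052, in line with the reported 0.00523.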
Fig. 6 Kaplan–Meier survival curve of patients from the TCGA bladder cancer dataset with up to 48 months of follow-up.