Suzanne C Wetstein, Vincent M T de Jong, Nikolas Stathonikos, Mark Opdam, Gwen M H E Dackus, Josien P W Pluim, Paul J van Diest, Mitko Veta.
Abstract
Breast cancer tumor grade is strongly associated with patient survival. In current clinical practice, pathologists assign tumor grade after visual analysis of tissue specimens. However, different studies show significant inter-observer variation in breast cancer grading. Computer-based breast cancer grading methods have been proposed, but they only work on specifically selected tissue areas and/or require labor-intensive annotations to be applied to new datasets. In this study, we trained and evaluated a deep learning-based breast cancer grading model that works on whole-slide histopathology images. The model was developed using whole-slide images from 706 young (< 40 years) invasive breast cancer patients with corresponding tumor grade (low/intermediate vs. high) and its components: nuclear grade, tubule formation and mitotic rate. Model performance was evaluated using Cohen's kappa on an independent test set of 686 patients, with annotations by expert pathologists as ground truth. The predicted low/intermediate (n = 327) and high (n = 359) grade groups were used to perform survival analysis. The deep learning system distinguished low/intermediate versus high tumor grade with a Cohen's kappa of 0.59 (80% accuracy) compared to expert pathologists. In subsequent survival analysis, the two groups predicted by the system were found to have significantly different overall survival (OS), distant recurrence-free survival (DRFS) and recurrence-free survival (RFS) (p < 0.05). Univariate Cox hazard regression analysis showed statistically significant hazard ratios (p < 0.05). After adjusting for clinicopathologic features and stratifying for molecular subtype, the hazard ratios showed a trend but lost statistical significance for all endpoints. In conclusion, we developed a deep learning-based model for automated grading of breast cancer on whole-slide images.
The model distinguishes between low/intermediate and high grade tumors and finds a trend in the survival of the two predicted groups.
Year: 2022 PMID: 36068311 PMCID: PMC9448798 DOI: 10.1038/s41598-022-19112-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Overview of the deep learning model and training procedure. The components of the model outlined in yellow are used only at training time; these include the multi-task learning with the auxiliary tasks and the computation of the loss term.
Patient characteristics of all 1392 women included in our cohort, divided into the development dataset (n = 706) and the test dataset (n = 686).
| Patient characteristics | Development dataset | Test dataset | p-value |
|---|---|---|---|
| Number of patients | 706 | 686 | |
| Age | | | 1.00 |
| Median years (interquartile range) | 36 (33–38) | 36 (33–38) | |
| Tumor grade | | | 0.25 |
| Grade 1 | 113 (16) | 89 (13) | |
| Grade 2 | 244 (35) | 238 (35) | |
| Grade 3 | 349 (49) | 359 (52) | |
| Nuclear score | | | 0.66 |
| 1 | 20 (3) | 16 (2) | |
| 2 | 335 (47) | 315 (46) | |
| 3 | 349 (49) | 354 (52) | |
| Tubular score | | | 0.62 |
| 1 | 46 (7) | 43 (6) | |
| 2 | 135 (19) | 118 (17) | |
| 3 | 524 (74) | 524 (76) | |
| Mitoses score | | | 0.64 |
| 1 | 249 (35) | 230 (34) | |
| 2 | 170 (24) | 161 (23) | |
| 3 | 286 (41) | 295 (43) | |
| Molecular subtype | | | 0.06 |
| HR+/HER2− | 394 (56) | 345 (50) | |
| HR−/HER2− | 182 (26) | 218 (32) | |
| HR+/HER2+ | 85 (12) | 74 (11) | |
| HR−/HER2+ | 36 (5) | 42 (6) | |
| Tumor stage | | | 0.22 |
| 1A–B | 128 (18) | 114 (17) | |
| 1C | 365 (52) | 366 (54) | |
| 2–3 | 195 (28) | 198 (29) | |
| Missing | 18 (3) | 8 (1) | |
| | | | 0.81 |
| Absent | 587 (83) | 566 (83) | |
| Present | 119 (17) | 120 (17) | |
| Treatment | | | 0.33 |
| Conserving surgery with radiotherapy | 448 (63) | 451 (66) | |
| Mastectomy without radiotherapy | 213 (30) | 184 (27) | |
| Other | 45 (6) | 51 (7) | |
HR: hormone receptor.
Agreement and accuracy of model versus pathologist grading of no special type (NST) tumors.
| Model targets | Cohen’s Kappa (SD) | Accuracy (SD) |
|---|---|---|
| Tumor grade only | 0.54 (± 0.10) | 0.77 (± 0.05) |
| Tumor grade and component grades | 0.61 (± 0.09) | 0.80 (± 0.05) |
| Tumor grade, component grades and HR and HER2 status | 0.58 (± 0.09) | 0.79 (± 0.05) |
The results for three models trained on different sets of targets are shown on the validation set (n = 142). The standard deviation (SD) was calculated using bootstrapping.
HR: hormone receptor.
Agreement and accuracy of model versus pathologist grading, overall as well as split by nuclear pleomorphism, tubular differentiation and mitotic count.
| Target | Cohen’s Kappa (SD) | Accuracy (SD) |
|---|---|---|
| Tumor grade | 0.59 (± 0.04) | 0.80 (± 0.02) |
| Nuclear score | 0.41 (± 0.04) | 0.70 (± 0.02) |
| Tubular score | 0.35 (± 0.04) | 0.70 (± 0.02) |
| Mitoses score | 0.53 (± 0.04) | 0.77 (± 0.02) |
Results are shown on the test set (n = 686). The standard deviation (SD) was calculated using bootstrapping.
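The agreement figures above combine Cohen's kappa with a bootstrapped standard deviation. The following is a minimal sketch of how such numbers can be computed, not the authors' implementation: the function names, the toy labels, the number of resamples (`n_boot=1000`) and the seed are all assumptions made for illustration.

```python
import random


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two raters labelling the same items (lists of labels)."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def bootstrap_sd(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Standard deviation of `metric` over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    mean = sum(stats) / n_boot
    return (sum((s - mean) ** 2 for s in stats) / (n_boot - 1)) ** 0.5


# Hypothetical toy labels (not study data): 8 of 10 cases agree.
pathologist = ["high"] * 5 + ["low"] * 5
model = ["high"] * 4 + ["low"] + ["low"] * 4 + ["high"]

# Observed agreement is 0.8 and chance agreement is 0.5, so kappa is 0.6.
kappa = cohens_kappa(pathologist, model)
sd = bootstrap_sd(pathologist, model, cohens_kappa)
```

Patients are resampled as pairs (pathologist label, model label), so each bootstrap replicate keeps the two ratings of a case together.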
Figure 2. Confusion matrices for no special type (NST) tumor grading and grade components (nuclear, tubular and mitoses scores) between pathologists and the deep learning model. These results are on the test set (n = 686).
Figure 3. Kaplan–Meier survival curves for young breast cancer patients grouped by low/intermediate versus high grade tumors as assigned by pathologists (A) and the deep learning model (B) for overall survival (1), distant recurrence free survival (2) and recurrence free survival (3). These results are on the test set (n = 686).
Eight-year survival rates of low/intermediate and high grade groups as assigned by pathologists and the deep learning model for overall survival (OS), distant recurrence free survival (DRFS) and recurrence free survival (RFS).
| Survival endpoint | Tumor grade | Pathologists: 8-year survival rate, % (95% CI) | Deep learning model: 8-year survival rate, % (95% CI) |
|---|---|---|---|
| Overall survival | Low/intermediate grade | 85.7 (81.3–90.4) | 86.4 (82.8–90.2) |
| Overall survival | High grade | 76.1 (72.3–80.2) | 72.8 (68.3–77.6) |
| Distant recurrence free survival | Low/intermediate grade | 82.7 (78.6–86.9) | 82.3 (77.4–87.5) |
| Distant recurrence free survival | High grade | 70.3 (65.6–75.4) | 73.2 (69.2–77.5) |
| Recurrence free survival | Low/intermediate grade | 74.2 (69.6–79.2) | 75.6 (70.1–81.5) |
| Recurrence free survival | High grade | 65.1 (60.3–70.4) | 66.5 (62.2–71.1) |
Results are shown on the test set (n = 686).
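Survival rates at a fixed time point, such as the 8-year rates above, are commonly read off a Kaplan–Meier curve like those in Figure 3. The sketch below shows the product-limit estimator under that assumption; the function names and the follow-up data are hypothetical, and the software the authors used is not stated here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as a list of (time, survival) at each event time.

    times:  follow-up duration per patient
    events: 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Step-function lookup: survival estimate at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical follow-up in years (not study data).
times = [2, 3, 5, 8, 9, 10]
events = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
rate_8y = survival_at(curve, 8)  # (5/6) * (3/4) = 0.625
```

Censored patients contribute to the risk set up to their last follow-up but never reduce the survival estimate themselves, which is why the curve only steps down at observed events.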
Univariate hazard ratios showing the prognostic value of high versus low/intermediate grade tumors as assessed by pathologists or the deep learning model for different survival endpoints.
| Survival endpoint | Pathologists: hazard ratio (95% CI) | Pathologists: p-value | Deep learning model: hazard ratio (95% CI) | Deep learning model: p-value |
|---|---|---|---|---|
| Overall survival | 2.23 (1.56–3.19) | < 0.001 | 1.84 (1.24–2.72) | 0.025 |
| Distant recurrence free survival | 1.92 (1.38–2.67) | < 0.001 | 1.66 (1.16–2.39) | 0.006 |
| Recurrence free survival | 1.49 (1.12–1.97) | 0.006 | 1.50 (1.10–2.05) | 0.011 |
Results are shown on the test set (n = 686).
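A hazard ratio, its 95% confidence interval and its p-value are linked through the standard error of the log hazard ratio. As a rough illustration, the sketch below recovers an approximate p-value from a reported ratio and interval. It assumes a Wald-type interval that is symmetric on the log scale and a normal approximation; the function name is hypothetical, and because the published values are rounded, the result is only approximate.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(log(hr) +/- z_crit * se).
    """
    log_hr = math.log(hr)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Pathologists, recurrence free survival row: 1.49 (1.12-1.97), reported p = 0.006.
z, p = wald_p_from_ci(1.49, 1.12, 1.97)
```

For this row the recomputed p-value rounds to the reported 0.006, which is consistent with the interval and the p-value coming from the same Wald statistic.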
Multivariate hazard ratios showing the prognostic value of high versus low/intermediate grade tumors as assessed by pathologists or the deep learning model for different survival endpoints.
| Survival endpoint | Variables | Pathologists: hazard ratio (95% CI) | Pathologists: p-value | Deep learning model: hazard ratio (95% CI) | Deep learning model: p-value |
|---|---|---|---|---|---|
| Overall survival | | | | | |
| | Low/intermediate | REF | | REF | |
| | High | 1.87 (1.24–2.82) | < 0.01 | 1.39 (0.89–2.18) | 0.15 |
| | 1A–B | REF | | REF | |
| | 1C | 1.59 (0.91–2.79) | 0.11 | 1.67 (0.95–2.92) | 0.07 |
| | 2–3 | 1.54 (0.85–2.80) | 0.15 | 1.67 (0.93–3.03) | 0.09 |
| | Absent | REF | | REF | |
| | Present | 2.45 (1.68–3.56) | < 0.01 | 2.61 (1.80–3.78) | < 0.01 |
| | Conserving surgery with radiotherapy | REF | | REF | |
| | Mastectomy without radiotherapy | 0.96 (0.64–1.44) | 0.84 | 1.02 (0.68–1.52) | 0.93 |
| | Other | 2.23 (1.29–3.85) | < 0.01 | 2.30 (1.33–3.98) | < 0.01 |
| Distant recurrence free survival | | | | | |
| | Low/intermediate | REF | | REF | |
| | High | 1.70 (1.16–2.48) | < 0.01 | 1.49 (0.99–2.25) | 0.06 |
| | 1A–B | REF | | REF | |
| | 1C | 2.15 (1.19–3.89) | 0.01 | 2.23 (1.23–4.02) | < 0.01 |
| | 2–3 | 2.57 (1.39–4.75) | < 0.01 | 2.72 (1.48–5.02) | < 0.01 |
| | Absent | REF | | REF | |
| | Present | 2.70 (1.91–3.82) | < 0.01 | 2.87 (2.03–4.04) | < 0.01 |
| | Conserving surgery with radiotherapy | REF | | REF | |
| | Mastectomy without radiotherapy | 1.08 (0.75–1.57) | 0.67 | 1.15 (0.79–1.66) | 0.47 |
| | Other | 2.24 (1.32–3.81) | < 0.01 | 2.31 (1.36–3.94) | < 0.01 |
| Recurrence free survival | | | | | |
| | Low/intermediate | REF | | REF | |
| | High | 1.30 (0.94–1.80) | 0.12 | 1.37 (0.96–1.96) | 0.08 |
| | 1A–B | REF | | REF | |
| | 1C | 1.64 (1.02–2.61) | 0.04 | 1.65 (1.04–2.63) | 0.03 |
| | 2–3 | 1.93 (1.18–3.17) | < 0.01 | 1.97 (1.21–3.21) | < 0.01 |
| | Absent | REF | | REF | |
| | Present | 2.70 (1.98–3.68) | < 0.01 | 2.75 (2.02–3.73) | < 0.01 |
| | Conserving surgery with radiotherapy | REF | | REF | |
| | Mastectomy without radiotherapy | 1.00 (0.72–1.39) | 1.00 | 1.02 (0.74–1.42) | 0.88 |
| | Other | 1.72 (1.05–2.82) | 0.03 | 1.74 (1.06–2.85) | 0.03 |
This model was stratified by molecular subtype. Results are shown on the test set (n = 686).