Khadijeh Saednia, Andrew Lagree, Marie A Alera, Lauren Fleshner, Audrey Shiner, Ethan Law, Brianna Law, David W Dodington, Fang-I Lu, William T Tran, Ali Sadeghi-Naini.
Abstract
Complete pathological response (pCR) to neoadjuvant chemotherapy (NAC) is a prognostic factor for breast cancer (BC) patients and is correlated with improved survival. However, pCR rates after standard NAC vary with BC subtype. This study investigates quantitative digital histopathology coupled with machine learning (ML) to predict NAC response a priori. Clinicopathologic data and digitized slides of BC core needle biopsies were collected from 149 patients treated with NAC. The nuclei within the tumor regions were segmented on the histology images of biopsy samples using a weighted U-Net model. Five pathomic feature subsets were extracted from the segmented digitized samples: morphological, intensity-based, texture, graph-based and wavelet features. Seven ML experiments were conducted with different feature sets to develop a prediction model of therapy response using a gradient boosting machine with decision trees. The models were trained and optimized using five-fold cross-validation on the training data and evaluated on an unseen independent test set. The prediction model developed with the best clinical features (tumor size, tumor grade, age, and ER, PR, HER2 status) demonstrated an area under the ROC curve (AUC) of 0.73. Various pathomic feature subsets resulted in models with AUCs between 0.67 and 0.87, with the best results associated with the graph-based and wavelet features. The features selected from all subsets of the pathomic and clinicopathologic features included four wavelet and three graph-based features and no clinical features. The predictive model developed with these features outperformed the other models, with an AUC of 0.90, a sensitivity of 85% and a specificity of 82% on the independent test set. The results demonstrated the potential of quantitative digital histopathology features integrated with ML methods in predicting BC response to NAC. This study is a step forward towards precision oncology for BC patients to potentially guide future therapies.
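The abstract states that models were tuned with five-fold cross-validation on the training data. As a minimal sketch of what such a split looks like, the following partitions the 111 training patients into five validation folds. The shuffling, the seed, and the helper names `k_fold_indices` and `train_val_splits` are assumptions for illustration; the record does not say how the authors formed their folds or whether they stratified by response.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled validation folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def train_val_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

# 111 is the training-set size reported in the study.
splits = list(train_val_splits(111, k=5))
fold_sizes = [len(val) for _, val in splits]
print(fold_sizes)
```

Each patient appears in exactly one validation fold, and the held-out test set plays no part in this loop.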
Year: 2022 PMID: 35690630 PMCID: PMC9188550 DOI: 10.1038/s41598-022-13917-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. (a) An example of tumor bed annotation on a representative segment of the core on the WSI (2188 × 4124 pixels), (b) the non-overlapping 768 × 768 pixel tiles extracted from the tumor region, with the excluded tiles shaded (less than 50% tumor or more than 10% white background), (c) an extracted tile with 100% tumor tissue, and (d) the generated binary mask of the nuclei in the tile.
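The tiling rule in the Figure 1 caption can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `tile_origins` and `keep_tile` are hypothetical names, partial tiles at the region border are assumed to be discarded, and the tumor and white-background fractions are assumed to be computed elsewhere (for example from the annotation mask and a brightness threshold).

```python
def tile_origins(height, width, tile=768):
    """Top-left (row, col) of every full, non-overlapping tile in a region.

    Assumption: tiles that would extend past the border are dropped.
    """
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]

def keep_tile(tumor_fraction, white_fraction, min_tumor=0.50, max_white=0.10):
    """Exclusion rule from the caption: drop tiles with less than 50% tumor
    or more than 10% white background."""
    return tumor_fraction >= min_tumor and white_fraction <= max_white

# Segment size taken from the caption; which dimension is height is assumed.
origins = tile_origins(4124, 2188)
print(len(origins))

print(keep_tile(1.00, 0.00))  # fully tumor, no background
print(keep_tile(0.40, 0.05))  # too little tumor
print(keep_tile(0.80, 0.25))  # too much white background
```

With a 768-pixel tile, the 2188 × 4124 segment yields a 2 × 5 grid of full tiles before the exclusion rule is applied.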
Demographic and clinical information of the patients involved in the study. The distribution of each variable was compared between the training and test sets using Pearson's chi-squared homogeneity test for categorical variables and the t-test for continuous variables; the p-values are reported in the last column.
| Patient demographics and clinicopathologic characteristics (count (%) unless stated otherwise) | Training (n = 111): pCR | Training (n = 111): non-pCR | Test (n = 38): pCR | Test (n = 38): non-pCR | p-value |
|---|---|---|---|---|---|
| Number of patients | 39 | 72 | 11 | 27 | p = 0.49 |
| Median age (years) | 48.4 | 52.2 | 50.6 | 52.1 | p = 0.33 |
| Pre/peri-menopausal | 22 (56%) | 36 (50%) | 6 (55%) | 15 (56%) | p = 0.75 |
| Post-menopausal | 17 (44%) | 36 (50%) | 5 (45%) | 12 (44%) | |
| ER positive | 14 (36%) | 53 (74%) | 2 (18%) | 18 (67%) | p = 0.27 |
| PR positive | 13 (33%) | 43 (60%) | 1 (9%) | 16 (59%) | p = 0.11 |
| HER2 positive | 27 (69%) | 22 (31%) | 7 (64%) | 10 (37%) | p = 0.32 |
| Invasive ductal carcinoma | 39 (100%) | 63 (88%) | 11 (100%) | 10 (37%) | p = 0.49 |
| Invasive lobular carcinoma | 0 (0%) | 9 (9%) | 0 (0%) | 17 (63%) | |
| Tumor grade 1 | 1 (2%) | 3 (4%) | 0 (0%) | 0 (0%) | p = 0.50 |
| Tumor grade 2 | 9 (23%) | 36 (50%) | 2 (18%) | 18 (67%) | |
| Tumor grade 3 | 29 (75%) | 33 (46%) | 9 (82%) | 9 (33%) | |
| Mean tumor size (mm; ± SD) | 37.2 ± 21.2 | 50.9 ± 28.9 | 44.1 ± 25.8 | 48.9 ± 27.9 | p = 0.78 |
| Inflammatory breast cancer | 4 (10%) | 10 (14%) | 0 (0%) | 1 (4%) | p = 0.53 |
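To illustrate the chi-squared homogeneity comparison between the training and test sets, here is a minimal standard-library sketch applied to the menopausal-status counts in the table. It assumes the pCR and non-pCR counts are pooled within each set and that no continuity correction is applied; the record does not state either detail. `chi2_homogeneity_2x2` is a hypothetical helper name.

```python
import math

def chi2_homogeneity_2x2(table):
    """Pearson chi-squared test on a 2x2 count table, no continuity correction.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: training set, test set. Columns: pre/peri-menopausal, post-menopausal.
# Counts are pooled over pCR and non-pCR (an assumption about the comparison).
table = [[22 + 36, 17 + 36],
         [6 + 15, 5 + 12]]
stat, p = chi2_homogeneity_2x2(table)
print(f"chi2 = {stat:.3f}, p = {p:.2f}")
```

Under these assumptions the p-value comes out close to the 0.75 listed for menopausal status, which suggests the pooled, uncorrected form is at least consistent with the reported comparison.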
Figure 2. The importance gain score of the first 15 features with the highest contribution to the predictive model for different feature subsets: (a) clinical, (b) morphological, (c) intensity-based, (d) texture, (e) graph-based, (f) wavelet, and (g) all features. The green bars are associated with the features included in the NAC response biomarker in each experiment.
Figure 3. Box plots of the selected features for the pCR and non-pCR cohorts of the training set obtained in the seven experiments conducted using different feature subsets: (a) clinical, (b) morphological, (c) intensity-based, (d) texture, (e) graph-based, (f) wavelet, and (g) all features. The feature values are normalized to the range 0 to 1. The order of features in each plot is the same as that of the associated plot in Fig. 2.
Results of NAC response prediction at pre-treatment using the clinicopathological and/or pathomic features, on the training, validation and test sets. The features included in each optimal biomarker have been listed in Fig. 2. For the validation set, the 95% confidence intervals are reported over the five folds of cross validation. The best value in each column is in bold.
| Features | Number of features in optimal biomarker | Tr Acc | Val Acc | Val AUC | Val Sen | Val Spec | Te Acc | Te AUC | Te Sen | Te Spec |
|---|---|---|---|---|---|---|---|---|---|---|
| Clinical features | 6 | 0.72 | 0.71 ± 0.02 | 0.74 ± 0.04 | 0.70 ± 0.04 | 0.73 ± 0.03 | 0.71 | 0.73 | 0.70 | 0.73 |
| Morphological features | 9 | 0.76 | 0.76 ± 0.03 | 0.78 ± 0.05 | 0.78 ± 0.03 | 0.67 ± 0.02 | 0.74 | 0.78 | 0.78 | 0.64 |
| Intensity-based features | 10 | 0.78 | 0.75 ± 0.02 | 0.77 ± 0.03 | 0.77 ± 0.02 | 0.69 ± 0.02 | 0.74 | 0.78 | 0.78 | 0.64 |
| Texture features | 5 | 0.76 | 0.75 ± 0.01 | 0.70 ± 0.03 | 0.78 ± 0.02 | 0.68 ± 0.03 | 0.74 | 0.67 | 0.78 | 0.64 |
| Graph-based features | 5 | 0.75 | 0.76 ± 0.01 | 0.79 ± 0.02 | 0.79 ± 0.03 | 0.72 ± 0.03 | 0.76 | 0.80 | 0.78 | 0.73 |
| Wavelet features | 9 | 0.82 | 0.83 ± 0.02 | 0.84 ± 0.03 | 0.80 ± 0.04 | 0.82 | 0.87 | 0.81 | ||
| All features | 7 | 0.82 ± 0.02 |
Acc: accuracy; AUC: area under the curve; Sen: sensitivity; Spec: specificity; Tr: training; Val: validation; Te: test.
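For readers less familiar with the column definitions, the sketch below shows how accuracy, sensitivity and specificity follow from binary predictions, with pCR treated as the positive class. The labels are hypothetical and are not data from the study; `classification_metrics` is an illustrative name.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = pCR, 0 = non-pCR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # responders correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-responders correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of true pCR cases detected
        "specificity": tn / (tn + fp),  # fraction of true non-pCR cases detected
    }

# Hypothetical example: 4 pCR and 6 non-pCR patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because the cohorts are imbalanced (more non-pCR than pCR patients), sensitivity and specificity are more informative together than accuracy alone.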
Figure 4. Receiver operating characteristic (ROC) curves on the independent test set for the predictive models developed with the selected features obtained in different experiments. In the last experiment, seven features (four wavelet and three graph-based) were selected from all feature subsets.
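The AUC summarizing each ROC curve can be read as the probability that a randomly chosen pCR case receives a higher predicted score than a randomly chosen non-pCR case. A minimal sketch of that pairwise definition follows; the scores are hypothetical, and `roc_auc` is an illustrative helper rather than the routine used in the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of pCR for 3 responders and 4 non-responders.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The quadratic pair loop is fine for cohorts of this size; a rank-based formulation gives the same value for larger datasets.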