Maeve Mullooly1,2, Babak Ehteshami Bejnordi3,4, Andrew Beck4, Mark E Sherman5, Gretchen L Gierach2, Ruth M Pfeiffer2, Shaoqi Fan2, Maya Palakal2, Manila Hada2, Pamela M Vacek6, Donald L Weaver6, John A Shepherd7,8, Bo Fan7, Amir Pasha Mahmoudzadeh7, Jeff Wang9, Serghei Malkov7, Jason M Johnson10, Sally D Herschorn6, Brian L Sprague6, Stephen Hewitt11, Louise A Brinton2, Nico Karssemeijer3, Jeroen van der Laak3.
Abstract
Breast density, a breast cancer risk factor, is a radiologic feature that reflects fibroglandular tissue content relative to breast area or volume. Its histology is incompletely characterized. Here we use deep learning approaches to identify histologic correlates in radiologically-guided biopsies that may underlie breast density and distinguish cancer among women with elevated and low density. We evaluated hematoxylin and eosin (H&E)-stained digitized images from image-guided breast biopsies (n = 852 patients). Breast density was assessed as global and localized fibroglandular volume (%). A convolutional neural network characterized H&E composition. In total 37 features were extracted from the network output, describing tissue quantities and morphological structure. A random forest regression model was trained to identify correlates most predictive of fibroglandular volume (n = 588). Correlations between predicted and radiologically quantified fibroglandular volume were assessed in 264 independent patients. A second random forest classifier was trained to predict diagnosis (invasive vs. benign); performance was assessed using area under receiver-operating characteristics curves (AUC). Using extracted features, regression models predicted global (r = 0.94) and localized (r = 0.93) fibroglandular volume, with fat and non-fatty stromal content representing the strongest correlates, followed by epithelial organization rather than quantity. For predicting cancer among high and low fibroglandular volume, the classifier achieved AUCs of 0.92 and 0.84, respectively, with epithelial organizational features ranking most important. These results suggest non-fatty stroma, fat tissue quantities and epithelial region organization predict fibroglandular volume. The model holds promise for identifying histological correlates of cancer risk in patients with high and low density and warrants further evaluation. © This is a U.S. 
government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019.
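The agreement between predicted and radiologically quantified fibroglandular volume is reported above as a correlation coefficient r. The abstract does not state which coefficient was used. A minimal sketch, assuming Pearson's r, is shown below; the function name `pearson_r` and all numeric values are hypothetical and are not study data.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical FGV (%) values, for illustration only (not study data).
predicted_fgv = [12.0, 25.5, 33.0, 41.0, 58.5, 70.0]
measured_fgv = [10.0, 28.0, 30.5, 45.0, 55.0, 74.0]
r = pearson_r(predicted_fgv, measured_fgv)
print(f"r = {r:.2f}")
```

In the study design described above, such a coefficient would be computed only on the 264 held-out patients, so that it reflects performance on data the regression model did not see during training.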
Keywords: Cancer epidemiology; Cancer prevention
Year: 2019 PMID: 31754628 PMCID: PMC6864056 DOI: 10.1038/s41523-019-0134-6
Source DB: PubMed Journal: NPJ Breast Cancer ISSN: 2374-4677
Selected characteristics of study participants from the BREAST-Stamp Project, who were referred for an image-guided breast biopsy, stratified by the training and testing sets (n = 852)
| Characteristic | Overall (n = 852), n | Overall, % | Training (n = 588), n | Training, % | Testing (n = 264), n | Testing, % | P-value* |
|---|---|---|---|---|---|---|---|
| Age at ipsilateral mammogram (years) | | | | | | | 0.85 |
| <45 | 175 | 20.5 | 119 | 20.2 | 56 | 21.2 | |
| 45–49 | 217 | 25.5 | 152 | 25.9 | 65 | 24.6 | |
| 50–54 | 202 | 23.7 | 142 | 24.2 | 60 | 22.7 | |
| 55–59 | 145 | 17.0 | 95 | 16.2 | 50 | 18.9 | |
| 60+ | 113 | 13.3 | 80 | 13.6 | 33 | 12.5 | |
| Mean (SD) | 50.8 (6.9) | | 50.8 (6.9) | | 50.7 (6.8) | | 0.91** |
| Race | | | | | | | 0.81 |
| White, non-Hispanic | 778 | 91.3 | 536 | 91.2 | 242 | 91.7 | |
| Other | 74 | 8.7 | 52 | 8.8 | 22 | 8.3 | |
| Education level | | | | | | | 0.70 |
| <High school | 14 | 1.7 | 9 | 1.6 | 5 | 2.0 | |
| High school graduation | 132 | 16.0 | 95 | 16.7 | 37 | 14.6 | |
| College/graduate school degree | 678 | 82.3 | 466 | 81.8 | 212 | 83.5 | |
| BMI (kg/m²) | | | | | | | 0.22 |
| <25 | 427 | 50.4 | 283 | 48.4 | 144 | 54.8 | |
| 25-<30 | 212 | 25.0 | 153 | 26.2 | 59 | 22.4 | |
| 30+ | 209 | 24.7 | 149 | 25.5 | 60 | 22.8 | |
| Mean (SD) | 26.7 (6.3) | | 26.8 (6.2) | | 26.6 (6.5) | | 0.43† |
| Age at menarche (years) | | | | | | | 0.41 |
| ≤12 | 326 | 38.9 | 216 | 37.2 | 110 | 42.8 | |
| 13 | 322 | 38.4 | 233 | 40.1 | 89 | 34.6 | |
| 14 | 114 | 13.6 | 80 | 13.8 | 34 | 13.2 | |
| 15+ | 76 | 9.1 | 52 | 9.0 | 24 | 9.3 | |
| Age at first birth (years) | | | | | | | 0.54 |
| Nulliparous | 189 | 22.3 | 134 | 22.9 | 55 | 21.2 | |
| <25 | 269 | 31.8 | 193 | 32.9 | 76 | 29.2 | |
| 25–30 | 202 | 23.9 | 135 | 23.0 | 67 | 25.8 | |
| 30+ | 186 | 22.0 | 124 | 21.2 | 62 | 23.9 | |
| Menopausal status | | | | | | | 0.89 |
| Premenopausal | 472 | 58.1 | 326 | 58.2 | 146 | 57.7 | |
| Postmenopausal | 341 | 41.9 | 234 | 41.8 | 107 | 42.3 | |
| Menopausal hormone therapy use | | | | | | | 0.88 |
| Never | 719 | 86.1 | 497 | 86.0 | 222 | 86.4 | |
| Ever | 116 | 13.9 | 81 | 14.0 | 35 | 13.6 | |
| First degree family history of breast cancer | | | | | | | 0.45 |
| 0 | 636 | 77.0 | 439 | 77.6 | 197 | 75.8 | |
| 1 | 167 | 20.2 | 114 | 20.1 | 53 | 20.4 | |
| 2+ | 23 | 2.8 | 13 | 2.3 | 10 | 3.9 | |
| Breast biopsy prior to enrollment | | | | | | | 0.54 |
| No | 580 | 68.9 | 404 | 69.5 | 176 | 67.4 | |
| Yes | 262 | 31.1 | 177 | 30.5 | 85 | 32.6 | |
| Global FGV (%)ᶜ | | | | | | | 0.55 |
| ≤34.4 (%) | 439 | 51.5 | 307 | 52.2 | 132 | 50.0 | |
| >34.4 (%) | 413 | 48.5 | 281 | 47.8 | 132 | 50.0 | |
| Median (Range) | 34.4 (0.6, 99.5) | | 34.4 (0.6, 99.5) | | 36.2 (1.4, 99.3) | | |
| Localized FGV (%)ᶜ | | | | | | | 0.21 |
| ≤40 (%) | 406 | 50.7 | 289 | 52.2 | 117 | 47.4 | |
| >40 (%) | 395 | 49.3 | 265 | 47.8 | 130 | 52.6 | |
| Median (Range) | 40.0 (0, 100) | | 39.8 (0, 100) | | 43.3 (0, 100) | | |
| Biopsy type | | | | | | | 0.82ᵇ |
| Ultrasound-guided (14-gauge) | 445 | 52.2 | 309 | 52.6 | 136 | 51.5 | |
| Stereotactic-guided (9-gauge) | 406 | 47.7 | 279 | 47.5 | 127 | 48.1 | |
| Both | 1 | 0.1 | 0 | 0.0 | 1 | 0.4 | |
| BI-RADS mammography assessment | | | | | | | 0.89 |
| Probably benign finding | 47 | 5.9 | 32 | 5.8 | 15 | 6.1 | |
| Suspicious abnormality | 670 | 83.7 | 463 | 83.4 | 207 | 84.2 | |
| Highly suggestive of malignancy | 84 | 10.5 | 60 | 10.8 | 24 | 9.8 | |
| Pathologic diagnosisᵃ# | | | | | | | 0.23 |
| Benign non-proliferative | 282 | 33.1 | 190 | 32.3 | 92 | 34.9 | |
| Proliferative without atypia | 316 | 37.1 | 215 | 36.6 | 101 | 38.3 | |
| Proliferative with atypia | 57 | 6.7 | 44 | 7.5 | 13 | 4.9 | |
| In-situ (LCIS or DCIS) | 76 | 8.9 | 48 | 8.2 | 28 | 10.6 | |
| Invasive breast cancer | 121 | 14.2 | 91 | 15.5 | 30 | 11.4 | |
| Characteristic (per biopsy target) | | | | | | | |
| Biopsy type | | | | | | | 0.63ᵇ |
| Ultrasound-guided (14-gauge) | 566 | 54.6 | 372 | 54.2 | 194 | 55.6 | |
| Stereotactic-guided (9-gauge) | 469 | 45.3 | 315 | 45.9 | 154 | 44.1 | |
| Both | 1 | 0.1 | 0 | 0.0 | 1 | 0.3 | |
| Pathologic diagnosis# | | | | | | | 0.39 |
| Benign non-proliferative | 373 | 36.0 | 242 | 35.2 | 131 | 37.5 | |
| Proliferative without atypia | 369 | 35.6 | 242 | 35.2 | 127 | 36.4 | |
| Proliferative with atypia | 68 | 6.6 | 52 | 7.6 | 16 | 4.6 | |
| In-situ (LCIS or DCIS) | 83 | 8.0 | 53 | 7.7 | 30 | 8.6 | |
| Invasive breast cancer | 143 | 13.8 | 98 | 14.3 | 45 | 12.9 | |
BMI body mass index, DCIS ductal carcinoma in situ, FGV fibroglandular volume, LCIS lobular carcinoma in situ, SD standard deviation
Missing data were excluded from percentage calculations and statistical comparisons: 28 for education levels, 4 BMI, 14 age at menarche, 6 age at first birth, 39 menopausal status, 17 menopausal hormone therapy use, 26 first degree family history of breast cancer, 10 breast biopsy prior to enrollment, 51 percent volumetric local density (biopsy radius 0–2 mm), 51 BI-RADS mammography assessment
*P-values from Chi-Square test except where noted
**P-value from two-sample t-test
†P-values from Kruskal–Wallis test
ᵃ Among women with multiple biopsies, this was the worst pathologic diagnosis
ᵇ One woman from the test group who had both biopsy types was excluded from the Chi-square test
ᶜ The median cut points of breast density were determined among the training population and were consistent among all 852 women
# Benign non-proliferative diagnosis includes non-proliferative fibrocystic change and other benign and discrete entities; Proliferative without atypia includes ductal hyperplasia and sclerosing adenosis; Proliferative with atypia includes atypical ductal and lobular hyperplasia
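As a worked illustration of the Chi-square comparisons between the training and testing sets, the sketch below computes a Pearson chi-square statistic for the Race rows of the table above. It assumes no continuity correction, which the record does not state; the function name `chi_square_2x2` is hypothetical. For one degree of freedom the p-value can be obtained from the complementary error function, so only the standard library is needed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table.

    Assumption: no continuity (Yates) correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Race by analysis set, counts taken from the table above:
# rows = training, testing; columns = White non-Hispanic, Other.
race_counts = [[536, 52], [242, 22]]
statistic, p_value = chi_square_2x2(race_counts)
print(f"chi-square = {statistic:.3f}, p = {p_value:.2f}")
```

Under these assumptions the p-value rounds to 0.81, in line with the value tabulated for Race.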
Summary of top 10 ranked histologic features identified in the random forest model for the prediction of global and localized % fibroglandular volume (FGV)
| Feature name | Rank of feature importance, predicted model: global FGV (%) | Rank of feature importance, predicted model: localized FGV (%) |
|---|---|---|
| Global tissue amount | | |
| Fat amount (µm²) | 5 | 3 |
| Fat amount normalized (%) | 2 | 2 |
| Stroma amount (µm²) | 3 | 5 |
| Stroma amount normalized (%) | 1 | 1 |
| Epithelium amount normalized (%) | − | 8 |
| Morphology | | |
| Ecc epi regions (median) | 7 | − |
| Ecc epi regions (IQ) | 6 | 9 |
| Spatial arrangement of the epithelial regions (Area-Voronoi diagram) | | |
| Voronoi area (mean µm²) | 10 | − |
| Voronoi area (median µm²) | − | − |
| Voronoi area (IQ µm²) | 8 | 6 |
| Ratio epi to Voronoi (mean) | − | 7 |
| Ratio epi to Voronoi (median) | − | 4 |
| Ratio epi to non-epi (mean) | 9 | − |
| Spatial arrangement of the epithelial regions (Delaunay Triangulation) | | |
| Neighbors (SD) | 4 | 10 |
Ecc eccentricity, epi epithelial, IQ interquartile, FGV fibroglandular volume, SD standard deviation
Only features ranked within the top 10 for prediction of each FGV density measure are included in the table
Features are ranked numerically and sequentially from 1 to 10, with 1 representing the most important feature and 10 representing the 10th most important feature
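The ranking convention described in these notes (1 for the most important feature, a dash for features outside the top k) can be sketched as follows. The importance scores are hypothetical placeholders, not values from the study, and `top_k_ranks` is an illustrative helper rather than the authors' code.

```python
def top_k_ranks(importances, k=10):
    """Map each feature name to its rank (1 = most important).

    Features outside the k largest importances are mapped to None.
    Ties keep their original insertion order because sorted() is stable.
    """
    ordered = sorted(importances, key=importances.get, reverse=True)
    ranks = {name: None for name in importances}
    for position, name in enumerate(ordered[:k], start=1):
        ranks[name] = position
    return ranks


# Hypothetical importance scores (illustrative only, not study results).
importances = {
    "Fat amount": 0.07,
    "Fat amount normalized": 0.27,
    "Stroma amount": 0.12,
    "Stroma amount normalized": 0.31,
    "Neighbors (SD)": 0.08,
}
ranks = top_k_ranks(importances, k=3)
for name, rank in ranks.items():
    # Unranked features are printed with a dash, as in the table.
    print(f"{name}: {rank if rank is not None else '-'}")
```

With k = 3, only the three highest-scoring features receive a rank and the remaining two are shown with a dash.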
Summary of top 10 ranked histologic features identified in the random forest model for the prediction of invasive cancer status among women with high and low % fibroglandular volume
| Feature name | High global FGV (%) (> median) | Low global FGV (%) (≤ median) | High localized FGV (%) (> median) | Low localized FGV (%) (≤ median) |
|---|---|---|---|---|
| Global tissue amount | | | | |
| Fat amount (µm²) | − | 8 | − | 5 |
| Fat amount normalized (%) | − | 3 | − | 7 |
| Stroma amount (µm²) | − | 9 | − | − |
| Stroma amount normalized (%) | − | 4 | 10 | 3 |
| Epithelium amount (µm²) | 4 | − | 4 | 1 |
| Epithelium amount normalized (%) | 6 | − | 7 | 8 |
| Morphology | | | | |
| Epithelial regions (IQ µm²) | − | − | 1 | 4 |
| Epithelial regions (max µm²) | 9 | − | − | − |
| Ecc epi regions (mean) | 10 | − | 3 | 9 |
| Ecc epi regions (median) | − | − | 2 | 2 |
| Ecc epi regions (IQ) | − | − | 5 | 10 |
| Spatial arrangement of the epithelial regions (Area-Voronoi diagram) | | | | |
| Voronoi area (mean µm²) | 5 | 7 | 6 | − |
| Voronoi area (median µm²) | 3 | 5 | − | − |
| Voronoi area (SD µm²) | − | − | 9 | − |
| Voronoi area (IQ µm²) | 7 | 6 | − | − |
| Ratio epi to Voronoi (mean) | 1 | 2 | 8 | − |
| Ratio epi to Voronoi (median) | 2 | 1 | − | − |
| Ratio epi to Voronoi (IQ) | − | 10 | − | − |
| Ratio epi to non-epi (median) | − | − | − | 6 |
| Spatial arrangement of the epithelial regions (Delaunay Triangulation) | | | | |
| Neighbors (mean number) | 8 | − | − | − |
Ecc eccentricity, Epi epithelial, IQ interquartile, FGV fibroglandular volume, SD standard deviation
Only histologic features ranked within the top 10 for prediction of each density measure are included in the table
Features are ranked numerically and sequentially from 1 to 10, with 1 representing the most important feature and 10 representing the 10th most important feature
The median cut points of breast density used in stratification were: global FGV (%) 34.4, localized FGV (%) 40.0
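The stratification into high (> median) and low (≤ median) density groups, with the cut point taken from the training population and then applied to everyone, can be sketched as below. The function name and the sample values are hypothetical; only the rule itself comes from the table headers and notes.

```python
from statistics import median


def stratify_by_training_median(training_values, all_values):
    """Label each value 'high' (> cut) or 'low' (<= cut).

    The cut point is the median of the training values only, so that
    the testing set does not influence the threshold.
    """
    cut = median(training_values)
    labels = ["high" if value > cut else "low" for value in all_values]
    return cut, labels


# Hypothetical global FGV (%) values, for illustration only.
training = [10.2, 22.5, 34.4, 47.9, 80.1]
testing = [5.0, 34.4, 36.2, 99.3]
cut, labels = stratify_by_training_median(training, training + testing)
print(cut, labels)
```

A value exactly equal to the cut point falls in the low group, matching the "≤ median" column headers.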
Fig. 2 Representative histological whole-slide H&E images of breast biopsies and corresponding full-field digital mammograms from patients with similar radiological global fibroglandular volume but whose biopsies yielded different diagnoses: atypical ductal hyperplasia (a) and invasive carcinoma (b)
Fig. 3 ROC curves (AUC with 95% confidence intervals) for the prediction of invasive cancer among women with high (a) and low (b) percent global fibroglandular volume, and high (c) and low (d) percent localized fibroglandular volume. AUC area under the curve, ROC receiver-operating characteristic
Fig. 1 Workflow overview utilizing training and testing sets for the prediction of global and localized fibroglandular volume (FGV) measures from identified convolutional neural network model features
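The AUC summarized in the ROC curves of Fig. 3 can be read as the probability that a randomly chosen invasive case receives a higher classifier score than a randomly chosen benign case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical, `roc_auc` is an illustrative helper, and the confidence intervals shown in the figure are not computed here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. invasive), 0 for the negative class.
    scores: classifier scores, higher meaning more likely positive.
    Tied positive/negative pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a few hundred patients; a rank-based formulation gives the same value more efficiently on larger sets.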