Deep Learning Image Analysis of Benign Breast Disease to Identify Subsequent Risk of Breast Cancer.

Adithya D Vellal¹, Korsuk Sirinukunwattan¹, Kevin H Kensler², Gabrielle M Baker¹, Andreea L Stancu¹, Michael E Pyle¹, Laura C Collins¹, Stuart J Schnitt³, James L Connolly¹, Mitko Veta⁴, A Heather Eliassen⁵, Rulla M Tamimi⁵, Yujing J Heng¹.

Abstract

Background: New biomarkers of risk may improve breast cancer (BC) risk prediction. We developed a computational pathology method to segment benign breast disease (BBD) whole slide images into epithelium, fibrous stroma, and fat. We applied our method to the BBD BC nested case-control study within the Nurses' Health Studies to assess whether computer-derived tissue composition or a morphometric signature was associated with subsequent risk of BC.
Methods: Tissue segmentation and nuclei detection deep-learning networks were established and applied to 3795 whole slide images from 293 cases who developed BC and 1132 controls who did not. Percentages of each tissue region were calculated, and 615 morphometric features were extracted. Elastic net regression was used to create a BC morphometric signature. Associations between BC risk factors and age-adjusted tissue composition among controls were assessed using analysis of covariance. Unconditional logistic regression, adjusting for the matching factors, BBD histological subtypes, parity, menopausal status, and body mass index evaluated the relationship between tissue composition and BC risk. All statistical tests were 2-sided.
Results: Among controls, direction of associations between BBD subtypes, parity, and number of births with breast composition varied by tissue region; select regions were associated with childhood body size, body mass index, age of menarche, and menopausal status (all P < .05). A higher proportion of epithelial tissue was associated with increased BC risk (odds ratio = 1.39, 95% confidence interval = 0.91 to 2.14, for highest vs lowest quartiles, P trend = .047). No morphometric signature was associated with BC.
Conclusions: The amount of epithelial tissue may be incorporated into risk assessment models to improve BC risk prediction.

Year: 2021 PMID: 33644680 PMCID: PMC7898083 DOI: 10.1093/jncics/pkaa119

Source DB: PubMed Journal: JNCI Cancer Spectr ISSN: 2515-5091

One in 8 women in the United States will develop breast cancer (BC) in her lifetime (1). Although early detection is imperative, identifying and lowering BC risk may help reduce BC morbidity and mortality. BC risk factors may be nonmodifiable (eg, genetics, dense breast tissue, and benign breast disease [BBD]) or modifiable (eg, adiposity and alcohol consumption). Among women diagnosed with BBD, the subsequent BC risk increases with the subtype of BBD in this order: nonproliferative, proliferative without atypia, and proliferative with atypia (2–4). Researchers continue to identify new biomarkers of risk (2,3,5–8) as well as update risk assessment models (9–13) to improve BC risk prediction. For example, the well-validated Rosner–Colditz model includes age at menarche, age at first birth, age at subsequent births, age at menopause, family history of BC, body mass index (BMI), alcohol intake, and postmenopausal hormone therapy use (14). Recent studies demonstrated that the inclusion of genetic risk variants, mammographic density, and endogenous hormones improves the ability of the Rosner–Colditz model to predict BC risk (11,12). Technological advances have enabled the engineering of deep-learning algorithms to analyze whole slide images (WSIs) for disease detection and diagnosis (15‐20), including discriminating between BC and benign breast tissue (21‐23). For example, terminal duct lobular unit (TDLU) involution assessed using qualitative and semi-quantitative methods was suggested to be linked to lower BC risk (24‐27). We developed and applied an automated deep-learning method to capture quantitative measures of TDLU involution (28,29) in a large, nested case-control study (30). Here, we engineered another deep-learning method to segment BBD histopathological images into epithelial, fibrous stroma, and fat regions; calculate the amount of each tissue region expressed as a percentage of total tissue; and extract morphometric features from each region.
We applied our method to the BBD BC Nested Case-Control study within the Nurses’ Health Study (NHS) and NHSII to evaluate whether computer-derived tissue composition or a morphometric signature in women diagnosed with BBD was associated with subsequent risk of BC.

Materials and Methods

Study Population

The NHS and NHSII participants completed questionnaires that provided a medical history, diagnoses of BBD or BC, as well as extensive information about demographic, lifestyle, reproductive, and dietary risk factors for BC (3,31‐33). Details about the study design methods for the NHS and NHSII have been published previously (34). Eligible women with biopsy-confirmed BBD were placed into 2 substudies—the BBD Incidence study (35‐38) and/or the BBD BC nested case-control study (2,3,5,24,30,32,33,39‐41). BC diagnosis was confirmed verbally by the participant, via medical record review, or via the cancer registry. WSIs from women in the BBD Incidence study were used in the development phase; the BBD BC nested case-control study was used in the application phase. The study protocol was approved by the institutional review boards of the Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries as required.

Engineering the Networks

The tissue segmentation network was engineered using 48 hematoxylin and eosin WSIs from the BBD Incidence study and a custom 21-layer fully convolutional network (42‐47) to segment WSIs into background, epithelium (normal TDLUs, TDLUs exhibiting proliferative or metaplastic changes, and various BBD lesions), fibrous stroma (inter- and intra-lobular), and fat (Supplementary Table 1, available online). The nuclei detection network was created using a set of 30 previously annotated hematoxylin and eosin BC WSIs from The Cancer Genome Atlas (48) and a fully convolutional U-Net architecture (43) with the sliding window approach (44). An example of an original image, ground truth, and automated segmentation for each network is presented in Figure 1. The majority of the precision, recall, and Dice similarity coefficient values of the tissue segmentation and nuclei detection networks were greater than 0.75 (Supplementary Table 2, available online).
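As an illustration of the three evaluation metrics named above, the following minimal sketch computes precision, recall, and the Dice similarity coefficient for one tissue class from a ground-truth and a predicted label sequence. The function name, the label strings, and the toy data are assumptions made for this example; they are not taken from the study's code.

```python
def segmentation_metrics(truth, pred, label):
    """Precision, recall, and Dice for one class over two flat pixel-label sequences."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must contain the same number of pixels")
    # Count true positives, false positives, and false negatives for this class.
    tp = sum(1 for t, p in zip(truth, pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(truth, pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(truth, pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, dice


# Hypothetical six-pixel example (labels are illustrative only).
truth = ["epi", "epi", "epi", "stroma", "fat", "fat"]
pred = ["epi", "epi", "stroma", "stroma", "fat", "epi"]
precision, recall, dice = segmentation_metrics(truth, pred, "epi")
```

Dice is the harmonic mean of precision and recall, so it falls below 0.75 whenever either of the two is substantially lower than that threshold.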

Figure 1.

BBD BC Nested Case-Control Study Participants

The BBD BC Nested Case-Control study consisted of 293 cases and 1132 controls (Supplementary Table 3, available online). Cases were women who had previously reported a BBD diagnosis and were diagnosed with BC a median of 7.67 years after BBD diagnosis (interquartile range = 4.33-11.75 years). Tumor estrogen receptor (ER) status was obtained from centralized review of breast tissue microarrays (49). Controls were women diagnosed with BBD who did not develop BC. Cases and controls were matched 1:4 on year of BBD diagnosis, age at BC diagnosis (index date for controls), and years between BBD and BC diagnosis (or index date). A total of 3795 slides were digitized at 20× (n = 213) or 40× magnification (n = 3582). Each woman contributed between 1 and 4 WSIs (median WSIs n = 3). Central pathology review classified each BBD lesion as nonproliferative, proliferative without atypia, or proliferative with atypia. Participant BMI, age at menarche, parity, age at first birth, breastfeeding history, and menopausal status were obtained from the questionnaires completed closest to, but before, the BBD biopsy. The average body sizes at ages 5 and 10 years were reported using a 9-level pictogram (level 1 as leanest) (40). Birth index, a surrogate metric that reflects the timing and spacing of births, was calculated (50). A higher birth index indicates a higher number of births occurring at earlier ages.

Applying Our Networks to the BBD BC Nested Case-Control Study

Figure 2 shows an overview of our image analysis pipeline. Briefly, tissue-containing areas were located for each WSI (Figure 2, B), the WSI was split into patches of 2048 × 2048 pixels, tissue segmentation and nuclei detection were performed (Figure 2, C), and each patch yielded a segmentation map with every pixel classified as epithelium, stroma, fat, or background (see Supplementary Methods, available online).
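The patch-splitting step can be pictured with a minimal sketch that lists the top-left corner of each 2048 × 2048 patch on a non-overlapping grid. The function name and the example image size are hypothetical, and the handling of partial patches at the right and bottom edges (cropping or padding) is an assumption, because the text does not describe it.

```python
def patch_origins(width, height, patch=2048):
    """Top-left (x, y) pixel coordinates of non-overlapping patches covering an image.

    Patches along the right and bottom edges may extend past the image;
    how those are cropped or padded is left to the caller (assumption).
    """
    return [(x, y) for y in range(0, height, patch) for x in range(0, width, patch)]


# Hypothetical tissue region of 5000 x 3000 pixels.
origins = patch_origins(5000, 3000)
```

For this example the grid has 3 columns and 2 rows, so 6 patches would be passed to the segmentation and nuclei detection networks.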

Figure 2.

Overview of our benign breast disease image analysis pipeline. A) A whole slide image (WSI). B) Image processing to extract tissue-containing areas of the WSI. C) Applying our tissue segmentation and nuclei detection networks created in the development phase to a WSI to obtain a segmentation map. D) From the segmentation map, computer-derived morphometric features were extracted. Percentages of tissue regions were also computed from the map. Morphometric data were summarized from all WSIs belonging to the same woman. E) Identifying if morphometric features are associated with breast cancer.

Figure 1. An example of an original image, ground truth, and automated segmentation or detection for each deep-learning network. A) For tissue segmentation, white represents background, green represents fibrous stroma, red is epithelium, and purple is fat. B) For cell nuclei detection, white represents background, red is nucleus, and cyan is nuclei membrane border. The final output produces a binary mask that considers nucleus membrane pixels to be part of the background.

Each tissue region was expressed as a percentage of the total amount of tissue analyzed for each woman. Pixels classified as epithelium, stroma, or fat were individually summed across patches from a single WSI, combined across WSIs pertaining to each woman, and divided by the total number of pixels detected across all tissue regions. Because fat regions were mostly empty white spaces, fat and stroma regions were combined as stroma for feature extraction. Morphology, texture, and graph-based spatial features (ie, computer-derived morphometric features; n = 615) were extracted using the WSIs in conjunction with the automated tissue segmentation and nuclei detection results (Figure 2, D) (51‐55). For women with more than 1 WSI, the value for each feature was further summarized using the median calculated across all her WSIs.
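The per-woman tissue percentages and the median aggregation of features described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the data layout (a list of WSIs, each a list of patches, each a flat list of pixel labels), the function names, and the feature name `nuclear_area` are all assumptions.

```python
from collections import Counter
from statistics import median

TISSUE = ("epithelium", "stroma", "fat")  # background pixels are excluded


def tissue_percentages(wsis):
    """Percentage of each tissue region across all patches of all WSIs for one woman."""
    counts = Counter()
    for patches in wsis:          # one entry per WSI
        for patch in patches:     # one entry per patch
            counts.update(label for label in patch if label in TISSUE)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tissue pixels found")
    return {t: 100.0 * counts[t] / total for t in TISSUE}


def aggregate_features(per_wsi_features):
    """Median of each morphometric feature across one woman's WSIs."""
    names = per_wsi_features[0].keys()
    return {n: median(f[n] for f in per_wsi_features) for n in names}


# Hypothetical woman with two WSIs (one patch and two patches, four pixels each).
wsis = [
    [["epithelium", "stroma", "stroma", "background"]],
    [["fat", "stroma", "stroma", "background"],
     ["epithelium", "stroma", "stroma", "fat"]],
]
pct = tissue_percentages(wsis)

features = aggregate_features(
    [{"nuclear_area": 40.0}, {"nuclear_area": 50.0}, {"nuclear_area": 90.0}]
)
```

Pooling pixel counts before dividing weights each WSI by its tissue area, which differs from averaging per-slide percentages when slides contain unequal amounts of tissue.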
A morphometric signature associated with BC was constructed using a training set of 855 women (60%) and an elastic net regularized regression model (see Supplementary Methods, available online) (56). A signature score for each woman in the test set was computed.

Statistical Analysis

Preliminary assessments using Wilcoxon rank sum and Kruskal-Wallis tests evaluated if there was any difference in tissue composition between cases and controls and when stratified by BBD histological subtypes. The associations between risk factors and tissue composition (natural log-transformed) among controls were assessed using analysis of covariance (ANCOVA) adjusting for age at BBD biopsy (emmeans R package version 1.4.4) (57). Each tissue region was categorized into quartiles as defined by the distribution among controls. Unconditional logistic regression models accounting for the matching factors to estimate odds ratios (ORs) and 95% confidence intervals (CI) were used to determine the relationship between each tissue region (in quartiles) and BC risk (Figure 2, E). Model 1 adjusted for matching factors (year of BBD biopsy, age at index date, time between BBD biopsy and index date); model 2 adjusted for matching factors and BBD histological subtypes; and model 3 adjusted for matching factors, BBD histological subtypes, parity, menopausal status, and BMI. Analyses were also conducted by stratifying the women according to BBD histological subtype, parity, menopausal status, or BMI. Polytomous logistic regression models assessed the association between each tissue region and risk of BC defined by tumor ER expression. The ratio of epithelium to fibrous stroma was calculated and log-transformed, and its association with BC risk was evaluated using logistic regression models in all women and women stratified by BBD histological subtype. The level of statistical significance used for all statistical tests was P less than .05. All tests were 2-sided. All statistical analyses were performed using R (see Supplementary Methods, available online).
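To make the quartile coding concrete, the sketch below derives cut points from control values only, assigns any woman to a quartile, and builds the quartile-median score that a trend test would use as a continuous variable. It is a minimal illustration in Python rather than the R code used in the study; the function names are hypothetical, the "inclusive" quantile method is an assumption, and computing each quartile median from all supplied values (rather than from controls only) is also an assumption. The cut points 4.8, 7.5, and 11.2 are the epithelium cutoffs reported in Table 2.

```python
from bisect import bisect_right
from statistics import median, quantiles


def control_cutoffs(control_values):
    """Three quartile cut points estimated from the controls' distribution."""
    return quantiles(control_values, n=4, method="inclusive")


def assign_quartile(value, cutoffs):
    """Quartile 1-4; a value equal to a cut point falls in the higher quartile."""
    return bisect_right(cutoffs, value) + 1


def trend_scores(values, cutoffs):
    """Replace each value by the median of its quartile (assumption: medians over all values given)."""
    groups = {}
    for v in values:
        groups.setdefault(assign_quartile(v, cutoffs), []).append(v)
    medians = {q: median(vs) for q, vs in groups.items()}
    return [medians[assign_quartile(v, cutoffs)] for v in values]


# Epithelium cutoffs (%) as reported in Table 2; the percentages below are invented.
epithelium_cutoffs = [4.8, 7.5, 11.2]
values = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 12.0, 20.0]
quartile_of = [assign_quartile(v, epithelium_cutoffs) for v in values]
scores = trend_scores(values, epithelium_cutoffs)
```

Deriving the cut points from controls alone keeps the categories tied to the distribution in women who did not develop BC, so the case distribution does not shift the reference categories.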

Results

Preliminary Assessment of Breast Tissue Composition

Cases had statistically significantly more epithelium (P < .001; Wilcoxon test) and suggestively more stroma (P = .07) than controls; controls had statistically significantly more fat (P < .001; Wilcoxon test) than cases (Figure 3, A). When stratified by BBD histological subtypes, there were statistically significant differences among cases or controls, or between cases and controls, for each tissue region (epithelium P < .001, stroma P = .02, fat P < .001; Kruskal-Wallis tests; Figure 3, B-D).

Figure 3.

Boxplots display the amount of each tissue region (%) among cases and controls and when stratified by benign breast disease (BBD) histological subtypes. A) Cases have more epithelium than controls (Wilcoxon test). Controls have statistically significantly more fat than cases (Wilcoxon test). When stratified by BBD histological subtypes, there were statistically significant differences among cases or controls, or between cases and controls within epithelium P less than .001 (B), fibrous stroma P = .02 (C), and fat P less than .001 (D) (Kruskal-Wallis tests). Statistically significant Kruskal-Wallis tests were further evaluated using Dunn’s post hoc tests with Benjamini-Hochberg multiple testing method to obtain adjusted P values; only meaningful statistically significant comparisons within cases, within controls, and between cases and controls are indicated in B, C, and D. Cases are represented by boxes with slanted lines. Controls are represented by clear boxes. Each box displays the median and the 25th and 75th percentiles (lower and upper hinges). The lower whisker represents the smallest observation greater than or equal to the lower hinge - 1.5 * interquartile range (IQR); the upper whisker represents the largest observation less than or equal to the upper hinge + 1.5 * IQR. The black dots represent outliers. All statistical tests were 2-sided.

Age-Adjusted Tissue Composition and Risk Factors Among Controls

Table 1 displays the age-adjusted means (95% confidence intervals) and the ANCOVA P values of the associations between risk factors and the tissue composition among the controls. Controls with the nonproliferative subtype of BBD had lower percentages of epithelium and stroma but higher percentages of fat than those with proliferative subtypes (all P < .001; ANCOVA). Women with a larger childhood body size (level ≥2.5) had less stroma (P = .048; ANCOVA) compared with women with body sizes of 1 or 1.5-2 at ages 5-10 years. Breast tissues of women with a BMI of 30 or more at the time of BBD biopsy had a lower amount of stroma (P < .001; ANCOVA) but higher amount of fat (P < .001) compared with women with lower BMI.

Table 1.

Tissue composition and BC risk factors among 1132 controls

Risk factors	No.	Epithelium, % (95% CI)	Fibrous stroma, % (95% CI)	Fat, % (95% CI)
Mean age at BBD biopsy, y
<40	251	9.2 (8.5 to 10.0)	76.0 (74.3 to 77.7)	7.8 (6.9 to 8.8)
40-49	438	7.8 (7.3 to 8.3)	72.0 (70.8 to 73.2)	13.2 (12.0 to 14.4)
50-59	293	6.1 (5.7 to 6.6)	69.1 (67.7 to 70.5)	17.5 (15.6 to 19.6)
≥60	150	5.0 (4.5 to 5.6)	63.5 (61.7 to 65.4)	23.9 (20.4 to 28.0)
P value^b		<.001	<.001	<.001
BBD histological subtype
Nonproliferative	331	5.7 (5.3 to 6.1)	68.3 (67.0 to 69.7)	16.2 (14.6 to 18.0)
Proliferative without atypia	645	7.8 (7.5 to 8.2)	71.8 (70.9 to 72.8)	12.5 (11.6 to 13.5)
Atypical hyperplasia	156	8.0 (7.2 to 8.8)	72.8 (70.8 to 74.8)	13.3 (11.4 to 15.6)
P value^b		<.001	<.001	<.001
Body size at age 5-10 y
Level 1	322	7.5 (7.0 to 8.0)	72.0 (70.6 to 73.4)	12.5 (11.2 to 14.0)
Level 1.5-2	290	7.1 (6.6 to 7.7)	71.8 (70.4 to 73.3)	12.8 (11.4 to 14.3)
Level ≥2.5	367	7.0 (6.5 to 7.5)	69.9 (68.6 to 71.2)	14.6 (13.2 to 16.2)
P value^b		.42	.048	.09
BMI, kg/m²
<25	641	7.1 (6.8 to 7.5)	72.6 (71.6 to 73.6)	12.3 (11.4 to 13.3)
25 to <30	303	7.3 (6.7 to 7.8)	70.7 (69.3 to 72.1)	13.5 (12.1 to 15.0)
≥30	173	7.2 (6.5 to 7.9)	65.5 (63.8 to 67.2)	19.8 (17.1 to 22.8)
P value^b		.91	<.001	<.001
Mean age of menarche, y
≤12	532	7.0 (6.6 to 7.4)	70.0 (68.9 to 71.0)	14.6 (13.5 to 15.9)
13	335	7.2 (6.7 to 7.7)	71.1 (69.7 to 72.5)	12.5 (11.3 to 13.9)
≥14	260	7.4 (6.9 to 8.1)	72.8 (71.2 to 74.4)	13.0 (11.6 to 14.7)
P value^b		.50	.01	.05
Parity
Nulliparous	107	5.2 (4.6 to 5.9)	73.8 (71.3 to 76.4)	9.7 (8.1 to 11.7)
Parous	1020	7.4 (7.1 to 7.7)	70.6 (69.8 to 71.4)	14.2 (13.3 to 15.0)
P value^b		<.001	.02	<.001
No. of births
Nulliparous	107	5.8 (5.1 to 6.7)	75.8 (73.2 to 78.5)	8.1 (6.7 to 9.9)
Primiparous (1 birth)	97	7.0 (6.1 to 8.1)	73.4 (70.8 to 76.2)	12.6 (10.3 to 15.5)
Multiparous (≥2 births)	923	7.3 (7.0 to 7.7)	70.1 (69.3 to 71.0)	14.6 (13.7 to 15.6)
P value^b		.005	<.001	<.001
Time between last birth and BBD biopsy, y
0 (ie, nulliparous)	107	5.2 (4.6 to 5.9)	73.7 (71.2 to 76.3)	9.8 (8.1 to 11.8)
<20 (among parous women)	578	7.6 (7.1 to 8.0)	70.4 (69.2 to 71.5)	15.1 (13.8 to 16.5)
≥20 (among parous women)	409	7.0 (6.5 to 7.5)	70.3 (68.8 to 71.8)	14.3 (12.8 to 16.1)
P value^b		<.001	.04	<.001
Mean age at first birth among parous women, y
<25	563	7.1 (6.8 to 7.5)	70.5 (69.4 to 71.5)	14.8 (13.8 to 15.8)
25-29	359	7.7 (7.2 to 8.2)	69.6 (68.3 to 70.9)	14.4 (13.2 to 15.7)
≥30	101	7.1 (6.3 to 8.1)	72.8 (70.3 to 75.4)	12.8 (10.9 to 15.0)
P value^b		.19	.08	.27
Birth index among parous women
≤30	229	7.4 (6.8 to 8.1)	72.0 (70.3 to 73.8)	13.3 (11.9 to 14.9)
31-59	281	7.7 (7.2 to 8.3)	71.5 (70.0 to 72.9)	13.8 (12.6 to 15.2)
≥60	231	7.8 (7.2 to 8.6)	70.9 (69.2 to 72.6)	13.5 (12.1 to 15.0)
P value^b		.65	.67	.85
Breastfeeding among parous women
Never	409	7.1 (6.7 to 7.5)	70.2 (69.0 to 71.4)	15.3 (14.1 to 16.5)
<6 mo	209	7.3 (6.7 to 8.0)	70.9 (69.2 to 72.6)	15.4 (13.8 to 17.2)
≥6 mo	305	7.5 (7.0 to 8.1)	70.4 (69.0 to 71.8)	13.5 (12.3 to 14.8)
P value^b		.47	.79	.09
Menopausal status
Pre	679	7.8 (7.3 to 8.2)	72.4 (71.2 to 73.5)	13.2 (12.0 to 14.4)
Post	365	6.3 (5.7 to 6.9)	68.8 (67.2 to 70.6)	13.7 (11.9 to 15.7)
P value^b		.001	.004	.71

Data presented for age are means (95% CI). Data for other variables are presented as age-adjusted means (95% CI); age was adjusted as a continuous variable. ANCOVA = analysis of covariance; BBD = benign breast disease; BC = breast cancer; BMI = body mass index; CI = confidence interval.

^b P values were calculated using ANCOVA adjusting for age at BBD biopsy.

Parous women had more epithelium and fat and less stroma compared with nulliparous women (all P < .05; ANCOVA). When parous women were further subdivided, women who had 2 or more births (multiparous) had more epithelium and fat but less stroma than women who had 1 birth (primiparous) or nulliparous women (P < .05). Women who had their last birth within 20 years had more epithelium and fat compared with nulliparous women and women who had their last birth 20 or more years before BBD diagnosis (P < .05). Postmenopausal women had less epithelium (P = .001; ANCOVA) and stroma (P = .004) compared with premenopausal women. The age of menarche positively correlated with the amount of stroma (P = .01; ANCOVA). Age at first birth, birth index, and breastfeeding were not associated with breast tissue composition.

Tissue Composition and BC Risk

Higher percentages of epithelium were statistically significantly associated with subsequent BC risk when accounting for matching factors (OR = 1.53, 95% CI = 1.04 to 2.27 comparing highest and lowest quartiles, Ptrend = .02). On additional adjustment for BBD histological subtype, parity, menopausal status, and BMI, the association attenuated modestly: the confidence interval for the comparison of highest and lowest quartiles included 1 (OR = 1.39, 95% CI = 0.91 to 2.14), but the trend across quartiles remained statistically significant (Ptrend = .047; Table 2). Neither the amount of stroma nor fat was associated with BC risk (all Ptrend > .05; Table 2).

Table 2.

The association between tissue composition and BC risk was evaluated using unconditional logistic regression models to estimate odds ratios and 95% confidence intervals

Tissue region	Quartile 1	Quartile 2	Quartile 3	Quartile 4	Ptrend^b
Epithelium
Cases/controls, No.	56/283	65/283	68/283	104/283
Quartile cutoff, %	<4.8	≥4.8 to <7.5	≥7.5 to <11.2	≥11.2
Model 1, OR (95% CI)	Ref	1.12 (0.76 to 1.67)	1.12 (0.75 to 1.67)	1.53 (1.04 to 2.27)	.02
Model 2, OR (95% CI)	Ref	0.95 (0.63 to 1.43)	0.92 (0.61 to 1.39)	1.36 (0.91 to 2.03)	.06
Model 3, OR (95% CI)	Ref	0.95 (0.61 to 1.49)	0.95 (0.61 to 1.49)	1.39 (0.91 to 2.14)	.047
Fibrous stroma
Cases/controls, No.	62/283	67/283	78/283	86/283
Quartile cutoff, %	<64.5	≥64.5 to <73.5	≥73.5 to <81.3	≥81.3
Model 1, OR (95% CI)	Ref	0.98 (0.66 to 1.45)	1.07 (0.73 to 1.57)	1.20 (0.81 to 1.76)	.33
Model 2, OR (95% CI)	Ref	0.87 (0.58 to 1.30)	0.96 (0.65 to 1.42)	1.07 (0.72 to 1.59)	.65
Model 3, OR (95% CI)	Ref	0.78 (0.51 to 1.20)	0.86 (0.56 to 1.31)	0.93 (0.61 to 1.41)	.85
Fat
Cases/controls, No.	102/283	80/283	49/283	62/283
Quartile cutoff, %	<8.7	≥8.7 to <16.7	≥16.7 to <27.0	≥27.0
Model 1, OR (95% CI)	Ref	0.81 (0.57 to 1.15)	0.55 (0.36 to 0.81)	0.75 (0.50 to 1.12)	.11
Model 2, OR (95% CI)	Ref	0.81 (0.56 to 1.15)	0.55 (0.36 to 0.82)	0.83 (0.55 to 1.25)	.27
Model 3, OR (95% CI)	Ref	0.83 (0.58 to 1.21)	0.56 (0.36 to 0.85)	0.93 (0.59 to 1.45)	.52

Each tissue region was categorized into quartiles as defined by the distribution among the controls. Model 1 adjusted for matching factors. Model 2 adjusted for matching factors and BBD histological subtypes. Model 3 adjusted for matching factors, BBD histological subtypes, parity, menopausal status, and BMI. BC = breast cancer; BBD = benign breast disease; BMI = body mass index; CI = confidence interval; OR = odds ratio.

The median value for each quartile was included as a continuous variable in the unconditional logistic regression for models 1, 2, and 3 to obtain the Ptrend value (Wald test).
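As a worked illustration of what an odds ratio and its 95% confidence interval express, the sketch below computes a crude (unadjusted) odds ratio with a Wald interval from the epithelium case/control counts in Table 2, comparing quartile 4 (104 cases, 283 controls) with quartile 1 (56 cases, 283 controls). This is not the analysis reported in the table: it ignores the matching factors and every other covariate, so it is not expected to reproduce the model 1 estimate of 1.53. The function name is hypothetical.

```python
from math import exp, log, sqrt


def crude_odds_ratio(cases_exp, controls_exp, cases_ref, controls_ref, z=1.96):
    """Unadjusted odds ratio with a Wald 95% CI from a 2 x 2 table of counts."""
    odds_ratio = (cases_exp * controls_ref) / (cases_ref * controls_exp)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se = sqrt(1 / cases_exp + 1 / controls_exp + 1 / cases_ref + 1 / controls_ref)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Epithelium, quartile 4 vs quartile 1 (counts from Table 2).
or_q4, lower_q4, upper_q4 = crude_odds_ratio(104, 283, 56, 283)
```

The crude estimate is larger than the adjusted estimates in the table, which is consistent with the attenuation seen as matching factors, BBD subtype, and other covariates enter the models.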

Within the proliferative without atypia subtype of BBD, women with percentage of epithelium in the fourth quartile had a higher BC risk compared with women in the first quartile (adjusted OR = 1.92, 95% CI = 1.11 to 3.40, Ptrend = .01; Supplementary Table 4, available online). In general, the association between tissue regions and BC risk defined by tumor ER expression demonstrated no heterogeneity. Fat was associated with lower risk of ER-positive BC in the crude model 1 (second vs first tertile: OR = 0.62, 95% CI = 0.42 to 0.92; third vs first tertile: OR = 0.62, 95% CI = 0.41 to 0.95, Ptrend = .04; Supplementary Table 5, available online). Further analyses were conducted to understand the substitution effects by using each tissue region as a continuous variable per 10% change and with 2 of the 3 tissue regions in the model. The association between per 10% change of epithelium and BC risk remained the strongest in fully adjusted models, irrespective of whether it was substituted for stroma (adjusted OR = 1.30, 95% CI = 1.05 to 1.61) or fat tissue (adjusted OR = 1.26, 95% CI = 1.03 to 1.54; Supplementary Table 6, available online).
The ratio of epithelium to fibrous stroma was statistically significantly associated with BC risk in the fully adjusted model (OR = 1.29, 95% CI = 1.05 to 1.59). When stratified by BBD histological subtype, the association of this ratio and BC risk only remained statistically significant among women with nonproliferative subtype of BBD (matching factor adjusted model 1 OR = 1.42, 95% CI = 1.02 to 1.99; fully adjusted model 3 OR = 1.44, 95% CI = 1.00 to 2.06; Supplementary Table 7, available online).

Morphometric Signature

The morphometric signature built using training data consisted of 4 features in the epithelium (area under the receiver operating characteristic curve [AUC ROC] = 0.61, optimal λ = 0.08). When evaluated on the test set of 570 women, the AUC ROC was 0.51, which is close to the 0.5 expected by chance. Because of the poor AUC ROC in the test set, the association of the signature score with BC was not further evaluated.
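For readers less familiar with the metric, the AUC ROC can be read as the probability that a randomly chosen case receives a higher signature score than a randomly chosen control. The sketch below computes it that way from pairwise comparisons; the function name and the toy scores are assumptions for illustration and are unrelated to the study data.

```python
def auc_roc(scores, labels):
    """Pairwise AUC: fraction of case/control pairs ranked correctly; ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical signature scores: two cases (label 1) and three controls (label 0).
toy_scores = [0.9, 0.4, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 0, 0, 0]
toy_auc = auc_roc(toy_scores, toy_labels)
```

Under this reading, a value of 0.51 means the signature ranked a case above a control only marginally more often than a coin flip would.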

Discussion

The identification of new biomarkers may improve BC risk prediction. We developed a deep-learning–based computational pathology method to segment BBD histopathological images into epithelial, fibrous stroma, and fat regions. Among women who did not develop BC, BBD histological subtypes, parity, and number of births were statistically significantly associated with breast tissue composition; the direction of association varied by tissue region. Select regions were associated with body size, BMI, age of menarche, and menopausal status. Women whose breast tissues had higher percentages of epithelium had a statistically significantly increased risk of BC compared with women with lower percentages, especially among women with proliferative without atypia subtype of BBD. The ratio of epithelium to stroma was also statistically significantly associated with BC risk, particularly among women with nonproliferative subtype of BBD. We were unable to construct a BC morphometric signature.
Our study showed that the percentage of epithelium may be used as a potential biomarker of BC risk. BBD and BC originate from TDLUs. The epithelium captured by our computational method was all-encompassing. This study was the first, to our knowledge, to demonstrate a direct quantitative relationship between the percentage of epithelium and BC risk in women diagnosed with BBD, supporting the long-held hypothesis that elevated cellular mass increases cancer risk (58). Some lesion types within the proliferative without atypia subtype, such as adenosis and radial scar, are highly cellular, which may explain why, when stratified by BBD histological subtype, the association of the percentage of epithelium and BC risk remained statistically significant among those women. Our study also demonstrated that the ratio of epithelium to fibrous stroma may be an important measure to further refine the BC risk among women with the nonproliferative subtype of BBD.
The associations of age-adjusted breast tissue composition and BC risk factors among controls provided histopathological evidence to support epidemiological studies, mainly by demonstrating the link between breast tissue cellularity and cancer risk (58). Our work suggests that risk factors have different influences on epithelium and stroma. Gertig et al. (59) evaluated the proportion of epithelium and stroma in 300 BBD women who did not develop BC. Our findings support Gertig et al. (59) by also demonstrating that breast tissues associated with the nonproliferative subtype of BBD were less cellular (ie, lower epithelium and stroma but higher fat percentages) than proliferative subtypes, thus partly explaining why women with the nonproliferative subtype have lower BC risk (4,39,60‐62).
Adiposity during childhood or in young adults is inversely associated with BC risk (63‐65). Body adiposity is correlated with the amount of breast fat when evaluated using percentage mammographic density (ie, proportion of dense [epithelium and stroma] to nondense tissues [fat]) (66,67). In 153 normal breast tissue samples, Gabrielson et al. (68) observed statistically significant inverse associations of BMI with percentages of epithelium and stroma. Our study and the study by Gertig et al. (59), conducted using more participants, observed a statistically significant inverse association only between BMI and proportion of stroma. Nevertheless, all 3 studies provided histological evidence to partially explain the differential BC risk by adiposity; breast tissues of women with a larger childhood body size or younger women with a BMI of 30 or more have lower overall cellularity and thus are less dense compared with women with a leaner childhood body size or women with lower BMI, respectively.
Parity had the strongest influence on breast tissue composition among the reproductive risk factors investigated in our study. Gertig et al. (59) and Gabrielson et al. (68) observed more epithelium and less stroma in parous women compared with nulliparous women. Our findings in multiparous women who had a live birth within the last 20 years were similar to other studies that observed less TDLU involution in parous vs nulliparous women (30,69); supported epidemiological reports of increased BC risk in parous women who had a live birth within the last 5 to 24 years compared with nulliparous women (70); and highlighted the extensive stroma remodeling in mammary glands during pregnancy to accommodate expanding epithelium (71). The correlation between age of menarche and proportion of stroma reported by us and others (59,68) is in line with a higher percent breast density in young women who had later ages of menarche (72). The null associations between age of first birth and length of breastfeeding with breast tissue composition agreed with Gertig et al. (59), whereas Gabrielson et al. (68) found an association between percentage of epithelium and length of breastfeeding, but not percentage of stroma. Using our other method that measures normal TDLUs, we also did not find an association between length of breastfeeding and TDLU involution (30). Older women have less dense breasts than younger women, with the greatest change in density occurring during the menopause years (73). Indeed, we and Gertig et al. (59) reported that postmenopausal women had less epithelium and stroma compared with premenopausal women. However, this was not observed by Gabrielson et al. (68), possibly due to low power.
Computer-derived morphometric signatures have shown potential as prognostic or diagnostic biomarkers (17,18,74). We did not identify a BC morphometric signature in women with BBD. Morphometric feature data are typically noisy. In an effort to reduce signal noise, we attempted to create a BC signature within each BBD histopathological subtype but were unsuccessful because of low power.
Extracting and combining morphometric features from different types of epithelium may have diluted meaningful signals. Using the median metric, a common method of aggregating morphometric features (17), may not be optimal for this dataset. There is no gold standard method for feature aggregation, and this remains an active area of research. Future work can include improving methods for morphometric feature aggregation or creating specific BC morphometric signatures for each type of BBD lesion. The strengths of our study include the application of a computational pathology method to assess breast tissue composition in a large study with rich data on risk factors (2,3,24,32,33,40), centralized pathology review of BBD samples, and confirmation of BC cases through review of medical records. Some limitations of our study include being underpowered to evaluate the association of breast composition and ER-negative BC, BC molecular subtypes (75,76), or mammographic density (77,78) because mammogram data were available for only 105 women (7.8%) in this study. Our findings were limited to White women, the predominant race of the NHS and NHSII participants. Dysfunctional epithelial-stroma interactions in the breast have been implicated in breast carcinogenesis (79); however, our study was not designed to investigate epithelium-stroma interactions. Lastly, the majority of our BBD biopsies were surgical biopsies, and the sampling of breast tissue may not be random in nature; pathologists tend to oversample nonfatty tissue for histological processing because firm and fibrous regions are more likely to represent cancer. Although such selection bias may result in misclassification or measurement error, the sampling would have occurred at the time of BBD biopsy and is unlikely to differ between those who later developed BC and those who did not.
In conclusion, we found that BBD histopathological subtypes, anthropometric risk factors, and select reproductive risk factors were associated with breast tissue composition. Higher percentages of epithelium were associated with increased risk of BC, specifically among women with the proliferative without atypia subtype of BBD. No morphometric signature was associated with subsequent BC. Future work can include incorporating the percentage of epithelium into risk assessment models as well as exploring end-to-end deep-learning BC prediction models. We can also conduct studies to understand how modifiable BC risk factors modulate breast tissue composition.

Funding

This work was supported by the National Cancer Institute of the National Institutes of Health R21CA187642 (RMT), R01CA175080 (RMT), R01CA240341 (RMT, YJH), UM1CA186107 (AHE), and U01 CA176726 (AHE); Susan G. Komen for the Cure IIR13264020 (RMT); Breast Cancer Research Foundation 17–174, the Klarman Family Foundation (YJH); and Beth Israel Deaconess Medical Center High School Summer Research Program (ADV).

Footnotes

Role of the funder: The funding sources listed in the Funding section were not involved in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; and the decision to submit the manuscript for publication.
Disclosures: K. Sirinukunwattan is a co-founder of University of Oxford spinout Ground Truth Labs. Ground Truth Labs has no financial or commercial interest in this work. All other authors have nothing to disclose. All authors have no conflict of interest.
Prior presentation: Part of this work was previously presented at the United States and Canadian Academy of Pathology 2018 annual meeting in Vancouver, BC, Canada.
Author contributions: Conceived and designed the study: RMT YJH KS. Data analysis: YJH ADV KS MV KHK RMT. Epidemiological data collection: RMT AHE KHK. Breast pathology expertise: GMB LCC SJS JLC. Computational method and data acquisition: ADV KS ALS MEP YJH. All authors contributed to the writing and reviewing of the manuscript.
Acknowledgements: We thank the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY. The authors assume full responsibility for analyses and interpretation of these data.

Data Availability

The source code for our deep learning networks is available at https://github.com/avellal14/BBD_Pipeline. The data that support the findings of this study are available from the Nurses’ Health Studies. Investigators interested in using the data can request access, and feasibility will be discussed at an investigators meeting. Limits are not placed on scientific questions or methods, and there is no requirement for co-authorship. Data sharing information and policy details are available at http://www.nurseshealthstudy.org/researchers.

67 in total

1. Age-related lobular involution and risk of breast cancer.

Authors: Tia R Milanese; Lynn C Hartmann; Thomas A Sellers; Marlene H Frost; Robert A Vierkant; Shaun D Maloney; V Shane Pankratz; Amy C Degnim; Celine M Vachon; Carol A Reynolds; Romayne A Thompson; L Joseph Melton; Ellen L Goode; Daniel W Visscher
Journal: J Natl Cancer Inst Date: 2006-11-15 Impact factor: 13.506

2. A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology.

Authors: Neeraj Kumar; Ruchika Verma; Sanuj Sharma; Surabhi Bhargava; Abhishek Vahadane; Amit Sethi
Journal: IEEE Trans Med Imaging Date: 2017-03-06 Impact factor: 10.048

3. Premenopausal Plasma Osteoprotegerin and Breast Cancer Risk: A Case-Control Analysis Nested within the Nurses' Health Study II.

Authors: Rulla M Tamimi; A Heather Eliassen; Joanne Kotsopoulos; Emma E McGee; Susana Lozano-Esparza; Judy E Garber; Jennifer Ligibel; Laura C Collins; Kornelia Polyak; Myles Brown; Steven Narod
Journal: Cancer Epidemiol Biomarkers Prev Date: 2020-04-10 Impact factor: 4.254

4. Breast cancer statistics, 2019.

Authors: Carol E DeSantis; Jiemin Ma; Mia M Gaudet; Lisa A Newman; Kimberly D Miller; Ann Goding Sauer; Ahmedin Jemal; Rebecca L Siegel
Journal: CA Cancer J Clin Date: 2019-10-02 Impact factor: 508.702

5. Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors: Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal: J Stat Softw Date: 2010 Impact factor: 6.440

6. Radial scars and subsequent breast cancer risk: results from the Nurses' Health Studies.

Authors: Sarah A Aroner; Laura C Collins; James L Connolly; Graham A Colditz; Stuart J Schnitt; Bernard A Rosner; Susan E Hankinson; Rulla M Tamimi
Journal: Breast Cancer Res Treat Date: 2013-04-23 Impact factor: 4.872

7. Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model.

Authors: Jeffrey A Tice; Steven R Cummings; Rebecca Smith-Bindman; Laura Ichikawa; William E Barlow; Karla Kerlikowske
Journal: Ann Intern Med Date: 2008-03-04 Impact factor: 25.391

8. Correlating nuclear morphometric patterns with estrogen receptor status in breast cancer pathologic specimens.

Authors: Rishi R Rawat; Daniel Ruderman; Paul Macklin; David L Rimm; David B Agus
Journal: NPJ Breast Cancer Date: 2018-09-04

9. Recalibration of the Gail model for predicting invasive breast cancer risk in Spanish women: a population-based cohort study.

Authors: Roberto Pastor-Barriuso; Nieves Ascunce; María Ederra; Nieves Erdozáin; Alberto Murillo; José E Alés-Martínez; Marina Pollán
Journal: Breast Cancer Res Treat Date: 2013-02-03 Impact factor: 4.872

10. Deep learning assessment of breast terminal duct lobular unit involution: Towards automated prediction of breast cancer risk.

Authors: Suzanne C Wetstein; Allison M Onken; Christina Luffman; Gabrielle M Baker; Michael E Pyle; Kevin H Kensler; Ying Liu; Bart Bakker; Ruud Vlutters; Marinus B van Leeuwen; Laura C Collins; Stuart J Schnitt; Josien P W Pluim; Rulla M Tamimi; Yujing J Heng; Mitko Veta
Journal: PLoS One Date: 2020-04-15 Impact factor: 3.240

2 in total

1. Early-Life and Adult Adiposity, Adult Height, and Benign Breast Tissue Composition.

Authors: Hannah Oh; Lusine Yaghjyan; Rebecca J Austin-Datta; Yujing J Heng; Gabrielle M Baker; Korsuk Sirinukunwattana; Adithya D Vellal; Laura C Collins; Divya Murthy; A Heather Eliassen; Bernard A Rosner; Rulla M Tamimi
Journal: Cancer Epidemiol Biomarkers Prev Date: 2020-12-07 Impact factor: 4.090

2. Associations of reproductive breast cancer risk factors with breast tissue composition.

Authors: Lusine Yaghjyan; Rebecca J Austin-Datta; Hannah Oh; Yujing J Heng; Adithya D Vellal; Korsuk Sirinukunwattana; Gabrielle M Baker; Laura C Collins; Divya Murthy; Bernard Rosner; Rulla M Tamimi
Journal: Breast Cancer Res Date: 2021-07-05 Impact factor: 6.466

2 in total