Literature DB >> 31677238

Individual and joint performance of DNA methylation profiles, genetic risk score and environmental risk scores for predicting breast cancer risk.

Zhong Guan^1,2, Janhavi R Raut^2,3, Korbinian Weigl^1,4, Ben Schöttker^1,5, Bernd Holleczek⁶, Yan Zhang^1,4, Hermann Brenner^1,3,4.

Abstract

DNA methylation patterns in the blood, genetic risk scores (GRSs), and environmental risk factors can potentially improve breast cancer (BC) risk prediction. We assessed the individual and joint predictive performance of methylation, GRS, and environmental risk factors for BC incidence in a prospective cohort study. In a cohort of 5462 women aged 50-75 from Germany, 101 BC cases were identified during 14 years of follow-up and were compared to 263 BC-free controls in a nested case-control design. Three previously suggested methylation risk scores (MRSs) based on methylation of 423, 248, and 131 cytosine-phosphate-guanine (CpG) loci, and a GRS based on the risk alleles from 269 recently identified single nucleotide polymorphisms were constructed. Additionally, multiple previously proposed environmental risk scores (ERSs) were built based on environmental variables. Areas under the receiver operating characteristic curves (AUCs) were estimated for evaluating BC risk prediction performance. MRS and ERS showed limited accuracy in predicting BC incidence, with AUCs ranging from 0.52 to 0.56 and from 0.52 to 0.59, respectively. The GRS predicted BC incidence with a higher accuracy (AUC = 0.61). Adjusted odds ratios per standard deviation increase (95% confidence interval) were 1.07 (0.84-1.36) and 1.40 (1.09-1.80) for the best performing MRS and ERS, respectively, and 1.48 (1.16-1.90) for the GRS. A full risk model combining the MRS, GRS, and ERS predicted BC incidence with the highest accuracy (AUC = 0.64) and might be useful for identifying high-risk populations for BC screening.

Entities: Chemical Disease Gene Species

Keywords: DNA methylation; SNPs; breast cancer; environmental risk; genetic risk; risk prediction model

Year: 2019 PMID： 31677238 PMCID： PMC6944111 DOI： 10.1002/1878-0261.12594

Source DB: PubMed Journal: Mol Oncol ISSN： 1574-7891 Impact factor: 6.603

area under the receiver operating characteristic curve breast cancer cytosine‐phosphate‐guanine environmental risk score genetic risk score methylation risk score single nucleotide polymorphism

Introduction

Breast cancer (BC) is the most commonly diagnosed cancer and the leading cause of cancer death among women worldwide, accounting for nearly 2.08 million new cases and 630 000 deaths in 2018 (Bray et al., 2018). Among women aged 50–69 years, the detection of early‐stage disease through mammography has led to a decline in BC mortality (Independent, 2012). Although mammography is widely used for BC screening, it has limitations such as high rates of false‐positive results and overdiagnosis (Independent, 2012; Pace and Keating, 2014). Furthermore, screening offers so far (except for women who had family history of BC or mutation in BRCA1 or BRCA2) do not take interindividual variation of BC risk into account (Winters et al., 2017). DNA methylation markers detected in whole blood have emerged as potential candidates for the identification of high‐risk populations for earlier or further specific BC screening in recent years (Guan et al., 2018). While these markers have been identified in diverse study populations through different approaches (Joo et al., 2018; van Veldhoven et al., 2015; Xu et al., 2013; Xu et al., 2019; Yang et al., 2019), their predictive value in prospectively collected samples needs to be further validated. Additionally, genome‐wide association studies (GWASs) have identified an increasing number of single nucleotide polymorphisms (SNPs) that are independently associated with BC risk (Mavaddat et al., 2015; Michailidou et al., 2015; Michailidou et al., 2017). Although these SNPs confer small risk individually, their combined effect can substantially influence BC risk. Genetic risk scores (GRSs) based on multiple SNPs can be used to stratify women according to their risk of developing BC which may lead to more refined, personalized prevention strategies (Burton et al., 2013; Mavaddat et al., 2015). Recently, Mavaddat et al. (2019) constructed a GRS based on 313 SNPs which discriminated BC cases from controls with an area under the receiver operating characteristic curve (AUC) of 0.63 in prospective studies conducted among white European populations. Besides epigenetic and genetic factors, reproductive (e.g., menarche, pregnancy, and menopause), lifestyle (e.g., alcohol consumption), and anthropometric (e.g., body mass index) factors, as well as the use of hormone medications (Madigan et al., 1995; Peto, 2011), have long been identified to be related to BC risk. Risk prediction models combining these known risk factors in ‘environmental risk scores’ (ERSs) have been developed (Dierssen‐Sotos et al., 2018; Gail et al., 1989; Maas et al., 2016; Novotny et al., 2006; Park et al., 2013; Rudolph et al., 2018; Wang et al., 2016), but their predictive accuracy was found to be modest (Anothaisintawee et al., 2012). In this study, we aimed to simultaneously assess the individual and joint performance of whole‐blood DNA methylation markers, GRS, and ERS for BC incidence in a prospective cohort study.

Materials and methods

Study population and data collection

We performed a case–control study nested within the ESTHER (Epidemiologische Studie zu Chancen der Verhütung, Früherkennung und optimierten Therapie chronischer Erkrankungen in der älteren Bevölkerung) cohort, a population‐based study, conducted in Saarland, Germany. Details of the ESTHER cohort have been previously described (Schottker et al., 2013). As shown in Fig. 1, 9949 older adults aged 50–75 years of whom 5462 were women were recruited by their general practitioners during routine health checkups between July 2000 and December 2002, and followed up thereafter. The participants completed a standardized self‐administered questionnaire (collecting information on sociodemographic, reproductive, and lifestyle factors) and donated blood samples at baseline. Prevalent and incident cancer was determined by self‐report and record linkage with the Saarland Cancer Registry. DNA methylation measurements and genotyping were performed in the baseline blood samples of the ESTHER participants. Overall, we identified 101 women with incident BC and 263 women without BC at baseline or during the follow‐ups for whom both DNA methylation and genetic data were available. The study was approved by the ethics committees of the University of Heidelberg and of the state medical board of Saarland, Germany. Written informed consent was provided by all participants.

Figure 1

Flow diagram of inclusion of study participants.

Methylation assessments

Blood samples collected at baseline (available for 98.8% of participants) were stored at – 80 °C until further processing. DNA was extracted from whole‐blood samples using a salting out procedure (Miller et al., 1988). DNA methylation levels of 866836 cytosine‐phosphate‐guanine (CpG) loci were quantified by the Infinium Methylation EPIC (850K) BeadChip Assay (Illumina Inc., San Diego, CA, USA) (Zaimi et al., 2018). Briefly, 1 µg DNA was bisulfite converted, and 250 ng bisulfite‐treated DNA was applied to the EPIC BeadChips following the manufacturer’s instruction. GenomeStudio® (version 2011.1; Illumina Inc.) was used to extract DNA methylation signals from the scanned arrays (module version 1.9.0; Illumina Inc.). Due to its straightforward biological interpretation, the methylation level of a specific CpG site was quantified as a β‐value ranging from 0 (no methylation) to 1 (full methylation) (Du et al., 2010; Xie et al. 2019). Illumina normalization and preprocessing methods implanted in Illumina’s GenomeStudio® were utilized. Data were normalized to internal controls provided by the manufacturer. All controls were checked for inconsistencies in each measured plate. Probes with a detection P‐value > 0.01 or with missing value > 10% of our samples were excluded from analysis (Zhang et al., 2017). Leukocyte composition was estimated using the algorithms of Houseman et al. (Houseman et al., 2012).

Genotyping

Extracted DNA from whole blood was genotyped using the Illumina Infinium OncoArray BeadChip (Illumina). General genotyping quality control assessment was done as previously described (Anderson et al., 2010). Genotypes for common variants across the genome were imputed using data from UK10K‐1000 Genomes Project (phase 3, October 2014) with impute v2.3.2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download) after prephasing with shapeit software v2.12 (https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#download). Thresholds were set for imputation quality to retain both common and rare variants for validation. Poorly imputed SNPs were defined by an information metric I < 0.70 and excluded for the subsequent analysis. All genomic locations are given in NCBI Build 37/UCSC hg19 coordinates. All SNPs with a minor allele frequency (MAF) < 1% were excluded. After imputation, the SNP set consisted of 9 198 808 genotyped and imputed SNPs. PLINK v1.90b5.4 was then used to extract SNPs for the required regions of interest (Chang et al., 2015).

Statistical analysis

Methylation risk scores (MRSs)

We constructed MRSs based on three sets of CpGs. The first set included 450 CpGs associated with BC risk that were identified using genetic variants as instrument (Yang et al., 2019). The second set included 280 BC‐related CpGs identified in previous prospective epigenome‐wide association studies (Joo et al., 2018; van Veldhoven et al., 2015; Xu et al., 2013). The third set included 144 CpGs which was identified and validated in independent prospective studies (Xu et al., 2019). CpGs with missing value > 10% in our sample were excluded from subsequent analyses. Furthermore, CpGs only included in the Infinium Methylation 27K BeadChip Array or the Infinium Methylation 450K BeadChip Array but not the Infinium EPIC Array were excluded, which left 423 CpGs from the first set (423‐CpGs), 248 CpGs from the second set (248‐CpGs), and 131 CpGs from the third set (131‐CpGs). The individual CpGs and their associations with BC risk (none of which was statistically significant after adjustment for multiple testing by the Benjamini–Hochberg method) are provided in Table S1. These CpGs were classified into hyper‐ and hypomethylated CpGs according to the previously reported relationship with BC (Joo et al., 2018; van Veldhoven et al., 2015; Xu et al., 2013; Xu et al., 2019; Yang et al., 2019). MRSs were calculated as the sum of hypermethylated CpGs with methylation levels in the upper quartile of the distribution among controls, and of hypomethylated CpGs with methylation levels in the lower quartile of the distribution among controls.

Genetic risk score (GRS)

A newly established set of 313 SNPs which discriminated BC cases from controls with a modest AUC = 0.63 (95% CI: 0.63–0.65) in recent GWAS was used to build GRS (Mavaddat et al., 2019). SNPs were excluded from further analyses if they were missing in > 10% of our sample after imputation, if they were in high linkage disequilibrium (D' ≥ 0.95 and r 2 > 0.8) with each other, or if the MAF was low (< 1%). Specifically, we searched the website (proxy SNP website: https://ldlink.nci.nih.gov/) to find surrogates for the missing SNPs. By setting the criteria of D' ≥ 0.95, r 2 > 0.8, MAF ≥ 1%, distance < 250K bp, and missing value < 10% of our sample, only six SNPs (rs6656241, rs3791976, rs3008281, rs28489579, rs521667, and rs965352) could be used as surrogates for the six missing SNPs (rs56168262, rs3791977, rs66823261, rs4774565, rs527616, and rs6030585) in the ESTHER study. This resulted in inclusion of 269 SNPs in the GRS. GRSs for all eligible participants were calculated using the formula: GRS = β1X1 + β2X2+ … βkXk … + βX , where βk is the per‐allele log odds ratio (OR) for BC associated with the risk allele for SNP k reported in the previous independent GWAS (Mavaddat et al., 2019), Xk is the number of risk alleles for the same SNP (0, 1, or 2), and n is the total number of SNPs. SNPs and corresponding effect sizes for risk alleles reported in the derivation of the GRS (Mavaddat et al., 2019) are summarized in Table S2.

Environmental risk scores (ERSs)

For building ERSs, we conducted a literature search to identify previously published ERSs used for BC risk prediction. We identified and included seven risk scores (Dierssen‐Sotos et al., 2018; Gail et al., 1989; Maas et al., 2016; Novotny et al., 2006; Park et al., 2013; Rudolph et al., 2018; Wang et al., 2016), including two scores from two multicenter studies (study populations were mainly from Europe, the United States, and Australia) and one each from the United States, Czech, South Korea, China, and Spain. Four of the seven risk scores were built according to the previously reported score prediction algorithms (Gail et al., 1989; Novotny et al., 2006; Park et al., 2013; Wang et al., 2016). The remaining three scores (Dierssen‐Sotos et al., 2018; Maas et al., 2016; Rudolph et al., 2018) were derived from beta coefficients of the corresponding risk factors which were reported from multivariable logistic regression used in each study. The proportion of missing values for all variables including in the ERSs was below 5%, and the missing baseline values were imputed by mean value of each incomplete variable [age at menarche, age at first live birth, parity, menopausal status, age at menopause, current use of menopause hormone therapy (MHT)] within groups of cases and controls, respectively. In case the variables were not available in our data sets and could not be replaced, we built the risk scores without them. Algorithms applied to obtain the risk scores are summarized in Table S3.

Descriptive analyses

Main characteristics of cases and controls were described using frequencies for categorical variables, and means and standard deviations (SDs) for continuous variables. The correlations between the risk scores were estimated by Pearson's correlation coefficients.

Associations of MRS, GRS, and ERS with breast cancer risk

Crude associations of MRS, GRS, and ERS with BC risk were assessed by unconditional logistic regression with adjustment for leukocyte composition in case of MRS in Model I. In Model II, associations were adjusted for the complementary risk scores. Risk estimates for each risk score were included in the models either as quartiles (lowest quartile defined as the reference category) or as continuous variables (calculating ORs for an increase in risk scores by 1 SD). The individual association of previously identified CpGs with BC incidence was also estimated by unconditional logistic regression with adjustment for age, batch effects, and leukocyte composition. Areas under the receiver operating characteristic curves (AUCs) were estimated for evaluating the performance of the three types of risk scores in BC risk prediction. In addition to exploring the predictive value of risk scores over the entire period of follow‐up, analyses were repeated with follow‐up time restricted to the initial 7 years after recruitment or to subsequent years, respectively. Tenfold cross‐validation was employed to correct for potential overoptimism in prediction models. In 10‐fold cross‐validation, the original sample was randomly partitioned into 10 subsamples, the nine subsamples were then used as training data for the derivation of the prediction model and the remaining single subsample was used as the validation data for testing the model (Lanfear et al., 2017). This process was repeated 1000 times. Additionally, the goodness of fit of the combined model was tested using the Hosmer–Lemeshow test to evaluate the calibration of multiple risk scores and using the tailed‐based test by Song et al (Song et al., 2015). All statistical analyses were carried out in SAS 9.4 (SAS Institute, Cary, NC, USA), and 2‐sided P‐values < 0.05 were considered statistically significant.

Results

Table 1 presents the baseline characteristics of the study population. Of 5462 women aged 50–75 years recruited between July 2000 and December 2002, a total of 101 BC cases and 263 controls were included for whom both methylation and genomic data were available for the current analysis (Fig. 1). The median time to diagnosis for cases, defined as the time between recruitment/sample collection and BC diagnosis ranged from 0.24 months to 13.6 years [median (interquartile range): 6.8 (3.7–8.9) years]. Mean age was about 61 years for both cases and controls. The distributions of participant characteristics were similar between cases and controls except for age at menopause and parity. Controls were more likely to have had an early menopause and to have a higher parity.

Table 1

Characteristics	Cases (n = 101) No. (%)	Controls (n = 263) No. (%)	P‐value
Age (years)
Mean ± SD	60.6 ± 6.5	60.8 ± 6.0	0.80
50–59	45 (44.6)	105 (39.9)	0.54
60–69	47 (46.5)	139 (52.9)
≥ 70	9 (8.9)	19 (7.2)
Age at menarchea
≤ 12	22 (22.4)	50 (19.2)	0.45
12–14	53 (54.1)	150 (57.7)
≥ 15	23 (23.5)	60 (23.1)
Age at first live birtha
≤ 20	18 (18.6)	68 (26.3)	0.26
21–29	64 (66.0)	164 (63.3)
≥ 30	15 (15.4)	27 (10.4)
Paritya
0	9 (9.4)	14 (5.4)	0.03*
1	28 (29.2)	61 (23.6)
2	38 (39.6)	98 (38.0)
≥ 3	21 (21.8)	85 (33.0)
Ever breastfed
No	46 (45.5)	112 (42.6)	0.69
Yes	55 (54.5)	151 (57.4)	0.69
Menopausal statusa
Premenopausal	11 (11.1)	39 (15)	0.33
Postmenopausal	88 (88.9)	221 (85.0)	0.33
Age at menopausea
< 50	49 (50.5)	159 (61.2)	0.04*
≥ 50	48 (49.5)	101 (38.8)	0.04*
Ever use of oral contraceptive
No	31 (31.0)	101 (39.2)	0.27
Yes	69 (69.0)	157 (60.8)	0.27
Current use of MHTa
No	35 (35.0)	113 (43.5)	0.13
Yes	65 (65.0)	147 (56.5)	0.13
BMI (kg·m⁻²)
≤ 25	30 (29.7)	86 (32.7)	0.86
25–30	40 (39.6)	100 (38.0)
> 30	31 (30.7)	77 (29.3)
Height (m)
≤ 1.60	44 (43.6)	104 (39.5)	0.85
1.60–1.65	30 (29.7)	90 (34.2)
1.65–1.70	15 (14.6)	37 (14.1)
> 1.70	12 (11.9)	32 (12.2)
Alcohol consumption (g·day⁻¹)
0	49 (48.5)	109 (41.4)	0.31
≤ 5	16 (15.8)	62 (23.6)
5–30	29 (28.7)	77 (29.3)
> 30	7 (7.0)	15 (5.7)
Smoking status
Never smoker	63 (64.3)	166 (63.8)	0.52
Former smoker	16 (19.3)	49 (18.8)
Current smoker	19 (15.4)	45 (17.4)
Physical activity
< once/week	21 (20.8)	68 (25.9)	0.50
≥ once/week	80 (79.2)	194 (74.1)	0.50
Sleep quality
Good	87 (91.6)	215 (90.3)	0.62
Common	6 (6.3)	15 (6.3)
Poor	2 (2.1)	8 (3.4)
No. of FDRs with cancer history
0	50 (51.5)	119 (46.5)	0.68
1	35 (36.1)	89 (34.8)
≥ 2	12 (12.4)	48 (18.7)
No. of FDRs with BC
0	93 (92.1)	239 (90.9)	0.06
≥ 1	8 (7.9)	24 (9.1)	0.06

Numbers do not add to total numbers due to missing values.

P ‐value < 0.05.

Main characteristics of the study population at the time of cohort recruitment. Numbers shown were drawn from the not imputed data set. Missing values in cases/controls: age at menarche 3/3, age at first live birth 4/4, parity 5/5, menopausal status 3/3, age at menopause 4/3, and current use of MHT 2/3. No., number. Cases (n = 101) No. (%) Controls (n = 263) No. (%) Numbers do not add to total numbers due to missing values. P ‐value < 0.05. Table 2 provides an overview of risk factors included in the previously proposed ERS and the performance of these ERSs in our study. The most commonly included environmental risk factors were age, age at menarche, age at first live birth, BMI, parity, alcohol consumption, menopausal hormone therapy, and number of first‐degree relatives (FDR) with BC. Factors such as light at night (Wang et al., 2016), physical activity (Park et al., 2013), and sleep quality (Wang et al., 2016) were rarely included. The predictive accuracy of the ERS was limited with AUCs ranging from 0.517 to 0.594 in our cohort. The score of Dierssen–Sotos (Dierssen‐Sotos et al., 2018) which was derived from a case–control study in a Spanish population achieved a relatively good performance and was used for subsequent analyses. For the score of Wang et al. (2016) who had provided separate algorithms for premenopausal women and postmenopausal women, the predictive performance was estimated by the algorithm for postmenopausal women only since most of the women in our study population were postmenopausal.

Table 2

Predictors included in previous studies and performance of corresponding ERS in the ESTHER study. No., number.

Predictors	Previously proposed ERSs							Variables available in ESTHER
Predictors	Gail et al. (1989)	Novotny et al. (2006)	Park et al. (2013)	Wang et al. (2016)	Maas et al. (2016)	Dierssen‐Sotos et al. (2018)	Rudolph et al. (2018)
Age	✓	✓	✓	✓	✓	✓	✓	✓
Age at menarche	✓	✓	✓	✓	✓	✓	✓	✓
Age at first live birth	✓	✓		✓	✓	✓	✓	✓
Parity		✓	✓	✓	✓	✓	✓	✓
Breast feeding				✓				✓
Menopausal status			✓		✓	✓		✓
Age at menopause			✓		✓	✓		✓
Ever use of oral contraceptive			✓	✓				✓
Current use of MHT				✓	✓	✓	✓	✓
BMI		✓	✓	✓	✓	✓	✓	✓
Height					✓	✓	✓	✓
Alcohol consumption				✓	✓	✓	✓	✓
Smoking					✓	✓		✓
Physical activity			✓					✓
Sleep quality				✓				✓
Light at night				✓
No. of FDRs with cancer history		✓						✓
Family history of BC			✓		✓	✓		✓
No. of FDRs with BC	✓	✓	✔	✓				✓
Breast inflammation		✓
Number of previous biopsies	✓	✓		✓
AUC in ESTHER	0.540 (0.520–0.559)	0.552 (0.533–0.570)	0.517 (0.498–0.536)	0.519a (0.502–0.536)	0.556 (0.538–0.574)	0.594 (0.578–0.612)	0.560 (0.539–0.578)

AUC was calculated for postmenopausal women.

Predictors included in previous studies and performance of corresponding ERS in the ESTHER study. No., number. AUC was calculated for postmenopausal women. Figure S1 shows the pairwise correlations between the three types of risk scores. Very low, insignificant positive correlations were observed for each pair of scores. Individual associations for MRS, GRS, and ERS (Dierssen–Sotos Score) with BC risk are summarized in Table 3. For GRS and ERS, having a score in the top quartile was associated with a significantly increased risk of BC compared to the lowest quartile without adjustment. However, none of the current MRS presented a significant association with BC risk. These results did not materially change after adjusting for the complementary risk scores and leukocyte composition. For example, after adjusting for MRS, ERS, and leukocyte composition, women in the highest GRS quartile had a 2.21‐fold increased risk of BC compared with women in the lowest quartile. Adjusted ORs (95% CI) per standard deviation increase were 1.07 (0.84–1.36), 1.03 (0.81–1.31), 1.01 (0.79–1.30), 1.48 (1.16–1.90), and 1.40 (1.09–1.80) for MRS (423‐CpGs), MRS (248‐CpGs), MRS (131‐CpGs), GRS, and ERS, respectively.

Table 3

Association of the risk scores with BC incidence in the ESTHER study.

Risk scores	Quartiles of risk score	Cases	Controls	Model Ia OR (95% CI)	Model IIb OR (95% CI)
MRS (423‐CpGs)	IQR	104 (90, 118)	98 (85, 115)
	Q1 (≤ 85)	16	66	Ref.	Ref.
	Q2 (85–98]	26	66	1.60 (0.77–3.29)	1.49 (0.71–3.11)
	Q3 (98–115]	29	66	1.86 (0.91–3.81)	1.93 (0.93–4.01)
	Q4 > 115	30	65	1.82 (0.89–3.75)	1.67 (0.79–3.50)
	OR per SD increase			1.10 (0.87–1.39)	1.07 (0.84–1.36)
MRS (248‐CpGs)	IQR	53 (36, 67)	46 (35, 69)
	Q1 (≤ 35)	23	70	Ref.	Ref.
	Q2 (35–46]	22	62	1.03 (0.51–2.09)	1.08 (0.53–2.23)
	Q3 (46–69]	26	68	1.20 (0.61–2.35)	1.27 (0.64–2.54)
	Q4 > 69	30	63	1.33 (0.69–2.59)	1.27 (0.64–2.50)
	OR per SD increase			1.06 (0.84–1.40)	1.03 (0.81–1.31)
MRS (131‐CpGs)	IQR	26 (16, 45)	26 (15, 45)
	Q1 (≤ 15)	24	68	Ref.	Ref.
	Q2 (15–26]	24	65	0.94 (0.44–2.00)	0.82 (0.37–1.80)
	Q3 (26–45]	25	62	1.22 (0.58–2.53)	1.34 (0.63–2.85)
	Q4 > 45	28	68	1.16 (0.59–2.31)	1.30 (0.64–2.63)
	OR per SD increase			1.02 (0.80–1.32)	1.01 (0.79–1.30)
GRS	IQR	15.50 (15.06, 15.88)	15.21 (14.83, 15.59)
	Q1 (≤ 14.83)	20	66	Ref.	Ref.
	Q2 (14.83–15.21]	17	66	0.85 (0.41–1.77)	0.80 (0.37–1.72)
	Q3 (15.21–15.59]	23	66	1.15 (0.58–2.29)	1.18 (0.58–2.42)
	Q4 > 15.59	41	65	2.08 (1.10–3.93)	2.21 (1.14–4.30)
	OR per SD increase			1.50 (1.18–1.91)	1.48 (1.16–1.90)
ERS (Dierssen‐Sotos)	IQR	0.99 (0.39, 1.51)	0.69 (0.19, 1.42)
	Q1 (≤ 0.19)	13	60	Ref.	Ref.
	Q2 (0.19–0.69]	20	61	1.47 (0.70–3.07)	1.46 (0.67–3.14)
	Q3 (0.69–1.42]	28	61	2.07 (1.02–4.18)	2.06 (0.99–4.27)
	Q4 > 1.42	30	60	2.23 (1.11–4.50)	2.50 (1.19–5.25)
	OR per SD increase			1.36 (1.08–1.71)	1.40 (1.09–1.80)

Model I includes individual risk score and, in case of MRS, leukocyte composition.

Model II includes individual MRS (423‐CpGs or 248‐CpGs or 131‐CpGs), GRS, ERS, and leukocyte composition.

Association of the risk scores with BC incidence in the ESTHER study. Model I includes individual risk score and, in case of MRS, leukocyte composition. Model II includes individual MRS (423‐CpGs or 248‐CpGs or 131‐CpGs), GRS, ERS, and leukocyte composition. The predictive performances of individual risk scores and score combinations are presented in Table 4. The predictive accuracy of individual risk score varied with AUCs ranging from 0.517 to 0.612. The GRS was more predictive of BC risk [AUC = 0.612, (95% CI: 0.590–0.632)] than the MRS and ERS. The three individual MRSs, especially the MRS of 131‐CpGs, exhibited very limited accuracy in predicting BC incidence when comparing to GRS or ERS. Moreover, combining 423‐CpGs (the best performing MRS) with GRS or ERS improved the predictive performance to a very limited extent [AUC423‐CpGs + GRS = 0.621 (0.593–0.640) vs. AUCGRS = 0.612, (0.590–0.632) and AUC423‐CpGs + ERS = 0.603 (0.583–0.621) vs. AUCERS = 0.594 (0.578–0.612)]. Of note, a full risk model combing MRS, GRS, and ERS outperformed all other risk prediction models [AUC = 0.637 (0.616–0.657)]. Further tests for goodness of fit and tailed‐based tests for the combined model were not statistically significant at P < 0.05 (results not shown). Additionally, consistent performance of either individual (MRS, GRS, ERS) or combined risk score (MRS + GRS + ERS) was observed in the time‐to‐diagnosis‐specific analyses (Table S4).

Table 4

Risk prediction by individual and combined risk scores for BC.

Risk scores	AUC (95% CI)	Combined AUC (95% CI)
Single risk scores
423‐CpGs	0.557 (0.536–0.580)
248‐CpGs	0.545 (0.524–0.563)
131‐CpGs	0.517 (0.501–0.542)
GRS	0.612 (0.590–0.632)
ERS	0.594 (0.578–0.612)
Multiple risk scores
423‐CpGs + GRS		0.621 (0.593–0.640)
423‐CpGs + ERS		0.603 (0.583–0.621)
GRS + ERS		0.635 (0.615–0.656)
423‐CpGs + GRS + ERS		0.637 (0.616–0.657)

Risk prediction by individual and combined risk scores for BC.

Discussion

To our knowledge, this is the first study evaluating and comparing the predictive performance of whole‐blood DNA methylation, and genetic and ERSs for BC incidence in a prospective cohort with up to 14 years of follow‐up. All three types of risk scores were predictive of BC risk. A GRS based on multiple common variants (269 SNPs) predicted BC incidence with much higher accuracy than a MRS based on previously identified CpGs and an ERS derived from previous studies. The combination of MRS, GRS, and ERS enhanced the risk prediction with an AUC of approximately 0.64. Similar predictive accuracies of either individual or combined risk scores were observed in specific subgroups defined by time to diagnosis. Previous studies have demonstrated the diagnostic efficiency of aberrant DNA methylation for BC (Guan et al., 2018). In the current study, we constructed MRSs based on previously identified CpGs and evaluated its predictive performance for BC incidence. Although the discriminatory power of the MRS was limited, the poor performance should not be misinterpreted as implying that the methylation status of those CpGs does not play a role in BC development. A potential explanation for low predictive value could be differences between the study populations of the previous studies in which the CpGs were identified and our study population. For example, the majority of the 248‐CpGs (227 out of 248 CpGs) and all of the 131‐CpGs were both identified within a prospective cohort of women who were BC‐free themselves at recruitment, but had a biological sister with BC (Xu et al., 2013; Xu et al., 2019). So, the association of methylation at these CpGs with BC may not be generalizable to the women from the general population (Xu et al., 2019). Interestingly, the MRS based on 423 CpGs, which was derived from genetically predicted rather than directly measured CpG methylation levels, performed even better in our study than the MRS of the other two sets that had been derived from measured methylation levels. It is also worth noting that there was hardly any overlap between the genetically predicted CpGs and methylation measured CpGs (423‐CpGs and 248‐CpGs, 423‐CpGs and 131‐CpGs) which supports suggestions that there may still be much room for the improvement in the derivation of MRS. The performance of our version of GRS score is generally consistent with the observation from a recent GWAS used to establish a GRS for risk stratification and early detection of BC (Mavaddat et al., 2019). In the current analysis, the performance of our version of GRS (269 SNPs) with an OR (95% CI, per SD increase) of 1.48 (1.16–1.90) and an AUC (95% CI) of 0.61 (0.59–0.63) was similar albeit slightly less favorable compared with performance previously reported for the original GRS containing 313 SNPs [OR = 1.61, (1.57–1.65), AUC = 0.63 (0.63–0.65)]. A potential reason for the slightly weaker performance in our study could be incomplete validation, as 29 out of 313 SNPs were excluded because of missing values in > 10% of our sample. We also conducted a literature search to identify ERSs used for BC risk prediction and validated their predictive performance for BC incidence in our study sample. Although many researchers have put substantial efforts in developing models for risk prediction, the overall results are not promising (Anothaisintawee et al., 2012). In the current study, most of the ERSs yielded poor predictive accuracies with AUCs ranging from 0.52 to 0.59. The lack of predictive value of ERSs might be explained by heterogeneity of study populations used to derive the scores and missing information on some risk factors (i.e., numbers of previous breast biopsies, breast inflammation, and light at night) in several models (Gail et al., 1989; Novotny et al., 2006; Wang et al., 2016). However, the main reason for the low predictive values would probably be that the risk factors included in these scores were not good predictors of BC risk. Notably, improved predictive accuracy was observed when ERS was combined with GRS, suggesting that rather than developing more risk scores based on environmental risk factors, future studies should explore possibilities of enhancing predictive performance by combining risk factors with novel laboratory markers, such as MRSs or GRSs. We performed time‐to‐diagnosis‐specific analyses to explore whether the individual risk scores or score combinations performed better in cases who had shorter time to diagnosis. In a subgroup analysis, only the performance of ERS (Dierssen–Sotos) and a combination of three risk scores supported the hypothesis, but differences between shorter and longer time to diagnosis were small and did not reach statistical significance. However, power for such subgroup analyses was very limited due to sample size limitations. The predictive values of the three individual risk scores or score combinations for subgroup (e.g., defined by age and follow‐up time) or subtype‐specific BC screening warrant further exploration. Compared with GRS, MRS was less predictive of BC risk. This, however, does not imply that MRS does not hold potential for risk stratification for BC risk. Whereas large‐scale GWASs have been conducted since more than 10 years (Easton et al., 2007), large‐scale epigenome‐wide association studies have been initiated only recently (Xu et al., 2013) and may yield substantially better MRS in the future. In contrast to GRS, MRS and ERS are though not static over lifetime. Although this may be considered a disadvantage for straightforward risk stratification (e.g., determining a starting age of screening), MRS and ERS may have additional use in reflecting the merits of specific prevention efforts. Specific strengths of our study include its longitudinal design in which blood samples for methylation and genotyping analysis, along with environmental risk factor data, were collected years before BC diagnosis. This allowed for simultaneous evaluation and comparison of the ability of three different types of risk variables for BC risk prediction. In addition, the selection of incident BC cases through linkage to the Saarland Cancer Registry ensured an almost complete ascertainment of incident cancer cases in the population from which our study participants originated. However, our study has several limitations that require careful discussion. First, the limited sample size restricted the study’s power and precision of estimates, particularly in stratified analysis. For example, the AUC for score combination in cases with time to diagnosis ≤ 7 years was larger than that in cases with time to diagnosis > 7 years, but this difference did not reach statistical significance. Future studies with larger sample sizes should address predictive values of MRS, GRS, and ERS and their combination for BC risk prediction in specific population groups. Likewise, future studies with larger sample size should also address potential improvement of prediction by including interactions between the included factors such as gene–environment interactions (Barrdahl et al., 2014; Campa et al., 2011; Maas et al., 2016; Nickels et al., 2013; Rudolph et al., 2015; Schoeps et al., 2014). Second, it is difficult to compare the results between our study and previous studies, where different methylation analysis techniques were used to investigate different CpG sites. Although the EPIC assay covers more CpGs across the whole genome compared with the Illumina 27 or 450K used in previous studies, it cannot be ruled out that some key loci with powerful diagnostic performance were missed. Finally, although we paid careful attention to internal validation, further validation in external, independent cohorts would be highly desirable. In particular, our results pertain to a Caucasian study population from Germany; hence, the performance of our versions of MRS, GRS, and ERS needs to be validated and potentially adapted for ethnically diverse populations.

Conclusion

In summary, despite these limitations, our study provides new detailed insights into the individual and joint associations of MRS, GRS, and ERS with BC risk. Although the contribution of all three types of risk scores to risk stratification is still modest for the time being, with GRS so far slightly outperforming MRS and ERS, our findings demonstrated that combing MRS, GRS, and ERS can enable more precise risk prediction and therefore holds potential to improve risk stratification in BC screening. With further improvements of GRS and MRS by large‐scale international GWAS and EWAS efforts, and incorporation of additional risk signatures, such as microRNA‐based signatures (Hamam et al., 2017), substantial further improvement of risk stratification should become possible which will enable more targeted, risk‐adopted approaches in BC screening.

Conflict of interest

The authors declare no conflict of interest

Author contributions

HB and YZ conceived the study. ZG analyzed the data and drafted the manuscript under the supervision of HB and YZ. KW participated in statistical analysis and interpretation of genetic data. JR participated in manuscript revision. BH conducted data collection from the Saarland Cancer Registry. HB, BS, and YZ supervised the data collection of the ESTHER study. HB initiated and led the ESTHER study and contributed to all aspects of this work. All authors made significant contributions to the manuscript, and all authors read and approved the final version.

Ethics declaration and consent to participate

The study was conducted in adherence with the Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving humans, and informed consent was collected from all participants. The study was approved by the ethics committees of the University of Heidelberg and of the state medical board of Saarland, Germany.

Consent for publication

The consent of this manuscript has not been previously published and is not under consideration for publication elsewhere.

Authors’ information

All authors’ information has been presented in the title page. Table S1 . Individual association of the previously identified CpGs with BC risk in the ESTHER study and reported methylation direction. Table S2 . SNPs used for genetic risk score construction. Table S3 . Environmental risk scores from previous studies and availability/modification of related variables in ESTHER. Table S4 . Performance of risk scores with respect to time to diagnosis in cases. Fig. S1. Correlation among risk scores. Click here for additional data file.

43 in total

1. Differential methylation values in differential methylation analysis.

Authors: Changchun Xie; Yuet-Kin Leung; Aimin Chen; Ding-Xin Long; Catherine Hoyo; Shuk-Mei Ho
Journal: Bioinformatics Date: 2019-04-01 Impact factor: 6.937

2. Public health implications from COGS and potential for risk stratification and screening.

Authors: Hilary Burton; Susmita Chowdhury; Tom Dent; Alison Hall; Nora Pashayan; Paul Pharoah
Journal: Nat Genet Date: 2013-04 Impact factor: 38.330

3. Proportion of breast cancer cases in the United States explained by well-established risk factors.

Authors: M P Madigan; R G Ziegler; J Benichou; C Byrne; R N Hoover
Journal: J Natl Cancer Inst Date: 1995-11-15 Impact factor: 13.506

4. Targeted Metabolomic Profiling of Plasma and Survival in Heart Failure Patients.

Authors: David E Lanfear; Joseph J Gibbs; Jia Li; Ruicong She; Christopher Petucci; Jeffrey A Culver; W H Wilson Tang; Yigal M Pinto; L Keoki Williams; Hani N Sabbah; Stephen J Gardell
Journal: JACC Heart Fail Date: 2017-11 Impact factor: 12.035

Review 5. Breast Cancer Epidemiology, Prevention, and Screening.

Authors: Stella Winters; Charmaine Martin; Daniel Murphy; Navkiran K Shokar
Journal: Prog Mol Biol Transl Sci Date: 2017-10-10 Impact factor: 3.622

6. Interactions between genetic variants and breast cancer risk factors in the breast and prostate cancer cohort consortium.

Authors: Daniele Campa; Rudolf Kaaks; Loïc Le Marchand; Christopher A Haiman; Ruth C Travis; Christine D Berg; Julie E Buring; Stephen J Chanock; W Ryan Diver; Lucie Dostal; Agnes Fournier; Susan E Hankinson; Brian E Henderson; Robert N Hoover; Claudine Isaacs; Mattias Johansson; Laurence N Kolonel; Peter Kraft; I-Min Lee; Catherine A McCarty; Kim Overvad; Salvatore Panico; Petra H M Peeters; Elio Riboli; Maria José Sanchez; Fredrick R Schumacher; Guri Skeie; Daniel O Stram; Michael J Thun; Dimitrios Trichopoulos; Shumin Zhang; Regina G Ziegler; David J Hunter; Sara Lindström; Federico Canzian
Journal: J Natl Cancer Inst Date: 2011-07-26 Impact factor: 13.506

7. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually.

Authors: M H Gail; L A Brinton; D P Byar; D K Corle; S B Green; C Schairer; J J Mulvihill
Journal: J Natl Cancer Inst Date: 1989-12-20 Impact factor: 13.506

8. Second-generation PLINK: rising to the challenge of larger and richer datasets.

Authors: Christopher C Chang; Carson C Chow; Laurent Cam Tellier; Shashaank Vattikuti; Shaun M Purcell; James J Lee
Journal: Gigascience Date: 2015-02-25 Impact factor: 6.524

9. DNA methylation arrays as surrogate measures of cell mixture distribution.

Authors: Eugene Andres Houseman; William P Accomando; Devin C Koestler; Brock C Christensen; Carmen J Marsit; Heather H Nelson; John K Wiencke; Karl T Kelsey
Journal: BMC Bioinformatics Date: 2012-05-08 Impact factor: 3.169

Review 10. Circulating microRNAs in breast cancer: novel diagnostic and prognostic biomarkers.

Authors: Rimi Hamam; Dana Hamam; Khalid A Alsaleh; Moustapha Kassem; Waleed Zaher; Musaad Alfayez; Abdullah Aldahmash; Nehad M Alajez
Journal: Cell Death Dis Date: 2017-09-07 Impact factor: 8.469

7 in total

Review 1. DNA methylation-based predictors of health: applications and statistical considerations.

Authors: Paul D Yousefi; Matthew Suderman; Ryan Langdon; Oliver Whitehurst; George Davey Smith; Caroline L Relton
Journal: Nat Rev Genet Date: 2022-03-18 Impact factor: 53.242

2. Novel DNA methylation loci and genes showing pleiotropic association with Alzheimer's dementia: a network Mendelian randomization analysis.

Authors: Di Liu; Youxin Wang; Huiquan Jing; Qun Meng; Jingyun Yang
Journal: Epigenetics Date: 2021-08-31 Impact factor: 4.861

3. Blood DNA methylation profiles improve breast cancer prediction.

Authors: Jacob K Kresovich; Zongli Xu; Katie M O'Brien; Min Shi; Clarice R Weinberg; Dale P Sandler; Jack A Taylor
Journal: Mol Oncol Date: 2021-11-09 Impact factor: 7.449

4. A polygenic risk score for nasopharyngeal carcinoma shows potential for risk stratification and personalized screening.

Authors: Yong-Qiao He; Tong-Min Wang; Mingfang Ji; Zhi-Ming Mai; Minzhong Tang; Ruozheng Wang; Yifeng Zhou; Yuming Zheng; Ruowen Xiao; Dawei Yang; Ziyi Wu; Changmi Deng; Jiangbo Zhang; Wenqiong Xue; Siqi Dong; Jiyun Zhan; Yonglin Cai; Fugui Li; Biaohua Wu; Ying Liao; Ting Zhou; Meiqi Zheng; Yijing Jia; Danhua Li; Lianjing Cao; Leilei Yuan; Wenli Zhang; Luting Luo; Xiating Tong; Yanxia Wu; Xizhao Li; Peifen Zhang; Xiaohui Zheng; Shaodan Zhang; Yezhu Hu; Weiling Qin; Bisen Deng; Xuejun Liang; Peiwen Fan; Yaning Feng; Jia Song; Shang-Hang Xie; Ellen T Chang; Zhe Zhang; Guangwu Huang; Miao Xu; Lin Feng; Guangfu Jin; Jinxin Bei; Sumei Cao; Qing Liu; Zisis Kozlakidis; Haiqiang Mai; Ying Sun; Jun Ma; Zhibin Hu; Jianjun Liu; Maria Li Lung; Hans-Olov Adami; Hongbing Shen; Weimin Ye; Tai-Hing Lam; Yi-Xin Zeng; Wei-Hua Jia
Journal: Nat Commun Date: 2022-04-12 Impact factor: 17.694

5. Burden, trends, and risk factors for breast cancer in China from 1990 to 2019 and its predictions until 2034: an up-to-date overview and comparison with those in Japan and South Korea.

Authors: Na Liu; Da-Wei Yang; Yan-Xia Wu; Wen-Qiong Xue; Dan-Hua Li; Jiang-Bo Zhang; Yong-Qiao He; Wei-Hua Jia
Journal: BMC Cancer Date: 2022-07-29 Impact factor: 4.638

6. Bacteria-related changes in host DNA methylation and the risk for CRC.

Authors: Iradj Sobhani; Hugo Rotkopf; Khashayarsha Khazaie
Journal: Gut Microbes Date: 2020-11-09

7. Harnessing Whole Genome Polygenic Risk Scores to Stratify Individuals Based on Cardiometabolic Risk Factors and Biomarkers at Age 10 in the Lifecourse-Brief Report.

Authors: Tom G Richardson; Katie O'Nunain; Caroline L Relton; George Davey Smith
Journal: Arterioscler Thromb Vasc Biol Date: 2022-01-20 Impact factor: 10.514

7 in total