
Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes.

Nasim Mavaddat¹, Kyriaki Michailidou², Joe Dennis³, Michael Lush³, Laura Fachal⁴, Andrew Lee³, Jonathan P Tyrer⁴, Ting-Huei Chen⁵, Qin Wang³, Manjeet K Bolla³, Xin Yang³, Muriel A Adank⁶, Thomas Ahearn⁷, Kristiina Aittomäki⁸, Jamie Allen³, Irene L Andrulis⁹, Hoda Anton-Culver¹⁰, Natalia N Antonenkova¹¹, Volker Arndt¹², Kristan J Aronson¹³, Paul L Auer¹⁴, Päivi Auvinen¹⁵, Myrto Barrdahl¹⁶, Laura E Beane Freeman⁷, Matthias W Beckmann¹⁷, Sabine Behrens¹⁶, Javier Benitez¹⁸, Marina Bermisheva¹⁹, Leslie Bernstein²⁰, Carl Blomqvist²¹, Natalia V Bogdanova²², Stig E Bojesen²³, Bernardo Bonanni²⁴, Anne-Lise Børresen-Dale²⁵, Hiltrud Brauch²⁶, Michael Bremer²⁷, Hermann Brenner²⁸, Adam Brentnall²⁹, Ian W Brock³⁰, Angela Brooks-Wilson³¹, Sara Y Brucker³², Thomas Brüning³³, Barbara Burwinkel³⁴, Daniele Campa³⁵, Brian D Carter³⁶, Jose E Castelao³⁷, Stephen J Chanock⁷, Rowan Chlebowski³⁸, Hans Christiansen²⁷, Christine L Clarke³⁹, J Margriet Collée⁴⁰, Emilie Cordina-Duverger⁴¹, Sten Cornelissen⁴², Fergus J Couch⁴³, Angela Cox³⁰, Simon S Cross⁴⁴, Kamila Czene⁴⁵, Mary B Daly⁴⁶, Peter Devilee⁴⁷, Thilo Dörk⁴⁸, Isabel Dos-Santos-Silva⁴⁹, Martine Dumont⁵⁰, Lorraine Durcan⁵¹, Miriam Dwek⁵², Diana M Eccles⁵³, Arif B Ekici⁵⁴, A Heather Eliassen⁵⁵, Carolina Ellberg⁵⁶, Christoph Engel⁵⁷, Mikael Eriksson⁴⁵, D Gareth Evans⁵⁸, Peter A Fasching⁵⁹, Jonine Figueroa⁶⁰, Olivia Fletcher⁶¹, Henrik Flyger⁶², Asta Försti⁶³, Lin Fritschi⁶⁴, Marike Gabrielson⁴⁵, Manuela Gago-Dominguez⁶⁵, Susan M Gapstur³⁶, José A García-Sáenz⁶⁶, Mia M Gaudet³⁶, Vassilios Georgoulias⁶⁷, Graham G Giles⁶⁸, Irina R Gilyazova⁶⁹, Gord Glendon⁷⁰, Mark S Goldberg⁷¹, David E Goldgar⁷², Anna González-Neira⁷³, Grethe I Grenaker Alnæs⁷⁴, Mervi Grip⁷⁵, Jacek Gronwald⁷⁶, Anne Grundy⁷⁷, Pascal Guénel⁴¹, Lothar Haeberle¹⁷, Eric Hahnen⁷⁸, Christopher A Haiman⁷⁹, Niclas Håkansson⁸⁰, Ute Hamann⁸¹, Susan E Hankinson⁸², Elaine F Harkness⁸³, Steven N Hart⁸⁴, Wei He⁴⁵, Alexander Hein¹⁷, Jane Heyworth⁸⁵, Peter Hillemanns⁴⁸, Antoinette 
Hollestelle⁸⁶, Maartje J Hooning⁸⁶, Robert N Hoover⁷, John L Hopper⁸⁷, Anthony Howell⁸⁸, Guanmengqian Huang⁸¹, Keith Humphreys⁴⁵, David J Hunter⁸⁹, Milena Jakimovska⁹⁰, Anna Jakubowska⁹¹, Wolfgang Janni⁹², Esther M John⁹³, Nichola Johnson⁶¹, Michael E Jones⁹⁴, Arja Jukkola-Vuorinen⁹⁵, Audrey Jung¹⁶, Rudolf Kaaks¹⁶, Katarzyna Kaczmarek⁷⁶, Vesa Kataja⁹⁶, Renske Keeman⁴², Michael J Kerin⁹⁷, Elza Khusnutdinova⁶⁹, Johanna I Kiiski⁹⁸, Julia A Knight⁹⁹, Yon-Dschun Ko¹⁰⁰, Veli-Matti Kosma¹⁰¹, Stella Koutros⁷, Vessela N Kristensen²⁵, Ute Krüger⁵⁶, Tabea Kühl¹⁰², Diether Lambrechts¹⁰³, Loic Le Marchand¹⁰⁴, Eunjung Lee⁷⁹, Flavio Lejbkowicz¹⁰⁵, Jenna Lilyquist⁸⁴, Annika Lindblom¹⁰⁶, Sara Lindström¹⁰⁷, Jolanta Lissowska¹⁰⁸, Wing-Yee Lo¹⁰⁹, Sibylle Loibl¹¹⁰, Jirong Long¹¹¹, Jan Lubiński⁷⁶, Michael P Lux¹⁷, Robert J MacInnis¹¹², Tom Maishman⁵¹, Enes Makalic⁸⁷, Ivana Maleva Kostovska⁹⁰, Arto Mannermaa¹⁰¹, Siranoush Manoukian¹¹³, Sara Margolin¹¹⁴, John W M Martens⁸⁶, Maria Elena Martinez¹¹⁵, Dimitrios Mavroudis⁶⁷, Catriona McLean¹¹⁶, Alfons Meindl¹¹⁷, Usha Menon¹¹⁸, Pooja Middha¹¹⁹, Nicola Miller⁹⁷, Fernando Moreno⁶⁶, Anna Marie Mulligan¹²⁰, Claire Mulot¹²¹, Victor M Muñoz-Garzon¹²², Susan L Neuhausen²⁰, Heli Nevanlinna⁹⁸, Patrick Neven¹²³, William G Newman⁵⁸, Sune F Nielsen¹²⁴, Børge G Nordestgaard²³, Aaron Norman⁸⁴, Kenneth Offit¹²⁵, Janet E Olson⁸⁴, Håkan Olsson⁵⁶, Nick Orr¹²⁶, V Shane Pankratz¹²⁷, Tjoung-Won Park-Simon⁴⁸, Jose I A Perez¹²⁸, Clara Pérez-Barrios¹²⁹, Paolo Peterlongo¹³⁰, Julian Peto⁴⁹, Mila Pinchev¹⁰⁵, Dijana Plaseska-Karanfilska⁹⁰, Eric C Polley⁸⁴, Ross Prentice¹³¹, Nadege Presneau⁵², Darya Prokofyeva¹³², Kristen Purrington¹³³, Katri Pylkäs¹³⁴, Brigitte Rack⁹², Paolo Radice¹³⁵, Rohini Rau-Murthy¹³⁶, Gad Rennert¹⁰⁵, Hedy S Rennert¹⁰⁵, Valerie Rhenius⁴, Mark Robson¹³⁶, Atocha Romero¹²⁹, Kathryn J Ruddy¹³⁷, Matthias Ruebner¹⁷, Emmanouil Saloustros¹³⁸, Dale P Sandler¹³⁹, Elinor J Sawyer¹⁴⁰, Daniel F Schmidt¹⁴¹, Rita K Schmutzler⁷⁸, Andreas Schneeweiss¹⁴², Minouk J 
Schoemaker⁹⁴, Fredrick Schumacher¹⁴³, Peter Schürmann⁴⁸, Lukas Schwentner⁹², Christopher Scott⁸⁴, Rodney J Scott¹⁴⁴, Caroline Seynaeve⁸⁶, Mitul Shah⁴, Mark E Sherman¹⁴⁵, Martha J Shrubsole¹¹¹, Xiao-Ou Shu¹¹¹, Susan Slager⁸⁴, Ann Smeets¹²³, Christof Sohn¹⁴², Penny Soucy⁵⁰, Melissa C Southey¹⁴⁶, John J Spinelli¹⁴⁷, Christa Stegmaier¹⁴⁸, Jennifer Stone¹⁴⁹, Anthony J Swerdlow¹⁵⁰, Rulla M Tamimi¹⁵¹, William J Tapper¹⁵², Jack A Taylor¹⁵³, Mary Beth Terry¹⁵⁴, Kathrin Thöne¹⁰², Rob A E M Tollenaar¹⁵⁵, Ian Tomlinson¹⁵⁶, Thérèse Truong⁴¹, Maria Tzardi¹⁵⁷, Hans-Ulrich Ulmer¹⁵⁸, Michael Untch¹⁵⁹, Celine M Vachon⁸⁴, Elke M van Veen⁵⁸, Joseph Vijai¹²⁵, Clarice R Weinberg¹⁶⁰, Camilla Wendt¹¹⁴, Alice S Whittemore¹⁶¹, Hans Wildiers¹²³, Walter Willett¹⁶², Robert Winqvist¹³⁴, Alicja Wolk¹⁶³, Xiaohong R Yang⁷, Drakoulis Yannoukakos¹⁶⁴, Yan Zhang¹², Wei Zheng¹¹¹, Argyrios Ziogas¹⁰, Alison M Dunning⁴, Deborah J Thompson³, Georgia Chenevix-Trench¹⁶⁵, Jenny Chang-Claude¹⁶⁶, Marjanka K Schmidt¹⁶⁷, Per Hall¹⁶⁸, Roger L Milne¹⁶⁹, Paul D P Pharoah¹⁷⁰, Antonis C Antoniou³, Nilanjan Chatterjee¹⁷¹, Peter Kraft¹⁷², Montserrat García-Closas⁷, Jacques Simard⁵⁰, Douglas F Easton¹⁷⁰.

Abstract

Stratification of women according to their risk of breast cancer based on polygenic risk scores (PRSs) could improve screening and prevention strategies. Our aim was to develop PRSs, optimized for prediction of estrogen receptor (ER)-specific disease, from the largest available genome-wide association dataset and to empirically validate the PRSs in prospective studies. The development dataset comprised 94,075 case subjects and 75,017 control subjects of European ancestry from 69 studies, divided into training and validation sets. Samples were genotyped using genome-wide arrays, and single-nucleotide polymorphisms (SNPs) were selected by stepwise regression or lasso penalized regression. The best performing PRSs were validated in an independent test set comprising 11,428 case subjects and 18,323 control subjects from 10 prospective studies and 190,040 women from UK Biobank (3,215 incident breast cancers). For the best PRSs (313 SNPs), the odds ratio for overall disease per 1 standard deviation in ten prospective studies was 1.61 (95%CI: 1.57-1.65) with area under receiver-operator curve (AUC) = 0.630 (95%CI: 0.628-0.651). The lifetime risk of overall breast cancer in the top centile of the PRSs was 32.6%. Compared with women in the middle quintile, those in the highest 1% of risk had 4.37- and 2.78-fold risks, and those in the lowest 1% of risk had 0.16- and 0.27-fold risks, of developing ER-positive and ER-negative disease, respectively. Goodness-of-fit tests indicated that this PRS was well calibrated and predicts disease risk accurately in the tails of the distribution. This PRS is a powerful and reliable predictor of breast cancer risk that may improve breast cancer prevention programs.


Keywords: breast; cancer; epidemiology; genetic; polygenic; prediction; risk; score; screening; stratification


Substances：
Receptors, Estrogen

Year: 2018 PMID： 30554720 PMCID： PMC6323553 DOI： 10.1016/j.ajhg.2018.11.002

Source DB: PubMed Journal: Am J Hum Genet ISSN： 0002-9297 Impact factor: 11.025

Introduction

Breast cancer is the most common cancer diagnosed among women in Western countries. While rare mutations in genes such as BRCA1 and BRCA2 confer high risks of developing breast cancer, these account for only a small proportion of breast cancer cases in the general population. Multiple common breast cancer susceptibility variants discovered through genome-wide association studies (GWASs)1, 2 confer small risk individually, but their combined effect, when summarized as a polygenic risk score (PRS), can be substantial.3, 4, 5 Such genomic profiles can be used to stratify women according to their risk of developing breast cancer. This in turn holds the promise of improved breast cancer prevention and survival, by targeting screening or other preventative strategies at those women most likely to benefit. We previously derived a PRS based on 77 established breast cancer susceptibility single-nucleotide polymorphisms (SNPs) and reported levels of risk stratification achieved by this PRS. Based on our findings, several studies have investigated the potential for combining PRSs and other known risk factors for risk stratification and evaluated the impact of risk reduction strategies across risk strata defined by the PRS.8, 9, 10 Preliminary studies investigating the use of the PRS to inform targeted breast cancer screening programs are underway (see CORDIS and GenomeCanada in Web Resources).11, 12 Empirical validation and characterization of the PRS in large-scale epidemiological studies has, however, not been carried out previously. In addition, more informative PRSs would improve the clinical utility of risk prediction. GWASs have now identified ∼170 breast cancer susceptibility loci.1, 2 Moreover, genome-wide heritability estimates indicate that these loci explain only ∼40% of the heritability explained by all common variants on genome-wide SNP arrays. 
This suggests that the discrimination provided by the PRS could be improved by incorporating variants associated at more liberal significance thresholds. In addition, many variants confer risks that differ by breast cancer subtype (estrogen-receptor [ER]-positive or -negative), suggesting that subtype-specific PRSs might allow better prediction of subtype-specific disease, including the more aggressive ER-negative breast cancer, and enable selection of women for preventative medication. Here, we used data from 79 studies conducted by the Breast Cancer Association Consortium (BCAC) to optimize PRSs for overall and subtype-specific disease, and we validate their performance in independent datasets.1, 13, 14, 15

Material and Methods

Study Subjects and Genotyping

The dataset used for development of the PRSs comprised 94,075 breast cancer-affected case subjects and 75,017 control subjects of European ancestry from 69 studies in the BCAC (Tables S1 and S2). Data collection for individual studies has been described previously. Samples were genotyped using one of two arrays: iCOGS13, 14 and OncoArray.1, 15 The dataset was divided into a training set and a validation set. The validation set was randomly selected (approximately 10% of case and control subjects) from studies that had been genotyped with the OncoArray, after excluding studies of bilateral breast cancer, studies or sub-studies oversampling for family history, and individuals with in situ cancers or case subjects with unknown ER status. The best PRSs were evaluated in an independent test dataset comprising 11,428 invasive breast cancer-affected case subjects and 18,323 control subjects from ten studies nested within prospective cohorts, all genotyped using the OncoArray (Tables S3 and S4). The overall breast cancer PRS was also evaluated among 190,040 women of European ancestry from the UK Biobank cohort who had not had any cancer diagnosis or mastectomy prior to recruitment. A total of 3,215 incident registry-confirmed invasive breast cancers developed over 1,381,019 person-years of prospective follow-up. Follow-up started 6 months after the age at baseline questionnaire. The primary endpoint was invasive breast cancer. Follow-up was censored at the earliest of: risk-reducing mastectomy, diagnosis of any type of cancer, death, or January 15, 2017. Genotype calling, quality control, and imputation for iCOGS and OncoArray were performed as previously described.1, 14 Briefly, imputation was performed for the iCOGS and OncoArray datasets separately using the Phase 3 (October 2014) release of the 1000 Genomes data as reference. We followed a two-stage approach using SHAPEIT for phasing and IMPUTE2 for the imputation. 
Where samples were genotyped with both iCOGS and OncoArray, the OncoArray calling was used. SNPs with MAF > 0.01 and imputation r² > 0.9 for OncoArray and r² > 0.3 for iCOGS were included in this analysis (∼7 million SNPs); a higher threshold was imposed for OncoArray to ensure accurate determination of the PRS in the validation and test datasets. UK Biobank samples were genotyped using the Affymetrix UK BiLEVE Axiom array and the Affymetrix UK Biobank Axiom array and imputed to the combined 1000 Genomes Project v.3 and UK10K reference panels using SHAPEIT3 and IMPUTE3. The lowest imputation info score for the SNPs used in these analyses was 0.86. Samples were included on the basis of female sex (genetic and self-reported) and an ethnicity filter (Europeans/White British ancestry subset). Duplicates, individuals with a high degree of relatedness (>10 relatives), and one of each related pair of first-degree relatives were removed. Samples were also excluded using standard quality control criteria. Participants provided written informed consent, all studies were approved by the relevant ethics committees, and procedures followed were in accordance with the ethical standards of these committees.

Statistical Analysis

The general aim was to derive a PRS of the form:

$$\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \dots + \beta_n x_n$$

where $\beta_k$ is the per-allele log odds ratio (OR) for breast cancer associated with SNP k, $x_k$ is the allele dosage for SNP k, and n is the total number of SNPs included in the PRS. Previous analyses found no evidence for statistically significant interactions between SNPs19, 20 and little evidence for departures from a log-additive model for individual SNPs. Assuming this is true in general, the PRS summarizes efficiently the combined effects of SNPs on disease risk. The main challenge is how to determine which SNPs to include and the weighting parameters $\beta_k$ to assign. Inclusion of only those SNPs reaching a stringent significance threshold (“genome-wide significant,” p < 5 × 10⁻⁸) ignores information from larger numbers of SNPs that are likely, but not certain, to be associated with the risk of breast cancer. We used two general approaches for model selection: “hard-thresholding,” based on a stepwise regression model that retained SNPs significantly associated with overall or subtype-specific disease at a given threshold, and penalized regression using lasso.21, 22 A schema for the analyses is shown in Figure S1. To prioritize SNPs for analysis, single SNP association tests were first conducted in the training set. Per-allele ORs and standard errors were estimated separately in the iCOGS and OncoArray datasets, adjusting for study and nine ancestry informative principal components (PCs) in the iCOGS dataset and for country and ten PCs in the OncoArray dataset, using a purpose-written program. Combined p values were then derived using a fixed-effects meta-analysis with the software METAL. 
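As an illustration of the score defined above, the weighted sum for one individual can be sketched in a few lines of Python. The function name and the three example weights and dosages are hypothetical and are not taken from the study.

```python
from typing import Sequence


def polygenic_risk_score(betas: Sequence[float], dosages: Sequence[float]) -> float:
    """Return the sum over SNPs of (per-allele log OR) x (allele dosage)."""
    if len(betas) != len(dosages):
        raise ValueError("betas and dosages must have the same length")
    return sum(b * x for b, x in zip(betas, dosages))


# Hypothetical per-allele log ORs and imputed allele dosages (range 0 to 2).
example_betas = [0.05, -0.03, 0.10]
example_dosages = [1.0, 2.0, 0.4]
example_prs = polygenic_risk_score(example_betas, example_dosages)
```

Imputed dosages are used directly rather than rounded genotypes, which is why the dosage vector is allowed to hold non-integer values.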
SNPs were sorted by p value and filtered on LD, such that uncorrelated SNPs (correlation r² < 0.9) with lowest p value for association with overall breast cancer in the training set were retained (more rigorous pruning, for example at r² < 0.2, would have removed from consideration informative SNPs from regions with multiple correlated signals24, 25). In the hard thresholding approach, a series of stepwise forward regression analyses was first carried out in 1 Mb regions centered on SNPs significant at a pre-specified threshold for association with overall and/or subtype-specific disease in the training set. Only SNPs passing the specified p value thresholds were included in each 1 Mb region. Two analyses were performed in parallel: for overall breast cancer and ER-negative disease. At each stage the SNP with the smallest (conditional) p value for any analysis was added to the model, the threshold for the stepwise regression being the same as that for pre-selection. The process was repeated until no further SNPs could be added at the pre-defined threshold. A second stage of stepwise regressions was then carried out across all regions in each chromosome, to take into account correlated SNPs in different regions. Finally, the effect sizes for the selected SNPs were jointly estimated in a single logistic regression model. For the best-performing PRSs, SNPs associated with ER-positive disease at p < 10⁻⁶ but not with overall breast cancer (at p < 10⁻⁵) were added at the end of the final SNP list. A third round of stepwise forward regression was then carried out with a p value for selection of p < 10⁻⁶ for ER-positive disease. For completeness we added to this final PRS two rarer variants (BRCA2 p.Lys3326X and CHEK2 p.Ile157Thr) which are established to confer a moderate risk of breast cancer and were genotyped on the OncoArray but did not pass the allele frequency threshold in the PRS development phase. 
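The p value ordering and LD filter described above can be pictured as a greedy pass over the sorted SNP list. The sketch below is one plausible reading, not the study's implementation: it assumes pairwise r² values are supplied in a lookup table, treats pairs missing from the table as uncorrelated, and uses hypothetical SNP names and values.

```python
from typing import Dict, FrozenSet, List


def ld_filter(p_values: Dict[str, float],
              r2: Dict[FrozenSet[str], float],
              r2_max: float = 0.9) -> List[str]:
    """Greedily keep SNPs in order of increasing p value.

    A SNP is kept only if its r2 with every SNP already kept is below r2_max.
    """
    kept: List[str] = []
    for snp in sorted(p_values, key=p_values.get):
        if all(r2.get(frozenset((snp, other)), 0.0) < r2_max for other in kept):
            kept.append(snp)
    return kept


# Hypothetical toy input: snpB is in strong LD with the more significant snpA.
toy_p_values = {"snpA": 1e-12, "snpB": 1e-9, "snpC": 1e-4}
toy_r2 = {
    frozenset(("snpA", "snpB")): 0.95,
    frozenset(("snpA", "snpC")): 0.30,
}
toy_kept = ld_filter(toy_p_values, toy_r2)
```

With the liberal cutoff of 0.9, moderately correlated SNPs such as snpC survive the filter and remain available to the stepwise regression, which is the stated reason for not pruning at a stricter r² such as 0.2.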
For the penalized regression using lasso, we used the program glmnet. SNPs with p < 0.001 for overall BC or ER-negative disease in the training set were pre-selected for inclusion in the lasso, and BRCA2 p.Lys3326X and CHEK2 p.Ile157Thr were added. Covariates for 19 PCs (9 for iCOGS and 10 for OncoArray) and country were included in each model. For overall breast cancer, the penalty parameter (lambda) giving the best overall breast cancer PRS in the validation set was selected. To construct subtype-specific PRSs, we evaluated four different methods: (1) using effect sizes for overall breast cancer (for each of the subtypes), (2) using effect sizes for subtype-specific (ER-positive or ER-negative) disease, (3) using a hybrid method, in which effect sizes were estimated in the relevant subtype for SNPs passing a certain optimal significance threshold in a case-only logistic regression (ER-positive versus ER-negative disease), and otherwise, using effect sizes estimated for overall breast cancer, or (4) by estimating case-only ORs using lasso and combining these with the overall breast cancer ORs to derive subtype-specific estimates, using the formulae:

$$\beta_{\text{ER-negative}} = \beta_{\text{overall}} + (1 - \eta)\,\beta_{\text{case-only}}$$

$$\beta_{\text{ER-positive}} = \beta_{\text{overall}} - \eta\,\beta_{\text{case-only}}$$

where $\beta_{\text{case-only}}$ is the case-only log OR for ER-negative versus ER-positive disease and η = 0.27 was the proportion of ER-negative tumors in the validation set. For the lasso analysis, effect sizes for subtype-specific disease were estimated using method 4 above, combining the estimates from a case-only lasso analysis with the coefficients for overall breast cancer from the lasso analysis. The lambda for the case-only model giving the best subtype-specific PRS in the validation set was selected. To evaluate the performance of each potential PRS, we standardized the PRSs to have unit standard deviation (SD) in the validation set of control subjects. The association of the standardized PRSs was evaluated in the validation and test (prospective studies) datasets, by logistic regression. 
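One way to see how method 4 combines overall and case-only estimates is the following sketch. It rests on two assumptions that are ours rather than statements from the text: the case-only log OR contrasts ER-negative with ER-positive disease, and the overall log OR is approximately the η-weighted average of the two subtype-specific log ORs. The function name and the example values are hypothetical.

```python
from typing import Tuple


def subtype_log_ors(beta_overall: float,
                    beta_case_only: float,
                    eta: float = 0.27) -> Tuple[float, float]:
    """Return (beta_er_positive, beta_er_negative).

    Assumes beta_case_only = beta_er_negative - beta_er_positive and
    beta_overall = eta * beta_er_negative + (1 - eta) * beta_er_positive,
    where eta is the proportion of ER-negative tumors.
    """
    beta_er_negative = beta_overall + (1.0 - eta) * beta_case_only
    beta_er_positive = beta_overall - eta * beta_case_only
    return beta_er_positive, beta_er_negative


# Hypothetical SNP: modest overall effect, stronger in ER-negative disease.
example_pos, example_neg = subtype_log_ors(beta_overall=0.05, beta_case_only=0.04)
```

A SNP whose case-only estimate is shrunk to zero by the lasso simply keeps its overall breast cancer weight for both subtypes.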
We used a Cox proportional hazards regression model to assess the association with risk of breast cancer in UK Biobank. Models were also compared in terms of the area under the receiver operator characteristic curve (AUC), adjusted for study, calculated using the Stata command comproc. Meta-analysis of study-specific effects was carried out using the Stata command metan. The goodness of fit of the continuous model (i.e., assuming a linear association between log(OR) and risk) was tested using the Hosmer-Lemeshow (HL) test to compare the observed and predicted risks by quantile and using the tail-based test proposed by Song et al. In addition, we considered specifically the risks in the highest and lowest 1% of the distribution. Effect modification of the PRS by age and family history of breast cancer in first-degree relatives was evaluated by fitting additional interaction terms in the model. The validation and prospective test datasets were combined for this analysis. The absolute risks of developing breast cancer (overall and subtype-specific disease) were calculated taking into account the competing risk of dying from causes other than breast cancer, as described previously, with the PRS modeled as a continuous covariate and including a linear “age × PRS” interaction term. The absolute risk of developing subtype-specific disease was obtained by constraining to the overall incidence of ER-negative and ER-positive disease in the UK. Women are at risk of developing both ER-negative and ER-positive disease, so the absolute risks were calculated given that the individual has been free of breast cancer of any subtype. Analyses were carried out in R v.3.0.2 and Stata v.14.2. All tests of statistical significance were two-sided. Further details are provided in the Supplemental Material and Methods.
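Fixed-effects pooling is used twice in these methods, once to combine the iCOGS and OncoArray estimates and once for study-specific effects. A generic inverse-variance version can be sketched as follows. This is not a reproduction of METAL or of the Stata command metan, whose options are not described here, and the input values are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fixed_effect_meta(estimates: Sequence[float],
                      std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effects pooling of log ORs.

    Returns the pooled log OR, its standard error, and a two-sided p value.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, p_value


# Hypothetical per-allele log ORs for one SNP from two genotyping arrays.
meta_beta, meta_se, meta_p = fixed_effect_meta([0.10, 0.14], [0.02, 0.02])
```

Because the p value is computed as one minus a CDF, it underflows to zero for very large test statistics; a production implementation would work with the upper tail directly.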

Results

Development of the PRS

We tried several approaches to develop PRSs; here we report results for models giving the highest prediction accuracy. Using stepwise forward selection, the best PRS for prediction of overall breast cancer was obtained at a p value threshold for pre-selection and stepwise regression of p < 10⁻⁵ (Table 1). The OR per unit standard deviation (SD) for this 305-SNP PRS with overall breast cancer in the validation set was 1.65 (95%CI: 1.58–1.72), compared with 1.59 (95%CI: 1.52–1.66) using a “genome-wide” (p < 5 × 10⁻⁸) threshold (123 SNPs).

Table 1

Comparison of Methods for Deriving the PRS: Results for Overall Breast Cancer in the Validation Set

p Value Cutoffᵃ	SNPs Entering Model (n)	SNPs Selected (n)	ORᵇ	95% CI	AUC
Published PRS⁷

	77	77	1.49	1.44–1.56	0.612

Hard-Thresholding Stepwise Forward Regression

<5 × 10⁻⁸	1,817	123	1.59	1.52–1.66	0.626
<10⁻⁶	2,603	197	1.62	1.55–1.68	0.634
<10⁻⁵	3,818	305	1.65	1.58–1.72	0.637
<10⁻⁴	6,743	669	1.62	1.56–1.69	0.631
<10⁻³	14,760	1,707	1.55	1.49–1.62	0.623

Penalized Regression

Lasso	15,032	3,820	1.71	1.64–1.79	0.647

ᵃ The p value cutoff refers to the SNPs considered based on their marginal associations in the training set; the same p value threshold was used in each case in the stepwise regression. Parameter selection and effect size estimation for derivation of the PRS was carried out in the training set as described in the Material and Methods.

ᵇ OR per 1 SD for the PRS. OR for association with breast cancer in the validation set was derived using logistic regression adjusting for country and ten PCs. AUCs were adjusted for country. The lasso was carried out after pre-selecting SNPs at p < 10⁻³ based on their marginal association in the training set. For the lasso λ = 0.003 gave the optimal PRS in the validation set.

Using lasso regression, the best PRS (OR = 1.71, 95%CI: 1.64–1.79) was more predictive than the best PRS developed using the stepwise regression model. In the best model (λ = 0.003), 3,820 SNPs were selected (Table 1).
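As a rough consistency check on Table 1, the OR per 1 SD and the AUC can be related under a simplifying assumption that is not part of the reported analysis: if the standardized PRS is normally distributed with unit SD in control subjects and the log-linear model holds, the PRS in case subjects is shifted by ln(OR) with the same SD, so that AUC = Φ(ln(OR)/√2). The sketch below ignores covariate adjustment, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def auc_from_or_per_sd(or_per_sd: float) -> float:
    """Approximate AUC implied by an OR per 1 SD under a binormal model.

    Controls are assumed N(0, 1) and cases N(ln(OR), 1) on the PRS scale.
    """
    shift = math.log(or_per_sd)
    return NormalDist().cdf(shift / math.sqrt(2.0))


# ORs per 1 SD taken from Table 1 (stepwise p < 10^-5 model and lasso).
auc_stepwise = auc_from_or_per_sd(1.65)
auc_lasso = auc_from_or_per_sd(1.71)
```

Under these assumptions the two ORs map to AUCs close to the reported 0.637 and 0.647, which illustrates why sizeable gains in OR per SD translate into small changes in AUC.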

Optimizing the PRS for Prediction of Subtype-Specific Disease

For evaluation of subtype-specific models following stepwise regression, SNP effect sizes were estimated, in the first instance, in each disease subtype. The best subtype-specific PRSs using this method were also obtained at a p value threshold of p < 10⁻⁵ (Table S5). The 305-SNP PRS was supplemented with 6 additional SNPs associated with ER-positive disease at p < 10⁻⁶ and, in addition, with two known rare breast cancer susceptibility variants in the BRCA2 and CHEK2 genes, bringing the total number of SNPs included to 313 (PRS313). The optimum subtype-specific PRS was obtained when a subset of these 313 SNPs (196 SNPs with a case-only p value for association with ER-negative versus ER-positive disease of p < 0.025) were given subtype-specific weights, while the remaining SNPs were given overall breast cancer weights. For ER-negative disease, the OR improved from 1.45 (95%CI: 1.35–1.56) using only subtype-specific estimates to 1.47 (95%CI: 1.37–1.58) using the hybrid method, while for ER-positive disease the results were similar (OR = 1.74) (Tables S6 and S7). Subtype-specific prediction using the lasso analysis was optimized using case-only lasso analysis. The OR per 1 SD in the validation set was 1.81 (95%CI: 1.73–1.89) for ER-positive and 1.48 (95%CI: 1.37–1.59) for ER-negative disease (Tables 2 and S8).

Table 2

Association between PRS and Breast Cancer Risk in the Validation Set and Prospective Test Datasets

	Validation Set			Prospective Test Set
	ORᵃ	95% CI	AUC	ORᵃ	95% CI	AUC
77 SNP PRS (PRS₇₇)

Overall BC	1.49	1.44–1.56	0.612	1.46	1.42–1.49	0.603
ER-positive	1.56	1.49–1.63	0.623	1.52	1.48–1.56	0.615
ER-negative	1.40	1.30–1.50	0.596	1.35	1.27–1.43	0.584

313 SNP PRS (PRS₃₁₃)

Overall BC	1.65	1.59–1.72	0.639	1.61	1.57–1.65	0.630
ER-positive	1.74	1.66–1.82	0.651	1.68	1.63–1.73	0.641
ER-negative	1.47	1.37–1.58	0.611	1.45	1.37–1.53	0.601

3,820 SNP PRS (PRS₃₈₂₀)

Overall BC	1.71	1.64–1.79	0.646	1.66	1.61–1.70	0.636
ER-positive	1.81	1.73–1.89	0.659	1.73	1.68–1.78	0.647
ER-negative	1.48	1.37–1.59	0.611	1.44	1.36–1.53	0.600

Parameter selection and effect size estimation for derivation of the PRS was carried out in the training set as described in the Material and Methods. The optimal subtype-specific PRS was obtained by carrying out case-only logistic regression and estimating effect sizes in the relevant subtype for SNPs passing a p value of 0.025 in case-only ordinary logistic regression (ER-positive versus ER-negative disease). OR for association with breast cancer in the validation set was derived using logistic regression adjusting for country and ten PCs. AUCs were adjusted for country. In the prospective test set, logistic regression models were adjusted for study and 15 PCs. AUCs were adjusted for study.

ᵃ OR per 1 SD for the PRS.


Validation of the PRS in the Prospective Test Dataset

The final PRSs were evaluated using data from 11,428 invasive breast cancer-affected case subjects and 18,323 control subjects from ten prospective studies. The ORs for both the overall and subtype-specific PRSs were slightly lower in the prospective test set compared to the validation set (Table 2). The difference between validation and test set may reflect some overfitting due to choosing the optimum p value threshold and for the lasso, the optimum lambda, in the validation set, but could also be due to somewhat different characteristics of the prospective studies. The ORs for overall and ER-positive, but not ER-negative, breast cancer were slightly higher for the 3,820-SNP PRS (PRS3820) compared with PRS313. The odds ratio (OR) for overall disease per 1 standard deviation (SD) of the PRS313 in the prospective studies was 1.61 (95%CI: 1.57–1.65) while for the 77-SNP PRS (PRS77) derived previously OR = 1.46 (95%CI: 1.42–1.49). For ER-negative disease the difference was OR = 1.45 (95%CI: 1.37–1.53) versus 1.35 (95%CI: 1.27–1.43) (Table 2). The associations between the PRS and overall, ER-positive, and ER-negative breast cancer by percentiles of the PRS313 are shown in Figure 1 and Table S9. Compared with women in the middle quintile (40th to 60th percentile), those in the highest 1% of risk for the subtype-specific PRS313 had 4.37 (95%CI: 3.59–5.33)- and 2.78 (95%CI: 1.83–4.24)-fold risks, and those in the lowest 1% had 0.16 (95%CI: 0.09–0.30)- and 0.27 (95%CI: 0.09–0.86)-fold risks of developing ER-positive and ER-negative disease, respectively. The ORs by percentile of the PRS3820 were similar (Table S10).

Figure 1

Association between the 313 SNP Polygenic Risk Score and Breast Cancer Risk

Association between the 313 SNP polygenic risk score (PRS) and breast cancer risk in women of European origin for (A) overall breast cancers, (B) estrogen receptor (ER)-positive disease, and (C) ER-negative disease, in the validation (dashed line) and test (solid line) sets. Odds ratios are for different quantiles of the PRS relative to the mean PRS. Odds ratios and 95% confidence intervals are shown.


Goodness of Fit of the PRS

The remaining analyses concentrated on PRS313. The associations between the PRS and breast cancer risk by percentiles of the risk score were compared with those predicted under a simple polygenic model with the PRS considered as a continuous covariate. The effect sizes did not differ from those predicted, and in particular the estimates for the highest and lowest centile were consistent with the predicted estimates (Table S9). Further tests for goodness of fit and tail-based tests (see Material and Methods) were not statistically significant at p < 0.05. There was no evidence of heterogeneity in the effect sizes among studies (Figure 2). All studies showed a significant association with similar effect sizes for overall and ER-positive breast cancer, and all but one study (FHRISK, based on only six case subjects) showed a significant effect for ER-negative breast cancer.
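The comparison with a simple polygenic model can be illustrated with a short calculation. The sketch assumes, beyond what is stated here, that the standardized PRS is normally distributed with unit SD in control subjects and that percentile bands are defined on that distribution; the predicted OR for a band relative to the middle quintile is then the ratio of the mean of exp(β·z) over the two bands. Function names are hypothetical, and the predicted values reported in Table S9 are not reproduced.

```python
import math
from statistics import NormalDist
from typing import Tuple

_STD_NORMAL = NormalDist()


def band_mean_odds(beta: float, lower: float, upper: float) -> float:
    """Mean of exp(beta * z) for standard normal z between two quantile levels."""
    hi_term = 1.0 if upper >= 1.0 else _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(upper) - beta)
    lo_term = 0.0 if lower <= 0.0 else _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(lower) - beta)
    return math.exp(beta ** 2 / 2.0) * (hi_term - lo_term) / (upper - lower)


def predicted_or(or_per_sd: float,
                 band: Tuple[float, float],
                 reference: Tuple[float, float] = (0.40, 0.60)) -> float:
    """Predicted OR for a percentile band relative to a reference band."""
    beta = math.log(or_per_sd)
    return band_mean_odds(beta, *band) / band_mean_odds(beta, *reference)


# OR per 1 SD for ER-positive disease in the prospective test set (Table 2).
top_centile_or = predicted_or(1.68, (0.99, 1.00))
bottom_centile_or = predicted_or(1.68, (0.00, 0.01))
```

Under these assumptions the predicted ORs for the highest and lowest 1% fall inside the confidence intervals reported for ER-positive disease (3.59–5.33 and 0.09–0.30), in line with the conclusion that the tail estimates are consistent with the continuous model.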

Figure 2

Prospective Validation for the 313 SNP Polygenic Risk Score

Prospective validation for the 313 SNP polygenic risk score (PRS) by study for (A) overall breast cancer, (B) ER-positive disease, and (C) ER-negative disease. Association between the 313 SNP PRS and breast cancer risk in women of European origin. Odds ratios and 95% confidence intervals are shown. I-squared and p value for heterogeneity were calculated using fixed effect meta-analysis.

In the UK Biobank, the estimated hazard ratio (HR) for overall breast cancer per unit PRS (including 306 of the 313 SNPs) was HR = 1.59 (95%CI: 1.54–1.64) (Figure 2). By way of comparison, we also evaluated a PRS based on 177 previously published susceptibility loci.1, 2 The effect size for this PRS (OR = 1.61, 95%CI: 1.57–1.65) in the ten prospective studies was similar to the PRS313. However, this estimated effect size is biased because the validation and test datasets used here contributed to the GWAS discovery datasets; in the UK Biobank this PRS (based on 174 of 177 available SNPs) performed worse (HR = 1.53, 95%CI: 1.48–1.58).

PRS Effects by Age

A weak decline in the OR with age was observed for ER-positive disease (p = 0.001, for the combined validation and test set). There was some evidence that the decline in PRS OR was not linear, driven by a lower estimate below age 40 years (Table S11, Figure S2). There was no evidence of a decline in the OR by age for ER-negative disease (p = 0.39).

Combined Effects of PRS and Breast Cancer Family History

The association between PRS and disease risk was observed for women with and without a family history (Table 3). However, there was some evidence that for ER-positive disease, the PRS OR was smaller in women with a family history (interaction OR = 0.91, p = 0.004). The log OR for family history was attenuated by 21% (1.59 to 1.44) and 12% (1.66 to 1.56) for ER-positive and ER-negative disease, respectively, after adjusting for the PRS (Tables 3 and S12).
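The attenuation percentages are computed on the log-OR scale rather than on the OR scale. The sketch below reproduces that arithmetic from the family history odds ratios quoted above; the function name is illustrative.

```python
from math import log


def log_or_attenuation(or_unadjusted, or_adjusted):
    """Fractional reduction in the log odds ratio after adjustment."""
    return 1.0 - log(or_adjusted) / log(or_unadjusted)


# Family history ORs before and after adjusting for the PRS (from the text).
er_positive = log_or_attenuation(1.59, 1.44)
er_negative = log_or_attenuation(1.66, 1.56)

print(f"ER-positive: {er_positive:.0%}, ER-negative: {er_negative:.0%}")
```

Working on the log scale matters here: the same pairs of ORs would give a different percentage if the raw ORs, or the excess ORs above 1, were compared instead.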

Table 3

Associations between the 313-SNP PRS (PRS313) and Breast Cancer Risk by First-Degree Family History of Breast Cancer in the Combined Validation and Prospective Test Dataset

Model	ER-Positive Disease		ER-Negative Disease	
	OR(a)	95% CI	OR(a)	95% CI

Association of PRS and Breast Cancer Risk by Family History
PRS unadjusted	1.67	1.62–1.72	1.44	1.37–1.54
PRS in women without family history	1.71	1.65–1.78	1.45	1.36–1.57
PRS in women with family history	1.55	1.48–1.65	1.40	1.27–1.55
Interaction between PRS and family history	0.91	0.85–0.97 (p = 0.004)	0.96	0.85–1.09 (p = 0.53)

Association between Family History and Breast Cancer Risk (Adjusted and Unadjusted for PRS)
Family history unadjusted for PRS	1.59	1.46–1.72	1.66	1.41–1.95
Family history adjusted for PRS	1.44	1.33–1.57	1.56	1.32–1.83

Association with breast cancer risk was tested using logistic regression adjusting for study and ten PCs. For these analyses the validation and test datasets were combined. Analyses were restricted to women with known age and family history information. For ER-negative disease, 4,440 women with and 13,132 women without a family history of breast cancer were included in these analyses. For ER-positive disease, 6,787 women with and 17,351 women without a family history of breast cancer were included in these analyses.

(a) OR per 1 SD for the PRS.

Absolute Risk of Developing Breast Cancer According to the PRS

Estimated lifetime and 10-year absolute risks for UK women in percentiles of the PRS are shown in Figure 3. For ER-positive disease, the estimated lifetime absolute risk by age 80 years ranged from 2% for women in the lowest centile to 31% in the highest centile, while for ER-negative disease, the absolute risks ranged from 0.55% to 4%. The average 10-year absolute risk of breast cancer for a 47-year-old woman (i.e., the age at which women become eligible to enter the UK breast cancer screening program) in the general population is 2.6%. However, the 19% of women with the highest PRSs will attain this level of risk by age 40 years.
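The way a relative risk translates into an absolute risk over an age interval can be sketched as follows. This is a simplified illustration, not the procedure described in the Material and Methods: it applies a fixed relative risk directly to age-specific incidence, treats death from other causes as a competing risk, and omits any calibration of the category-specific risks to the population incidence. The function name and the flat placeholder rates in the usage note are hypothetical and are not UK data.

```python
from math import exp


def cumulative_risk(incidence, mortality, relative_risk, start_age, end_age):
    """Probability of developing the disease between start_age and end_age.

    incidence, mortality: dicts mapping age (years) to annual rates.
    relative_risk: multiplier applied to the incidence rate at every age.
    """
    surviving = 1.0  # probability of being alive and disease-free so far
    risk = 0.0
    for age in range(start_age, end_age):
        hazard = incidence[age] * relative_risk
        total = hazard + mortality[age]
        p_leave = 1.0 - exp(-total)  # disease or death within this year
        if total > 0.0:
            # Share of those leaving the risk set who leave through disease.
            risk += surviving * p_leave * hazard / total
        surviving *= 1.0 - p_leave
    return risk
```

With placeholder rates such as `flat = {age: 0.002 for age in range(80)}` and no competing mortality, `cumulative_risk(flat, {age: 0.0 for age in range(80)}, 2.0, 40, 50)` reduces to `1 - exp(-0.002 * 2.0 * 10)`, which is a convenient check on the bookkeeping before real incidence and mortality tables are substituted.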

Figure 3

Cumulative and 10-Year Absolute Risk of Developing Breast Cancer

Cumulative and 10-year absolute risk of developing breast cancer for (A) overall breast cancer, (B) ER-positive disease, and (C) ER-negative disease by percentiles of the 313 SNP polygenic risk scores (PRSs). Note different scales and PRS categories in the different panels. The red line shows the 2.6% risk threshold corresponding to the mean risk for women aged 47 years. Absolute risks were calculated based on UK incidence and mortality data and using the PRS relative risks estimated as described in the Material and Methods.

Discussion

We report the development and independent validation of polygenic risk scores for breast cancer, optimized for prediction of subtype-specific disease and based on the largest available GWAS dataset. The best PRS based on a hard thresholding approach included 313 SNPs and was significantly more predictive of risk than the previously reported 77-SNP PRS (OR per 1 SD in the prospective test set: 1.61 versus 1.46; Table 2). The effect sizes were remarkably consistent among the 10 cohorts in the prospective test set, and also consistent with that in the UK Biobank cohort (HR = 1.59, 95% CI: 1.54–1.64).

Recently, Khera et al. derived a PRS using our publicly available summary statistics based on analysis of the BCAC data. We were able to construct a PRS based on 5,194 of their 5,218 listed SNPs and compared this to our 313-SNP PRS. In our analysis of this PRS in the prospective UK Biobank data, we obtained an HR of 1.49 (95% CI: 1.44–1.54), substantially lower than that for our PRS313. The corresponding AUCs were 0.613 (95% CI: 0.603–0.623) for their 5,194-SNP PRS versus 0.630 (95% CI: 0.620–0.640) for PRS313. Similarly, PRS313 performed better than the Khera et al. PRS in a Biobank dataset consisting of 7,113 case subjects diagnosed before entry and 183,536 control subjects (AUC = 0.642 versus AUC = 0.627). Khera et al. report a much higher AUC (0.68), perhaps reflecting the inclusion of predictors other than SNPs in their model (for example age or principal components).

We specifically aimed to improve prediction for ER-negative breast cancer, as to date prediction of this more aggressive disease has been poor. SNP selection was based on association with either ER-negative or overall breast cancer, and the optimum subtype-specific PRSs were derived by weighting a subset of SNPs according to subtype-specific effect sizes, with overall breast cancer weights used for the remaining SNPs.
These results are consistent with the observation from genome-wide analyses that the heritabilities of ER-positive and ER-negative disease are partially correlated. The performance of the PRS313 in predicting ER-negative disease was considerably improved over the PRS77 reported previously (OR = 1.45 versus 1.35). Nevertheless, the prediction is still better for ER-positive than ER-negative disease, reflecting the fact that ER-negative disease is less frequent and hence the GWAS data are less powerful. The estimated heritability of ER-negative disease is similar to that of overall breast cancer,1, 2 suggesting that more powerful ER-negative PRSs should be achievable with larger sample sizes.

The best PRS developed using lasso was more predictive for ER-positive disease but slightly less predictive for ER-negative disease in the prospective studies. Given the small differences between the models, we focused on PRS313 since this should be more straightforward to implement in diagnostic laboratories using next generation sequencing. However, this will change with developing technology, and the cost effectiveness of using a large marker panel should be further investigated.

From a clinical viewpoint, an important consideration is the performance of the PRS in the tails of the distribution. According to the standard polygenic model, under which the effects of variants combine multiplicatively, the relationship between the PRS and the log-OR should be linear. The PRS was well calibrated at different quantiles. Even in this large study, we observed no deviation from this model, and in particular the observed risks in the highest and lowest centile were consistent with the predicted risk. The sample sizes in the extreme tails, however, were still relatively small, particularly for ER-negative disease. While the AUC may appear modest, the predicted risk differences in the tails of the distribution are large.
For the new PRS313, the women in the top 1% of the distribution have a predicted risk that is approximately 4-fold larger than the risk in the middle quintile. The lifetime risk of overall breast cancer in the top centile of the PRSs, based on UK incidence and mortality data, was 32.6%. Women in the top centile would therefore meet the UK NICE definition of high risk (see Web Resources). In the general population, an estimated 3.6%, 12%, 21%, and 35% of all breast cancers would be expected to occur in women in the highest 1%, 5%, 10%, and 20% of the new PRS313, respectively, compared to only 9% of breast cancers in women in the lowest 20% of the distribution.

We observed a decline in the relative risk with age for ER-positive disease but not ER-negative disease. Even for ER-positive disease, however, the predicted relative risk, under a linear model, only declined from 1.89 at age 40 to 1.67 at age 70. While there was some indication of a lower relative risk below age 40 (estimated as 1.63 in the test set; Figure S2), these results indicate that PRS313 is broadly applicable at all ages.

We observed an attenuation of the association between breast cancer family history and breast cancer risk after adjustment for the PRS (∼21% for ER-positive, ∼12% for ER-negative disease). This finding is broadly in line with the predicted contribution of the PRS to the familial relative risk of breast cancer. The PRS was predictive in women with and without a family history of breast cancer, but the OR was slightly lower in women with a family history, at least for ER-positive disease. This might reflect a weaker relative effect of the PRS in carriers of BRCA1 or BRCA2 mutations. We note, however, that the absolute differences in risk by PRS will be larger in women with a family history. These results indicate that the joint effects of family history and PRS need to be considered in risk prediction.
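The shares of breast cancers expected in the tails of the PRS distribution can be approximated from the OR per 1 SD alone. The sketch below is an illustration under stated assumptions, not the calculation used in the study: it assumes a rare disease, a standardized PRS that is N(0, 1) in the population, and a log-linear effect, so that the PRS among case subjects is N(β, 1) with β the log-OR per 1 SD. The function names are illustrative.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption: standardized PRS is N(0, 1)


def share_of_cases_above(or_per_sd, top_fraction):
    """Fraction of all cases arising in the top `top_fraction` of the PRS."""
    beta = log(or_per_sd)  # shift of the PRS mean among cases
    threshold = STD_NORMAL.inv_cdf(1.0 - top_fraction)
    return 1.0 - STD_NORMAL.cdf(threshold - beta)


def share_of_cases_below(or_per_sd, bottom_fraction):
    """Fraction of all cases arising in the bottom `bottom_fraction`."""
    beta = log(or_per_sd)
    threshold = STD_NORMAL.inv_cdf(bottom_fraction)
    return STD_NORMAL.cdf(threshold - beta)


for fraction in (0.01, 0.05, 0.10, 0.20):
    share = share_of_cases_above(1.61, fraction)
    print(f"top {fraction:.0%}: {share:.1%} of cases")
```

Because the sketch uses a single OR per 1 SD and an idealized normal distribution, its output should be read as an approximation to the percentages quoted above and need not match them exactly, particularly in the extreme tail.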
Although we used the largest training dataset available to date for development of the PRS, further improvement should still be possible. We previously estimated using GWAS data that the theoretically best PRS, if the effect sizes of all common SNPs were known with certainty, would explain ∼41% of the familial risk of breast cancer, corresponding to a standardized OR of ∼2.1; the PRS313 explains ∼45% of this “chip” heritability. This implies that larger GWASs, coupled with penalized approaches for subtype-specific disease, should further improve the predictive value of the PRS. Certain genomic features, notably transcription factor binding sites, are enriched among susceptibility loci. Preliminary analyses incorporating these features into the analysis did not improve the predictive value, presumably because the enrichment effect was too small to overcome the increased complexity of the model. Better definition of genomic features to predict causal variants, and more sophisticated methods for integrating external biological information into prediction models, may improve the PRS.29, 30

The PRS has the potential to improve stratification for screening, while ER-specific PRSs may be informative for prevention with endocrine therapies. Previous studies have suggested that the earlier PRS77 was more predictive for screen-detected breast cancers than interval cancers, and that breast cancers arising among women with a low PRS are more aggressive compared with those arising in women with a high PRS, perhaps reflecting the stronger associations with ER-positive disease.31, 32 It will therefore be important to evaluate carefully the associations between the new PRS313 and other tumor characteristics. Clinical translational studies are required to assess the risks and benefits of including the PRS in the context of current screening protocols.
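The link between a standardized OR and the share of familial risk explained can be sketched under a log-normal polygenic model. Two assumptions are made here that are not stated in this section: that a PRS with log-OR β per 1 SD contributes a factor exp(β²/2) to the relative risk in first-degree relatives, and that the overall familial relative risk is about 2. The function names and the default value are illustrative.

```python
from math import log


def fraction_of_familial_risk(or_per_sd, familial_relative_risk=2.0):
    """Share of the log familial relative risk attributable to a PRS.

    Assumes a log-normal polygenic model and a first-degree familial
    relative risk of about 2 (an assumption, not taken from this section).
    """
    beta = log(or_per_sd)
    return (beta ** 2 / 2.0) / log(familial_relative_risk)


def variance_ratio(or_per_sd_a, or_per_sd_b):
    """Ratio of the log-risk variances explained by two PRSs."""
    return (log(or_per_sd_a) / log(or_per_sd_b)) ** 2


print(f"{fraction_of_familial_risk(2.1):.0%}")
```

Under these assumptions a standardized OR of 2.1 corresponds to roughly 40% of the familial risk, in line with the ∼41% quoted above. The share of "chip" heritability captured by PRS313 can be examined with `variance_ratio`, although the result depends on which OR estimate for PRS313 is supplied and is not expected to match the quoted ∼45% exactly.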
While the PRS provides powerful risk discrimination, better risk discrimination will be obtained by combining the PRS with family history and other risk factors. This can be accomplished by incorporating the PRS into risk prediction models, in particular BOADICEA, which can allow for the explicit effects of family history, age, genetic, and other risk factors33, 34 (see Supplemental Material and Methods). However, further studies to validate risk models for individualized risk prediction based on the combined effects of genetic and lifestyle risk factors will be needed. In addition, it is important to note that the PRSs generated in this study were developed and validated in white European populations and need to be validated and potentially adapted for other populations.

Consortia

ABCTB Investigators are Christine Clarke, Rosemary Balleine, Robert Baxter, Stephen Braye, Jane Carpenter, Jane Dahlstrom, John Forbes, C. Soon Lee, Deborah Marsh, Adrienne Morey, Nirmala Pathmanathan, Rodney Scott, Peter Simpson, Allan Spigelman, Nicholas Wilcken, Desmond Yip, and Nikolajs Zeps.

kConFab/AOCS Investigators are Adrienne Sexton, Alex Dobrovic, Alice Christian, Alison Trainer, Allan Spigelman, Andrew Fellows, Andrew Shelling, Anna De Fazio, Anneke Blackburn, Ashley Crook, Bettina Meiser, Briony Patterson, Christine Clarke, Christobel Saunders, Clare Hunt, Clare Scott, David Amor, David Gallego Ortega, Deb Marsh, Edward Edkins, Elizabeth Salisbury, Eric Haan, Finlay Macrea, Gelareh Farshid, Geoff Lindeman, Georgia Trench, Graham Mann, Graham Giles, Grantley Gill, Heather Thorne, Ian Campbell, Ian Hickie, Liz Caldon, Ingrid Winship, James Cui, James Flanagan, James Kollias, Jane Visvader, Jennifer Stone, Jessica Taylor, Jo Burke, Jodi Saunus, John Forbes, John Hopper, Jonathan Beesley, Judy Kirk, Juliet French, Kathy Tucker, Kathy Wu, Kelly Phillips, Laura Forrest, Lara Lipton, Leslie Andrews, Lizz Lobb, Logan Walker, Maira Kentwell, Mandy Spurdle, Margaret Cummings, Margaret Gleeson, Marion Harris, Mark Jenkins, Mary Anne Young, Martin Delatycki, Mathew Wallis, Matthew Burgess, Melissa Brown, Melissa Southey, Michael Bogwitz, Michael Field, Michael Friedlander, Michael Gattas, Mona Saleh, Morteza Aghmesheh, Nick Hayward, Nick Pachter, Paul Cohen, Pascal Duijf, Paul James, Pete Simpson, Peter Fong, Phyllis Butow, Rachael Williams, Rick Kefford, Rodney Scott, Roger Milne, Rosemary Balleine, Sarah-Jane Dawson, Sheau Lok, Shona O'Connell, Sian Greening, Sophie Nightingale, Stacey Edwards, Stephen Fox, Sue-Anne McLachlan, Sunil Lakhani, Tracy Dudding, and Yoland Antill.

NBCS collaborators are Kristine K. Sahlberg, Lars Ottestad, Rolf Kåresen, Ellen Schlichting, Marit Muri Holmen, Toril Sauer, Vilde Haakensen, Olav Engebråten, Bjørn Naume, Alexander Fosså, Cecile E. Kiserud, Kristin V. Reinertsen, Åslaug Helland, Margit Riis, Jürgen Geisler, and OSBREAC.

Declaration of Interests

D.G.E. reports grants from AstraZeneca and AmGen, outside the submitted work; U.M. has stock ownership and has received research funding from Abcodia Pvt Ltd.; A. Smeets reports other from MSD, outside of the submitted work; P.A.F. reports grants and personal fees from Novartis and personal fees from Pfizer, Roche, Teva, and Celgene, outside the submitted work; R.C. declares personal fees from Novartis, AstraZeneca, and Genentech, outside the submitted work. B.R. reports funding for the conduct of the clinical Success trial paid to her institution from AstraZeneca, Chugai, Lilly, Novartis, Veridex (now Janssen Diagnostics), and Sanofi Aventis. M. Robson reports grants, personal fees, and non-financial support from AstraZeneca, personal fees from McKesson, grants and personal fees from Pfizer, non-financial support from Myriad, non-financial support from Invitae, and grants from AbbVie, Tesaro, and Medivation, outside the submitted work; and M.P.L. reports personal fees from Novartis, Pfizer, Roche, Teva, AstraZeneca, Lilly, and Eisai, outside the submitted work.

32 in total

1. Combined associations of genetic and environmental risk factors: implications for prevention of breast cancer.

Authors: Montserrat Garcia-Closas; Necdet Burak Gunsoy; Nilanjan Chatterjee
Journal: J Natl Cancer Inst Date: 2014-11-12 Impact factor: 13.506

2. Public health implications from COGS and potential for risk stratification and screening.

Authors: Hilary Burton; Susmita Chowdhury; Tom Dent; Alison Hall; Nora Pashayan; Paul Pharoah
Journal: Nat Genet Date: 2013-04 Impact factor: 38.330

3. Additive interactions between susceptibility single-nucleotide polymorphisms identified in genome-wide association studies and breast cancer risk factors in the Breast and Prostate Cancer Cohort Consortium.

Authors: Amit D Joshi; Sara Lindström; Anika Hüsing; Myrto Barrdahl; Tyler J VanderWeele; Daniele Campa; Federico Canzian; Mia M Gaudet; Jonine D Figueroa; Laura Baglietto; Christine D Berg; Julie E Buring; Stephen J Chanock; María-Dolores Chirlaque; W Ryan Diver; Laure Dossus; Graham G Giles; Christopher A Haiman; Susan E Hankinson; Brian E Henderson; Robert N Hoover; David J Hunter; Claudine Isaacs; Rudolf Kaaks; Laurence N Kolonel; Vittorio Krogh; Loic Le Marchand; I-Min Lee; Eiliv Lund; Catherine A McCarty; Kim Overvad; Petra H Peeters; Elio Riboli; Fredrick Schumacher; Gianluca Severi; Daniel O Stram; Malin Sund; Michael J Thun; Ruth C Travis; Dimitrios Trichopoulos; Walter C Willett; Shumin Zhang; Regina G Ziegler; Peter Kraft
Journal: Am J Epidemiol Date: 2014-09-25 Impact factor: 4.897

Review 4. Breast Cancer Screening in the Precision Medicine Era: Risk-Based Screening in a Population-Based Trial.

Authors: Yiwey Shieh; Martin Eklund; Lisa Madlensky; Sarah D Sawyer; Carlie K Thompson; Allison Stover Fiscalini; Elad Ziv; Laura J Van't Veer; Laura J Esserman; Jeffrey A Tice
Journal: J Natl Cancer Inst Date: 2017-01-27 Impact factor: 13.506

5. Inclusion of biological knowledge in a Bayesian shrinkage model for joint estimation of SNP effects.

Authors: Miguel Pereira; John R Thompson; Christian X Weichenberger; Duncan C Thomas; Cosetta Minelli
Journal: Genet Epidemiol Date: 2017-04-10 Impact factor: 2.135

6. Haplotype estimation for biobank-scale data sets.

Authors: Jared O'Connell; Kevin Sharp; Nick Shrine; Louise Wain; Ian Hall; Martin Tobin; Jean-Francois Zagury; Olivier Delaneau; Jonathan Marchini
Journal: Nat Genet Date: 2016-06-06 Impact factor: 38.330

7. Prediction of breast cancer risk based on profiling with common genetic variants.

Authors: Nasim Mavaddat; Paul D P Pharoah; Kyriaki Michailidou; Jonathan Tyrer; Mark N Brook; Manjeet K Bolla; Qin Wang; Joe Dennis; Alison M Dunning; Mitul Shah; Robert Luben; Judith Brown; Stig E Bojesen; Børge G Nordestgaard; Sune F Nielsen; Henrik Flyger; Kamila Czene; Hatef Darabi; Mikael Eriksson; Julian Peto; Isabel Dos-Santos-Silva; Frank Dudbridge; Nichola Johnson; Marjanka K Schmidt; Annegien Broeks; Senno Verhoef; Emiel J Rutgers; Anthony Swerdlow; Alan Ashworth; Nick Orr; Minouk J Schoemaker; Jonine Figueroa; Stephen J Chanock; Louise Brinton; Jolanta Lissowska; Fergus J Couch; Janet E Olson; Celine Vachon; Vernon S Pankratz; Diether Lambrechts; Hans Wildiers; Chantal Van Ongeval; Erik van Limbergen; Vessela Kristensen; Grethe Grenaker Alnæs; Silje Nord; Anne-Lise Borresen-Dale; Heli Nevanlinna; Taru A Muranen; Kristiina Aittomäki; Carl Blomqvist; Jenny Chang-Claude; Anja Rudolph; Petra Seibold; Dieter Flesch-Janys; Peter A Fasching; Lothar Haeberle; Arif B Ekici; Matthias W Beckmann; Barbara Burwinkel; Frederik Marme; Andreas Schneeweiss; Christof Sohn; Amy Trentham-Dietz; Polly Newcomb; Linda Titus; Kathleen M Egan; David J Hunter; Sara Lindstrom; Rulla M Tamimi; Peter Kraft; Nazneen Rahman; Clare Turnbull; Anthony Renwick; Sheila Seal; Jingmei Li; Jianjun Liu; Keith Humphreys; Javier Benitez; M Pilar Zamora; Jose Ignacio Arias Perez; Primitiva Menéndez; Anna Jakubowska; Jan Lubinski; Katarzyna Jaworska-Bieniek; Katarzyna Durda; Natalia V Bogdanova; Natalia N Antonenkova; Thilo Dörk; Hoda Anton-Culver; Susan L Neuhausen; Argyrios Ziogas; Leslie Bernstein; Peter Devilee; Robert A E M Tollenaar; Caroline Seynaeve; Christi J van Asperen; Angela Cox; Simon S Cross; Malcolm W R Reed; Elza Khusnutdinova; Marina Bermisheva; Darya Prokofyeva; Zalina Takhirova; Alfons Meindl; Rita K Schmutzler; Christian Sutter; Rongxi Yang; Peter Schürmann; Michael Bremer; Hans Christiansen; Tjoung-Won Park-Simon; Peter Hillemanns; Pascal Guénel; Thérèse Truong; Florence 
Menegaux; Marie Sanchez; Paolo Radice; Paolo Peterlongo; Siranoush Manoukian; Valeria Pensotti; John L Hopper; Helen Tsimiklis; Carmel Apicella; Melissa C Southey; Hiltrud Brauch; Thomas Brüning; Yon-Dschun Ko; Alice J Sigurdson; Michele M Doody; Ute Hamann; Diana Torres; Hans-Ulrich Ulmer; Asta Försti; Elinor J Sawyer; Ian Tomlinson; Michael J Kerin; Nicola Miller; Irene L Andrulis; Julia A Knight; Gord Glendon; Anna Marie Mulligan; Georgia Chenevix-Trench; Rosemary Balleine; Graham G Giles; Roger L Milne; Catriona McLean; Annika Lindblom; Sara Margolin; Christopher A Haiman; Brian E Henderson; Fredrick Schumacher; Loic Le Marchand; Ursula Eilber; Shan Wang-Gohrke; Maartje J Hooning; Antoinette Hollestelle; Ans M W van den Ouweland; Linetta B Koppert; Jane Carpenter; Christine Clarke; Rodney Scott; Arto Mannermaa; Vesa Kataja; Veli-Matti Kosma; Jaana M Hartikainen; Hermann Brenner; Volker Arndt; Christa Stegmaier; Aida Karina Dieffenbach; Robert Winqvist; Katri Pylkäs; Arja Jukkola-Vuorinen; Mervi Grip; Kenneth Offit; Joseph Vijai; Mark Robson; Rohini Rau-Murthy; Miriam Dwek; Ruth Swann; Katherine Annie Perkins; Mark S Goldberg; France Labrèche; Martine Dumont; Diana M Eccles; William J Tapper; Sajjad Rafiq; Esther M John; Alice S Whittemore; Susan Slager; Drakoulis Yannoukakos; Amanda E Toland; Song Yao; Wei Zheng; Sandra L Halverson; Anna González-Neira; Guillermo Pita; M Rosario Alonso; Nuria Álvarez; Daniel Herrero; Daniel C Tessier; Daniel Vincent; Francois Bacot; Craig Luccarini; Caroline Baynes; Shahana Ahmed; Mel Maranian; Catherine S Healey; Jacques Simard; Per Hall; Douglas F Easton; Montserrat Garcia-Closas
Journal: J Natl Cancer Inst Date: 2015-04-08 Impact factor: 13.506

8. Fine-scale mapping of the FGFR2 breast cancer risk locus: putative functional variants differentially bind FOXA1 and E2F1.

Authors: Kerstin B Meyer; Martin O'Reilly; Kyriaki Michailidou; Saskia Carlebur; Stacey L Edwards; Juliet D French; Radhika Prathalingham; Joe Dennis; Manjeet K Bolla; Qin Wang; Ines de Santiago; John L Hopper; Helen Tsimiklis; Carmel Apicella; Melissa C Southey; Marjanka K Schmidt; Annegien Broeks; Laura J Van 't Veer; Frans B Hogervorst; Kenneth Muir; Artitaya Lophatananon; Sarah Stewart-Brown; Pornthep Siriwanarangsan; Peter A Fasching; Michael P Lux; Arif B Ekici; Matthias W Beckmann; Julian Peto; Isabel Dos Santos Silva; Olivia Fletcher; Nichola Johnson; Elinor J Sawyer; Ian Tomlinson; Michael J Kerin; Nicola Miller; Federick Marme; Andreas Schneeweiss; Christof Sohn; Barbara Burwinkel; Pascal Guénel; Thérèse Truong; Pierre Laurent-Puig; Florence Menegaux; Stig E Bojesen; Børge G Nordestgaard; Sune F Nielsen; Henrik Flyger; Roger L Milne; M Pilar Zamora; Jose I Arias; Javier Benitez; Susan Neuhausen; Hoda Anton-Culver; Argyrios Ziogas; Christina C Dur; Hermann Brenner; Heiko Müller; Volker Arndt; Christa Stegmaier; Alfons Meindl; Rita K Schmutzler; Christoph Engel; Nina Ditsch; Hiltrud Brauch; Thomas Brüning; Yon-Dschun Ko; Heli Nevanlinna; Taru A Muranen; Kristiina Aittomäki; Carl Blomqvist; Keitaro Matsuo; Hidemi Ito; Hiroji Iwata; Yasushi Yatabe; Thilo Dörk; Sonja Helbig; Natalia V Bogdanova; Annika Lindblom; Sara Margolin; Arto Mannermaa; Vesa Kataja; Veli-Matti Kosma; Jaana M Hartikainen; Georgia Chenevix-Trench; Anna H Wu; Chiu-Chen Tseng; David Van Den Berg; Daniel O Stram; Diether Lambrechts; Bernard Thienpont; Marie-Rose Christiaens; Ann Smeets; Jenny Chang-Claude; Anja Rudolph; Petra Seibold; Dieter Flesch-Janys; Paolo Radice; Paolo Peterlongo; Bernardo Bonanni; Loris Bernard; Fergus J Couch; Janet E Olson; Xianshu Wang; Kristen Purrington; Graham G Giles; Gianluca Severi; Laura Baglietto; Catriona McLean; Christopher A Haiman; Brian E Henderson; Fredrick Schumacher; Loic Le Marchand; Jacques Simard; Mark S Goldberg; France Labrèche; Martine Dumont; 
Soo-Hwang Teo; Cheng-Har Yip; Sze-Yee Phuah; Vessela Kristensen; Grethe Grenaker Alnæs; Anne-Lise Børresen-Dale; Wei Zheng; Sandra Deming-Halverson; Martha Shrubsole; Jirong Long; Robert Winqvist; Katri Pylkäs; Arja Jukkola-Vuorinen; Saila Kauppila; Irene L Andrulis; Julia A Knight; Gord Glendon; Sandrine Tchatchou; Peter Devilee; Robert A E M Tollenaar; Caroline M Seynaeve; Montserrat García-Closas; Jonine Figueroa; Stephen J Chanock; Jolanta Lissowska; Kamila Czene; Hartef Darabi; Kimael Eriksson; Maartje J Hooning; John W M Martens; Ans M W van den Ouweland; Carolien H M van Deurzen; Per Hall; Jingmei Li; Jianjun Liu; Keith Humphreys; Xiao-Ou Shu; Wei Lu; Yu-Tang Gao; Hui Cai; Angela Cox; Malcolm W R Reed; William Blot; Lisa B Signorello; Qiuyin Cai; Paul D P Pharoah; Maya Ghoussaini; Patricia Harrington; Jonathan Tyrer; Daehee Kang; Ji-Yeob Choi; Sue K Park; Dong-Young Noh; Mikael Hartman; Miao Hui; Wei-Yen Lim; Shaik A Buhari; Ute Hamann; Asta Försti; Thomas Rüdiger; Hans-Ulrich Ulmer; Anna Jakubowska; Jan Lubinski; Katarzyna Jaworska; Katarzyna Durda; Suleeporn Sangrajrang; Valerie Gaborieau; Paul Brennan; James McKay; Celine Vachon; Susan Slager; Florentia Fostira; Robert Pilarski; Chen-Yang Shen; Chia-Ni Hsiung; Pei-Ei Wu; Ming-Feng Hou; Anthony Swerdlow; Alan Ashworth; Nick Orr; Minouk J Schoemaker; Bruce A J Ponder; Alison M Dunning; Douglas F Easton
Journal: Am J Hum Genet Date: 2013-11-27 Impact factor: 11.025

9. Large-scale genotyping identifies 41 new loci associated with breast cancer risk.

Authors: Kyriaki Michailidou; Per Hall; Anna Gonzalez-Neira; Maya Ghoussaini; Joe Dennis; Roger L Milne; Marjanka K Schmidt; Jenny Chang-Claude; Stig E Bojesen; Manjeet K Bolla; Qin Wang; Ed Dicks; Andrew Lee; Clare Turnbull; Nazneen Rahman; Olivia Fletcher; Julian Peto; Lorna Gibson; Isabel Dos Santos Silva; Heli Nevanlinna; Taru A Muranen; Kristiina Aittomäki; Carl Blomqvist; Kamila Czene; Astrid Irwanto; Jianjun Liu; Quinten Waisfisz; Hanne Meijers-Heijboer; Muriel Adank; Rob B van der Luijt; Rebecca Hein; Norbert Dahmen; Lars Beckman; Alfons Meindl; Rita K Schmutzler; Bertram Müller-Myhsok; Peter Lichtner; John L Hopper; Melissa C Southey; Enes Makalic; Daniel F Schmidt; Andre G Uitterlinden; Albert Hofman; David J Hunter; Stephen J Chanock; Daniel Vincent; François Bacot; Daniel C Tessier; Sander Canisius; Lodewyk F A Wessels; Christopher A Haiman; Mitul Shah; Robert Luben; Judith Brown; Craig Luccarini; Nils Schoof; Keith Humphreys; Jingmei Li; Børge G Nordestgaard; Sune F Nielsen; Henrik Flyger; Fergus J Couch; Xianshu Wang; Celine Vachon; Kristen N Stevens; Diether Lambrechts; Matthieu Moisse; Robert Paridaens; Marie-Rose Christiaens; Anja Rudolph; Stefan Nickels; Dieter Flesch-Janys; Nichola Johnson; Zoe Aitken; Kirsimari Aaltonen; Tuomas Heikkinen; Annegien Broeks; Laura J Van't Veer; C Ellen van der Schoot; Pascal Guénel; Thérèse Truong; Pierre Laurent-Puig; Florence Menegaux; Frederik Marme; Andreas Schneeweiss; Christof Sohn; Barbara Burwinkel; M Pilar Zamora; Jose Ignacio Arias Perez; Guillermo Pita; M Rosario Alonso; Angela Cox; Ian W Brock; Simon S Cross; Malcolm W R Reed; Elinor J Sawyer; Ian Tomlinson; Michael J Kerin; Nicola Miller; Brian E Henderson; Fredrick Schumacher; Loic Le Marchand; Irene L Andrulis; Julia A Knight; Gord Glendon; Anna Marie Mulligan; Annika Lindblom; Sara Margolin; Maartje J Hooning; Antoinette Hollestelle; Ans M W van den Ouweland; Agnes Jager; Quang M Bui; Jennifer Stone; Gillian S Dite; Carmel Apicella; Helen Tsimiklis; 
Graham G Giles; Gianluca Severi; Laura Baglietto; Peter A Fasching; Lothar Haeberle; Arif B Ekici; Matthias W Beckmann; Hermann Brenner; Heiko Müller; Volker Arndt; Christa Stegmaier; Anthony Swerdlow; Alan Ashworth; Nick Orr; Michael Jones; Jonine Figueroa; Jolanta Lissowska; Louise Brinton; Mark S Goldberg; France Labrèche; Martine Dumont; Robert Winqvist; Katri Pylkäs; Arja Jukkola-Vuorinen; Mervi Grip; Hiltrud Brauch; Ute Hamann; Thomas Brüning; Paolo Radice; Paolo Peterlongo; Siranoush Manoukian; Bernardo Bonanni; Peter Devilee; Rob A E M Tollenaar; Caroline Seynaeve; Christi J van Asperen; Anna Jakubowska; Jan Lubinski; Katarzyna Jaworska; Katarzyna Durda; Arto Mannermaa; Vesa Kataja; Veli-Matti Kosma; Jaana M Hartikainen; Natalia V Bogdanova; Natalia N Antonenkova; Thilo Dörk; Vessela N Kristensen; Hoda Anton-Culver; Susan Slager; Amanda E Toland; Stephen Edge; Florentia Fostira; Daehee Kang; Keun-Young Yoo; Dong-Young Noh; Keitaro Matsuo; Hidemi Ito; Hiroji Iwata; Aiko Sueta; Anna H Wu; Chiu-Chen Tseng; David Van Den Berg; Daniel O Stram; Xiao-Ou Shu; Wei Lu; Yu-Tang Gao; Hui Cai; Soo Hwang Teo; Cheng Har Yip; Sze Yee Phuah; Belinda K Cornes; Mikael Hartman; Hui Miao; Wei Yen Lim; Jen-Hwei Sng; Kenneth Muir; Artitaya Lophatananon; Sarah Stewart-Brown; Pornthep Siriwanarangsan; Chen-Yang Shen; Chia-Ni Hsiung; Pei-Ei Wu; Shian-Ling Ding; Suleeporn Sangrajrang; Valerie Gaborieau; Paul Brennan; James McKay; William J Blot; Lisa B Signorello; Qiuyin Cai; Wei Zheng; Sandra Deming-Halverson; Martha Shrubsole; Jirong Long; Jacques Simard; Montse Garcia-Closas; Paul D P Pharoah; Georgia Chenevix-Trench; Alison M Dunning; Javier Benitez; Douglas F Easton
Journal: Nat Genet Date: 2013-04 Impact factor: 38.330

10. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations.

Authors: Amit V Khera; Mark Chaffin; Krishna G Aragam; Mary E Haas; Carolina Roselli; Seung Hoan Choi; Pradeep Natarajan; Eric S Lander; Steven A Lubitz; Patrick T Ellinor; Sekar Kathiresan
Journal: Nat Genet Date: 2018-08-13 Impact factor: 38.330
