Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair.

Felix R Day¹, Katherine S Ruth², Deborah J Thompson³, Kathryn L Lunetta^4,5, Natalia Pervjakova^6,7, Daniel I Chasman^8,9, Lisette Stolk^10,11, Hilary K Finucane^12,13, Patrick Sulem¹⁴, Brendan Bulik-Sullivan^15,16,17, Tõnu Esko^6,18,19,20, Andrew D Johnson⁵, Cathy E Elks¹, Nora Franceschini²¹, Chunyan He^22,23, Elisabeth Altmaier^24,25,26, Jennifer A Brody²⁷, Lude L Franke²⁸, Jennifer E Huffman^5,29, Margaux F Keller³⁰, Patrick F McArdle³¹, Teresa Nutile³², Eleonora Porcu^33,34,35, Antonietta Robino³⁶, Lynda M Rose⁸, Ursula M Schick³⁷, Jennifer A Smith³⁸, Alexander Teumer³⁹, Michela Traglia⁴⁰, Dragana Vuckovic^36,41, Jie Yao⁴², Wei Zhao³⁸, Eva Albrecht²⁵, Najaf Amin⁴³, Tanguy Corre^44,45, Jouke-Jan Hottenga⁴⁶, Massimo Mangino^47,48, Albert V Smith^49,50, Toshiko Tanaka⁵¹, Goncalo Abecasis³⁵, Irene L Andrulis^52,53, Hoda Anton-Culver⁵⁴, Antonis C Antoniou³, Volker Arndt⁵⁵, Alice M Arnold⁵⁶, Caterina Barbieri^36,40, Matthias W Beckmann⁵⁷, Alicia Beeghly-Fadiel⁵⁸, Javier Benitez^59,60, Leslie Bernstein⁶¹, Suzette J Bielinski⁶², Carl Blomqvist⁶³, Eric Boerwinkle^64,65, Natalia V Bogdanova⁶⁶, Stig E Bojesen^67,68, Manjeet K Bolla³, Anne-Lise Borresen-Dale^69,70, Thibaud S Boutin²⁹, Hiltrud Brauch^71,72,73, Hermann Brenner^55,73,74, Thomas Brüning⁷⁵, Barbara Burwinkel^76,77, Archie Campbell⁷⁸, Harry Campbell⁷⁹, Stephen J Chanock⁸⁰, J Ross Chapman⁸¹, Yii-Der Ida Chen⁴², Georgia Chenevix-Trench⁸², Fergus J Couch⁸³, Andrea D Coviello⁸⁴, Angela Cox⁸⁵, Kamila Czene⁸⁶, Hatef Darabi⁸⁶, Immaculata De Vivo^12,87, Ellen W Demerath⁸⁸, Joe Dennis³, Peter Devilee^89,90, Thilo Dörk⁹¹, Isabel Dos-Santos-Silva⁹², Alison M Dunning⁹³, John D Eicher⁵, Peter A Fasching^57,94, Jessica D Faul⁹⁵, Jonine Figueroa⁹⁶, Dieter Flesch-Janys^97,98, Ilaria Gandin^36,41, Melissa E Garcia⁹⁹, Montserrat García-Closas^100,101, Graham G Giles^102,103, Giorgia G Girotto⁴¹, Mark S Goldberg^104,105, Anna González-Neira⁵⁹, Mark O Goodarzi¹⁰⁶, Megan L Grove⁶⁴, Daniel F Gudbjartsson^14,107, Pascal 
Guénel^108,109, Xiuqing Guo⁴², Christopher A Haiman¹¹⁰, Per Hall⁸⁶, Ute Hamann¹¹¹, Brian E Henderson¹¹⁰, Lynne J Hocking¹¹², Albert Hofman⁴³, Georg Homuth¹¹³, Maartje J Hooning¹¹⁴, John L Hopper¹⁰², Frank B Hu^12,87,115, Jinyan Huang¹¹⁶, Keith Humphreys⁸⁶, David J Hunter^12,20,87,115, Anna Jakubowska¹¹⁷, Samuel E Jones², Maria Kabisch¹¹¹, David Karasik^9,118, Julia A Knight^119,120, Ivana Kolcic¹²¹, Charles Kooperberg³⁷, Veli-Matti Kosma^122,123,124, Jennifer Kriebel^24,26,125, Vessela Kristensen^69,70,126, Diether Lambrechts^127,128, Claudia Langenberg¹, Jingmei Li⁸⁶, Xin Li¹², Sara Lindström¹², Yongmei Liu¹²⁹, Jian'an Luan¹, Jan Lubinski¹¹⁷, Reedik Mägi⁶, Arto Mannermaa^122,123,124, Judith Manz^24,26, Sara Margolin¹³⁰, Jonathan Marten²⁹, Nicholas G Martin¹³¹, Corrado Masciullo⁴⁰, Alfons Meindl¹³², Kyriaki Michailidou³, Evelin Mihailov⁶, Lili Milani⁶, Roger L Milne^102,103, Martina Müller-Nurasyid^25,133,134, Michael Nalls¹³⁵, Ben M Neale^15,16,17, Heli Nevanlinna¹³⁶, Patrick Neven¹³⁷, Anne B Newman^138,139,140, Børge G Nordestgaard^67,68, Janet E Olson⁶², Sandosh Padmanabhan¹⁴¹, Paolo Peterlongo¹⁴², Ulrike Peters³⁷, Astrid Petersmann¹⁴³, Julian Peto⁹², Paul D P Pharoah^3,93, Nicola N Pirastu^36,41, Ailith Pirie³, Giorgio Pistis^33,34,35, Ozren Polasek¹²¹, David Porteous⁷⁸, Bruce M Psaty^{27,144,145,146}, Katri Pylkäs^147,148, Paolo Radice¹⁴⁹, Leslie J Raffel^150,151, Fernando Rivadeneira^10,11,43, Igor Rudan⁷⁹, Anja Rudolph¹⁵², Daniela Ruggiero³², Cinzia F Sala⁴⁰, Serena Sanna³³, Elinor J Sawyer¹⁵³, David Schlessinger¹⁵⁴, Marjanka K Schmidt¹⁵⁵, Frank Schmidt¹¹³, Rita K Schmutzler^156,157,158, Minouk J Schoemaker¹⁰⁰, Robert A Scott¹, Caroline M Seynaeve¹¹⁴, Jacques Simard¹⁵⁹, Rossella Sorice³², Melissa C Southey¹⁶⁰, Doris Stöckl²⁶, Konstantin Strauch^25,161, Anthony Swerdlow^100,162, Kent D Taylor⁴², Unnur Thorsteinsdottir^14,50, Amanda E Toland¹⁶³, Ian Tomlinson^81,164, Thérèse Truong^108,109, Laufey Tryggvadottir¹⁶⁵, Stephen T Turner¹⁶⁶, Diego Vozzi³⁶, Qin 
Wang³, Melissa Wellons¹⁶⁷, Gonneke Willemsen⁴⁶, James F Wilson^29,79, Robert Winqvist^147,148, Bruce B H R Wolffenbuttel^168,169, Alan F Wright²⁹, Drakoulis Yannoukakos¹⁷⁰, Tatijana Zemunik¹²¹, Wei Zheng⁵⁸, Marek Zygmunt¹⁷¹, Sven Bergmann^44,45, Dorret I Boomsma⁴⁶, Julie E Buring^8,9, Luigi Ferrucci⁵¹, Grant W Montgomery¹³¹, Vilmundur Gudnason^49,50, Tim D Spector⁴⁷, Cornelia M van Duijn⁴³, Behrooz Z Alizadeh¹⁷², Marina Ciullo³², Laura Crisponi³³, Douglas F Easton^3,93, Paolo P Gasparini^36,41, Christian Gieger^24,25,26, Tamara B Harris⁹⁹, Caroline Hayward²⁹, Sharon L R Kardia³⁸, Peter Kraft^12,173, Barbara McKnight⁵⁶, Andres Metspalu⁶, Alanna C Morrison⁶⁴, Alex P Reiner^37,144, Paul M Ridker^8,9, Jerome I Rotter⁴², Daniela Toniolo⁴⁰, André G Uitterlinden^10,11,43, Sheila Ulivi³⁶, Henry Völzke³⁹, Nicholas J Wareham¹, David R Weir⁹⁵, Laura M Yerges-Armstrong³¹, Alkes L Price¹², Kari Stefansson^14,50, Jenny A Visser¹⁰, Ken K Ong^1,174, Jenny Chang-Claude¹⁵², Joanne M Murabito^5,175, John R B Perry¹, Anna Murray².

Abstract

Menopause timing has a substantial impact on infertility and risk of disease, including breast cancer, but the underlying mechanisms are poorly understood. We report a dual strategy in ∼70,000 women to identify common and low-frequency protein-coding variation associated with age at natural menopause (ANM). We identified 44 regions with common variants, including two regions harboring additional rare missense alleles of large effect. We found enrichment of signals in or near genes involved in delayed puberty, highlighting the first molecular links between the onset and end of reproductive lifespan. Pathway analyses identified major association with DNA damage response (DDR) genes, including the first common coding variant in BRCA1 associated with any complex trait. Mendelian randomization analyses supported a causal effect of later ANM on breast cancer risk (∼6% increase in risk per year; P = 3 × 10−14), likely mediated by prolonged sex hormone exposure rather than DDR mechanisms.

Year: 2015 PMID: 26414677 PMCID: PMC4661791 DOI: 10.1038/ng.3412

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Introduction

Younger age at natural (non-surgical) menopause (ANM) is associated with lower risk of breast cancer, but higher risks of osteoporosis, cardiovascular disease and type 2 diabetes [1]. Early menopause also has a substantial impact on fertility. It is estimated that natural fertility ceases on average 10 years before menopause [2], which is becoming increasingly relevant as women in many populations are delaying childbearing. For example, the birth rate in British women aged 30-34 years is now higher than in any other half decade (http://www.ons.gov.uk/ons/publications/). ANM is on average 51 years in Caucasian populations, while natural menopause before the age of 40, or primary ovarian insufficiency (POI), occurs in 1% of the population [3]. Previous genome-wide association studies (GWAS) identified 18 common genetic loci associated with ANM, implicating several plausible gene candidates across a number of molecular pathways [4,5]. Together, those reported variants explained <5% of the variation in ANM, compared to 21% explained by all common variants on GWAS arrays [4]. We therefore undertook a more comprehensive genetic analysis in a substantially larger sample of nearly 70,000 women, incorporating both common and, for the first time, low-frequency coding variants. We were able to triple the number of independent signals associated with ANM, including two low-frequency coding variants in previously unreported loci. Our findings provide new insights into the causal relationship between ANM and breast cancer and identify molecular overlaps between ANM and puberty timing.

Results

GWAS HapMap 2 meta-analysis

In a combined analysis of up to 69,360 women of European ancestry (Supplementary Table 1), 1,208 SNPs, among a total of ~2.6 million, reached the genome-wide significance threshold (P<5×10−8) for association with ANM. Of these, we identified 54 independent signals located in 44 genomic regions using approximate conditional analysis implemented in GCTA (Figure 1, Table 1, Supplementary Tables 2 and 3). Eight loci contained secondary signals: six loci each contained two signals, and two loci each contained three signals. Across the 54 identified signals, MAFs ranged from 7% to 49%, and effect sizes from 0.07 to 0.88 years per allele with no significant heterogeneity between studies. All of the 18 previously reported independent signals for ANM [4,5] retained directionally concordant genome-wide significance (maximum P=3.7×10−11). These 18 signals were also directionally concordant in a sub-meta-analysis of studies that were not included in the previous publication (P-value range 1×10−30 to 1×10−3). The top 29,958 independent SNPs with association P<0.05 explained 21% (SE 9.7%, P=0.01) of the variance in ANM reducing to 6% (SE 1.6%, P=6.3×10−12) for the top 54 SNPs with P<5×10−8 (Supplementary Table 4). This contrasts with an estimate of 2.6% for the previously identified 18 index SNPs.
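The signal counts reported above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis; the variable names are ours. The interpretation of 5×10−8 as a Bonferroni correction for roughly one million independent tests is the usual convention and is an assumption here, since the text states only the threshold itself.

```python
# Signal bookkeeping for the HapMap 2 meta-analysis, using counts stated in the text.
n_regions = 44
loci_with_two_signals = 6
loci_with_three_signals = 2

# Regions that carry a single signal only.
loci_with_one_signal = n_regions - loci_with_two_signals - loci_with_three_signals

# Total number of independent signals across all regions.
n_signals = (loci_with_one_signal
             + 2 * loci_with_two_signals
             + 3 * loci_with_three_signals)

# Conventional genome-wide threshold: alpha = 0.05 corrected for about one
# million independent common-variant tests (an assumption about the origin
# of the 5e-8 threshold, not something stated in the text).
alpha = 0.05
approx_independent_tests = 1_000_000
gw_threshold = alpha / approx_independent_tests

print(loci_with_one_signal, n_signals, gw_threshold)
```

Under these counts, 36 single-signal regions plus the eight multi-signal regions account for the 54 signals listed in Table 1.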

Figure 1

Miami plot of HapMap and exome SNP associations. Log-transformed P values are shown for association with ANM for SNPs from HapMap 2 (top; pink) and SNPs from the meta-analysis of exome chip data (bottom; blue). Previously known signals are shown in gray, and newly discovered signals are shown in red (HapMap 2) or purple (exome chip and HapMap 2). The yellow lines correspond to genome-wide significant levels in each direction; the gray lines indicate where the y axis has been truncated.

Table 1

54 common HapMap 2 signals at 44 genomic loci

							Univariate Model[5]		Joint Model[6]
Region	Best SNP[1]	Signal SNP[2]	Chr	Position[3]	Alleles[4]	N	Effect	P	Effect	P	Highlighted Gene[7]
1*	rs4246511	rs4246511	1	39,152,972	c/t/0.71	69116	−0.22 (0.02)	5.1E-21	-	-	RHBDL2^(B,N) / MYCBP^(B)
2	rs12142240	rs12142240	1	46,519,888	t/c/0.68	69356	−0.13 (0.02)	6.6E-09	-	-	RAD54L^(B,E)
3	rs1411478	rs1411478	1	179,228,905	a/g/0.41	68680	−0.13 (0.02)	1.4E-10	-	-	STX6^(N,E)
4*	rs2236918	rs2236918	1	240,084,449	c/g/0.45	69332	−0.15 (0.02)	8.3E-14	-	-	EXO1^(N,B,C)
5*	rs704795	rs704795	2	27,569,998	a/g/0.4	69341	−0.16 (0.02)	2.1E-15	-	-	BRE^(B) / GTF3C2^(B,E) / EIF2B4^(B)
6*	rs1800932	rs1800932	2	47,871,585	a/g/0.81	69309	−0.17 (0.03)	3.2E-11	-	-	MSH6^(N,B,E)
7*	rs930036	rs930036	2	171,649,264	a/g/0.38	69357	−0.19 (0.02)	3.1E-19	-	-	TLK1^(N,E,B) / GAD1^(B)
8	rs16858210	rs16858210	3	185,106,704	g/a/0.75	69193	−0.14 (0.02)	3.1E-09	-	-	PARL^(B) / POLR2H^(B)
9*	rs4693089	rs4693089	4	84,592,646	a/g/0.51	69060	−0.20 (0.02)	9.2E-23	-	-	HELQ^(N,B) / FAM175A^(B)
10	rs6856693	rs6856693	4	185,985,800	a/g/0.58	67635	−0.16 (0.02)	9.8E-15	-	-	ASCL1^(N), MLF1IP^(B)
11	rs427394	rs427394	5	6,798,875	g/a/0.41	69284	−0.13 (0.02)	3.8E-09	-	-	PAPD7^(N,B)
12	rs11738223	rs11738223	5	171,867,097	a/g/0.68	69250	−0.12 (0.02)	2.0E-08	-	-	SH3PXD2B^(N)
13a*	rs365132	rs2241584	5	175,888,783	a/g/0.38	69341	−0.14 (0.02)	1.5E-11	−0.14 (0.02)	3.2E-11	UIMC1^(B,E)
13b*	“	rs365132	5	176,311,180	g/t/0.51	69349	−0.24 (0.02)	1.4E-33	−0.24 (0.02)	7.9E-33	UIMC1^(N,B,E)
14a*	rs6899676	rs6899676	6	11,003,246	a/g/0.8	69303	−0.23 (0.03)	2.2E-19	−0.21 (0.03)	6.2E-16	SYCP2L^(N,B) / MAK^(B)
14b*	“	rs9393800	6	11,059,723	g/a/0.27	69124	−0.17 (0.02)	3.5E-13	−0.14 (0.02)	1.1E-09	SYCP2L^(N,B) / MAK^(B)
15a*	rs1046089	rs2230365	6	31,633,427	c/t/0.84	67095	−0.17 (0.03)	7.6E-10	−0.16 (0.03)	2.7E-08	MSH5^(B) / HLA^(B)
15b*	“	rs707938	6	31,837,338	g/a/0.32	68582	−0.17 (0.02)	7.2E-15	−0.16 (0.02)	2.3E-13	MSH5^(B,N,E) / HLA^(B)
16	rs12196873	rs12196873	6	111,704,751	a/c/0.85	69313	−0.16 (0.03)	2.8E-08	-	-	REV3L^(B,C)
17*	rs2720044	rs2720044	8	38,099,744	a/c/0.84	63917	−0.29 (0.03)	7.3E-22	-	-	STAR^(B)
18	rs10957156	rs10957156	8	61,791,955	a/g/0.76	69341	−0.14 (0.02)	4.5E-09	-	-	CHD7^(N,B,E)
19	rs4879656	rs4879656	9	33,002,382	a/c/0.37	68919	−0.12 (0.02)	2.0E-08	-	-	APTX^(N,B,E)
20	rs10905065	rs10905065	10	5,809,833	a/g/0.61	69334	−0.11 (0.02)	3.9E-08	-	-	FBXO18^(B)
21a*	rs11031006	rs11031006	11	30,183,104	g/a/0.85	69309	−0.22 (0.03)	8.5E-14	−0.25 (0.03)	4.0E-17	FSHB^(N,B)
21b*	“	rs6484478	11	30,263,016	g/a/0.74	69099	−0.10 (0.02)	4.0E-05	−0.14 (0.02)	1.0E-08	FSHB^(B)
22	rs10734411	rs10734411	11	32,498,360	a/g/0.47	69142	−0.12 (0.02)	2.6E-09	-	-	EIF3M^(N)
23*	rs2277339	rs2277339	12	55,432,336	g/t/0.1	67603	−0.31 (0.03)	1.8E-19	-	-	PRIM1^(B,N,C,E) / TAC3^(B)
24a	rs12371165	rs3741604	12	64,982,677	t/c/0.52	69100	−0.09 (0.02)	1.9E-05	−0.29 (0.03)	1.8E-21	HELB^(N,B,E,C)
24b	“	rs1183272	12	65,021,688	c/t/0.45	68727	−0.07 (0.02)	7.3E-04	−0.31 (0.03)	3.0E-24	HELB^(B,N,C)
24c	“	rs7397861	12	65,100,733	g/c/0.64	69095	−0.10 (0.02)	6.7E-06	−0.13 (0.02)	4.6E-09	HELB^(B,E,C)
25	rs551087	rs551087	12	119,693,576	g/a/0.29	69001	−0.13 (0.02)	3.9E-08	-	-	SPPL3^(N) / SRSF9^(B)
26	rs1727326	rs1727326	12	122,166,039	c/g/0.15	68870	−0.19 (0.03)	1.7E-09	-	-	KNTC1^(B), PITPNM2^(N)
27	rs12824058	rs12824058	12	129,370,287	g/a/0.43	69047	−0.14 (0.02)	6.1E-11	-	-	PIWIL1^(N)
28*	rs4886238	rs4886238	13	60,011,740	g/a/0.66	69314	−0.18 (0.02)	2.5E-16	-	-	TDRD3^(B,N)
29	rs1713460	rs1713460	14	20,003,455	g/a/0.3	68528	−0.14 (0.02)	2.4E-10	-	-	APEX1^(B) / PARP2^(B) / PNP^(N,E)
30	rs9796	rs9796	15	39,058,739	t/a/0.46	69317	−0.13 (0.02)	1.3E-10	-	-	INO80^(B,N,E) / RAD51^(B)
31*	rs1054875	rs1054875	15	87,680,130	t/a/0.4	69288	−0.19 (0.02)	1.7E-19	-	-	POLG^(B,N) / FANCI^(B,C)
32	rs9039	rs9039	16	9,112,864	c/t/0.28	69341	−0.12 (0.02)	3.3E-08	-	-	C16orf72^(N) / ABAT^(B)
33*	rs10852344	rs10852344	16	11,924,420	t/c/0.59	69346	−0.16 (0.02)	1.3E-15	-	-	GSPT1^(N,C,E) / BCAR4^(B)
34	rs12599106	rs12599106	16	34,355,526	a/t/0.51	69320	−0.12 (0.02)	3.1E-08	-	-	UBE2MP1^(N)
35	rs8070740	rs8070740	17	5,272,620	a/g/0.76	68515	−0.15 (0.02)	1.5E-09	-	-	RPAIN^(N,E)
36	rs2941505	rs2941505	17	35,086,230	a/g/0.32	69302	−0.13 (0.02)	1.9E-09	-	-	STARD3^(B) / PGAP3^(N,E) / CDK12^(B)
37	rs1799949	rs1799949	17	38,498,992	g/a/0.68	69329	−0.14 (0.02)	8.4E-11	-	-	BRCA1^(N,E,B,C)
38	rs349306	rs349306	19	901,694	g/a/0.13	58278	−0.23 (0.04)	1.7E-10	-	-	POLR2E^(B) / KISS1R^(B)
39	rs7259376	rs7259376	19	22,299,545	a/g/0.46	69328	−0.11 (0.02)	4.2E-08	-	-	ZNF729^(N)
40a*	rs11668344	rs11668344	19	60,525,476	g/a/0.36	69329	−0.41 (0.02)	5.5E-85	−0.41 (0.02)	4.2E-84	BRSK1^(B,E) / NLRP11^(N) / U2AF2^(B)
40b*	“	rs2547274	19	61,002,040	g/c/0.91	66580	−0.28 (0.04)	3.4E-13	−0.22 (0.04)	2.7E-08	BRSK1^(B) / NLRP11^(N) / U2AF2^(B)
40c*	“	rs12461110	19	61,012,475	a/g/0.35	68518	−0.17 (0.02)	7.6E-16	−0.15 (0.02)	5.0E-12	BRSK1^(B) / NLRP11^(N,C) / U2AF2^(B)
41a*	rs16991615	rs451417	20	5,889,999	a/c/0.12	65420	−0.20 (0.03)	4.6E-09	−0.20 (0.03)	4.5E-09	MCM8^(N,C,B)
41b*	“	rs16991615	20	5,896,227	g/a/0.93	66210	−0.88 (0.04)	1.6E-89	−0.88 (0.04)	4.4E-89	MCM8^(N,C,B)
42a	rs13040088	rs2236553	20	60,760,188	c/t/0.24	62648	−0.16 (0.03)	6.1E-10	−0.16 (0.03)	4.4E-10	SLCO4A1^(N,C) / DIDO1^(B,E)
42b	“	rs13040088	20	61,019,647	g/a/0.21	69317	−0.16 (0.02)	2.4E-10	−0.16 (0.02)	1.9E-10	SLCO4A1^(C) / DIDO1^(N,B,E)
43	rs5762534	rs5762534	22	26,963,571	t/c/0.84	69322	−0.16 (0.03)	6.1E-09	-	-	CHEK2^(B)
44	rs763121	rs763121	22	37,209,886	g/a/0.36	66632	−0.16 (0.02)	2.3E-13	-	-	DMC1^(B) / DDX17^(N,E,B)

[1] Best regional SNP selected by 1 Mb distance-based clumping.

[2] Lead independent SNP(s) in region selected through approximate conditional analysis.

[3] Position in build 36.

[4] Effect allele / other allele / effect allele frequency.

[5] Univariate test statistics reported from the primary meta-analysis (i.e. no conditional analysis).

[6] Test statistics derived from the joint model for regions containing more than one statistically independent SNP.

[7] Highlighted gene in region based on the following criteria: (N) = nearest, (B) = biological candidate, (E) = eQTL effect, (C) = non-synonymous SNP in high LD. Genes categorised as “DDR” are shown in bold.

* Denotes a region previously described at genome-wide significance.

We assessed functional enrichment of all ANM-SNP associations in regions containing active histone marks across 10 physiological cell-type groups using stratified LD score regression [6] (see Methods and Supplementary Table 5). Only the ‘kidney related cell types’ group showed significant enrichment (P=0.003), which could reflect the mesonephric embryonic origin of ovarian parenchymal cells [7]. Analysis by functional annotation revealed the strongest enrichment for variants located in UCSC defined coding regions (Supplementary Table 5), with ~1.5% of SNPs explaining 24.8% of the trait heritability (P=4.6×10−3). The heritable component increased to 55% (SE 11%, P=2.9×10−7) when a 500bp window was added to the coding regions, capturing ~6.5% of SNPs.
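In stratified LD score regression, enrichment for an annotation is conventionally summarised as the proportion of heritability it explains divided by the proportion of SNPs it contains. The minimal sketch below applies that ratio to the rounded percentages quoted above; the function name is ours, and because the inputs are approximate (~1.5% and ~6.5% of SNPs) the resulting fold values are indicative rather than exact.

```python
def enrichment(prop_h2, prop_snps):
    """Fold enrichment: share of heritability divided by share of SNPs."""
    if prop_snps <= 0:
        raise ValueError("prop_snps must be positive")
    return prop_h2 / prop_snps

# Rounded values taken from the text above.
coding = enrichment(0.248, 0.015)        # UCSC-defined coding regions
coding_500bp = enrichment(0.55, 0.065)   # coding regions plus a 500 bp window

print(round(coding, 1), round(coding_500bp, 1))
```

On these figures, coding SNPs carry roughly 16.5 times their proportional share of heritability, and the wider window roughly 8.5 times.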

Exome array meta-analysis

To estimate the contribution of low-frequency coding variation to ANM, we performed a meta-analysis of up to 39,026 women genotyped on exome arrays (Supplementary Table 6). Only one signal, from two highly correlated (r2=0.73, D’=1) low-frequency missense variants in HELB, reached genome-wide significance in this discovery phase (Table 2, Figure 1, Supplementary Table 7). Ten low-frequency (MAF<5%), non-synonymous SNPs with association P<5×10−4 were selected for follow-up in an independent sample of 10,157 women from the deCODE study, in which rare variant genotypes were imputed. Two of the 10 variants failed QC; directionally concordant effect estimates were observed for 6 of the remaining 8. The combined analysis identified missense alleles in HELB (rs75770066, MAF=3.6%, beta=0.85 years/allele, P=1.2×10−31) and SLCO4A1 (rs140267842, MAF=0.8%, beta=0.79 years/allele, P=1.6×10−8) as associated with ANM (Table 2, Supplementary Table 7 and Supplementary Figure 1).

Table 2

Results of the exome chip meta-analyses

SNP	Band	Gene	Amino acid change	Minor/common allele	Analysis	MAF (%)	Effect (SE) of minor allele in years	p-value	n	Heterogeneity p-value
rs75770066	12q14.3	HELB	p.Asp506Gly	G/A	Discovery	3.6	0.91 (0.08)	1.79E-32	39,026
					Replication	1.7	0.32 (0.24)	0.171	10,157
					Combined	3.4	0.85 (0.07)	1.17E-31	49,183	0.050

rs148126992	12q14.3	HELB	p.Glu522Asp	C/G	Discovery	2.5	1.03 (0.09)	2.96E-30	38,707
					Replication	0.1	2.16 (1.75)	0.216	10,157
					Combined	2.5	1.04 (0.09)	1.69E-30	48,864	0.116

rs140267842	20q13.33	SLCO4A1	p.Val263Ile	A/G	Discovery	0.8	0.80 (0.16)	5.58E-07	39,026
					Replication	1.2	0.73 (0.28)	8.60E-03	10,157
					Combined	0.9	0.79 (0.14)	1.60E-08	49,183	0.241

Notes:

Amino acid change is from the amino acid coded by the common allele to the amino acid coded by the minor allele.

Significant p-values are in bold.
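The "Combined" rows of Table 2 can be approximated from the discovery and replication rows with a fixed-effect, inverse-variance-weighted (IVW) meta-analysis. The sketch below is an illustration under that assumption; the text does not state which weighting scheme was used, so this is not necessarily the authors' method, and the function name `ivw_meta` is ours. Inputs are the rounded effects and standard errors from Table 2.

```python
import math

def ivw_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    estimates: list of (beta, se) pairs.
    Returns (beta, se, z, two-sided normal p-value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under normality
    return beta, se, z, p

# (effect in years, SE) for discovery and replication, from Table 2.
helb = ivw_meta([(0.91, 0.08), (0.32, 0.24)])      # rs75770066
slco4a1 = ivw_meta([(0.80, 0.16), (0.73, 0.28)])   # rs140267842

print(helb[:2], slco4a1[:2])
```

With these rounded inputs the pooled effects come out at about 0.85 and 0.78 years per allele, close to the 0.85 and 0.79 reported in the combined rows; small differences are expected from rounding and from any difference in the weighting actually used.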

HELB is a DNA helicase that unwinds DNA during replication, transcription, repair and recombination. SLCO4A1 (solute carrier organic anion transporter family, member 4A1) transports organic anions such as thyroid hormones and estrone-3-sulfate. Both exome array signals in HELB and SLCO4A1 were located in ANM loci newly identified by our parallel HapMap 2 GWAS meta-analysis. At HELB, the association of the common index SNP, rs12371165, was fully explained by associations at the two rare exome chip SNPs, which are in high LD with each other (r2=0.73, D’=1) (Figure 2). In contrast, the three independent signal SNPs identified through GCTA were not explained by the rare variant(s) (Supplementary Table 8). It thus appears that there are at least two non-redundant signals at this locus, and future fine-mapping experiments will be required to fully elucidate the number of independent causal variants. Functional studies have shown that substitution of aspartate by a non-polar residue at amino acid 506 of HELB affects binding of HELB to Replication Protein A (RPA) [8]. At SLCO4A1, all three variants (the common index SNP, the second signal from GCTA and the exome chip variant) appeared to reflect non-redundant signals, such that the association of each with ANM was unaffected by the presence of either of the others (Supplementary Table 8).

Figure 2

Multiple signals at HELB and relationship to DNA helicase B protein sequence. Positions are given in Build 37 coordinates of the reference genome. The top signal from the exome chip analysis maps to an acidic motif of DNA helicase B and results in the replacement of an acidic aspartate residue by a nonpolar glycine residue. Concurrent alteration of three acidic amino acids (including the aspartate residue identified by the exome chip analysis) to nonpolar residues has been shown to reduce RPA binding [8].

ANM SNPs strongly enriched in DNA damage-response pathways

Pathway analyses using MAGENTA and GRAIL indicated substantial enrichment of GWAS SNP associations in DNA damage response (DDR) pathways (Supplementary Tables 9 and 10). Seven of the 10 ANM pathways identified by MAGENTA at study-wise significance were involved in DDR, with the highest enrichment in the PANTHER-defined ‘DNA Repair Pathway’ (P=1×10−6). After annotating likely causal genes at each locus, we found that 29 of the 44 GWAS-highlighted regions contained one or more DDR genes within 500 kb (Table 1). At 18 of these 29 regions, the DDR candidate was either the nearest gene or the signal was associated with expression of a DDR gene at the locus. The top SNP at GWAS Signal #37 (Table 1) was highly correlated (r2>0.95) with four common non-synonymous variants in BRCA1 [rs1799966, rs16942, rs16941, rs799917], none of which is listed in HGMD as a known breast cancer susceptibility variant and all of which are listed as “not clinically important” on the Breast Information Core (http://research.nhgri.nih.gov/bic/). In our exome array data, no low-frequency coding variants in BRCA1 were associated with ANM (P>0.05). Signal #37 was an eQTL for BRCA1 in multiple tissues, including blood, skin, adipose and brain (Supplementary Table 11). There were 15 ANM signal genes that STRING analysis identified as having at least one direct link to BRCA1 (Supplementary Table 12, Supplementary Figure 2). Of these, there is experimental evidence that 7 code for direct binding partners of BRCA1: BRE (Signal #5), MSH6 (Signal #6), POLR2H (Signal #8), FAM175A (Signal #9), UIMC1 (Signal #13), RAD51 (Signal #30), and CHEK2 (Signal #43). While many of the DDR genes highlighted are involved in homologous recombination for repair of double strand breaks, such as the BRCA1 pathway, other mechanisms of repair are also represented, e.g. mismatch repair (MSH5, MSH6) and base excision repair (APEX1, PARP2) (Figure 3).
Two genes act as DNA damage checkpoints (CHEK2 and BRSK1); others are involved in the cellular response to damage, such as cell cycle arrest, DNA replication, transcription control and apoptosis (Figure 3). CHEK2 is a well-known breast cancer-associated gene [9], but the ANM-associated signal was not in LD with the 1100delC variant associated with breast cancer (r2<0.01).

Figure 3

Classification of genes identified as being involved in the DNA damage response, at genetic loci associated with ANM.

ANM SNPs enriched in known POI genes

In addition to the DDR pathways, MAGENTA analyses also identified a four-fold enrichment of ANM GWAS SNP associations located in/near a set of 31 genes reportedly associated with monogenic primary ovarian insufficiency (Supplementary Tables 13 and 14). Four of our genome-wide significant hits were located in or near reported POI genes. Autosomal recessive mutations in MCM8 (Signal #41) cause primary amenorrhea, hypothyroidism, and hypergonadotropic hypogonadism [10]. Recessive mutations in EIF2B4 (Signal #5) cause ovarioleukodystrophy with vanishing white matter syndrome [11]. POLG (Signal #31) mutations have been linked to POI in isolation or associated with other neurologic conditions [12]. Mutations in MSH5 (Signals #15a and #15b) have been associated with various human diseases including POI [13]. In addition, TDRD3 (Signal #28) is a primary binding partner of FMR1, in which triplet repeat premutations are a risk factor for POI [14]. We saw no significant enrichment of ANM signals in our wider panel of ovarian function genes (Supplementary Tables 13 and 15).

Genetic correlation of ANM with other traits/diseases

We searched the GRASP database [15] and NHGRI catalogue (http://www.genome.gov/gwastudies/) for pleiotropy between ANM signals, or their proxies (r2>0.5), and other GWAS traits (Supplementary Table 16). The top overlapping signals were for liver enzymes, lipids, urate, height and fasting glucose (P<10−10 for association of ANM SNP/proxy and second trait). We found no overlap with any autoimmune traits and only a very weak link with any cancer (upper airway tract cancer, P=1×10−8). To test the relationship between ANM and other health outcomes more broadly, we performed cross-trait LD score regression to estimate genetic correlation with 53 published GWAS meta-analyses (Supplementary Table 17). Adult obesity ranked highest in this analysis, with a negative trait correlation (rg=−0.15, P=0.0004) and supporting evidence from other growth/anthropometric traits, including age at menarche (rg=0.14, P=0.003), BMI (rg=−0.13, P=0.003), BMI in women but not men (P=0.002 vs 0.17), waist circumference in women but not men (P=0.009 vs 0.29) and WHR in men but not women (P=0.03 vs 0.27). Other nominally significant associations include HDL (rg=0.14, P=0.02) and current/former smoking status (rg=0.20, P=0.04), both of which are supported by epidemiological observations [16]. To elucidate the causal directions between these traits, we performed bi-directional Mendelian randomisation (MR) analyses on ANM with both age at menarche and BMI. We were unable to resolve the causal direction with BMI (BMI to ANM: Pscore=0.668, Supplementary Table 18; ANM to BMI: PBinomial=0.683, Supplementary Table 19). However, the 123 reported menarche SNPs collectively predicted ANM in the expected direction (Pscore=0.0005, Supplementary Table 20), whereas the ANM SNP score was not associated with age at menarche (Pscore=0.571, Supplementary Table 21).
We further explored the nature of this shared genetic architecture by testing for enrichment of all ANM-associated SNPs in/near genes implicated in monogenic or polygenic puberty timing [17]. Significant enrichment was found with the monogenic set (P=0.01), underscored by ANM-associated SNPs in/near five genes reportedly causal for hypogonadotrophic hypogonadism (KISS1R, TAC3, CHD7, SOX10 and FGFR1) (Supplementary Table 22).

ANM variants demonstrate causal link with breast cancer

Given the overwhelming enrichment of DDR genes and known epidemiological associations between ANM and breast cancer risk [18], we tested the causal relationships between these traits using a Mendelian randomization approach [19]. Across the 56 ANM SNPs (54 HapMap 2 + 2 exome) there was a positive correlation between the effect sizes on ANM and the effect sizes for risk (logORs) of breast cancer (in 46,347 breast cancer cases and 41,736 controls from the Breast Cancer Association Consortium (BCAC); r=0.67, P=2.25×10−8). A polygenic risk score comprising numbers of ANM-increasing alleles at the 56 SNPs, weighted by the effect size on ANM, was positively associated with breast cancer risk; each one-year increase in genetically predicted ANM was associated with higher breast cancer risk (OR=1.064 (1.050-1.081), P=2.78×10−14; Supplementary Figure 3). This effect size is larger than that reported by the largest pooled analysis of observational epidemiological studies (OR=1.030 (1.026-1.034)) [18]. All of the women in the GWAS from the BCAC study were also included in the Mendelian randomization (MR) study (N=14,884, ~14% of total MR study). To confirm that this overlap did not bias our results, we conducted two analyses. Firstly, a sensitivity analysis tested the effect on breast cancer of 18 previously identified ANM SNPs, which were identified from a meta-analysis that did not include BCAC cases, and a similar effect estimate was observed (OR 1.062 [1.033-1.101], P=1.58×10−7). Secondly, the reverse analysis tested 63 SNPs with independent robust associations with breast cancer [20], and found no association between these breast cancer signals and ANM (Pscore >0.05), which reduces the likelihood of case-ascertainment bias in our discovery meta-analysis (Supplementary Table 23).
Stratified analyses revealed significantly larger effect estimates for the ANM risk score in ER-positive vs ER-negative breast cancer cases (OR=1.07 (1.05-1.10), P=1.73×10−12 vs OR=1.03 (1.00-1.07), P=0.043; P=0.0086 for the case-only analysis) and in women aged ≥55 vs ≤45 years (OR=1.06 (1.04-1.10), P=2.23×10−7 vs OR=1.00 (0.97-1.05), P=0.95; case-only P=2.30×10−5). Consideration of DDR vs non-DDR linked SNPs in the polygenic risk score also produced discordant effect estimates (OR 1.05 [1.03-1.08], P=1.06×10−7 vs OR 1.12 [1.06-1.21], P=7.84×10−10, respectively; Phet=0.01), a difference which was further reinforced in the age-stratified analyses (Supplementary Figure 3 and Table 24). Furthermore, the lack of association between ANM risk scores and risk of prostate cancer in men (in 25,074 cases and 24,272 controls) (P=0.36, Supplementary Table 25) provides no evidence to support an effect of ANM-related DDR mechanisms on other cancer risks. We therefore surmise that ANM genetic variants influence breast cancer risk primarily through variation in menopause timing.
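The logic of the Mendelian randomization estimate can be illustrated with summary statistics. The sketch below uses an inverse-variance-weighted (IVW) regression through the origin of per-SNP breast cancer log odds ratios on per-SNP ANM effects. This is a closely related, commonly used summary-data estimator, substituted here for illustration; it is not the weighted polygenic risk score analysis described above. The function name `ivw_mr` and all per-SNP values in `toy_snps` are hypothetical and are not taken from the study.

```python
import math

def ivw_mr(snps):
    """Inverse-variance-weighted MR slope through the origin.

    snps: list of (beta_exposure, beta_outcome, se_outcome) per SNP, where
    beta_exposure is years of ANM per allele and beta_outcome is the
    breast cancer log odds ratio per allele.
    Returns (slope, se_slope); the slope is the log OR per year of later ANM.
    """
    num = sum(bx * by / se ** 2 for bx, by, se in snps)
    den = sum(bx ** 2 / se ** 2 for bx, _, se in snps)
    slope = num / den
    se_slope = math.sqrt(1.0 / den)
    return slope, se_slope

# Hypothetical per-SNP summary statistics (NOT from the paper): each outcome
# effect is set to 0.06 times the exposure effect, so the slope is 0.06.
toy_snps = [
    (0.10, 0.006, 0.010),
    (0.20, 0.012, 0.012),
    (0.40, 0.024, 0.015),
    (0.88, 0.0528, 0.020),
]

slope, se_slope = ivw_mr(toy_snps)
odds_ratio_per_year = math.exp(slope)
print(slope, se_slope, odds_ratio_per_year)
```

An odds ratio per year obtained in this way is on the same scale as the reported OR of 1.064 per one-year increase in genetically predicted ANM, which corresponds to the ~6% increase in risk per year quoted in the abstract.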

Discussion

Our study represents a largely expanded genetic discovery effort for ANM, both in terms of increased sample size and breadth of variation tested. By more than doubling the GWAS sample size we have increased the number of loci robustly associated with the trait three-fold. In addition, we assessed the role of low-frequency protein coding variation using exome genotyping arrays. This approach identified the first such variants of large effect for ANM, implicating both HELB and SLCO4A1 in the aetiology of reproductive ageing. Both of these regions contain common variants we identified in parallel, producing “synthetic associations” at the HELB locus [21].

Our analyses suggest a far more substantial role for DNA damage response processes in ovarian ageing than originally estimated. Both manual assessment and formal computational approaches identified an overwhelming excess of DDR genes mapping to the 44 GWAS loci, possibly explaining up to approximately two-thirds of the associations. Despite the limitations of our GWAS approach in definitively mapping SNPs to genes, 19/44 loci contained signal SNPs where plausible DDR candidates were either the closest gene or linked via altered expression levels to the associated variant. This level of enrichment is comparable to that observed in GWAS meta-analyses of several cancers [22,23]. A notable inclusion in our list of DDR annotated genes was BRCA1, which was the nearest gene, linked as an eQTL and contained multiple non-synonymous SNPs in high LD with the lead index SNP. Although rare loss of function alleles are well studied in the context of cancer predisposition, coding variants in BRCA1 are generally regarded as neutral and have not been previously mapped to any complex trait or disease, including breast cancer. Titus et al. have shown that BRCA1 expression decreases in human ovaries with age and that reduced Brca1 expression in mouse models leads to reduced ovarian reserve [24].
This is consistent with our data, where the ANM-lowering allele reduces expression in blood. BRCA1 directly inhibits a functional interaction with oestrogen receptor α and thus BRCA1 variants could also affect ANM through altered oestrogen signalling [25]. Of the 34 DDR genes highlighted in Table 1, 15 have experimental links to BRCA1, three of which form part of the BRCA1-A complex: BRE (BRCC45), FAM175A (Abraxas) and UIMC1 (RAP80). While dispensable for BRCA1’s major tumour suppressive role in promoting DNA double-strand break repair by homologous recombination (HR), the BRCA1-A complex components RAP80 and Abraxas are actually involved in counteracting this activity, restricting BRCA1-dependent HR to appropriate levels [26]. Similarly, the DNA helicase Fbh1 (FBXO18; Signal #20) negatively regulates HR [27,28]. While HR is essential for cell viability, such anti-recombinase activities are also important for maintaining genome stability, and failure of this regulation is associated with inappropriate recombination events and the accumulation of toxic recombination intermediates, DNA repair activities associated with driving translocations, loss-of-heterozygosity, and chromosomal abnormalities [29].

Double strand break repair is an important response to metabolic and environmental damage to DNA, but is also a key process in meiosis for resolving recombination events. Aberrant meiotic recombination is known to cause meiotic arrest and affect the viability of oocytes. Menopause occurs when the number of oocytes in the ovary falls below a threshold number (approx. 1000) and thus processes that affect the size of the oocyte pool will affect timing of menopause. Recent studies have shown that recessive mutations in both MCM8 and MCM9 result in genomic instability, caused by a deficiency in double strand break repair, which has a devastating effect on the oocyte pool, causing POI [10,30].
MCM8 is one of the genes highlighted in our study (Signal #41) and a further 12 are also involved in homologous recombination repair, including two which are specific for meiotic repair (MSH5 and DMC1 (DNA meiotic recombinase 1)). Thus double strand break repair during recombination at meiosis appears to be a major mechanism by which oocyte numbers are regulated, thus determining depletion of the oocyte pool and ANM. In this study, however, the repair mechanisms highlighted are not confined to homologous recombination repair; mismatch repair and base excision repair are also implicated, as well as mitotic repair and repair checkpoints. Thus it appears that the mechanisms are not confined to repair of meiotic cross-overs, but more general mechanisms are also involved. Seven million oogonia are produced during fetal development by mitosis. Inefficient repair of DNA damage during these mitotic events could result in apoptosis and thus a reduction in the initial oocyte pool. Loss of oocytes throughout female life is predominantly by atresia rather than ovulation. It is likely that oocytes are particularly sensitive to DNA damage due to the prolonged state of cell cycle arrest, lasting up to 50-60 years. Thus aberrant repair throughout life could affect the rate of atresia and thus ANM.

Several of the genes highlighted in our study are robust cancer predisposition genes, e.g. BRCA1, CHEK2 and MSH6. Additionally, BCAR4 and STARD3 have been linked with breast cancer predisposition. However, common susceptibility variants have not been mapped to any of these genes through GWAS approaches for any cancer [www.genome.gov/gwastudies/]. Patients with known pathogenic BRCA1 breast cancer predisposition mutations have been reported to have lower ANM [31], although other studies have failed to replicate these findings [32]. We found that carrying higher numbers of ANM-increasing variants was associated with increased breast cancer risk. 
This was consistent with (indeed slightly larger than) the observed epidemiological association. Our Mendelian randomisation approach indicates a causal relationship between ANM and breast cancer risk, with prolonged oestrogen and/or progesterone exposure likely to be the mechanism [33]. Consistent with this, the effect size was greater for ER-positive than ER-negative breast cancer. At first sight, this observation might appear paradoxical given the enrichment of DDR genes associated with menopause. However, we noted that the association between ANM variants and breast cancer risk was weaker for those in/near DDR genes than those in the non-DDR set. This raises the possibility that the DDR variants that reduce menopausal age do modestly increase breast cancer risk, but this is counterbalanced by the larger effect due to altered hormonal exposure. Alternatively, it is possible that variants in the non-DDR set may have a residual effect on breast cancer risk through hormonal or other mechanisms, or that both mechanisms could play a role (Supplementary Figure 4). BRCA1 mutations are known to be risk factors for prostate cancer [34] and yet we found no association with prostate cancer predisposition for the ANM variants, supporting the hypothesis that the breast cancer association is mediated via menopause and not a direct effect of the DDR variants. That the effect of the ANM polygenic risk score on breast cancer risk was larger than that predicted from observational studies might indicate measurement error in the reporting of age at menopause or residual negative confounding in epidemiological studies; in either case, the Mendelian randomisation analysis performed here using the polygenic risk score as an instrumental variable can give a more accurate estimate of the effect of age at menopause on breast cancer risk. 
Such measurement error would also be present in the studies in the ANM GWAS from which the polygenic risk score weights were derived, hence the ‘true’ effect of later menopause on breast cancer risk may actually be larger even than the ~6% increase in risk/year predicted here.

Our findings provide novel evidence for a neural influence on the timing of ovarian follicular ageing. Until now, it has been considered that hypothalamic/pituitary activity in relation to the menopause is simply secondary to the loss of feedback inhibition by ovarian hormones [35]. We identified five ANM loci containing genes reported to be causal for hypogonadotrophic hypogonadism. Of these, monogenic disruption of three (CHD7, FGFR1 and SOX10) causes Kallmann syndrome, characterized by anosmic hypogonadotrophic hypogonadism due to failure of embryonic migration of GNRH secreting neurons from the olfactory bulb to the hypothalamus [36]. In addition, KISS1R (GPR54) encodes the receptor for kisspeptin, a key hypothalamic activator of the reproductive hormone axis, and TAC3 encodes neurokinin B, which is highly expressed in hypothalamic neurons that also express kisspeptin and promotes the pulse frequency of luteinising hormone (LH) secretion from the pituitary. A possible central influence on ovarian ageing is also supported by the ANM locus in/near FSHB (which is reportedly also associated with circulating FSH levels). Alternatively, recent studies have identified expression of TAC3, KISS1R and kisspeptin in ovarian granulosa cells [37], suggesting peripheral actions of these neuropeptides and their receptors [38]. Indeed, GPR54-haploinsufficiency in mice leads to progressive oocyte and follicle loss without affecting gonadotropin secretion [38]. Regardless of their site of action, our findings indicate several mechanisms that could link the regulation of puberty to ANM, and therefore impact both the start and end of the female reproductive lifespan. 
In summary, our findings suggest a surprisingly narrow range of biological pathways governing ANM, highlighting a substantial role for DNA damage response in the aetiology of ovarian ageing. We demonstrate the utility of genetics to inform epidemiological observations, revealing shared biological pathways linking puberty timing, breast cancer and reproductive ageing.

Online Methods

Menopause data collection

ANM was self-reported and defined as the age at last naturally occurring menstrual period followed by at least 12 consecutive months of amenorrhea. Recall bias/error for ANM may have reduced our power to detect associations, but would be unlikely to introduce systematic error. We assessed this issue in our previous meta-analysis and found no significant differences in effect estimates when considering retrospective versus prospective studies [4]. We included women with ANM 40–60 years in our analyses, excluding those with menopause induced by hysterectomy, bilateral ovariectomy, radiation or chemotherapy, and those using hormone replacement therapy (HRT) before menopause (Supplementary Table 1). Within each of the included studies, each participant provided written informed consent and the study protocol was approved by the Institutional Review Board at the parent institution.

GWAS

A total of 33 studies contributed genome-wide association data using self-reported ANM (Supplementary Table 1). One of the 33 studies was from the Breast Cancer Association Consortium (BCAC), comprising 17 separate studies with menopause data, genotyped using an Illumina iSelect array (iCOGs) [20]. This resulted in a maximum total sample of 69,360 individuals of European descent. Studies were asked to use the full imputed set of HapMap Phase 2 autosomal SNPs, and to run an additive model including top principal components and study specific covariates. In some cases, studies submitted data using 1000 Genomes based imputation; in these cases SNPs not included in the HapMap 2 set were removed. Once data were submitted, each study was quality controlled centrally according to standard QC protocols independently by two analysts. SNPs were filtered out if the minor allele frequency (MAF) was less than 1%, or if the imputation quality metrics were low (imputation quality<0.4). Studies and SNPs passing QC were combined using an inverse-variance weighted meta-analysis, implemented using METAL [39]. Again, this meta-analysis was run by two analysts independently, who then separately used PLINK clumping commands [40] to identify the most significant SNPs in associated regions (termed “Index SNPs”), using only those SNPs which had data from more than 50% of the studies. SNPs were considered genome-wide significant if p<5×10−8 (p of 0.05 Bonferroni corrected for a million tests). Comparisons were made to ensure concordance of the identified signals between the two independent analysts.
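The QC filter and the fixed-effects inverse-variance weighted combination described above can be sketched as follows. This is a minimal illustration of the general technique, not the METAL implementation used in the study; the function names `passes_qc` and `ivw_meta` are hypothetical, and only the thresholds (MAF of 1%, imputation quality of 0.4, and 0.05 corrected for a million tests) come from the text.

```python
import math

# Genome-wide significance: 0.05 Bonferroni-corrected for one million tests.
GENOME_WIDE_P = 0.05 / 1_000_000  # 5e-8

def passes_qc(maf, imputation_quality, min_maf=0.01, min_quality=0.4):
    """Return True if a SNP survives the MAF and imputation-quality filters."""
    return maf >= min_maf and imputation_quality >= min_quality

def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one SNP.

    betas, ses: per-study effect estimates and standard errors.
    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p
```

In such a scheme a SNP would be flagged as genome-wide significant when the pooled `p` falls below `GENOME_WIDE_P`; more precisely estimated studies contribute proportionally more to the pooled effect.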

Exome chip

Exome genotyping data were analysed for 22 studies of European ancestry, with questionnaire data on ANM (Supplementary Table 6). Genotype calling was performed using the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) joint calling protocol, including X chromosome variants. Each contributing study carried out study-level analysis in the R-packages skatMeta or seqMeta using the skatCohort command with the top genetic principal components included in the model and alleles coded according to a common reference file (SNPInfo_HumanExome-12v1_rev5.tsv.txt from http://www.chargeconsortium.com/main/exomechip) [41]. Following data submission, two data analysts carried out checks to ensure consistency of allele coding. We carried out a single variant meta-analysis in METAL [39], giving a total sample size of 39,026, with associations considered significant if p<5×10−8. Variants were put forward for replication in the deCODE study (n=10,157) if they were present in more than half of studies in the discovery stage and had p<5×10−5 if MAF was less than 1% or p<5×10−4 if MAF was 1–5%.
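The rule for putting exome variants forward for replication can be written as a small predicate. This is a sketch under stated assumptions: the function name and arguments are hypothetical, and the text gives no threshold for variants with MAF above 5%, so the sketch simply does not select them.

```python
def selected_for_replication(p, maf, n_studies_present, n_studies_total):
    """Apply the MAF-dependent p-value thresholds described in the text."""
    # The variant must be present in more than half of the discovery studies.
    if n_studies_present <= n_studies_total / 2:
        return False
    if maf < 0.01:
        return p < 5e-5
    if maf <= 0.05:
        return p < 5e-4
    # Assumption: no rule is stated for MAF > 5%, so nothing is selected here.
    return False
```

The stricter threshold for the rarest variants reflects the tiering stated in the text; how boundary cases (MAF exactly 1% or 5%) were handled is not specified, and the choice above is illustrative.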

Selection of independent signals / conditional analysis

Independent signals (termed “Signal SNPs”) for ANM were identified using approximate conditional analysis implemented in the GCTA software package [42]. Linkage disequilibrium (LD) between variants was estimated using three independently genotyped studies as reference panels - the Rotterdam Study I (N=5,974) and two EPIC-InterAct datasets (N=7,397 and N=9,294); these comprised males and females of European ancestry with GWAS data imputed using CEU haplotypes from HapMap 2. We assumed zero correlation between SNPs more than 10 Mb apart or on different chromosomes. We considered independent signals to be those observed by at least two of the three LD reference panels and located in a 10 Mb region that contained a genome-wide significant SNP based on univariate test statistics. We assessed the independence between exome array and HapMap 2 signals by performing formal conditional analyses in the Women’s Genome Health Study (WGHS, N=11,664). Regression was performed including all significant index SNPs in additive models, including the same study covariates as used in the primary analysis. LD computation in Haploview [43] used experimental genotypes where possible (the rare exome chip variants and the common variants rs3741604 and rs2236553), but HapMap 2 imputed genotypes for the other common variants (MaCH v. 1.0.16, all Rsq>0.99).

Gene identification

At each locus identified by the GWAS meta-analysis, we annotated the likely causative gene(s) (Supplementary Table 3) using the following criteria: the gene was identified by at least one of the gene prioritisation/pathway programs (GRAIL or STRING); the top SNP or a proxy (r2>0.8) was an eQTL in one of 108 tissues; or the top SNP or a proxy (r2>0.8) was a coding variant (Supplementary Tables 9-12, 26, 27, Supplementary Figure 5). In case of overlap between the results of the GWAS and exome analyses, the gene indicated by the exome array analysis was chosen. Further manual annotation was used to select additional likely candidates based on known biology (e.g. monogenic primary ovarian insufficiency) or biology highlighted by hypothesis-free pathway testing (Supplementary Table 15). If no candidate was identified by these methods, the nearest gene was chosen. GRAIL is a literature-based text mining program used to suggest the most likely causal gene at each locus [44], controlling for gene size and without any seed regions. A GRAIL p-value < 0.05 was taken to indicate a suggested causal gene (Supplementary Table 9). All genes located within 500kb of the top SNP at each locus were assessed using the STRING program (http://string-db.org/), which was used to highlight any connectivity between genes in different regions (Supplementary Table 12).

Expression quantitative trait loci (eQTL)

Each independent SNP signal was assessed in over 100 separate eQTL datasets (Supplementary methods and Table 11 for details [45]). If an independent signal SNP was in high LD (r2>0.8; using SNAP http://www.broadinstitute.org/mpg/snap/) with the most significant signal for an eQTL, then the eQTL gene was highlighted as a potential causal candidate. The collected eQTL results met criteria for statistical thresholds for association with gene transcript levels as described in the original papers.

Pathway identification

We tested for signal enrichment across 2,580 pre-defined biological pathways in GO, KEGG, Ingenuity, Panther, Reactome and Biocarta with MAGENTA [46], applied to the full HapMap Phase 2 imputed meta-analysis (Supplementary Table 10). Analysis was performed using the same default settings as described in our previous paper [4], with study-wise significance declared at an FDR<0.05. In addition to these pre-defined pathways, we also tested four custom pathways comprising genes involved in POI (N=31), ovarian function (N=130), monogenic disorders of puberty (N=21) and age at menarche (N=154) (Supplementary Tables 13-15, 22).

Estimating variance explained by SNP sets

An estimate of the total variance explained by highlighted ANM SNPs was calculated using REML (restricted maximum likelihood) implemented in GCTA [42]. Using individual level data from the EPIC-InterAct cohort (N=1,761), we calculated the attributable variance for the genome-wide significant SNPs and at varying significance thresholds (5 × 10−7, 5 × 10−6, 5 × 10−5, 5 × 10−4, 0.005, 0.05, and all SNPs passing QC) obtained from a repeated meta-analysis excluding EPIC-InterAct. We used stratified LD score regression to quantify evidence of functional enrichment specific to groups of cell types [6]. We used the same baseline model as in Finucane et al. [6], which comprises 53 overlapping categories including basic annotations such as coding, UTR, promoter, and intron, as well as several histone marks, DNase I Hypersensitivity Site (DHS) regions, chromHMM predictions [47], regions that are conserved in mammals [48], super enhancers [49], and FANTOM5 enhancers [50]. We evaluated enrichments for each of these non-cell-type specific categories. We then took 230 cell-type-specific annotations in four histone marks, H3K4me1, H3K4me3, H3K9ac [51] and H3K27ac [52] (Supplementary Table 5), and grouped them into 10 cell-type groups (adrenal/pancreas; central nervous system; cardiovascular; connective/bone; gastrointestinal; immune/hematopoietic; kidney; liver; skeletal muscle; other) [6]. We added each cell-type group to the baseline model one at a time and measured the p-value of the resulting LD Score regression coefficient of the cell-type group using the --h2 flag in ldsc (https://github.com/bulik/ldsc) with LD Scores from 1000 Genomes Europeans [http://www.1000genomes.org/]. We ranked the cell-type groups by whether the per-SNP heritability in the ‘functional’ annotation was larger than the per-SNP heritability outside this annotation, controlling for the other annotations in the baseline model.

Breast and prostate cancer Mendelian Randomisation (MR)

To assess the association of the ANM SNPs with breast cancer risk, we used breast cancer cases (n=46,347) and controls (n=41,736) of European ancestry from 41 studies in the BCAC, who had been genotyped using a custom Illumina Infinium array (iCOGS). Following standard quality control exclusions (as described in [20]) genotypes were available for 199,961 SNPs. Further genotypes were imputed in a two-stage procedure using SHAPEIT and IMPUTEv2 [53] with the 1000 Genomes Project March 2012 release as the reference dataset [54], giving ~11.6 million SNPs with imputation r2>0.3 and MAF>0.005. The 4,747 breast cancer cases and 7,285 controls in the BCAC dataset for whom ANM information was available had also been included in the ANM GWAS analysis. The genotypes or imputed genotype dosages for the 56 significant SNPs in Tables 1 and 2 were used to construct a polygenic risk score for each breast cancer case and control, such that for the ith woman

$$\mathrm{PRS}_i = \sum_{j=1}^{56} \beta_j G_{ij}$$

where βj is the ANM regression coefficient for the effect allele of the jth SNP (conditional βs were used for the correlated SNPs) and Gij is the number of copies of the effect allele at the jth SNP carried by the ith woman (Gij is between 0 and 2). The association between the polygenic risk score and breast cancer was tested using unconditional logistic regression, adjusting for study and for seven principal components (as estimated based on a subset of 37,000 uncorrelated markers including ~1000 selected as ancestry informative markers). The log(OR) was scaled according to the effect size of a one-unit increase in polygenic risk score on ANM in control subjects, so as to obtain an estimated logOR for a one-year increase in genetically predicted ANM. Hence the polygenic risk score can be thought of as an instrumental variable in a Mendelian Randomisation of ANM against breast cancer. 
Additional analyses were conducted specifically for estrogen receptor (ER) positive (N=27,026) or ER negative (N=7,401) cases, and for participants with age at diagnosis (for cases) or interview (for controls) ≤45 years (8,547 cases and 8,029 controls) or ≥55 years (24,841 cases and 20,410 controls) (as a surrogate for pre- or post-menopausal age at diagnosis, because ANM was not known for all participants), with heterogeneity evaluated in case-only analyses. We also tested the association of ANM SNPs with prostate cancer risk, to determine whether any effect of genetic variants was specific to breast cancer. Prostate cancer data were available from a similar sample size to breast cancer and there is known overlap in genetic risk for breast and prostate cancer. Individual-level data were not available for prostate cancer; we therefore assessed the impact of ANM using an approximated allele score comprising the 54 HapMap2 GWAS SNPs on summary level results [55]. The score was assessed using summary statistics from a recent prostate cancer meta-analysis, comprising 25,074 cases and 24,272 controls from 32 studies in the PRACTICAL Consortium [56], genotyped using the iCOGs array, with quality control and imputation carried out in the same way as for the BCAC iCOGs study.
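The polygenic risk score used for the breast cancer analysis, a weighted sum of effect-allele counts or dosages, and the rescaling of the logOR to a per-year effect can be sketched as below. This is an illustrative sketch, not the study's analysis code: the function names are hypothetical, and the rescaling treats the effect of the score on ANM in controls as fixed, ignoring its sampling uncertainty.

```python
def polygenic_risk_score(betas, dosages):
    """Weighted allele score for one woman: sum over SNPs of beta_j * G_ij.

    betas: ANM regression coefficients per effect allele.
    dosages: effect-allele counts or imputed dosages, each between 0 and 2.
    """
    if len(betas) != len(dosages):
        raise ValueError("betas and dosages must have the same length")
    return sum(b * g for b, g in zip(betas, dosages))

def log_or_per_year(log_or_per_score_unit, se_log_or, years_per_score_unit):
    """Rescale a logOR per unit of score to a logOR per year of ANM.

    years_per_score_unit: effect of a one-unit score increase on ANM,
    as estimated in controls. Simplification: treated as known exactly.
    """
    scaled = log_or_per_score_unit / years_per_score_unit
    scaled_se = se_log_or / abs(years_per_score_unit)
    return scaled, scaled_se
```

Exponentiating the rescaled logOR gives an odds ratio per one-year increase in genetically predicted ANM, which is the quantity the instrumental-variable interpretation refers to.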

Genetic correlation with additional traits

Cross-trait LD score regression was used to estimate the genetic correlation between menopause timing and 54 individual traits from published studies including anthropometric and metabolic traits [57]. We estimated genetic correlations with the method described in [58] and the --rg flag in the ldsc software package (https://github.com/bulik/ldsc) with LD Scores from 1000 Genomes Europeans and default settings. Briefly, this method regresses the product of effect size estimates for trait 1 and trait 2 for each SNP against LD Score. The product of the slope and a constant estimates the genetic correlation, and the intercept estimates the product of the number of overlapping samples and the correlation between phenotypes among the overlapping samples. Bi-directional Mendelian randomisation analyses on ANM with age at menarche and BMI were carried out using methods similar to those used for prostate cancer, with a weighted allele score [55] generated from summary statistics. Information on the associations with age at menarche came from the most recent genome-wide association study for the trait (N=182,416 women from 57 studies) [17]. The BMI data were taken from the most recent analysis (N=249,796 from 64 studies) [59]. While it was possible to calculate a full allele score for the analysis of genome-wide significant BMI SNPs on ANM, this was not possible for the analysis of ANM SNPs on BMI; instead, a binomial test of consistency of effect direction was used.
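A weighted allele score can be approximated from summary statistics alone, and direction consistency can be checked with an exact binomial test. The sketch below uses the commonly used inverse-variance weighted form of the summary-level score; whether this matches the method of reference [55] in every detail is an assumption, and the function names are hypothetical.

```python
import math

def summary_allele_score(weights, betas, ses):
    """Approximate a weighted allele score from per-SNP summary statistics.

    weights: per-SNP effects on the exposure (for example ANM).
    betas, ses: per-SNP effects and standard errors on the outcome.
    Assumes the SNPs are uncorrelated.
    """
    numerator = sum(w * b / s ** 2 for w, b, s in zip(weights, betas, ses))
    denominator = sum(w ** 2 / s ** 2 for w, s in zip(weights, ses))
    return numerator / denominator, math.sqrt(1.0 / denominator)

def sign_test_p(n_concordant, n_total):
    """Two-sided exact binomial test that effect directions agree by chance.

    Under the null each SNP is concordant with probability 0.5.
    """
    k = max(n_concordant, n_total - n_concordant)
    upper_tail = sum(math.comb(n_total, i) for i in range(k, n_total + 1))
    return min(1.0, 2.0 * upper_tail / 2 ** n_total)
```

The score estimate is the outcome effect per unit of exposure, so it plays the same role as the rescaled logOR in the individual-level analysis; the sign test discards effect sizes entirely and is therefore less powerful but needs only effect directions.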

59 in total

1. Genetics of reproductive lifespan.

Authors: Patricia Hartge
Journal: Nat Genet Date: 2009-06 Impact factor: 38.330

Review 2. Approach to the patient with hypogonadotropic hypogonadism.

Authors: Letícia Ferreira Gontijo Silveira; Ana Claudia Latronico
Journal: J Clin Endocrinol Metab Date: 2013-05 Impact factor: 5.958

3. MCM9 mutations are associated with ovarian failure, short stature, and chromosomal instability.

Authors: Michelle A Wood-Trageser; Fatih Gurbuz; Svetlana A Yatsenko; Elizabeth P Jeffries; L Damla Kotan; Urvashi Surti; Deborah M Ketterer; Jelena Matic; Jacqueline Chipkin; Huaiyang Jiang; Michael A Trakselis; A Kemal Topaloglu; Aleksandar Rajkovic
Journal: Am J Hum Genet Date: 2014-12-04 Impact factor: 11.025

4. Super-enhancers in the control of cell identity and disease.

Authors: Denes Hnisz; Brian J Abraham; Tong Ihn Lee; Ashley Lau; Violaine Saint-André; Alla A Sigova; Heather A Hoke; Richard A Young
Journal: Cell Date: 2013-10-10 Impact factor: 41.582

5. Ovarian failure related to eukaryotic initiation factor 2B mutations.

Authors: Anne Fogli; Diana Rodriguez; Eléonore Eymard-Pierre; Françoise Bouhour; Pierre Labauge; Brandon F Meaney; Susan Zeesman; Christine R Kaneski; Raphael Schiffmann; Odile Boespflug-Tanguy
Journal: Am J Hum Genet Date: 2003-04-21 Impact factor: 11.025

6. Tdrd3 is a novel stress granule-associated protein interacting with the Fragile-X syndrome protein FMRP.

Authors: Bastian Linder; Oliver Plöttner; Matthias Kroiss; Enno Hartmann; Bernhard Laggerbauer; Gunter Meister; Eva Keidel; Utz Fischer
Journal: Hum Mol Genet Date: 2008-07-28 Impact factor: 6.150

7. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits.

Authors: Ayellet V Segrè; Leif Groop; Vamsi K Mootha; Mark J Daly; David Altshuler
Journal: PLoS Genet Date: 2010-08-12 Impact factor: 5.917

8. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk.

Authors: Georg B Ehret; Patricia B Munroe; Kenneth M Rice; Murielle Bochud; Andrew D Johnson; Daniel I Chasman; Albert V Smith; Martin D Tobin; Germaine C Verwoert; Shih-Jen Hwang; Vasyl Pihur; Peter Vollenweider; Paul F O'Reilly; Najaf Amin; Jennifer L Bragg-Gresham; Alexander Teumer; Nicole L Glazer; Lenore Launer; Jing Hua Zhao; Yurii Aulchenko; Simon Heath; Siim Sõber; Afshin Parsa; Jian'an Luan; Pankaj Arora; Abbas Dehghan; Feng Zhang; Gavin Lucas; Andrew A Hicks; Anne U Jackson; John F Peden; Toshiko Tanaka; Sarah H Wild; Igor Rudan; Wilmar Igl; Yuri Milaneschi; Alex N Parker; Cristiano Fava; John C Chambers; Ervin R Fox; Meena Kumari; Min Jin Go; Pim van der Harst; Wen Hong Linda Kao; Marketa Sjögren; D G Vinay; Myriam Alexander; Yasuharu Tabara; Sue Shaw-Hawkins; Peter H Whincup; Yongmei Liu; Gang Shi; Johanna Kuusisto; Bamidele Tayo; Mark Seielstad; Xueling Sim; Khanh-Dung Hoang Nguyen; Terho Lehtimäki; Giuseppe Matullo; Ying Wu; Tom R Gaunt; N Charlotte Onland-Moret; Matthew N Cooper; Carl G P Platou; Elin Org; Rebecca Hardy; Santosh Dahgam; Jutta Palmen; Veronique Vitart; Peter S Braund; Tatiana Kuznetsova; Cuno S P M Uiterwaal; Adebowale Adeyemo; Walter Palmas; Harry Campbell; Barbara Ludwig; Maciej Tomaszewski; Ioanna Tzoulaki; Nicholette D Palmer; Thor Aspelund; Melissa Garcia; Yen-Pei C Chang; Jeffrey R O'Connell; Nanette I Steinle; Diederick E Grobbee; Dan E Arking; Sharon L Kardia; Alanna C Morrison; Dena Hernandez; Samer Najjar; Wendy L McArdle; David Hadley; Morris J Brown; John M Connell; Aroon D Hingorani; Ian N M Day; Debbie A Lawlor; John P Beilby; Robert W Lawrence; Robert Clarke; Jemma C Hopewell; Halit Ongen; Albert W Dreisbach; Yali Li; J Hunter Young; Joshua C Bis; Mika Kähönen; Jorma Viikari; Linda S Adair; Nanette R Lee; Ming-Huei Chen; Matthias Olden; Cristian Pattaro; Judith A Hoffman Bolton; Anna Köttgen; Sven Bergmann; Vincent Mooser; Nish Chaturvedi; Timothy M Frayling; Muhammad Islam; Tazeen H Jafar; Jeanette Erdmann; Smita R 
Kulkarni; Stefan R Bornstein; Jürgen Grässler; Leif Groop; Benjamin F Voight; Johannes Kettunen; Philip Howard; Andrew Taylor; Simonetta Guarrera; Fulvio Ricceri; Valur Emilsson; Andrew Plump; Inês Barroso; Kay-Tee Khaw; Alan B Weder; Steven C Hunt; Yan V Sun; Richard N Bergman; Francis S Collins; Lori L Bonnycastle; Laura J Scott; Heather M Stringham; Leena Peltonen; Markus Perola; Erkki Vartiainen; Stefan-Martin Brand; Jan A Staessen; Thomas J Wang; Paul R Burton; Maria Soler Artigas; Yanbin Dong; Harold Snieder; Xiaoling Wang; Haidong Zhu; Kurt K Lohman; Megan E Rudock; Susan R Heckbert; Nicholas L Smith; Kerri L Wiggins; Ayo Doumatey; Daniel Shriner; Gudrun Veldre; Margus Viigimaa; Sanjay Kinra; Dorairaj Prabhakaran; Vikal Tripathy; Carl D Langefeld; Annika Rosengren; Dag S Thelle; Anna Maria Corsi; Andrew Singleton; Terrence Forrester; Gina Hilton; Colin A McKenzie; Tunde Salako; Naoharu Iwai; Yoshikuni Kita; Toshio Ogihara; Takayoshi Ohkubo; Tomonori Okamura; Hirotsugu Ueshima; Satoshi Umemura; Susana Eyheramendy; Thomas Meitinger; H-Erich Wichmann; Yoon Shin Cho; Hyung-Lae Kim; Jong-Young Lee; James Scott; Joban S Sehmi; Weihua Zhang; Bo Hedblad; Peter Nilsson; George Davey Smith; Andrew Wong; Narisu Narisu; Alena Stančáková; Leslie J Raffel; Jie Yao; Sekar Kathiresan; Christopher J O'Donnell; Stephen M Schwartz; M Arfan Ikram; W T Longstreth; Thomas H Mosley; Sudha Seshadri; Nick R G Shrine; Louise V Wain; Mario A Morken; Amy J Swift; Jaana Laitinen; Inga Prokopenko; Paavo Zitting; Jackie A Cooper; Steve E Humphries; John Danesh; Asif Rasheed; Anuj Goel; Anders Hamsten; Hugh Watkins; Stephan J L Bakker; Wiek H van Gilst; Charles S Janipalli; K Radha Mani; Chittaranjan S Yajnik; Albert Hofman; Francesco U S Mattace-Raso; Ben A Oostra; Ayse Demirkan; Aaron Isaacs; Fernando Rivadeneira; Edward G Lakatta; Marco Orru; Angelo Scuteri; Mika Ala-Korpela; Antti J Kangas; Leo-Pekka Lyytikäinen; Pasi Soininen; Taru Tukiainen; Peter Würtz; Rick Twee-Hee Ong; Marcus 
Dörr; Heyo K Kroemer; Uwe Völker; Henry Völzke; Pilar Galan; Serge Hercberg; Mark Lathrop; Diana Zelenika; Panos Deloukas; Massimo Mangino; Tim D Spector; Guangju Zhai; James F Meschia; Michael A Nalls; Pankaj Sharma; Janos Terzic; M V Kranthi Kumar; Matthew Denniff; Ewa Zukowska-Szczechowska; Lynne E Wagenknecht; F Gerald R Fowkes; Fadi J Charchar; Peter E H Schwarz; Caroline Hayward; Xiuqing Guo; Charles Rotimi; Michiel L Bots; Eva Brand; Nilesh J Samani; Ozren Polasek; Philippa J Talmud; Fredrik Nyberg; Diana Kuh; Maris Laan; Kristian Hveem; Lyle J Palmer; Yvonne T van der Schouw; Juan P Casas; Karen L Mohlke; Paolo Vineis; Olli Raitakari; Santhi K Ganesh; Tien Y Wong; E Shyong Tai; Richard S Cooper; Markku Laakso; Dabeeru C Rao; Tamara B Harris; Richard W Morris; Anna F Dominiczak; Mika Kivimaki; Michael G Marmot; Tetsuro Miki; Danish Saleheen; Giriraj R Chandak; Josef Coresh; Gerjan Navis; Veikko Salomaa; Bok-Ghee Han; Xiaofeng Zhu; Jaspal S Kooner; Olle Melander; Paul M Ridker; Stefania Bandinelli; Ulf B Gyllensten; Alan F Wright; James F Wilson; Luigi Ferrucci; Martin Farrall; Jaakko Tuomilehto; Peter P Pramstaller; Roberto Elosua; Nicole Soranzo; Eric J G Sijbrands; David Altshuler; Ruth J F Loos; Alan R Shuldiner; Christian Gieger; Pierre Meneton; Andre G Uitterlinden; Nicholas J Wareham; Vilmundur Gudnason; Jerome I Rotter; Rainer Rettig; Manuela Uda; David P Strachan; Jacqueline C M Witteman; Anna-Liisa Hartikainen; Jacques S Beckmann; Eric Boerwinkle; Ramachandran S Vasan; Michael Boehnke; Martin G Larson; Marjo-Riitta Järvelin; Bruce M Psaty; Gonçalo R Abecasis; Aravinda Chakravarti; Paul Elliott; Cornelia M van Duijn; Christopher Newton-Cheh; Daniel Levy; Mark J Caulfield; Toby Johnson
Journal: Nature Date: 2011-09-11 Impact factor: 49.962

9. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

10. Integrative annotation of chromatin elements from ENCODE data.

Authors: Michael M Hoffman; Jason Ernst; Steven P Wilder; Anshul Kundaje; Robert S Harris; Max Libbrecht; Belinda Giardine; Paul M Ellenbogen; Jeffrey A Bilmes; Ewan Birney; Ross C Hardison; Ian Dunham; Manolis Kellis; William Stafford Noble
Journal: Nucleic Acids Res Date: 2012-12-05 Impact factor: 16.971
