
Leveraging Northern European population history: novel low-frequency variants for polycystic ovary syndrome.

Jaakko S Tyrmi1,2,3, Riikka K Arffman4, Natàlia Pujol-Gualdo4,5, Venla Kurra6, Laure Morin-Papunen4, Eeva Sliz1,2,3, Terhi T Piltonen4, Triin Laisk5, Johannes Kettunen1,2,3,7, Hannele Laivuori8,9,10.   

Abstract

STUDY QUESTION: Can we identify novel variants associated with polycystic ovary syndrome (PCOS) by leveraging the unique population history of Northern Europe?
SUMMARY ANSWER: We identified three novel genome-wide significant associations with PCOS, with two putative independent causal variants in the checkpoint kinase 2 (CHEK2) gene and a third in myosin X (MYO10).
WHAT IS KNOWN ALREADY: PCOS is a common, complex disorder with unknown aetiology. While previous genome-wide association studies (GWAS) have mapped several loci associated with PCOS, the analysis of populations with unique population history and genetic makeup has the potential to uncover new low-frequency variants with larger effects.
STUDY DESIGN, SIZE, DURATION: A population-based case-control GWAS was carried out.
PARTICIPANTS/MATERIALS, SETTING, METHODS: We identified PCOS cases from national registers by ICD codes (ICD-10 E28.2, ICD-9 256.4, or ICD-8 256.90), and all remaining women were considered controls. We then conducted a three-stage case-control GWAS: in the discovery phase, we had a total of 797 cases and 140 558 controls from the FinnGen study. For validation, we used an independent dataset from the Estonian Biobank, including 2812 cases and 89 230 controls. Finally, we performed a joint meta-analysis of 3609 cases and 229 788 controls from both cohorts. Additionally, we reran the association analyses including BMI as a covariate, with 2619 cases and 160 321 controls from both cohorts.
MAIN RESULTS AND THE ROLE OF CHANCE: Two out of the three novel genome-wide significant variants associating with PCOS, rs145598156 (P = 3.6×10−8, odds ratio (OR) = 3.01 [2.02–4.50], minor allele frequency (MAF) = 0.005) and rs182075939 (P = 1.9×10−16, OR = 1.69 [1.49–1.91], MAF = 0.04), were found to be enriched in the Finnish and Estonian populations and are tightly linked to a deletion c.1100delC (r2 = 0.95) and a missense I157T (r2 = 0.83) in CHEK2. The third novel association is a common variant near MYO10 (rs9312937, P = 1.7×10−8, OR = 1.16 [1.10–1.22], MAF = 0.44). We also replicated four previously reported associations near the genes Erb-B2 Receptor Tyrosine Kinase 4 (ERBB4), DENN Domain Containing 1A (DENND1A), FSH Subunit Beta (FSHB) and Zinc Finger And BTB Domain Containing 16 (ZBTB16). When BMI was added as a covariate, only one of the novel variants remained genome-wide significant in the meta-analysis (the EstBB lead signal in CHEK2, rs182075939, P = 4.9×10−14, OR = 1.74 [1.5–2.01]), possibly owing to the reduced sample size.
LARGE SCALE DATA: The age- and BMI-adjusted GWAS meta-analysis summary statistics are available for download from the GWAS Catalog with accession numbers GCST90044902 and GCST90044903.
LIMITATIONS, REASONS FOR CAUTION: The main limitation was the low prevalence of PCOS in registers; however, those with a diagnosis most likely represent the most severe cases. Also, BMI data were not available for all participants (63% for FinnGen, 76% for EstBB), and the biobank setting limited the accessibility of PCOS phenotypes and laboratory values.
WIDER IMPLICATIONS OF THE FINDINGS: This study encourages the use of isolated populations to perform genetic association studies for the identification of rare variants contributing to the genetic landscape of complex diseases such as PCOS.
STUDY FUNDING/COMPETING INTEREST(S): This work has received funding from the European Union's Horizon 2020 research and innovation programme under the MATER Marie Skłodowska-Curie grant agreement No. 813707 (N.P.-G., T.L., T.P.), the Estonian Research Council grant (PRG687, T.L.), the Academy of Finland grants 315921 (T.P.), 321763 (T.P.), 297338 (J.K.), 307247 (J.K.), 344695 (H.L.), Novo Nordisk Foundation grant NNF17OC0026062 (J.K.), the Sigrid Juselius Foundation project grants (T.L., J.K., T.P.), Finska Läkaresällskapet (H.L.) and Jane and Aatos Erkko Foundation (H.L.). The funders had no role in study design, data collection and analysis, publishing or preparation of the manuscript. The authors declare no conflicts of interest.
© The Author(s) 2021. Published by Oxford University Press on behalf of European Society of Human Reproduction and Embryology.

Keywords:  checkpoint kinase 2; genome-wide association study; myosin X; polycystic ovary syndrome; rare variants

Year:  2022        PMID: 34791234      PMCID: PMC8804330          DOI: 10.1093/humrep/deab250

Source DB:  PubMed          Journal:  Hum Reprod        ISSN: 0268-1161            Impact factor:   6.918


Introduction

Polycystic ovary syndrome (PCOS) is a common, multifaceted endocrine disorder. The international evidence-based guideline recommends using the Rotterdam criteria for PCOS diagnosis, requiring the presence of at least two of the following symptoms: oligo- or anovulation, clinical or biochemical hyperandrogenism, or polycystic ovaries seen in ultrasound, after exclusion of related disorders (Teede et al., 2018). The criteria result in a prevalence as high as 18% for PCOS among fertile-aged women and produce several phenotypes (March ; Skiba ). PCOS is the most common cause for anovulatory infertility, caused by disrupted follicle development owing to dysregulation of the hypothalamus–pituitary axis. This results in follicle arrest and an increase in the number of antral follicles in the ovaries, as well as a 2- to 3-fold increase in levels of anti-Müllerian hormone (AMH) (Silva and Giacobini, 2021). Ovulatory dysfunction often subsides with age; however, women with PCOS still display higher AMH and later onset of menopause (Piltonen ; Li ; de Ziegler ; Minooee ; Forslund ). In addition to the reproductive features, PCOS is also characterized by metabolic disturbances such as obesity, insulin resistance and dyslipidemia (Ollila ; Lim ; Barber and Franks, 2021). Women with PCOS also have an increased risk for endometrial cancer; however, the majority of studies do not indicate a higher susceptibility to other types of cancer (Dumesic and Lobo, 2013; Barry ; Gottschau ; Ding ). Despite the high prevalence of the syndrome, the origins of PCOS remain unknown. Considering the complex nature of the syndrome, it is likely that both genetic and environmental factors contribute to its development (Abbott ; Koivuaho ; Moghetti and Tosi, 2021). Notably, the heritability of PCOS is estimated to be around 70% (Vink ; Risal ). 
To elucidate the genetic architecture of PCOS, several genome-wide association studies (GWAS) and meta-analysis studies have been conducted, identifying over 20 susceptibility loci for PCOS (Chen ; Shi ; Day , 2018; Hayes ; Lee ; Dapas ; Hong ; Zhang ). The identified loci indicate roles in PCOS for gonadotrophin signalling, folliculogenesis, epithelial growth factor signalling, DNA repair and structure, cell cycle and proliferation, and androgen biosynthesis. However, these common genetic variants explain only around 10% of the heritability (Azziz, 2016). Thus, it has been suggested that rare variants with larger effect sizes may contribute to the heritability of PCOS (Dapas and Dunaif, 2020). Nevertheless, the identification of these may be difficult in data sets with large genetic variations. The value of studying genetic isolates, such as the Finnish population, has been accepted for decades (Martin ). Such populations provide an excellent opportunity to facilitate the discovery of rare variants with larger effects and characterize the genetic basis of complex diseases such as PCOS. The Finnish population originates from a small founder population with several bottleneck events over centuries, followed by genetic drift. These events have led to an enrichment of many low-frequency variants almost absent in most European populations (1000 Genomes Project Consortium et al., 2012; Nelis ; Locke ). Replication of association results may be difficult when studying isolated populations, but for Finns, the genetically closest Estonian population provides a natural comparison (Nelis ; Tambets ). In this study, we first utilized genome-wide association analyses and data from the FinnGen project and the Estonian Biobank (EstBB) to detect novel PCOS-associated variants in these population isolates. Furthermore, as several studies suggest a causal role for obesity in PCOS (Legro, 2012; Brower ; Zhao ), we examined the influence of BMI on the detected associations with PCOS. 
As a result, we unravelled two rare, population-enriched variants located in the checkpoint kinase 2 (CHEK2) gene and described one novel variant in the intron of the myosin X (MYO10) gene. Additionally, we replicated the previously reported associations for Erb-B2 receptor tyrosine kinase 4 (ERBB4), DENN domain containing 1A (DENND1A), FSH subunit beta (FSHB) and zinc finger and BTB domain containing 16 (ZBTB16).

Materials and methods

This study is reported according to the Strengthening the Reporting of Genetic Association Studies (STREGA) guideline.

Study cohorts

FinnGen

The FinnGen study combines genotype data from the Finnish biobanks with the digital health record data from the Care Register for Health Care (CRCH, from 1968 onwards) and the cancer (1953–), cause of death (1969–), and medication reimbursement (1995–) registries (https://www.finngen.fi/en). FinnGen data freeze release 6 (R6) combines the genomic information of 141 355 women (6% of the female Finnish population). In FinnGen, cases of PCOS were defined as women with a record of the following International Classification of Diseases (ICD)-10 code E28.2, ICD-9 code 256.4, or ICD-8 code 256.90. Controls were all women without a PCOS diagnosis, and no other exclusions were made. With this definition, there were 797 cases and 140 558 controls. Patients and control subjects in FinnGen provided informed consent for biobank research based on the Finnish Biobank Act. Alternatively, older research cohorts, collected prior to the start of FinnGen (in August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by the National Supervisory Authority for Welfare and Health, Fimea. Recruitment procedures followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) approved the FinnGen study protocol (Nr HUS/990/2017). The FinnGen study was approved by Finnish Institute for Health and Welfare (permit numbers: THL/2031/6.02.00/2017, THL/1101/5.05.00/2017, THL/341/6.02.00/2018, THL/2222/6.02.00/2018, THL/283/6.02.00/2019, THL/1721/5.05.00/2019, THL/1524/5.05.00/2020 and THL/2364/14.02/2020); Digital and population data service agency (permit numbers: VRK43431/2017-3, VRK/6909/2018-3, VRK/4415/2019-3); the Social Insurance Institution (permit numbers: KELA 58/522/2017, KELA 131/522/2018, KELA 70/522/2019, KELA 98/522/2019, KELA 138/522/2019, KELA 2/522/2020, KELA 16/522/2020); and Statistics Finland (permit numbers: TK-53-1041-17 and TK-53-90-20). 
The Biobank access decisions for FinnGen samples and data utilized in the FinnGen data freeze R6 include: THL Biobank BB2017_55, BB2017_111, BB2018_19, BB_2018_34, BB_2018_67, BB2018_71, BB2019_7, BB2019_8, BB2019_26, BB2020_1, Finnish Red Cross Blood Service Biobank 7.12.2017, Helsinki Biobank HUS/359/2017, Auria Biobank AB17-5154, Biobank Borealis of Northern Finland_2017_1013, Biobank of Eastern Finland 1186/2018, Finnish Clinical Biobank Tampere MH0004, Central Finland Biobank 1-2017 and Terveystalo Biobank STB 2018001. A full list of FinnGen contributors can be found in Supplementary Data.

Estonian Biobank

The EstBB is a volunteer-based biobank with over 200 000 participants, currently including approximately 135 000 women (20% of the female Estonian population). The 150K data freeze was used for the analyses described in this paper. All biobank participants have signed a broad informed consent form. Individuals with PCOS were identified using the ICD-10 code E28.2, and all female biobank participants without this diagnosis served as controls. This resulted in a total of 2812 cases and 89 230 controls. Information on the ICD codes was obtained via regular linking with the National Health Insurance Fund and other relevant databases (Leitsalu ). Analyses in the EstBB were carried out under ethical approval 1.1-12/624 from the Estonian Committee on Bioethics and Human Research and data release N05 from the EstBB.
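The register-based case and control definitions used in the two cohorts can be illustrated with a short sketch. The ICD codes are those given in the text; the `Participant` record layout and the function name are hypothetical and do not reflect the actual FinnGen or EstBB data structures.

```python
from dataclasses import dataclass, field

# PCOS codes as listed in the text, keyed by ICD revision.
PCOS_CODES = {
    "ICD10": {"E28.2"},
    "ICD9": {"256.4"},
    "ICD8": {"256.90"},
}


@dataclass
class Participant:
    """Hypothetical register record: one person with (icd_version, code) pairs."""
    participant_id: str
    sex: str                                       # "F" or "M"
    diagnoses: list = field(default_factory=list)  # e.g. [("ICD10", "E28.2")]


def pcos_status(p, versions=("ICD10", "ICD9", "ICD8")):
    """Return 'case', 'control', or None when the person is not female.

    All women without a qualifying code are controls; no other exclusions
    are applied, mirroring the definition described above.
    """
    if p.sex != "F":
        return None
    for version, code in p.diagnoses:
        if version in versions and code in PCOS_CODES[version]:
            return "case"
    return "control"
```

Passing `versions=("ICD10",)` would correspond to the EstBB definition, which relied on the ICD-10 code only.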

Genotyping and association analyses

Sample genotyping in FinnGen was performed using Illumina and Affymetrix arrays (Illumina Inc., San Diego, and Thermo Fisher Scientific, Santa Clara, CA, USA). Genotype calls were made using GenCall or zCall (Goldstein ) for Illumina and the AxiomGT1 algorithm for Affymetrix data. Variants with a Hardy–Weinberg equilibrium (HWE) P-value below 1e-6, a minor allele count <3 or a genotyping success rate <98% were removed. Samples with ambiguous gender, those with high genotype missingness (>5%) and outliers in the population structure (>4 SD from the mean on the first two dimensions of principal component (PC) analysis) were omitted. Samples were pre-phased with Eagle 2.3.5 (Loh ) using 20 000 conditioning haplotypes. Genotypes were imputed with Beagle 4.1 using the SiSu v3 imputation reference panel, which consisted of 3775 individuals of Finnish ancestry with sequenced whole genomes. The post-imputation protocol is publicly available at https://dx.doi.org/10.17504/protocols.io.xbgfijw. Association analysis was performed using a generalized mixed model as implemented in SAIGE (Zhou ). Included adjustments were age, genotyping batches and the first 10 PCs. Formatting and preparation of the FinnGen association data for downstream analysis were managed with workflow management software STAPLER (Tyrmi, 2018).
All EstBB participants were genotyped using Illumina GSAv1.0, GSAv2.0 and GSAv2.0_EST arrays at the Core Genotyping Lab of the Institute of Genomics, University of Tartu. Samples were genotyped and PLINK format files were created using Illumina GenomeStudio v2.0.4. Individuals were excluded from the analysis if their call rate was <95% or if their sex defined by heterozygosity of X chromosomes did not match their sex in the phenotype data. Before imputation, variants with a call rate <95%, an HWE P-value <1e-4 (autosomal variants only) or a minor allele frequency <1% were removed. Variant positions were updated to build 37 (b37) and all variants were changed to be from the TOP strand using GSAMD-24v1-0_20011747_A1-b37.strand.RefAlt.zip files from the https://www.well.ox.ac.uk/~wrayner/strand/ webpage. Pre-phasing was conducted using Eagle v2.3 software (Loh ), with the number of conditioning haplotypes used when phasing each sample set to 20 000 (--Kpbwt=20000), and imputation was carried out using Beagle 4.1 with effective population size ne = 20 000. The population-specific imputation reference of 2297 whole-genome sequencing samples was used (Mitt ). Association analysis was carried out using SAIGE (v0.38) software to implement a mixed logistic regression model with year of birth and 10 PCs as covariates in step I. A total of 2812 cases and 89 230 controls were included in the analyses.
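As an illustration of the variant-level quality control thresholds quoted for FinnGen (HWE P-value, minor allele count and genotyping success rate), the following is a minimal sketch. It is not the pipeline used in the study: the `VariantCounts` layout and function names are hypothetical, and the HWE P-value is approximated with a one-degree-of-freedom chi-square test, whereas production pipelines typically use an exact test.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantCounts:
    """Hypothetical per-variant genotype counts."""
    n_hom_ref: int
    n_het: int
    n_hom_alt: int
    n_missing: int


def hwe_chi2_pvalue(v):
    """Approximate HWE P-value from a 1-df chi-square goodness-of-fit test."""
    n = v.n_hom_ref + v.n_het + v.n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * v.n_hom_ref + v.n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (v.n_hom_ref, v.n_het, v.n_hom_alt)
    if min(expected) == 0:
        return 1.0  # monomorphic variant: the test is not informative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_variant_qc(v, hwe_p_min=1e-6, mac_min=3, call_rate_min=0.98):
    """Apply the three thresholds quoted in the text; failing any one removes the variant."""
    called = v.n_hom_ref + v.n_het + v.n_hom_alt
    total = called + v.n_missing
    if total == 0:
        return False
    call_rate = called / total
    alt_count = v.n_het + 2 * v.n_hom_alt
    ref_count = v.n_het + 2 * v.n_hom_ref
    minor_allele_count = min(alt_count, ref_count)
    return (
        call_rate >= call_rate_min
        and minor_allele_count >= mac_min
        and hwe_chi2_pvalue(v) >= hwe_p_min
    )
```

The EstBB pre-imputation filters could be approximated by changing the defaults (for example `hwe_p_min=1e-4`, `call_rate_min=0.95`) and adding a minor allele frequency check.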

Meta-analysis

To harmonize the genome build of the datasets, we lifted the FinnGen GWAS summary statistics over to the GRCh37 (hg19) build using UCSC liftOver (Kent ) before running the meta-analyses. METAL software was used to perform inverse-variance-weighted meta-analysis for FinnGen and EstBB GWAS results (Willer ). In total, 3609 cases and 229 788 controls were analyzed. Only markers with high imputation quality (INFO score > 0.7) in each study were kept for the meta-analysis. A total of 24 157 216 markers were included in the analysis. Genome-wide significance was set to P < 5 × 10−8. The meta-analyses were conducted independently by two analysts and summary statistics were compared for consistency.
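The inverse-variance-weighted fixed-effects scheme described above can be sketched in a few lines. This is an illustration of the general method, not a reimplementation of METAL; the function names are hypothetical, and the standard errors are back-calculated from the rounded odds ratios and 95% CIs reported for rs182075939 in Table I, so the output only approximates the published values.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96, for 95% confidence intervals


def beta_se_from_or_ci(odds_ratio, ci_low, ci_high):
    """Recover log-odds effect and standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z975)
    return beta, se


def ivw_fixed_effects(estimates):
    """Pool (beta, se) pairs with weights 1/se^2; return beta, se and two-sided P."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    # erfc keeps precision for very small P-values.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# rs182075939, age-adjusted results as reported in Table I.
finngen = beta_se_from_or_ci(1.95, 1.44, 2.65)
estbb = beta_se_from_or_ci(1.64, 1.43, 1.88)
beta, se, p_value = ivw_fixed_effects([finngen, estbb])
meta_or = math.exp(beta)
meta_ci = (math.exp(beta - Z975 * se), math.exp(beta + Z975 * se))
print(f"OR = {meta_or:.2f} [{meta_ci[0]:.2f}-{meta_ci[1]:.2f}], P = {p_value:.1e}")
```

With these inputs the pooled estimate lands close to the reported meta-analysis result for this variant (OR = 1.69 [1.49–1.91]); small differences in the P-value are expected because the inputs are rounded.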

Functional annotation and gene prioritization

In order to identify plausible candidate genes, we used the FUMA platform (Watanabe ). FUMA uses GWAS summary statistics and performs extensive functional annotation and candidate gene mapping using positional, expression quantitative trait loci (eQTL) and chromatin interaction mapping in all genome-wide significant loci. Loci were defined by ±1000 kb of the top single nucleotide variant in the region. Gene-based analysis was also performed in this platform using MAGMA (de Leeuw ). We prioritized variants that were more likely to have a functional consequence, such as variants in high linkage disequilibrium (LD) (r2 > 0.6) with missense mutations or pathogenic variants. Secondly, we prioritized variants overlapping with regulatory marks, focusing on genes with modified expression or genes that showed chromatin interaction links with the variants. Furthermore, gene functions were examined in GenBank and UniProt portals. In addition, a literature search was performed for the genes of interest to gain further insight into the possible underlying molecular mechanisms. Genes showing relevant functions in relevant tissues or traits with similar PCOS pathophysiology were ultimately considered for gene candidate prioritization.
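The LD threshold used for prioritization (r2 > 0.6) refers to the squared correlation between alleles at two variants. As a minimal sketch, assuming phased haplotype counts are available for two biallelic variants, r2 can be computed as below. The function name and input layout are hypothetical; tools such as LDstore estimate LD from genotype data rather than from a 2×2 haplotype table.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Squared allelic correlation between two biallelic variants.

    Inputs are counts of the four haplotypes (A/a at variant 1, B/b at variant 2).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return float("nan")  # at least one variant is monomorphic
    d = p_AB - p_A * p_B     # linkage disequilibrium coefficient D
    return d * d / denom


def in_high_ld(r2, threshold=0.6):
    """Prioritization rule from the text: r2 above 0.6 counts as high LD."""
    return r2 > threshold
```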

Colocalization analyses

We tested whether the GWAS signals colocalized with variants that affect gene expression using the following pipeline (https://github.com/eQTL-Catalogue/colocalisation) (Kerimov ). We compared our significant loci to all eQTL Catalogue RNA-Seq datasets containing QTLs for gene expression, exon expression, transcript usage and txrevise event usage; eQTL Catalogue microarray datasets containing QTLs for gene expression; and GTEx v7 datasets containing QTLs for gene expression (Kerimov ). We lifted the GWAS summary statistics over to the hg38 build to match the eQTL Catalogue and converted the summary statistics to variant call format. For each genome-wide significant (P < 5×10−8) GWAS variant, we extracted the 1-Mb radius of its top hit from the QTL datasets. We then ran the colocalization analysis for those eQTL Catalogue traits that had at least one cis-QTL within this region with P < 1×10−6. We considered two signals to colocalize if the posterior probability for a shared causal variant was 0.8 or higher.
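The selection logic described above (a 1-Mb radius around each genome-wide significant lead variant, a QTL P-value filter, and a posterior-probability cut-off) can be sketched as follows. This only illustrates the filtering steps, not the colocalization model itself; the `Assoc` record and all function names are hypothetical and are not part of the eQTL Catalogue pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assoc:
    """Hypothetical association record (positions on the same genome build)."""
    chrom: str
    pos: int
    pvalue: float


def qtl_window(lead, qtl_assocs, radius_bp=1_000_000):
    """QTL associations within the given radius of the GWAS lead variant."""
    return [
        q for q in qtl_assocs
        if q.chrom == lead.chrom and abs(q.pos - lead.pos) <= radius_bp
    ]


def is_coloc_candidate(lead, qtl_assocs, gwas_p=5e-8, qtl_p=1e-6):
    """Run colocalization only for significant leads with a strong QTL nearby."""
    if lead.pvalue >= gwas_p:
        return False
    return any(q.pvalue < qtl_p for q in qtl_window(lead, qtl_assocs))


def colocalizes(pp_shared_variant, threshold=0.8):
    """Decision rule from the text: posterior probability of 0.8 or higher."""
    return pp_shared_variant >= threshold
```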

Conditional analyses

Since considering most significant variants as the causal ones would lead to an underestimation of the total variance explained at each locus, we next performed conditional analyses, which were carried out similarly to the main association testing using SAIGE (Zhou ). This approach has been used to identify secondary association signals at a particular locus and involves association analysis conditioning on the primary associated variant at the locus to test for additional significantly associated variants (Yang ). We proceeded to test associations using a stepwise analysis, where markers were added to the model until no independent signals were identified.
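The stepwise procedure can be outlined as a loop that repeatedly conditions on the strongest remaining signal. The sketch below is schematic: `run_association` stands in for a full conditional association run (performed with SAIGE in the study), and the stopping threshold of 5×10−8 is an assumption, since the text states only that markers were added until no independent signals remained.

```python
def stepwise_conditional(run_association, threshold=5e-8, max_steps=20):
    """Return the variants selected as independent signals at a locus.

    run_association(conditioned) must return {variant_id: p_value} for the
    locus, with the variants in `conditioned` included as covariates.
    """
    conditioned = []
    for _ in range(max_steps):
        pvalues = run_association(tuple(conditioned))
        candidates = {v: p for v, p in pvalues.items() if v not in conditioned}
        if not candidates:
            break
        top = min(candidates, key=candidates.get)
        if candidates[top] >= threshold:
            break  # nothing significant remains after conditioning
        conditioned.append(top)
    return conditioned


def toy_locus(conditioned):
    """Toy stand-in: signals A and B are independent, A_proxy only tags A."""
    pvalues = {"A": 1e-12, "A_proxy": 1e-10, "B": 1e-9, "C": 0.2}
    if "A" in conditioned:
        pvalues["A_proxy"] = 0.3  # proxy signal disappears once A is in the model
    return pvalues


independent_signals = stepwise_conditional(toy_locus)
```

In the toy locus, conditioning on `A` removes the proxy signal while `B` remains, so two independent signals are retained.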

Adjusting the GWAS for BMI

In the discovery dataset, an additional association analysis including BMI as a covariate was conducted with a total of 482 PCOS cases (60.5% of the original PCOS sample) and 91 631 controls from FinnGen (65.2% of the original control sample). Similarly, we ran an association analysis including BMI as a covariate for the validation dataset, which contained a total of 2137 PCOS cases (76.0% of the original PCOS sample size) and 68 690 controls from the EstBB (77.0% of the original control sample size). We then performed a second meta-analysis including the two GWAS adjusted for BMI from both cohorts. This analysis included 2619 cases and 160 321 controls, and a total of 24 461 102 genetic markers were analyzed.

Interaction analysis

We tested whether an interaction between the c.1100delC mutation, obesity and PCOS could be detected, as such an interaction has been reported among mutation carriers with invasive breast cancer (Greville-Heygate ). We fitted a logistic model in which PCOS was the outcome, the lead variant genotype and BMI formed the interaction term, and the first 10 genetic PCs along with age were added as covariates. This analysis was performed with R version 4.0.5 (R Core Team, 2018).
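One plausible way to write this model down is shown below. The notation is ours rather than the authors', and the inclusion of the genotype and BMI main effects alongside the product term is an assumption, as the text does not spell out the full specification. Here G_i is the genotype of individual i at the lead variant and PC_ik is the k-th genetic principal component.

```latex
\operatorname{logit} \Pr(\mathrm{PCOS}_i = 1)
  = \beta_0
  + \beta_G \, G_i
  + \beta_B \, \mathrm{BMI}_i
  + \beta_{G \times B} \, (G_i \cdot \mathrm{BMI}_i)
  + \beta_A \, \mathrm{age}_i
  + \sum_{k=1}^{10} \gamma_k \, \mathrm{PC}_{ik}
```

Under this parameterization, the interaction test is a test of \beta_{G \times B} = 0, and \exp(\beta_{G \times B}) is the interaction odds ratio per unit of BMI.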

Results

Discovery GWAS identified a rare novel association for PCOS in CHEK2

A discovery GWAS with 797 PCOS cases and 140 558 controls in the FinnGen study uncovered two loci close to ERBB4 and DENND1A that have previously been shown to be associated with PCOS. In addition, a previously unreported large-effect association was found on chromosome 22 at 22q11 (Fig. 1A).
Figure 1.

Manhattan plot of the results from the age-adjusted genome-wide association studies. Genome-wide association studies (GWAS) from the Finnish dataset (A), GWAS from Estonian dataset (B) and joint GWAS meta-analysis of polycystic ovary syndrome (PCOS) (C). The novel gene candidates in the six genome-wide significant loci are highlighted in bold. The y axis represents −log10(two-sided P-values) for the associations of variants with PCOS from meta-analysis, using an inverse-variance weighted fixed-effects model. The horizontal dashed line represents the threshold for genome-wide significance. ERBB4 (Erb-B2 Receptor Tyrosine Kinase 4); DENND1A (DENN Domain Containing 1A); FSHB (FSH Subunit Beta); ZBTB16 (Zinc Finger And BTB Domain Containing 16); MYO10 (myosin X); CHEK2 (Checkpoint kinase 2).

The lead variant rs145598156 (P = 1.7×10−11, odds ratio (OR) = 11.63 [5.69–23.77]) is located in an intronic region 11 kb from the transcription start site (TSS) of ZNRF3 (Table I and Fig. 2A). However, the tight LD spans an area of approximately 2 Mb surrounding the lead variant with many variants in high LD (Fig. 2A). Functional characterization of this locus revealed a frameshift variant, c.1100delC (rs555607708, P = 1.68×10−9, OR = 13.46 [5.68–31.89]) in CHEK2, in high LD (r2 = 0.95) with the lead variant. Interestingly, the protein-truncating variant c.1100delC is enriched in the Finnish population (AF = 0.008) compared to the Estonian (AF = 0.003) and other European populations (AF = 0.002), according to the gnomAD database (Karczewski ). The analysis conditioned on c.1100delC resulted in no genome-wide significant associations in this locus, with a P-value of 3.29×10−4 for the lead variant rs145598156 (Fig. 2B).
Table I

Summary of association results of the genome-wide association meta-analysis of polycystic ovary syndrome.

| SNP | Chr:BP | Cytoband | EA/NEA | Nearest gene | Candidate gene | Cohort | EAF (%) | OR (95% CI)* | P* | OR (95% CI)** | P** |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs7564590 | chr2:213387900 | 2q34 | T/C | ERBB4 | ERBB4 | FinnGen | 34.50 | 1.43 (1.29–1.59) | 3.0×10−11 | 1.50 (1.31–1.72) | 4.4×10−09 |
| | | | | | | EstBB | 35.87 | 1.12 (1.06–1.19) | 1.1×10−04 | 1.13 (1.05–1.20) | 2.5×10−04 |
| | | | | | | Meta | 35.56 | 1.19 (1.13–1.25) | 4.8×10−11 | 1.19 (1.13–1.25) | 4.6×10−09 |
| rs9312937 | chr5:16836005 | 5p15 | C/T | MYO10 | MYO10 | FinnGen | 42.00 | 1.15 (1.04–1.27) | 6.8×10−03 | 1.06 (0.93–1.20) | 3.7×10−01 |
| | | | | | | EstBB | 45.47 | 1.16 (1.10–1.23) | 7.7×10−07 | 1.18 (1.10–1.26) | 1.5×10−06 |
| | | | | | | Meta | 44.58 | 1.16 (1.10–1.22) | 1.7×10−08 | 1.15 (1.08–1.22) | 3.0×10−06 |
| rs3945628 | chr9:126535553 | 9q33 | C/T | DENND1A | DENND1A | FinnGen | 6.64 | 1.74 (1.42–2.15) | 1.5×10−07 | 1.68 (1.29–2.19) | 9.3×10−05 |
| | | | | | | EstBB | 7.08 | 1.33 (1.19–1.48) | 2.7×10−07 | 1.32 (1.17–1.49) | 6.9×10−06 |
| | | | | | | Meta | 6.99 | 1.40 (1.27–1.55) | 2.9×10−12 | 1.38 (1.23–1.54) | 1.0×10−08 |
| rs11031002 | chr11:30215261 | 11p14 | A/T | FSHB | FSHB | FinnGen | 12.00 | 1.33 (1.14–1.56) | 2.3×10−04 | 1.31 (1.29–2.19) | 5.7×10−03 |
| | | | | | | EstBB | 12.27 | 1.22 (1.12–1.32) | 5.8×10−06 | 1.16 (1.05–1.27) | 2.2×10−03 |
| | | | | | | Meta | 12.21 | 1.24 (1.15–1.34) | 9.2×10−09 | 1.18 (1.10–1.27) | 7.7×10−05 |
| rs1672716 | chr11:113952497 | 11q23 | G/A | ZBTB16 | ZBTB16 | FinnGen | 14.60 | 0.74 (0.64–0.86) | 5.2×10−05 | 0.78 (0.65–0.94) | 1.0×10−02 |
| | | | | | | EstBB | 14.49 | 0.84 (0.77–0.91) | 1.7×10−05 | 0.81 (0.74–0.89) | 1.4×10−05 |
| | | | | | | Meta | 14.51 | 0.81 (0.76–0.87) | 9.8×10−09 | 0.80 (0.74–0.87) | 4.7×10−07 |
| rs145598156 | chr22:29416402 | 22q12 | T/C | ZNRF3 | CHEK2 | FinnGen | 0.79 | 11.63 (5.69–23.77) | 1.7×10−11 | 13.5 (5.35–34.38) | 4.5×10−08 |
| | | | | | | EstBB | 0.37 | 1.68 (1.05–2.69) | 3.2×10−02 | 1.53 (0.90–2.61) | 1.1×10−01 |
| | | | | | | Meta | 0.52 | 3.01 (2.02–4.50) | 3.6×10−08 | 2.61 (2.14–3.07) | 4.4×10−05 |
| rs182075939 | chr22:29098376 | 22q12 | G/A | TTC28 | CHEK2 | FinnGen | 3.19 | 1.95 (1.44–2.65) | 1.8×10−05 | 2.18 (1.46–3.24) | 1.1×10−04 |
| | | | | | | EstBB | 4.64 | 1.64 (1.43–1.88) | 1.3×10−12 | 1.68 (1.44–1.96) | 4.6×10−11 |
| | | | | | | Meta | 4.41 | 1.69 (1.49–1.91) | 1.9×10−16 | 1.74 (1.5–2.01) | 4.9×10−14 |

Meta-analysis results of the genome-wide association studies from Estonian Biobank (EstBB) and FinnGen are shown in the rows labelled Meta. The novel associations are rs9312937, rs145598156 and rs182075939. Variant positions (BP) are according to GRCh37/hg19. ERBB4, Erb-B2 Receptor Tyrosine Kinase 4; DENND1A, DENN Domain Containing 1A; FSHB, FSH Subunit Beta; ZBTB16, Zinc Finger And BTB Domain Containing 16; MYO10, myosin X; ZNRF3, Zinc And Ring Finger 3; CHEK2, Checkpoint kinase 2; TTC28, Tetratricopeptide Repeat Domain 28.

EA, effect allele; EAF, effect allele frequency; NEA, non-effect allele; OR, odds ratio; P, P-value; SNP, single-nucleotide polymorphism.

* OR and P-values of age-adjusted results.

** OR and P-values of age- and BMI-adjusted results.

Figure 2.

Regional plots before and after conditional analyses for lead variants in chromosome 22. FinnGen lead variant in locus 22q11 (A) along with conditional analysis results with frameshift variant (rs555607708) (B). Regional plot for the Estonian Biobank lead variant in the same locus 22q11 before and after conditional analysis with linked missense variant (rs17879961) are shown in (C) and (D). Regional plots were produced with R-package LocusZooms (https://github.com/Geeketics/LocusZooms/). r2 estimates were generated using LDstore (Benner ) with SiSu v3 project WGS data consisting of 3775 individuals with Finnish ancestry.

To investigate the influence of BMI on PCOS, we ran an additional association analysis, including BMI as a covariate. In this analysis, the FinnGen lead variant rs145598156 remained genome-wide significant (P = 4.5×10−8, OR = 13.5 [5.35–34.38]) (Table I and Supplementary Fig. S2). A recent study has suggested that patients with invasive breast cancer carrying the c.1100delC mutation are more likely to be obese, though this is not the case for the general population (Greville-Heygate ). 
We therefore tested for such an interaction between PCOS, c.1100delC and obesity using a logistic regression model; the c.1100delC–BMI interaction term had a P-value of 0.066 (OR 1.04, 95% CI 0.99–1.09).

Validation GWAS detected an independent association in CHEK2

A validation GWAS was performed in the EstBB, including 2812 cases and 89 230 controls. The validation also uncovered a genome-wide significant association (P = 1.3×10−12, OR = 1.64 [1.43–1.88]) in the 22q11 region. The lead variant rs182075939 was an intron variant located 22 kb from the TSS of TTC28 (Figs 1B and 2C). Functional annotation revealed a tightly linked missense variant rs17879961 (r2 = 0.83, P = 4.23×10−12), known as I157T, in CHEK2, which has been shown to alter the ability of CHEK2 to bind p53, BRCA1 (breast cancer gene 1) and Cdc25A proteins (Falck ,b). The EstBB lead variant rs182075939 has a higher allele frequency in Estonians (AF = 0.048) compared to Finns (AF = 0.029) and other European populations (AF = 0.0025) according to gnomAD (Karczewski ). The analysis conditioned on I157T resulted in no genome-wide significant associations in this locus, with a P-value of 0.04 for the lead variant rs182075939 (Fig. 2D). When also adjusting the GWAS for BMI, the EstBB lead variant rs182075939 remained genome-wide significant (P = 4.6×10−11, OR = 1.68 [1.44–1.96]) (Table I and Supplementary Fig. S2). Interestingly, even though the association signals found in the EstBB and FinnGen data sets overlap with each other (Fig. 3), they seem to be part of independent haplotypes with an r2 value below 0.05 between the lead variants. The lead variant of FinnGen data had a P-value of 0.031 in the EstBB. The EstBB lead variant had a P-value of 1.8×10−5 in FinnGen (Table I).
Figure 3.

Checkpoint kinase 2 variants described. Independent FinnGen and Estonian Biobank (EstBB) GWAS associations overlapping the checkpoint kinase 2 (CHEK2) gene are plotted on a single LocusZooms figure. Genome-wide significant variants in FinnGen data are denoted with purple circles; Estonian Biobank-specific variants are not circled.

We also tested if conditioning the discovery GWAS results with I157T and validation GWAS with c.1100delC would affect the significance of the lead variants. Conditioning the discovery analysis on I157T had a minimal effect on the genome-wide significant associations in this locus, with a P-value of 9.09×10−12 for the lead variant rs145598156. Similarly, when the validation GWAS in the EstBB was conditioned on c.1100delC, the P-value of the lead variant rs182075939 was only modestly affected (P = 9.16×10−13).

Meta-analysis confirmed and expanded novel associations with PCOS in CHEK2 and MYO10

A meta-analysis was performed for the FinnGen and EstBB GWAS incorporating a total of 3609 women with PCOS and 229 788 controls. In the meta-analysis, the FinnGen lead variant on chromosome 22 rs145598156 had a P-value of 3.6×10−8 with significant heterogeneity between cohorts (phet=9.58×10−6), while the EstBB lead variant rs182075939 showed a P-value of 1.9×10−16 in the meta-analysis results without significant heterogeneity between cohorts (phet = 0.3). When the FinnGen and EstBB results were conditioned for the c.1100delC and I157T variants and the results were meta-analyzed, there were no additional genome-wide significant signals in the CHEK2 locus. The meta-analysis also revealed three more variants associating with PCOS, in addition to the three detected in the FinnGen and EstBB GWAS separately (Table I and Supplementary Fig. S1). Two of the additional signals were in chromosome 11 and have been previously shown to be associated with PCOS: rs11031002 is located near FSHB, and rs1672716 is an intron variant of ZBTB16. The third new association peak in the meta-analysis (rs9312937, P = 1.7 × 10−8, OR = 1.16 [1.10–1.22], AF = 0.44) was a common variant in an intronic region of chromosome 5, located 100 kb from the TSS of the MYO10 gene, which to our knowledge has not previously been associated with PCOS. A total of two potentially causal genes were suggested by chromatin interaction data from 21 different tissues/cell types, with MYO10 being the closest one, while no significant eQTL associations were detected using FUMA (Watanabe ) in this locus. The average effect sizes of the novel alleles described in chromosome 22 (OR = 1.69–3.01) (Table I) were higher than the effects observed for alleles associated with PCOS in the rest of the common variants described (OR = 1.06–1.40), which could be explained by the often-observed inverse relationship between allele frequency and effect size (Manolio ). 
Moreover, we observed consistency in the direction of effects across the three datasets analyzed (discovery, validation and joint meta-analysis) (Fig. 4). We further assessed the robustness of our PCOS definition by comparing our effect sizes for the lead variants in the replicated loci with those reported using the non-NIH Rotterdam criteria (Day ). We conclude that our results based on ICD codes alone are robust, as the effects are in the same direction and do not present significant heterogeneity (phet = 1) compared to those using non-NIH Rotterdam criteria (Supplementary Fig. S3) (Day ).
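The quantities reported above (odds ratios with 95% CIs, P-values and the between-cohort heterogeneity statistic phet) can be related to each other with a short sketch. It assumes a Wald test under the normal approximation and an inverse-variance fixed-effect meta-analysis with Cochran's Q for two cohorts; the text does not state which meta-analysis software or model was used, so this is an illustration rather than the authors' pipeline. The function names are hypothetical, and the per-cohort inputs to `fixed_effect_meta` are made-up numbers, not study results.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI


def or_ci_to_beta_se(odds_ratio, ci_low, ci_high):
    """Convert an odds ratio and its 95% CI to a log-odds effect and standard error."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    return beta, se


def two_sided_p(beta, se):
    """Two-sided P-value of a Wald test, using the normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


def fixed_effect_meta(beta_a, se_a, beta_b, se_b):
    """Inverse-variance fixed-effect meta-analysis of two cohorts.

    Returns the pooled effect, its standard error, Cochran's Q and the
    heterogeneity P-value (chi-square with 1 degree of freedom).
    """
    w_a, w_b = 1 / se_a**2, 1 / se_b**2
    pooled = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    pooled_se = math.sqrt(1 / (w_a + w_b))
    q = w_a * (beta_a - pooled) ** 2 + w_b * (beta_b - pooled) ** 2
    p_het = math.erfc(math.sqrt(q / 2))  # chi-square survival function, df = 1
    return pooled, pooled_se, q, p_het


# Reported meta-analysis estimate for rs9312937: OR = 1.16 [1.10-1.22]
beta, se = or_ci_to_beta_se(1.16, 1.10, 1.22)
p_myo10 = two_sided_p(beta, se)

# Hypothetical per-cohort log-odds estimates (NOT taken from the study)
pooled, pooled_se, q, p_het = fixed_effect_meta(0.20, 0.05, 0.10, 0.05)
```

Recomputing the P-value for rs9312937 from its rounded OR and CI gives a value of the same order of magnitude as the reported 1.7 × 10−8; small differences are expected because the published estimates are rounded.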
Figure 4.

Forest plot of effect estimates for the seven lead variants associated with PCOS. The odds ratios (dots) and 95% CI (whiskers) are shown for the two included cohorts and the meta-analysis.

In colocalization analyses, all posterior probabilities for a shared causal variant were lower than 0.8. Thus, we did not find sufficient evidence that the association signals in the GWAS and in gene expression share a causal variant.

Discussion

In this study, we found two independent novel associations for PCOS on 22q11.2. In both cases, the lead single nucleotide polymorphisms had tightly linked variants, a frameshift (c.1100delC) and a missense (I157T), in the CHEK2 gene. A novel association was also detected in an intron of MYO10. We were also able to replicate signals commonly reported in PCOS GWAS—DENND1A, ERBB4 (HER4), ZBTB16 and FSHB—in our North-European populations. CHEK2 rs555607708 (c.1100delC), the likely association-driving variant in FinnGen, is a Finnish-enriched variant with a 3.7-fold enrichment compared to non-Finnish, non-Estonian Europeans and with an enrichment of 1.7 compared to Estonians (Mars ). Similarly, I157T, the likely association-driving variant in the EstBB, has a substantially higher allele frequency in the Estonian (0.048) and Finnish (0.029) populations, compared to the non-Finnish, Northwestern European population (0.0025), according to the gnomAD database (Karczewski ). The enrichment of the alleles likely allowed us to detect the associations with PCOS in the Finnish and Estonian populations, whereas in populations with lower minor allele frequencies, much larger study populations would need to be used. CHEK2 is a mediator of DNA damage signalling in response to double-stranded DNA breaks. CHEK2 can be considered an important factor in the quality control of cells. If CHEK2 function is disturbed, DNA repair is imbalanced, which can lead to genomic instability and tumorigenesis (Mustofa ). Whereas the association of CHEK2 c.1100delC with a moderate-risk breast cancer predisposition is well recognized (Meijers-Heijboer ), the pathogenic role of I157T remains controversial (Schutte ; Kilpivaara ; Muranen ). Several studies have shown the pathogenic impact of c.1100delC on breast cancer risk in the Finnish population (Kuusisto ; Hallamies ; Mars ). 
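The effect of allele enrichment on detectability can be made concrete with the gnomAD allele frequencies quoted above for I157T. The sketch below computes the fold enrichment and, as a rough approximation, the factor by which a study would need to grow in the lower-frequency population to retain the same power. It assumes an additive model, Hardy-Weinberg equilibrium, the same per-allele effect and the same case fraction in both populations, so that power scales with the genotype variance 2p(1 − p). The function names are hypothetical and the calculation is illustrative, not part of the original analysis.

```python
def fold_enrichment(af_enriched, af_reference):
    """Ratio of allele frequencies between two populations."""
    return af_enriched / af_reference


def genotype_variance(af):
    """Variance of the additive genotype dosage under Hardy-Weinberg equilibrium."""
    return 2 * af * (1 - af)


def relative_sample_size(af_enriched, af_reference):
    """Approximate factor by which the sample must grow in the reference
    population to keep the same power (same per-allele effect assumed)."""
    return genotype_variance(af_enriched) / genotype_variance(af_reference)


# I157T allele frequencies quoted in the text (gnomAD)
AF_ESTONIAN = 0.048
AF_FINNISH = 0.029
AF_NW_EUROPEAN = 0.0025

enrichment_est = fold_enrichment(AF_ESTONIAN, AF_NW_EUROPEAN)    # about 19-fold
enrichment_fin = fold_enrichment(AF_FINNISH, AF_NW_EUROPEAN)     # about 12-fold
n_factor_est = relative_sample_size(AF_ESTONIAN, AF_NW_EUROPEAN)  # about 18
n_factor_fin = relative_sample_size(AF_FINNISH, AF_NW_EUROPEAN)   # about 11
```

Under these assumptions, a study in the non-Finnish Northwestern European population would need to be roughly an order of magnitude larger to detect the same per-allele effect.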
There are currently no studies evaluating the pathogenic role of c.1100delC or I157T in Estonians, which underlines the need for further research assessing the impact of these variants in this population. An interaction between BMI and PCOS-associated variants has previously been suggested (Wojciechowski ), and interestingly, the c.1100delC variant in CHEK2 has recently been shown to predispose particularly obese carriers to the development of breast cancer (Greville-Heygate ). Although our results did not support such an association between c.1100delC-related PCOS risk and obesity, a replication of this analysis with larger sample size is needed. Epidemiological studies show an increased risk for endometrial cancer in women with PCOS. However, this does not apply to other gynaecological cancers like ovarian, cervical or breast cancer (Barry ; Gottschau ; Hart and Doherty, 2015; Harris and Terry, 2016; Ding ). Nevertheless, three recent studies utilizing a Mendelian randomization approach have suggested a modest but significant causal effect between PCOS and breast cancer (Wu ; Wen ; Zhu ). The fact that the risks do not seem to translate into clinical findings is notable and may indicate, for example, more efficient DNA repair systems in women with PCOS, a feature also associated with later onset of menopause (Day ; Ruth ). Interestingly, CHEK2 also plays a crucial role in foetal oocyte attrition, a phenomenon through which 80% of the initial ovarian oocyte reserve is lost during foetal development in mammals (Tharp ). Deletion of Chk2 in mice leads to a maximized ovarian reserve at postnatal day 2 (Tharp ) and reduced follicle atresia, a higher number of ovulated metaphase II oocytes, and higher AMH levels at 13.5 months (Ruth ). It was also reported that a CHEK2 loss-of-function allele is associated with later menopausal age in humans (Ruth ). 
This would be in line with women with PCOS, as they also present with an increased ovarian reserve, higher AMH levels, even at later reproductive years, and delayed menopause (Piltonen ; de Ziegler ; Minooee ; Forslund ; Ward ). A specific association between menopause-delaying alleles and PCOS has also been previously demonstrated (Day ). In a recent preprint work, Ward et al. also found that CHEK2 was associated with the age of menopause. When conducting a phenome-wide association study (PheWAS) on their associations, an aggregate of CHEK2-damaging variants also associated with PCOS, which is in line with our findings (Ward ). Our study also detected the previously reported associations with PCOS for ERBB4, DENND1A, FSHB and ZBTB16. Interestingly, ERBB4 has also recently been linked to proper oocyte maturation and high AMH in mice (Veikkolainen ). Thus, the present study reinforces the links between PCOS, abnormal follicle development and high levels of AMH. This study also presents an interesting novel association in an intronic region of MYO10. The MYO10 gene codes for an atypical myosin, which is involved in filopodia formation, phagocytosis and cargo transport in cells (Sousa and Cheney, 2005). Genetic variation in MYO10 has previously been linked to type 2 diabetes (Salonen ) and traits of metabolic syndrome (Zhang ). Interestingly, the identified variant seems to be associated with the age at menarche (Kichaev ), indicating a reproductive function for MYO10. Although a metabolic link between MYO10 and PCOS seems likely, further research is needed to characterize the role of MYO10 in PCOS. As previous studies have suggested a causal role for obesity in PCOS (Brower ; Zhao ), we reran the association analyses adjusting for BMI. A reduction in the significance of several associations was expected owing to the limited availability of BMI data (60% in FinnGen and 75% in EstBB). 
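The expected loss of significance from limited BMI availability can be illustrated with a back-of-the-envelope calculation. The sketch assumes that the effect size is unchanged by BMI adjustment and that cases and controls are lost in equal proportion, so that the z-score shrinks with the square root of the retained sample fraction. The starting z-score of 5.64 approximately corresponds to P = 1.7 × 10−8 (the reported value for rs9312937), and 5 × 10−8 is the conventional genome-wide threshold; both values and all function names are assumptions made for illustration.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def p_from_z(z):
    """Two-sided P-value for a z-score under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_after_subsampling(z_full, fraction_retained):
    """Expected z-score when only a fraction of the sample remains and the
    effect size is unchanged (standard error grows as 1/sqrt(fraction))."""
    return z_full * math.sqrt(fraction_retained)


Z_FULL = 5.64  # approximately the z-score for P = 1.7e-8

# BMI availability quoted in the text: 60% in FinnGen, 75% in EstBB
p_after = {
    fraction: p_from_z(z_after_subsampling(Z_FULL, fraction))
    for fraction in (0.60, 0.75)
}

still_significant = {f: p < GENOME_WIDE_ALPHA for f, p in p_after.items()}
```

With either retained fraction, an association that is just genome-wide significant in the full sample is expected to fall short of the threshold even though its effect size is unchanged.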
Two of the replicated (FSHB, ZBTB16) and two of the novel associations (MYO10 and CHEK2) did not reach genome-wide significance after adjustment. As the effect sizes remained largely unchanged when adjusted for BMI, the statistical significance of the associations was likely diluted by the reduction in sample size. Thus, we mainly focused on age-adjusted associations and acknowledge that larger sample sizes are needed to further explore the interplay between BMI and PCOS-related genetic factors. Overall, it is important to note that complex LD patterns between association signals might eclipse more distant causal genes. To infer plausible shared causal variants between PCOS-related genetic variants and gene expression, we conducted colocalization analyses without significant findings. This might be explained by the low sample size in gene expression panels that study tissues of interest in PCOS, such as reproductive tissues. Thus, further functional studies are warranted to characterize the regulatory functions of the uncovered loci (Peltonen ; Lim ; Martin ; Prohaska ). The main strength of this study was the use of two large, comprehensive genetic data sets, FinnGen and the EstBB, which have been extensively linked to national registers, such as the CRHC in Finland and the Estonian Health Insurance Fund registries in Estonia, as well as with other relevant databases (Leitsalu ). Both populations are genetically well-characterized (Salmela ). Furthermore, our main discovery of the two rare PCOS-associated variants near CHEK2 underlines the value of using study populations with a distinct genetic makeup. The interplay of past demographic events may result in regionally varying genetic architectures for medical conditions (Peltonen ; Martin ). When alleles enriched in such populations are causal or linked to causal variation, statistical power increases, enabling their detection in an association analysis (Lim ; Prohaska ). 
The register-based approach is also a limiting factor, as the health register-based prevalence of PCOS is very low in our study populations (0.57% for FinnGen and 3.15% for the EstBB), plausibly reflecting underdiagnosis of the syndrome. We were unable to validate the ICD codes, as the FinnGen dataset does not contain identifying information of the subjects; however, the coverage and accuracy of the Finnish CRHC have been validated in several studies, and they have been shown to be excellent (Sund, 2012). The CRHC diagnoses are hospital-based, and thus the PCOS cases were diagnosed by specialized doctors. The validity of the PCOS diagnosis is also supported by the fact that we were able to replicate four previously reported signals, ERBB4, DENND1A, FSHB and ZBTB16. In addition, there was consistency in the direction of effects between our association results and the non-NIH Rotterdam-criteria association results presented in the largest European GWAS meta-analysis to date (Day ), which adds robustness to our approach. Given the register-based approach, we could not assess in more detail the different PCOS phenotypes; however, a previous study indicated that women with PCOS diagnosed by a physician using different diagnostic criteria are genetically similar (Day ). In conclusion, we identified two rare population-enriched variants located in CHEK2 that are significantly associated with PCOS. The findings emphasize the benefits of utilizing isolated populations in genetic studies of complex diseases and advance the understanding of genetic factors underlying PCOS.

Data availability

The GWAS meta-analysis summary statistics that support the findings of this study are available for download from the GWAS Catalog at ebi.ac.uk/gwas/ with accession ID numbers GCST90044902 and GCST90044903.
References

1.  Body Mass Index and Polycystic Ovary Syndrome: A 2-Sample Bidirectional Mendelian Randomization Study.

Authors:  Yalin Zhao; Yuping Xu; Xiaomeng Wang; Lin Xu; Jianhua Chen; Chengwen Gao; Chuanhong Wu; Dun Pan; Qian Zhang; Juan Zhou; Ruirui Chen; Zhuo Wang; Han Zhao; Li You; Yunxia Cao; Zhiqiang Li; Yongyong Shi
Journal:  J Clin Endocrinol Metab       Date:  2020-06-01       Impact factor: 5.958

2.  Cancer risk and PCOS.

Authors:  Daniel A Dumesic; Rogerio A Lobo
Journal:  Steroids       Date:  2013-04-24       Impact factor: 2.668

3.  Genome-wide association study identifies susceptibility loci for polycystic ovary syndrome on chromosome 2p16.3, 2p21 and 9q33.3.

Authors:  Zi-Jiang Chen; Han Zhao; Lin He; Yuhua Shi; Yingying Qin; Yongyong Shi; Zhiqiang Li; Li You; Junli Zhao; Jiayin Liu; Xiaoyan Liang; Xiaoming Zhao; Junzhao Zhao; Yingpu Sun; Bo Zhang; Hong Jiang; Dongni Zhao; Yuehong Bian; Xuan Gao; Ling Geng; Yiran Li; Dongyi Zhu; Xiuqin Sun; Jin-E Xu; Cuifang Hao; Chun-E Ren; Yajie Zhang; Shiling Chen; Wei Zhang; Aijun Yang; Junhao Yan; Yuan Li; Jinlong Ma; Yueran Zhao
Journal:  Nat Genet       Date:  2010-12-12       Impact factor: 38.330

4.  Human Disease Variation in the Light of Population Genomics.

Authors:  Ana Prohaska; Fernando Racimo; Andrew J Schork; Martin Sikora; Aaron J Stern; Melissa Ilardo; Morten Erik Allentoft; Lasse Folkersen; Alfonso Buil; J Víctor Moreno-Mayar; Thorfinn Korneliussen; Daniel Geschwind; Andrés Ingason; Thomas Werge; Rasmus Nielsen; Eske Willerslev
Journal:  Cell       Date:  2019-03-21       Impact factor: 41.582

5.  Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations.

Authors:  Hanne Meijers-Heijboer; Ans van den Ouweland; Jan Klijn; Marijke Wasielewski; Anja de Snoo; Rogier Oldenburg; Antoinette Hollestelle; Mark Houben; Ellen Crepin; Monique van Veghel-Plandsoen; Fons Elstrodt; Cornelia van Duijn; Carina Bartels; Carel Meijers; Mieke Schutte; Lesley McGuffog; Deborah Thompson; Douglas Easton; Nayanta Sodha; Sheila Seal; Rita Barfoot; Jon Mangion; Jenny Chang-Claude; Diana Eccles; Rosalind Eeles; D Gareth Evans; Richard Houlston; Victoria Murday; Steven Narod; Tamara Peretz; Julian Peto; Catherine Phelan; Hong Xiang Zhang; Csilla Szabo; Peter Devilee; David Goldgar; P Andrew Futreal; Katherine L Nathanson; Barbara Weber; Nazneen Rahman; Michael R Stratton
Journal:  Nat Genet       Date:  2002-04-22       Impact factor: 38.330

6.  Cohort Profile: Estonian Biobank of the Estonian Genome Center, University of Tartu.

Authors:  Liis Leitsalu; Toomas Haller; Tõnu Esko; Mari-Liis Tammesoo; Helene Alavere; Harold Snieder; Markus Perola; Pauline C Ng; Reedik Mägi; Lili Milani; Krista Fischer; Andres Metspalu
Journal:  Int J Epidemiol       Date:  2014-02-11       Impact factor: 7.196

7.  A comprehensive analysis of adiponectin QTLs using SNP association, SNP cis-effects on peripheral blood gene expression and gene expression correlation identified novel metabolic syndrome (MetS) genes with potential role in carcinogenesis and systemic inflammation.

Authors:  Yi Zhang; Jack W Kent; Michael Olivier; Omar Ali; Diana Cerjak; Ulrich Broeckel; Reham M Abdou; Thomas D Dyer; Anthony Comuzzie; Joanne E Curran; Melanie A Carless; David L Rainwater; Harald H H Göring; John Blangero; Ahmed H Kissebah
Journal:  BMC Med Genomics       Date:  2013-04-29       Impact factor: 3.063

8.  Recommendations from the international evidence-based guideline for the assessment and management of polycystic ovary syndrome.

Authors:  Helena J Teede; Marie L Misso; Michael F Costello; Anuja Dokras; Joop Laven; Lisa Moran; Terhi Piltonen; Robert J Norman
Journal:  Hum Reprod       Date:  2018-09-01       Impact factor: 6.918

9.  Weight Gain and Dyslipidemia in Early Adulthood Associate With Polycystic Ovary Syndrome: Prospective Cohort Study.

Authors:  Meri-Maija E Ollila; Terhi Piltonen; Katri Puukka; Aimo Ruokonen; Marjo-Riitta Järvelin; Juha S Tapanainen; Stephen Franks; Laure Morin-Papunen
Journal:  J Clin Endocrinol Metab       Date:  2015-12-10       Impact factor: 5.958

10.  Causal mechanisms and balancing selection inferred from genetic associations with polycystic ovary syndrome.

Authors:  Felix R Day; David A Hinds; Joyce Y Tung; Lisette Stolk; Unnur Styrkarsdottir; Richa Saxena; Andrew Bjonnes; Linda Broer; David B Dunger; Bjarni V Halldorsson; Debbie A Lawlor; Guillaume Laval; Iain Mathieson; Wendy L McCardle; Yvonne Louwers; Cindy Meun; Susan Ring; Robert A Scott; Patrick Sulem; André G Uitterlinden; Nicholas J Wareham; Unnur Thorsteinsdottir; Corrine Welt; Kari Stefansson; Joop S E Laven; Ken K Ong; John R B Perry
Journal:  Nat Commun       Date:  2015-09-29       Impact factor: 14.919

