
Genome-wide association study identifies 74 loci associated with educational attainment.

Aysu Okbay^1,2,3, Jonathan P Beauchamp⁴, Mark Alan Fontana⁵, James J Lee⁶, Tune H Pers^7,8,9,10, Cornelius A Rietveld^1,2,3, Patrick Turley⁴, Guo-Bo Chen¹¹, Valur Emilsson^12,13, S Fleur W Meddens^3,14,15, Sven Oskarsson¹⁶, Joseph K Pickrell¹⁷, Kevin Thom¹⁸, Pascal Timshel^8,19, Ronald de Vlaming^1,2,3, Abdel Abdellaoui²⁰, Tarunveer S Ahluwalia^9,21,22, Jonas Bacelis²³, Clemens Baumbach^24,25, Gyda Bjornsdottir²⁶, Johannes H Brandsma²⁷, Maria Pina Concas²⁸, Jaime Derringer²⁹, Nicholas A Furlotte³⁰, Tessel E Galesloot³¹, Giorgia Girotto³², Richa Gupta³³, Leanne M Hall^34,35, Sarah E Harris^36,37, Edith Hofer^38,39, Momoko Horikoshi^40,41, Jennifer E Huffman⁴², Kadri Kaasik⁴³, Ioanna P Kalafati⁴⁴, Robert Karlsson⁴⁵, Augustine Kong²⁶, Jari Lahti^43,46, Sven J van der Lee², Christiaan deLeeuw^14,47, Penelope A Lind⁴⁸, Karl-Oskar Lindgren¹⁶, Tian Liu⁴⁹, Massimo Mangino^50,51, Jonathan Marten⁴², Evelin Mihailov⁵², Michael B Miller⁶, Peter J van der Most⁵³, Christopher Oldmeadow^54,55, Antony Payton^56,57, Natalia Pervjakova^52,58, Wouter J Peyrot⁵⁹, Yong Qian⁶⁰, Olli Raitakari⁶¹, Rico Rueedi^62,63, Erika Salvi⁶⁴, Börge Schmidt⁶⁵, Katharina E Schraut⁶⁶, Jianxin Shi⁶⁷, Albert V Smith^12,68, Raymond A Poot²⁷, Beate St Pourcain^69,70, Alexander Teumer⁷¹, Gudmar Thorleifsson²⁶, Niek Verweij⁷², Dragana Vuckovic³², Juergen Wellmann⁷³, Harm-Jan Westra^8,74,75, Jingyun Yang^76,77, Wei Zhao⁷⁸, Zhihong Zhu¹¹, Behrooz Z Alizadeh^53,79, Najaf Amin², Andrew Bakshi¹¹, Sebastian E Baumeister^71,80, Ginevra Biino⁸¹, Klaus Bønnelykke²¹, Patricia A Boyle^76,82, Harry Campbell⁶⁶, Francesco P Cappuccio⁸³, Gail Davies^36,84, Jan-Emmanuel De Neve⁸⁵, Panos Deloukas^86,87, Ilja Demuth^88,89, Jun Ding⁶⁰, Peter Eibich^90,91, Lewin Eisele⁶⁵, Niina Eklund⁵⁸, David M Evans^69,92, Jessica D Faul⁹³, Mary F Feitosa⁹⁴, Andreas J Forstner^95,96, Ilaria Gandin³², Bjarni Gunnarsson²⁶, Bjarni V Halldórsson^26,97, Tamara B Harris⁹⁸, Andrew C Heath⁹⁹, Lynne J Hocking¹⁰⁰, Elizabeth G Holliday^54,55, Georg 
Homuth¹⁰¹, Michael A Horan¹⁰², Jouke-Jan Hottenga²⁰, Philip L de Jager^8,103,104, Peter K Joshi⁶⁶, Astanand Jugessur¹⁰⁵, Marika A Kaakinen¹⁰⁶, Mika Kähönen^107,108, Stavroula Kanoni⁸⁶, Liisa Keltigangas-Järvinen⁴³, Lambertus A L M Kiemeney³¹, Ivana Kolcic¹⁰⁹, Seppo Koskinen⁵⁸, Aldi T Kraja⁹⁴, Martin Kroh⁹⁰, Zoltan Kutalik^62,63,110, Antti Latvala³³, Lenore J Launer¹¹¹, Maël P Lebreton^15,112, Douglas F Levinson¹¹³, Paul Lichtenstein⁴⁵, Peter Lichtner¹¹⁴, David C M Liewald^36,84, Anu Loukola³³, Pamela A Madden⁹⁹, Reedik Mägi⁵², Tomi Mäki-Opas⁵⁸, Riccardo E Marioni^11,36,115, Pedro Marques-Vidal¹¹⁶, Gerardus A Meddens¹¹⁷, George McMahon⁶⁹, Christa Meisinger²⁵, Thomas Meitinger¹¹⁴, Yusplitri Milaneschi⁵⁹, Lili Milani⁵², Grant W Montgomery¹¹⁸, Ronny Myhre¹⁰⁵, Christopher P Nelson^34,35, Dale R Nyholt^118,119, William E R Ollier⁵⁶, Aarno Palotie^{8,120,121,122,123,124}, Lavinia Paternoster⁶⁹, Nancy L Pedersen⁴⁵, Katja E Petrovic³⁸, David J Porteous³⁷, Katri Räikkönen^43,46, Susan M Ring⁶⁹, Antonietta Robino¹²⁵, Olga Rostapshova^4,126, Igor Rudan⁶⁶, Aldo Rustichini¹²⁷, Veikko Salomaa⁵⁸, Alan R Sanders^128,129, Antti-Pekka Sarin^123,130, Helena Schmidt^38,131, Rodney J Scott^55,132, Blair H Smith¹³³, Jennifer A Smith⁷⁸, Jan A Staessen^134,135, Elisabeth Steinhagen-Thiessen⁸⁸, Konstantin Strauch^136,137, Antonio Terracciano¹³⁸, Martin D Tobin¹³⁹, Sheila Ulivi¹²⁵, Simona Vaccargiu²⁸, Lydia Quaye⁵⁰, Frank J A van Rooij^2,140, Cristina Venturini^50,51, Anna A E Vinkhuyzen¹¹, Uwe Völker¹⁰¹, Henry Völzke⁷¹, Judith M Vonk⁵³, Diego Vozzi¹²⁶, Johannes Waage^21,22, Erin B Ware^78,141, Gonneke Willemsen²⁰, John R Attia^54,55, David A Bennett^76,77, Klaus Berger⁷², Lars Bertram^142,143, Hans Bisgaard²¹, Dorret I Boomsma²⁰, Ingrid B Borecki⁹⁴, Ute Bültmann¹⁴⁴, Christopher F Chabris¹⁴⁵, Francesco Cucca¹⁴⁶, Daniele Cusi^64,147, Ian J Deary^36,84, George V Dedoussis⁴⁴, Cornelia M van Duijn², Johan G Eriksson^46,148, Barbara Franke¹⁴⁹, Lude Franke¹⁵⁰, Paolo Gasparini^32,125,151, Pablo V 
Gejman^128,129, Christian Gieger²⁴, Hans-Jörgen Grabe^152,153, Jacob Gratten¹¹, Patrick J F Groenen¹⁵⁴, Vilmundur Gudnason^12,68, Pim van der Harst^72,150,155, Caroline Hayward^42,156, David A Hinds³⁰, Wolfgang Hoffmann⁷¹, Elina Hyppönen^157,158,159, William G Iacono⁶, Bo Jacobsson^23,105, Marjo-Riitta Järvelin^{160,161,162,163}, Karl-Heinz Jöckel⁶⁵, Jaakko Kaprio^33,58,123, Sharon L R Kardia⁷⁸, Terho Lehtimäki^164,165, Steven F Lehrer^166,167, Patrik K E Magnusson⁴⁵, Nicholas G Martin¹⁶⁸, Matt McGue⁶, Andres Metspalu^52,169, Neil Pendleton^170,171, Brenda W J H Penninx⁵⁹, Markus Perola^52,58, Nicola Pirastu³², Mario Pirastu²⁸, Ozren Polasek^66,172, Danielle Posthuma^14,173, Christine Power¹⁵⁹, Michael A Province⁹⁴, Nilesh J Samani^34,35, David Schlessinger⁶⁰, Reinhold Schmidt³⁸, Thorkild I A Sørensen^9,69,174, Tim D Spector⁵⁰, Kari Stefansson^26,68, Unnur Thorsteinsdottir^26,68, A Roy Thurik^1,3,175,176, Nicholas J Timpson⁶⁹, Henning Tiemeier^2,177,178, Joyce Y Tung³⁰, André G Uitterlinden^2,140, Veronique Vitart⁴², Peter Vollenweider¹¹⁶, David R Weir⁹³, James F Wilson^42,66, Alan F Wright⁴², Dalton C Conley^179,180, Robert F Krueger⁶, George Davey Smith⁶⁹, Albert Hofman², David I Laibson⁴, Sarah E Medland⁴⁸, Michelle N Meyer¹⁸¹, Jian Yang^11,92, Magnus Johannesson¹⁸², Peter M Visscher^11,92, Tõnu Esko^7,8,52,183, Philipp D Koellinger^3,14,15, David Cesarini^18,184, Daniel J Benjamin⁵.

Abstract

Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.

Year: 2016 PMID: 27225129 PMCID: PMC4883595 DOI: 10.1038/nature17671

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

We study educational attainment (EA), which is measured in all main analyses as the number of years of schooling completed (EduYears, N = 293,723, mean = 14.33, SD = 3.61; Supplementary Information sections 1.1-1.2). All genome-wide association studies (GWAS) were performed at the cohort level in samples restricted to individuals of European descent whose EA was assessed at or above age 30. A uniform set of quality-control (QC) procedures was applied to the cohort-level summary statistics. In our GWAS meta-analysis of ∼9.3M SNPs from the 1000 Genomes Project, we used sample-size weighting and applied a single round of genomic control at the cohort level. Our meta-analysis identified 74 approximately independent genome-wide significant loci. For each locus, we define the “lead SNP” as the SNP in the genomic region that has the smallest P-value (Supplementary Information section 1.6.1). Fig. 1 shows a Manhattan plot with the lead SNPs highlighted. This includes the three SNPs that reached genome-wide significance in the discovery stage of our previous GWAS meta-analysis of EA[1]. The quantile-quantile (Q-Q) plot of the meta-analysis (Extended Data Fig. 1) exhibits inflation (λGC = 1.28), as expected under polygenicity[3].
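The two statistical steps named above, sample-size weighting and genomic control, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the weighting is the generic square-root-of-N (Stouffer-type) scheme, and the genomic-control step simply deflates Z-scores by the square root of λGC when λGC exceeds 1.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def meta_z(z_scores, sample_sizes):
    """Combine per-cohort Z-scores for one SNP with sqrt(N) weights."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided P-value of a standard-normal test statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))


def lambda_gc(z_scores):
    """Genomic inflation factor: median observed chi-square over its null median."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def genomic_control(z_scores):
    """One round of genomic control within a cohort (no change if lambda <= 1)."""
    lam = max(lambda_gc(z_scores), 1.0)
    return [z / math.sqrt(lam) for z in z_scores]


# Hypothetical Z-scores for one SNP in three cohorts of different size.
z_combined = meta_z([1.8, 2.4, 0.9], [5000, 20000, 3000])
p_combined = two_sided_p(z_combined)
```

Under this weighting, larger cohorts dominate the combined statistic, and the combined Z remains standard normal under the null when the cohort-level statistics are independent.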

Figure 1

Manhattan plot for EduYears associations (N = 293,723)

The x-axis is chromosomal position, and the y-axis is the significance on a −log10 scale. The black line shows the genome-wide significance level (5×10⁻⁸). The red x's are the 74 approximately independent genome-wide significant associations (“lead SNPs”). The black dots labeled with rs numbers are the 3 Rietveld et al.[1] SNPs.

Extended Data Figure 1

Quantile-quantile plot of the genome-wide association meta-analysis of 64 EduYears results files

Observed and expected P-values are on a –log10 scale. The grey region depicts the 95% confidence interval under the null hypothesis of a uniform P-value distribution. The observed λGC is 1.28. (As reported in Supplementary Information section 1.5.4, the unweighted mean λGC is 1.02, the unweighted median is 1.01, and the range across cohorts is 0.95–1.15.)

Extended Data Fig. 2 shows the estimated effect sizes of the lead SNPs. The estimates range from 0.014 to 0.048 standard deviations per allele (2.7 to 9.0 weeks of schooling), with incremental R² in the range 0.01% to 0.035%.
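The conversion from standardized effects to weeks of schooling can be checked with a short calculation. The sketch below uses the sample SD of 3.61 years reported above and assumes 52 weeks per year. The per-SNP variance-explained formula 2p(1−p)β² is the standard additive expression under Hardy-Weinberg equilibrium, and the allele frequency passed to it is a hypothetical input, not a value from the paper.

```python
SD_EDUYEARS = 3.61    # years of schooling, from the text
WEEKS_PER_YEAR = 52   # assumption for the unit conversion


def sd_to_weeks(beta_sd):
    """Convert a per-allele effect in SD units to weeks of schooling."""
    return beta_sd * SD_EDUYEARS * WEEKS_PER_YEAR


def snp_r2(beta_sd, allele_freq):
    """Share of phenotypic variance explained by one SNP: 2p(1-p)*beta^2."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2


smallest_weeks = sd_to_weeks(0.014)   # close to the reported 2.7 weeks
largest_weeks = sd_to_weeks(0.048)    # close to the reported 9.0 weeks
example_r2 = snp_r2(0.014, 0.5)       # hypothetical allele frequency of 0.5
```

The lower bound comes out slightly below 2.7 weeks, presumably because the reported effect sizes are rounded. With an allele frequency of 0.5, the smallest effect gives an R² of roughly 0.01%, in line with the lower end of the reported range.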

Extended Data Figure 2

The distribution of effect sizes of the 74 lead SNPs

a, SNPs ordered by absolute value of the standardized effect of one more copy of the education-increasing allele, with 95% confidence intervals. b, SNPs ordered by R². Effects on EduYears are benchmarked against the top 74 genome-wide significant hits identified in the largest GWAS conducted to date of height and body mass index (BMI), and the 48 associations reported for waist-to-hip ratio adjusted for BMI (WHR). These results are based on the GIANT consortium's publicly available results for pooled analyses restricted to European-ancestry individuals: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium.

To quantify the amount of population stratification in the GWAS estimates that remains even after the stringent controls used by the cohorts (Supplementary Information section 1.4), we used LD Score regression[4]. The regression results indicate that ∼8% of the observed inflation in the mean χ² is due to bias rather than polygenic signal (Extended Data Fig. 3a), suggesting that stratification effects are small in magnitude. We also found evidence for polygenic association signal in several within-family analyses, although these are not powered for individual SNP association testing (Supplementary Information section 2 and Extended Data Fig. 3b).
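The share of inflation attributed to bias is commonly summarized as (intercept − 1) / (mean χ² − 1), where the intercept comes from regressing per-SNP χ² statistics on LD Scores. The sketch below illustrates that ratio with an unweighted least-squares fit on synthetic, noise-free numbers. It is an assumption-laden simplification: LD Score regression software uses weighted regression with block-jackknife standard errors, and none of the values here come from the paper.

```python
def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ldsc_ratio(ld_scores, chi2):
    """Return (intercept, slope, share of chi-square inflation due to bias)."""
    intercept, slope = ols(ld_scores, chi2)
    mean_chi2 = sum(chi2) / len(chi2)
    return intercept, slope, (intercept - 1.0) / (mean_chi2 - 1.0)


# Synthetic example: chi2 = 1.04 + 0.004 * LD Score, with no noise.
ld_scores = [10.0, 50.0, 100.0, 200.0]
chi2 = [1.04 + 0.004 * score for score in ld_scores]
intercept, slope, ratio = ldsc_ratio(ld_scores, chi2)
```

In this toy case the intercept is 1.04 and the mean χ² is 1.40, so the ratio is 0.10: a tenth of the inflation would be attributed to confounding and the rest to signal that scales with LD.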

Extended Data Figure 3

Assessing the extent to which population stratification affects the estimates from the GWAS

a, LD Score regression plot with the summary statistics from the GWAS. Each point represents an LD Score quantile for a chromosome (the x and y coordinates of the point are the mean LD Score and the mean χ² statistic of variants in that quantile). The facts that the intercept is close to one and that the χ² statistics increase linearly with the LD Scores suggest that the bulk of the inflation in the χ² statistics is due to true polygenic signal and not to population stratification. b, Estimates and 95% confidence intervals from individual-level and within-family (WF) regressions of EduYears on polygenic scores, for scores constructed with sets of SNPs meeting different P-value thresholds. In addition to the analyses shown here, we conduct a sign concordance test, and we decompose the variance of the polygenic score. Overall, these analyses suggest that population stratification is unlikely to be a major concern for our 74 lead SNPs. See Supplementary Information section 3 for additional details.

To further test the robustness of our findings, we examined the within-sample and out-of-sample replicability of SNPs reaching genome-wide significance (Supplementary Information sections 1.7-1.8). We found that SNPs identified in the previous EA meta-analysis replicated in the new cohorts included here, and conversely, that SNPs reaching genome-wide significance in the new cohorts replicated in the old cohorts. For the out-of-sample replication analyses of our 74 lead SNPs, we used the interim release of the U.K. Biobank[5] (UKB) (N = 111,349). As shown in Extended Data Fig. 4, 72 out of the 74 lead SNPs have a consistent sign (P = 1.47×10⁻¹⁹), 52 are significant at the 5% level (P = 2.68×10⁻⁵⁰), and 7 reach genome-wide significance in the U.K. Biobank dataset (P = 1.41×10⁻⁴²). For comparison, the corresponding expected numbers, assuming each SNP's true effect size is its estimated effect adjusted for the winner's curse, are 71.4, 40.3, and 0.6 (Supplementary Information section 1.8.2). We also find out-of-sample replicability of our overall GWAS results: the genetic correlation between EduYears in our meta-analysis sample and in the UKB data is 0.95 (s.e. = 0.021; Supplementary Table 1.14).
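The three replication P-values are numerically consistent with one-sided exact binomial tests in which, under the null, each of the 74 SNPs independently "succeeds" with probability 0.5 (sign agreement), 0.05 (nominal significance), or 5×10⁻⁸ (genome-wide significance). That reading is an inference from the reported numbers rather than something stated in this section. The sketch below computes the upper tails with exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), with p given as a Fraction."""
    p = Fraction(p)
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_LEAD = 74  # number of lead SNPs, from the text

p_sign = binom_upper_tail(72, N_LEAD, Fraction(1, 2))         # consistent sign
p_nominal = binom_upper_tail(52, N_LEAD, Fraction(5, 100))    # P < 0.05 in UKB
p_gws = binom_upper_tail(7, N_LEAD, Fraction(5, 10 ** 8))     # P < 5e-8 in UKB

for label, p in [("sign", p_sign), ("nominal", p_nominal), ("genome-wide", p_gws)]:
    print(f"{label}: {float(p):.3g}")
```

For the sign test the tail is (2701 + 74 + 1) / 2⁷⁴, which rounds to the reported 1.47×10⁻¹⁹. The other two tails agree closely with the reported values.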

Extended Data Figure 4

Replication of 74 lead SNPs in the UK Biobank data

Estimated effect sizes (in years of schooling) and 95% confidence intervals of the 74 lead SNPs in the meta-analysis sample (N = 293,723) and the UK Biobank replication sample (N = 111,349). The reference allele is the allele associated with higher values of EduYears in the meta-analysis sample. SNPs are in descending order of R² in the meta-analysis sample. Of the 74 lead SNPs, 72 have the anticipated sign in the replication sample, 52 replicate at the 0.05 significance level, and 7 replicate at the 5×10⁻⁸ significance level.

It is known that EA, cognitive performance, and many neuropsychiatric phenotypes are phenotypically correlated, and several studies of twins find that the phenotypic correlations partly reflect genetic overlap[6-8] (Supplementary Information section 3.3.4). Here, we investigate genetic correlation using our GWAS results for EduYears and published GWAS results for 14 other phenotypes, using bivariate Linkage-Disequilibrium (LD) Score regression[9] (Supplementary Information section 3). First, we estimated genetic correlations with EduYears. As shown in Fig. 2, based on overall summary statistics for associated variants, we find genetic covariance between increased EA and increased cognitive performance (P = 9.9×10⁻⁵⁰), increased intracranial volume (P = 1.2×10⁻⁶), increased risk of bipolar disorder (P = 7×10⁻¹³), decreased risk of Alzheimer's (P = 4×10⁻⁴), and lower neuroticism (P = 2.8×10⁻⁸). We also found positive, statistically significant, but very small, genetic correlations with height (P = 5.2×10⁻¹⁵) and risk of schizophrenia (P = 3.2×10⁻⁴).

Figure 2

Genetic correlations between EduYears and other traits

Results from bivariate Linkage-Disequilibrium (LD) Score regressions[9]: estimates of genetic correlation with brain volume, neuropsychiatric, behavioral, and anthropometric phenotypes using published GWAS summary statistics. The error bars show the 95% confidence intervals.

Second, we examined whether our 74 lead SNPs are jointly associated with each phenotype (Extended Data Fig. 5 and Supplementary Information section 3.3.1). We reject the null hypothesis of no enrichment at P < 0.05 for 10 of the 14 phenotypes (all the exceptions are subcortical brain structures).

Extended Data Figure 5

Q-Q plots for the 74 lead EduYears SNPs (or LD proxies) in published GWAS of other phenotypes

SNPs with concordant effects on both phenotypes are pink, and SNPs with discordant effects are blue. SNPs outside the gray area pass Bonferroni-corrected significance thresholds that correct for the total number of SNPs we tested (P < 0.05/74 = 6.8×10⁻⁴) and are labeled with their rs numbers. Observed and expected P-values are on a −log10 scale. For the sign concordance test: * P < 0.05, ** P < 0.01, and *** P < 0.001.

Third, for each phenotype, we tested (in the published GWAS results) each of our 74 lead SNPs or proxy for association at a significance threshold of 0.05/74. We found a total of 25 SNPs meeting this threshold for any of these phenotypes, but only one reaching genome-wide significance. While these results provide suggestive evidence that some of these SNPs may be associated with other phenotypes, further testing of these associations in independent cohorts is required (Supplementary Tables 3.2-3.4, Extended Data Fig. 6).

Extended Data Figure 6

Regional association plots for four of the ten prioritized SNPs for MHBA phenotypes identified using EduYears as a proxy phenotype

a, cognitive performance; b, hippocampus; c, intracranial volume; d, neuroticism. The four were selected because very few genome-wide significant SNPs have been previously reported for these traits. Data sources and methods are described in Supplementary Information section 3. The R² values are from the hg19 / 1000 Genomes Nov 2014 EUR reference samples. The figures were created with LocusZoom (http://csg.sph.umich.edu/locuszoom/). Mb, megabases.

To consider potential biological pathways, we first tested whether SNPs in particular regions of the genome are implicated by our GWAS results. Unlike what has been found for other phenotypes, SNPs in regions that are DNase I hypersensitive in the fetal brain are more likely to be associated with EduYears by a factor of ∼5 (95% confidence interval 2.89–7.07; Extended Data Fig. 7). Moreover, the 15% of SNPs residing in regions associated with histones marked in the central nervous system (CNS) explain 44% of the heritable variation (Extended Data Fig. 8a and Supplementary Table 4.4.2). This enrichment factor of ∼3 for CNS (P = 2.48×10⁻¹⁶) is greater than that of any of the other nine tissue categories in this analysis.
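The CNS enrichment factor quoted above follows directly from the two percentages: it is the share of heritable variation explained by a category of SNPs divided by that category's share of all SNPs. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def enrichment_factor(share_of_heritability, share_of_snps):
    """Ratio of the heritability share to the SNP share for one annotation."""
    return share_of_heritability / share_of_snps


# Values from the text: 15% of SNPs explain 44% of the heritable variation.
cns_enrichment = enrichment_factor(0.44, 0.15)
print(round(cns_enrichment, 2))
```

The result is about 2.9, matching the reported factor of ∼3. A value of 1 would mean the category explains exactly its proportional share.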

Extended Data Figure 7

Application of fgwas to EduYears. See Supplementary Information section 4.2 for further details

a, The results of single-annotation models. “Enrichment” refers to the factor by which the prior odds of association at an LD-defined region must be multiplied if the region bears the given annotation; this factor is estimated using an empirical Bayes method applied to all SNPs in the GWAS meta-analysis regardless of statistical significance. Annotations were derived from ENCODE and a number of other data sources. Plotted are the base-2 logarithms of the enrichments and their 95% confidence intervals. Multiple instances of the same annotation correspond to independent replicates of the same experiment. b, The results of combining multiple annotations and applying model selection and cross-validation. Although the maximum-likelihood estimates are plotted, model selection was performed with penalized likelihood. c, Reweighting of GWAS loci. Each point represents an LD-defined region of the genome, and shown are the regional posterior probabilities of association (PPAs). The x-axis gives the PPA calculated from the GWAS summary statistics alone, whereas the y-axis gives the PPA upon reweighting on the basis of the annotations in b. The orange points represent genomic regions where the PPA is equivalent to the standard GWAS significance threshold only upon reweighting.

Extended Data Figure 8

Tissue-level biological annotation

a, The enrichment factor for a given tissue type is the ratio of variance explained by SNPs in that group to the overall fraction of SNPs in that group. To benchmark the estimates for EduYears, we compare the enrichment factors to those obtained when we use the largest GWAS conducted to date on body mass index, height, and waist-to-hip ratio adjusted for BMI. The estimates were produced with the LDSC python software, using the LD Scores and functional annotations introduced in Finucane et al. (2015) and the HapMap3 SNPs with MAF > 0.05. Each of the 10 enrichment calculations for a particular cell type is performed independently, each controlling for the 52 functional annotation categories in the full baseline model. The error bars show the 95% confidence intervals. b, We took measurements of gene expression by the Genotype-Tissue Expression (GTEx) Consortium and determined whether the genes overlapping EduYears-associated loci are significantly overexpressed (relative to genes in random sets of loci matched by gene density) in each of 37 tissue types. These types are grouped in the panel by organ. The colored bars correspond to tissues where there is significant overexpression. The y-axis is the significance on a −log10 scale.

Given that our findings disproportionately implicate SNPs in regions regulating brain-specific gene expression, we examined whether genes located near EduYears-associated SNPs show elevated expression in neural tissue. We tested this hypothesis using data on mRNA transcript levels in the 37 adult tissues assayed by the Genotype-Tissue Expression Project (GTEx)[10]. Remarkably, the 13 GTEx tissues that are components of the CNS—and only those 13 tissues—show significantly elevated expression levels of genes near EduYears-associated SNPs (FDR < 0.05; Extended Data Fig. 8b and Supplementary Table 4.5.2). To investigate possible functions of the candidate genes from the GWAS associated loci, we examined the extent of their overlap with groups of genes (“gene sets”) whose products are known or predicted to participate in a common biological process[11]. We found 283 gene sets significantly enriched by the candidate genes identified in our GWAS (FDR < 0.05; Supplementary Table 4.5.1). To facilitate interpretation, we used a standard procedure[11] to group the 283 gene sets into “clusters” defined by degree of gene overlap. The resulting 34 clusters, shown in Fig. 3, paint a coherent picture, with many clusters corresponding to stages of neural development: the proliferation of neural progenitor cells and their specialization (the cluster npBAF complex), the migration of new neurons to the different layers of the cortex (forebrain development, abnormal cerebral cortex morphology), the projection of axons from neurons to their signaling targets (axonogenesis, signaling by Robo receptor), the sprouting of dendrites and their spines (dendrite, dendritic spine organization), and neuronal signaling and synaptic plasticity throughout the lifespan (voltage-gated calcium channel complex, synapse part, synapse organization).

Figure 3

Overview of biological annotation

34 clusters of significantly enriched gene sets. Each cluster is named after one of its member gene sets. The color represents the P-value of the member set exhibiting the most statistically significant enrichment. Overlap between pairs of clusters is represented by an edge. Edge width represents the Pearson correlation ρ between the two vectors of gene membership scores (ρ < 0.3, no edge; 0.3 ≤ ρ < 0.5, thin edge; 0.5 ≤ ρ < 0.7, intermediate edge; ρ ≥ 0.7, thick edge), where each cluster's vector is the vector for the gene set after which the cluster is named.

Many of our results implicate candidate genes and biological pathways that are active during distinct stages of prenatal brain development. To directly examine how the expression levels of candidate genes identified in our GWAS vary over the course of development, we used gene expression data from the BrainSpan Developmental Transcriptome[12]. As shown in Extended Data Fig. 9, these candidate genes exhibit above-baseline expression in the brain throughout life but especially higher expression levels in the brain during prenatal development (1.36 times higher prenatally than postnatally, P = 6.02×10⁻⁸).

Extended Data Figure 9

Gene-level biological annotation

a, The DEPICT-prioritized genes for EduYears measured in the BrainSpan Developmental Transcriptome data (red curve) are more strongly expressed in the brain prenatally rather than postnatally. The DEPICT-prioritized genes exhibit similar gene-expression levels across different brain regions (gray lines). Analyses were based on log2-transformed RNA-Seq data. Error bars represent 95% confidence intervals. b, For each phenotype and disorder, we calculated the overlap between the phenotype's DEPICT-prioritized genes and genes believed to harbor de novo mutations causing the disorder. The bars correspond to odds ratios. EduYears, years of education; BMI, body mass index; WHR, waist-to-hip ratio adjusted for BMI. c, DEPICT-prioritized genes in EduYears-associated loci exhibit substantial overlap with genes previously reported to harbor sites where mutations increase risk of intellectual disability and autism spectrum disorder (Supplementary Table 4.6.1).

A summary overview of some promising candidate genes for follow-up work is provided in Table 1.

Table 1

Selected candidate genes implicated by bioinformatics analyses

Fifteen candidate genes implicated most consistently across various analyses. To assemble this list, each gene in a DEPICT-defined locus (Supplementary Information section 4.5) was assigned a score equal to the number of criteria it satisfies out of ten (see Supplementary Table 4.1 for details). The DEPICT prioritization P-value was used as the tiebreaker. “SNP”: the SNP in the gene's locus with the lowest P-value in the EduYears meta-analysis. “Syndromic”: which, if any, of three neuropsychiatric disorders have been linked to de novo mutations in the gene (Supplementary Information section 4.6). “Top-ranking gene sets”: DEPICT reconstituted gene sets of which the gene is a top-20 member (Supplementary Table 4.5.1). The three most significant gene sets are shown if more than three are available. ID, intellectual disability; ASD, autism spectrum disorder; SCZ, schizophrenia.

Gene	SNP	Syndromic	Score	Top-ranking gene sets
TBR1	rs4500960	ID, ASD	6	Developmental biology, decreased brain size, abnormal cerebral cortex morphology
MEF2C	rs7277187	ID, ASD	5	ErbB signaling pathway, abnormal sternum ossification, regulation of muscle cell differentiation
ZSWIM6	rs61160187	–	5	Transcription factor binding, negative regulation of signal transduction, PI3K events in ErbB4 signaling
BCL11A	rs2457660	ASD	5	Dendritic spine organization, abnormal hippocampal mossy fiber morphology, SWI/SNF-type complex
CELSR3	rs11712056	SCZ	5	Dendrite morphogenesis, dendrite development, abnormal hippocampal mossy fiber morphology
MAPT	rs192818565	ID	5	Dendrite morphogenesis, abnormal hippocampal mossy fiber morphology, abnormal axon guidance
SBNO1	rs7306755	SCZ	5	Protein serine/threonine phosphatase complex
NBAS	rs12987662	–	5	–
NBEA	rs9544418	SCZ	4	Developmental biology, signaling by Robo receptor, dendritic shaft
SMARCA2	rs1871109	ID	4	–
MAP4	rs11712056	ASD	4	Developmental biology, signaling by Robo receptor, SWI-SNF-type complex
LINC00461	rs10061788	–	4	Decreased brain size, abnormal cerebral cortex morphology, abnormal hippocampal mossy fiber morphology
POU3F2	rs9320913	–	4	Dendrite morphogenesis, developmental biology, decreased brain size
RAD54L2	rs11712056	SCZ	4	Decreased brain size, SWI/SNF-type complex, nBAF complex
PLK2	rs2964197	–	4	Negative regulation of signal transduction, PI3K events in ErbB4 signaling

We constructed polygenic scores[13] to assess the joint predictive power afforded by the GWAS results (Supplementary Information section 5.2). Across our two holdout samples, the mean predictive power of a polygenic score constructed from all measured SNPs is 3.2% (P = 1.18×10⁻³⁹; Supplementary Table 5.2 and Supplementary Information section 5).

Studies of genetic analyses of behavioral phenotypes have been prone to misinterpretation, such as characterizing identified associated variants as “genes for education.” Such characterization is not correct for many reasons: EA is primarily determined by environmental factors, the explanatory power of the individual SNPs is small, the candidate genes may not be causal, and the genetic associations with EA are mediated by multiple intermediate phenotypes[14]. To illustrate this last point, we studied mediation of the association between the all-SNPs polygenic score and EduYears in two of our cohorts. We found that cognitive performance can statistically account for 23-42% of the association (P < 0.001) and the personality trait “openness to experience” for approximately 7% (P < 0.001; Supplementary Information section 6).

It would also be a mistake to infer from our findings that the genetic effects operate independently of environmental factors. Indeed, a recent meta-analysis of twin studies found that genetic influences on EA are heterogeneous across countries and birth cohorts[15]. We conducted exploratory analyses in the Swedish Twin Registry to illustrate how environmental factors may amplify or dampen the impact of genetic influences (Supplementary Information section 7). We found that the predictive power of the all-SNPs polygenic score is heterogeneous by birth cohort, with smaller explanatory power in younger cohorts (Extended Data Fig. 10; see also Supplementary Information section 7.4 for discussion of the contrast between these results and findings from a seminal twin study that estimated EA heritability by birth cohort[16]).
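A polygenic score of the kind used above is, in its simplest form, a weighted sum of an individual's allele counts, with weights taken from GWAS effect estimates. Its predictive power can be summarized by the R² from regressing the phenotype on the score in a holdout sample. The sketch below illustrates both steps on made-up data. The weights, genotypes, and phenotype values are hypothetical, and the sketch omits the covariate adjustment that an incremental R² would normally involve.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1 or 2 per SNP) for one individual."""
    return sum(w * g for w, g in zip(weights, allele_counts))


def r_squared(x, y):
    """R^2 of a simple regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical per-allele weights for three SNPs and four holdout individuals.
weights = [0.02, -0.01, 0.03]
genotypes = [[2, 0, 1], [0, 1, 0], [1, 2, 2], [1, 0, 0]]
edu_years = [16.0, 12.0, 15.0, 13.0]

scores = [polygenic_score(g, weights) for g in genotypes]
predictive_power = r_squared(scores, edu_years)
```

With only four made-up individuals the resulting R² is not meaningful in itself. The point is the mechanics: scores are built from discovery-sample weights and then evaluated against observed schooling in individuals who did not contribute to those weights.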

Extended Data Figure 10

The predictive power of a polygenic score (PGS) varies in Sweden by birth cohort

Five-year rolling regressions of years of education on the PGS (left axis in all four panels), share of individuals not affected by the comprehensive school reform (a, right axis), and average distance to nearest junior high school (b, right axis), nearest high school (c, right axis) and nearest college/university (d, right axis). The shaded area displays the 95% confidence intervals for the PGS effect.

Methods

All methods are described in the Supplementary Information.
