Aysu Okbay1,2,3, Jonathan P Beauchamp4, Mark Alan Fontana5, James J Lee6, Tune H Pers7,8,9,10, Cornelius A Rietveld1,2,3, Patrick Turley4, Guo-Bo Chen11, Valur Emilsson12,13, S Fleur W Meddens3,14,15, Sven Oskarsson16, Joseph K Pickrell17, Kevin Thom18, Pascal Timshel8,19, Ronald de Vlaming1,2,3, Abdel Abdellaoui20, Tarunveer S Ahluwalia9,21,22, Jonas Bacelis23, Clemens Baumbach24,25, Gyda Bjornsdottir26, Johannes H Brandsma27, Maria Pina Concas28, Jaime Derringer29, Nicholas A Furlotte30, Tessel E Galesloot31, Giorgia Girotto32, Richa Gupta33, Leanne M Hall34,35, Sarah E Harris36,37, Edith Hofer38,39, Momoko Horikoshi40,41, Jennifer E Huffman42, Kadri Kaasik43, Ioanna P Kalafati44, Robert Karlsson45, Augustine Kong26, Jari Lahti43,46, Sven J van der Lee2, Christiaan deLeeuw14,47, Penelope A Lind48, Karl-Oskar Lindgren16, Tian Liu49, Massimo Mangino50,51, Jonathan Marten42, Evelin Mihailov52, Michael B Miller6, Peter J van der Most53, Christopher Oldmeadow54,55, Antony Payton56,57, Natalia Pervjakova52,58, Wouter J Peyrot59, Yong Qian60, Olli Raitakari61, Rico Rueedi62,63, Erika Salvi64, Börge Schmidt65, Katharina E Schraut66, Jianxin Shi67, Albert V Smith12,68, Raymond A Poot27, Beate St Pourcain69,70, Alexander Teumer71, Gudmar Thorleifsson26, Niek Verweij72, Dragana Vuckovic32, Juergen Wellmann73, Harm-Jan Westra8,74,75, Jingyun Yang76,77, Wei Zhao78, Zhihong Zhu11, Behrooz Z Alizadeh53,79, Najaf Amin2, Andrew Bakshi11, Sebastian E Baumeister71,80, Ginevra Biino81, Klaus Bønnelykke21, Patricia A Boyle76,82, Harry Campbell66, Francesco P Cappuccio83, Gail Davies36,84, Jan-Emmanuel De Neve85, Panos Deloukas86,87, Ilja Demuth88,89, Jun Ding60, Peter Eibich90,91, Lewin Eisele65, Niina Eklund58, David M Evans69,92, Jessica D Faul93, Mary F Feitosa94, Andreas J Forstner95,96, Ilaria Gandin32, Bjarni Gunnarsson26, Bjarni V Halldórsson26,97, Tamara B Harris98, Andrew C Heath99, Lynne J Hocking100, Elizabeth G Holliday54,55, Georg Homuth101, Michael A Horan102, Jouke-Jan 
Hottenga20, Philip L de Jager8,103,104, Peter K Joshi66, Astanand Jugessur105, Marika A Kaakinen106, Mika Kähönen107,108, Stavroula Kanoni86, Liisa Keltigangas-Järvinen43, Lambertus A L M Kiemeney31, Ivana Kolcic109, Seppo Koskinen58, Aldi T Kraja94, Martin Kroh90, Zoltan Kutalik62,63,110, Antti Latvala33, Lenore J Launer111, Maël P Lebreton15,112, Douglas F Levinson113, Paul Lichtenstein45, Peter Lichtner114, David C M Liewald36,84, Anu Loukola33, Pamela A Madden99, Reedik Mägi52, Tomi Mäki-Opas58, Riccardo E Marioni11,36,115, Pedro Marques-Vidal116, Gerardus A Meddens117, George McMahon69, Christa Meisinger25, Thomas Meitinger114, Yusplitri Milaneschi59, Lili Milani52, Grant W Montgomery118, Ronny Myhre105, Christopher P Nelson34,35, Dale R Nyholt118,119, William E R Ollier56, Aarno Palotie8,120,121,122,123,124, Lavinia Paternoster69, Nancy L Pedersen45, Katja E Petrovic38, David J Porteous37, Katri Räikkönen43,46, Susan M Ring69, Antonietta Robino125, Olga Rostapshova4,126, Igor Rudan66, Aldo Rustichini127, Veikko Salomaa58, Alan R Sanders128,129, Antti-Pekka Sarin123,130, Helena Schmidt38,131, Rodney J Scott55,132, Blair H Smith133, Jennifer A Smith78, Jan A Staessen134,135, Elisabeth Steinhagen-Thiessen88, Konstantin Strauch136,137, Antonio Terracciano138, Martin D Tobin139, Sheila Ulivi125, Simona Vaccargiu28, Lydia Quaye50, Frank J A van Rooij2,140, Cristina Venturini50,51, Anna A E Vinkhuyzen11, Uwe Völker101, Henry Völzke71, Judith M Vonk53, Diego Vozzi126, Johannes Waage21,22, Erin B Ware78,141, Gonneke Willemsen20, John R Attia54,55, David A Bennett76,77, Klaus Berger72, Lars Bertram142,143, Hans Bisgaard21, Dorret I Boomsma20, Ingrid B Borecki94, Ute Bültmann144, Christopher F Chabris145, Francesco Cucca146, Daniele Cusi64,147, Ian J Deary36,84, George V Dedoussis44, Cornelia M van Duijn2, Johan G Eriksson46,148, Barbara Franke149, Lude Franke150, Paolo Gasparini32,125,151, Pablo V Gejman128,129, Christian Gieger24, Hans-Jörgen Grabe152,153, Jacob 
Gratten11, Patrick J F Groenen154, Vilmundur Gudnason12,68, Pim van der Harst72,150,155, Caroline Hayward42,156, David A Hinds30, Wolfgang Hoffmann71, Elina Hyppönen157,158,159, William G Iacono6, Bo Jacobsson23,105, Marjo-Riitta Järvelin160,161,162,163, Karl-Heinz Jöckel65, Jaakko Kaprio33,58,123, Sharon L R Kardia78, Terho Lehtimäki164,165, Steven F Lehrer166,167, Patrik K E Magnusson45, Nicholas G Martin168, Matt McGue6, Andres Metspalu52,169, Neil Pendleton170,171, Brenda W J H Penninx59, Markus Perola52,58, Nicola Pirastu32, Mario Pirastu28, Ozren Polasek66,172, Danielle Posthuma14,173, Christine Power159, Michael A Province94, Nilesh J Samani34,35, David Schlessinger60, Reinhold Schmidt38, Thorkild I A Sørensen9,69,174, Tim D Spector50, Kari Stefansson26,68, Unnur Thorsteinsdottir26,68, A Roy Thurik1,3,175,176, Nicholas J Timpson69, Henning Tiemeier2,177,178, Joyce Y Tung30, André G Uitterlinden2,140, Veronique Vitart42, Peter Vollenweider116, David R Weir93, James F Wilson42,66, Alan F Wright42, Dalton C Conley179,180, Robert F Krueger6, George Davey Smith69, Albert Hofman2, David I Laibson4, Sarah E Medland48, Michelle N Meyer181, Jian Yang11,92, Magnus Johannesson182, Peter M Visscher11,92, Tõnu Esko7,8,52,183, Philipp D Koellinger3,14,15, David Cesarini18,184, Daniel J Benjamin5.
Abstract
Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.
Year: 2016 PMID: 27225129 PMCID: PMC4883595 DOI: 10.1038/nature17671
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Manhattan plot for EduYears associations (N = 293,723)
The x-axis is chromosomal position, and the y-axis is the significance on a –log10 scale. The black line shows the genome-wide significance level (5×10⁻⁸). The red x's are the 74 approximately independent genome-wide significant associations (“lead SNPs”). The black dots labeled with rs numbers are the 3 Rietveld et al.[1] SNPs.
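As a small illustration of the y-axis transform described in this caption, the sketch below converts a P-value to the –log10 scale and computes where the genome-wide significance line would sit. The function name `neg_log10` is hypothetical and not taken from the paper.

```python
import math

GENOME_WIDE_P = 5e-8  # significance level quoted in the caption


def neg_log10(p: float) -> float:
    """Return -log10(p), the value plotted on a Manhattan plot's y-axis."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log10(p)


# Height of the horizontal significance line on the plot.
threshold_height = neg_log10(GENOME_WIDE_P)
print(round(threshold_height, 2))  # 7.3
```

Any SNP plotted above this height has a P-value below 5×10⁻⁸.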
Extended Data Figure 1. Quantile-quantile plot of the genome-wide association meta-analysis of 64 EduYears results files
Observed and expected P-values are on a –log10 scale. The grey region depicts the 95% confidence interval under the null hypothesis of a uniform P-value distribution. The observed λGC is 1.28. (As reported in Supplementary Information section 1.5.4, the unweighted mean λGC is 1.02, the unweighted median is 1.01, and the range across cohorts is 0.95–1.15.)
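The genomic inflation factor λGC quoted in this caption is conventionally the median observed 1-degree-of-freedom χ² statistic divided by the median expected under the null (about 0.455). The following is a minimal sketch of that standard definition using only the Python standard library; it is not the authors' pipeline, and the helper names are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p: float) -> float:
    """Convert a two-sided P-value to the matching 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z


def lambda_gc(p_values) -> float:
    """Median observed chi-square divided by its expected median under the null."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


# Under the null, P-values are uniform, so lambda_GC should be close to 1.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(lambda_gc(uniform_p), 2))
```

A value above 1, such as the 1.28 reported here, indicates test statistics larger than expected under the null, which can reflect either confounding or genuine polygenic signal.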
Extended Data Figure 2. The distribution of effect sizes of the 74 lead SNPs
a, SNPs ordered by absolute value of the standardized effect of one more copy of the education-increasing allele, with 95% confidence intervals. b, SNPs ordered by R². Effects on EduYears are benchmarked against the top 74 genome-wide significant hits identified in the largest GWAS conducted to date of height and body mass index (BMI), and the 48 associations reported for waist-to-hip ratio adjusted for BMI (WHR). These results are based on the GIANT consortium's publicly available results for pooled analyses restricted to European-ancestry individuals: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium.
Extended Data Figure 3. Assessing the extent to which population stratification affects the estimates from the GWAS
a, LD Score regression plot with the summary statistics from the GWAS. Each point represents an LD Score quantile for a chromosome (the x and y coordinates of the point are the mean LD Score and the mean χ² statistic of variants in that quantile). The facts that the intercept is close to one and that the χ² statistics increase linearly with the LD Scores suggest that the bulk of the inflation in the χ² statistics is due to true polygenic signal and not to population stratification. b, Estimates and 95% confidence intervals from individual-level and within-family (WF) regressions of EduYears on polygenic scores, for scores constructed with sets of SNPs meeting different P-value thresholds. In addition to the analyses shown here, we conduct a sign concordance test, and we decompose the variance of the polygenic score. Overall, these analyses suggest that population stratification is unlikely to be a major concern for our 74 lead SNPs. See Supplementary Information section 3 for additional details.
Extended Data Figure 4. Replication of 74 lead SNPs in the UK Biobank data
Estimated effect sizes (in years of schooling) and 95% confidence intervals of the 74 lead SNPs in the meta-analysis sample (N = 293,723) and the UK Biobank replication sample (N = 111,349). The reference allele is the allele associated with higher values of EduYears in the meta-analysis sample. SNPs are in descending order of R² in the meta-analysis sample. Of the 74 lead SNPs, 72 have the anticipated sign in the replication sample, 52 replicate at the 0.05 significance level, and 7 replicate at the 5×10⁻⁸ significance level.
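One simple way to gauge how unlikely 72 matching signs out of 74 would be by chance is a one-sided binomial sign test with success probability 0.5. The sketch below assumes the 74 SNPs are independent and is only an illustration; the paper's own replication analysis may differ, and `sign_test_p` is a hypothetical helper.

```python
from math import comb


def sign_test_p(n_concordant: int, n_total: int) -> float:
    """One-sided binomial P-value for at least n_concordant matching signs
    out of n_total, if each sign were an independent fair coin flip."""
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# 72 of the 74 lead SNPs have the anticipated sign in the replication sample.
p_sign = sign_test_p(72, 74)
print(f"{p_sign:.2e}")
```

The resulting probability is far below any conventional significance level, consistent with the caption's description of strong sign concordance.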
Figure 2. Genetic correlations between EduYears and other traits
Results from bivariate Linkage-Disequilibrium (LD) Score regressions[9]: estimates of genetic correlation with brain volume, neuropsychiatric, behavioral, and anthropometric phenotypes using published GWAS summary statistics. The error bars show the 95% confidence intervals.
Extended Data Figure 5. Q-Q plots for the 74 lead EduYears SNPs (or LD proxies) in published GWAS of other phenotypes
SNPs with concordant effects on both phenotypes are pink, and SNPs with discordant effects are blue. SNPs outside the gray area pass Bonferroni-corrected significance thresholds that correct for the total number of SNPs we tested (P < 0.05/74 = 6.8×10⁻⁴) and are labeled with their rs numbers. Observed and expected P-values are on a –log10 scale. For the sign concordance test: * P < 0.05, ** P < 0.01, and *** P < 0.001.
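The Bonferroni threshold in this caption is the nominal significance level divided by the number of SNPs tested. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# 74 lead SNPs tested at a nominal level of 0.05.
threshold = bonferroni_threshold(0.05, 74)
print(f"{threshold:.1e}")  # 6.8e-04
```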
Extended Data Figure 6. Regional association plots for four of the ten prioritized SNPs for MHBA phenotypes identified using EduYears as a proxy phenotype
a, cognitive performance; b, hippocampus; c, intracranial volume; d, neuroticism. The four were selected because very few genome-wide significant SNPs have been previously reported for these traits. Data sources and methods are described in Supplementary Information section 3. The R² values are from the hg19 / 1000 Genomes Nov 2014 EUR reference samples. The figures were created with LocusZoom (http://csg.sph.umich.edu/locuszoom/). Mb, megabases.
Extended Data Figure 7. Application of fgwas to EduYears. See Supplementary Information section 4.2 for further details
a, The results of single-annotation models. “Enrichment” refers to the factor by which the prior odds of association at an LD-defined region must be multiplied if the region bears the given annotation; this factor is estimated using an empirical Bayes method applied to all SNPs in the GWAS meta-analysis regardless of statistical significance. Annotations were derived from ENCODE and a number of other data sources. Plotted are the base-2 logarithms of the enrichments and their 95% confidence intervals. Multiple instances of the same annotation correspond to independent replicates of the same experiment. b, The results of combining multiple annotations and applying model selection and cross-validation. Although the maximum-likelihood estimates are plotted, model selection was performed with penalized likelihood. c, Reweighting of GWAS loci. Each point represents an LD-defined region of the genome, and shown are the regional posterior probabilities of association (PPAs). The x-axis gives the PPA calculated from the GWAS summary statistics alone, whereas the y-axis gives the PPA upon reweighting on the basis of the annotations in b. The orange points represent genomic regions where the PPA is equivalent to the standard GWAS significance threshold only upon reweighting.
Extended Data Figure 8. Tissue-level biological annotation
a, The enrichment factor for a given tissue type is the ratio of variance explained by SNPs in that group to the overall fraction of SNPs in that group. To benchmark the estimates for EduYears, we compare the enrichment factors to those obtained when we use the largest GWAS conducted to date on body mass index, height, and waist-to-hip ratio adjusted for BMI. The estimates were produced with the LDSC python software, using the LD Scores and functional annotations introduced in Finucane et al. (2015) and the HapMap3 SNPs with MAF > 0.05. Each of the 10 enrichment calculations for a particular cell type is performed independently, each controlling for the 52 functional annotation categories in the full baseline model. The error bars show the 95% confidence intervals. b, We took measurements of gene expression by the Genotype-Tissue Expression (GTEx) Consortium and determined whether the genes overlapping EduYears-associated loci are significantly overexpressed (relative to genes in random sets of loci matched by gene density) in each of 37 tissue types. These types are grouped in the panel by organ. The colored bars correspond to tissues where there is significant overexpression. The y-axis is the significance on a –log10 scale.
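The enrichment factor in panel a can be read as a ratio of two shares. The sketch below assumes "variance explained" means the share of SNP-explained variance attributed to the annotation, which is the usual convention in stratified LD Score regression. The input numbers are made up for illustration and are not results from the paper.

```python
def enrichment_factor(share_of_variance: float, share_of_snps: float) -> float:
    """Share of explained variance attributed to an annotation, divided by
    the share of all SNPs that fall in that annotation."""
    if not 0.0 < share_of_snps <= 1.0:
        raise ValueError("share_of_snps must lie in (0, 1]")
    return share_of_variance / share_of_snps


# Hypothetical example: 10% of SNPs carrying 30% of the explained variance.
example = enrichment_factor(0.30, 0.10)
print(round(example, 2))  # 3.0
```

A factor of 1 would mean that SNPs in the annotation explain exactly as much variance as their number alone would predict.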
Figure 3. Overview of biological annotation
34 clusters of significantly enriched gene sets. Each cluster is named after one of its member gene sets. The color represents the P-value of the member set exhibiting the most statistically significant enrichment. Overlap between pairs of clusters is represented by an edge. Edge width represents the Pearson correlation ρ between the two vectors of gene membership scores (ρ < 0.3, no edge; 0.3 ≤ ρ < 0.5, thin edge; 0.5 ≤ ρ < 0.7, intermediate edge; ρ ≥ 0.7, thick edge), where each cluster's vector is the vector for the gene set after which the cluster is named.
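The edge-width rule in this caption is a simple binning of the Pearson correlation ρ. A minimal sketch of that mapping follows, with the function name and the string labels chosen here for illustration:

```python
from typing import Optional


def edge_width(rho: float) -> Optional[str]:
    """Map the correlation between two clusters to the edge style
    given in the caption; None means no edge is drawn."""
    if rho < 0.3:
        return None
    if rho < 0.5:
        return "thin"
    if rho < 0.7:
        return "intermediate"
    return "thick"


for rho in (0.1, 0.3, 0.55, 0.7):
    print(rho, edge_width(rho))
```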
Extended Data Figure 9. Gene-level biological annotation
a, The DEPICT-prioritized genes for EduYears measured in the BrainSpan Developmental Transcriptome data (red curve) are more strongly expressed in the brain prenatally rather than postnatally. The DEPICT-prioritized genes exhibit similar gene-expression levels across different brain regions (gray lines). Analyses were based on log2-transformed RNA-Seq data. Error bars represent 95% confidence intervals. b, For each phenotype and disorder, we calculated the overlap between the phenotype's DEPICT-prioritized genes and genes believed to harbor de novo mutations causing the disorder. The bars correspond to odds ratios. EduYears, years of education; BMI, body mass index; WHR, waist-to-hip ratio adjusted for BMI. c, DEPICT-prioritized genes in EduYears-associated loci exhibit substantial overlap with genes previously reported to harbor sites where mutations increase risk of intellectual disability and autism spectrum disorder (Supplementary Table 4.6.1).
Selected candidate genes implicated by bioinformatics analyses
Fifteen candidate genes implicated most consistently across various analyses. To assemble this list, each gene in a DEPICT-defined locus (Supplementary Information section 4.5) was assigned a score equal to the number of criteria it satisfies out of ten (see Supplementary Table 4.1 for details). The DEPICT prioritization P-value was used as the tiebreaker. “SNP”: the SNP in the gene's locus with the lowest P-value in the EduYears meta-analysis. “Syndromic”: which, if any, of three neuropsychiatric disorders have been linked to de novo mutations in the gene (Supplementary Information section 4.6). “Top-ranking gene sets”: DEPICT reconstituted gene sets of which the gene is a top-20 member (Supplementary Table 4.5.1). The three most significant gene sets are shown if more than three are available. ID, intellectual disability; ASD, autism spectrum disorder; SCZ, schizophrenia.
| Gene | SNP | Syndromic | Score | Top-ranking gene sets |
|---|---|---|---|---|
|  | rs4500960 | ID, ASD | 6 | Developmental biology, decreased brain size, abnormal cerebral cortex morphology |
|  | rs7277187 | ID, ASD | 5 | ErbB signaling pathway, abnormal sternum ossification, regulation of muscle cell differentiation |
|  | rs61160187 | – | 5 | Transcription factor binding, negative regulation of signal transduction, PI3K events in ErbB4 signaling |
|  | rs2457660 | ASD | 5 | Dendritic spine organization, abnormal hippocampal mossy fiber morphology, SWI/SNF-type complex |
|  | rs11712056 | SCZ | 5 | Dendrite morphogenesis, dendrite development, abnormal hippocampal mossy fiber morphology |
|  | rs192818565 | ID | 5 | Dendrite morphogenesis, abnormal hippocampal mossy fiber morphology, abnormal axon guidance |
|  | rs7306755 | SCZ | 5 | Protein serine/threonine phosphatase complex |
|  | rs12987662 | – | 5 | – |
|  | rs9544418 | SCZ | 4 | Developmental biology, signaling by Robo receptor, dendritic shaft |
|  | rs1871109 | ID | 4 | – |
|  | rs11712056 | ASD | 4 | Developmental biology, signaling by Robo receptor, SWI/SNF-type complex |
|  | rs10061788 | – | 4 | Decreased brain size, abnormal cerebral cortex morphology, abnormal hippocampal mossy fiber morphology |
|  | rs9320913 | – | 4 | Dendrite morphogenesis, developmental biology, decreased brain size |
|  | rs11712056 | SCZ | 4 | Decreased brain size, SWI/SNF-type complex, nBAF complex |
|  | rs2964197 | – | 4 | Negative regulation of signal transduction, PI3K events in ErbB4 signaling |
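The ranking rule described in the table caption (score out of ten criteria, with the DEPICT prioritization P-value as tiebreaker) can be sketched as a two-key sort. The gene labels and P-values below are placeholders invented for illustration, not values from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    gene: str        # placeholder label, not a real gene name
    score: int       # number of criteria satisfied, out of ten
    depict_p: float  # DEPICT prioritization P-value, used as tiebreaker


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by score (highest first), then by P-value (smallest first)."""
    return sorted(candidates, key=lambda c: (-c.score, c.depict_p))


demo_candidates = [
    Candidate("GENE_A", 5, 1e-3),
    Candidate("GENE_B", 6, 2e-1),
    Candidate("GENE_C", 5, 1e-5),
]
ranked = rank_candidates(demo_candidates)
print([c.gene for c in ranked])  # ['GENE_B', 'GENE_C', 'GENE_A']
```

GENE_B ranks first on score alone, while the tie between GENE_A and GENE_C is broken by the smaller P-value.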
Extended Data Figure 10. The predictive power of a polygenic score (PGS) varies in Sweden by birth cohort
Five-year rolling regressions of years of education on the PGS (left axis in all four panels), share of individuals not affected by the comprehensive school reform (a, right axis), and average distance to nearest junior high school (b, right axis), nearest high school (c, right axis) and nearest college/university (d, right axis). The shaded area displays the 95% confidence intervals for the PGS effect.
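A five-year rolling regression of this kind can be sketched as follows: for each window of five consecutive birth years, regress years of education on the PGS and record the slope. This is a bare-bones illustration on synthetic data, with no covariates or confidence intervals; the record layout and function names are assumptions, not the authors' specification.

```python
def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("xs has no variation")
    return sxy / sxx


def rolling_pgs_slopes(records, window=5):
    """records: iterable of (birth_year, pgs, edu_years) tuples.
    Returns {window_start_year: slope of edu_years on pgs}."""
    records = list(records)
    first = min(year for year, _, _ in records)
    last = max(year for year, _, _ in records)
    slopes = {}
    for start in range(first, last - window + 2):
        in_window = [(pgs, edu) for year, pgs, edu in records
                     if start <= year < start + window]
        if len(in_window) < 2:
            continue  # too few observations to fit a line
        xs, ys = zip(*in_window)
        slopes[start] = ols_slope(xs, ys)
    return slopes


# Synthetic data: the PGS effect drops from 0.5 to 0.2 for later cohorts.
demo = []
for year in range(1930, 1940):
    beta = 0.5 if year < 1935 else 0.2
    for pgs in (-1.0, 0.0, 1.0):
        demo.append((year, pgs, 12.0 + beta * pgs))

demo_slopes = rolling_pgs_slopes(demo)
for start in sorted(demo_slopes):
    print(start, round(demo_slopes[start], 2))
```

Windows that straddle the change in the synthetic effect give intermediate slopes, which is the kind of gradual shift a rolling regression is meant to reveal.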