| Literature DB >> 34493871 |
Josine L Min1,2, Gibran Hemani3,4, Eilis Hannon5, Koen F Dekkers6, Juan Castillo-Fernandez7, René Luijk6, Elena Carnero-Montoro7,8, Daniel J Lawson3,4, Kimberley Burrows3,4, Matthew Suderman3,4, Andrew D Bretherick9, Tom G Richardson3,4, Johanna Klughammer10, Valentina Iotchkova11, Gemma Sharp3,4, Ahmad Al Khleifat12, Aleksey Shatunov12, Alfredo Iacoangeli12,13, Wendy L McArdle4, Karen M Ho4, Ashish Kumar14,15,16, Cilla Söderhäll17, Carolina Soriano-Tárraga18, Eva Giralt-Steinhauer18, Nabila Kazmi3,4, Dan Mason19, Allan F McRae20, David L Corcoran21, Karen Sugden21,22, Silva Kasela23, Alexia Cardona24,25, Felix R Day24, Giovanni Cugliari26,27, Clara Viberti26,27, Simonetta Guarrera26,27, Michael Lerro28, Richa Gupta29,30, Sailalitha Bollepalli29,30, Pooja Mandaviya31, Yanni Zeng9,32,33, Toni-Kim Clarke34, Rosie M Walker35,36, Vanessa Schmoll37, Darina Czamara37, Carlos Ruiz-Arenas38,39,40, Faisal I Rezwan41, Riccardo E Marioni35,36, Tian Lin20, Yvonne Awaloff37, Marine Germain42, Dylan Aïssi43, Ramona Zwamborn44, Kristel van Eijk44, Annelot Dekker44, Jenny van Dongen45, Jouke-Jan Hottenga45, Gonneke Willemsen45, Cheng-Jian Xu46,47, Guillermo Barturen8, Francesc Català-Moll48, Martin Kerick49, Carol Wang50, Phillip Melton51,52,53, Hannah R Elliott3,4, Jean Shin54, Manon Bernard54, Idil Yet7,55, Melissa Smart56, Tyler Gorrie-Stone57, Chris Shaw12,58, Ammar Al Chalabi12,58,59, Susan M Ring3,4, Göran Pershagen14, Erik Melén14,60, Jordi Jiménez-Conde18, Jaume Roquer18, Deborah A Lawlor3,4, John Wright19, Nicholas G Martin61, Grant W Montgomery20, Terrie E Moffitt21,22,62,63, Richie Poulton64, Tõnu Esko23,65, Lili Milani23, Andres Metspalu23, John R B Perry24, Ken K Ong24, Nicholas J Wareham24, Giuseppe Matullo26,27, Carlotta Sacerdote27,66, Salvatore Panico67, Avshalom Caspi21,22,62,63, Louise Arseneault63, France Gagnon28, Miina Ollikainen29,30, Jaakko Kaprio29,30, Janine F Felix68,69, Fernando Rivadeneira31, Henning Tiemeier70,71, Marinus H van IJzendoorn72,73, André G Uitterlinden31, Vincent W V Jaddoe68,69, Chris Haley9, Andrew M McIntosh34,36, Kathryn L Evans35,36, Alison Murray74, Katri Räikkönen75, Jari Lahti75, Ellen A Nohr76,77, Thorkild I A Sørensen3,4,78,79, Torben Hansen78, Camilla S Morgen78,80, Elisabeth B Binder37,81, Susanne Lucae37, Juan Ramon Gonzalez38,39,40, Mariona Bustamante38,39,40,82, Jordi Sunyer38,39,40,83, John W Holloway84,85, Wilfried Karmaus86, Hongmei Zhang86, Ian J Deary36, Naomi R Wray20,87, John M Starr36,88, Marian Beekman6, Diana van Heemst89, P Eline Slagboom6, Pierre-Emmanuel Morange90, David-Alexandre Trégouët42, Jan H Veldink44, Gareth E Davies91, Eco J C de Geus45, Dorret I Boomsma45, Judith M Vonk92, Bert Brunekreef93,94, Gerard H Koppelman46, Marta E Alarcón-Riquelme8,14, Rae-Chi Huang95, Craig E Pennell50, Joyce van Meurs31, M Arfan Ikram96, Alun D Hughes97, Therese Tillin97, Nish Chaturvedi97, Zdenka Pausova54, Tomas Paus98, Timothy D Spector7, Meena Kumari56, Leonard C Schalkwyk57, Peter M Visscher20,87, George Davey Smith3,4, Christoph Bock10,99, Tom R Gaunt3,4, Jordana T Bell7, Bastiaan T Heijmans6, Jonathan Mill5, Caroline L Relton3,4.
Abstract
Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15-17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype-phenotype map than previously anticipated.Entities:
Mesh:
Substances:
Year: 2021 PMID: 34493871 PMCID: PMC7612069 DOI: 10.1038/s41588-021-00923-x
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 41.307
Figure 1Discovery and replication of mQTLs
a. Study Design. In the first phase, 22 cohorts performed a complete mQTL analysis of up to 480,000 sites against up to 12 million variants; retaining their results for p<1e-5. In the second phase, 120 million SNP-DNAm site pairs selected from the first phase, and GWA catalog SNPs against 345k DNAm sites, were tested in 36 studies (including 20 phase 1 studies) and meta-analyzed. QC, quality control. b. Distributions of the weighted mean of DNAm across 36 cohorts for cis only, cis+trans and trans only sites. The weighted mean DNAm level across 36 studies was defined as low (<20%), intermediate (20%-80%) or high (>80%). Plots are colored with respect to the genomic annotation. Cis only sites showed a bimodal distribution of DNAm. Cis+trans sites showed intermediate levels of DNAm. Trans only sites showed low levels of DNAm. c. Discovery and replication effect size estimates between GoDMC (n=27,750) and Generation Scotland (n=5,101) for 169,656 mQTL associations. The regression coefficient is 1.13 (se=0.0007). d. Relationship between DNAm site heritability estimates and DNAm variance explained in Generation Scotland. The center line of a boxplot corresponds to the median value. The lower and upper box limits indicate the first and third quartiles (the 25th and 75th percentiles). The length of the whiskers corresponds to values up to 1.5 times the IQR in either direction. The regression coefficient for the twin family study was 3.16 (se=0.008) and for the twin study 2.91 (se=0.008) across 403,353 DNAm sites. The variance explained for DNAm sites with missing r2 (n=277,428) and/or h2=0 (Twin family: n=80,726 Twins: n=34,537) were set to 0. GS, Generation Scotland.
Extended Data Fig. 1Quality control of 36 studies.
We used 337 independent SNPs on chromosome 20 with a p-value<1e-14. The number of SNPs used for each study are indicated in the bottom plot. a. Mstatistic for each of the 36 cohorts. b. Boxplot of mQTL effect sizes for each of the 36 studies. The center line of a boxplot corresponds to the median value. The lower and upper box limits indicate the first and third quartiles (the 25th and 75th percentiles). The length of the whiskers corresponds to values up to 1.5 times the IQR in either direction.
Extended Data Fig. 2Distance of SNP from DNAm site.
a. Density plot of the distance of SNP from DNAm site against the -log10 p-value of 4,533 intra-chromosomal trans-mQTL associations (>1Mb). b. Density plot of the distance of SNP from DNAm site against the -log10 p-value of 248,607 cis-mQTL associations (<1Mb).
Extended Data Fig. 3Effect sizes and weighted standard deviation (SD) for each mQTL category.
a. For each DNAm site, the strongest absolute effect size (the maximum absolute additive change in DNAm level measured in SD per allele) was selected. The kernel density estimations of the effect sizes were shown for all sites with a mQTL (n=190,102), sites with cis only effects (n=170,986), cis effects for sites with cis and trans effects (n=11,902), trans effects for sites with cis and trans effects (n=11,902) and sites with trans only effects (n=7,214). Comparing the strongest effect size for each site in a two-sided linear regression model showed that cis+trans sites had larger cis effect sizes (per allele SD change = 0.05 (s.e.= 0.002), p<2e-16) as compared to cis only sites and weaker trans effect sizes (per allele SD change = -0.06 (s.e.= 0.002), p<2e-16) as compared to trans only sites. To detect these small trans effect sizes at sites with both a cis and a trans association, it is crucial to regress out the cis effect to decrease the residual variance and improve power to detect a trans effect. b. The violin plots represent kernel density estimates of the weighted SD across 36 cohorts for each DNAm site. The center line of the boxplot in the violin plots corresponds to the median value. The lower and upper box limits indicate the first and third quartiles (the 25th and 75th percentiles). The length of the whiskers corresponds to values up to 1.5 times the IQR in either direction.
Extended Data Fig. 4Impact of the two-stage design on mQTL coverage.
a. Loss in power in two-stage design. We calculated the power of detecting a cis association in at least one of the 22 studies at p<1e-5 or a trans association in at least two of 22 studies at p<1e-5. b. Expected number of mQTLs. Using the number of mQTLs with a particular r2 value, and the power of detecting mQTLs with that r2 value, we calculated how many mQTLs would expect to exist with that value.
Figure 2Cis- and trans-mQTLs operate through distinct mechanisms
a. Distributions of enrichments for chromatin states and gene annotations among mQTL sites and SNPs. Enrichment analyses were performed using 25 combinatorial chromatin states from 127 cell types (including 27 blood cell types) and gene annotations. The heatmap represents the distribution of ORs for cis only, trans only, or cis+trans sites and SNPs. For the enrichment of chromatin states, ORs were averaged across cell types. The following chromatin states were analyzed: TssA, Active TSS; PromU, Promoter Upstream TSS; PromD1, Promoter Downstream TSS with DNase; PromD2, Promoter Downstream TSS; Tx5', Transcription 5'; Tx, Transcription; Tx3', Transcription 3'; TxWk, Weak transcription; TxReg, Transcription Regulatory; TxEnh5', Transcription 5' Enhancer; TxEnh3', Transcription 3' Enhancer; TxEnhW, Transcription Weak Enhancer; EnhA1, Active Enhancer 1; EnhA2, Active Enhancer 2; EnhAF, Active Enhancer Flank; EnhW1, Weak Enhancer 1; EnhW2, Weak Enhancer 2; EnhAc, Enhancer Acetylation Only; DNase, DNase only; ZNF/Rpts, ZNF genes & repeats; Het, Heterochromatin; PromP, Poised Promoter; PromBiv, Bivalent Promoter; ReprPC, Repressed PolyComb, Quies Quiescent/Low. The significance was categorized as: *=FDR<0.001;**=FDR<1e-10;***=FDR<1e-50 b. Distributions of enrichment for occupancy of TFBSs among mQTL sites and SNPs. Each density curve represents the distribution of ORs for cis only, trans only, or cis+trans sites (left) and SNPs (right). c. Distributions of enrichment of mQTLs among 41 complex traits and diseases. Each density curve represents the distribution of ORs for cis only, trans only, or cis+trans SNPs.
Extended Data Fig. 5Correlation of mQTL effects (p<1e-14) between blood and other tissues.
For each mQTL category, the correlation of genetic effects between tissues (r) was estimated using the r method[25] where we used the blood mQTLs as reference. DNAm levels are categorized as low (<0.2), intermediate (0.2-0.8) or high (>0.8).
Extended Data Fig. 6Two-dimensional enrichment of SNP and DNAm site TFBS annotation.
a. To test if the annotations of the SNPs involved in trans-mQTLs were specific to the annotations of the DNAm sites that they influence, we compared the real SNP-DNAm site pairs against permuted SNP-DNAm site pairs, where the biological link between SNP and site is severed whilst maintaining the distribution of annotations for the SNPs and sites. We constructed 100 such permuted datasets. b. SNP and site positions were annotated against genomic features, and we quantified how frequently mQTLs were found for each pair of SNP-DNAm site annotations. This enabled the construction of two-dimensional annotation matrices for both the real trans-mQTL list and the permuted trans-mQTL lists. c. Distribution of two-dimensional enrichment values of trans-mQTLs. There was substantial departure from the null in the real dataset for all tissues indicating that the TFBS of a site depended on the TFBS of the SNP that influenced it. d. A bipartite graph of the two-dimensional enrichment for trans-mQTLs, SNPs annotations (blue) with pemp< 0.01 after multiple testing correction co-occur with particular site annotations (red).
Figure 3Communities constructed from trans-mQTLs.
a. A network depicting all communities in which there were twenty or more sites. Random walks were used to generate communities (colors), so occasionally a DNA site connects different communities. b. The relationship between genomic annotations, mQTLs and communities. Communities 9 and 22 comprised DNAm sites that are related through shared genetic factors. The sankey plots show the genomic annotations for the genetic variants (left) and for the DNAm sites (right). The DNAm sites comprising these communities are enriched for TFBSs related to the cohesin complex and NFkB, respectively. c. Enrichment of GWA traits among community SNPs. The genomic loci for each of the 56 largest communities were tested for enrichment of low p-values in 133 complex trait GWASs (y-axis) against a null background of community SNPs. The x-axis depicts the two-sided -log10 p-value for enrichment, with the 5% FDR shown by the vertical dotted line. Colors represent log odds ratios. Enrichments were particularly strong for blood-related phenotypes (including circulating metal levels).
Figure 4Identifying putative causal relationships between sites and traits using bi-directional MR.
Aggregated results from a systematic bi-directional MR analysis between DNAm sites and 116 complex traits. The y-axis represents the two-sided p-value from MR analysis. The top plot depicts results from tests of DNAm sites colocalizing with complex traits. The light grey points represent MR estimates that either did not surpass multiple testing, or shared small p-values at both the DNAm site and complex trait but had weak evidence of colocalization. Bold, colored points are those that showed strong evidence for colocalization (Posterior probability>0.8 for H4 - one shared SNP for DNAm and trait.). The bottom plot shows the two-sided -log10 p-values from MR analysis of risk factor or genetic liability of disease on DNAm levels. Extensive follow up was performed on DNAm site-trait pairs with putative associations, and those that pass filters are plotted in bold and colored according to the trait category. A substantial number of MR results in both directions exhibited very strong effects but failed to withstand sensitivity analyses.
Extended Data Fig. 7Correspondence of MR estimates amongst multiple independent instruments.
a. To evaluate if a site having a shared causal variant with a trait was potentially due to the site being on the causal pathway to the trait, we reasoned that independent instruments for the site should exhibit consistent effects on the outcome consistent with the original colocalizing variant. b. Amongst the putative colocalizing signals, 440 involved a DNAm site that had at least one other independent mQTL. The plot shows the causal effect estimate estimated from the original colocalizing signal against the causal effect estimates obtained from the independent variants (n=440). Grey regions represent the 95% confidence of the slope. c. Correspondence of MR estimates amongst multiple independent instruments on 36 blood traits. To evaluate if a site having a shared causal variant with a blood trait was potentially due to the site being on the causal pathway to the trait, we reasoned that independent instruments for the site should exhibit consistent effects on the outcome consistent with the original colocalizing variant. Amongst the putative colocalizing signals, 30% involved a DNAm site that had at least one other independent mQTL. The plot shows the causal effect estimate estimated from the original colocalizing signal against the causal effect estimates obtained from the independent variants. The HLA region has been removed and betas are plotted.
Extended Data Fig. 8Genomic inflation factors for genome-wide scans of causal effects of traits on DNAm sites.
Each trait (x-axis) was tested for causal effects against (on average) 317,659 DNAm sites, excluding sites in the MHC region. The p-values from IVW MR analysis were used to estimate the genomic inflation for each trait (y-axis). Traits are ordered by genomic inflation factor.