Parent of origin genetic effects on methylation in humans are common and influence complex trait variation.

Yanni Zeng1, Carmen Amador1, Charley Xia1,2, Riccardo Marioni3,4, Duncan Sproul1,5, Rosie M Walker3,4, Stewart W Morris4, Andrew Bretherick1, Oriol Canela-Xandri1,2, Thibaud S Boutin1, David W Clark6, Archie Campbell4, Konrad Rawlik2, Caroline Hayward1, Reka Nagy1, Albert Tenesa1,2, David J Porteous3,4, James F Wilson1,6, Ian J Deary3,7, Kathryn L Evans3,4, Andrew M McIntosh3,8, Pau Navarro1, Chris S Haley9,10.   

Abstract

Parent-of-origin effects (POE) exist when there is differential expression of alleles inherited from the two parents. A genome-wide scan for POE on DNA methylation at 639,238 CpGs in 5,101 individuals identifies 733 independent methylation CpGs potentially influenced by POE at a false discovery rate ≤ 0.05 of which 331 had not previously been identified. Cis and trans methylation quantitative trait loci (mQTL) regulate methylation variation through POE at 54% (399/733) of the identified POE-influenced CpGs. The combined results provide strong evidence for previously unidentified POE-influenced CpGs at 171 independent loci. Methylation variation at 14 of the POE-influenced CpGs is associated with multiple metabolic traits. A phenome-wide association analysis using the POE mQTL SNPs identifies a previously unidentified imprinted locus associated with waist circumference. These results provide a high resolution population-level map for POE on DNA methylation sites, their local and distant regulators and potential consequences for complex traits.

Year:  2019        PMID: 30918249      PMCID: PMC6437195          DOI: 10.1038/s41467-019-09301-y

Source DB:  PubMed          Journal:  Nat Commun        ISSN: 2041-1723            Impact factor:   14.919


Introduction

DNA methylation plays a crucial role in regulating gene expression and maintaining genomic stability[1]. Inter-individual variation of DNA methylation levels at CpG sites (henceforth methylation CpGs) has been associated with complex diseases, quantitative traits, environmental exposures and the aging process[2-6]. Previous studies have estimated that on average across sites, 19% of variation in DNA methylation level is contributed by the additive genetic effects[7]. A number of genetic variants have been shown to regulate methylation CpGs (such variants being termed methylation quantitative trait loci (mQTL)) in an additive manner, acting locally (cis) or distantly (trans)[8,9]. In contrast, shared family environment has been shown to have a relatively smaller overall contribution (3% on average across sites) to variation in DNA methylation[7,10]. DNA methylation can also be affected by parent-of-origin effects (POEs), which are non-additive genetic effects that manifest as phenotypic differences depending on the allelic parent-of-origin[11]. There are several possible causes of observed POEs, but the most common is genomic imprinting[11]. These POEs caused by imprinting can lead to different phenotypic patterns, including classical paternal or maternal imprinting, and other complex forms (Supplementary Fig. 1)[11]. (We use the standard definition of maternal imprinting, where the maternal allele is silenced and the paternal expressed and vice versa for paternal imprinting.) Although previous studies estimated the number of imprinted expressed genes in the human genome at around 100[12], POEs caused by imprinting can spread to wider genomic regions through regulatory variants located in imprinted regions transmitting the POEs to their genomic targets (Supplementary Fig. 1)[13,14]. POEs caused by imprinting have been detected at the DNA methylation, gene expression and phenotypic levels[13-15]. 
The precise regulation of expression of genes influenced by imprinting is crucial for embryonic development, metabolism and behavioural traits, and the effect can last into later life[16-18]. Given the regulatory role of DNA methylation on gene expression, the identification of POEs on DNA methylation is of particular importance to facilitate understanding of the molecular mechanism of the POEs observed at the gene expression or phenotypic level[16]. Recent progress has been made towards profiling genome-wide imprinted regions and POE caused by imprinting in DNA methylation[13,19-21]. Imprinted regions can be identified by bisulfite sequencing detecting CpGs that display imbalanced or bimodal methylation between paired chromosomes, which could be caused by imprinting[19,21]. POEs in these regions can be explicitly modelled in association tests between SNPs and methylation levels for each CpG, assuming allelic effects differ between maternally and paternally inherited alleles[13,14]. Whereas this approach enables both the identification of imprinting-associated POE in methylation levels and the localisation of the SNPs associated with that POE, its limitation lies in the huge multiple testing burden introduced by the number of SNP–CpG pairwise tests for a genome-wide scan. Furthermore, the localisation of POEs from individual SNPs does not ensure the elucidation of the overall genetic architecture underlying methylation levels at each CpG, particularly when multiple POEs from multiple independent SNPs target the same CpG. Therefore, further methodological advances are required in order to improve understanding of the role of POEs in the genetic control of methylation levels and hence potentially better explain the influence of POEs on phenotypic variation.
Imprinting-caused POEs on DNA methylation may have downstream effects on complex traits. Others have shown that applying models which account for POEs in genome-wide association studies (GWAS) can identify genetic variants that underlie POEs on multiple complex traits and diseases[18,20,22], and that SNPs which play a regulatory role in DNA methylation through POEs also have significant associations in GWAS performed using an additive model[13]. Combining these with previous observations that disease-associated loci are enriched in regulatory regions[23], analyses that link POE regulation to DNA methylation and POE regulation to complex traits potentially provide important insights for the understanding of both non-additive genetic and epigenetic control mechanisms for complex traits and diseases.
We propose here a variance component method to detect signatures of POEs caused by imprinting in the human DNA methylome, by identifying methylation CpGs showing an unusually increased full-sibling and/or one-parent–offspring similarity. Using this method, we perform a genome-wide scan for POE on each of 639,238 methylation CpGs in a homogeneous Scottish sample (N = 5101) with complex pedigree structure[24], in which both previously unknown and known POE-influenced CpG sites are identified. We then perform a POE–mQTL analysis to identify local and distant regulatory genetic variants of methylation at the POE-influenced CpG sites identified in the variance component analyses. This is followed by an analysis to identify complex traits associated with the detected POE-influenced methylation CpGs. We also use identified POE–mQTL SNPs to guide a phenome-wide association analysis, through which we identify one locus affecting waist circumference through a previously unidentified POE, demonstrating that the use of methylation data and the proposed set of analyses contribute to increasing our understanding of the non-additive genetic control of complex traits.

Results

Overview of the study design

Table 1 shows a summary of the study design. An established five-component variance component analysis accounting for genetic and environmental variation was first used to partition DNA methylation variation for each measured CpG. Following this, a two-stage pipeline was applied to identify potential POE-influenced methylation CpGs among all measured CpGs. The first stage applied a POE variance component method that targeted the localisation of POE-influenced methylation CpGs. The second stage applied a POE–mQTL (parent-of-origin effect mQTL) analysis that accounted for POE to localise the SNPs that introduce the POE on the identified CpG candidates from the first stage. This was followed by two additional analyses to profile the phenotypic consequence of the POE-influenced methylation CpGs and their POE–mQTL SNPs on complex traits.
Table 1

Study design

ANALYSIS | AIM | MODEL | N tests | N sig results
GKFSC VC | Understand sources of variation of methylation at CpG sites | CpG ~ G + K + F + S + C | 639,238 (CpGs) | G: 24,101; K: 1531; F: 0; S: 78; C: 0
POE-targeted VC | Find POE-influenced CpGs | Base: CpG ~ G + K; Complex: CpG ~ G + K + S; Maternal: CpG ~ G + K + S_M; Paternal: CpG ~ G + K + S_P | 639,238 (CpGs) | Complex: 606; Maternal: 220; Paternal: 158; Total: 984
POE–mQTL | (a) Find POE-influenced CpGs; (b) Find SNPs associated with POE-influenced CpGs | CpG ~ SNP_ADD + SNP_DOM + SNP_POE | 7e9 (984 CpGs × 7e6 SNPs) | CpGs: 586; POE–mQTLs: cis: 1814, trans: 103
POE–EWAS | Phenotypic consequence of POE-influenced CpGs | Trait ~ CpG | 26,568 (984 CpGs × 27 independent traits) | CpGs: 14; Traits: 10
POE–PheWAS | Phenotypic consequence of POE–mQTL SNPs | Trait ~ SNP_ADD + SNP_DOM + SNP_POE | 51,165 (1895 independent mQTLs × 27 independent traits) | Traits: 1; SNPs: 1

The table shows an overview of the analyses performed (ANALYSIS), describing their aims (AIM) and the models used (MODEL), as well as the number of tests performed (N tests) and the number of significant results obtained (N sig results)

GKFSC VC variance component analyses to partition methylation level variation into its additive genetic (G: SNP associated, K: pedigree associated) and non-additive/environmental (F: family, S: sibling, C: couple) components, SNP single-nucleotide polymorphism

POE-targeted VC modified variance component analysis detects candidate methylation sites with parent-of-origin inheritance pattern (parent-of-origin effect, POE). Base: model without POE; complex: model including a complex POE component (S) allowing for increased similarity between siblings; maternal: model including a POE component (S_M) allowing for increased similarity between father and offspring and siblings; paternal: model including a POE component (S_P) allowing for increased similarity between mother and offspring and siblings;

POE–mQTL parent-of-origin effect methylation quantitative trait loci analyses, ADD Additive effect, DOM dominance effect,

POE–EWAS, complex trait association with methylation levels of POE CpGs,

POE–PheWAS, phenotype-wide association study accounting for parent-of-origin effects for parent-of-origin methylation level associated loci (POE–mQTL)
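
The test counts in Table 1 follow from simple multiplication, which the short sketch below reproduces. The `bonferroni_threshold` helper is a hypothetical illustration only: this section does not state which multiple-testing correction defines the multi-trait significance level, so the threshold it returns is an assumption, not the authors' reported cut-off.

```python
# Counts taken from Table 1.
N_POE_CPGS = 984          # POE-influenced CpG candidates
N_INDEP_TRAITS = 27       # independent traits
N_INDEP_MQTLS = 1895      # independent POE-mQTL SNPs
N_SNPS_APPROX = 7e6       # approximate SNP count quoted in the table


def n_tests(n_features, n_targets):
    """Number of pairwise association tests between two sets."""
    return n_features * n_targets


def bonferroni_threshold(n, alpha=0.05):
    """Hypothetical helper: Bonferroni cut-off for n tests.

    Assumption: the correction method is not given in this section.
    """
    return alpha / n


ewas_tests = n_tests(N_POE_CPGS, N_INDEP_TRAITS)       # POE-EWAS row
phewas_tests = n_tests(N_INDEP_MQTLS, N_INDEP_TRAITS)  # POE-PheWAS row
mqtl_tests = n_tests(N_POE_CPGS, N_SNPS_APPROX)        # POE-mQTL row, about 7e9

print(ewas_tests, phewas_tests, f"{mqtl_tests:.1e}")
```

The two integer products match the 26,568 and 51,165 tests listed in the table; the POE–mQTL product is approximate because the SNP count itself is rounded.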

Genetic and environmental contributions to DNA methylation

Using the GKFSC variance component model, we decomposed methylation variation at 639,238 CpGs into contributions from two genetic and three family environmental effects: the additive genetic effect of common SNPs (G), an additional additive genetic effect associated with pedigree (K), shared environmental effects between nuclear family members (F), shared environmental effects between full siblings (S) and shared environmental effects between members of couples (C). The additive genetic effect (G + K) was the largest contributor (Table 2), explaining an average of 16.7% of the variation in DNA methylation (this average includes sites for which the additive genetic effect does not explain any variation), with an estimate of 9.5% and 7.2% of the DNA methylation variation explained by the common SNP-associated and the pedigree-associated additive genetic components, respectively. The contribution from common SNPs varied across genomic regions, with an increased contribution for CpGs within enhancer regions relative to CpGs outside them, and a decreased contribution for CpGs surrounding TSSs (Supplementary Fig. 2). Shared environmental effects contribute an average of 1.2–2.1% of the variation in DNA methylation (Table 2), but the contributions also vary across genomic regions (Supplementary Fig. 3).
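
As a sketch of what this decomposition means, the five-component model can be written in the usual linear mixed model form. The matrix and variance symbols below are notational assumptions introduced for illustration (they do not appear in this section); only the component labels G, K, F, S and C come from Table 1.

```latex
% Assumed notation: A_G, A_K are relationship matrices for the two genetic
% components; Z_F, Z_S, Z_C are 0/1 sharing matrices for family, sibling
% and couple; I is the identity matrix for the residual.
\operatorname{Var}(\mathbf{y}) =
    \mathbf{A}_G\,\sigma^2_G + \mathbf{A}_K\,\sigma^2_K
  + \mathbf{Z}_F\,\sigma^2_F + \mathbf{Z}_S\,\sigma^2_S
  + \mathbf{Z}_C\,\sigma^2_C + \mathbf{I}\,\sigma^2_\varepsilon ,
\qquad
\mathrm{PV}_x = \frac{\sigma^2_x}{\sigma^2_G + \sigma^2_K + \sigma^2_F
  + \sigma^2_S + \sigma^2_C + \sigma^2_\varepsilon}
\quad (x \in \{G, K, F, S, C\})
```

Under this reading, the reported mean additive genetic contribution is the sum of the two genetic proportions, 9.5% + 7.2% = 16.7%.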
Table 2

DNA methylation variation decomposed into genetic and environmental components

Source | Mean PV | Maximum PV | First quartile PV | Third quartile PV | Nominal Sig. sites | Genome-wide Sig. sites
G | 9.5% | 99.20% | 0.67% | 12.98% | 162,800 | 24,101
K | 7.2% | 97.10% | 0.00% | 11.05% | 59,117 | 1531
F | 1.2% | 19.10% | 0.00% | 1.84% | 1946 | 0
S | 1.4% | 46.30% | 0.00% | 2.24% | 23,600 | 78
C | 2.1% | 33.50% | 0.00% | 3.18% | 14,514 | 0

Proportion of variation in methylation levels at the 639,238 studied CpG sites explained (PV) by G (common SNP-associated additive genetic component), K (pedigree-associated additive genetic component), F (shared environmental effects between nuclear family members), S (non-additive genetic or shared environmental effects between full siblings) and C (shared environmental effects between members of a couple). The number of CpG sites that were significant in the component of interest, both at nominal and genome-wide level (Sig. sites at nominal and genome-wide levels) is also shown for each of the five components fitted

The number of CpGs with a statistically significant proportion of methylation variance explained by G, K, F, S or C in the GKFSC model was 24,101, 1531, 0, 78 and 0, respectively (Table 2). CpGs that showed genome-wide significance for the full-sibling component (N = 78) were in regions highly enriched in published genomic imprinting regions (with 58 of the 78 CpGs being located within 2 kb of known imprinted regions, P(Fisher exact test) = 1.3 × 10^−80), suggesting that (1) POEs caused by imprinting are likely to contribute to the variation of a subset of CpGs, (2) besides any shared environmental effect, the full-sibling associated component (S) also captures POE caused by imprinting (for a more detailed discussion see the Methods section) and (3) variance component analysis that accounts for the increased similarity between full siblings can be applied to identify CpGs potentially influenced by POEs caused by imprinting (see below).

POE-targeted variance component analysis

We developed a model selection-based approach to perform a genome-wide scan to identify methylation CpGs potentially influenced by POEs caused by imprinting targeting three main patterns of imprinting: paternal, maternal and complex (Fig. 1)[11]. Since each pattern reflects different phenotypic similarities between nuclear family members, for each CpG we tested three alternative models (complex, paternal and maternal), and performed model selection to select the best model for each CpG, that is, the model that better describes the observed phenotypic similarity (Aim 2 in Table 1).
Fig. 1

The expected phenotypic covariance structures between nuclear family members introduced by different POE patterns. The bar charts show putative levels of methylation associated with the four possible genotypes at a SNP controlling imprinting (paternal allele in blue, maternal allele in red). The family pedigrees show as shaded the family members between which similarity in methylation is increased due to these patterns of imprinting
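
The sharing patterns described for the complex, maternal and paternal models (Table 1 legend and Fig. 1) can be illustrated by building a 0/1 matrix that marks which pairs of relatives are expected to show extra similarity. This is a minimal sketch under stated assumptions: the function name `poe_sharing_matrix`, the pedigree tuple format and the choice of 1 on the diagonal are all hypothetical, and the actual matrices used in the study are defined in the Methods, which are not part of this section.

```python
from itertools import combinations


def poe_sharing_matrix(pedigree, pattern):
    """Return a symmetric 0/1 matrix (list of lists) of expected extra sharing.

    pedigree: list of (individual_id, father_id, mother_id); founders use None.
    pattern:  "complex"  -> full siblings only
              "maternal" -> full siblings and father-offspring pairs
              "paternal" -> full siblings and mother-offspring pairs
    Assumption: diagonal entries are set to 1.
    """
    if pattern not in ("complex", "maternal", "paternal"):
        raise ValueError(f"unknown pattern: {pattern}")

    ids = [row[0] for row in pedigree]
    index = {iid: i for i, iid in enumerate(ids)}
    parents = {iid: (father, mother) for iid, father, mother in pedigree}
    n = len(ids)
    matrix = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def link(a, b):
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1

    # Full siblings share both (known) parents in every pattern.
    for a, b in combinations(ids, 2):
        if None not in parents[a] and parents[a] == parents[b]:
            link(a, b)

    # One-parent-offspring sharing depends on the imprinting pattern.
    for child, (father, mother) in parents.items():
        if pattern == "maternal" and father in index:
            link(child, father)
        if pattern == "paternal" and mother in index:
            link(child, mother)
    return matrix


# Toy nuclear family with hypothetical identifiers.
family = [("dad", None, None), ("mum", None, None),
          ("kid1", "dad", "mum"), ("kid2", "dad", "mum")]
maternal = poe_sharing_matrix(family, "maternal")
```

In the toy family, the maternal pattern links each child to the father and to the other sibling but not to the mother, matching the legend's description of the S_M component.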

This genome-wide scan identified 984 methylation CpGs that exceeded genome-wide significance for POEs at an FDR ≤ 0.05 level (Supplementary Fig. 4). Of these 984 POE-influenced CpG candidates, the selected model was complex imprinting for 606, paternal imprinting for 158 and maternal imprinting for 220 CpGs (Supplementary Data 1). An example of the genome-wide scan results for one of the previously unidentified maternal imprinting sites is shown in Fig. 2. The 984 CpGs included some in well-known imprinted genes, such as IGF2 and PEG3 (Supplementary Fig. 5, Supplementary Data 1), but more generally were located in genomic regions highly enriched in known imprinted regions, particularly when extending those known regions by 2 kb (OR = 15.3 (13.1–17.7), P(Fisher exact test) = 5.3 × 10^−171, Supplementary Table 1). When overlapping these 984 CpGs with regions of different chromatin states and sub-genic structures, these CpGs were in regions enriched in noncoding RNA (P(Fisher exact test) = 1.05 × 10^−5), Polycomb repressed regions (P(Fisher exact test) = 1.59 × 10^−8), weak enhancers (P(Fisher exact test) = 6.57 × 10^−11) and were depleted in active promoter regions (P(Fisher exact test) = 2.11 × 10^−9) (Supplementary Data 2, Fig. 3, Supplementary Fig. 6).
Compared with published epigenome-wide association studies (EWASs) (Supplementary Table 2), the 984 CpGs were also enriched in genic regions of genes containing methylation sites associated with body mass index (BMI) (P(Fisher exact test) = 4.85 × 10^−9)[3] and alcohol consumption (P(Fisher exact test) = 2.67 × 10^−6)[25] (Supplementary Data 2, Fig. 3). CpG sites were assigned to the nearest gene if they were located between 5 kb 5ʹ and 1 kb 3ʹ of the gene boundary. A gene set (pathway) analysis shows that the annotated genes were enriched in the Type I diabetes mellitus pathway (P(EASE test) = 4.92 × 10^−5) (Supplementary Data 3).
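
The CpG-to-gene assignment window described above (5 kb on the 5ʹ side, 1 kb on the 3ʹ side) can be sketched as a strand-aware interval check. The function name, argument layout and inclusive boundaries are assumptions made for illustration, and the coordinates in the usage lines are invented; the sketch covers only the window test, not the choice of the nearest gene among several candidates.

```python
def in_assignment_window(cpg_pos, gene_start, gene_end, strand,
                         upstream_bp=5_000, downstream_bp=1_000):
    """Return True if a CpG falls inside the gene's assignment window.

    gene_start <= gene_end are genomic coordinates; strand is "+" or "-".
    On the "+" strand the 5' side lies before gene_start; on the "-"
    strand it lies after gene_end. Boundaries are treated as inclusive
    (an assumption).
    """
    if strand == "+":
        low = gene_start - upstream_bp
        high = gene_end + downstream_bp
    elif strand == "-":
        low = gene_start - downstream_bp
        high = gene_end + upstream_bp
    else:
        raise ValueError("strand must be '+' or '-'")
    return low <= cpg_pos <= high


# Hypothetical gene spanning 10,000-20,000 bp.
print(in_assignment_window(5_500, 10_000, 20_000, "+"))   # inside the 5 kb 5' flank
print(in_assignment_window(5_500, 10_000, 20_000, "-"))   # beyond the 1 kb 3' flank
```

The same position can therefore be assigned or not depending on gene orientation, which is why the strand has to be carried along with the gene boundaries.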
Fig. 2

Example of a novel CpG site (cg05875302) influenced by maternal imprinting POE. Upper panel: bars represent estimated variance explained by each component in the selected model for the site displaying significant maternal imprinting (cg05875302, red arrow) and the sites within 20 kb on either side of the selected site. Bottom left panel: regional plot of –loge (p-value from LRT) of the POE in the selected (red ringed black dot) and surrounding CpG sites with matrix of pairwise correlations of methylation level between these sites in the heatmap below. Bottom right panel: pairwise correlation between methylation M values (corrected for technical and biological covariates) between different pairs of nuclear family members

Fig. 3

Genomic annotations significantly enriched in (red) or depleted of (blue) POE-influenced methylation CpGs. Error bars: 95% confidence interval

Clumping (see the Methods section) the 984 CpGs based on their methylation correlations resulted in 733 independent sites, of which 331 were previously unidentified (Supplementary Data 1), as they are located more than 2 Mb away from previously reported imprinting-influenced regions[13].

POE–mQTL analysis

Imprinted genetic variants potentially underlie the observed POEs affecting methylation levels at the 984 candidate CpGs identified by the variance components analyses (Supplementary Fig. 1)[11]. To identify variants causing POEs on methylation CpGs, a POE–mQTL analysis was performed for each of the 984 CpGs (Table 1, Aim 3). We used genome-wide imputed common SNPs, and assigned alleles to a paternal or maternal origin for individuals with pedigree information. This information was used to model an additive, a dominant and a POE effect, and these were fitted as explanatory variables for methylation levels at each CpG site (see the Methods section). This revealed that among the 984 CpGs (733 independent loci), 60% (586/984) of CpGs and 54% (399/733) of independent loci have at least one cis- or trans-POE–mQTL identified (Table 3, Supplementary Data 1, Supplementary Fig. 4); 58% (569/984) of CpGs have at least one cis-POE–mQTL (Supplementary Data 4), and 6.8% (67/984) have at least one trans-POE–mQTL (Supplementary Data 5). For these 586 CpG sites, the identification of POE–mQTL SNPs provides strong evidence for the POE caused by imprinting (Table 3). A total of 1814 independent cis-POE–mQTLs were identified; 1% (18/1814) were also trans-POE–mQTLs, and 22% (409/1814) and 11% (202/1814) were previously identified as eQTLs and mQTLs, respectively, using additive genetic models[8,9,26,27]. Both cis- and trans-POE–mQTL SNPs were in regions highly enriched for known imprinted regions (cis: P(Fisher exact test) = 0, OR = 7.8 (7.6–8.1); trans: P(Fisher exact test) = 2.0 × 10^−78, OR = 10.2 (8.4–12.2)), and non-genetic regulated imbalanced methylation regions as reported in ref. [19] (cis: P(Fisher exact test) = 0, OR = 3.2 (3.1–3.4); trans: P(Fisher exact test) = 1.1 × 10^−61, OR = 6.5 (5.4–7.8)).
They were enriched but to a lesser extent in previously defined[28,29] imprinting control regions (ICR) (cis: P(Fisher exact test) = 9.8 × 10^−293, OR = 2.4 (2.5–2.6); trans: P(Fisher exact test) = 0.23, OR = 1.3 (0.8–1.8)). For each independent CpG site with at least one POE–mQTL (N_total = 586, N_independent = 399), a median of six independent cis-POE–mQTLs and one independent trans-POE–mQTL were identified. For CpGs showing POE with a maternal or paternal pattern as identified in the variance component analysis, we could infer the parental origin of the effective and silenced alleles in the POE–mQTL SNPs, based on the relative signs of the additive effect and the POE in the POE–mQTL model. This comparison could only be made when minor homozygotes at the SNP were present in the sample (so that the additive effect can be distinguished from the dominance effect) and the additive effect was significant (P ≤ 0.001). The parental effect (paternal or maternal) inferred from the POE–mQTL analysis agreed with that inferred from the variance component analysis in over 99% of cases.
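
To make the additive, dominance and POE terms concrete, the sketch below codes a phased biallelic genotype into three covariates and recovers the three effects from the mean methylation of the four phased genotype classes. The coding scheme (POE = paternal minus maternal allele count), the function names and the toy data are assumptions for illustration; the study's actual coding is given in its Methods, and a real analysis would also include covariates and account for relatedness, which this sketch ignores. With an intercept and these three covariates the model is saturated over the four classes, so the least-squares estimates equal the contrasts of class means used here.

```python
from statistics import mean


def code_genotype(paternal_allele, maternal_allele):
    """Map phased alleles (0 = reference, 1 = alternative) to (ADD, DOM, POE).

    Assumed coding: ADD counts alternative alleles, DOM flags heterozygotes,
    POE is +1 / -1 for the two heterozygote classes and 0 for homozygotes.
    """
    add = paternal_allele + maternal_allele
    dom = 1 if add == 1 else 0
    poe = paternal_allele - maternal_allele
    return add, dom, poe


def estimate_effects(samples):
    """samples: iterable of (paternal_allele, maternal_allele, methylation)."""
    groups = {}
    for pat, mat, value in samples:
        groups.setdefault((pat, mat), []).append(value)
    if len(groups) < 4:
        # Without both homozygote classes the additive and dominance
        # terms cannot be separated.
        raise ValueError("all four phased genotype classes are required")
    m = {key: mean(values) for key, values in groups.items()}
    return {
        "additive": (m[(1, 1)] - m[(0, 0)]) / 2,
        "dominance": (m[(1, 0)] + m[(0, 1)]) / 2 - (m[(0, 0)] + m[(1, 1)]) / 2,
        "poe": (m[(1, 0)] - m[(0, 1)]) / 2,
    }


def effective_parent(effects):
    """Under the assumed coding, equal signs of the additive and POE effects
    imply that the paternally inherited allele drives the methylation effect."""
    product = effects["additive"] * effects["poe"]
    if product == 0:
        raise ValueError("additive or POE effect is zero; origin undetermined")
    return "paternal" if product > 0 else "maternal"


# Toy data: only the paternally inherited alternative allele raises methylation.
toy = [(0, 0, 1.0), (1, 0, 2.0), (0, 1, 1.0), (1, 1, 2.0)]
print(effective_parent(estimate_effects(toy)))
```

The requirement for all four genotype classes mirrors the condition in the text that minor homozygotes must be present before the signs of the additive and POE effects can be compared.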
Table 3

Classification of the identified 984 CpGs potentially influenced by POEs

Strength of evidence | POE–mQTL | Other studies | NCpG | VC POE model | NCpG per POE model
Strong | ✓ | Replication (<2 kb) | 223 | Maternal/paternal | 104
 |  |  |  | Complex | 119
Strong | ✓ | Overlap (2 kb–2 Mb) | 172 | Maternal/paternal | 79
 |  |  |  | Complex | 93
Strong | ✓ | Not identified (>2 Mb) | 191 | Maternal/paternal | 70
 |  |  |  | Complex | 121
Total strong |  |  | 586 |  | 586
Moderate | × | Replication (<2 kb) | 11 | Maternal/paternal | 2
 |  |  |  | Complex | 9
Moderate | × | Overlap (2 kb–2 Mb) | 190 | Maternal/paternal | 71
 |  |  |  | Complex | 119
Moderate | × | Not identified (>2 Mb) | 197 | Maternal/paternal | 52
 |  |  |  | Complex | 145
Total moderate |  |  | 398 |  | 398
Total |  |  | 984 |  | 984

The table classifies the 984 candidate CpG sites identified with the targeted POE variance component (VC) analysis into groups representing the support for the detected POE (strength of evidence) based on having or not an identified POE–mQTL (POE–mQTL), and if their position overlaps with previously published studies (other studies: replication: the CpG is in a region within 2 kb of a known imprinted region; overlap: the CpG is in a region between 2 kb and 2 Mb of a known imprinted region; not identified: the CpG is more than 2 Mb away from a known imprinted region). NCpGs is the number of CpG sites in each category. VC POE model indicates the selected VC model (maternal/paternal or complex imprinting) and NCpGs per POE model is the number of CpG sites in each subgroup
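
The grouping rules in the Table 3 legend reduce to two small decisions, sketched below. The function names are hypothetical, and the handling of the exact boundaries (2 kb and 2 Mb assigned to "overlap") is an assumption read off the table's "<2 kb", "2 kb–2 Mb" and ">2 Mb" labels.

```python
def overlap_category(distance_bp):
    """Classify a CpG by its distance to the nearest known imprinted region."""
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    if distance_bp < 2_000:
        return "replication"
    if distance_bp <= 2_000_000:
        return "overlap"
    return "not identified"


def evidence_strength(has_poe_mqtl):
    """Strong evidence requires at least one identified POE-mQTL."""
    return "strong" if has_poe_mqtl else "moderate"


# Example: a CpG 150 kb from a known imprinted region with a POE-mQTL.
print(evidence_strength(True), overlap_category(150_000))
```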

Two independent trans-POE–mQTLs were identified as regulatory hubs, as they regulated more than one independent CpG target, both in cis and in trans. As an example, SNP rs231356 (chr11:2705343) was identified as a cis-POE–mQTL for three CpG sites on chromosome 11 (cg14958441, cg09518720 and cg02219360), all three displaying a complex bipolar imprinting pattern (Fig. 4, Supplementary Table 3). rs231356 also acted as a trans-POE–mQTL for another two CpGs, one on chromosome 18 (cg05884032) and one on chromosome 13 (cg23776532), both displaying a paternal imprinting pattern (Supplementary Table 3, Fig. 4). Notably, we failed to identify any cis-POE–mQTL for these two CpGs, suggesting that the trans effects from SNP rs231356 were potentially the cause of the parent-of-origin inheritance pattern detected in the variance component analyses for the two CpGs.
Fig. 4

Methylation CpGs regulated by SNP rs231356. SNP rs231356 acts both as a cis-POE–mQTL and a trans-POE–mQTL. Red arrows: location of the CpG in the chromosome. Boxplots show the allelic effects of rs231356 on methylation of three CpG sites (cg09518720, in cis, and cg05884032 and cg23776532, in trans). Boxplots: centre line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers

We compared our results with those published in a recent study, which also applied a POE–mQTL analysis to DNA methylation data (437,542 CpG sites) measured longitudinally in blood in a smaller sample (N_offspring = 740)[13]. Among the 327 CpGs that were associated with the POE from genetic variants (199 SNPs) detected[13], 260 CpGs were also analysed in our study. In our variance component analysis, 65% of those CpGs (162/260) showed a significant POE that exceeded the Bonferroni-corrected threshold for a replication and 50% of them (129/260) reached genome-wide significance (Supplementary Data 6). As we performed POE–mQTL analysis only for the 984 CpGs showing significant POE in our variance component analysis, we can only compare POE–mQTL results for the 129 CpGs analysed in both studies. We detected genome-wide significant POE–mQTL SNPs for all of the 129 CpGs in our cohort, and for 94% (121/129) of those CpGs we found an association with the same POE–mQTL SNPs as reported[13].
To explore if candidate POE-influenced CpGs with at least one POE–mQTL SNP association (classified as strong POE evidence in Table 3, N = 586) differ from candidates without a POE–mQTL (classified as moderate POE evidence in Table 3, N = 398), we performed additional enrichment analysis for each group separately (Supplementary Data 7, 8). The results showed that CpGs in both groups were in regions depleted in promoters and enriched in Polycomb-repressed regions (Supplementary Fig. 7). The strong POE evidence group displayed a much higher enrichment in known imprinted regions (based on a 2 kb distance from the regions, P(Fisher exact test) = 3.2 × 10^−214), noncoding RNA (P(Fisher exact test) = 2.6 × 10^−11), genic regions of genes containing methylation sites associated with BMI (P(Fisher exact test) = 2.3 × 10^−9) and alcohol consumption (P(Fisher exact test) = 8 × 10^−6) (Supplementary Data 7), whereas the moderate POE evidence group showed enrichment in genic regions of genes containing methylation sites associated with smoking (P(Fisher exact test) = 2.8 × 10^−5) (Supplementary Data 8, Supplementary Fig. 7).

POE–EWAS for associations between complex trait and POE CpGs

To examine the potential consequences of variation in POE-influenced methylation CpGs on complex traits, we tested their association with 34 traits available in GS:SFHS[24], including anthropometric, cardiometabolic, psychiatric and psychological traits (Supplementary Table 4). Considering the potential biological difference between the CpG group with strong evidence and the group with moderate evidence of POEs (Supplementary Fig. 7), we performed the analyses separately for each group. These analyses identified 22 methylation–trait associations between 14 methylation sites and 10 traits at the multi-trait significance level (Table 4), and 81 methylation–trait associations between 47 methylation sites and 17 traits at the per-trait significance level in at least one group (Supplementary Data 9). These methylation sites were in regions highly enriched in paternal imprinting (P(chi-squared test) = 2.2 × 10^−9), particularly for CpGs with strong POE evidence (Supplementary Table 5). Sixteen independent loci (defined as a region containing trait-associated CpGs mapped to the same gene, with between-locus-distance ≥ 1 Mb) were associated with more than one trait (Supplementary Data 9). In both groups, association signals (including those not reaching the significance threshold) for high-density lipoprotein (HDL) cholesterol, vegetable consumption frequency, body mass index (BMI), weight and intelligence (G) were ranked significantly higher than the other traits; waist circumference was ranked significantly higher than other traits only for the strong POE evidence group, whereas creatinine, education, alcohol consumption and blood pressure (systolic) were ranked higher only for the moderate POE evidence group (Mann–Whitney U test; Supplementary Tables 6, 7).
Table 4

Associations between POE CpGs and traits significant at the multi-trait level

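| CpG | Evidence | Chr | Position (bp) | VC | Genic region | Gene name | Trait* | P-value | Est | SE |
|---|---|---|---|---|---|---|---|---|---|---|
| cg11078090 | Strong | 1 | 23878540 | C | Upstream; downstream | E2F2; ID3; LOC101928163 | BMI | 2.22 × 10−7 | 0.033 | 0.006 |
| | | | | | | | WC | 5.49 × 10−7 | 0.030 | 0.006 |
| cg08259905 | Strong | 3 | 62171428 | P | Intronic | PTPRG | Weight | 1.85 × 10−6 | −0.027 | 0.006 |
| cg00329615 | Strong | 3 | 118706648 | C | Intronic | IGSF11 | SBP | 2.48 × 10−7 | 0.024 | 0.005 |
| cg10755899 | Strong | 4 | 1772151 | C | Upstream; downstream | FGFR3; TACC3 | HDL | 1.75 × 10−8 | −0.029 | 0.005 |
| | | | | | | | BMI | 2.46 × 10−8 | 0.020 | 0.004 |
| | | | | | | | %Fat | 4.18 × 10−8 | 0.985 | 0.179 |
| cg01290904 | Moderate | 4 | 5708474 | P | Intronic | EVC2 | HDL | 2.36 × 10−7 | −0.046 | 0.009 |
| cg11064966 | Strong | 5 | 32506514 | C | Intergenic | None | Weight | 5.11 × 10−7 | 0.056 | 0.011 |
| cg12577411 | Strong | 6 | 15551489 | P | Intronic | DTNBP1 | %Fat | 9.60 × 10−9 | −2.27 | 0.394 |
| | | | | | | | BMI | 1.03 × 10−8 | −0.045 | 0.008 |
| | | | | | | | WC | 3.47 × 10−7 | −0.038 | 0.007 |
| | | | | | | | Weight | 3.50 × 10−7 | −0.044 | 0.009 |
| cg15773890 | Strong | 6 | 17259549 | P | Upstream | RBM24 | Alcohol | 1.63 × 10−7 | −0.315 | 0.060 |
| cg05246100 | Strong | 7 | 55246275 | C | Intronic | EGFR | BMI | 1.02 × 10−6 | −0.037 | 0.008 |
| cg11613559 | Strong | 10 | 121577971 | C | Intronic | INPP5F | Alcohol | 2.96 × 10−7 | −0.132 | 0.026 |
| cg14391737 | Moderate | 11 | 86513429 | C | Intronic | PRSS23 | Hips | 1.85 × 10−7 | 0.014 | 0.003 |
| | | | | | | | Weight | 6.86 × 10−7 | 0.025 | 0.005 |
| | | | | | | | BMI | 8.50 × 10−7 | 0.023 | 0.005 |
| cg27272202 | Moderate | 12 | 5158794 | P | Downstream | KCNA5 | CREAT | 2.49 × 10−7 | 0.049 | 0.009 |
| cg08698721 | Strong | 14 | 101294147 | P | ncRNA intronic | MEG3 | Height | 1.89 × 10−7 | −1.08 | 0.206 |
| cg21740139 | Moderate | 17 | 60753158 | P | Exonic | MRC2 | CREAT | 1.83 × 10−6 | 0.045 | 0.009 |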

The table shows the CpGs displaying POE (CpG), their location (Chr: chromosome and position in bp, location relative to the nearest gene (genic region) and name of the nearest gene(s) (gene name)), the pattern of imprinting detected in the variance component analysis (VC, C: complex, P: paternal), the strength of the evidence supporting the inference of POE (evidence) and the estimated correlation between methylation level and traits (Est, Trait), together with standard errors (SE) and an indication of significance (P-value of t test). *Further details on traits are given in Supplementary Table 4


POE-accounted phenotype-wide association study

A phenotype-wide association study accounting for POE (POE–PheWAS) was performed for 34 phenotypes (Supplementary Table 4) using the POE–mQTL SNPs identified in the previous analyses (for the quantile–quantile plot, see Supplementary Fig. 8). One locus (rs6100212, chromosome 20: 57361064) exceeded phenome-wide significance for an association with waist circumference (P = 8.57 × 10−7, Table 5). The index SNP was located 5ʹ of the PIEZO1P2 gene (Fig. 5). The same locus was also associated with waist-to-hip ratio (P = 1.22 × 10−5), BMI (P = 1.27 × 10−5), and body fat (P = 2.09 × 10−5) at the per-trait significance level (Table 5). This cis-POE–mQTL consistently produced complex imprinting patterns at 12 DNA methylation sites, and also in waist circumference, BMI, body fat and waist-to-hip ratio (Fig. 5). Published GWAS that only accounted for additive genetic effects have failed to detect the association between this locus and the above traits, as would be expected for a locus with a complex imprinting pattern (Supplementary Data 10). For the index SNP (rs6100212), we further detected a significant POE-by-sex interaction effect on waist circumference (P = 1.31 × 10−3, Supplementary Table 8). When dividing the sample into age deciles, a nominally significant POE-by-age interaction effect on waist circumference was detected in the 10th decile (age > 47) (P = 4.91 × 10−2, Supplementary Table 8). Combining these results, the largest POE of SNP rs6100212 on waist circumference was detected in females over 47 years old (Supplementary Table 9, Supplementary Fig. 9).
Table 5

Significant POE from cis-POE–mQTL rs6100212 on phenotypes and CpGs

| Type | Trait*/CpG | P-value | Est | SE |
|---|---|---|---|---|
| Trait | WC | 8.57 × 10−7 | −0.005 | 0.001 |
| Trait | WHR | 1.22 × 10−5 | −0.004 | 0.001 |
| Trait | BMI | 1.42 × 10−5 | −0.005 | 0.001 |
| Trait | % Fat | 2.09 × 10−5 | −0.212 | 0.050 |
| Methylation | cg03837903 | 5.04 × 10−6 | 0.028 | 0.006 |
| Methylation | cg04677683 | 2.34 × 10−14 | −0.045 | 0.006 |
| Methylation | cg06200857 | 1.96 × 10−4 | −0.017 | 0.005 |
| Methylation | cg08091561 | 2.57 × 10−4 | −0.020 | 0.005 |
| Methylation | cg09437522 | 5.56 × 10−7 | −0.018 | 0.004 |
| Methylation | cg11480267 | 6.06 × 10−6 | 0.029 | 0.006 |
| Methylation | cg15160445 | 4.60 × 10−16 | −0.047 | 0.006 |
| Methylation | cg23249369 | 1.37 × 10−10 | −0.027 | 0.004 |
| Methylation | cg24203465 | 8.53 × 10−10 | −0.018 | 0.003 |
| Methylation | cg24617313 | 1.12 × 10−16 | −0.159 | 0.019 |
| Methylation | cg25326570 | 2.12 × 10−12 | −0.042 | 0.006 |
| Methylation | cg26102503 | 2.51 × 10−24 | −0.045 | 0.004 |

*Further details on traits are given in Supplementary Table 4

Fig. 5

CpGs and complex traits regulated by SNP rs6100212. Upper: this SNP was located upstream of gene PIEZO1P2 and overlapped with H3K27ac and CTCF signals. The SNP was also significant for imbalanced methylation (GIT), but not significant in allelic-specific methylation (ASM) or mQTL (classical additive model) analysis as reported by an independent study. Left bottom: the SNP acted as a cis-POE–mQTL for methylation sites, causing a complex imprinting pattern (cg26102503 as an example). Middle and right bottom: the SNP was also shown to have a regulatory role in waist circumference (phenome-wide significance), BMI, body fat and WHR (per-trait significance), introducing a similar complex imprinting pattern. Boxplots: centre line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers

A replication analysis was performed using the subset of UK Biobank (UKB) with inferred parent-of-origin information (N = 4378). The POE of rs6100212 on waist circumference detected in GS:SFHS was not statistically significant in UKB (P = 0.65), although a similar trend was observed with the point estimate of POE increasing at older ages and in females (Supplementary Table 10). The lack of significance in UKB data is potentially due to the age difference between the discovery and replication samples (Supplementary Fig. 10). In particular, few UKB participants also have parents in the cohort (a prerequisite for inferring parental allele origin), so only a small number of those with SNP parent-of-origin information were females over 47 (N = 130), the group in which the largest POE was detected in GS:SFHS (Supplementary Table 9).

Discussion

We present here a population-based analysis of POEs caused by imprinting in human DNA methylation. Using a variance component method, we identified 733 independent CpGs (984 total), of which 331 were previously unidentified, where methylation levels displayed an increased full-sibling and/or one-parent–offspring methylation level similarity relative to expectations under additive inheritance patterns, suggesting putative POEs caused by imprinting. For 399 independent CpGs (171 previously unidentified), we identified genetic variants (POE–mQTLs) that regulate the CpGs through POEs. This provided additional evidence for POEs caused by imprinting on those CpGs (Table 3). CpG sites with putative POEs (candidate POE CpGs) without an associated POE–mQTL displayed distinct enrichment patterns in a range of genomic features and may represent a different biological phenomenon. A large proportion of the identified POE–mQTL associations followed a complex imprinting pattern (Fig. 1). Such a pattern is likely to be undetected in GWAS and classical mQTL studies which only model additive genetic effects, as the complex imprinting pattern produces no phenotypic difference in genotype means associated with allele substitution. We identified 22 significant associations between 14 of these candidate POE CpGs and 10 complex traits. We further examined the parent-of-origin effect of the identified POE–mQTL SNPs on complex traits and identified a locus associated with a complex imprinting effect on waist circumference and related traits that was not detectable by a standard additive effect GWAS. If such complex POEs proved to be widespread, they would contribute towards sibling similarity and hence some traditional pedigree estimates of trait heritability without contributing to SNP-based heritability estimates. Such effects could thus contribute towards the discrepancy between these two heritability estimates, i.e., the missing heritability[30]. 
The variance component analysis applied to detect POE signatures required data from parents and offspring but has advantages that include: (1) the ability to detect methylation sites with POEs without the need to know where the genetic variants responsible for the effect are located, or even without genotype data (although we used genotypes to construct the genomic relationship matrix, a pedigree-based relationship matrix could be used to replace this); (2) the ability to detect a number of CpGs displaying a complex imprinting pattern, substantially increasing the number of such sites reported with respect to previous studies. In particular, the proportion of sites displaying complex POE is higher among the previously unidentified sites than that for the previously known sites (67% vs. 57%), which means that our method potentially enables discovery of previously unidentified imprinted regions. The reliability of the parent-of-origin inheritance patterns inferred for those sites is supported by the fact that of the 331 previously unidentified sites, 51.7% (N = 171) have at least one POE–mQTL identified in our study, and by the replication rate of 65% for the POE-influenced CpGs identified using a different and genetic-variant dependent method (POE–mQTL) by an independent study[13].
For more than half of the 984 CpGs displaying parent-of-origin inheritance in the variance component analysis, we detected associated genetic variants that influence the methylation variation through a POE. This provided strong evidence that the POEs on those CpGs were caused by imprinting (Table 3). The fact that the CpGs that display global parent-of-origin inheritance were associated with SNPs suggests that (1) the POE–mQTL SNPs were potentially located in imprinted regions and (2) the POEs affecting CpGs detected in this study were potentially the downstream consequence of imprinted states in regions where POE–mQTLs were located (Supplementary Fig. 1). This is supported by the substantial enrichment of detected POE–mQTLs in known imprinted regions and non-genetically regulated imbalanced methylation regions.
In addition, a high level of complexity in the association between POE–mQTL SNPs and their regulated CpGs has been revealed in this study. Previous studies suggested that complex imprinting patterns were a consequence of regulation by two genetic regulators with opposite direction of POEs[11], whereas a paternal or maternal imprinting pattern could be regulated by a single POE. Here, we observed cases where an individual CpG displayed different imprinting patterns when stratified by genotypes at different independent SNPs (Supplementary Fig. 11), showing that multiple regulators targeted the same CpGs. We also observed cases where single POE–mQTLs introduce different imprinting patterns to different CpG targets (Fig. 4), suggesting that the same SNP interacted with other regulators differently depending on which CpG it targeted.
The enrichment analysis revealed both similarities and differences between the groups of candidate POE CpGs with and without POE–mQTL associations (with strong and moderate evidence of POE). In both groups, the CpGs were in regions enriched in Polycomb repressed regions and depleted in promoters. The group with POE–mQTLs was enriched in noncoding RNA regions. This was in line with previous findings suggesting that Polycomb proteins and imprinted ncRNA acted either cooperatively or independently to regulate imprinted gene clusters[31,32]. The group with POE–mQTLs was also located in regions highly enriched in known imprinted regions, whereas that was not the case for candidate POE CpGs with moderate evidence. This suggests that the former group mainly reflects POEs that are mechanistically similar to those reported in well-established studies of imprinting, whereas the latter group may reflect a separate group of CpGs influenced by POE. Further work is needed to understand the mechanisms behind our observations. Further differential enrichment between the two groups was observed in genic regions of genes containing published EWAS hits for various traits. The candidate POE CpGs without associated POE–mQTLs were enriched in genic regions of genes containing smoking-associated methylation CpGs, whereas the candidate POE CpGs with associated POE–mQTLs were in regions enriched in genes previously identified to contain CpGs associated with BMI and alcohol consumption. These results suggested the potential downstream consequence of the variation in these methylation CpGs on these specific traits. This was further supported by the observation that the majority of significantly associated traits for POE-influenced CpGs were metabolic traits (Table 4, Supplementary Data 9). Finally, our POE–PheWAS analysis for POE–mQTL SNPs identified an association between one POE–mQTL and waist circumference and other obesity-related traits. These convergent results imply that a potentially important consequence of identified POE is methylation-mediated variation on metabolic traits.
In contrast to additive genetic models which are used in classical GWAS, the model that we used for POE–mQTL and POE–PheWAS analysis (Table 1) allowed us to detect SNP–phenotype (phenotype being either CpGs or complex traits) associations, in which the SNP causes a phenotypic difference between reciprocal heterozygous groups but not necessarily between the two homozygous groups. Using this model, we identified a locus (tagged by SNP rs6100212) that causes a consistent complex imprinting pattern in both methylation CpGs and waist circumference and related phenotypes. This locus was not significant either in published mQTL analyses, or published GWAS for waist circumference or any other related traits, potentially because of the lack of phenotypic difference between the two homozygous groups (Fig. 5). Our findings are in agreement with those of a recent independent study that found this locus was located in an imbalanced methylation region between paired chromosomes (Fig. 5)[19]. Given the fact that for waist circumference, this POE is particularly strong in the older female group (Supplementary Table 9, Supplementary Fig. 9), replication of this GS:SFHS finding would be best performed in a sample with a large number of older females with parental origin assigned for the alleles they carry. In UK Biobank, the number of females with parental allelic origin assigned in the >47 class was small (Supplementary Fig. 10), which may explain the lack of significant replication. rs6100212 was located in a regulatory region (supported by high-signal intensity of H3K27ac and CTCF binding as shown in Fig. 5) upstream of the pseudogene PIEZO1P2. Intriguingly, a previous GWAS identified a locus located within an enhancer (GH20H058887, located in the intron of nearby gene GNAS) targeting the same gene (PIEZO1P2) (http://www.genecards.org/) to be associated with waist circumference adjusted for body mass index[33]. Given these convergent lines of research, PIEZO1P2 and its regulatory regions should be treated as targets of obesity-related research.
We combined results from different analyses to evaluate the strength of evidence for POEs caused by imprinting for each CpG (Table 3, Supplementary Data 1). CpGs with strong or very strong evidence of POEs (those with an associated POE–mQTL, N = 586) displayed clear patterns of overall POEs in the variance component analysis and had associated mQTL that drove the observed POE. CpGs in this group should be the focus of future studies targeting the downstream consequences of POE. In contrast, candidate CpGs classified as not being supported by strong evidence (those without an associated POE–mQTL, N = 398) should be treated more cautiously. Although they displayed an overall POE pattern and some distinct features compared with the other group (Supplementary Fig. 7), the location and the type of genetic variants causing this pattern are yet to be identified and therefore need further validation. Other factors might also result in the increased full-sibling similarity observed in these CpGs, such as a full-sibling environmental effect, including some forms of maternal or paternal environmental effects, or a dominance or other non-additive effect. In the POE–mQTL analysis, a SNP dominance effect was also estimated in the model, allowing us to rule this out as responsible for the increased full-sibling similarity for the majority of candidate CpGs. For three CpGs (cg27572120, cg14614539, cg25885219), we failed to detect a POE–mQTL effect, but detected significant dominance effects from at least one SNP. In addition, POEs could be caused by other mechanisms, such as a genetic difference of reciprocal heterozygotes caused by gender-specific biased trinucleotide expansions, or situations where the expression of a locus in the mother (or father) influences the phenotypes in the offspring[11]. The contribution of those mechanisms to the observed POEs should be explored in future analyses.
A limitation of this study is the relative lack of power to detect trans-POE–mQTLs (Nindividuals = 1668) and to detect SNP–trait associations in our POE–PheWAS (Nindividuals = 7106), given our sample size. This highlights a challenge for future POE studies because despite the very large size of some cohort studies, a focus on contemporary and unrelated individuals means that very limited parent–offspring data are available. It also underlines the need to increase the number and size of family-based cohorts that allow the detection of potentially important sources of variation that may be difficult or impossible to study otherwise. An interesting topic of further research building on our own work would be a systematic investigation of the translation of POE-associated variation in methylation to POE-associated gene expression. Finally, longitudinal or stratified analysis could elucidate the stability of POE patterns at different developmental stages, disease, aging stages and genetic/environmental backgrounds and perhaps most importantly, tissue and cell types.
In conclusion, a methylome-wide scan in 5101 individuals identified 984 candidate CpGs as the targets of POEs caused by imprinting at the DNA methylation level. Of these 984 candidate CpGs, there is strong evidence that 191 are previously unidentified POE-influenced CpGs from 171 independent regions. DNA methylation, genome-wide genotypes and intensive phenotyping data were further combined in a series of comprehensive analyses, where some of the potential causes (POE–mQTLs) and consequences (associated complex traits) of these POEs were uncovered, providing important targets for future studies.

Methods

Population samples

Generation Scotland: The Scottish Family Health Study (GS:SFHS) contains 21,387 subjects (Nmales = 8,772, Nfemales = 12,615; average age = 47.2 (SD = 15.1)) from ~7000 families who were recruited from the registers of collaborating general practices in Scotland between 2006 and 2011[24]. A subset of 5101 GS:SFHS participants have DNA methylation data (see below). The family structure for that subset includes 1692, 616, 1102 and 306 full siblings, father–offspring, mother–offspring and couple pairs, respectively. The average age of parents is 58 (5%–95%:45–78) and the average age of offspring is 34 (5%–95%:19–53). All components of GS:SFHS received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference number: 05/S1401/89). GS:SFHS has also been granted Research Tissue Bank status by the Tayside Committee on Medical Research Ethics (REC Reference number: 10/S1402/20), providing generic ethical approval for a wide range of uses within medical research. Participants all gave written consent after having an opportunity to discuss the project and before any data or samples were collected. UK Biobank data were obtained under application number 19655. We used records on waist circumference on 4378 white-British unrelated individuals for whom parent-of-origin information could be imputed. The UK Biobank project was approved by the National Research Ethics Service Committee North West-Haydock (REC reference: 11/NW/0382). An electronic signed consent was obtained from the participants.

Genotyping, phasing and imputation in GS:SFHS

Genotyping data were generated using the Illumina HumanOmniExpressExome-8 v1.0 array[34-36]. Phasing was performed using SHAPEIT with the --duohmm option, and imputation was performed using the Haplotype Reference Consortium (HRC) reference panel release 1.1[37,38]. A total of 497,401 genotyped common autosomal SNPs and 7,108,491 imputed common SNPs for 19,994 participants passed Quality Control (QC) criteria and were used in the subsequent analyses. Details of QC, phasing and imputation are given in Supplementary Methods. Chromosomal positions for markers are based on human genome assembly GRCh37 (hg19).

DNA methylation data on a subset of GS:SFHS

DNA methylation data were available for a subset of 5200 participants from the GS:SFHS cohort, as part of the Stratifying Resilience and Depression Longitudinally (STRADL) project[39]. DNA methylation was measured at 866,836 CpGs from whole blood genomic DNA, using the Illumina Infinium MethylationEPIC array. Two formats of data were produced after QC and normalisation: (1) Beta values which measure the proportion of methylation at a given CpG (ranging from 0 to 1); and (2) M values which are the logit transformation of the Beta values. M values were used in downstream analysis as a previous study suggested these to be more statistically robust in analysis[40]. For each methylation site, a linear mixed model was used to pre-correct M values to remove effects of technical factors. The model converged successfully for 639,238 CpG sites, and the resulting residualised-M values were used as DNA methylation phenotypes in downstream analysis (Nparticipants = 5101, NCpG = 639,238). Details of QC, normalisation, assessment of cell composition, and pre-correction for M-values are given in Supplementary Methods.
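The relationship between the two data formats can be sketched in a few lines. This is a minimal illustration, assuming the base-2 logit commonly used for methylation arrays, M = log2(Beta / (1 − Beta)); the clamping constant `eps` and both function names are hypothetical and are not taken from the study's pipeline.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a Beta value (proportion methylated, 0..1) to an M value."""
    # Clamp away from exactly 0 and 1 so the logarithm stays finite.
    # eps is an illustrative choice, not a value reported in the paper.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: recover the Beta value from an M value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

A Beta value of 0.5 maps to an M value of 0, and M values are unbounded in both directions, which is one reason they tend to behave better in linear models than Beta values near 0 or 1.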

Identification of parent-of-origin of alleles in offspring

Among GS:SFHS participants with genotype data (N = 19,994), there were 2680 trios (i.e., both parents and one offspring), 1185 father–offspring duos, and 3274 mother–offspring duos. We inferred parent-of-origin allelic transmission for 7,108,491 imputed common SNPs (MAF ≥ 0.01) in 7106 of the 7139 offspring. We compared offspring haplotypes to their parents’ using informative loci (i.e., heterozygous) in offspring. We then evaluated the accuracy of the assigned parent-of-origin haplotype at a genotype level, and found an accuracy of over 99.9% across all SNPs. For details, see Supplementary Methods. We used parent-of-origin information in the POE–mQTL analyses and the POE–PheWAS described below.
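To make the idea of an informative locus concrete, the sketch below assigns the paternally inherited allele at a single SNP from unphased genotypes alone. It is a simplification: the study compared phased haplotypes between offspring and parents, which also resolves sites that stay ambiguous here. The function name and the two-character genotype strings are hypothetical conventions chosen for illustration.

```python
def paternal_allele(child, father=None, mother=None):
    """Return the allele a heterozygous child inherited from the father.

    Genotypes are unordered two-character strings such as "AG"; a missing
    parent is passed as None. Returns None when the site is uninformative
    or cannot be resolved from genotypes alone.
    """
    a, b = child[0], child[1]
    if a == b:
        # Homozygous offspring carry no parent-of-origin contrast at this site.
        return None
    # A homozygous father can only have transmitted his single allele.
    if father is not None and father[0] == father[1] and father[0] in (a, b):
        return father[0]
    # A homozygous mother fixes the maternal allele; the other one is paternal.
    if mother is not None and mother[0] == mother[1] and mother[0] in (a, b):
        return b if mother[0] == a else a
    # Remaining cases (e.g. both parents heterozygous) need haplotype context.
    return None
```

With a single genotyped parent (a duo), the site is resolved only when that parent is homozygous, which is one reason haplotype-level comparison recovers more sites than a per-SNP rule of this kind.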

Assessment of number of independent methylation CpGs

Methylation levels between CpG sites can be highly correlated. The pairwise correlation in methylation levels between sites was estimated and used to produce a list of independent sites, which we henceforth refer to as index CpGs, using a similar algorithm to that used for LD-clumping (i.e., grouping on the basis of linkage disequilibrium) of SNPs in PLINK, using a window size of 250 kb and an R2 cut-off of 0.1 (Supplementary Methods)[41]. The number of independent index CpG sites and their location was used to compare our results to those described in the literature.
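A clumping procedure of this kind can be sketched as a greedy loop. The version below is an illustration under stated assumptions rather than the authors' implementation (which is described in their Supplementary Methods): sites are ranked by P-value, and the dictionary keys `id`, `chrom`, `pos`, `p` and `values` are hypothetical names. Only the 250 kb window and the R2 cut-off of 0.1 come from the text.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation
    return sxy * sxy / (sxx * syy)


def clump_cpgs(sites, window_bp=250_000, r2_cutoff=0.1):
    """Greedy clumping: return the ids of the index CpGs.

    Each site is a dict with keys id, chrom, pos, p (ranking P-value) and
    values (methylation levels across the same individuals).
    """
    remaining = sorted(sites, key=lambda s: s["p"])
    index_ids = []
    while remaining:
        index = remaining.pop(0)  # most significant site still unassigned
        index_ids.append(index["id"])
        # Drop every nearby site that is correlated with the index site.
        remaining = [
            s for s in remaining
            if not (
                s["chrom"] == index["chrom"]
                and abs(s["pos"] - index["pos"]) <= window_bp
                and pearson_r2(s["values"], index["values"]) >= r2_cutoff
            )
        ]
    return index_ids
```

Sites that are close but uncorrelated, or correlated but outside the window, each remain as their own index CpG.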

Variance component analyses of methylation at CpG sites

A variance component analysis framework based on multiple genomic and family–environmental relationship matrices has been previously developed to dissect phenotypic variation into contributions from the additive genetic effect of common SNPs, an additional additive genetic effect associated with pedigree, and a number of shared family environmental effects (for nuclear family, full-sibling and couple relationships)[42,43]. We refer to the model applied in these analyses as the GKFSC model. Here, we applied this method to dissect phenotypic variation in DNA methylation levels (measured as residualised M values) for each individual CpG site into these different genetic and family environmental components (Aim 1 in Table 1). We then reparameterised the model to identify candidate methylation CpGs with parent-of-origin inheritance (Aim 2 in Table 1).

GKFSC variance component analyses

The GKFSC model[42,43] includes two genomic relationship matrices, G (genomic relationship matrix) and K (kinship relationship matrix)[42,44], and three environmental relationship matrices, F (environmental matrix representing nuclear family-member relationships), S (environmental matrix representing full-sibling relationships) and C (environmental matrix representing couple relationships) (see Supplementary Methods)[42,43]. These five matrices were fitted simultaneously as random effects in a mixed linear model for methylation at each CpG, together with covariates (i.e., age, age², sex, cell-counts for granulocytes, B-lymphocytes, natural killer cells, CD4+ T-lymphocytes and CD8+ T-lymphocytes, season of the visit, appointment time of the day, appointment day of the week) fitted as fixed effects. The model facilitates estimation of the proportion of methylation variation explained by each fitted random effect, while accounting for the effects from the remaining components. The significance of the estimated variance explained by the random effects was tested using a Wald test (one-sided). A Bonferroni correction was applied to account for multiple testing (Ntest = 639,238).

POE-targeted variance component analyses

The full-sibling associated variance component, modelled in the S matrix, may capture not only the shared environmental effect between siblings, but also non-additive genetic effects that increase similarity between siblings. For additive genetic effects, the phenotypic covariance between parents and their offspring is of similar magnitude to that between siblings, whereas for phenotypes influenced by POEs caused by imprinting, the covariance between parents (one or both, depending on the POE model, see Fig. 1) and their offspring is reduced relative to that between full siblings[45]. Therefore, for CpG sites at which methylation levels are influenced by POEs caused by imprinting, the S matrix, when fitted simultaneously with the additive genetic components, can capture the additional similarity between full siblings caused by POE (see the Results section), and can be used to identify methylation CpGs potentially influenced by POE (for a detailed discussion, see Supplementary Methods). There are several possible imprinting inheritance patterns[11], each of which is expected to produce a characteristic covariance structure between parents and offspring and between full siblings (see Fig. 1 for examples). To maximise the power to detect methylation sites influenced by different patterns of POE, we designed three POE relationship matrices that specifically target the POE generated by complex imprinting (this is the sibling matrix, S, of the GKFSC model), paternal imprinting and maternal imprinting, respectively (Fig. 1, Supplementary Methods). For each CpG, we compared a model that only includes the additive genetic components G and K (base model in Table 1) against each of the three alternative imprinting models with one of the POE relationship matrices fitted as a random effect, jointly with the genetic additive effects (G and K). 
We then selected the alternative imprinting model with the largest significant improvement of model fit, based on a log-likelihood-ratio test (LRT, one-sided, degrees of freedom = 1; Supplementary Methods). Multiple testing correction was performed using a false discovery rate (FDR) at the 0.05 level (Ntest = 639,238). These analyses were performed in GCTA[46]. The visualisation of results was performed using the R package coMET[47].
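The model comparison and the multiple-testing step can be illustrated as follows. This is a sketch under two assumptions that the text does not spell out: the one-sided P-value is taken as half the tail probability of a chi-square distribution with 1 degree of freedom (a common convention when testing a single variance component at the boundary of its parameter space), and the FDR is controlled with the Benjamini-Hochberg step-up procedure. Function names are hypothetical; the actual analyses were run in GCTA.

```python
import math


def lrt_pvalue_one_sided(loglik_alt, loglik_base):
    """One-sided LRT P-value for one extra variance component (df = 1)."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_base))
    # Chi-square (1 df) survival function: P(X > stat) = erfc(sqrt(stat / 2)).
    # Halving it is the assumed one-sided (boundary) convention.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking tests significant at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose P-value passes its bound.
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant
```

For each CpG, the three alternative models would each yield one such P-value against the base model, and the best-fitting significant alternative would be retained.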

Parent-of-origin effect mQTL analyses

To locate the loci that cause the POE in the 984 methylation CpGs identified in the variance component analysis, a POE–mQTL analysis was performed by testing 7,108,491 imputed SNPs (MAF ≥ 0.01) against methylation levels for each of these 984 methylation CpGs. There were 1668 offspring in GS:SFHS with both parent-of-origin assigned to alleles and DNA-methylation data that could be used in our POE–mQTL analysis. For each CpG, methylation values were pre-corrected to account for relatedness by fitting a genomic relationship matrix (G) as a random effect and fitting the following variables as fixed effects: age, age², sex, cell count, season of the visit, appointment time of the day, appointment day of the week in a linear mixed model. The residualised M values from the model described above were regressed against three orthogonal genetic effects (two-sided): an additive effect (genotypes coded as 0, 1, 1 and 2 for AA, Aa, aA and aa), a dominance effect (genotypes coded as 0, 1, 1, 0 for AA, Aa, aA and aa), and a POE (genotypes coded as 0, −1, 1 and 0 for AA, Aa, aA and aa) (Table 1)[13]. SNPs showing a significant POE for a methylation CpG and within less than 1 Mb from that CpG were defined as cis-POE–mQTLs[8]. SNPs showing a significant POE for a methylation CpG located more than 5 Mb away from that CpG were defined as trans-POE–mQTLs[48]. SNPs located between 1 and 5 Mb from their POE-associated methylation CpG were not considered. To determine the significance threshold for association, a permutation-based multiple testing correction was performed for cis-POE–mQTLs and trans-POE–mQTLs analyses separately at the FDR ≤ 0.05 level. For the permutation test, individual identifiers were shuffled and the correlation structure between SNPs and between CpGs was retained[9,48]. 
Ten replicates were used to establish a stable distribution of the test statistic under the null hypothesis, as suggested in previous studies[9,48], which led to an estimate of the FDR ≤ 0.05 p-value threshold of 3.6 × 10−4 for cis-POE–mQTLs and 2.19 × 10−9 for trans-POE–mQTLs. PLINK was used to produce a set of independent POE–mQTLs by clumping POE-associated SNPs within a window size of 250 kb around the most significant associated SNP (the index SNP) with an R2 threshold of 0.1 and a p-value threshold of 1 for PPOE–mQTL[41].
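The genotype coding and the distance rule described above can be sketched as below. Because the model has four parameters (intercept, additive, dominance, POE) and there are four ordered genotype classes, the least-squares fit reproduces the class means, so the effect estimates reduce to simple contrasts of those means. The sketch ignores the mixed-model pre-correction, assumes all four classes are observed, and treats SNP–CpG pairs on different chromosomes as trans, which is an assumption not stated in the text. Which heterozygote label corresponds to a paternally inherited allele is left as in the source; all function names are hypothetical.

```python
def encode(genotype):
    """Map an ordered genotype to (additive, dominance, POE) codes."""
    codes = {"AA": (0, 0, 0), "Aa": (1, 1, -1), "aA": (1, 1, 1), "aa": (2, 0, 0)}
    return codes[genotype]


def group_mean_effects(genotypes, phenotypes):
    """Additive, dominance and POE estimates from genotype-class means.

    Equivalent to ordinary least squares on the three codes plus an
    intercept, provided every one of the four classes is present.
    """
    groups = {"AA": [], "Aa": [], "aA": [], "aa": []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    mean = {g: sum(v) / len(v) for g, v in groups.items()}
    additive = (mean["aa"] - mean["AA"]) / 2
    poe = (mean["aA"] - mean["Aa"]) / 2
    dominance = (mean["Aa"] + mean["aA"]) / 2 - (mean["AA"] + mean["aa"]) / 2
    return additive, dominance, poe


def classify_poe_mqtl(snp_chrom, snp_pos, cpg_chrom, cpg_pos):
    """Label a SNP-CpG pair as 'cis', 'trans', or None (1-5 Mb, not tested)."""
    if snp_chrom != cpg_chrom:
        return "trans"  # assumption: different chromosomes count as distant
    distance = abs(snp_pos - cpg_pos)
    if distance < 1_000_000:
        return "cis"
    if distance > 5_000_000:
        return "trans"
    return None
```

A complex imprinting pattern, in which the two reciprocal heterozygotes deviate in opposite directions while the homozygotes have equal means, yields additive and dominance estimates of zero and a non-zero POE estimate, which is why an additive-only scan would not detect it.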

POE–EWAS for associations between complex traits and POE CpGs

To assess the phenotypic effect of variation in methylation levels at the 984 CpGs identified as potentially influenced by POE, methylation levels at these sites were correlated with phenotypic values for 34 anthropometric, cardiometabolic, psychiatric and psychological traits available in GS:SFHS (details of traits and pre-processing are in Supplementary Table 4)[24]. A linear mixed model was used to pre-correct each of the 34 traits for covariates (age, age², sex, clinic) by including them in the model as fixed effects, and for relatedness by fitting the and matrices as random effects, following previous work[49]. Methylation levels were pre-adjusted for cell count, season of the visit, appointment time of the day, appointment day of the week, never/ever smoking and pack years of smoking. Pairwise association tests were performed by regressing each pre-corrected phenotype against adjusted methylation levels at each CpG site using a linear regression model (tested two-sided). A principal component analysis of the 34 traits revealed that the top 27 principal components explained more than 95% of the variation and that every component beyond these had an eigenvalue <0.5 (Kaiser’s rule); hence the number of independently tested traits (Ninde_traits) was estimated to be 27. Bonferroni-based multiple testing correction was performed, with the p-value significance threshold estimated to be 1.88 × 10⁻⁶ at the multiple-trait level (Ntest = 27 × 984 = 26,568) and 5.08 × 10⁻⁵ at the per-trait level (Ntest = 984). To explore whether the POE–mQTLs were also associated with POEs on phenotypes, the 7106 GS:SFHS offspring with parent-of-origin-assigned alleles were used in a POE–PheWAS for 34 traits (trait list and pre-correction process were the same as used in the previous section; see Table s1). Only SNPs that were significant in the POE–mQTL analyses described above were used in this analysis (Ntotal_snps = 38,122, Ninde_snps = 1895). 
The same regression model applied to the residualised M values in the POE–mQTL analysis was used to perform the POE–PheWAS on the 34 pre-corrected phenotypes (Table 1). As above, the POE–PheWAS model accounts for additive effects, dominance and POEs (tested two-sided). The number of independently tested traits (Ninde_traits) was estimated to be 27 (see the previous section), and multiple testing correction was performed using the Bonferroni method (Ntest = Ninde_snps × Ninde_traits = 51,165). To validate the PheWAS results obtained in GS:SFHS, UK Biobank (UKB) data (Nparticipants = 501,726) were used in a replication analysis[50]. For details of sample information, the identification of nuclear family members, phasing, imputation, QC and parent-of-origin information assignment, see Supplementary Methods. Parent-of-origin information was assigned to alleles of the target SNP (i.e., the SNP significant in the GS:SFHS PheWAS) for 4378 white-British unrelated UKB offspring (kinship coefficient ≤ 0.05) used in the replication analysis (Supplementary Methods). Log-transformed waist circumference was tested for POE using a linear regression model accounting for additive effects, dominance and POEs, with a number of covariates (age, sex, processing batch, assessment centre, genotype array and top 15 principal components of ancestry) fitted as fixed effects (tested two-sided).
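
The Bonferroni thresholds in this section follow from the stated test counts, as the short worked sketch below shows. The family-wise error rate of 0.05 is an assumption: it is not stated explicitly here but is implied by the reported thresholds. The function and variable names are illustrative only.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold for a family-wise error rate of `alpha`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Counts taken from the text
N_INDE_TRAITS = 27   # independent traits estimated from the PCA of 34 traits
N_CPGS = 984         # CpGs potentially influenced by POE
N_INDE_SNPS = 1895   # independent POE-mQTL SNPs

thresholds = {
    # POE-EWAS, all traits jointly: 27 x 984 = 26,568 tests
    "ewas_multi_trait": bonferroni_threshold(N_INDE_TRAITS * N_CPGS),
    # POE-EWAS, one trait at a time: 984 tests
    "ewas_per_trait": bonferroni_threshold(N_CPGS),
    # POE-PheWAS: 1895 x 27 = 51,165 tests
    "phewas": bonferroni_threshold(N_INDE_SNPS * N_INDE_TRAITS),
}

for name, value in thresholds.items():
    print(f"{name}: {value:.3g}")
```

The first two values reproduce the 1.88 × 10⁻⁶ and 5.08 × 10⁻⁵ thresholds quoted for the POE–EWAS. The PheWAS threshold is not quoted in this section; the sketch derives it from the stated Ntest under the same assumed error rate.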

Functional enrichment of POE-influenced CpGs

To further characterise the methylation CpGs identified as displaying POEs, ANNOVAR[51] was used to annotate CpGs to regions of (1) different chromatin- and histone-modification states, as DNA methylation dynamics is associated with altered chromatin structure[52] and coupled with histone modifications in relevant tissues[53], and transcription factor binding sites. A lymphoblastoid cell line (GM12878) and an immortalised myelogenous leukaemia cell line (K562) were used in this annotation, as they are the two blood-derived cell lines among the primary cell lines with abundant annotation information in the ENCODE project[54]. Methylation CpGs were also annotated to (2) regions that are significant in published GWAS and EWAS and (3) substructure regions of genes (for databases used see Supplementary Methods). Fisher’s exact test was used to test for enrichment/depletion of POE-influenced CpG sites in target annotations. The Bonferroni method was used for multiple testing correction (Ntest = 212 (see Supplementary Data 2), p-value threshold of significance = 2.36 × 10⁻⁴).
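
As an illustration of the enrichment/depletion test, the sketch below computes a two-sided Fisher exact p-value for a 2 × 2 table of CpG counts (POE-influenced or not, inside or outside an annotation). The two-sided rule used here, summing the probabilities of all tables no more likely than the observed one, is a common convention and an assumption of the sketch; the section does not say which implementation was used. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: POE-influenced CpGs / other CpGs.
    Columns: inside the annotation / outside the annotation.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised hypergeometric weights for every table with the same margins.
    # They are exact integers, so the comparison below has no rounding issues.
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(row1 + row2, col1)


# Bonferroni threshold for the 212 annotation tests reported in the text
ENRICHMENT_THRESHOLD = 0.05 / 212
```

Exact integer arithmetic keeps the sketch simple but becomes slow for array-scale counts; a log-space or library implementation would be the practical choice there.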

Gene set-based enrichment of POE-influenced CpGs

A further characterisation of the POE-influenced methylation CpGs involved annotating these to genes and then testing for enrichment of the annotated genes in specific gene sets. ANNOVAR[51] was used to annotate CpGs to genes. A CpG site was assigned to its nearest gene if it was located within the window extending from 5 kb of the gene’s transcription start site (TSS) to 1 kb from the transcription end site (TES). The online tool DAVID was used to perform an enrichment analysis in Gene Ontology (GO) terms, biological pathways, GAD (Genetic Association Database) diseases, protein domains and interactions[55]. The enrichment test was performed using the EASE score test (a modified Fisher exact test that is more conservative than the standard Fisher exact test) to see whether the proportion of genes falling into the tested annotation differs between a target group and the background group.
  54 in total

1.  A linear complexity phasing method for thousands of genomes.

Authors:  Olivier Delaneau; Jonathan Marchini; Jean-François Zagury
Journal:  Nat Methods       Date:  2011-12-04       Impact factor: 28.547

2.  Cohort Profile: Generation Scotland: Scottish Family Health Study (GS:SFHS). The study, its participants and their potential for genetic research on health and illness.

Authors:  Blair H Smith; Archie Campbell; Pamela Linksted; Bridie Fitzpatrick; Cathy Jackson; Shona M Kerr; Ian J Deary; Donald J Macintyre; Harry Campbell; Mark McGilchrist; Lynne J Hocking; Lucy Wisely; Ian Ford; Robert S Lindsay; Robin Morton; Colin N A Palmer; Anna F Dominiczak; David J Porteous; Andrew D Morris
Journal:  Int J Epidemiol       Date:  2012-07-10       Impact factor: 7.196

3.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

Review 4.  CpG methylation, chromatin structure and gene silencing-a three-way connection.

Authors:  A Razin
Journal:  EMBO J       Date:  1998-09-01       Impact factor: 11.598

5.  Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis.

Authors:  Pan Du; Xiao Zhang; Chiang-Ching Huang; Nadereh Jafari; Warren A Kibbe; Lifang Hou; Simon M Lin
Journal:  BMC Bioinformatics       Date:  2010-11-30       Impact factor: 3.169

Review 6.  Missing heritability and strategies for finding the underlying causes of complex disease.

Authors:  Evan E Eichler; Jonathan Flint; Greg Gibson; Augustine Kong; Suzanne M Leal; Jason H Moore; Joseph H Nadeau
Journal:  Nat Rev Genet       Date:  2010-06       Impact factor: 53.242

7.  Epigenetic mechanisms and genome stability.

Authors:  Emily L Putiri; Keith D Robertson
Journal:  Clin Epigenetics       Date:  2011-08-01       Impact factor: 6.551

8.  Functional variation in allelic methylomes underscores a strong genetic contribution and reveals novel epigenetic alterations in the human epigenome.

Authors:  Warren A Cheung; Xiaojian Shao; Andréanne Morin; Valérie Siroux; Tony Kwan; Bing Ge; Dylan Aïssi; Lu Chen; Louella Vasquez; Fiona Allum; Frédéric Guénard; Emmanuelle Bouzigon; Marie-Michelle Simon; Elodie Boulier; Adriana Redensek; Stephen Watt; Avik Datta; Laura Clarke; Paul Flicek; Daniel Mead; Dirk S Paul; Stephan Beck; Guillaume Bourque; Mark Lathrop; André Tchernof; Marie-Claude Vohl; Florence Demenais; Isabelle Pin; Kate Downes; Hendrick G Stunnenberg; Nicole Soranzo; Tomi Pastinen; Elin Grundberg
Journal:  Genome Biol       Date:  2017-03-10       Impact factor: 13.583

9.  Systematic identification of trans eQTLs as putative drivers of known disease associations.

Authors:  Harm-Jan Westra; Marjolein J Peters; Tõnu Esko; Hanieh Yaghootkar; Claudia Schurmann; Johannes Kettunen; Mark W Christiansen; Bruce M Psaty; Samuli Ripatti; Alexander Teumer; Timothy M Frayling; Andres Metspalu; Joyce B J van Meurs; Lude Franke; Benjamin P Fairfax; Katharina Schramm; Joseph E Powell; Alexandra Zhernakova; Daria V Zhernakova; Jan H Veldink; Leonard H Van den Berg; Juha Karjalainen; Sebo Withoff; André G Uitterlinden; Albert Hofman; Fernando Rivadeneira; Peter A C 't Hoen; Eva Reinmaa; Krista Fischer; Mari Nelis; Lili Milani; David Melzer; Luigi Ferrucci; Andrew B Singleton; Dena G Hernandez; Michael A Nalls; Georg Homuth; Matthias Nauck; Dörte Radke; Uwe Völker; Markus Perola; Veikko Salomaa; Jennifer Brody; Astrid Suchy-Dicey; Sina A Gharib; Daniel A Enquobahrie; Thomas Lumley; Grant W Montgomery; Seiko Makino; Holger Prokisch; Christian Herder; Michael Roden; Harald Grallert; Thomas Meitinger; Konstantin Strauch; Yang Li; Ritsert C Jansen; Peter M Visscher; Julian C Knight
Journal:  Nat Genet       Date:  2013-09-08       Impact factor: 38.330

10.  Genome-wide survey of parent-of-origin effects on DNA methylation identifies candidate imprinted loci in humans.

Authors:  Gabriel Cuellar Partida; Charles Laurin; Susan M Ring; Tom R Gaunt; Allan F McRae; Peter M Visscher; Grant W Montgomery; Nicholas G Martin; Gibran Hemani; Matthew Suderman; Caroline L Relton; George Davey Smith; David M Evans
Journal:  Hum Mol Genet       Date:  2018-08-15       Impact factor: 6.150

