
Patterns of methylation heritability in a genome-wide analysis of four brain regions.

Gerald Quon, Christoph Lippert, David Heckerman, Jennifer Listgarten.

Abstract

DNA methylation has been implicated in a number of diseases and other phenotypes. It is, therefore, of interest to identify and understand the genetic determinants of methylation and epigenomic variation. We investigated the extent to which genetic variation in cis-DNA sequence explains variation in CpG dinucleotide methylation in publicly available data for four brain regions from unrelated individuals, finding that 3-4% of CpG loci assayed were heritable, with a mean estimated narrow-sense heritability of 30% over the heritable loci. Over all loci, the mean estimated heritability was 3%, as compared with a recent twin-based study reporting 18%. Heritable loci were enriched for open chromatin regions and binding sites of CTCF, an influential regulator of transcription and chromatin architecture. Additionally, heritable loci were proximal to genes enriched in several known pathways, suggesting a possible functional role for these loci. Our estimates of heritability are conservative, and we suspect that the number of identified heritable loci will increase as the methylome is assayed across a broader range of cell types and the density of the tested loci is increased. Finally, we show that the number of heritable loci depends on the window size parameter commonly used to identify candidate cis-acting single-nucleotide polymorphism variants.

Year:  2013        PMID: 23303775      PMCID: PMC3575819          DOI: 10.1093/nar/gks1449

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The identification of genetic markers that impact the phenotype of an individual is an important step towards identifying the genetic basis of disease. Replicated findings of such associations have become increasingly common (1). However, a formidable remaining challenge is finding the mechanisms through which these identified markers act to ultimately drive phenotypic variation. The epigenome is now recognized as playing a critical role in developmental processes and is also likely to be involved in ultimately determining phenotypic traits (2). For example, DNA methylation of CpG dinucleotides can exert regulatory influence on gene expression levels, which in turn can influence phenotype (3). Methylation levels vary between cell types (4), between individuals (5), and they are known to be influenced by both environmental and genetic factors (6). Importantly, methylation has been implicated in a wide range of diseases, including cancers, autism-spectrum disorders (2), as well as several auto-immune diseases (7). It, therefore, stands to reason that finding and characterizing the genetic determinants of methylation could yield insight into mechanisms of disease and the functional consequences of genetic variation. Genetic sequence has been implicated as a determinant of DNA methylation in a number of contexts. Individuals who are heterozygous at a gene locus can exhibit allele-specific methylation that is dependent on DNA sequence and leads to differential gene expression patterns between the alleles (i.e. allele-specific gene expression) (8–11). Hellman and Chess (12) found that individuals who shared more parental chromosomes (i.e. are more related) tend to exhibit more similar methylation patterns. Single-nucleotide polymorphisms (SNPs) may also disrupt CpG dinucleotides (i.e. causing them to no longer be CpG), thereby preventing methylation there or at neighbouring loci (8). 
Several methylome-wide studies have identified individual SNPs that are correlated with specific methylation loci (13–15). Despite these findings, it remains unclear to what extent differences in stretches of cis-DNA sequence (i.e. multivariate SNP signal) explain differences in methylation of a given CpG dinucleotide between individuals and, correspondingly, to what extent methylation is deemed heritable when estimated from SNP data from unrelated individuals. Recently, Bell et al. (14) used the differences in correlation between monozygotic and dizygotic twins to estimate the heritability of methylation in blood samples, finding a genome-wide mean heritability of 18%. Twin-based analyses are important in shedding light on an upper bound of heritability, but yield no information as to the mechanism of action underlying heritability, a critical piece of the story. Before this, several more focused studies examined the heritability of methylation in particular contexts, such as between cell divisions of cancer cells (16), for a particular gene (17) or for the major histocompatibility complex region in a twin study focused only on immune cells (18). Herein, we identify ‘heritable methylation loci’ (those loci for which cis-SNPs explain more of the phenotypic variance than expected by chance) within the human methylome, in four distinct brain regions across 150 unrelated individuals from publicly available data. Our goals were to investigate what role stretches of cis-DNA sequence play in influencing methylation, what an optimal definition of cis (i.e. locality) is in this context, whether the additive effects of measured SNPs could explain the twin-based estimates of heritability previously reported and whether CpG dinucleotides with heritable methylation were more likely to be within or neighbouring particular classes of genes or genomic features.

MATERIALS AND METHODS

Individual SNP data and chromosomal coordinates were downloaded from dbGAP Study Accession phs000249.v1.p1. Normalized methylation levels across four brain regions [cerebellum (CRBLM), frontal cortex (FCTX), caudal pons (PONS) and temporal cortex (TCTX)] from 150 individuals were obtained from the Gene Expression Omnibus (GEO) database (accession GSE15745). This data profiled methylation levels of 27 578 CpG loci assayed using an Illumina HumanMethylation27 BeadChip. Methylation locus chromosome coordinates were obtained from GEO (GPL8490). SNP data for the same individuals were generated from tissue collected in the cerebellum brain region. All SNPs missing in >1% of the individuals, or those whose minor allele frequency was <0.01 were discarded. All individuals missing >5% of their SNP data were removed. Several methylation loci and individual samples were removed because of data quality concerns [see Supplementary Information of Gibbs et al. (13)]. Initially, we found that our estimates of heritability were significantly correlated with the number of SNPs within the methylation probe region. Thus, to avoid erroneously identifying methylation loci as heritable from such artefacts, we filtered out any methylation loci whose respective probe overlapped a SNP with minor allele frequency ≥0.05 (using the highest reported minor allele frequency from dbSNP, and the list of probe SNPs as provided by Illumina). This filter further removed 5816 methylation loci, leaving 21 000 methylation loci for our analysis. Individual covariate data were obtained from Supplementary Table S1 from Gibbs et al. (13) and converted to a 1-of-(M-1) encoding for discrete variables. Table 1 reports the final number of individuals and SNPs for each of the four brain regions.
Table 1.

Number of individuals and SNPs used in analyses for each of the four brain regions

Region    Number of individuals    Number of SNPs
CRBLM     106                      495 788
FCTX      132                      495 873
PONS      124                      495 870
TCTX      125                      495 866
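The quality-control thresholds described above can be illustrated with a minimal sketch. It assumes genotypes are coded as minor/alternate allele counts (0, 1, 2) with `None` for a missing call, and that each individual's missingness is computed over all SNPs before SNP filtering; neither detail is stated in the text, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Rows are individuals, columns are SNPs; None marks a missing genotype call.
Genotypes = List[List[Optional[int]]]


def filter_snps_and_individuals(
    geno: Genotypes,
    snp_missing_max: float = 0.01,   # discard SNPs missing in >1% of individuals
    maf_min: float = 0.01,           # discard SNPs with minor allele frequency <0.01
    ind_missing_max: float = 0.05,   # remove individuals missing >5% of their SNPs
) -> Tuple[List[int], List[int]]:
    """Return (indices of individuals kept, indices of SNPs kept)."""
    n_ind = len(geno)
    n_snp = len(geno[0]) if geno else 0

    keep_snps = []
    for j in range(n_snp):
        calls = [row[j] for row in geno if row[j] is not None]
        missing_rate = 1.0 - len(calls) / n_ind
        if not calls or missing_rate > snp_missing_max:
            continue
        # Allele frequency from allele counts; two alleles per individual.
        alt_freq = sum(calls) / (2.0 * len(calls))
        maf = min(alt_freq, 1.0 - alt_freq)
        if maf < maf_min:
            continue
        keep_snps.append(j)

    keep_inds = []
    for i, row in enumerate(geno):
        n_missing = sum(1 for call in row if call is None)
        # Assumption: missingness is measured over all SNPs, not only retained ones.
        if n_snp == 0 or n_missing / n_snp <= ind_missing_max:
            keep_inds.append(i)

    return keep_inds, keep_snps
```

With only a handful of individuals, the 1% threshold drops any SNP that has even a single missing call, so the thresholds are only meaningful at cohort sizes like those in Table 1.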

Identification of heritable methylation loci

We used linear mixed models (LMMs) to assess the narrow-sense heritability of each methylation locus (19). Let the vector $y_{it}$ of length N represent the methylation levels of locus i at brain region t across N individuals. Using LMMs, we can decompose the variance associated with $y_{it}$ as the sum of a linear additive genetic ($\sigma_g^2$) and residual ($\sigma_e^2$) component,

$$y_{it} \sim \mathcal{N}\left(X\beta,\ \sigma_g^2 K + \sigma_e^2 I\right),$$

where X is the N×Q matrix of Q individual covariates (gender, age, post-mortem interval, region source and methylation assay batch) and offset term, β is the Q×1 vector of covariate effects, I is the N×N identity matrix and K is the realized relationship matrix (RRM) (20) of size N×N. Note that K factors as $K = WW^{\mathsf{T}}$, where W of dimension N×s contains the s SNPs in our window local to the gene and that when s […]
To compute a P-value for whether a methylation locus, $y_{it}$, was heritable (that is, to compute the significance of the genetic variance component in the model), we set $\sigma_g^2 = 0$ to obtain the null model, and then used a modified likelihood ratio test, which accounted for the fact that the parameter being tested was on the boundary of the allowed space in the null model (22,23). That is, $\sigma_g^2 = 0$ in the null model, and $\sigma_g^2 \ge 0$ in the alternative model because it is a variance parameter. However, on checking the calibration of P-values by way of permutation tests, we discovered the P-values to be conservative, owing partly to the small sample size, but also to the approximation of the null distribution in this case [e.g. (24)], and thus used the permutation-based P-values instead (using 420 000 permutations of the individuals in the methylation data, and using the same permutations for each methylation locus). We defined a ‘heritable locus’ as one in which the P-value of association was smaller than a significance level of 0.05 after Bonferroni correction. Note that this test can be viewed as a test for association between the SNPs in the set and the phenotype in question, and it has been used in a similar manner in (25,26).
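The permutation and Bonferroni steps can be sketched as follows. This is an illustration only: the test statistic here is a stand-in (absolute covariance with a hypothetical per-individual genotype score), not the likelihood ratio statistic of the LMM, and the add-one correction in the P-value is an assumption that the text does not specify.

```python
import random
from typing import Callable, List, Sequence


def abs_covariance_with(score: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Stand-in statistic: |sum of centred products| between score and phenotype."""
    mean_s = sum(score) / len(score)

    def statistic(phenotype: Sequence[float]) -> float:
        mean_y = sum(phenotype) / len(phenotype)
        return abs(sum((s - mean_s) * (y - mean_y) for s, y in zip(score, phenotype)))

    return statistic


def permutation_p_value(
    phenotype: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    n_perm: int,
    seed: int = 0,
) -> float:
    """Fraction of permuted phenotypes whose statistic is at least the observed one."""
    rng = random.Random(seed)  # a fixed seed reuses the same permutations for every locus
    observed = statistic(phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute individuals in the methylation data
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def bonferroni_heritable(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag loci whose P-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

For scale, with 21 000 loci the Bonferroni threshold is 0.05/21 000 ≈ 2.4 × 10^-6, which is the same order as the finest P-value that 420 000 permutations can resolve (1/420 000).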

Determining an optimum cis window size

To find an optimal window size across all methylation loci for inclusion of cis-acting SNPs, we systematically varied the window size through 10 kb, 50 kb, 100 kb, 500 kb and 1 Mb, use of the entire chromosome that the locus fell on, and all SNPs assayed in the genome. We then deemed the optimum window to be the one yielding the largest number of heritable methylation loci, where we identified heritable loci by the permutation strategy previously described, but limiting the number of permutations to 10 000 for each window size for computational efficiency. Note the final set of heritable loci we report is based on the 420 000 permutations. We used a window that was symmetric around the methylation locus of interest. That is, we defined a window of size X kb centred at a methylation locus at position i as the DNA sequence within the region $[i - X/2,\ i + X/2]$ kb, inclusive. Closely related individuals can be problematic when estimating heritability (27) because of confounding owing to shared environmental factors. For example, Visscher and colleagues require removal of all individuals with RRM similarity >0.05 (27). In our data set, no two individuals were related this closely; thus, we did not filter any individuals by this criterion. Furthermore, a univariate scan of various methylation loci, randomly chosen, did not show significant deviation of the genomic control factor (28), $\lambda$, from 1.0, suggesting that hidden confounders were not present in this data set. When scanning cis window sizes, we restricted our comparison to the 15 179 methylation loci for which we could find at least one SNP within each of the window sizes considered.
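Selecting the SNPs that fall in a symmetric cis window can be sketched as below, assuming that a window of size X kb spans X/2 kb on either side of the locus (inclusive) and that SNP positions on the chromosome are given in base pairs and sorted. The function name and inputs are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def snps_in_cis_window(
    sorted_snp_positions: Sequence[int],  # bp coordinates on one chromosome, ascending
    locus_position: int,                  # bp coordinate of the methylation locus
    window_kb: int,                       # total window size in kb, e.g. 50
) -> List[int]:
    """Return indices of SNPs within [locus - X/2, locus + X/2], inclusive."""
    half_bp = window_kb * 1000 // 2
    lo = bisect_left(sorted_snp_positions, locus_position - half_bp)
    hi = bisect_right(sorted_snp_positions, locus_position + half_bp)
    return list(range(lo, hi))


def has_snp_in_every_window(
    sorted_snp_positions: Sequence[int],
    locus_position: int,
    window_sizes_kb: Sequence[int] = (10, 50, 100, 500, 1000),
) -> bool:
    """True if the locus has at least one SNP at each fixed window size."""
    return all(
        snps_in_cis_window(sorted_snp_positions, locus_position, w)
        for w in window_sizes_kb
    )
```

The second helper mirrors the restriction to loci with at least one SNP at every window size; the whole-chromosome and whole-genome settings need no positional filter and are omitted here.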

Assigning methylation loci to gene sets

We first assigned methylation loci to genes, based on proximity, and then assigned genes to gene sets. Methylation loci were assigned to their closest neighbouring genes as reported by the Illumina HumanMethylation27 BeadChip annotation files. Next, we associated genes to gene sets, considering all genes that were associated with at least one methylation locus under study. Gene sets were obtained from the Gene Ontology (GO) (29), which yields gene sets organized by biological process, from the Molecular Signatures Database (MSigDB) (30) that defines canonical biological pathways and from the Pharmacogenomics Knowledgebase (PharmGKB) (31) that defines known pathway targets of drugs. GO annotations for humans were obtained from GO on 14 May 2012. We tested the 2464 GO sets for which there were between 20 and 500 member genes, inclusive. Canonical pathway definitions from MSigDB version 3.0 were used, totalling 880 gene sets. All pathways for which at least one drug was known to target it were downloaded from PharmGKB on 21 July 2012, totalling 263 gene sets. In total, there were 3607 sets tested.

Computing correlation of heritable loci with open chromatin regions and known regulatory elements

We explored whether heritable loci were enriched for loci lying in open chromatin regions or in known regulatory elements. To do so, we used Fisher’s exact test (FET) (32), using our results of which loci were deemed heritable, in conjunction with external data sources which could be used to annotate the loci. In particular, we obtained open chromatin regions from data published by the encyclopedia of DNA elements (ENCODE) Project Consortium (33). Briefly, the University of North Carolina at Chapel Hill has collected formaldehyde-assisted isolation of regulatory elements (FAIRE) evidence of open chromatin and has made this data available through the University of California, Santa Cruz Genome Browser (34), from which we obtained it on 19 November 2012. In particular, we obtained all 273 110 open chromatin annotations for normal human astrocyte cells (cell type NH-A), the only cell type relevant to brain tissue that was available at this time, and used the LiftOver tool to map the coordinates to build hg18. For computing the overlap of heritable loci with known regulatory elements, we used publicly available data obtained from the ORegAnno database containing 23 206 known regulatory elements (35) downloaded on 19 November 2012. The majority (17 744 of 23 206, or 76%) of regulatory elements stored in ORegAnno are binding sites of CTCF; therefore, we restricted the regulatory elements to CTCF sites only. Determination of overlap between methylation loci and genomic annotations was computed using the BEDTools software (36).
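The enrichment test above can be sketched as a one-sided FET on a 2×2 table (heritable versus non-heritable loci, overlapping versus not overlapping an annotation). The one-sided alternative is an assumption, and the function below is a plain hypergeometric tail sum written for illustration rather than the routine actually used.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    a: heritable loci overlapping the annotation
    b: heritable loci not overlapping
    c: non-heritable loci overlapping
    d: non-heritable loci not overlapping
    Returns P(X >= a) with all margins fixed (hypergeometric upper tail).
    """
    row1 = a + b          # number of heritable loci
    col1 = a + c          # number of loci overlapping the annotation
    total = a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom
```

Exact integer arithmetic keeps the tail sum accurate for tables with tens of thousands of loci, at the cost of some speed.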

Gene set enrichment testing

We performed gene set enrichment testing using FET, which tests whether the proportion of heritable methylation loci belonging to a gene set is larger than that expected by chance. We hypothesized that the FET P-values may be inaccurate because FET treats loci as independent and, therefore, does not account for correlated loci (2). Thus, we computed permutation-based P-values (using permutations of individuals) for the FET and found that the closed-form FET P-values were inflated. Consequently, we used the permutation-based P-values, from 200 000 permutations of the individuals, calling those with Bonferroni-corrected P-values <0.05 as significant.

Identification of genes preferentially expressed in brain regions using the same individuals

To identify genes that were preferentially expressed in each brain region (those expressed more highly in that region as compared with other regions), we used the matching gene expression data from our publicly available data set (GEO accession GSE15745). For this analysis, only individuals for whom all four brain regions were profiled (and were done so within the same batch) were kept, leaving 122 individuals. For each probe and each individual, the ranks of the probe intensities across the four brain regions were computed. Then, for each brain region and each probe, the ranks across all individuals were summed, resulting in a matrix of R × 4 summations, one for each of the four brain regions and each of the R probes. By the central limit theorem, each summation of ranks is normally distributed with mean 305.0 and variance 203.333, as there are 122 terms in each sum, and each term is sampled from a distribution with mean 2.5 and variance 1.667, assuming all ranks (1, 2, 3, 4 because we have four tissues) are equally likely. Probes were then mapped to genes using the Illumina probeset information file, and only those genes assayed by exactly one probe were retained. Finally, using FET, we measured correlation between whether a methylation locus was heritable and whether the gene associated with the locus was preferentially expressed in the relevant brain region. Only the set of genes both profiled in the expression data, and linked to a methylation locus, were considered.
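The rank-sum arithmetic can be checked with a short sketch. The per-term mean and variance are passed in explicitly so that the figures quoted above (2.5 and 1.667) reproduce 305.0 and 203.333; note that 1.667 equals the unbiased sample variance of the values 1, 2, 3, 4, whereas the population variance of a rank drawn uniformly from those values is 1.25, so the parameter is left adjustable. Converting a rank sum to an upper-tail probability is an assumption, since the text does not state the decision rule.

```python
from math import erfc, sqrt
from typing import Tuple


def rank_sum_moments(n_terms: int, term_mean: float, term_variance: float) -> Tuple[float, float]:
    """Mean and variance of a sum of n independent, identically distributed ranks."""
    return n_terms * term_mean, n_terms * term_variance


def upper_tail_p(rank_sum: float, mean: float, variance: float) -> float:
    """Normal-approximation probability of a rank sum at least this large."""
    z = (rank_sum - mean) / sqrt(variance)
    return 0.5 * erfc(z / sqrt(2.0))


# Figures as quoted in the text: 122 individuals, per-term mean 2.5, variance 1.667.
mean_quoted, var_quoted = rank_sum_moments(122, 2.5, 5.0 / 3.0)

# Alternative using the population variance of a uniform rank on {1, 2, 3, 4}.
mean_pop, var_pop = rank_sum_moments(122, 2.5, (4 ** 2 - 1) / 12.0)
```

A probe whose rank sum in one region lies far above 305 would then be a candidate for preferential expression in that region.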

Identification of genes preferentially expressed in brain regions using independent data

To identify genes that were highly expressed in brain tissue in general, we downloaded the raw gene expression profiles collected by Su et al. (37) for multiple cell types from GEO accession GSE1133. We used the robust multi-array average algorithm in Bioconductor (38) with R version 2.15.1 to both pre-process the array data and map probes to gene Entrez ID (39) using an updated annotation file hgu133ahsrefseqcdf_15.1.0. We kept only samples of normal tissues and cell types, leaving 73 samples profiled in duplicate. We then performed a one-sided Wilcoxon rank sum test to identify preferential expression in brain cell types relative to all other profiled cell types. Similarly to the previous section, FET was used to look for associations between a methylation locus being heritable, and whether the gene associated with that locus was preferentially expressed.

RESULTS

The number of heritable loci depended on the window size for defining cis-acting SNPs

To find an optimal window size across all methylation loci for inclusion of cis-acting SNPs, we centred a window symmetrically around each methylation locus, extending the size of this window through 10 kb, 50 kb, 100 kb, 500 kb and 1 Mb, and we also tried the entire local chromosome, as well as the entire genome. We then deemed the optimum window to be the one yielding the largest number of heritable methylation loci among those loci which had at least one SNP for every window size. As shown in Figure 1a, a window size of 50 kb led to the highest number of heritable methylation loci. After more permutations to obtain more accurate P-values (see ‘Materials and Methods’ section), we found 654, 812, 600 and 636 heritable methylation loci for FCTX, TCTX, PONS and CRBLM, respectively. Although the number of heritable loci is similar for both the 50- and 100 kb windows, it is clear that using too large a window (e.g. the entire genome), or too small a window (e.g. ≤10 kb), dramatically reduced the number of heritable loci.
Figure 1.

Number of heritable methylation loci in the four brain regions: TCTX, FCTX, CRBLM and PONS. (a) Number of methylation loci passing a Bonferroni-corrected P-value threshold of 0.05, as a function of DNA sequence window size, when using only methylation loci analysed for all window sizes (so as to make them comparable). (b) Histogram of the number of SNPs found within the 50 kb window of each of the 21 000 methylation loci.

We believe that our loss of power to detect heritable loci when the window size was extended beyond 50 kb is related to the loss of power we observed when using LMMs to correct for confounding variables in genome-wide association studies (40,41), although we now have a better understanding of this effect (http://research.microsoft.com/apps/pubs/default.aspx?id=178646). In particular, in the present context, most SNPs influencing a methylation locus are expected to be physically near to the locus (i.e. are cis-acting); therefore they can be captured by a relatively small window such as the 50 kb window we identified in Figure 1a. Below this window size, many influential SNPs are likely to be missed, causing a downwards bias in the estimate of $\sigma_g^2$ and, therefore, of heritability. With increasing window sizes, more and more extraneous SNPs are included in the RRM, causing an increase in the variance of the estimate of heritability. This bias-variance trade-off is perhaps best understood in light of the fact that an LMM with no fixed effects, using genetic similarities constructed from a set of SNPs, is equivalent to a form of linear regression of the phenotype on those SNPs. Thus, using extraneous SNPs in the estimation of the RRM is equivalent to using them as additional covariates in this form of linear regression, which increases the variance of the estimate of $\sigma_g^2$, diminishing our power to detect heritable loci. Therefore, in our analysis, as we included more and more SNPs up to and including a window which contained most influential SNPs (i.e. the 50 kb window), the downwards bias on heritability decreased (and the estimate of heritability increased). As we went beyond this optimal window size, an increasing proportion of extraneous SNPs were included in the RRM, up until the point where the variance of the estimate of heritability almost completely diminished our power to detect significantly heritable loci. This bias-variance effect would be mitigated by a larger sample size. Figure 1b illustrates the number of SNPs included in the local sequence window, for all methylation loci, at the selected optimal 50 kb window size.
Our locality result is similar to that found by Price et al. (42), where it was found that heritability of gene expression was primarily because of SNPs at cis loci. In the univariate SNP-methylation association analysis of Bell et al. (14), they examined SNPs within 100 kb, but found that most associations were actually within a few kilobases, whereas Gibbs et al. (13) reported finding a peak at 45 kb. However, as noted in (19), use of a stringent, multiple-testing correction to select significantly associated univariate SNPs, as done in these two studies, is likely to miss much of the weaker signal that the LMM can capture. Thus, it is not surprising that our analysis finds an optimal local window which is slightly larger than what one might have speculated from stringent univariate analyses. We also found that heritable methylation loci tended to have a larger number of SNPs within their windows than non-heritable loci: for CRBLM, the median number of SNPs in the 50 kb window was nine versus seven, whereas for all other tissues, the median number was eight versus seven (all P < 10^-8, Wilcoxon rank-sum test). This result suggests that with more SNPs, there is more power to uncover heritable methylation loci. The number of SNPs in each window is provided in Supplementary Table S1.
Figure 2 illustrates the distribution of estimated narrow-sense heritability over all 21 000 methylation loci for all four regions (region-specific distributions were similar to one another). The mean estimated heritability of all methylation loci deemed heritable (aggregated across all four brain regions) was 29.9%, indicating the extent to which local sequence alone can account for variation in methylation at those loci. Across all loci (including those not deemed heritable), the mean estimated heritability was 2.8%.
Figure 2.

Narrow-sense heritability estimates over all methylation loci in all four brain regions. Loci were divided based on whether they were located in (a) CpG islands (15 469 loci) or (b) not in CpG islands (5531 loci). Within each plot, loci are then further grouped based on whether they were identified as heritable. The smoothed histograms were constructed using density estimation with a Gaussian kernel with default parameters in R, and the y-axis is scaled to a maximum of 1. The number of individuals used in this analysis is reported in Table 1.


Concordance of heritable loci across brain regions and with eQTL and methQTL

We next assessed the extent to which heritable methylation loci were shared across regions when using the 50 kb window size. We found that 181 loci were heritable across all four regions with mean estimated heritability of 41.4%, whereas 207 loci were heritable across at least three regions (Figure 3a). The estimated narrow-sense heritability shows generally good agreement among FCTX, TCTX and PONS (Figure 3b). Supplementary Table S2 reports the list of all methylation loci, their estimated heritability and the significance of association with their 50 kb cis-sequence window.
Figure 3.

Concordance of heritable loci across the four brain regions and with eQTL and mQTL from Gibbs et al. (a) Number of heritable loci found to be overlapping in each of the four regions, using the 50 kb window size. (b) Correlation of the estimated narrow-sense heritability for each methylation locus, between tissue regions, using only the 1451 loci that were significant in at least one region. (c) Breakdown of heritable methylation loci according to whether a locus was also found to have at least one cis-mQTL in the Gibbs study—‘common’ refers to a locus we identified as heritable and for which Gibbs found at least one mQTL; ‘Quon-specific’ means the locus was found to be heritable but did not have an mQTL in the Gibbs study; and ‘Gibbs-specific’ means Gibbs et al. found at least one mQTL for a locus that we did not find to have heritable methylation. (d) Percentage of heritable loci for which at least one eQTL was reported by Gibbs et al. for the gene nearest to the heritable methylation locus (and where the eQTL was within the 50 kb window of the heritable locus), as compared with the number for all methylation loci. An asterisk indicates significant to a threshold of 0.05, as determined by a FET.

We compared the set of heritable loci to the set of methylation loci identified by Gibbs et al. as being associated with at least one cis-methylation quantitative trait locus (methQTL). We found that on average, 43% of each tissue’s set of heritable loci was identified as being associated with at least one cis-methQTL in the Gibbs study, indicating that we identified overlapping but distinct loci from those of Gibbs et al. (Figure 3c). We also identified on average 54% more methylation loci (with cis association) than did Gibbs et al. in their univariate scan. Note that their multiple-testing burden was larger because they also looked for trans-methQTLs.
We next cross-referenced our list of heritable loci with the expression quantitative trait loci (eQTLs) reported by Gibbs et al. (first restricting the set of eQTLs to those within the 50 kb window of the 21 000 methylation loci and whose target gene is the same gene as the one we assigned to the respective methylation locus). We observed that in three of the four tissues (all but PONS), the heritable methylation loci were enriched for genomic regions containing cis-eQTLs [Figure 3d; P = 1.05 × 10^-3 (FCTX), P = 3.16 × 10^-3 (TCTX), P = 0.076 (PONS), P = 0.0202 (CRBLM); FET]. To explore the relationship between heritable methylation loci in each of the four brain regions and levels of gene expression in these brain regions, we again used the expression data corresponding to our samples, now to identify genes preferentially expressed in each region, that is, genes expressed more highly in that region than in others (see ‘Materials and Methods’ section). We found that the genes assigned to heritable loci identified in the frontal cortex and cerebellum brain regions were significantly depleted in genes preferentially expressed in that region [P = 0.024 (FCTX), P = 6.20 × 10^-5 (CRBLM), P = 0.55 (TCTX), P = 0.90 (PONS), FET]. For a more general investigation of heritable loci and brain-specific expression, we obtained genome-wide expression profiles for 73 different cell types so as to identify those genes preferentially expressed in the brain compared with all other tissues. We found that heritable loci identified in three of the four brain regions (frontal cortex, temporal cortex and cerebellum) were significantly depleted near genes expressed more highly in the brain compared with other tissues [P = 4.64 × 10^-4 (FCTX), P = 4.44 × 10^-3 (TCTX), P = 2.32 × 10^-4 (CRBLM), P = 0.19 (PONS), FET]. These results suggest that heritable loci are not regulating genes highly expressed in either brain-specific regions or whole-brain tissue, both of which may be critical to brain function.

Heritable methylation loci were enriched for genomic locations containing regulatory elements

To assess the potential role of heritable methylation loci in gene regulation, we checked to see whether our heritable loci lay in regions previously annotated with genomic features that are indicative of gene regulatory elements. Using data from the ENCODE project (see ‘Materials and Methods’ section), we found that the heritable loci for all four brain regions were enriched in open chromatin regions [P = 9.43 × 10^-3 (CRBLM), P = 0.02 (PONS), P = 0.0122 (FCTX), P = 0.018 (TCTX), FET]. Furthermore, when comparing our heritable loci with known CTCF binding sites [by way of ORegAnno (35), see ‘Materials and Methods’ section] we also found significant enrichment for overlap between the heritable loci and these regulatory elements [P = 0.031 (CRBLM), P = 0.035 (PONS), P = 0.035 (FCTX), P = 0.027 (TCTX), FET]. CTCF is implicated in both diverse genomic regulatory functions (activation, repression, insulation) and the global organization of chromatin architecture (43). Furthermore, DNA methylation of CTCF’s binding site is the best understood mechanism for modulating CTCF binding (43). As an example, methylation of CpG loci within the CTCF binding site eliminates binding of CTCF in vivo and has been demonstrated to disrupt its regulatory activity at the methylated binding site (44). These results suggest that heritable loci may play a regulatory role in the expression of neighbouring genes by modulating binding and activity of regulators, such as CTCF. We also investigated whether those methylation loci found to be heritable favoured any particular position relative to the nearest transcription start site (TSS). We found the heritable methylation loci for each brain region were enriched for loci lying outside of CpG islands (all P < 1.84 × 10^-4, FET).
Furthermore, as illustrated in Figure 4, we found that the heritable loci in the PONS tissue region were preferentially located downstream of the TSS relative to other methylation loci (median position relative to TSS was 72 versus −2 bp, P = 2.6 × 10−3, Wilcoxon rank-sum test); we did not find similar preferences for the other three tissue regions (all other P > 0.57). Heritable loci located much farther downstream from the TSS indicate possible genetic influence over alternative splicing events (45).
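The comparison of TSS-relative positions between heritable and other loci uses a Wilcoxon rank-sum test. The following is a minimal sketch of that test using midranks for ties and the normal approximation without continuity correction; whether the original analysis used an exact or an approximate P-value is not stated here, and the function name `rank_sum_test` is hypothetical.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (u1, z, p), where u1 is the Mann-Whitney U statistic of sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = midrank
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1

    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation is tied: no evidence of a shift
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / var_u ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p
```

Here `x` would hold the TSS-relative positions (in bp) of the heritable loci and `y` those of the remaining loci for one brain region.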
Figure 4.

Relative position of heritable and non-heritable loci identified in the PONS tissue region with respect to the TSS of the gene to which they were closest. The x-axis has been thresholded at a distance of 2 kb.


Genes proximal to heritable methylation loci are involved in a variety of processes

One of the primary roles of DNA methylation is to control the expression of particular genes. We next asked whether heritable loci seemed to be controlling any specific classes of genes. To do so, we first assigned methylation loci to genes, based on proximity (see ‘Materials and Methods’ section). We then performed a gene set enrichment analysis on all genes assigned to heritable methylation loci, using 3607 gene sets from the Gene Ontology (GO) Process hierarchy, canonical biological pathways from the Molecular Signatures Database (MSigDB) and drug-targeted pathways from the Pharmacogenomics Knowledgebase (PharmGKB). Supplementary Table S3 shows which methylation loci are assigned to which gene sets. Among the gene sets found significant (Figure 5), two involved neurotransmitters (agmatine and dopamine), another involved neurotransmitter transporters (SLC transporters) and another involved nicotinamide salvaging (an anti-inflammatory pathway), suggesting candidate epigenetic mechanisms through which genotype may play an important role in drug efficacy. Other gene sets associated with heritable methylation loci involved regulation of energy production and the immune system. Supplementary Table S4 reports the results of the enrichment analysis on all categories tested. Enrichment tests were performed using only genes that were assigned to at least one of the 21 000 methylation loci assayed. This ‘background’ set of genes was not itself significantly enriched for any specific brain functions, although it was enriched for 55 GO categories (of the 2464 tested) across a variety of processes (Supplementary Table S5).
Figure 5.

Gene set enrichment of the heritable loci in each of the four brain regions. (a) A black rectangle indicates significant enrichment () for the specified set and brain region combinations. (b) Network illustration of gene sets found to be enriched for heritable loci. Each node represents one gene set, whereas each edge represents an overlap of at least one methylation locus between the two gene sets. The size of each node is proportional to the number of methylation loci assigned to the respective gene set, and the width of each edge is proportional to the gene set coherence, defined as the number of loci in the overlap divided by the smaller size of the two gene sets. The legend depicts the minimum and maximum node sizes, as well as the edge width corresponding to the minimum gene set coherence (0.05) and maximum gene set coherence (1.0).
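The gene set coherence defined in the caption (loci in the overlap divided by the size of the smaller gene set) and the rule for drawing an edge (at least one shared locus) can be sketched as follows. The function names and the locus identifiers are hypothetical and serve only to illustrate the definition.

```python
from itertools import combinations


def gene_set_coherence(loci_a, loci_b):
    """Number of shared loci divided by the size of the smaller gene set."""
    a, b = set(loci_a), set(loci_b)
    if not a or not b:
        raise ValueError("both gene sets must contain at least one locus")
    return len(a & b) / min(len(a), len(b))


def network_edges(gene_sets):
    """Map each pair of gene sets sharing at least one locus to its coherence.

    gene_sets -- dict from gene set name to an iterable of methylation loci
    """
    edges = {}
    for name_a, name_b in combinations(sorted(gene_sets), 2):
        coherence = gene_set_coherence(gene_sets[name_a], gene_sets[name_b])
        if coherence > 0:  # an edge requires an overlap of at least one locus
            edges[(name_a, name_b)] = coherence
    return edges


# Hypothetical gene sets and locus identifiers, for illustration only.
example_sets = {
    "set_A": ["locus1", "locus2", "locus3"],
    "set_B": ["locus2", "locus3", "locus4", "locus5"],
    "set_C": ["locus6"],
}
example_edges = network_edges(example_sets)
```

In this example only `set_A` and `set_B` are connected, with coherence 2/3, and `set_C` remains an isolated node.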


DISCUSSION

Epigenetic mechanisms, such as DNA methylation, play a critical role in controlling the gene expression programme of cells, which in turn is thought to have significant impact on phenotype (3). Epigenetic markers, therefore, represent a potential mechanism through which genetic variation can affect phenotype. Herein, we examined how cis-DNA sequence influences methylation across the human genome in four phenotypically normal brain regions from unrelated individuals. We found that between 3 and 4% of the tested loci were heritable with respect to an empirically selected optimal cis-DNA window of size 50 kb. Furthermore, the heritable loci were enriched in open chromatin regions and in known binding sites of CTCF, suggesting a functional role for at least some of these heritable loci in disrupting or modulating binding of transcription factors, such as CTCF. Also, genes associated with heritable loci in some of the brain regions were enriched in several pathways, including those involved in neurotransmitter processing, regulation of energy production and the immune system. None of the enriched gene sets is clearly brain-region specific, suggesting that the heritable loci we identified may be heritable in a wide range of tissues rather than only in the brain.
The number of heritable methylation loci depended on how large a window of SNPs was considered local. We found that a window size of 50 kb maximized the number of heritable loci across all regions; this window also indicates the range within which the mechanisms by which SNPs alter CpG methylation could be investigated. We hypothesize that, as the window size was extended beyond the optimal range, the variance in the estimate of heritability became extremely high (especially with such a small cohort), and, therefore, that our ability to detect significance was diminished (http://research.microsoft.com/apps/pubs/?id=178646).
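The window size parameter determines which SNPs count as candidate cis-acting variants for a given CpG locus. A minimal sketch of that selection step is given below, assuming SNP positions on a single chromosome that are already sorted. Whether the 50 kb figure denotes the total width of the window or the distance on each side of the locus is not stated in this section, so the flank is exposed as an explicit `half_width_bp` parameter; the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def snps_in_cis_window(cpg_pos, sorted_snp_positions, half_width_bp):
    """Return the SNP positions within half_width_bp of a CpG locus.

    cpg_pos              -- base-pair position of the CpG locus
    sorted_snp_positions -- ascending SNP positions on the same chromosome
    half_width_bp        -- distance allowed on each side (assumed parameter)
    Both window boundaries are inclusive.
    """
    lo = bisect_left(sorted_snp_positions, cpg_pos - half_width_bp)
    hi = bisect_right(sorted_snp_positions, cpg_pos + half_width_bp)
    return sorted_snp_positions[lo:hi]


# Hypothetical positions, for illustration only.
snps = [100, 20_000, 30_000, 60_000, 90_000]
local_snps = snps_in_cis_window(50_000, snps, half_width_bp=25_000)
```

Widening `half_width_bp` adds more SNPs to each locus, which is the trade-off discussed above: more candidate variants, but a noisier heritability estimate in a small cohort.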
Our estimates of heritability were lower than those reported in the twin-based study of Bell et al. (14), who found a mean genome-wide heritability of 18% from blood samples, as compared with our 3%. In Gervin et al. (18), heritability of the major histocompatibility complex region in cultured lymphocyte cells was investigated using a twin-based approach and was found to be low (2–16%). The discrepancy between our estimates and the twin-based estimates could be explained by unmeasured SNPs (46), the cohort or tissue in which measurements were performed, the upwards bias of twin-based studies (47–49) and limited sample size. Further studies should shed more light on this issue.
There are a number of reasons to suspect that the fraction of CpG dinucleotides whose methylation status is heritable is larger than what we have reported here. First, our study only included individuals with phenotypically healthy brains, and we expect that analysis of a wider range of tissues may uncover genetic dependencies that are tissue or condition specific. Second, our Bonferroni correction of P-values is likely ignoring weakly heritable loci. Third, use of denser SNP and methylation assays will allow for a more refined exploration of the genetic basis of methylation. Finally, if heritable loci were tissue specific, we would lose power to detect them when analysing mixed tissues as we have here.
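To illustrate why a Bonferroni correction can miss weakly heritable loci, the sketch below applies the correction to a list of per-locus P-values. The significance level of 0.05 is an assumed default, not a value taken from this section, and the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag P-values that pass a Bonferroni-corrected threshold.

    Each P-value is compared with alpha divided by the number of tests.
    alpha=0.05 is an assumed default, not a value from the study.
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# With roughly 21 000 loci tested, the per-locus threshold becomes very strict.
per_locus_threshold = 0.05 / 21_000
```

Under these assumptions a locus needs a P-value on the order of 10⁻⁶ to be called heritable, so loci with modest but real heritability are unlikely to pass.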

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1–5.

FUNDING

Microsoft Research. Funding for open access charge: Microsoft Research.
Conflict of interest statement. J.L., C.L. and D.H. own stock in Microsoft.