Literature DB >> 24069155

A minimal set of tissue-specific hypomethylated CpGs constitute epigenetic signatures of developmental programming.

Alejandro Colaneri1, Tianyuan Wang, Vijayakanth Pagadala, Jaya Kittur, Nickolas G Staffa, Shyamal D Peddada, Elvira Isganaitis, Mary Elizabeth Patti, Lutz Birnbaumer.   

Abstract

BACKGROUND: Cell specific states of the chromatin are programmed during mammalian development. Dynamic DNA methylation across the developing embryo guides a program of repression, switching off genes in most cell types. Thus, the majority of the tissue specific differentially methylated sites (TS-DMS) must be un-methylated CpGs. METHODOLOGY AND PRINCIPAL
FINDINGS: Comparison of expanded Methyl Sensitive Cut Counting data (eMSCC) among four tissues (liver, testes, brain and kidney) from three C57BL/6J mice, identified 138,052 differentially methylated sites of which 23,270 contain CpGs un-methylated in only one tissue (TS-DMS). Most of these CpGs were located in intergenic regions, outside of promoters, CpG islands or their shores, and up to 20% of them overlapped reported active enhancers. Indeed, tissue-specific enhancers were up to 30 fold enriched in TS-DMS. Testis showed the highest number of TS-DMS, but paradoxically their associated genes do not appear to be specific to the germ cell functions, but rather are involved in organism development. In the other tissues the differentially methylated genes are associated with tissue-specific physiological or anatomical functions. The identified sets of TS-DMS quantify epigenetic distances between tissues, generated during development. We applied this concept to measure the extent of reprogramming in the liver of mice exposed to in utero or early postnatal nutritional stress. Different protocols of food restriction reprogrammed the liver methylome in different but reproducible ways. CONCLUSION AND SIGNIFICANCE: Thus, each identified set of differentially methylated sites constituted an epigenetic signature that traced the developmental programing or the early nutritional reprogramming of each exposed mouse. We propose that our approach has the potential to outline a number of disease-associated epigenetic states. The composition of differentially methylated CpGs may vary with each situation, behaving as a composite variable, which can be used as a pre-symptomatic marker for disease.

Entities:  

Mesh:

Year:  2013        PMID: 24069155      PMCID: PMC3771925          DOI: 10.1371/journal.pone.0072670

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Animals develop from repeated division of a single cell (a fertilized egg) as a result of precise spatio-temporal regulation of gene expression. Combinatorial use of cis and trans DNA regulatory elements permit commitment of cells to different lineages. Once a differentiation path is chosen the trajectory of gene expression is maintained through many cell generations. The acquisition of this “cellular memory” is key for the assembly of the different organized tissues, and is maintained largely by epigenetic marks such as 5-methyl cytosine methylation [1]–[3]. During development the genome is faithfully replicated millions of times but the epi-genome varies with each cell type. Moreover, in contrast to the genome, the epigenome is dynamic and sensitive to the environment. Exposure to stress during critical stages of development could cause subtle changes in the epigenome resulting in a small but time-cumulative effect on cellular physiology [1], [2], [4]–[6]. Thus, small deviations in tissue-specific methylation patterns could contribute to the developmental origins of many adult diseases [7], [8]. To understand the role of DNA methylation in the cell-type-specification of the chromatin or in the onset of pathophysiological mechanism, it is necessary to contrast the distribution of methyl-marks across various tissues and different conditions. The ultimate goal will be to identify tissue-specific regulatory loci controlled by methylation. Most of the research done to address this problem has focused on the function of promoters and their CpG islands (CGIs), shaping the perception that the CGIs are the hotspots for epigenetic regulation of tissue specific transcriptional activity [9] [10]–[14]. However, technological advances in methylome analysis have increasingly shown Tissue Specific Differentially Methylated Regions (TS-DMR) as hypo-methylated loci existing outside promoters and CGIs, [15]–[18]. We recently showed that most of the hypo-methylated loci existing in the liver of an adult mouse are located in introns or intergenic regions and do not meet commonly accepted definitions of CGIs [15]. Indeed these CpG-poor un-methylated loci showed the highest concentrations in DNA regulatory sequences, many of which were liver specific [15]. We hypothesized that the epigenetic regulation of transcriptional networks that specify tissues occurs primarily in CpG located outside promoters. Comparing the distribution of methylation sites in mouse gDNA derived from liver, kidney, brain and testis we found 138,052 independent differentially methylated sites (DMS). One sixth of them consisted of CpGs whose methylation levels were particularly high or low in only one tissue, hence named tissue specific differentially methylated sites (TS-DMS). Most of these are un-methylated CpGs associated with tissue-specific expressed genes. Functional profiles obtained from these genes suggest that relevant aspects of tissue physiology would be epigenetically demarcated during development. The distribution of methylation among a minimal set of 23,270 CpG showed commonalities between biological replicates and differences between tissues, suggesting that these TS-DMS constitute unique epigenetic footprints created during development. We detected further reprogramming of methylation patterns drove by in utero and/or immediate postnatal food restriction. We suggest that our approach can outline a number of disease-associated epigenetic states. The composition of DMS would be different for each situation representing highly specific DNA methylation biomarker panels. Discovery of novel epigenetic makers is an area of increasing interest in biomedical research, however most of the newly discovered candidates have been found by focusing on a small number of well-defined loci, usually promoters and CGIs [19]. As these loci can be identified either as methylated or un-methylated, they offer a limited sensitivity or specificity for diagnosis, [20], [21]. On the other hand, a composite variable, such as the composition of CpGs in a set of differentially methylated sites, can represent a larger number of different situations covering wider applications.

Materials and Methods

Samples

Four tissues: brain (B), kidney (K), liver (L) and testis (T) were obtained from each of three adult male C57Bl/6J mice. Fetal livers were also derived from the same mouse strain. In addition, 16 liver samples were used from three-week-old male ICR mice (Harlan, Indianapolis, IN, USA) that were subjected to a study of nutritional influences on diabetes and obesity risk [22], [23]. This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. Animal protocols “Proposal # 2009-0017 LN, DNA methylation in obesity and diabetes”, were approved by The National Institute of Environmental Health Sciences Animal Care and Use Committee.

Genome-wide DNA methylation profiling using eMSCC

CpG tag libraries for eMSCC were prepared as described [15], including the addition of equimolar amounts of un-methylated lambda phage DNA (Promega, Cat.# D1521). All libraries were sequenced by Expression Analysis (Durham, NC, USA) using an Illumina Genome Analyzer IIx. Reads (30 to 50 million per library) were aligned to the mouse reference genome (mm9) using MOM [24].

Identification of TS-DMS

This procedure has four main steps: 1) For each of 12 samples (3 per tissue), we normalized the reads at each CpG by a factor that is proportional to the total number of reads in the sample; 2) We aligned all the reads in each sample to Takai-Jones CGIs (TJ-CGIs) and generated the distribution of the average normalized methyl-sensitive-counts (digestion frequencies) of TJ-CGIs. We chose the value at the valley of the distribution as the cutoff to filter out CpGs presumed heavily methylated in all four tissues; 3) For each CpG that passed the filter in step 2, we performed all six pairwise comparisons of the four tissues, i.e. B-K, B-L, B-T, K-L, K-T, L-T; where B, K, T and L represent brain, kidney, testis and liver respectively and B-K, B-L, etc., represent the pair wise comparisons. For each CpG, the average and the standard deviation of the reads were calculated using all samples. The standardized residual for each sample was then calculated by dividing the difference between the observed numbers of reads for that sample and the corresponding average by the standard deviation. We plotted the quantiles of the residuals against the quantiles of standard normal distribution to see if it is reasonable to assume that the data are approximately normally distributed; 4) We applied the mdFDR method to identify DMSs based on the p values of all six possible t-tests. The mdFDR procedure allowed us to control the overall false discovery rate for all pairwise comparisons as well as the directional errors when declaring more or less methylated states [25]. In this manner, we identified the preliminary DMS. We next removed the pairs of comparison with average normalized read differences between two tissues that were less than the cutoff value generated from step 2, to ensure that two tissues had a different methylation state at that CpG. Last, for each DMS we averaged reads of each tissue and identified the sites with methylation levels significantly high or low in only one tissue. These sites were called TS-DMS.

Identification of DMS associated with nutritional history

Previously, an experiment was reported in which pregnant mice were subjected to undernutrition (U) during the last third of their pregnancy and this feeding regimen was continued during lactation generating the UU pups, alternative some mothers were changed to a control diet (C) during lactation and generated the UC pups. CC and CU protocols were also implemented to generate the respective pups. All pups were euthanized and their livers analyzed [23]. We used 16 liver samples derived from the same experiment: CC (n = 4), CU (n = 5), UC (n = 2) and UU (n = 5) to profile changes in methylation using the approach described in previous section.

CpG distribution related to UCSC known genes

CpGs were classified according to their genomic location: 1) TSS region (−3 Kb to +2 Kb of the TSS); 2) gene body region; 3) 3′ end region encompassing 3 Kb of DNA sequences downstream of each transcriptional end; and 4) intergenic region. Coordinates for the beginning and end of these regions were taken from the mm9 building downloaded from the UCSC Genome Bioinformatics browser (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/knownGene.txt.gz). The equally unmethylated CpGs (497,547) were defined as the CpGs with the average normalized df above the cutoff value of unmethylation in all four tissues and excluding any overlapping with DMSs.

CpG distribution related to CGIs

CpGs were mapped to three regions relative to CGIs and their 2-kb flanking regions (shores), including inside CGI, in CGI shores, and outside of CGI and its 2 Kb shores. Five CGI definitions were used for this analysis [26]–[30].

Measures of the association between TS-DMS and tissue-specific expressed genes

We used the Gene Expression Barcode 2.0 browser to obtain a list of genes expressed in brain, kidney, liver, and testis (http://rafalab.jhsph.edu/barcode/index.php?page=tissuegene). The original gene expression information in this database was extracted from publicly available gene expression results of 9,652 samples using the Mouse Genome 430 2.0 microarray [31]. In this set, a gene is considered expressed in a specific tissue if it is expressed in more than 95% of the samples of this tissue. We further extracted tissue-specific expressed genes by collecting, those that are expressed in only one of the four tissues. We considered a gene to be tissue differentially methylated when it has at least one TS-DMS within the region encompassing ±3 Kb from its edges. To standardize the different lists, all of them were converted to DAVID identifiers using the Gene ID conversion tool in NIAID's DAVID Bioinformatics Resources 6.7 [26], [27] website (http://david.abcc.ncifcrf.gov/). Only genes with DAVID identifiers were considered in this study. The odds ratio was calculated as the ratio of the odds of tissue-specific un-methylation occurring in tissue-specific expressed genes versus the odds of it occurring in non-tissue-specific expressed genes. Four binary sets of data were created to calculate the odds ratio (OR) in each tissue according to the formula: OR = (a/c)/(b/d). Where “a” is the number of genes tissue differentially methylated that were uniquely expressed in the same tissue; “b” are genes not associated with TS-DMS but uniquely expressed in the analyzed tissue; “c” is the number of genes tissue-differentially-methylated but expressed in more than one tissue; “d” is the number of genes not associated to TS-DMS and expressed in more than one tissue. Both ORs and p-values were calculated by using the logistic regression function in R package with one degree of freedom.

Overlap between TS-DMS and published tissue-specific cis-regulatory sequences

We downloaded a list of 32,266 mouse tissue-specific enhancers in cortex, liver, kidney, and testis identified by H3K4me1 ChIP-Seq study (http://chromosome.sdsc.edu/mouse/download.html) [32]. We aligned our identified TS-DMSs and a total of 7 million surveyed CpGs in the genome to the enhancers based on their genomic locations. Enhancer was defined as being covered by our eMSCC method if it has at least one TS-DMS. To quantify the fold enrichment of TS-DMS in enhancers with respect to the genome, we calculated the ratio between [number of TS-DMS within enhancers/total number of TS-DMS] and [total number of surveyed CpGs within enhancers/total number of surveyed CpGs in the genome]. We estimated the significance of the enrichments by simulation, which allowed us to assign p values based on the distribution of the fold enrichments generated from simulation of 100,000 iterations. For each iteration we randomly generated a set of genomic regions with the same number of enhancers and the same size (bp) of each enhancer. In addition, we examined the overlap of TS-DMS with 15,435 active enhancers, which were defined by both H3K4me1 and H3K27ac marks.

Functional annotation of CpGs using NIAID's DAVID Bioinformatics Resources

We used the tool provided in NIAID's DAVID Bioinformatics Resources 6.7 [26], [27] to analyze enrichments in five main categories: biological process, cellular component, molecular function, KEGG pathway, and tissue expression. We used the Bonferroni-corrected P value of 0.05 as the threshold to identify significantly enriched categories, which were sorted by -fold enrichment over the background frequency in the human genome. We focused on genes associated with CpGs uniquely unmethylated in one of the four tissues, using gene names derived from the Gene ID conversion tool within DAVID. A gene was defined as being associated with a CpG if the CpG was within 3 Kb upstream to 3 Kb downstream of the gene.

Functional enrichment analysis

The functional annotation tables usually display a rank-order with broader general term at the top, and most specific terms toward the bottom. Although the upper terms have greater statistical support (largely enriched, o lower p-values), they are usually less to almost no informative, but generally are the reported terms when the main output of the enrichment tool is based on functional enrichments tables. DAG graph facilitates the systematic set up of thresholds to select relevant terms based on the structure of the results. We used Gene Ontology enRIchment anaLysis and visuaLizAtion tool (GORILLA) [33] which allows to identify the most informative terms that are significant enriched.

Unsupervised hierarchical clustering analysis

Unsupervised hierarchical cluster analysis was performed with Cluster 3.0 and displayed using TreeView [34], [35].

Results

Identification of Tissue-Specific Differentially Methylated Sites (TS-DMS)

Brain, kidney, liver and testis were dissected from three C57BL/6J mice and genomic DNA was isolated from every specimen. Each sample was treated with four methylation-sensitive restriction enzymes to compare the extent of digestion at approximately 7 million CpGs. Briefly, twelve CpG-tag libraries were prepared according to our previously published method [15], and deep sequenced to generate 37.5 to 51.4 million aligned reads per sample, Table S1 in Supporting Information S1, Data Set S1. These reads contain ∼27-bp sequences located adjacent to CCGG, ACGT, GCGC, CCGC and GGCG digested sites. Digestion frequencies (df) were calculated by counting the number of reads aligned to each site in the genome. A site with low df may indicate either a need for deeper sequencing or that the site is highly methylated. Three observations support the idea that most poorly digested sites were associated with high levels of methylation, rather than with poor coverage or random sampling of CpG-tags during sequencing: 1) Of the approximately 7 million CpGs tested across this study, 5,283,360 were represented in the libraries suggesting an effective coverage; 2) To a large extent, sites that were resistant to the enzymatic digestion, were arranged in tandem within the body of genes, repetitive sequences and intergenic regions. All these places are known to harbor most of the genomic methylation marks; 3) Digestion frequencies for poorly identified sites were consistently low in all the tissues and in all the replicates, which is not expected as a result of random sampling. Figure 1 illustrates these observations. Each digestion profile represents the averaged frequencies of three replicates. Highly digested sites segregate from those resistant to the enzymes. The majority of sites located in the 5′ regions of genes and in CGIs scored high digestion frequencies, whereas sites in intergenic regions, repetitive DNA or gene bodies, scored low. Figure 1 shows tissue specific differentially methylated regions (TS-DMR) spanning the entire intragenic CGIs in the last exon of several Zinc finger proteins, Table 1. All these differentially methylated CGIs distinguish somatic tissues from testis. Indeed, this epigenetic dichotomy seems to reflect a transcriptional dichotomy, Figure 2. For instance, the CGI located in the 5′ region of the Repin 1 gene is un-methylated and coincides with DNAse hypersensitivity in all the somatic tissues, whereas, the intragenic CGI located in the 3′ region of the gene is hyper-methylated, and insensitive to DNase I.
Figure 1

Identification of tissue differential methylated sites (T-DMS) led to the identification of tissue differentially methylated regions (T-DMR).

Averaged digestion frequencies were profiled for each tissue and depicted with the UCSC genome browser. The long RNA-seq track generated by the transcriptome group at Cold Spring Harbor Laboratories and the Center for Genomic Regulation in Barcelona display the density of mapped reads. Individual tissues were harvested from 8-month C57BL/6J mice. The UW ENCODE group generated the DNaseI hypersensitivity track, the signal represent the density of reads mapped within a 150 bp sliding window. Tissues were harvested from mice of the same strain and age as described above. A) region containing the entire Repin1 gene. B), C) and D) regions spanning the last exon of the Zfp467, Zfp777 and Zfp618 genes respectively. E) scatter plot comparing digestion frequencies of liver replicates (upper panel) or different tissues (lower panel). Data was collected for the “chr6: 48,537,207–48,558,557” interval. The solid lines represent the result of a linear regression; the dashed lines defined the 95% interval of prediction. Data outside this interval could represent tissue differentially methylated CpGs. F) bisulfite sequencing analysis confirmed the discovered tissue-specific differentially methylated region. Regions selected for this analysis are shown as black bars in the panels A, B and C.

Table 1

A set of tissue differentially expressed Zfp genes with TS-DMS in their 3′ terminal exon.

Methylation in 3′ exon1 Gene expression: RNA-seq2
Gene List5′ CGI?3 3′ exon CGI?DMR in 3′ exonBKLTFCh4 BKLTFCh5
Zfp92noyeschrX:70,667,009–70,668,6691.81.82.415.87.8251155
Zfp787yesyeschr7:6,083,186–6,085,9056.78.34.030.84.93438145501814.5
Zfp775noyeschr6:48,569,295–48,571,5273.12.01.736.616.214016862563
Zfp768yesyeschr7:134,486,685–134,488,3642.67.45.327.15.331314254627821.8
Zfp689yesyeschr7:134,587,450–134,589,7906.39.64.431.14.614813843412−2.8
Zfp647yesyeschr15:76,741,187–76,743,4201.00.81.523.921.780341598.9
Zfp64yesyeschr2:168,751,178–168,753,4167.52.31.321.95.9289249159466.3
Zfp629noyeschr7:134,754,156–134,757,4452.12.90.636.019.12352751312001.4
Zfp536yesyeschr7:38,264,240–38,266,6314.14.02.721.15.9397322814.2
Zfp467yesyeschr6:48,387,913–48,389,95914.812.07.030.12.72516052352722.4
Zfp398yesyeschr6:47,816,064–47,817,2700.60.40.26.36.3300315871462.2
Zfp358yesyeschr8:3,494,937–3,497,6501.51.31.429.220.990111873294725.3
Zfp324yesyeschr7:13,555,845–13,557,6133.04.12.636.611.4972121081415.1
Zfp282yesyeschr6:47,853,767–47,857,2038.98.04.319.12.7279159492810.0
Zfp275yesyeschrX:70,599,182–70,599,9941.00.30.37.27.212510552582.2
Zfp213yesyeschr17:23,694,614–23,695,4423.14.32.312.73.9123256124624.1
Zfpm2yesnochr15:40,933,302–40,935,1511.11.01.523.519.6108124392.8
Zfp94yesnochr7:25,087,544–25,089,2061.40.91.024.222.0973115442.2
Zfp809yesnochr9:22,043,700–22,045,6600.20.30.219.819.8239180158524.6
Zfp804ayesnochr2:82,099,024–82,099,8722.912.43.224.23.9181111181
Zfp791nonochr8:87,633,541–87,638,1831.61.61.314.810.14705265443071.8
Zfp780bnonochr7:28,746,705–28,746,9436.37.15.226.04.252151668.7
Zfp764yesnochr7:134,548,323–134,549,9002.72.81.423.410.210415446625.7
Zfp691nonochr4:118,842,688–118,844,1332.94.32.320.16.4120379230479−1.3
Zfp69yesnochr4:120,602,876–120,604,4721.52.10.834.923.628143171.6
Zfp672yesnochr11:58,129,068–58,131,8187.96.12.935.76.354510834247111.5
Zfp668yesnochr7:135,009,556–135,012,9521.91.80.916.210.61887455424.5
Zfp619yesnochr7:46,790,771–46,795,8030.30.40.418.818.84321958.6
Zfp612yesnochr8:112,612,925–112,614,3710.30.30.319.019.03503119566.3
Zfp608yesnochr18:55,057,322–55,060,6834.27.23.819.43.81018619273−2.7
Zfp42nonochr8:44,380,959–44,381,8430.81.62.417.210.931280−26.7
Zfp341yesnochr2:154,471,197–154,472,4354.85.12.030.97.81315341196.9
Zfp27nonochr7:30,679,004–30,681,8410.10.10.011.011.01984636326.2
Zfp248yesnochr6:118,388,618–118,389,6610.40.30.416.216.2122126
Zfp239nonochr6:117,820,865–117,823,0173.34.78.224.64.66757621096.2
Zfp180yesnochr7:24,889,403–24,892,2101.82.02.119.09.744820388845.3
Zfp112nonochr7:24,902,263–24,904,8684.63.22.718.75.3366249
Zfp365yesnochr10:67,350,323–67,351,7044.83.93.536.38.939838131327.7
Zfp809yesnochr9:22,043,950–22,044,9000.20.30.219.819.8239180158524.6

) The level of methylation in the 3′exon of each Zfp gene was calculated for each tissue by adding the averaged digestion frequencies of triplicates at each CpG identified in the exon, and dividing by the number of identified CpGs in the exon. A digestion frequency around 26 is typically found in un-methylated CGI (Figure S2E in Supporting Information S1), where df<10 are associated to methylated CGI.

) The numbers represent the raw signal for the density of mapped reads (wiggle format) in the CSHL Long RNA-seq track in the UCSC genome browser (NCBI37/mm9 assembly).

) All 5′ CGI were found un-methylated in all the tissues for each Zfp gene listed in this table.

) Fold changes calculated as the ratio between df at testes and the avg df of the three somatic tissues.

) Fold change is calculated as the ratio of somatic tissue vs testes. The somatic tissue with the highest RNA-seq signal or “df” was chose in each case. Abbreviation: CGI for CpG Island, df for digestion frequency, B for brain, K for kidney, L for liver and T for testes.

Figure 2

Co-occurrence of differential tissue gene expression and differential tissue methylation at 3′ terminal exons of a set of Zfp genes.

A, paired comparisons of the digestion frequencies (somatic tissues vs. testis) tabulated in Table 1 under the column header: “methylation in 3′ exon”. B and C, distribution of linear fold change (somatic tissues vs. testis) values for gene expression and methylation respectively. Box plots represent the data tabulated under FCh columns in Table 1. D, paired comparison of gene expression (somatic tissues vs. testis), the somatic tissue with the highest raw signal was chosen for comparison in each pair.

Identification of tissue differential methylated sites (T-DMS) led to the identification of tissue differentially methylated regions (T-DMR).

Averaged digestion frequencies were profiled for each tissue and depicted with the UCSC genome browser. The long RNA-seq track generated by the transcriptome group at Cold Spring Harbor Laboratories and the Center for Genomic Regulation in Barcelona display the density of mapped reads. Individual tissues were harvested from 8-month C57BL/6J mice. The UW ENCODE group generated the DNaseI hypersensitivity track, the signal represent the density of reads mapped within a 150 bp sliding window. Tissues were harvested from mice of the same strain and age as described above. A) region containing the entire Repin1 gene. B), C) and D) regions spanning the last exon of the Zfp467, Zfp777 and Zfp618 genes respectively. E) scatter plot comparing digestion frequencies of liver replicates (upper panel) or different tissues (lower panel). Data was collected for the “chr6: 48,537,207–48,558,557” interval. The solid lines represent the result of a linear regression; the dashed lines defined the 95% interval of prediction. Data outside this interval could represent tissue differentially methylated CpGs. F) bisulfite sequencing analysis confirmed the discovered tissue-specific differentially methylated region. Regions selected for this analysis are shown as black bars in the panels A, B and C.

Co-occurrence of differential tissue gene expression and differential tissue methylation at 3′ terminal exons of a set of Zfp genes.

A, paired comparisons of the digestion frequencies (somatic tissues vs. testis) tabulated in Table 1 under the column header: “methylation in 3′ exon”. B and C, distribution of linear fold change (somatic tissues vs. testis) values for gene expression and methylation respectively. Box plots represent the data tabulated under FCh columns in Table 1. D, paired comparison of gene expression (somatic tissues vs. testis), the somatic tissue with the highest raw signal was chosen for comparison in each pair. ) The level of methylation in the 3′exon of each Zfp gene was calculated for each tissue by adding the averaged digestion frequencies of triplicates at each CpG identified in the exon, and dividing by the number of identified CpGs in the exon. A digestion frequency around 26 is typically found in un-methylated CGI (Figure S2E in Supporting Information S1), where df<10 are associated to methylated CGI. ) The numbers represent the raw signal for the density of mapped reads (wiggle format) in the CSHL Long RNA-seq track in the UCSC genome browser (NCBI37/mm9 assembly). ) All 5′ CGI were found un-methylated in all the tissues for each Zfp gene listed in this table. ) Fold changes calculated as the ratio between df at testes and the avg df of the three somatic tissues. ) Fold change is calculated as the ratio of somatic tissue vs testes. The somatic tissue with the highest RNA-seq signal or “df” was chose in each case. Abbreviation: CGI for CpG Island, df for digestion frequency, B for brain, K for kidney, L for liver and T for testes. Whilst these results uphold the effectiveness of Methyl-Sensitive Cut Counting (MSCC) to profile methylation [15], [36], [37], we harnessed the power of this method to perform a multiple testing procedure to detect differentially methylated sites (DMS), [38]. By implementing MSCC in a multiple pairwise comparison design, we controlled site-specific biases, Figure S1 in Supporting Information S1. In addition, the variability introduced by random errors is weighed in the denominator of the equation used to calculate the t-statistic; this allows us to ascribe the differences in the digestion frequencies to differences in methylation levels. Although our experimental design allows for comparisons across the dynamic range, we contrasted only those sites that are highly supported by sequencing data. To conduct this selection we first normalized the libraries to reach similar sequencing depth Figure S2E in Supporting Information S1. The average df was computed for every CGI defined by the Takai and Jones algorithm [26]. For each library, the 21,246 computed scores were organized in a frequency histogram, Figure S2E in Supporting Information S1. The figure shows a bimodal distribution that is consistent with the bimodal pattern of methylation found in CpG islands [39]. As such the frequency histogram has been used to directly determine the methylation status of individual CGIs [15], but here we used them to derive an unrefined estimate of the relationship between digestion frequency and rate of methylation at each surveyed CpG. The sites with a df of 10 or less were considered heavily methylated and sites not reaching the cut-off in any of the libraries were excluded. For the identification of DMS multiple pairwise Welch t tests were performed. The null hypothesis was stated as no difference between the averages of the triplicates. Based on the differences detected by the Welch t-test further directional decisions were made to sort the means according to their values. This allowed the identification of sites whose rate of digestion was unusually high or low in only one of the tissues. These sites were called TS-DMS. The test was designed to control for errors associated with wrong rejections of the null hypothesis and also for assigning wrong inequality patterns. A mixed directional false discovery rate (mdFDR) of 10% was chosen in this analysis [25]. Mouse gDNA samples were spiked with lambda gDNA (λDNA) to achieve an equimolar ratio between both genomes. In this way, for each one of the 1,202 completely un-methylated sites present in the lambda genome, 12 replicates were obtained, Data Set S2. Figure S2 A and C in Supporting Information S1 suggest that df computed for these replicates are approximately normally distributed with very slight departures in the tails of the distribution. Comparison of digestion frequencies among λ-gDNA replicates reveals that 10% of the 1,202 λ-sites were found to have significant differences in their digestion frequencies. Although no differences were expected between the 12 un-methylated λDNA replicates, our results show that these empirical values for false discovery rate were very close to the theoretical value of 10% selected for this study. Applied to CpGs from the mouse genome the comparative differences reveals 150,428 sites where the methylation varies to the same extent among the analyzed four tissues. Furthermore, retaining only the sites whose digestion frequencies differed by a minimum of 10 resulted in a final count of 138,052 DMS (Data Set S3). Only 24,803 of these DMS were TS-DMS, i.e. sites that were hypo or hyper-methylated in only one tissue, Figure 3. Notably, 23,270 (94%) of these TS-DMS are un-methylated CpGs in one of the four tissues, of which 57% were in the testis.
Figure 3

Summary of the findings of differential methylation observed for the comparison of tissue.

Of the 138,052 sites with differentially methylated marks, called DMS, only 24,803 were also TS-DMS, defined here as CpGs uniquely methylated or un-methylated in one but not in the other tissues. The remaining 113,249 sites are just DMS, meaning that methylation at these CpGs varies at a detectable level among compared tissues.

Summary of the findings of differential methylation observed for the comparison of tissue.

Of the 138,052 sites with differentially methylated marks, called DMS, only 24,803 were also TS-DMS, defined here as CpGs uniquely methylated or un-methylated in one but not in the other tissues. The remaining 113,249 sites are just DMS, meaning that methylation at these CpGs varies at a detectable level among compared tissues.

Distribution of TS-DMS in relation to Genes and CpG Islands

An overwhelming number of genome-wide methylation studies focused on CGIs and promoter regions have proposed them as hot spots for epigenetic control of gene expression through CpG methylation [40]. To investigate if our results are consistent with this concept, we compared the distribution of sites having a comparable methylation level among the four tissues with the distribution of TS-DMS. Almost all the sites identified as DMS or TS-DMS were found in introns or intergenic regions which are remote from promoters, Figure 4A. Recently, it has been proposed that intergenic and intragenic CGIs function as alternative promoters which can be repressed in a tissue specific manner through methylation of their CpGs [11], [12], [14]. However we found that only 5 to 25% of TS-DMS were parts of CpG islands. The fraction of TS-DMS in CGIs increased proportionally to the number of island in each set, Figure 4-B.
Figure 4

Distribution of TS-DMS relative to promoters, CGI, CGI shores, experimentally determined un-methylated regions (UMR) or shores of UMRs.

A) Distribution of equally and tissue differentially un-methylated CpGs relative to UCSC genes. B) Distribution of differentially un-methylated CpGs relative to CGIs and their 2-kb shores. TJ-CGIs (Takai and Jones CpG islands), GGF-CGIs (Gardiner- Garden and Frommer's CpG islands), Epi-CGIs (epigenetically predicted CpG islands), CGI clusters (CpG clusters), HMM CGI (Hidden Markov Model predicted CpG islands).

Distribution of TS-DMS relative to promoters, CGI, CGI shores, experimentally determined un-methylated regions (UMR) or shores of UMRs.

A) Distribution of equally and tissue differentially un-methylated CpGs relative to UCSC genes. B) Distribution of differentially un-methylated CpGs relative to CGIs and their 2-kb shores. TJ-CGIs (Takai and Jones CpG islands), GGF-CGIs (Gardiner- Garden and Frommer's CpG islands), Epi-CGIs (epigenetically predicted CpG islands), CGI clusters (CpG clusters), HMM CGI (Hidden Markov Model predicted CpG islands). Most recently, CGI shores, not CGI, have been proposed to be the targets of tissue-specific epigenetic regulation [41], [42]. We found that the fraction of TS-DMS mapping to CGI shores (2000 base pairs from the edge of a CGI) follows the same trend and proportions than those mapping to CGIs, Figure 4B. The different CGI sets used in this study represent efforts to improve predictions of functionality on GC-rich genomic sequences, mainly promoter activity and un-methylated status. Notably the Gardiner- Garden and Frommer's set (GGF-CGI), which is the oldest, biggest and considered the less rigorous in term of functional prediction, is the one that collects the biggest number of TS-DMS. Approximately 60% of the tissue specific un-methylated CpGs are equally distributed between GGF-CGI and their shores. Conversely, the GGF-CGI-derived more stringent sets, such as those produced applying the Takai and Jones criteria [26] or with an epigenome prediction pipeline [28], are the ones containing the smallest fractions of TS-DMS in their CGIs or shores, Figure 4b. Thus the main characteristic in a set correlating with the proportion of TS-DMS is their number of predicted CpG rich loci, suggesting that the effort done to improve the prediction of epigenetically regulatory sequences in a genome failed to capture the DNA features that guide tissue-specific epigenetic mechanisms.

Association between TS-DMS and tissue-specific expressed genes

The Gene Expression Barcode browser contains a list of genes whose levels of expression, in 89 murine tissues, were represented as binary calls (expressed or not-expressed) [31]. These lists were made from 9,652 publicly available gene expression results obtained with the Mouse Genome 430 Array 2.0. Genes detected above threshold in more than 95% of samples of a specific tissue are considered expressed in that tissue. We considered a gene to be expressed in a tissue-specific manner when it is expressed in only one of the four organs included in our study. In addition we created a list including all the genes with at least one TS-DMS in the region extending from -3 Kb to +3 Kb of the edges of the gene. We found that genes associated with TS-DMS are more likely to be expressed in a tissue specific manner. The strength of this association varies across the different tissues, Table 2.
Table 2

Association of genes with TS-DMS and tissue specific gene expression in each tissue.

TissueTS-DMS associated genesTS-expressed genes1 OverlapsOdds Ratio2 p-value3
Brain1,0351,0801603.187.42E-36
Kidney940434522.557.41E-10
Liver2,0908783395.54.64E-118
Testis2,7002,3934381.311.97E-06

) Taken from the Gene Expression Barcode 2.0 database (http//:rafalab.jhsph.edu/barcode/).

) The ratio of the odds of tissue-specific un-methylation occurring in tissue-specific expressed genes.

) p-values calculated using the logistic regression function in R package with one degree of freedom.

) Taken from the Gene Expression Barcode 2.0 database (http//:rafalab.jhsph.edu/barcode/). ) The ratio of the odds of tissue-specific un-methylation occurring in tissue-specific expressed genes. ) p-values calculated using the logistic regression function in R package with one degree of freedom.

TS-DMS are more frequently found at CpGs inside tissue specific cis-regulatory sequences

A map of cis-regulatory elements for the mouse genome has recently been published [32]. The map includes 53,834 putative promoters and 234,764 potential enhancers, many of which are tissue specific. These enhancers were defined according to the presence of ChIP-Seq peaks for histone H3 lysine 4 mono-methylation marks (H3K4me1) outside promoters. We obtained the coordinates for 24,128 tissue-specific enhancers that are divided between liver, kidney and testis (http://chromosome.sdsc.edu/mouse/download.html). We compared the relative number of TS-DMS and surveyed CpG sites inside and outside of these enhancer-DNA sequences. We found that the percentage of TS-DMS within enhancers is up to 30 times higher than outside enhancers Table 3 (p-value<10−5, simulation of 100,000 iterations), with enrichment varying across tissues. Next we examined the overlap of TS-DMS with active enhancers, as defined by acetylation marks (H3K27ac), [32] (Table 4). A comparison of the results (Table 3 with Table 4) shows that most of the TS-DMS derived from somatic tissues are localized in active enhancers, while the opposite is true for the case of the TS-DMS derived from the testis. For example, of the 792 TS-DMS marking enhancers in liver, 561 (70%) are in active enhancers.
Table 3

Overlap between TS-DMS and tissue-specific enhancers derived from ChIP-Seq H3K4me1 peaks (poised and active enhancers).

TissueTS-enhancersTS-DMSsOverlaps1 # CpG2 Fold Enrichmen3 P-value4
Cortex-Brain 5 8,1381,14960; (5.2%)64,5125.63<1.0 E-5
kidney 5,9761,115147 (13.2%)44,54920.58<1.0 E-5
liver 8,7012,933792 (27.0%)63,05329.79<1.0 E-5
testis 9,4518,8653,314 (37.4 3,314 (37.4%)164,86215.77<1.0 E-5

) Number of TS-DMS within TS-enhancer regions.

) Number of surveyed CpGs within TS-enhancer regions.

) The quotient of two ratios (TS-DMS within TS-enhancer regions/total number of TS-DMSs) and (total number of surveyed CpGs within TS-enhancer regions/total number of surveyed CpGs in the genome).

) p-value calculated according to the distribution of the fold enrichments generated from simulation of 100,000 iterations.

) While these enhancers are derived from cortex the TS-DMS are derived from whole brain.

Table 4

Overlap between TS-DMSs and active enhancers (H3K27ac and H3K4me1 marks).

TissueTS-enhancersTS-DMSsOverlaps1 # CpG2 Fold Enrichment3 p-value4
Cortex-Brain 5 4,1281,14943; (3.7%)33,0107.89<1 E-5
kidney 3,7211,115115 (10.3%)29,18724.58<1 E-5
liver 4,9412,933561 (19.1%)38,69734.38<1 E-5
testis 2,6458,865758 (8.6%)50,78311.71<1.E-5

) Number of TS-DMSs within enhancers with H3K27ac marks.

) Number of surveyed CpGs within enhancers with H3K27ac marks.

) The quotient of two ratios, (number of TS-DMS within H3K27ac enhancers/total number of TS-DMS) and (total number of surveyed CpGs within H3K27enhancers/total number of surveyed CpGs in the genome).

) p-value calculated according to the distribution of the fold enrichments generated from simulation of 100,000 iterations.

) While these enhancers are derived from cortex the TS-DMS are derived from whole brain.

) Number of TS-DMS within TS-enhancer regions. ) Number of surveyed CpGs within TS-enhancer regions. ) The quotient of two ratios (TS-DMS within TS-enhancer regions/total number of TS-DMSs) and (total number of surveyed CpGs within TS-enhancer regions/total number of surveyed CpGs in the genome). ) p-value calculated according to the distribution of the fold enrichments generated from simulation of 100,000 iterations. ) While these enhancers are derived from cortex the TS-DMS are derived from whole brain. ) Number of TS-DMSs within enhancers with H3K27ac marks. ) Number of surveyed CpGs within enhancers with H3K27ac marks. ) The quotient of two ratios, (number of TS-DMS within H3K27ac enhancers/total number of TS-DMS) and (total number of surveyed CpGs within H3K27enhancers/total number of surveyed CpGs in the genome). ) p-value calculated according to the distribution of the fold enrichments generated from simulation of 100,000 iterations. ) While these enhancers are derived from cortex the TS-DMS are derived from whole brain.

Genes associated with TS-DMS are enriched in functional annotations specific to the corresponding tissue

We created two lists representing different levels of association between the TS-DMS and their corresponding genes, Data Set S4. If a TS-DMS is outside the region occupied by the gene ±3,000 bp it was assigned to the nearest TSS (weak association). If a TS-DMS is inside the region occupied by the gene ±3,000 bp it was directly assigned to that gene (strong association). Gene annotation enrichment analyses using these gene lists were mainly based on the Gene Ontology data base (GO) [43]. To obtain the functional profiles we used DAVID Bioinformatics Resources [44], [45] and Gene Ontology enRIchment anaLysis and visuaLizAtion tool (GORILLA) [33]. Genes associated with TS-DMS were significantly enriched in tissue-specific expression annotation in each of the four tissues, (Table 5). Table 6 tabulates the functional enrichment results for biological processes in the context of a directed acyclic graph (DAG) structure created using the GOrilla tool, e.g. Figure 5. The table includes all the terms at terminal nodes of the DAG describing the most specific and informative biological processes that showed significant enrichment. Terms described in these sets matched distinct attributes of the physiology and biochemistry of each particular tissue, Figures 5 and Figures S3and S4 in Supporting Information S1, also Table 6. Interestingly, genes physically linked to TS-DMS are mostly associated with typical activities of the respective adult organ, whilst many genes with TS-DMS distal to their promoters are associated with embryonic development, Table 6 and Figure S4 in Supporting Information S1. Finally, we scored and ranked the genes according to the number of TS-DMS associated with them, and used the GOrilla tool to discover GO terms that are significantly enriched at the top of a ranked gene list. In the case of liver, we found very high enrichments for genes involved in lipid metabolism and homeostasis, particularly for processes including synthesis, transport and cellular response to cholesterol, Table 7.
Table 5

Enrichment of tissue-specific gene expression annotations performed with the DAVID Bioinformatics Resources.

TS-DMS1 TS-DMS in genes2 Genes with TS-DMSTS-genes3 %p-value4
Brain238214021098575525.6 E-36
Kidney21771250999170177.6 E-8
Liver562633262218702321.1 E-79
Testes1305858483042598203.8 E-5

) Number of CpGs that were identified as uniquely un-methylated.

) Number of TS-DMS located in the region spanned from ±3 kb of the gene edges.

) Genes annotated as tissue-specific expressed.

) Adjusted p-values (Bonferroni correction for family-wise error rate).

Table 6

Gene Ontology enrichment results on the list of genes associated to liver specifically un-methylated CpG.

GO Term1 DescriptionP-valueFDR q-valueEnrichmentB2 b3
Liver
GO:0042632cholesterol homeostasis1.79E-134.42E-115.314825
GO:0008203cholesterol metabolic process5.52E-131.14E-103.948834
GO:1901606alpha-amino acid catabolic process2.17E-103.05E-084.116225
GO:0007169transmembrane receptor protein tyrosine kinase signaling pathway3.30E-104.56E-082.3924357
GO:0007584response to nutrient3.64E-083.56E-064.274318
GO:0070328triglyceride homeostasis4.60E-084.35E-066.112012
GO:0006633fatty acid biosynthetic process4.94E-084.60E-062.9110530
GO:0046889positive regulation of lipid biosynthetic process1.08E-079.55E-063.715520
GO:0090207regulation of triglyceride metabolic process1.26E-071.10E-055.32513
GO:0033344cholesterol efflux1.95E-071.61E-055.562212
Brain
GO:0007399nervous system development1.30E-082.26E-062.6930740
GO:0007411axon guidance1.71E-082.76E-063.6614125
GO:0007409axonogenesis1.98E-083.11E-063.7513224
GO:0007169transmembrane receptor protein tyrosine kinase signaling pathway8.11E-081.07E-052.8124233
GO:0070588calcium ion transmembrane transport7.17E-077.43E-053.879618
GO:0071902positive regulation of protein serine/threonine kinase activity1.20E-061.20E-042.8119827
GO:0007163establishment or maintenance of cell polarity2.00E-061.89E-043.988316
GO:0006813potassium ion transport2.18E-062.04E-043.2113521
GO:0007626locomotory behavior2.94E-062.56E-042.9716023
GO:0050772positive regulation of axonogenesis3.47E-062.95E-044.955012
Testes
GO:0071805potassium ion transmembrane transport1.63E-085.94E-062.828632
GO:0006816calcium ion transport3.98E-071.05E-042.0817548
GO:0007156homophilic cell adhesion5.06E-071.30E-042.787126
GO:0007411axon guidance5.15E-071.30E-042.2114141
GO:0007126meiosis1.89E-063.95E-042.478929
GO:0070192chromosome organization involved in meiosis2.27E-064.67E-043.673115
GO:2000310regulation of N-methyl-D-aspartate selective glutamate receptor activity3.17E-066.09E-046.07108
GO:0060079regulation of excitatory postsynaptic membrane potential4.53E-067.89E-043.224017
GO:0032673regulation of interleukin-4 production8.87E-061.34E-033.962312
GO:0007283spermatogenesis7.38E-057.66E-032.0526758
Kidney
GO:0030029actin filament-based process9.49E-058.38E-032.3523224
GO:0006820anion transport6.78E-056.33E-032.1631630
GO:0022610biological adhesion3.49E-079.86E-052.0363957
GO:0007167enzyme linked receptor protein signaling pathway6.11E-055.81E-032.0637634
GO:0046847filopodium assembly4.26E-071.18E-048.53249
GO:0016570histone modification5.33E-055.33E-032.3923825
GO:0042490mechanoreceptor differentiation2.55E-052.98E-036.27298
GO:0030512negative regulation of transforming growth factor beta receptor signaling pathway1.05E-051.47E-035.414210
GO:0009887organ morphogenesis5.03E-067.89E-042.239338
GO:0035335peptidyl-tyrosine dephosphorylation5.82E-055.68E-034.99419
GO:0043065positive regulation of apoptotic process3.30E-053.80E-03244339

) GO terms at the terminal nodes of the DAG, p-values<0.0001 and enrichments bigger than 2 are displayed in the table.

) “B” is the total number of genes in the background dataset annotated with GO term in row.

) “b” is the number of genes in the ‘experimental set’ that are associated with the GO term in row.

Figure 5

DAG graphical representation of the functional profiles extracted from genes associated to liver specific un-methylated sites.

2500 genes directly linked to liver specific TS-DMS were input in the Gorilla tool. Enriched GO terms are depicted using a direct acyclic graph with color code reflecting the statistical support of their enrichment. Most significant branches whit not redundant nodes are shown separately for enhanced detail.

Table 7

Significantly enriched GO terms at the top of a ranked gene list1.

GO TermDescriptionP-valueFDR q-valueEnrichmentB2 b3
GO:0071396cellular response to lipid5.33E-073.36E-0318.49377
GO:0042592homeostatic process3.14E-069.88E-036.5216511
GO:0048878chemical homeostasis4.75E-069.97E-037.2913410
GO:0032844regulation of homeostatic process6.01E-069.46E-0345.26384
GO:0033993response to lipid6.62E-068.34E-0310.29768
GO:0034383low-density lipoprotein particle clearance8.32E-068.73E-0327.7454
GO:0055088lipid homeostasis1.40E-051.26E-0215.85376
GO:0009755hormone-mediated signaling pathway2.15E-051.70E-024.022313
GO:0034381plasma lipoprotein particle clearance2.45E-051.72E-0217.34105
GO:0030301cholesterol transport2.55E-051.61E-0213166
GO:0015918sterol transport2.55E-051.46E-0213166
GO:0055092sterol homeostasis2.70E-051.42E-0213.95256
GO:0042632cholesterol homeostasis2.70E-051.31E-0213.95256
GO:2000021regulation of ion homeostasis4.76E-052.14E-0271.67183
GO:0043691reverse cholesterol transport4.97E-052.09E-0261.4373
GO:0065008regulation of biological quality7.28E-052.87E-024.0429012

) 4,265 genes were ranked according to their contain in TS-DMS. The Gorilla web-based application was used to identify enriched GO terms at the top of the ranked list of genes.

) “B” is the total number of genes in the background dataset annotated with GO term in row.

) “b” is the number of genes in the ‘experimental set’ that are associated with the GO term in row.

DAG graphical representation of the functional profiles extracted from genes associated to liver specific un-methylated sites.

2500 genes directly linked to liver specific TS-DMS were input in the Gorilla tool. Enriched GO terms are depicted using a direct acyclic graph with color code reflecting the statistical support of their enrichment. Most significant branches whit not redundant nodes are shown separately for enhanced detail. ) Number of CpGs that were identified as uniquely un-methylated. ) Number of TS-DMS located in the region spanned from ±3 kb of the gene edges. ) Genes annotated as tissue-specific expressed. ) Adjusted p-values (Bonferroni correction for family-wise error rate). ) GO terms at the terminal nodes of the DAG, p-values<0.0001 and enrichments bigger than 2 are displayed in the table. ) “B” is the total number of genes in the background dataset annotated with GO term in row. ) “b” is the number of genes in the ‘experimental set’ that are associated with the GO term in row. ) 4,265 genes were ranked according to their contain in TS-DMS. The Gorilla web-based application was used to identify enriched GO terms at the top of the ranked list of genes. ) “B” is the total number of genes in the background dataset annotated with GO term in row. ) “b” is the number of genes in the ‘experimental set’ that are associated with the GO term in row.

TS-DMS constitute a unique epigenetic signature of an adult mouse liver

We performed an unsupervised hierarchical clustering analysis to visualize the TS-DMS structure between and within tissue samples as a hierarchical dendrogram where the branch lengths represent the degree of epigenetic similarity between samples. Samples from the same tissues were grouped together regardless of whether DMS or TS-DMS dataset was used (Figure 6 A and B). These results suggest that the DMS identified among 6,955,111-surveyed CpGs have the potential to function as tissue-specific methylation fingerprints.
Figure 6

Epigenetic distances between tissues and biological replicates are assessed with unsupervised cluster analysis.

The length of the branches in the hierarchical dendrogram measures the person correlation distances between digestion frequencies scored by CpGs identified as spot of tissue differential methylation. Shown are heat maps of the log2-transformed digestion frequencies (red: more than 50 reads per CpG, i.e., un-methylated; black: 0 reads per CpG, i.e., methylated). Rows represent the first 40 CpGs in the final cluster; each column represents one of the 12 tissue samples. A) 138,052 T-DMS. B) 23,270 TS-DMS (only the tissue specific un-methylated sites were clustered). C) MSCC data produced from livers of 16 outbred mice were combined with MSCC data of the 12 tissue samples. D) MSCC data produced from a liver of a C57BL/6J 12.5 d embryo was combined with MSCC data of the 12 tissue samples.

Epigenetic distances between tissues and biological replicates are assessed with unsupervised cluster analysis.

The length of the branches in the hierarchical dendrogram measures the person correlation distances between digestion frequencies scored by CpGs identified as spot of tissue differential methylation. Shown are heat maps of the log2-transformed digestion frequencies (red: more than 50 reads per CpG, i.e., un-methylated; black: 0 reads per CpG, i.e., methylated). Rows represent the first 40 CpGs in the final cluster; each column represents one of the 12 tissue samples. A) 138,052 T-DMS. B) 23,270 TS-DMS (only the tissue specific un-methylated sites were clustered). C) MSCC data produced from livers of 16 outbred mice were combined with MSCC data of the 12 tissue samples. D) MSCC data produced from a liver of a C57BL/6J 12.5 d embryo was combined with MSCC data of the 12 tissue samples. We next tested if the liver specific DMS can be used as a unique epigenetic signature of the adult mouse liver. We combined the DMS found in the four-tissue dataset with similar data obtained from livers of 16 outbred mice and also the liver from a E11.5 C57BL/6J embryo. Despite the differences in the genetic background among these samples, the livers from all 19 mice co-clustered separately from brain, kidney, and testis, Figure 6C. A sample derived from embryonic liver (E11.5) was most similar to the adult liver samples than those from other organs, but still clustered separately from adult-stage liver, Figure 6D.

A set of DMS derived from mice livers can trace nutritional history

Given the critical role of the liver in controlling whole-body metabolism, and the high frequency of TS-DMS in genes that control key aspects of the hepatic lipid homeostasis Figure S3 in Supporting Information S1, we analyzed the impact of prenatal and postnatal nutrient availability on liver methylation patterns. Differentially methylated sites (DMS) were isolated from mice whose mothers had been subjected to four different feeding protocols [23]: CC, prenatal and postnatal control diet; CU, prenatal control diet and postnatal food restriction; UU, prenatal and postnatal food restriction; UC, prenatal food restriction and postnatal control diet (for details see [23]). Comparison of CC, CU and UU liver samples led to the identification of 5,574 DMS (mdFDR 15%), Data Set S5. Practically all these DMS were hypo-methylated in mice that were exposed to a control diet during the whole experiment (CC) but hyper-methylated in those mice exposed to any of the diets including food restriction (UC, CU or UU). Out of these 5,574 DMS induced for food restriction, 533 were previously identified as differentially methylated sites in the comparisons among the four tissues (vide supra). Remarkably genes associated with these 533 DMS were highly enriched (between 4 to 15 times) in annotations related to SMAD protein signal transduction, steroid and fatty acid metabolic process. We examined to what extent these DMS can function as epigenetic signatures of each nutritional exposure. Unsupervised clustering of MSCC data obtained from 4 CC, 5 CU, and 5 UU samples regrouped experiments according to the treatment, Figure 7B. When data from a different treatment was included (UC) the samples clustered outside the previous defined groups, Figure 7B. We inferred that these DMS constitute epialleles that traced the nutritional history of mice during the in utero and/or immediate postnatal stages.
Figure 7

Epigenetic distances between livers of mice exposed to in utero or postnatal food restriction.

The length of the branches in the hierarchical dendrogram measures the person correlation distances between digestion frequencies scored by CpGs identified as spot of epialleles. Shown are heat maps of the log2-transformed digestion frequencies (red: more than 50 reads per CpG, i.e., un-methylated; black: 0 reads per CpG, i.e., methylated). Rows represent the first 40 CpGs in the final clusters; each column represents one of the 16 tissue samples from this study. Only the field with the first 40 TS-DMS in the cluster are depicted A) Design of mouse feeding experiment (Joslin feeding protocol; adapted from [23]) B) MSCC data produced from livers of the CC, CU and UU groups. 16 outbred mice were combined with MSCC data of the 12 tissue samples. D) MSCC data produced from a liver of a C57BL/6J 12.5 d embryo was combined with MSCC data of the 12 tissue samples.

Epigenetic distances between livers of mice exposed to in utero or postnatal food restriction.

The length of the branches in the hierarchical dendrogram measures the person correlation distances between digestion frequencies scored by CpGs identified as spot of epialleles. Shown are heat maps of the log2-transformed digestion frequencies (red: more than 50 reads per CpG, i.e., un-methylated; black: 0 reads per CpG, i.e., methylated). Rows represent the first 40 CpGs in the final clusters; each column represents one of the 16 tissue samples from this study. Only the field with the first 40 TS-DMS in the cluster are depicted A) Design of mouse feeding experiment (Joslin feeding protocol; adapted from [23]) B) MSCC data produced from livers of the CC, CU and UU groups. 16 outbred mice were combined with MSCC data of the 12 tissue samples. D) MSCC data produced from a liver of a C57BL/6J 12.5 d embryo was combined with MSCC data of the 12 tissue samples.

Discussion

Based on our expanded methyl-sensitive cut counting (eMSCC) method [15], we screened for differential methylation in gDNA samples derived from liver, kidney, brain and testis. We detected 138,052 differentially methylated sites, of which 24,803 were uniquely methylated or un-methylated in one but not in the other tissues (Figure 3). Strikingly, 23,270 (93%) of these TS-DMS were hypomethylated, suggesting that un-methylation rather that methylation is the epigenetic state highlighting developmentally active loci. In agreement with previous results obtained by Restriction Landmark Genomic Scanning (RLGS) or Methylated DNA Immunoprecipitation (MeDIP), we found that most TS-DMS represent differences between the testis and somatic tissues [17], [18], [46]–[48]. Interestingly and more often in testis, many TS-DMS co-localize in CpG islands or CpG clusters to form testis-specific un-methylated regions within gene bodies or gene deserts. The transcriptional landscape around these loci seems inconsistent with the notion that they may represent alternative promoters (Figure 1) [11], [12], [14]. A particularly interesting result is that tissue-specific differential methylation at intragenic CpG clusters correlates with tissue-specific differential expression of zinc finger protein genes (Zfp), We found 39 Zfp genes with a differentially methyled region in the 3′ terminal exon that is invariably methylated in somatic tissues and hypomethylated in testis, (Table 1). Remarkably, most of them follow a similar inverse relationship between the level of methylation at the 3′ terminal exon and gene transcriptional activity, Figure 2. It is intriguing why their TSSs remain largely un-methylated in all tissues but their intragenic CGIs seem to be preferential sites for de novo methylation in somatic tissues during development. The lack of differential methylation at the promoter CGIs suggests a limited role for these loci in the epigenetic transcriptional regulation, however aberrant methylation marks in the promoters of a Zfp correlates with cervical cancer [49]. Probably, the 3′ exons start the developmental program as un-methylated regions, acquiring methyl-marks as the embryo develops into adult. For instance, spatio-temporal and cell-specific methylation of these exonic-CpGs could avoid the binding of a gene repressor. Most of Zfp genes described in Table 1 belong to the C2H2 Zf family, which are enriched in KAP 1 repressor target sites. Interestingly ChiP-chip experiment detected most of the KAP1 binding sites towards the 3′ transcribed regions of these Zf-genes, [50]. Alternatively, the exonic-DNA methylation could turn the chromatin in to a Pol II elongation-permissive state. For example it has been recently shown that H3K79, H4K20, H2BK5 and H3K36m3 mark nucleosomes wrapping exonic-DNA. The tri-methylation of H3K36 is the most prominent mark and correlates positively with increased gene expression levels and stimulated transcriptional elongation [51]–[53]. In agreement with the positions of TS-DMS we found in the Zfp genes, H3K36me3 is primarily found in downstream exons, [53]. The co-occurrence of H3K36me3 and DNA hypermethylation has been recently shown in the bodies of zinc finger genes along chromosome 9, [54]. When all genes covered by this study were considered, these authors found that the co-occurrence between H3K36me3 and hypermethylation was more frequently observed at last exons or at highly expressed genes. Overall the data suggest that differential methylation of Zfp genes at 3′ exons constitutes a developmental epigenetic signature. Some of these same genes have been already identified as playing key roles in cell fate specification, [55]. It will be interesting to follow the dynamics of gene expression and methylation at these loci during embryonic development. We observed intriguing differences in the methylation pattern of DNA derived from testis. Firstly, the largest numbers of TS-DMS were observed in testes. However, testis showed the weakest association of TS-DMS with active enhancers or tissue-specific gene expression (Tables 2, 4, and 5). Secondly, testis TS-DMS were localized at loci encoding genes involved in the embryonic development of multiple organs and anatomical structures, Figure S4 in Supporting Information S1 and Table 6. Whilst we recognize that testis includes both gametes and somatic cells, our data suggest that part of the epigenetic information involved in development would be pre-coded in the sperm genome. The presence of differentially methylated CGIs is rarer in somatic tissues, by contrast most TS-DMS are located far from promoters, in loci with scarce CpGs and in many overlapping regions reported as tissue-specific enhancers. Indeed we found that local DNA hypomethylation highlighted active enhancers (H3K4me1 and H3K27me3) in a tissue-specific manner. Indeed, it has recently been shown that transcription factors (TF) can actively induce open chromatin spots and local de-methylation of low CpG density sequences, which ultimately constitute a footprint for DNA regulatory elements [16], [56]. Despite the scarcity of methylable sites in these elements, the methylation of critical CpGs was sufficient to prevent the TF-DNA interaction, [56]. Not only did we find active enhancers highly enriched in TS-DMS, but we also found that genes spotted with TS-DMS are expressed in a tissue-specific manner, (Tables 2 and 5) with their functional profiles matching specific functions of the corresponding tissues. Considering all our data we suggest that most of the sequences having TS-DMS must be cis-acting elements for tissue-specific trans-acting factors. To assess the extent to which the TS-DMS could demarcate tissue-specific physiological aspects, we performed annotation enrichment analysis in the set of genes associated with these epigenetic marks. We found that the position of the TS-DMS in relation to genes defines two types of functional profiles. When TS-DMS are found in intergenic regions and distant from gene boundaries, there is a notable enrichment in processes related to embryonic development Figure S4 in Supporting Information S1. In the instance where TS-DMS are within or very near to the borders of the gene, most of the biological processes describe physiological and organ-specific functions, (Table 6). A remarkable observation is that genes involved in specific liver functions such as, hepatic cholesterol homeostasis, response to nutrients, response to peptide hormones, plasma clearance of lipoproteins and metabolic control of xenobiotic, are distinguished in the liver genome by the presence of TS-DMS, (Figure 5 and Figure S3 in Supporting Information S1. Indeed, by ranking genes according to the number of TS-DMS and querying for enriched GO terms at the top of a ranked list, we found a selective increase in functional specificity towards lipid homeostasis, especially hepatic cholesterol management, Table 7. We speculate that the high incidence of TS-DMS demarcating key genetic aspects of hepatic physiology could account for many of the findings that report associations between epigenetic alterations and metabolic disorders. Multiple studies have demonstrated that the nutritional state during sensitive developmental periods, including the intrauterine life and lactation periods, can “program” developmental trajectories to improve the ability of the fetus to survive in similar postnatal environments. Unfortunately, these adaptations may also modulate risk for metabolic disease in adult life. An attractive hypothesis is that these effects of nutritional exposures are mediated by altered epigenetic regulation of transcription. Whether methylation is the primary triggering event, or mediated by histone modification or effects of noncoding RNAs, remains unclear. However, our data suggest that differential methylation may mark key developmental loci, which are susceptible to nutritional or other environmental insults. We identified thousands of sites whose differentially methylated status generated epialleles in the offspring of mothers subjected to different feeding protocols (Figure 7, Data Set S5). We used correlation distances to look for comparable variations in the magnitude of digestion frequencies in the sites detected as differentially methylated. Unsupervised cluster analysis shows that this set of epi-alleles characterize the nutritional history of each mouse. In summary, in the present work we extended our previous observation that outside of CGIs and promoters there are a large number of regions with hypomethylated CpG, at low density. These regions not only outnumber the CGI but also are the loci with the greater enrichment in regulatory sequences, [15], [16]. Here we show that CpGs that are part of TS-DMS are also typically found far from promoters and outside of CGIs. TS-DMS are enriched in sequences recognized as tissue-specific enhancers and associated with tissue-specific expressed genes. Genes associated with these sites fulfill roles in the development of the corresponding tissue but also in their specific physiological functions. The identification of loci with aberrant DNA methylation marks has been largely extended during the last 5 years. Our results agree with the idea that epi-mutation rather than mutations better explain the developmental origin of diseases [57]. Most of the work on disease-related DNA methylation has focused on hypermethylation of CGI in cancer. Thus, the relevance of aberrant methylation marks outside CGI in human disease has not been well studied. Here we show that during development methylation of CpGs outside promoter or CGI seemed to be far more dynamic than commonly appreciated. We propose that the catalog of TS-DMS presented in this work, particularly those linked to the hepatic function could constitute a comprehensive epi-panel for detection and diagnosis of metabolic diseases. This file contains Table S1 and Figure S1–Figure S4. Table S1, which is a summary of the results of the short-reads-alignments to the NCBI37/mm9 assembly mouse reference genome; Figure S1, showing the influence of random sampling and systematic bias are canceling out during the pairwise site-by-site comparisons; Figure S2, Validating of the assumption of normality for the distribution of digestion frequencies (methyl sensitive cut counts) in CpGs from both lambda and mouse replicates; Figure S3, detailing the biological regulation branch showed in Figure 5 and Figure S4, showing the functional enrichment analysis results for genes proximal to TS-DMS located at intergenic regions. A description of the column headers in Data Sets S1 to S5 is provided in tables at the end of the document. (PDF) Click here for additional data file. Methyl Sensitive Cut Counting Results for CpG sites surveyed in the mouse genome (ZIP) Click here for additional data file. Methyl Sensitive Cut Counting Results for CpG sites surveyed in the lambda phage genome (XLSX) Click here for additional data file. 138,052 differentially methylated sites (DMS), their genomic coordinates and their closest (TXT) Click here for additional data file. List of genes used for Gene Ontology enrichment analysis (XLSX) Click here for additional data file. List of 5,574 diet-reprogrammed differentially methylated CpGs (XLSX) Click here for additional data file.
  57 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Java Treeview--extensible visualization of microarray data.

Authors:  Alok J Saldanha
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

3.  Open source clustering software.

Authors:  M J L de Hoon; S Imoto; J Nolan; S Miyano
Journal:  Bioinformatics       Date:  2004-02-10       Impact factor: 6.937

Review 4.  CpG-rich islands and the function of DNA methylation.

Authors:  A P Bird
Journal:  Nature       Date:  1986 May 15-21       Impact factor: 49.962

5.  CpG islands in vertebrate genomes.

Authors:  M Gardiner-Garden; M Frommer
Journal:  J Mol Biol       Date:  1987-07-20       Impact factor: 5.469

6.  Reductions in caloric intake and early postnatal growth prevent glucose intolerance and obesity associated with low birthweight.

Authors:  J C Jimenez-Chillaron; M Hernandez-Valencia; A Lightner; R R Faucette; C Reamer; R Przybyla; S Ruest; K Barry; J P Otis; M E Patti
Journal:  Diabetologia       Date:  2006-06-08       Impact factor: 10.122

7.  Tissue specific differentially methylated regions (TDMR): Changes in DNA methylation during development.

Authors:  Fei Song; Saleh Mahmood; Srimoyee Ghosh; Ping Liang; Domminic J Smiraglia; Hiroki Nagase; William A Held
Journal:  Genomics       Date:  2008-11-13       Impact factor: 5.736

8.  Conserved role of intragenic DNA methylation in regulating alternative promoters.

Authors:  Alika K Maunakea; Raman P Nagarajan; Mikhail Bilenky; Tracy J Ballinger; Cletus D'Souza; Shaun D Fouse; Brett E Johnson; Chibo Hong; Cydney Nielsen; Yongjun Zhao; Gustavo Turecki; Allen Delaney; Richard Varhol; Nina Thiessen; Ksenya Shchors; Vivi M Heine; David H Rowitch; Xiaoyun Xing; Chris Fiore; Maximiliaan Schillebeeckx; Steven J M Jones; David Haussler; Marco A Marra; Martin Hirst; Ting Wang; Joseph F Costello
Journal:  Nature       Date:  2010-07-08       Impact factor: 49.962

9.  The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores.

Authors:  Rafael A Irizarry; Christine Ladd-Acosta; Andrew P Feinberg; Bo Wen; Zhijin Wu; Carolina Montano; Patrick Onyango; Hengmi Cui; Kevin Gabo; Michael Rongione; Maree Webster; Hong Ji; James Potash; Sarven Sabunciyan
Journal:  Nat Genet       Date:  2009-01-18       Impact factor: 38.330

10.  Intergenerational transmission of glucose intolerance and obesity by in utero undernutrition in mice.

Authors:  Josep C Jimenez-Chillaron; Elvira Isganaitis; Marika Charalambous; Stephane Gesta; Thais Pentinat-Pelegrin; Ryan R Faucette; Jessica P Otis; Alice Chow; Ruben Diaz; Anne Ferguson-Smith; Mary-Elizabeth Patti
Journal:  Diabetes       Date:  2008-11-18       Impact factor: 9.461

View more
  12 in total

Review 1.  Promoting health and longevity through diet: from model organisms to humans.

Authors:  Luigi Fontana; Linda Partridge
Journal:  Cell       Date:  2015-03-26       Impact factor: 41.582

Review 2.  Developmental origins of health and disease: a paradigm for understanding disease cause and prevention.

Authors:  Jerrold J Heindel; Laura N Vandenberg
Journal:  Curr Opin Pediatr       Date:  2015-04       Impact factor: 2.856

Review 3.  Endocrine disruptors and obesity.

Authors:  Jerrold J Heindel; Retha Newbold; Thaddeus T Schug
Journal:  Nat Rev Endocrinol       Date:  2015-09-22       Impact factor: 43.330

4.  Methylated CpG site count of dapper homolog 1 (DACT1) promoter prediction the poor survival of gastric cancer.

Authors:  Jingyu Deng; Han Liang; Rupeng Zhang; Guoguang Ying; Xingmin Xie; Jun Yu; Daiming Fan; Xishan Hao
Journal:  Am J Cancer Res       Date:  2014-09-06       Impact factor: 6.166

5.  Sex-specific epigenetic profile of inner cell mass of mice conceived in vivo or by IVF.

Authors:  Elena Ruggeri; Saúl Lira-Albarrán; Edward J Grow; Xiaowei Liu; Royce Harner; Emin Maltepe; Miguel Ramalho-Santos; Annemarie Donjacour; Paolo Rinaudo
Journal:  Mol Hum Reprod       Date:  2020-11-01       Impact factor: 4.025

Review 6.  The Role of Epigenetic Change in Autism Spectrum Disorders.

Authors:  Yuk Jing Loke; Anthony John Hannan; Jeffrey Mark Craig
Journal:  Front Neurol       Date:  2015-05-26       Impact factor: 4.003

7.  Poor survival is associated with the methylated degree of zinc-finger protein 545 (ZNF545) DNA promoter in gastric cancer.

Authors:  Jingyu Deng; Han Liang; Guoguang Ying; Qiuping Dong; Rupeng Zhang; Jun Yu; Daiming Fan; Xishan Hao
Journal:  Oncotarget       Date:  2015-02-28

8.  Applicability of the methylated CpG sites of paired box 5 (PAX5) promoter for prediction the prognosis of gastric cancer.

Authors:  Jingyu Deng; Han Liang; Rupeng Zhang; Qiuping Dong; Yachao Hou; Jun Yu; Daiming Fan; Xishan Hao
Journal:  Oncotarget       Date:  2014-09-15

9.  DNA methylation profile of liver of mice conceived by in vitro fertilization.

Authors:  Saúl Lira-Albarrán; Xiaowei Liu; Seok Hee Lee; Paolo Rinaudo
Journal:  J Dev Orig Health Dis       Date:  2021-06-14       Impact factor: 3.034

10.  The survival decrease in gastric cancer is associated with the methylation of B-cell CLL/lymphoma 6 member B promoter.

Authors:  Jingyu Deng; Han Liang; Qiuping Dong; Yachao Hou; Xingming Xie; Jun Yu; Daiming Fan; Xishan Hao
Journal:  Open Biol       Date:  2014-07       Impact factor: 6.411

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.