Mammalian somatic cells can be directly reprogrammed into induced pluripotent stem cells (iPSCs) by introducing defined sets of transcription factors. Somatic cell reprogramming involves epigenomic reconfiguration, conferring iPSCs with characteristics similar to embryonic stem cells (ESCs). Human ESCs (hESCs) contain 5-hydroxymethylcytosine (5hmC), which is generated through the oxidation of 5-methylcytosine by the TET enzyme family. Here we show that 5hmC levels increase significantly during reprogramming to human iPSCs mainly owing to TET1 activation, and this hydroxymethylation change is critical for optimal epigenetic reprogramming, but does not compromise primed pluripotency. Compared with hESCs, we find that iPSCs tend to form large-scale (100 kb-1.3 Mb) aberrant reprogramming hotspots in subtelomeric regions, most of which exhibit incomplete hydroxymethylation on CG sites. Strikingly, these 5hmC aberrant hotspots largely coincide (~80%) with aberrant iPSC-ESC non-CG methylation regions. Our results suggest that TET1-mediated 5hmC modification could contribute to the epigenetic variation of iPSCs and iPSC-hESC differences.
Pluripotency is defined as a stem cell state with the potential to differentiate into any of the three germ layers. Somatic cells can be reprogrammed to a pluripotent state by defined factors such as OCT4, SOX2, KLF4, c-MYC, NANOG and LIN28[1-3]. These iPSCs are extremely similar to ESCs. During the reprogramming process, the global epigenetic landscape in somatic cells has to be reset to reach a pluripotent state via DNA methylation/demethylation and chromatin remodelling processes.
Besides 5-methylcytosine (5mC), which is known to display dynamic changes during early embryonic and germ cell development as well as the reprogramming process, the mammalian genome also contains 5hmC, which is generated by oxidation of 5mC by the TET family of enzymes[4, 5]. The Tet proteins function in ESC regulation, myelopoiesis and zygote development[6-10]. 5hmC is widespread in many tissues and cell types at different levels[11, 12], and is particularly abundant in the central nervous system and ESCs. Several reports have explored the genome-wide distribution of 5hmC modification in mES cells and hES cells, and suggest that it is enriched in gene bodies and enhancers[13, 14].
Reprogramming toward pluripotency involves a dynamic epigenetic modification process. 5hmC has been implicated in the DNA demethylation process[15], pointing to a potential role for 5hmC modification during reprogramming toward pluripotency. Thus, understanding the dynamic 5hmC changes during reprogramming will provide additional insight into somatic cell reprogramming mechanisms.
Multiple studies suggest there are subtle yet substantial genetic and epigenetic differences between iPS cells and hES cells[16, 17]. The current consensus is that iPS cells and ES cells are two overlapping classes of heterogeneous cells, with iPS cells being more variable than hES cells[18]. Although iPS cells and hES cells are functionally equivalent in general, the subtle genetic and epigenetic differences could lead to functional consequences among individual lines. Previous studies of the base-resolution methylomes of iPSCs and ESCs identified differentially methylated regions (DMRs) between iPSCs and ESCs, consisting of CG-DMRs and non-CG-DMRs[16, 17]. However, the traditional bisulfite sequencing technique used in those studies cannot distinguish 5mC from 5hmC[19], so whether these DMRs are caused by hydroxymethylation differences remains unknown.
Here we show that 5hmC levels increase significantly during reprogramming to human iPSCs mainly due to TET1 activation, and this hydroxymethylation change is critical for optimal epigenetic reprogramming. We found that extensive genome-wide 5hmC modification occurs during reprogramming. Importantly, we identified specific aberrant reprogramming hotspots in iPS cells, which cluster on a large scale (100 kb-1.3 Mb) at subtelomeric regions bearing incomplete CG hydroxymethylation. These hotspots largely overlap with aberrant non-CG methylation hotspots, suggesting that hydroxymethylation contributes to the epigenetic difference between iPS cells and hES cells.
RESULTS
TET1-mediated hydroxymethylation plays a critical role during reprogramming to pluripotency in human cells
DNA methylation is a major barrier to iPS cell reprogramming. Several lines of evidence suggest that 5hmC is involved in the process of DNA demethylation[20, 21]. We found a significant increase of 5hmC level in human iPS cells compared to their original fibroblasts, with the amount in iPSCs being similar to hES cells (Fig. 1a).
Figure 1
TET1 is associated with increased hydroxymethylation during human iPSC reprogramming
(a) Measurement of 5hmC levels in genomic DNAs from fibroblasts, hiPSCs and hESCs by dot blot using anti-5hmC antibody. Mouse cerebellum genomic DNA was used as a control. 225 ng, 450 ng and 1000 ng DNA were used for each sample. (b) Quantitative RT-PCR to detect mRNA levels of TET1, TET2, TET3 and NANOG in fibroblasts (CRL2097) and hiPSCs (iPSC-B21, iPSC-B22). Error bars represent the standard error of the mean (S.E.M.) collected from three independent experiments. (c) Boxplot of transcript copy numbers of TET1, TET2, TET3, and NANOG in IMR90 (fibroblasts) and H1 (hESCs) represented by RPKM in RNA-seq. (d) Knocking down TET1 by siRNA significantly decreases 5hmC levels in hiPSCs. Left panel represents siTET1 knockdown efficiency by quantitative RT-PCR (* t-test, p<0.05). Right panel depicts the effect on total 5hmC levels 48 hours post siTET1 transfection. Error bars represent S.E.M. collected from three independent experiments. (e) Alkaline phosphatase (AP) staining of reprogrammed cells treated either with shTET1 lentivirus or an equal titer of shControl lentivirus after O,S,K,M retroviral transduction of 100,000 CRL2097 cells on day 20. Cells used for staining were grown in 10 cm dishes. The image on the right shows a representative AP-positive colony and the TET1 transcript level in shTET1- or shControl-treated cells 10 days post transduction in one representative experiment of three independent experiments. Scale bars: 300 μm. (f) Summary of quantitative analysis of AP-positive colonies in three different experiments (* t-test, p<0.05). Controls were normalized to 100%. Error bars represent the standard deviation (SD). (g) Real time PCR analysis of TET1 and pluripotency marker NANOG. shTET1-treated reprogrammed colonies maintained normal levels of NANOG, but show decreased TET1 expression (* t-test, p<0.05). Colonies were picked and maintained in puromycin medium (0.5 μg/ml) on puromycin-resistant MEFs. (h) Real time PCR analysis of normalized gene expression levels of TET1 and selected pluripotency-related factors in stable shTET1 or shControl iPS-B22 cells under puromycin selection (0.5 μg/ml) (*** t-test, p<0.05). Error bars represent the S.E.M. of three independent experiments. The raw values of the related statistical tests in this figure are listed in Supplementary Table S1.
TET family proteins (TET1, TET2 and TET3) can convert 5mC to 5hmC[6]. We found a statistically significant increase in TET1 and TET3 expression in iPSCs relative to fibroblasts, with the more dramatic increase in TET1, and a slight decrease in TET2 expression (Fig. 1b). RNA-seq reveals that TET1 is at a comparable level to NANOG in pluripotent cells, but the expression of TET2 and TET3 is significantly lower (Fig. 1c). Depletion of TET1, but not TET2 or TET3, by siRNA significantly decreased total 5hmC levels in human iPS cells (Fig. 1d and Supplementary Fig. S1a,b). Therefore, we conclude that TET1 is the main TET protein regulating hydroxymethylation during human iPS cell reprogramming.
Because cellular reprogramming is a process that reconfigures the epigenetic state, we next asked whether TET1-mediated hydroxymethylation changes are critical in human iPSC reprogramming. Introducing shTET1 lentivirus together with the “Yamanaka factors” decreased the number of alkaline phosphatase-positive colonies compared with transduction of an equal titer of shControl lentivirus (Fig. 1e,f and Supplementary Fig. S1c,d). Colonies treated with shTET1 during reprogramming could be stably maintained, showing decreased TET1 levels but pluripotency gene expression levels similar to those of iPSCs (Fig. 1g). Furthermore, iPS cells depleted of TET1 maintained a normal undifferentiated stem cell morphology, were positive for alkaline phosphatase, expressed pluripotency-related factors at the same level and stained positive for pluripotency markers such as NANOG, SOX2 and TRA-1-81 (Fig. 1h and Supplementary Fig. S1e-g). Therefore, TET1-mediated hydroxymethylation is required for optimal induction of iPSCs, but its depletion does not compromise the essential pluripotency of human stem cells.
5hmC epigenomic landscape during reprogramming
We employed 5hmC Capture-Seq to assess genome-wide 5hmC distributions during reprogramming[11]. The cell lines and sequencing statistics are summarized in Supplementary Tables S2 and S3. Pearson correlation and cluster analysis of the global 5hmC modification pattern suggest a significant difference between iPS cells and fibroblasts (Fig. 2a and Supplementary Table S4).
Figure 2
Reprogramming confers a 5hmC epigenome in a pattern with a bias towards telomere proximal regions in autosomes
(a) Pearson correlation analysis and cluster among fibroblasts and fibroblast derived iPSCs. The values close to 1 indicate greater similarity. (b) Summary of the numbers of 5hmC differentially modified between fibroblasts and iPSCs, indicated by hyperDhMR (iPSCs>Fibroblast) and hypoDhMR (iPSCs
Based on a negative binomial model for testing differential expression of sequencing data[22], we found 267,664 regions in the genome showing differential 5-hydroxymethylation between iPS cells and fibroblasts (false discovery rate (FDR): 0.01), which we denote as differential 5-hydroxymethylated regions (DhMRs). Among them, 231,866 are hyperDhMRs (5hmC level is higher in iPS cells), and 35,798 are hypoDhMRs (5hmC level is lower in iPS cells) (Fig. 2b). The gain of 5hmC at hyperDhMRs is larger than the loss of 5hmC observed at hypoDhMRs (Fig. 2c). The hyperDhMRs are distributed across all autosomes, but are largely missing from the sex chromosomes (Fig. 2d). In particular, the top 20,000 hyperDhMRs (ranked by adjusted p-values) have a higher probability (p<0.0001) of being located in telomere-proximal regions (Fig. 2e), as shown for Chromosome 1 and Chromosome X (Fig. 2f).
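The split of DhMRs into hyper and hypo classes at a fixed FDR can be illustrated with a minimal sketch. It assumes that per-region p-values and log2 fold changes (iPSC over fibroblast) are already available from the negative binomial test, and it assumes Benjamini-Hochberg adjustment, which the text does not state explicitly. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify_dhmrs(log2_fold_changes: Sequence[float],
                   pvalues: Sequence[float],
                   fdr: float = 0.01) -> Tuple[int, int]:
    """Count (hyperDhMRs, hypoDhMRs) among regions passing the FDR cutoff.

    A positive log2 fold change means more 5hmC in iPSCs than in fibroblasts.
    """
    adjusted = benjamini_hochberg(pvalues)
    hyper = sum(1 for lfc, q in zip(log2_fold_changes, adjusted)
                if q < fdr and lfc > 0)
    hypo = sum(1 for lfc, q in zip(log2_fold_changes, adjusted)
               if q < fdr and lfc < 0)
    return hyper, hypo
```

As a consistency check on the reported numbers, the two classes sum to the total: 231,866 + 35,798 = 267,664.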
5hmC is bi-directionally correlated with DNA methylation changes and associated with pluripotency related gene networks
The analysis described above suggests a global hydroxymethylation change during reprogramming. 5hmC has been suggested to be linked with gene expression in ES cells and neurons[13, 14, 23-26]. To assess the correlation between 5hmC modifications and gene expression changes during reprogramming, we stratified genes into 9 categories based on gene expression changes between iPS cells and fibroblasts (category 1: high expression in iPS cells, low expression in fibroblasts; category 2: medium expression in iPS cells, low expression in fibroblasts, etc). We then quantified the amount of 5hmC around the transcription start site (TSS). As a result, those 9 categories can be clustered into 3 distinct patterns (Fig. 3a). Of note, most genes expressed during reprogramming show a bimodal distribution with a depletion of 5hmC at the TSS, whereas genes that remain silenced after reprogramming show a peak at the TSS. Among the 3 clusters, cluster 1 has the lowest 5hmC levels at the TSS; cluster 3 has the highest levels of 5hmC at the TSS, but the lowest 5hmC levels in gene bodies (Fig. 3b).
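The nine-category stratification can be read as a 3 x 3 grid of expression levels (high, medium, low) in iPS cells and in fibroblasts. The sketch below is only illustrative: the RPKM cutoffs are hypothetical placeholders, and because the text numbers only categories 1 and 2, the function returns the pair of levels rather than a category number.

```python
from itertools import product
from typing import Tuple

LEVELS = ("high", "medium", "low")


def expression_level(rpkm: float, low_cut: float = 1.0,
                     high_cut: float = 10.0) -> str:
    """Bin an expression value; the cutoffs are hypothetical, not from the paper."""
    if rpkm >= high_cut:
        return "high"
    if rpkm >= low_cut:
        return "medium"
    return "low"


def expression_category(ipsc_rpkm: float, fibroblast_rpkm: float) -> Tuple[str, str]:
    """Return (level in iPS cells, level in fibroblasts) for one gene."""
    return expression_level(ipsc_rpkm), expression_level(fibroblast_rpkm)


# All nine possible (iPSC level, fibroblast level) combinations.
ALL_CATEGORIES = list(product(LEVELS, LEVELS))
```

Under these assumed cutoffs, a gene with RPKM 50 in iPS cells and 0.2 in fibroblasts falls in the ("high", "low") bin, which corresponds to category 1 in the text.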
Figure 3
5hmC is associated with gene activity and pluripotency regulatory networks in stem cells
(a) 3 distinct clusters of 5hmC-density pattern at TSS regions (+/− 3kb) in iPSCs and fibroblasts among 9 categories. The 9 categories were classified based on the gene expression changes between iPS cells and fibroblasts: Category 1: high expression in iPS cells, low expression in fibroblast; Category 2: medium expression in iPS cells, low expression in fibroblast, etc. (b) Box plots of hydroxymethylation levels in TSS regions and Gene bodies among the three clusters. *** indicates significantly more 5hmC levels compared with all others (P < 0.001, Wilcoxon rank test). Similarly, * indicates lowest 5hmC levels, ** indicated intermediate 5hmC level. (c) 5hmC enrichment density heatmap. Genes were ordered by expression level from high to low as determined by H1 RPKM[27]. The TSS and direction of transcription of genes are indicated by the genomic region from –3kb to +3kb and an arrow. The TES is indicated by the genomic region from –3kb to +3kb and vertical lines. The left part of the panel shows genes in fibroblasts, the right part shows the genes in iPSCs. (d) The correlation between PMD (methylation level is higher in stem cells) and DhMRs, and the correlation between hypoDMRs (methylation level is lower in stem cells) and DhMRs. (e) 5hmC density at the NANOG locus in input, iPSCs, and fibroblast cell lines. The position of the loci within the chromosome and the scale are shown above the gene tracks. Black lines indicate the DhMRs. (f) The overlap between NANOG, OCT4, KLF4, SOX2 binding sites in ES cells and 5hmC significant change regions, shown are observed-to-expected ratios. Lower panel shows the overlapping percentage of each binding sites. (g) Gene ontology analysis for genes overlapped with most significant DhMRs. (h) Plot of hyperDhMR and hypoDhMR densities in the context of C+G percent, CG percent, CH percent and CHG percent.
We then examined the correlation between the absolute amount of transcripts and 5hmC enrichment. We noticed that hyperDhMRs tend to form a bimodal distribution associated with gene activity in iPS cells, with the lowest level, at TSS regions, similar to the level in fibroblasts (Fig. 3c and Supplementary Fig. S2). TES regions also show a bimodal distribution, with a more dramatic depletion in a narrower region centred on the TES (Supplementary Fig. S2). Compared with hypoDhMRs, hyperDhMRs are more enriched in TSS, exons and TES (Supplementary Fig. S3a). We observed a significant negative correlation between the 5hmC level of regions surrounding the TSS (±200 bp) and gene expression levels in iPS cells (Supplementary Fig. S3b).
We also observe a bidirectional correlation between 5hmC level and DNA methylation during the reprogramming process. 80% of the partially methylated domains (PMDs), which display lower levels of CG methylation in somatic cells than in stem cells[27], have increased 5hmC levels, with the rest showing no change in 5hmC level (Fig. 3d). Interestingly, we also found that around 60% of stem cell hypoDMRs (lower CG methylation in stem cells) show increased 5hmC modification (Fig. 3d). Collectively, our results suggest that increased hydroxymethylation occurs not only at loci with increased methylation but also at loci with decreased methylation during reprogramming.
Based on the bimodal distribution of 5hmC at TSS and TES, we then determined whether this distribution is associated with core pluripotency regulatory networks. We found that pluripotency master regulators, such as OCT3/4 and NANOG, bear this typical modification in iPSCs but not in fibroblasts (Fig. 3e). We further tested the relation between 5hmC and the binding sites of key pluripotency factors[27]. We found a more than 8-fold higher-than-expected overlap between 5hmC-enriched regions and OCT4 and KLF4 binding sites, but only a weak association with NANOG and SOX2 binding sites (Fig. 3f). Our results suggest that OCT4 and KLF4 regulatory networks may require 5hmC to regulate pluripotency during reprogramming. Furthermore, gene ontology analysis shows that genes acquiring the most 5hmC are involved in stem cell differentiation and patterning processes (Fig. 3g), suggesting that 5hmC in stem cells is highly correlated with pluripotency.
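An observed-to-expected overlap ratio of the kind reported for the binding sites can be sketched as follows. The text does not say how the expected overlap was derived, so this sketch assumes a simple null model in which binding sites are placed uniformly at random and treated as points, so that the expected number of hits equals the number of sites times the fraction of the genome covered by 5hmC regions. Coordinates are half-open intervals on a single linear genome, and all names are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), half-open


def overlaps_any(site: Interval, regions: Sequence[Interval]) -> bool:
    """True if the site shares at least one base with any region."""
    start, end = site
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def observed_to_expected(sites: Sequence[Interval],
                         regions: Sequence[Interval],
                         genome_length: int) -> float:
    """Ratio of observed overlapping sites to the uniform-placement expectation.

    Assumes the regions do not overlap one another.
    """
    observed = sum(overlaps_any(site, regions) for site in sites)
    covered = sum(end - start for start, end in regions)
    expected = len(sites) * covered / genome_length
    return observed / expected
```

A ratio near 1 would indicate no enrichment under this null model, while values well above 1 would correspond to the kind of enrichment described for OCT4 and KLF4.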
Sequence preferences of 5hmC modification during reprogramming
We compared the CG, CH (CA, CT, CC) and CHG preferences of hyperDhMRs and hypoDhMRs. HyperDhMRs tend to be located in regions enriched in C and G, as well as in CHG- and CH-enriched regions, whereas hypoDhMRs show the same level as the genome background (Fig. 3h). Previous observations suggest that 5hmC modification is related to CpG density[24, 28]. We find that in iPSCs, CpG islands in the low CpG content group tend to have more 5hmC modifications (Supplementary Fig. S3c), which is consistent with the observation that DNA methylation occurs more frequently in CpG islands with low CpG content[29]. Furthermore, 5hmC modifications acquired during reprogramming tend to occur within unique sequences in which methylation is evolutionarily less conserved[30] (Supplementary Fig. S3d-f).
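The sequence contexts discussed here can be made concrete with a small counting sketch. It classifies each forward-strand cytosine as CG, CHG or CHH (H = A, C or T), where CH is the union of CHG and CHH. The function name is hypothetical, the input is assumed to contain only A, C, G and T, and cytosines too close to the end of the sequence to be classified are skipped.

```python
from collections import Counter


def cytosine_context_counts(sequence: str) -> Counter:
    """Count forward-strand cytosines by context: CG, CHG or CHH."""
    seq = sequence.upper()
    counts: Counter = Counter()
    for i, base in enumerate(seq):
        if base != "C":
            continue
        following = seq[i + 1:i + 3]  # up to two bases after the cytosine
        if following[:1] == "G":
            counts["CG"] += 1
        elif len(following) == 2 and following[1] == "G":
            counts["CHG"] += 1
        elif len(following) == 2:
            counts["CHH"] += 1
        # Otherwise the context is truncated by the sequence end and is skipped.
    return counts
```

For the toy sequence "CGCAGCTTC", the first cytosine is in a CG context, the second in CHG, the third in CHH, and the last is skipped because it has no following bases.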
Aberrant 5hmC reprogramming hotspots cluster in telomere-proximal regions
Reprogramming of somatic cells to a pluripotent state requires complete reversion of the somatic epigenome into the pluripotent epigenome, which is an ES-like state. iPSCs retain some type of somatic memory from their previous identity[31-33]. We further determined the genome-wide 5hmC modification differences between iPS and ES cells, aiming to understand whether 5hmC modifications underlie the differences between hES cells and iPS cells. To reduce biases from tissue of origin, we used 9 iPS cell lines derived from different origins: 6 are from fibroblasts as mentioned earlier, 2 are derived from peripheral blood cells, and 1 is derived from stem cells from human exfoliated deciduous teeth (SHED).
In general, global DNA hydroxymethylation patterns are very similar between iPS and ES cells (Fig. 4a). A comprehensive analysis of 372,423 5hmC-enriched regions across 4 hES cell and 9 iPS cell lines led to the identification of 113 iPS-ES-DhMRs that were differentially hydroxymethylated in at least one iPS cell or ES cell line (FDR<0.01), as shown for the SIGLEC6 and SIGLEC12 locus in Fig. 5a. Surprisingly, these regions are not randomly located across the genome; instead, they tend to cluster at telomere-proximal regions, in particular on chromosomes 3, 7, 8, 12, and 20 (Fig. 4b).
Figure 4
Aberrant 5hmC reprogramming hotspots cluster at subtelomeric regions
(a) Pearson correlation analysis and clustering among 9 iPSCs and hESCs. Values close to 1 indicate greater similarity. (b) Chromosome ideograms showing the genome-wide distribution of 113 iPSC-ES DhMRs. Red lines indicate locations of DhMRs. (c) The number of iPS-ES-hyperDhMRs and iPS-ES-hypoDhMRs. The 372,423 5hmC-enriched regions found in either the 9 iPSC lines or the 4 hESC lines were subjected to DhMR calling with the Bioconductor DESeq package. This analysis led to the identification of 113 iPS-ES-DhMRs that were differentially hydroxymethylated in at least one iPS cell or ES cell line (FDR<0.01). 105 of the 113 iPS-ES DhMRs are hypo-hydroxymethylated, with 5hmC levels similar to their respective progenitors. (d) Complete linkage hierarchical clustering of 5hmC density within the iPS-ES-DhMRs. The raw count values are scaled by rows during clustering. (e) Hierarchical cluster analysis using the top 1,000 most variable 5hmC enriched regions across all iPSC and hESC samples. Arrow indicates hESCs.
Figure 5
5hmC DhMRs largely overlap with non-CG-DMRs in a large-scale pattern
(a) 5hmC density at the iPS-ES-DhMR SIGLEC6, SIGLEC12 locus, in fibroblast (CRL2097), blood, iPS, and ES cell lines. The position of the loci within the chromosome and the scale are shown above the gene tracks. Black bars indicate DhMRs. (b) The number of 5hmC DhMRs that overlap with CG-DMRs. CG-DMRs were categorized by methylation state relative to the ES cells. (c) The number of 5hmC large-scale hypoDhMRs that overlap with nonCG-DMRs. NonCG-DMRs were categorized by methylation state relative to the ES cells, as reported previously[17]. An overlap was called if the overlapping length was larger than 1 kb. The first bar summarizes the overlap of large-scale hypoDhMRs with hypo-nonCG-DMRs. The second bar summarizes the overlap of hypo-nonCG-DMRs with large-scale hypoDhMRs. The blue colour represents overlap between nonCG-DMRs and hypoDhMRs. The red colour represents no overlap. (d) 5hmC density at the iPS-ES-DhMR TCERG1L locus in fibroblast (CRL2097), blood, iPS, and ES cell lines. The position of the loci within the chromosome and the scale are shown above the gene tracks. The lower part shows the 5mC levels in the CH context studied by Lister et al; black colour indicates H1 stem cells, green depicts iPSCs.
In contrast to the symmetric pattern of DMRs between iPS and ES cells[17], 105 of the 113 iPS-ES-DhMRs are hypo-hydroxymethylated, with 5hmC levels similar to those of their respective progenitors, blood cells or fibroblasts (Fig. 4c,d). Within these DhMRs, 5hmC patterns are more variable among iPS cells than among hES cells (Fig. 4d). Unsupervised hierarchical clustering using the top 1,000 most variable 5hmC-modified regions among all samples could not distinguish hESCs from hiPSCs, suggesting that the variability among iPSCs is not due to different levels of pluripotency, and that the 5hmC deviation of iPSCs is not a key determinant for distinguishing hESCs from iPSCs (Fig. 4e).
Copy number variation (CNV) has been reported to contribute to the variation among iPSCs[34,35]. Since DhMRs cluster at subtelomeric regions and show depletion of hydroxymethylation, we further examined whether the DhMRs were simply due to genetic variation, such as CNV, instead of real aberrant 5hmC epigenetic modification. To this end we used high-density array comparative genomic hybridization (aCGH) to examine 3 iPSCs and 2 human ESCs. Array CGH yields an average of 70 CNVs on autosomes, none of which overlaps with the iPS-ES-DhMRs we identified (Supplementary Fig. S4). Therefore, iPS-ES-DhMRs reflect aberrant epigenetic modification rather than CNV.
Concordance of large-scale 5hmC hotspots and iPS-ES non-CG DMRs
Our results suggest that iPS-ES-DhMRs tend to cluster at telomere-proximal regions, forming aberrant reprogramming hotspots. To better define these large-scale regions, we developed a statistical method to identify potential large-scale aberrant reprogramming hotspots. An aberrant reprogramming hotspot is defined as a genomic region satisfying the following conditions: (1) the variability of 5hmC levels among iPS cells is large, (2) the average 5hmC difference between iPSCs and ESCs is statistically significant, and (3) the region is longer than 100 kb. 20 large-scale regions were identified. Among them, 19 are hypoDhMRs, all of which have the same epigenetic status as their parent cells, pointing to a “somatic memory” during reprogramming, and 1 is a hyperDhMR (Table 1).
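The three conditions can be combined into a simple filter, sketched below. Only the 100 kb length cutoff comes from the text. The variability measure (coefficient of variation), its cutoff, and the significance level are hypothetical choices made for illustration, and the p-value for the iPSC versus ESC difference is assumed to be computed elsewhere.

```python
from statistics import mean, stdev
from typing import Sequence


def is_aberrant_hotspot(ipsc_levels: Sequence[float],
                        p_value: float,
                        length_bp: int,
                        min_cv: float = 0.3,          # hypothetical cutoff
                        alpha: float = 0.01,          # hypothetical cutoff
                        min_length_bp: int = 100_000  # 100 kb, from the text
                        ) -> bool:
    """Apply the three hotspot conditions to one candidate region."""
    average = mean(ipsc_levels)
    if average == 0:
        return False
    # (1) large variability of 5hmC levels among iPS cell lines
    variable = stdev(ipsc_levels) / average >= min_cv
    # (2) significant average difference between iPSCs and ESCs
    significant = p_value < alpha
    # (3) region longer than 100 kb
    long_enough = length_bp > min_length_bp
    return variable and significant and long_enough
```

A region is kept only when all three conditions hold, so a highly variable but short region, or a long region with uniform 5hmC levels across iPSC lines, would be rejected.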
Table 1
Summary of large-scale hotspots between iPSCs and hESCs
hypoDhMR (19 regions)
Chr | Range (bp) | Length (bp) | NonCG-DMR | Aberrant Lines No. | Somatic Memory | Genes
Chr1 | 4533001-5059001 | 526,001 | Y | 5 | Y | AJAP1
Chr3 | 474001-592001 | 118,001 | N | 9 | Y | Intergenic
Chr3 | 2515001-2907001 | 392,001 | N | 7 | Y | CNTN4
Chr7 | 152805001-153016001 | 211,001 | Y | 8 | Y | Intergenic
Chr7 | 153184001-153312001 | 128,001 | Y | 8 | Y | DPP6
Chr7 | 153461001-153856001 | 395,001 | Y | 6 | Y | DPP6
Chr7 | 154010001-154317001 | 307,001 | Y | 6 | Y | DPP6
Chr8 | 2681001-3289001 | 608,001 | Y | 7 | Y | CSMD1
Chr8 | 138881001-139209001 | 328,001 | Y | 7 | Y | CSMD1
Chr8 | 139536001-139818001 | 282,001 | Y | 5 | Y | FAM135B,COL22A1
Chr10 | 132010001-133270001 | 1,260,001 | Y | 7 | Y | TCERG1L,MIR378c
Chr12 | 125969001-126071001 | 102,001 | Y | 5 | Y | Intergenic
Chr12 | 127355001-127814001 | 459,001 | Y | 5 | Y | TMEM132C
Chr16 | 6803001-7330001 | 527,001 | Y | 5 | Y | RBFOX1
Chr18 | 73780001-74420001 | 640,001 | N | 4 | Y | Intergenic
Chr20 | 40395001-40593001 | 198,001 | Y | 7 | Y | PTRPT
Chr20 | 41004001-41305001 | 301,001 | Y | 7 | Y | PTRPT
Chr20 | 53591001-53742001 | 151,001 | Y | 7 | Y | Intergenic
Chr22 | 46433001-46536001 | 103,001 | Y | 4 | Y | Intergenic
We then compared DhMRs with the DMRs identified previously using whole-genome single-base bisulfite sequencing, which cannot distinguish 5mC from 5hmC[17]. Of the total 113 DhMRs, only 5 overlap with the 1,175 CG-DMRs (Fig. 5b). Surprisingly, out of the 19 hypo large-scale hotspots, 84.2% overlap with the 24 mega-scale hypo-non-CG-DMRs, whereas the expected percentage is 1.6% based on permutation (Fig. 5c). Fig. 5d shows one of these regions, chr10: 132010002-133270002, where 5mCH is depleted in iPS cells but not in hESC lines; similarly, of the 9 total iPS cell lines, only iPS-S1 and iPS-S2, derived from blood, bear levels of 5hmC similar to their hESC counterparts. Of note, the variances among iPS cells are significantly larger than those among ES cells (Fig. 6a and Supplementary Fig. S5a, b). None of the iPS cell lines has all of the 19 hypo large-scale DhMRs restored to the same level as in the 4 human ES cell lines (Fig. 6b). This indicates that these large-scale regions tend to form aberrant reprogramming hotspots that are resistant to reprogramming. We did not observe a statistically significant (p=0.54) correlation between the passage number of iPSCs and the number of aberrant hotspots (Supplementary Fig. S5c), implying that passage number may not be a key determinant of the number of hotspots in each iPSC line.
Figure 6
Large-scale incomplete hydroxymethylation hotspots are characteristics of human iPS cells
(a) Distribution of the 20 5hmC large-scale DhMRs in iPSCs and ESCs, respectively. Green colour: relative enrichment counts of the 9 iPS cell lines. Red colour: relative enrichment counts of the 4 hESC lines. The solid vertical line separates hyperDhMRs and hypoDhMRs. (b) Summary of the 19 hypo large-scale DhMRs in each iPSC line. Blue colour indicates regions with a 5hmC level similar to that of ES cells; red colour indicates a lower 5hmC level than in ES cells. 5hmC levels were determined by counting 5hmC Capture-Seq reads within each hypo large-scale DhMR for each cell line. A 5hmC level in iPS cells was considered lower if it fell more than three standard deviations below the mean among ES cells; if it was within three standard deviations, the region was considered to have a similar 5hmC level.
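The three-standard-deviation criterion in panel (b) can be written as a one-line rule. This is a minimal sketch: the legend does not say whether the sample or population standard deviation was used, so the sample standard deviation is assumed here, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def has_lower_5hmc(ipsc_level: float,
                   esc_levels: Sequence[float],
                   n_sd: float = 3.0) -> bool:
    """True if the iPSC level is more than n_sd SDs below the ESC mean."""
    threshold = mean(esc_levels) - n_sd * stdev(esc_levels)
    return ipsc_level < threshold
```

With ESC read counts of 100, 110, 90 and 100, the threshold is about 75.5, so an iPSC line with 50 reads in the region would be scored as lower and a line with 80 reads as similar.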
The aberrant 5hmC reprogramming hotspots we identified may also explain the transcription level variability in iPSCs. Notably, some of the genes such as TCERG1L and FAM19A (Table 1), have been reported to be expressed at a significantly lower level in many but not all iPSCs as compared to ES cells[36, 37].
Base-resolution 5hmC analyses reveal large-scale hotspots are mainly caused by aberrant CG hydroxymethylation
The extremely high concordance observed between hypo large-scale DhMRs and non-CG-DMRs is surprising, and might indicate that, in the previously identified aberrant 5mCH hotspot regions, a significant portion of the CH modification consists of 5hmC; alternatively, these regions could contain both non-CG (mC) and CG (hmC) aberrant modification. The majority of 5hmC in ESCs is found at CG sites[38]. In addition, 5hmC quantification by Tet-Assisted Bisulfite sequencing (TAB-Seq) and by the chemical capture approach is well correlated both genome-wide and within the 20 large-scale hotspot regions (Supplementary Fig. S6a,b). Therefore, it is very likely that the aberrant 5hmC is caused by CG modification.
To test this possibility experimentally, we applied TAB-Seq, which can detect hydroxymethylation status at base resolution, to 2 hESC and 4 iPS cell lines. We performed base-resolution analysis of 5hmC in 3 randomly chosen large-scale regions, on chr10, chr18 and chr22, and amplified 5hmC-enriched regions by PCR (Fig. 7a and Supplementary Table S6,7). We then subjected them to deep sequencing. Deep sequencing of PCR amplicons after traditional bisulfite conversion confirmed that there is epigenetic variation at non-CG sites but not at CG sites (Fig. 7b,d). Consistent with the results obtained by the capture method, we saw similar 5hmC variations in iPS cells (Fig. 7c and Supplementary Fig. S6c,d). Importantly, this incomplete hydroxymethylation is caused by CG modification, but not CH modification (Fig. 7c and Supplementary Fig. S6c,d). For example, in the Chr10 hotspot, iPS-B22 and B23 show incomplete 5hmC in CG dinucleotides, but not in CH dinucleotides (Fig. 7e). Therefore, our results suggest the coexistence of aberrant non-CG methylation and aberrant CG hydroxymethylation in subtelomeric hotspots (Fig. 7f). The concordance of aberrant CG hydroxymethylation with those aberrant CH large-scale regions suggests there might be crosstalk between the epigenetic pathway that regulates hydroxymethylation and the pathway that regulates CH methylation; this crosstalk may behave more stochastically in those subtelomeric regions.
Figure 7
Large-scale hotspots are caused predominantly by aberrant CpG hydroxymethylation
(a) Summary of PCR-based TAB-Seq. (b) 5hmC+5mC single-base density in one of the amplicons by traditional bisulfite sequencing in 2 hESC and 2 iPSC lines. Bisulfite sequencing shows the CH methylation (or methylation plus hydroxymethylation) variation in iPS cells. The position of the loci within the chromosome and the scale are shown above the gene tracks. (c) 5hmC single-base density on CG sites in 15 amplicons by TAB-Seq in 2 human ES and 4 iPS cell lines. iPS-B22 and B23 show incomplete CG hydroxymethylation. Green colour indicates iPSCs bearing the same hydroxymethylation detected by 5hmC Capture-Seq. Blue colour indicates iPSCs bearing incomplete hydroxymethylation detected by 5hmC Capture-Seq in this region. (d) 5hmC+5mC single-base density in 15 amplicons by traditional bisulfite sequencing in 2 hESC and 2 iPSC lines. (e) 5hmC single-base density on CG dinucleotides and CH dinucleotides, by TAB-Seq in 2 human ES and 4 iPS cell lines, in one of the amplicons marked by a black dot in (c). Green colour indicates iPSCs bearing the same hydroxymethylation detected by 5hmC Capture-Seq. Blue colour indicates iPSCs bearing incomplete hydroxymethylation detected by 5hmC Capture-Seq in this region. (f) Schematic summary of large-scale incomplete hydroxymethylation on CG dinucleotides in iPS cells.
DISCUSSION
Our study suggests that the significant increase of 5hmC during reprogramming is mainly due to the activation of TET1 protein in human iPS cells, which is in contrast to the previous observations that both Tet1 and Tet2 are upregulated in mouse iPS cells. MouseESCs are different from humanESCs in many aspects, such as X-chromosome inactivation status in female lines[39]. From a cell signaling perspective, human pluripotency (primed pluripotency) depends mainly on FGF and Activin-Nodal signaling pathways, whereas mouse pluripotency (naïve/ground state pluripotency) is maintained by LIF-STAT pathways. The difference between human and mouseTET family proteins involved in reprogramming may be caused by FGF signaling selection of a subpopulation of hiPSCs. Several studies of generating naïve human iPSCs under LIF signaling have been reported[40, 41]. So it is possible that TET1 and TET2 have distinct roles in regulating pluripotency, with TET2 being involved in naïve pluripotency and TET1 functioning in primed pluripotency. On the other hand, it is possible that TET1-mediated 5hmC modification is unique in human regardless of different pluripotent stages. Since TET1/2 is dispensable for maintaining stem cells pluripotency, and their loss are compatible with embryonic and postnatal development[42], it is likely that TET2 expression is not under positive section for stem cell functions during evolution, thus eventually silenced in human pluripotent stages.Reprogramming induces a remarkable epigenomic reconfiguration throughout the somatic cell genome. Recently, it was shown that TET1 and TET2, in synergy with NANOG, enhance the efficiency of mouse iPS cells reprogramming[43]. Here we show TET1-mediated hydroxymethylation change is critical for optimal human iPS cells reprogramming. We further show that TET1-mediated-5hmC modification only affects reprogramming efficiency, but does not alter the essential pluripotency in human stem cells. 
The pathways involving TET1 regulation remain largely unknown. It would be interesting to know whether known epigenetic factors such as DOT1L and Kdm2b[44, 45], which are negative and positive modulators of reprogramming, are linked to TET1-regulated hydroxymethylation.
Human iPS cells hold great promise for regenerative medicine and for establishing models of specific diseases. iPS and ES cells are known to share key features of pluripotency, including the expression of pluripotency markers, teratoma formation, cell morphology, the ability to differentiate into germ layers, and tetraploid complementation[46]. Two models depict the equivalence, or lack thereof, between iPSCs and ESCs. One model posits that there may be small but consistent differences between ESCs and iPSCs, as suggested before[36, 47]; the other states that iPSCs and ESCs should be treated as two partially overlapping groups that share unique features. In this second model, single iPS cell lines cannot be distinguished from ES cell lines, though iPSCs show more epigenetic variance. Mounting evidence supports the latter model[16, 17, 32]. Therefore, each iPSC line may represent a unique epigenetic status with variable differentiation potential. The cause and degree of this variation remain to be determined. Our study integrates the 5hmC epigenomic mark into the investigation of ES-iPS equivalence. We find that 5hmC occurs extensively in iPS cells at levels similar to those in ES cells, and that there are no consistent 5hmC markers that can distinguish iPSCs from hESCs; however, we identified 20 regions in iPSCs that tend to form large-scale (100 kb-1.3 Mb) aberrant reprogramming hotspots, supporting the current consensus that iPSCs are more epigenetically variable than ESCs. Remarkably, these regions with 5hmC variations tend to cluster in telomere-proximal regions.
The close proximity of the hotspots to telomeres indicates that there may be a distinct cellular process that impedes reprogramming in these regions.
Almost none of the DhMRs overlap with CG-DMRs, suggesting that the CG-DMRs identified previously are primarily caused by DNA methylation. DNA methylation in non-CG contexts (mCHG and mCHH, where H = A, C or T) is abundant in pluripotent stem cells, comprising almost 25% of all cytosines at which DNA methylation is identified. Strikingly, ~80% of large-scale iPS-ES DhMR regions coincide with previously reported aberrant non-CG DNA methylation hotspots[17]. Reciprocally, ~50% of non-CG DMRs overlap with the DhMRs we identified. It was reported that non-CG DMRs also occur in peri-centromeric zones. Notably, these peri-centromeric regions contain low levels of 5hmC (stem cells have levels of 5hmC similar to those of fibroblasts), suggesting that cells do not need to establish 5hmC in these regions during reprogramming (Supplementary Fig. S7). Thus, the concordance occurs mainly at telomere-proximal regions. By applying TAB-Seq, we show that incomplete hydroxymethylation occurs predominantly at CG sites, but not CH sites, suggesting the co-existence of aberrant non-CG methylation and aberrant CG hydroxymethylation in these regions. During reprogramming, both CH methylation and hydroxymethylation need to be established de novo from the somatic epigenome. It is known that non-CG cytosine methylation is exclusively catalysed by Dnmt3a and Dnmt3b[48]. The concordance suggests that there might be crosstalk between the epigenetic pathways that regulate the activities of TET and DNMT3, which may behave more stochastically in those subtelomeric regions.
In summary, our results indicate that TET1-mediated 5hmC modification contributes to both the human iPS cell reprogramming process and the differences between iPSCs and hESCs.
In particular, we identified 20 large-scale aberrant hotspots, suggesting iPSCs are more epigenetically variable than ESCs in terms of 5hmC modification. Our data suggest that, when studying aberrant epigenetic reprogramming events, as well as their functional consequences, at the DNA level, 5hmC modification merits particular consideration, in addition to 5mC.
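The overlap figures quoted in the Discussion are asymmetric (~80% of DhMRs coincide with non-CG DMR hotspots, whereas ~50% of non-CG DMRs overlap DhMRs) because each percentage uses a different set of regions as its denominator. The following Python sketch illustrates this with toy intervals; the function name and all coordinates are hypothetical and are not regions from this study, and a simple any-overlap criterion is assumed.

```python
def fraction_overlapping(query, reference):
    """Fraction of query regions overlapping at least one reference region.

    Regions are (chrom, start, end) tuples with half-open coordinates.
    Any shared base counts as an overlap (an assumption of this sketch).
    """
    if not query:
        return 0.0
    hits = 0
    for chrom, start, end in query:
        if any(c == chrom and s < end and start < e for c, s, e in reference):
            hits += 1
    return hits / len(query)


# Hypothetical toy coordinates, for illustration only.
dhmrs = [("chr1", 100, 500), ("chr2", 0, 300), ("chr3", 50, 90)]
non_cg_dmrs = [("chr1", 400, 900), ("chr2", 250, 600),
               ("chr4", 10, 20), ("chr5", 0, 100)]

print(fraction_overlapping(dhmrs, non_cg_dmrs))   # share of DhMRs that are hit
print(fraction_overlapping(non_cg_dmrs, dhmrs))   # reciprocal share
```

In this toy case the same two overlapping pairs give 2 of 3 in one direction and 2 of 4 in the other, which is the kind of asymmetry the reciprocal percentages describe.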
METHODS
Methods and any associated references are available in the online version of this paper.