Jonathan Nissenbaum1, Ori Bar-Nur2, Eyal Ben-David3, Nissim Benvenisty1. 1. Stem Cell Unit, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel; Department of Genetics, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel. 2. Stem Cell Unit, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel; Department of Genetics, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel; Massachusetts General Hospital Cancer Center and Center for Regenerative Medicine, 185 Cambridge Street, Boston, MA 02114, USA. 3. Department of Genetics, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel.
Abstract
Molecular reprogramming of somatic cells into human induced pluripotent stem cells (iPSCs) is accompanied by extensive changes in gene expression patterns and epigenetic marks. To better understand the link between gene expression and DNA methylation, we profiled human somatic cells from different embryonic cell types (endoderm, mesoderm, and parthenogenetic germ cells) and the iPSCs generated from them. We show that reprogramming is accompanied by extensive DNA methylation of CpG-poor promoters, sparing CpG-rich promoters. Intriguingly, methylation of CpG-poor promoters occurred not only in downregulated genes but also in genes that are expressed neither in the parental somatic cells nor in their respective iPSCs. These genes are predominantly tissue-specific genes of other cell types from different lineages. Our results suggest a role for DNA methylation in silencing the somatic cell identity through global, nonspecific methylation of tissue-specific genes from all lineages, regardless of their expression in the parental somatic cells.
Forced expression of transcription factors in human somatic cells allows the generation of induced pluripotent stem cells (iPSCs) (Takahashi et al., 2007). These cells are equivalent to embryonic stem cells (ESCs) derived from the inner cell mass and hold great promise for regenerative medicine and cell replacement therapy. How somatic cells transition to the pluripotent state has not yet been fully elucidated. Activation of the pluripotent state depends on the ability to upregulate a set of pluripotency genes, and unraveling how pluripotency factors interact with the genome is key to understanding cellular reprogramming (Plath and Lowry, 2011). Although detailed studies of the action of the pluripotency factors illuminate some aspects of the silencing of the somatic cell identity, the induction of pluripotency gene targets alone cannot explain the conversion of somatic cells into pluripotent cells; other cellular processes must occur to erase the somatic cell identity. Methylation of cytosine in the context of CpG dinucleotides in gene promoters has long been recognized as a mechanism regulating gene expression in mammalian cells (Cedar and Bergman, 2009), and differential gene expression between somatic cells and ESCs has been shown to be governed by methylation of gene promoters (Meissner et al., 2008). The genomic landscape affects the location and level of DNA methylation through the CpG dinucleotide content of a given region, and DNA methylation density differs between CpG-rich and CpG-poor regions (Hawkins et al., 2010; Lister et al., 2011). Gene promoters are generally characterized either by a high content of CpG dinucleotides (HCpG), also known as CpG islands, or by a low content of CpG dinucleotides (LCpG). Given the complex interplay between DNA methylation and gene expression, a comprehensive correlation analysis of the two can improve our understanding of the reprogramming process.
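The HCpG/LCpG distinction rests on two simple sequence statistics: GC fraction and the observed/expected CpG ratio, where the expected CpG count is (number of C × number of G) / length. The sketch below is only an illustration: it applies the classic Gardiner-Garden and Frommer CpG island criteria (a window of at least 200 bp with GC fraction above 0.5 and observed/expected CpG above 0.6) as an assumption. It is not the annotation used by the Illumina array, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for one DNA window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so the non-overlapping str.count is exact here.
    cpg = seq.count("CG")
    gc = (c + g) / n if n else 0.0
    # If C and G were placed independently, the expected CpG count would be c * g / n.
    oe = cpg * n / (c * g) if c and g else 0.0
    return gc, oe


def classify_promoter(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Label a promoter sequence 'HCpG' or 'LCpG' (illustrative thresholds).

    A promoter is called HCpG if any window meets the assumed CpG island criteria
    (GC fraction > min_gc and observed/expected CpG > min_oe); otherwise LCpG.
    """
    if len(seq) < window:
        return "LCpG"  # too short to contain a window of the required length
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_oe:
            return "HCpG"
    return "LCpG"


print(classify_promoter("CG" * 300))    # HCpG
print(classify_promoter("ACGT" * 150))  # LCpG: GC fraction is exactly 0.5, not above it
```

The sliding-window scan is used because a short CpG-rich stretch near the TSS is enough to make a promoter behave like an island, even when the surrounding sequence is CpG-poor.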
Recent studies that focused on DNA methylation profiling of different CpG regions during reprogramming included only limited expression analysis, mainly of preselected gene sets chosen with a priori knowledge of their mode of action (Nishino et al., 2011; Weber et al., 2007). Other studies have examined CpG regions from the opposite direction, i.e., the methylation changes that occur when pluripotent cells differentiate in culture (Brunner et al., 2009; Xie et al., 2013). Here, we set out to investigate the methylation and expression dynamics of somatic cells representing three different embryonic cell types (mesoderm, endoderm, and teratoma cells derived from parthenogenetic germ cells) and of their respective iPSCs. We thus aimed to decipher the involvement of DNA methylation in silencing the somatic cell identity in somatic cells with distinct genetic and epigenetic backgrounds.
Results
To study the status of DNA methylation during cellular reprogramming, we analyzed the gene expression and methylation profiles of somatic cells from three different lineages, representing different embryonic germ layers, and of the iPSCs derived from them, as well as of control human ESCs. For mesoderm, we chose human fibroblasts and the iPSCs generated from them (Fib-iPSCs) (Pick et al., 2009; Urbach et al., 2010). For endoderm, we used human pancreatic beta cells and beta-iPSCs (Bar-Nur et al., 2011), and for the germline we used human parthenogenetic ovarian teratoma-derived cells and the parthenogenetic iPSCs (Pg-iPSCs) generated from them (Stelzer et al., 2011). For each lineage, we used two or three iPSC clones in all analyses. We initially compared the somatic and pluripotent cells using gene expression microarrays. As expected, unsupervised hierarchical clustering separated the somatic and pluripotent cells into two distinct groups (Figure 1A). Within the somatic group, further separation was observed based on the origin of the somatic cells; among the pluripotent cells, however, this distinction was seen only between the Pg-iPSCs and the other iPSCs, probably owing to the lack of expression of paternally expressed imprinted genes in the parthenogenetic cells (Figure 1A). Principal-component analysis (PCA) further separated somatic from pluripotent cells, placing all pluripotent cells in one group and all somatic cells in another (Figure 1B). As expected, the pluripotent cells clustered tightly together even though they were generated from various origins (Figure 1B). We then correlated each iPSC clone with the somatic cells from which it was derived (Figure 1C).
This allowed us to define four groups of genes: (1) genes that are upregulated following reprogramming (“pluripotency genes”), (2) genes that are downregulated following reprogramming (“somatic genes”), (3) genes that are expressed in neither cell type (“nonexpressed genes”), and (4) genes that are expressed in both cell types (“coexpressed genes”) (Figure 1C). This analysis shows that only a relatively small subset of genes is differentially expressed between iPSCs and the somatic cells from which they are derived (Figure 1C). Next, we examined the overlap among the three somatic lineages within each of the four gene expression categories. Coexpressed and nonexpressed genes showed high overlap (88% and 80%, respectively) among the three groups (Figure 1D). Somatic-specific genes showed low overlap among the three groups (30%), as expected, since each cell type represents a different embryonic lineage. Lastly, the expression levels of well-known pluripotency-related genes were significantly higher in all the pluripotent cells than in the somatic cells (Figure 1E). We then profiled the methylation status of the cells using Illumina’s Infinium HumanMethylation27 BeadChip arrays, which sample 27,578 CpG sites in promoter regions covering about 15,000 genes. Here, again, unsupervised hierarchical clustering separated the somatic and pluripotent cells into two distinct groups, with further subdivision into smaller groups based on cell type (Figure 2A). As in the expression analysis, PCA showed a clear distinction between pluripotent and somatic cells (Figure 2B). We next examined the global methylation levels in each cell type. The Illumina array samples CpG sites across the genome and gives a β value for the methylation of each CpG, ranging from 0 (unmethylated) to 1 (fully methylated).
Each sampled CpG is taken as representative of the region in which it lies. We divided the β values into three categories: (1) hypomethylated (β between 0 and 0.3), (2) hemimethylated (β between 0.3 and 0.6), and (3) hypermethylated (β between 0.6 and 1). The Illumina platform also allows the genes to be divided into two distinct groups: CpG island promoters (characterized by HCpG content) and nonisland CpG promoters (characterized by LCpG content). In the somatic cells, LCpG promoters were more methylated, whereas CpG islands were largely unmethylated (Figure 2C). All the iPSCs showed higher β values than their corresponding somatic cells, similar to the levels observed in ESCs (Figure 2C), corroborating previous observations (Hawkins et al., 2010; Lister et al., 2011). The increase in methylation in iPSCs was predominantly located in LCpG promoters, while HCpG promoters showed little change in methylation following reprogramming (p < 0.0001 and p = 0.61, respectively; Figure 2C). Most HCpG genes showed low levels of methylation, and only a small subset maintained a high degree of methylation. Overall, the methylation levels of these promoters did not change dramatically following reprogramming; they remained either hypo- or hypermethylated (Figure 2C). Our analysis also shows that de novo methylation is much more prominent than demethylation following reprogramming (Figure 2C). The distribution of distances between the CpG dinucleotides and the transcription start site (TSS) shows that most LCpG and HCpG sites lie within 1,000 bp of the TSS (89% and 97%, respectively; Figure 2D), in accordance with other studies that compared tissue-specific genes and highly expressed housekeeping genes (characterized by nonisland CpG and CpG island promoters, respectively) (Brenet et al., 2011; Morita et al., 2012).
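The β binning and the TSS-distance summary described above reduce to simple counting. The sketch below is a minimal illustration: the function names and example values are hypothetical, and the handling of the boundary values 0.3 and 0.6 (both counted as hemimethylated here) is an assumption, since the text gives the ranges without specifying which side the edges fall on.

```python
from collections import Counter


def methylation_class(beta):
    """Bin one Infinium beta value (0 = unmethylated, 1 = fully methylated).

    Edge handling is an assumption: 0.3 and 0.6 are counted as hemimethylated.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError(f"beta must lie in [0, 1], got {beta}")
    if beta < 0.3:
        return "hypo"
    if beta <= 0.6:
        return "hemi"
    return "hyper"


def class_fractions(betas):
    """Fraction of CpG sites falling into each methylation class."""
    counts = Counter(methylation_class(b) for b in betas)
    return {k: counts[k] / len(betas) for k in ("hypo", "hemi", "hyper")}


def fraction_near_tss(distances_bp, max_bp=1000):
    """Fraction of CpG sites lying within max_bp of the TSS (either side)."""
    return sum(abs(d) <= max_bp for d in distances_bp) / len(distances_bp)


# Hypothetical beta values for the CpG sites of one sample, split by promoter class.
island = [0.05, 0.1, 0.08, 0.92]
nonisland = [0.4, 0.75, 0.8, 0.2]
print(class_fractions(island))     # {'hypo': 0.75, 'hemi': 0.0, 'hyper': 0.25}
print(class_fractions(nonisland))  # {'hypo': 0.25, 'hemi': 0.25, 'hyper': 0.5}
```

Computing these fractions separately for island and nonisland sites, and for each somatic/iPSC pair, gives the kind of per-class distributions summarized in Figure 2C.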
In summary, our data show that reprogramming is accompanied by massive methylation of gene promoters that have a low number of CpG dinucleotides, resembling the state in ESCs. This de novo methylation occurred in all the iPSC clones, regardless of the starting cell type or lineage.
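The PCA comparisons in Figures 1B and 2B can be outlined with a plain singular value decomposition. This is a minimal sketch assuming a features × samples matrix of already normalized log-scale expression values (or β values); the normalization, gene filtering, and software actually used in the study are not described here, and the synthetic data in the example are purely illustrative.

```python
import numpy as np


def pca_scores(values, n_components=2):
    """Project samples onto their first principal components.

    values: array of shape (n_features, n_samples), e.g. normalized log expression
    (genes x samples) or beta values (CpG sites x samples).
    Returns (scores, explained_ratio); scores has shape (n_samples, n_components).
    """
    x = np.asarray(values, dtype=float).T  # samples x features
    x = x - x.mean(axis=0)                 # center every feature across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


if __name__ == "__main__":
    # Synthetic example: samples 0-2 play "somatic", samples 3-5 "pluripotent".
    rng = np.random.default_rng(0)
    expr = rng.normal(0.0, 0.1, size=(100, 6))  # 100 genes x 6 samples
    expr[:50, 3:] += 5.0                         # a shared shift separates the groups
    scores, ratio = pca_scores(expr)
    print(np.round(scores[:, 0], 2), np.round(ratio, 3))
```

For the hierarchical clustering in Figures 1A and 2A, the same centered matrix could be passed to `scipy.cluster.hierarchy.linkage`; which linkage method and distance metric the study used is not stated, so any choice there is an assumption.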
Figure 1
Expression Analysis of Somatic and Pluripotent Cell Lines
(A) Hierarchical clustering analysis based on mRNA expression. Somatic cells and their derived iPSCs are designated in dark/light matching colors.
(B) Principal-component analysis (PCA) based on RNA expression levels across somatic and pluripotent cell samples. Coloring indicates classification of samples into subgroups. Light brown and gray shades represent somatic and pluripotent cells, respectively.
(C) Scatterplot of somatic and iPSC gene expression. x axis = fibroblast cell expression; y axis = Fib-iPSC expression. Transcripts were divided into four gene expression categories: (1) coexpressed genes, (2) nonexpressed genes, (3) somatic-specific genes, and (4) pluripotency-specific genes (black, red, green, and blue, respectively).
(D) Gene overlap for the coexpressed genes, nonexpressed genes, and somatic genes is presented in the Venn diagrams. The beta cell (Beta), fibroblast (Fib), and parthenogenetic teratoma (Pg-Ter) lineages are represented by light green, light red, and purple, respectively.
(E) Gene expression of typical core pluripotency genes among the somatic and pluripotent cells. x axis = pluripotency genes for the somatic (s) and pluripotent (p) cells; y axis = gene expression level. Each parental cell line and its respective iPSCs are designated by a different color (dark/light hues). Horizontal black line represents an expression threshold.
Figure 2
DNA Methylation Analysis of Somatic and Pluripotent Cell Lines
(A) Hierarchical clustering analysis based on DNA methylation. Somatic cells and their derived iPSCs are designated in dark/light matching colors.
(B) PCA based on CpG methylation levels across somatic and pluripotent cell samples. Coloring indicates classification of samples into subgroups. Light brown and gray shades represent somatic and pluripotent cells, respectively.
(C) Distribution of methylation scores across somatic and pluripotent cells. The three histograms represent the distribution of hypermethylated (>0.6), hemimethylated (0.3–0.6), and hypomethylated (<0.3) genes (red, blue, and green, respectively) for all island CpG and nonisland CpG sites. p values = Fisher’s exact test.
(D) CpG site distance from transcription start site (TSS). The histogram shows the distribution of distance from the TSS for the CpG island and nonisland CpG sites (orange and purple, respectively). y axis = distance from TSS in base pairs.
We next examined the correlation between the methylation and expression levels of each somatic cell type and the iPSCs generated from it, looking at the methylation status of CpG island and nonisland CpG promoters in the four expression groups defined previously (Figure 1C). Although the overall methylation proportions of HCpG genes give a rather static picture (Figure 2C), a certain fraction of these genes changed their methylation status following reprogramming (Figure 3A). Plotting methylation against expression revealed that many upregulated pluripotency genes underwent promoter demethylation following reprogramming, especially those with HCpG promoters, as the vast majority of pluripotency genes have CpG island promoters (Figure 3A). Concomitantly, the somatic-specific genes that were downregulated in iPSCs underwent extensive de novo methylation in LCpG rather than HCpG promoters (Figures 3A and 3B; Figures S1 and S2 available online). Intriguingly, genes that were expressed neither in the somatic cells nor in the iPSCs generated from them showed a marked increase in methylation in LCpG promoters, regardless of the parental somatic cell type (Figure 3B; Figure S2). In the LCpG category, the nonexpressed gene group differed significantly from the coexpressed genes, and it also differed markedly from the nonexpressed gene group in the HCpG category (Figure 3B; Figure S2). This group of genes behaved similarly to the somatic-specific gene group, showing a comparable de novo methylation trend. This raised an intriguing question: why would genes that are not expressed in the somatic cells undergo extensive de novo methylation? To address it, we used the Amazonia database (http://amazonia.transcriptome.eu/; Le Carrour et al., 2010), which reports the expression levels of most human genes in hundreds of human tissue samples representing the cell repertoire of the human body.
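The per-category comparison described above (assigning genes to the four expression groups of Figure 1C, computing the methylation difference iPSC β minus somatic β, and comparing groups with Student's t test as in Figure 3B) can be sketched as follows. This is a minimal illustration: the expression cutoff is a hypothetical parameter (the study's cutoff is not given in the text), categories are assigned by a simple on/off threshold rather than any fold-change criterion, and all names and numbers are illustrative.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def expression_category(somatic, ips, threshold):
    """Assign a gene to one of the four groups of Figure 1C (hypothetical cutoff)."""
    on_s, on_p = somatic >= threshold, ips >= threshold
    if on_s and on_p:
        return "coexpressed"
    if not on_s and not on_p:
        return "nonexpressed"
    return "somatic" if on_s else "pluripotency"


def delta_beta_by_category(genes, threshold):
    """genes: iterable of (expr_somatic, expr_ips, beta_somatic, beta_ips) tuples.

    Returns {category: [iPSC beta - somatic beta, ...]}; positive values mean
    more methylation in the iPSCs, matching the sign convention of Figure 3.
    """
    groups = defaultdict(list)
    for e_s, e_p, b_s, b_p in genes:
        groups[expression_category(e_s, e_p, threshold)].append(b_p - b_s)
    return dict(groups)


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance); returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2)), df


# Hypothetical log-expression and beta values for a handful of LCpG promoters.
genes = [
    (2.0, 2.5, 0.2, 0.8), (3.0, 2.0, 0.1, 0.6), (1.0, 1.5, 0.3, 0.7),      # nonexpressed
    (10.0, 11.0, 0.2, 0.25), (12.0, 12.0, 0.1, 0.1), (9.0, 9.5, 0.3, 0.2),  # coexpressed
    (11.0, 3.0, 0.3, 0.8), (3.0, 11.0, 0.6, 0.1),  # somatic, pluripotency
]
groups = delta_beta_by_category(genes, threshold=7.0)
t, df = student_t(groups["nonexpressed"], groups["coexpressed"])
print(round(t, 2), df)
```

A p value follows from the t distribution with `df` degrees of freedom; for example, `scipy.stats.ttest_ind(a, b)` performs the same equal-variance test by default.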
Analysis with the Amazonia database of the genes that are not expressed in any of the three somatic cell types or their respective iPSCs demonstrated that they are tissue-specific genes, each characteristic of a different cell type or tissue, such as skin, intestine, blood, bone marrow, testis, or brain (Figure 3C; Table S1). The differences in methylation between the somatic and the pluripotent cells were highly significant for all of these cell-specific genes. A gene ontology annotation analysis of these genes showed enrichment for processes such as epidermal, keratinocyte, and epithelial cell differentiation, as well as developmental processes.
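The selection of genes shown in Figure 3C (silent in all of our somatic and iPSC lines, expressed in some other tissue, and gaining promoter methylation) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the reference-atlas values are assumed to have already been exported into a per-gene dictionary (no Amazonia API is implied), the key names, gene names, and cutoffs are hypothetical, and "tissue-specific" is simplified here to "expressed above threshold in at least one reference tissue".

```python
def candidate_tissue_genes(genes, threshold, min_gain=0.3):
    """Return {gene: [tissues]} for genes silent in all our lines, expressed in at
    least one reference tissue, and gaining promoter methylation after reprogramming.

    Each value in `genes` is a dict with hypothetical keys:
      "ours"    - expression values across all somatic and iPSC samples,
      "tissues" - {tissue name: expression} taken from a reference atlas,
      "dbeta"   - promoter methylation difference (iPSC beta - somatic beta).
    `threshold` and `min_gain` are illustrative cutoffs, not the study's values.
    """
    hits = {}
    for name, g in genes.items():
        silent_here = all(v < threshold for v in g["ours"])
        expressed_in = sorted(t for t, v in g["tissues"].items() if v >= threshold)
        if silent_here and expressed_in and g["dbeta"] >= min_gain:
            hits[name] = expressed_in
    return hits


example = {
    "GENE_A": {"ours": [2.0, 3.0, 2.5], "tissues": {"brain": 9.0, "liver": 3.0}, "dbeta": 0.5},
    "GENE_B": {"ours": [10.0, 11.0], "tissues": {"brain": 9.0}, "dbeta": 0.5},
    "GENE_C": {"ours": [2.0, 2.0], "tissues": {"skin": 3.0}, "dbeta": 0.6},
}
print(candidate_tissue_genes(example, threshold=7.0))  # {'GENE_A': ['brain']}
```

GENE_B is excluded because it is expressed in our own lines, and GENE_C because no reference tissue reaches the threshold.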
Figure 3
Analysis of Combined Expression and Methylation Data
(A) Expression versus methylation. For fibroblasts versus Fib-iPSCs, island CpG and nonisland CpG sites are presented separately. x axis = expression ratio (log iPSC expression − log somatic cell expression); y axis = methylation difference (iPSC methylation − somatic cell methylation). Positive values on the x and y axes reflect higher expression or methylation, respectively, in iPSCs than in the parental somatic cells, and vice versa. Colors represent the four expression categories as in Figure 1C. Regression lines are presented for the somatic and pluripotent gene categories (green and blue, respectively), and shaded gray areas represent confidence intervals (CIs). See also Figure S1.
(B) Comparison of DNA methylation between fibroblasts and their respective iPSC lines. y axis = methylation difference (Fib-iPSC values minus fibroblast values). A positive value reflects higher methylation in iPSCs than in the parental somatic cells, and vice versa. The box plots show the methylation difference for each expression category (Figure 1C) for the island CpG and nonisland CpG groups. Each panel represents a different cell origin (fibroblasts, beta cells, and parthenogenetic teratomas). p value = Student’s t test. See also Figure S2.
(C) Tissue-specific gene expression profile using the Amazonia database. The displayed genes, which are not expressed in the somatic (S) and pluripotent (P) cells, become hypermethylated during reprogramming. mRNA expression and methylation values are presented in the histograms. Left side = expression; right side = methylation. Center histogram: expression levels in various tissues and cells. GAPDH and SALL4 serve as controls for housekeeping and pluripotency genes, respectively. Black line = expression threshold; dotted black line = threshold for hypermethylation and hypomethylation. See also Table S1.
Discussion
DNA methylation is an epigenetic modification that plays a pivotal role in the silencing of gene expression, for example in the silencing of imprinted genes or of genes on the inactive X chromosome (Cedar and Bergman, 2009; Weber et al., 2007). The role of DNA methylation in resetting the epigenome of somatic cells during reprogramming to pluripotency is not yet fully understood. Since reprogramming to pluripotency requires the activation of some genes and the repression of others, methylation is expected to play a major role in this process. Repression of lineage-specific genes that are not expressed in a given somatic cell can be maintained simply because the relevant transcription factors are absent; this regulation can then be reinforced by DNA methylation executed by de novo methylases. Our results, obtained with three different somatic cell types and their iPSC progeny, corroborate previous observations that reprogramming is accompanied by a wave of de novo methylation in iPSCs (Deng et al., 2009; Doi et al., 2009; Lister et al., 2011; Nagae et al., 2011; Nishino et al., 2011). Although the de novo methyltransferase DNMT3b is expressed only in the pluripotent cells (Figure 1E), hypermethylation could be acquired alongside gene repression, precede it, or follow silencing. In addition, our work refines these observations by integrating the expression levels of the cells, showing that this methylation occurs not only in the somatic-specific expressed genes but also in many nonexpressed genes. Our results can be explained by the following model. During reprogramming, the de novo DNA methyltransferases (DNMTs) methylate not only the promoters of the somatic-specific genes that are silenced during reprogramming but also the promoters of most cell-specific genes, even those not expressed in the somatic cell undergoing reprogramming (Figure 4).
This suggests that the DNMTs cannot distinguish among the different tissue-specific promoters of the cell undergoing reprogramming, and that gene silencing occurs in an indiscriminate mode that does not differ between parental cells from different lineages. For example, the gene CMTMC5 is expressed predominantly in ectodermal cells of the adult brain, yet it undergoes de novo methylation in iPSCs generated from mesoderm, endoderm, and teratoma cells derived from parthenogenetic germ cells (Figure 3C). An alternative interpretation is that iPSCs undergo extensive aberrant methylation in vitro (Lister et al., 2011). However, the high concordance in the methylation of tissue-specific genes among our iPSCs generated from distinct lineages argues against this. If the methylation of tissue-specific genes were an aberrant phenomenon, we would not expect it to be consistent across three diverse cell types and many independent reprogramming experiments. In addition, a recent study showed that keratinocytes reprogram much faster than fibroblasts because they are more methylated than fibroblasts (Barrero et al., 2012). In this case, the DNMTs may not need to methylate all the tissue-specific gene promoters, thus enhancing reprogramming efficiency.
Figure 4
A Model that Illustrates the Possible Silencing of the Somatic Cell Identity following Reprogramming by DNA Methylation
Following reprogramming, pluripotency-related gene promoters undergo extensive DNA demethylation regardless of the starting somatic cell type. Somatic cell-specific gene promoters undergo extensive methylation (illustrated here for a fibroblast). Other cell-specific gene promoters also undergo extensive DNA methylation, as the methylating enzymes cannot distinguish between the various somatic cell-specific promoters, thus acting in a “blind fashion.”
In mouse ESCs, it was recently shown that cells grown in serum are hypermethylated, whereas a switch to serum-free medium supplemented with inhibitors of mitogen-activated protein/extracellular signal-regulated kinase kinase (MEK) and GSK3 (2i medium) yields genome-wide hypomethylated mouse ESCs that resemble an earlier developmental “naïve state” (Leitch et al., 2013). Human ESCs may represent a later developmental stage than mouse ESCs, more similar to the mouse epiblast stem cell stage (Tesar et al., 2007). Human iPSCs resemble human ESCs, so the high degree of methylation we observe in both cell types may reflect the state of their postimplantation counterparts in vivo, just before the cells start to acquire their somatic cell identity. It will be interesting to examine the methylation status of naive human PSCs (Hanna et al., 2010) to see whether they are relatively hypomethylated. Which enzymes mediate the extensive de novo methylation during reprogramming remains to be determined. A recent study showed that non-CpG methylation in PSCs is mediated by DNMT3a and DNMT3b, as knocking down these genes eliminated most of the non-CpG methylation (Ziller et al., 2011). It is likely that these enzymes are also responsible for the de novo methylation of LCpG promoters that we detect in iPSCs, as DNMT3b expression was upregulated in all our pluripotent cells (Figure 1E). How HCpG promoters are protected from de novo methylation is not yet fully understood. Binding of proteins to CpG islands may prevent DNMTs from methylating these promoters, but more work is needed to determine whether this is the case.
Finally, several recent studies suggest that methylation plays a key role in mediating the ability of cells to differentiate (Bar-Nur et al., 2011; Kim et al., 2010; Lister et al., 2011; Nagae et al., 2011; Polo et al., 2010), sometimes owing to an epigenetic memory (Bar-Nur et al., 2011; Kim et al., 2010; Lister et al., 2011; Polo et al., 2010). We propose that a thorough dissection of the methylation status of each iPSC line will greatly benefit the use of iPSCs in directed differentiation protocols and as a potential source for cell replacement therapy.
Experimental Procedures
Generation and Culture of iPSCs
Generation of fibroblast iPSCs, beta iPSCs, and parthenogenetic iPSCs has been reported previously (Bar-Nur et al., 2011; Pick et al., 2009; Stelzer et al., 2011). See the Supplemental Experimental Procedures for full culture conditions.
DNA and mRNA Extraction
Total genomic DNA was extracted using a genomic DNA extraction kit (RBC). Total RNA (DNase-treated) was extracted using the RNeasy Mini Kit (QIAGEN).
Gene Expression and DNA Methylation Microarray Analyses
RNA and DNA were subjected to either Human Gene 1.0 ST microarray platform (Affymetrix) or to HumanMethylation27 BeadChip (Illumina) analysis, respectively. See the Supplemental Experimental Procedures for further details.
Authors: R David Hawkins; Gary C Hon; Leonard K Lee; Queminh Ngo; Ryan Lister; Mattia Pelizzola; Lee E Edsall; Samantha Kuan; Ying Luu; Sarit Klugman; Jessica Antosiewicz-Bourget; Zhen Ye; Celso Espinoza; Saurabh Agarwahl; Li Shen; Victor Ruotti; Wei Wang; Ron Stewart; James A Thomson; Joseph R Ecker; Bing Ren Journal: Cell Stem Cell Date: 2010-05-07
Authors: Alayne L Brunner; David S Johnson; Si Wan Kim; Anton Valouev; Timothy E Reddy; Norma F Neff; Elizabeth Anton; Catherine Medina; Loan Nguyen; Eric Chiao; Chuba B Oyolu; Gary P Schroth; Devin M Absher; Julie C Baker; Richard M Myers Journal: Genome Res Date: 2009-03-09
Authors: Akiko Doi; In-Hyun Park; Bo Wen; Peter Murakami; Martin J Aryee; Rafael Irizarry; Brian Herb; Christine Ladd-Acosta; Junsung Rho; Sabine Loewer; Justine Miller; Thorsten Schlaeger; George Q Daley; Andrew P Feinberg Journal: Nat Genet Date: 2009-11-01
Authors: Jacob Hanna; Albert W Cheng; Krishanu Saha; Jongpil Kim; Christopher J Lengner; Frank Soldner; John P Cassady; Julien Muffat; Bryce W Carey; Rudolf Jaenisch Journal: Proc Natl Acad Sci U S A Date: 2010-05-04
Authors: Ryan Lister; Mattia Pelizzola; Yasuyuki S Kida; R David Hawkins; Joseph R Nery; Gary Hon; Jessica Antosiewicz-Bourget; Ronan O'Malley; Rosa Castanon; Sarit Klugman; Michael Downes; Ruth Yu; Ron Stewart; Bing Ren; James A Thomson; Ronald M Evans; Joseph R Ecker Journal: Nature Date: 2011-02-02
Authors: K Kim; A Doi; B Wen; K Ng; R Zhao; P Cahan; J Kim; M J Aryee; H Ji; L I R Ehrlich; A Yabuuchi; A Takeuchi; K C Cunniff; H Hongguang; S McKinney-Freeman; O Naveiras; T J Yoon; R A Irizarry; N Jung; J Seita; J Hanna; P Murakami; R Jaenisch; R Weissleder; S H Orkin; I L Weissman; A P Feinberg; G Q Daley Journal: Nature Date: 2010-09-16