Alex H Lang, Hu Li, James J Collins, Pankaj Mehta.
Abstract
A common metaphor for describing development is a rugged "epigenetic landscape" where cell fates are represented as attracting valleys resulting from a complex regulatory network. Here, we introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. Each cell fate is a dynamic attractor, yet cells can change fate in response to external signals. Our model suggests that partially reprogrammed cells are a natural consequence of high-dimensional landscapes, and predicts that partially reprogrammed cells should be hybrids that co-express genes from multiple cell fates. We verify this prediction by reanalyzing existing datasets. Our model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity.
Year: 2014 PMID: 25122086 PMCID: PMC4133049 DOI: 10.1371/journal.pcbi.1003734
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Phenotypic landscape.
These are illustrative cartoons of the cell fate attractor landscape. (A) The minimal cellular identity landscape. Each cell fate is a basin of attraction (black circles). Reprogramming between different cell fates (1 and 2) can occur probabilistically via different trajectories (black paths). Partially reprogrammed cells (PRC) exist as smaller, spurious basins of attraction (red circle) that can be observed in reprogramming experiments (example trajectory in red). (B) The same cellular identity landscape in the presence of a stabilizing environment (e.g., a favorable culture medium) for cell fate 2. The environment increases the radius and depth of the cell fate 2 basin of attraction. (C) Landscape in the presence of an external signal that gives rise to differentiation from cell fate 1 to cell fate 2 (e.g., growth factors associated with differentiation). Notice the low-energy path between the cell fates that drives switching from cell fate 1 to cell fate 2.
Mathematical model of cell identity landscape.
| Landscape Term: Index Notation | Landscape Term: Matrix Notation (dim.) | Biological Interpretation |
| | | Total landscape. |
| | | Produces cell basins of attraction. |
| | | External control of individual genes, i.e. inducible expression. |
| | | External control of specific cell basins, i.e. culturing conditions. |
| | | Cell switching by signals, i.e. |
| | | Number of TFs, labeled by |
| | | Number of cell fates, labeled by |
| | | State ( |
| | | State ( |
| | | Correlation between cell fate |
| | | Interaction strength between |
| | | External control of |
| | | External control of |
| | | Overlap of |
| | | Projection of |
| | | Predictivity of |
| | | Signal dependent coupling that drives cell fate |
This table provides a summary of the landscape model and the biological interpretation of each term. The first column is written in index notation, while the second column is the same term in matrix notation with the dimension of the term given in parentheses. If no dimension is listed, the term is a single number.
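As a minimal sketch of how the overlap, correlation, and projection terms named in the table might relate to one another, the snippet below assumes a standard projection-method construction from spin-glass models: the overlap is the normalized dot product of a TF state with each cell fate, the correlation matrix is the normalized dot product between cell fates, and the projection is the overlap multiplied by the inverse correlation matrix. The function name `projections`, the ±1 encoding, and the random test data are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def projections(state, fates):
    """Projection coefficients of a TF state on a set of cell fates.

    state: (N,) vector of +/-1 TF states (assumed encoding).
    fates: (N, p) matrix of +/-1 values, one column per cell fate.
    """
    n_tfs = fates.shape[0]
    overlap = fates.T @ state / n_tfs      # (p,) overlap with each fate
    corr = fates.T @ fates / n_tfs         # (p, p) fate-fate correlation
    # Solve corr @ a = overlap, i.e. a = corr^{-1} @ overlap.
    return np.linalg.solve(corr, overlap)

rng = np.random.default_rng(0)
fates = rng.choice([-1, 1], size=(200, 3))  # random stand-ins for real fates

# A state equal to one fate projects onto that fate alone.
pure = projections(fates[:, 1], fates)

# A hybrid that takes each TF from fate 0 or fate 2 at random
# projects onto both, which is the signature of a mixed state.
hybrid_state = np.where(rng.random(200) < 0.5, fates[:, 0], fates[:, 2])
hybrid = projections(hybrid_state, fates)

print(np.round(pure, 3))
print(np.round(hybrid, 3))
```

Unlike the raw overlap, the projection corrects for correlations between cell fates, so two similar fates do not both score highly merely because they resemble each other.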
Figure 2. Overview of model.
(A) Histone 3 tri-methylation at lysine 4 (K4) is associated with active genes, while histone 3 tri-methylation at lysine 27 (K27) is associated with repressed genes. (B) Conditional probability distribution of histone modification (HM) given transcription factor (TF) expression levels, derived by comparing microarray data with HM data from [36], [37]. Notice the sharp threshold (black line) between expression levels of active and inactive TFs. (C) For mathematical convenience, we take the continuous TF expression levels and convert them to binary states (z-score to and z-score to ). This binarization is consistent with the result from (B). (D) An arbitrary state is represented by a vector of , with each dimension in the vector space representing the state of a TF. The natural cell fates form a subspace (gray plane). The landscape model is based on the orthogonal projection of the TF state onto this subspace. (E) The dynamics of the landscape model for different initial conditions, for a fully connected interaction matrix and for a diluted (non-equilibrium) interaction matrix where 20% of interactions have been randomly deleted. The plot shows the projection of on embryonic stem cells (ESC) as a function of time. Notice the large basins of attraction (red bracket). Parameters used were and burst errors of every spin updates. (F) Simulations showing how a common myeloid progenitor (CMP) can differentiate into either granulo-monocytic progenitors (GMP) or megakaryocyte-erythroid progenitors (MEP) in response to two distinct external signals. All trajectories used . For signal 1, we set and all other . For signal 2, we set and all other .
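The binarization in (C) and the attractor dynamics in (E) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it cuts z-scores at 0 (the thresholds used in the source are not legible here), builds couplings with the projection rule for Hopfield-type networks, and uses noiseless asynchronous spin updates with no dilution or burst errors. The names `binarize`, `projection_couplings`, and `relax`, and all parameter values, are hypothetical.

```python
import numpy as np

def binarize(z):
    """Map z-scored expression to +/-1 states (assumed cut at z = 0)."""
    return np.where(np.asarray(z) > 0, 1, -1)

def projection_couplings(fates):
    """Projection-rule couplings J = fates @ corr^{-1} @ fates.T / N."""
    n_tfs = fates.shape[0]
    corr = fates.T @ fates / n_tfs
    return fates @ np.linalg.solve(corr, fates.T) / n_tfs

def relax(state, couplings, sweeps=10, rng=None):
    """Noiseless asynchronous updates: align each spin with its local field."""
    rng = np.random.default_rng() if rng is None else rng
    s = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(s.size):
            field = couplings[i] @ s
            if field != 0:
                s[i] = 1 if field > 0 else -1
    return s

rng = np.random.default_rng(1)
n_tfs = 200
# Random z-scores stand in for real expression data (illustration only).
fates = binarize(rng.standard_normal((n_tfs, 3)))

# Start inside the basin of fate 0: flip 15% of its TFs, then relax.
start = fates[:, 0].copy()
flipped = rng.choice(n_tfs, size=30, replace=False)
start[flipped] *= -1
final = relax(start, projection_couplings(fates), rng=rng)

print("overlap before:", start @ fates[:, 0] / n_tfs)
print("overlap after: ", final @ fates[:, 0] / n_tfs)
```

A state perturbed within the basin returns to the stored fate, which is the behavior the red bracket in (E) refers to. Adding update noise or deleting couplings, as in the diluted case, would need extra parameters that this sketch does not model.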
Partially reprogrammed cells as spurious attractors.
| Cell line | Start | Goal | Highest projecting states (projection) |
| A2 | MEF | ESC | ESC (0.178), MSC (0.158), myoblast (0.142), MEP (0.129), blood vessel (0.113), keratinocyte (0.112), medullary thymic epithelial (−0.111), adipose - brown (−0.117), NK (−0.130), CMP (−0.138) |
| B3 | MEF | ESC | ESC (0.222), MSC (0.161), blood vessel (0.139), myoblast (0.138), GMP (0.127), kidney (0.111), MEP (0.107), cornea (0.107), NK (−0.129) |
| BIV1+ | B Cell | ESC | myoblast (0.181), prostate (0.164), MSC (0.154), MEP (0.138), keratinocyte (0.136), cornea (0.125), ESC (0.111), intestine - Paneth cell (−0.111), CMP (−0.122) |
| BIV1- | B Cell | ESC | ESC (0.382), EpiSC (0.184), MEP (0.160), myoblast (0.145), NSC (−0.108), T Cell (−0.115), skeletal muscle (−0.117), CMP (−0.154) |
| MCV6 | MEF | ESC | MEP (0.155), myoblast (0.150), ESC (0.149), keratinocyte (0.145), CLP (0.107), GMP (0.107), cornea (0.107), CMP (−0.130) |
| MCV8 | MEF | ESC | ESC (0.203), MEP (0.191), myoblast (0.160), cornea (0.119), prostate (0.113), skeletal muscle (−0.141), CMP (−0.142) |
Partially reprogrammed cell lines (first column) and their significant projections (2 std above noise or ) onto “natural” cell fates based on microarray data. Bold indicates 3 std above noise or . Abbreviations: CLP, Common Lymphoid Progenitor; CMP, Common Myeloid Progenitor; EpiSC, epiblast stem cell; ESC, embryonic stem cell; GMP, Granulocyte-Monocyte Progenitor; MEF, mouse embryonic fibroblast; MEP, Megakaryocyte-Erythroid Progenitor; MSC, Mesenchymal stem cells; NK, Natural Killer cells; NSC, neural stem cells.
Figure 3. Identifying reprogramming candidates.
For a given cell fate, we plot the predictivity (aka energy projection-contribution, ) of every differentially expressed transcription factor (TF) vs its expression level (z-score normalized). Unless otherwise stated, all existing reprogramming protocols to a given cell fate are labeled. (A) Schematic illustrating predictivity vs expression level plots. TFs with large positive (negative) predictivity and large positive (negative) gene expression are candidates for overexpression (knockout) in a reprogramming protocol. The TFs with z-score between and are highlighted in gray because Figure 2B suggests the predictivity of these TFs may be prone to extra noise induced by the data discretization. (B) Embryonic stem cell, ESC (induced pluripotent stem cells, iPSC). Original Takahashi and Yamanaka factors Pou5f1 (Oct4), Sox2, Klf4, and Myc [1]. (C) Inset of ESC positive predictivity and gene expression. Zfp42 (Rex1) [40] and Nr0b1 (Dax1) [41] are pluripotency markers that are not necessary to overexpress for reprogramming, while combinations of the remaining labeled TFs have been successfully used in reprogramming protocols [8]. (D) Heart (induced cardiomyocytes, iCM) [3]. (E) Liver (induced hepatocytes, iHep). There are two published protocols. One protocol used Hnf4a plus any of Foxa1, Foxa2, or Foxa3 [4], while another used Gata4, Foxa3, Hnf1a, and deletion of p19Arf [5]. p19Arf was not differentially expressed in our microarrays and is not shown. (F) Thyroid [7]. (G) Neural Progenitor Cells, NPC (induced NPC, iNPC) used Pou3f2 (Brn2), Sox2, and Foxg1 [6]. With our microarrays we find that Foxg1 is not predictive for NPC but is predictive of neural stem cells (NSC) (see Figure S3). (H) Neurons (induced neuron, iN) [2]. The reprogramming protocol used a combination of factors that were known to be important to either mature neurons (Myt1l) or NPCs (Pou3f2, Ascl1). (G) shows that Pou3f2 and Ascl1 are predictive of NPCs.
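The selection logic of panel (A) can be sketched as a simple filter. The example below assumes predictivity and z-score values are already available for each TF. The function name `classify_candidates`, the thresholds `pred_cut` and `z_band`, and the TF names and numbers are all hypothetical placeholders, since the source's actual cutoffs are not legible here.

```python
def classify_candidates(names, predictivity, zscore, pred_cut, z_band):
    """Sort TFs into overexpression and knockout candidates.

    pred_cut: hypothetical minimum |predictivity| to count as "large".
    z_band:   hypothetical half-width of the gray band around z = 0;
              TFs inside it are skipped as noise-prone.
    """
    out = {"overexpress": [], "knock_out": []}
    for name, eta, z in zip(names, predictivity, zscore):
        if abs(z) <= z_band:
            continue  # inside the gray band: discretization noise
        if eta >= pred_cut and z > 0:
            out["overexpress"].append(name)
        elif eta <= -pred_cut and z < 0:
            out["knock_out"].append(name)
    return out

# Made-up values for illustration only; these are not data from the paper.
names = ["TF_A", "TF_B", "TF_C", "TF_D"]
predictivity = [0.9, -0.8, 0.7, 0.1]
zscore = [2.5, -2.0, 0.1, 3.0]

result = classify_candidates(names, predictivity, zscore,
                             pred_cut=0.5, z_band=0.3)
print(result)
```

Here `TF_C` is dropped despite its high predictivity because its expression falls inside the gray band, and `TF_D` is dropped because its predictivity is small even though it is highly expressed.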