Literature DB >> 27281220

Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq.

Barbara Treutlein^1,2, Qian Yi Lee^1,3,4, J Gray Camp⁵, Moritz Mall^3,4, Winston Koh¹, Seyed Ali Mohammad Shariati⁶, Sopheak Sim³, Norma F Neff¹, Jan M Skotheim⁶, Marius Wernig^3,4, Stephen R Quake^1,7.

Abstract

Direct lineage reprogramming represents a remarkable conversion of cellular and transcriptome states. However, the intermediate stages through which individual cells progress during reprogramming are largely undefined. Here we use single-cell RNA sequencing at multiple time points to dissect direct reprogramming from mouse embryonic fibroblasts to induced neuronal cells. By deconstructing heterogeneity at each time point and ordering cells by transcriptome similarity, we find that the molecular reprogramming path is remarkably continuous. Overexpression of the proneural pioneer factor Ascl1 results in a well-defined initialization, causing cells to exit the cell cycle and re-focus gene expression through distinct neural transcription factors. The initial transcriptional response is relatively homogeneous among fibroblasts, suggesting that the early steps are not limiting for productive reprogramming. Instead, the later emergence of a competing myogenic program and variable transgene dynamics over time appear to be the major efficiency limits of direct reprogramming. Moreover, a transcriptional state, distinct from donor and target cell programs, is transiently induced in cells undergoing productive reprogramming. Our data provide a high-resolution approach for understanding transcriptome states during lineage differentiation.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：

Year: 2016 PMID： 27281220 PMCID： PMC4928860 DOI： 10.1038/nature18323

Source DB: PubMed Journal: Nature ISSN： 0028-0836 Impact factor: 49.962

Direct lineage reprogramming bypasses an induced pluripotent stage to directly convert somatic cell types. Using the three transcription factors Ascl1, Brn2 and Myt1l (BAM), mouse embryonic fibroblasts (MEFs) can be directly reprogrammed to induced neuronal (iN) cells within 2 to 3 weeks at an efficiency of up to 20%[8]. Several groups have further developed this conversion using transcription factor combinations that almost always contain Ascl1[9-12]. Recently, we found that Ascl1 is an “on target” pioneer factor initiating the reprogramming process[13], and inducing conversion of MEFs into functional iN cells alone, albeit at a much lower efficiency compared to BAM[14]. These findings raised the question whether and when a heterogeneous cellular response to the reprogramming factors occurs during reprogramming and which mechanisms might cause failure of reprogramming. We hypothesized that single-cell RNA-seq could be used as a high resolution approach to reconstruct the reprogramming path of MEFs to iN cells and uncover mechanisms limiting reprogramming efficiencies[4,15,16]. In order to understand transcriptional states during direct conversion between somatic fates, we measured 405 single-cell transcriptomes (Supplementary Data 1) at multiple time points during iN cell reprogramming (Figure 1a, Extended Data Figure 1a). We first explored how individual cells respond to Ascl1 overexpression during the initial phase of reprogramming. We analyzed d0 and d2 Ascl1-only cells using PCA and identified 3 distinct clusters (cluster A, B, C), which correlated with the level of Ascl1 expression (Figure 1b–e). Cluster A consisted of all control d0 MEFs and a small fraction of d2 cells (~12%) which showed no detectable Ascl1 expression, suggesting these d2 cells were not infected with the Ascl1 virus. This is consistent with typical Ascl1 infection efficiencies of about 80–90%. We found that the d0 MEFs were surprisingly homogeneous, with much of the variance due to cell cycle (Extended Data Figure 1b–g, Supplementary Data 3, SI). Cluster C was characterized by high expression of Ascl1, Ascl1 target genes (Zfp238, Hes6, Atoh8, etc), and genes involved in neuron remodeling, as well as the down-regulation of genes involved in cell cycle and mitosis (Figure 1c,e-f and Supplementary Data 2). Cluster B cells represent an intermediate population that expressed Ascl1 at a low level, and were characterized by a weaker up-regulation of Ascl1 target genes and less efficient down-regulation of cell cycle genes compared to cluster C cells. This suggests that an Ascl1 expression threshold is required to productively initiate the reprogramming process. In addition, we found that forced Ascl1 expression resulted in less intracellular transcriptome variance, a lower number of expressed genes (Figure 1d) and a lower total number of transcripts per single cell (Extended Data Figure 2a–b). Notably, the distribution of average expression levels per gene was similar for all experiments independent of Ascl1 overexpression (Extended Data Figure 2c). We observed that the up-regulation of neuronal targets and down-regulation of cell cycle genes in response to Ascl1 expression are uniform, indicating that the initial transcriptional response to Ascl1 is relatively homogenous among all cells (Figure 1e). This suggests that most fibroblasts are initially competent to reprogram and later events must be responsible for the moderate reprogramming efficiency of about 20%.

Figure 1

Ascl1 overexpression elicits a homogeneous early response and initiates expression of neuronal genes

(a) Mouse embryonic fibroblasts stably integrated with neuronal reporter Tau-EGFP[8] were directly transformed to neuronal cells through overexpression of a single (Ascl1), or three factors (Brn2, Ascl1, Myt1l; BAM) as described[8]. Cells were sampled using single-cell RNA-seq at day 0 without infection (d0, 73 cells), day 2 (d2, 81 cells Ascl1-infected and 47 cells clonal), day 5 (d5, 55 cells, EGFP+ and EGFP− cells), day 20 (d20, 33 cells, EGFP+ cells), and day 22 (d22, 73 cells, EGFP+ cells) post-induction with Ascl1. As a comparison, cells reprogrammed using all three BAM factors were analyzed at 22 days (d22, 43 cells, EGFP+ cells).

(b–c) PCA of single-cell transcriptomes from d0 MEFs (circle, 73 cells) and d2 Ascl1-induced cells (square, 81 cells) shows reduced intercellular variation at d2. Points are colored based on (b) hierarchical clustering shown in (e), or (c) Ascl1 expression.

(d) Left, Distribution of transcriptome variance within single cells grouped by cluster assignment of (b) shows that Ascl1 expression reduces the intracellular transcriptome variance. Right, Distribution of total number of genes expressed by single cells grouped by cluster assignment shows that Ascl1 overexpression reduces the range of gene expression.

(e) Hierarchical clustering of d0 and d2 cells (rows) using the top 50 genes (columns) correlating (Genes I) and anticorrelating (Genes II) with PC1. Cells are clustered into three clusters (left sidebar): A (83 cells, MEFs), B (20 cells, intermediates), C (51 cells, d2 induced cells).

(f) Top gene ontology enrichments of genes I and II (d) are shown with Bonferroni corrected p-values.

(g) Distribution of PC1 loadings are shown for d2 cells carrying variable numbers of Ascl1 transgene copies (dark green, Ascl1-infected) or carrying the same Ascl1 copy number and genomic location (yellow, clonal). PC1 effectively separates un-induced MEFs (cluster A) from induced cells highly expressing Ascl1 target genes (cluster C) and both, Ascl1-infected and clonal cells, productively initiate reprogramming. The induction efficiency is higher for clonally induced MEFs, however even in the clonal population Ascl1 induction is variable.

Extended Data Figure 1

The majority of MEFs are actively undergoing cell cycle, but exit cell cycle upon Ascl1 induction

(b) Boxplots of intercellular transcriptome variance showed that MEFs are more heterogeneous than mouse embryonic stem cells under 2iLIF culture conditions[45] and less heterogeneous than glioblastoma cells[46].

(c) PCA of genes with most variance in d0 MEFs revealed MEF heterogeneity (blue, A). Density plot showing the distribution of number of cells along PC1 loading is shown above the PCA plot.

(d) Heatmap and hierarchical clustering of genes used for the PCA in panel (c) shows 2 major MEF subpopulations. Each column represents a single cell, and each row a gene. Subpopulation A is highlighted in blue in the dendrogram.

(e) GO enrichment for genes in (c) shows that MEF subpopulation A is distinguished by the low or lack of expression of genes enriched for cell cycle terms.

(f–g) PCA and heatmap of the same genes used in panels (c–e), this time including d0 MEFs (circles, light green) and d2 cells (squares, dark green), showed that most of the d2 cells had the same cell cycle signature as MEF subpopulation A. Cells in columns of both heatmaps are ordered based on PC1 loading.

Extended Data Figure 2

Total number of transcripts per cell decreases during MEF-to-iN cell reprogramming

(b) Boxplots showing the distribution of the total number of transcripts per single cell for each experiment. Number of transcripts per cell were calculated from the FPKM values of all genes in each cell using the correlation between number of transcripts of exogenous spike-in mRNA sequences and their respective measured mean FPKM values (calibration curves are shown in panel a). The total number of transcripts expressed by a single cell and detected by single-cell RNA-seq is highest in MEFs and is more than 2-fold decreased upon overexpression of Ascl1 or BAM.

(c) Boxplots showing the distribution of the median transcript number per gene across all cells of one experiment. The distributions are similar over the course of iN cell reprogramming.

To explore the effect of transgene copy number variation on the heterogeneity of the early response, we analyzed single-cell transcriptomes of an additional 47 cells induced with Ascl1 for 2 days from secondary MEFs derived via blastocyst injection from a clonal, Ascl1-inducible ES cell line. As expected, the induction efficiency of Ascl1 was 100% since the secondary MEFs are genetically identical and all cells carry the transgene in the same genomic location (Figure 1g). Nevertheless, these clonal MEFs had similar transcriptional responses and heterogeneity as primary infected MEFs at the d2 time point, as well as comparable reprogramming efficiencies and maturation (Extended Data Figure 3a). Finally, we compared the early response in our Ascl1-only single-cell RNA-seq data with our previously reported bulk RNA-seq data of Ascl1-only and BAM-mediated reprogramming[13] (Extended Data Figure 3b). We found similar downregulation of MEF related genes and upregulation of pro-neural marker genes in both Ascl1 and BAM mediated reprogramming. These data suggest that the overexpression of Ascl1 focuses the transcriptome and directs the expression of target genes.

Extended Data Figure 3

Clonal MEFs reprogram successfully into iN cells and Ascl1-only and BAM induce similar responses during early iN cell reprogramming

(b) Barplots showing expression of Ascl1 target genes (Hes6, Zfp238, Snca, Cox8b, Bex1, Dner) and MEF marker genes averaged across single cells from d0 MEFs and d2 Ascl1-only cells, as well as from bulk RNA-seq data from MEFs, d2 BAM, and d2 Ascl1-only cells. This data shows that the initiation of reprogramming at d2 is similar for Ascl1 alone- and BAM-mediated reprogramming.

We next analyzed the transcriptomes of reprogramming cells on day 5. At this timepoint, the first robust Tau-EGFP signal can be detected in successfully reprogramming cells and we therefore purified 40 Tau-EGFP+ and 15 Tau-EGFP− cells for transcriptome analysis by fluorescence activated cell sorting. We found that Tau-EGFP− cells lacked expression of neuronal Ascl1 target genes (Genes B), and maintained expression of fibroblast-associated genes (Genes A and C; Figure 2a–b, Extended Data Figure 4a, Supplementary Data 4). In addition, we found a positive correlation (R2=0.49) between Ascl1 expression and Tau-EGFP intensities (Extended Data Figure 4b, Figure 2a–b). RT-qPCR and western blot analysis of Ascl1 expression on d5-d12 Tau-EGFP sorted cells validated a significant decrease in Ascl1 expression in Tau-EGFP− cells compared to Tau-EGFP+ cells (Figure 2c, Supplementary Data 5). Thus, Ascl1 expression is correlated to Tau-EGFP levels and expression of neuronal genes at d5. This raises the hypothesis that Ascl1 is silenced in cells that fail to reprogram. Alternatively, cells with low or no Ascl1 expression at d5 and d22 might have never highly expressed Ascl1. To distinguish between these two mechanisms, we used live cell microscopy to track cells over a time course from 3–6 days post Ascl1 induction using an EGFP-Ascl1 fusion construct (Figure 2d, Extended Data Figure 5). We immunostained the cells at d6 using Tuj1 antibodies recognizing the neuronal β-3-tubulin Tubb3 to identify cells that differentiated towards neuronal fate. We found that transgenic Ascl1 protein levels varied substantially over time and, on average, continued to increase over time in Tuj1+ cells, but decreased or plateaued in Tuj1− cells, leading to a significant difference in Ascl1 expression within 6 days of Ascl1 induction (Figure 2e, Extended Data Figure 4c). This time-lapse analysis demonstrated that Ascl1 is silenced in many cells that fail to reprogram.

Figure 2

Transgenic Ascl1 silencing explains early reprogramming failure

(a) Hierarchical clustering of d5 cells using genes correlating and anticorrelating with PC1 and PC2 from PCA of d5 Ascl1-only cells. Note that EGFP fluorescent intensity and Ascl1 mRNA expression shown in the left side bar appear correlated.

(b) Violin plots show the distribution of Ascl1 and neuronal marker Tubb3 in d0 MEFs, as well as Tau-EGFP+ and Tau-EGFP− d5 cells.

(c) qRT-PCR for exogenous Ascl1 expression (top, n=4, biological replicates) and western blot of Ascl1 protein levels (bottom, Supplementary Data 5) for unsorted control MEFs and d2 cells, as well as d5, d7, d10 and d12 cells FAC-sorted using Tau-EGFP as a neuronal marker. Both RNA and protein levels of Ascl1 are significantly higher in Tau-EGFP+ cells, and gradually decrease in Tau-EGFP− cells. (* p< 0.05, ** p < 0.01, *** p < 0.001, two-tailed t-test, error bars: s.e.m.)

(d) Schematic for live cell imaging experiment. CD1 MEFs were infected with an EGFP-Ascl1 construct at −1d, induced with doxycycline at d0, switched to N3 media at d1 and imaged between 3 and 6 days post-doxycycline. Cells were fixed at 6 days and stained for Tubb3 expression.

(e) Average EGFP-Ascl1 intensity (error bars: s.e.m.) was plotted at 45min intervals for Tuj1+ (n=10) and Tuj1− (n=12) cells between d3 and d6. Tuj1+ cells significantly (one-tailed t-test, p<0.05) increased Ascl1 expression through time compared to Tuj1− cells, which appeared to silence Ascl1.

Extended Data Figure 4

Failed reprogramming at d5 correlates with silencing of Ascl1

(a) Bonferroni corrected p-values for gene ontology enrichments are shown for each group of genes from Figure 2a, with representative genes listed (Supplementary Data 4).

(b) Biplot showing Tau-EGFP fluorescence intensity as a function of Ascl1 transcript level in d5 cells. Point size is proportional to EGFP transcript levels in log2(FPKM). There is a positive correlation (R2=0.49) indicating that cells with higher Ascl1 expression are more likely to reprogram.

(c) Heatmap of EGFP-Ascl1 expression in 14 individual cells (columns) during live cell imaging. Rows represent time post Ascl1 induction in 45 min intervals.

Extended Data Figure 5

Live cell imaging shows diminishing of EGFP-Ascl1 signal in cells that fail to reprogram

(b) Representative images from live cell imaging showing an example of diminishing of EGFP signal in a cell that failed to reprogram (i.e. cell was Tuj1-negative at d6).

(d) EGFP imaging of live cells 6 days post induction of Ascl1 and corresponding immunostaining for Tubb3 after fixation.

We next analyzed the maturation events occurring during late reprogramming stages. We performed principal component analysis (PCA) on the single-cell transcriptomes of all reprogramming stages analyzed, including d22 cells reprogrammed with Ascl1 alone or with all three BAM factors (Extended Data Figure 6a). PC1 separated MEFs and early time points (d2, d5) from most of the d22 cells. Surprisingly, PC2 separated most d22 BAM cells from d22 Ascl1-only cells despite robust Tau-EGFP expression in both groups. We used t-distributed Stochastic Neighbor Embedding (t-SNE) to organize all d22 cells into transcriptionally distinct clusters, and identified differentially expressed genes marking each cluster (Figure 3a). We identified 3 clusters, which contained cells expressing neuron (Syp), fibroblast (Eln), or myocyte (Tnnc2) marker genes, respectively (Figure 3b). Consistent with this marker gene expression, cells in each cluster had a maximum correlation with bulk RNA-seq data from either purified neurons, embryonic fibroblasts, or myocytes (Figure 3c). Neuron- and myocyte-like cells expressed a clear signature of each cell type (Figure 3d). Although we observed cells with complex neuronal morphologies in the Ascl1-only reprogramming experiments as we had reported previously[14] (Figure 3e), their frequency was too low to be captured in the single-cell RNA-seq experiments. All of the d22 Ascl1-only cells, and 33% of BAM cells had a highest correlation with myocytes or fibroblasts.

Extended Data Figure 6

Brn2 and Myt1l repress alternative fates that compete with the iN cell fate during advanced Ascl1 reprogramming

(b) Overview of quadratic programming. Fractional identities are calculated assuming a linear combination of different cell fates.

(c) Biplots showing the fractional fibroblast identity as a function of fractional neuron (left) and fractional myocyte (right) identity for each cell with points shaped and colored-coded based on reprogramming timepoint and condition.

(d) Correlation of transcriptomes from d0, d2, d5, and d20/22 cells (Ascl1-only and BAM induced) with bulk RNA-seq from MEFs, cortical neurons and myocytes. Bottom bars show Tau-EGFP fluorescence intensity.

(e) Barplot quantifying the number of cells with a maximum correlation to bulk RNA-seq data from each of the observed fates (panel d).

(f) Immunofluorescent detection of Tau-EGFP (green), DAPI (blue), Myh3 (red) and Tubb3 (cyan) for d22 cells that were infected with Ascl1 co-infected with Brn2 or Myt1l. See Figure 3e for respective data for cells infected with Ascl1-only or all three BAM factors. Images are representative for 4 biological replicates. Right: Mean fractions of EGFP+ cells that express either Tubb3 or Myh3. Only Tubb3+ cells with a neuronal morphology were counted. Co-expression of Ascl1 with Brn2 and/or Myt1l increases fraction of Tau-EGFP+ cells that are also Tubb3+, while decreasing the number of cells that are Myh3+. 6–7 images were analyzed for each of 4 biological replicates. Error bars: s.e.m.

(g–i) qRT-PCR of selected (g) myogenic, (h) neuronal, and (i) fibroblast markers using d22 cells that are infected with Ascl1 only or co-infected with Brn2 or Myt1l or both and FAC-sorted by Tau-EGFP (n=3, biological replicates, error bars: s.e.m.). Myogenic genes were significantly downregulated in TauGFP+ cells that were co-infected with Brn2 and/or Myt1l compared to those infected with Ascl1 alone, while some neuronal genes are significantly upregulated (Map2, Gria) (* p < 0.05, ** p < 0.01, *** p < 0.001, two-tailed t-test).

Figure 3

iN cell maturation competes with an alternative myogenic cell fate that is repressed by Brn2 and Myt1l

(a) tSNE reveals alternative cell fates that emerge during direct reprogramming. Shapes and colors indicate the d20/22 Ascl1-only (dark green) or d22 BAM factor (blue) induced cells. Note that all cells are Tau-EGFP+.

(b–c) tSNE plot from (a) with cells colored based on (b) expression level of marker genes, or (c) correlation with bulk RNA-seq data from different purified cell types (neurons[24], myocytes[25], fibroblasts[13]).

(d) Heatmap showing expression of genes marking the two alternative fates in d20/22 Ascl1-only (upper sidebar, dark green) and d22 BAM (upper sidebar, blue) Tau-EGFP+ cells. Genes (rows) have the highest positive and negative correlation with the first principal component in a PCA analysis on all d20/d22 cells and all genes. Columns represent 121 single cells, ordered based on their correlation coefficient with the first principal component. Lower sidebars: Ascl1 transcript level and Tau-EGFP fluorescence for each cell.

(e) Immunofluorescent detection of Tau-EGFP (green), DAPI (blue), Myh3 (red) and Tubb3 (cyan) for d22 cells infected with Ascl1 alone, or with all BAM factors. Images are representative for 4 biological replicates. Right: Mean fractions of EGFP+ cells that express either Tubb3 or Myh3. Only Tubb3+ cells with a neuronal morphology were counted. 6–7 images were analyzed for each of 4 biological replicates. Error bars: s.e.m..

We applied an analytical technique based on quadratic programming to quantify fate conversion and to predict when during reprogramming the alternative muscle program emerges (Extended Data Figure 6b). This method allowed us to decompose each single cell’s transcriptome and express each cell’s identity as a linear combination of the transcriptomes from the 3 different observed fates (neuron, MEF, myocyte, Supplementary Data 6). Using this method, we observed that there is an initial loss of MEF identity concomitant with an increase in neuronal and myocyte identity over the first five days of Ascl1 reprogramming. The neuronal identity is maintained and matures in d22 cells transduced with BAM (Extended Data Figure 6c). However, the d22 Ascl1-only cells failed to mature to neurons and adopted a predominantly myogenic transcriptional program. This divergence was already apparent in some d5 cells (Extended Data Figure 6d,e). These findings raised the question whether the additional two reprogramming factors Brn2 and Myt1l suppress the aberrant myogenic program. Compatible with this notion, we observed that Brn2 and Myt1l were lowly expressed in the 5 d22 BAM cells that expressed a myogenic program. To directly address this question, we infected MEFs with Ascl1 alone or in combination with Brn2 and/or Myt1l and assessed myogenic and neurogenic fates at d22 based on immunostaining and qRT-PCR (Figure 3e, Extended Data Figure 6f–i). Indeed, myocyte markers (Myh3, Myo18b, Tnnc2) were upregulated in Tau-EGFP positive versus negative cells and were strongly repressed when Brn2 and/or Myt1l was overexpressed together with Ascl1. Moreover, Brn2 and Myt1l enhanced the expression of the synaptic genes Gria2, Nrxn3, Stmn3, and Snap25 but not the immature pan-neuronal genes Tubb3, Map2. As expected, fibroblast markers were repressed in Tau-EGFP+ cells. We next set out to reconstruct the reprogramming path from MEFs to iN cells. By deconstructing heterogeneity at each time point as described above, we removed cells that appeared stalled in reprogramming due to Ascl1 silencing or cells converging on the alternative myogenic fate. We used quadratic programming to order the cells based on fractional similarity to MEF and neuron bulk transcriptomes. This revealed a continuum of intermediate states through the 22-day reprogramming period (Figure 4a,b). Notably, the total number of transcripts per single cell decreased as a function of fractional neuron identity (Extended Data Figure 7a). Our ordering of cells based on fractional identities correlated well with pseudotemporal ordering using Monocle[15], an alternative algorithm for delineating differentiation paths (Extended Data Figure 7b–d). Heatmap visualization of genes identified by PCA of all cells on the iN cell lineage revealed two gene regulatory events during reprogramming with many cells at intermediate stages (Figure 4c, Supplementary Data 7). First, there is an initiation stage where MEFs exit the cell cycle upon Ascl1 induction, and genes involved in mitosis are turned down or off (such as Birc5, Ube2c, Hmga2). Concomitantly, genes associated with cytoskeletal reorganization (Sept3/4, Coro2b, Ank2, Mtap1a, Homer2, Akap9), synaptic transmission (Snca, Stxbp1, Vamp2, Dmpk, Ppp3ca), and neural projections (Cadm1, Dner, Klhl24, Tubb3, Mapt (Tau)) increase in expression. This indicates that Ascl1 induces genes involved in defining neuronal morphology early in the reprogramming process. The initiation phase is followed by a maturation stage whereby MEF extracelluar matrix genes are turned off and genes involved in synaptic maturation are turned on (Syp, Rab3c, Gria2, Syt4, Nrxn3, Snap25, Sv2a). These results are consistent with our previous finding that Tuj1+ cells with immature neuron-like morphology can be found as early as 3 days after Ascl1-induction, while functional synapses are only formed 2 to 3 weeks into the reprogramming process[8]. Finally, we constructed a transcription regulator (TR) network based on pairwise correlation of TR expression across all stages of the MEF-to-iN cell reprogramming. This revealed three densely connected sub-networks identifying TRs regulating MEF cell biology, iN cell initiation, and iN cell maturation, respectively (Figure 4d, Extended Data Figure 8, Supplementary Data 8, SI). Notably, Ascl1 was found to correlate strongly with the TRs in both the initiation and maturation subnetworks and anticorrelate with TRs specific to MEFs. This data corroborates evidence that persistent Ascl1 expression is required to maintain chromatin states conducive to iN cell maturation[13].

Figure 4

Reconstructing the direct reprogramming path from MEFs to iN cells

(a) Top: For each cell on the iN cell reprogramming path, the similarity to bulk RNA-seq from either MEFs[13] or neurons[24] was calculated using quadratic programming and plotted as fractional identities (left axis, circle, fractional MEF identity; right axis, triangle, fractional neuron identity). Points are colored based on the experimental time point. Bottom: Lagrangian residuals of the quadratic programming for each single cells ordered based on their fractional identity as above. Points are colored based on the experimental time point.

(b) Fractional neuron identities of all cells on the iN cell reprogramming path are shown as a function of the experimental time point.

(c) Ordering of single-cells (rows) according to fractional neuron identity revealed a cascade of gene expression changes leading to neuronal identity. Genes (columns) with the highest correlation and anti-correlation to PC1 and PC2 are shown. Left sidebars: Experimental time-point (green/blue) and fractional neuron identity (yellow/red); Right sidebars: Ascl1 transcript levels (log2(FPKM), blue/yellow) and EGFP fluorescence intensities (log10(RFU), black/white).

(d) Transcriptional regulator (TR) covariance network during iN cell lineage progression. Shown are nodes (TRs) with more than 3 edges, with each edge reflecting a correlation >0.25 between connected TRs.

(e) Fractional MEF (left axis) or fractional neural precursor cell (NPC) identities (right axis) are plotted against fractional neuron identity for single cells on the MEF-to-iN cell lineage. Points are shaped based on the experiment.

(f) Expression of selected genes (columns) that mark NPCs, intermediate progenitor cells (IPCs), neurons, or proliferating cells (Prolif.) are shown for cells on the iN cell lineage (rows). Left sidebars: fractional neuron identity (yellow/red) and experimental timepoint (green/blue).

Extended Data Figure 7

Comparison of Monocle and quadratic programming with respect to ordering of neuronal cells through the reprogramming path

(b) Cells (depicted as circles) are arranged in the 2-D independent component space based on the expression of genes used for quadratic programming in Figure 4a. Lines connecting cells represent the edges of a minimal spanning tree with the bold black line indicating the longest path. Time points are color-coded.

(c) Monocle plots with single cells colored based on gene expression that distinguishes the stages of iN cell reprogramming.

(d) Biplot shows the correlation between ordering of cells based on pseudotime (Monocle) and fractional identity (Quadratic programming). Time points are color-coded. Pearson correlation coefficient = 0.91.

Extended Data Figure 8

Neuronal maturation proceeds through expression of distinct transcriptional regulators

(a) Correlogram showing transcriptional regulators (TRs) highly correlated within MEFs as well as the initiation phase and the maturation phase of reprogramming.

(b) Heatmap shows expression of TRs that control the two stages of MEF to iN cell reprogramming (Figure 4d) in cells ordered based on fractional neuron identity. Each row represents a single cell, each column a gene. Experimental time-point (green/blue sidebar) and fractional neuron identity (yellow/red sidebar) are shown to the top.

(c–e) Pseudo-temporal expression dynamics of exemplary TRs marking (c) the initiation stage and (d) the maturation stage of iN cell reprogramming as well as (e) MEF identity. Transcript levels of the TRs are shown across all single cells on the MEF to iN cell lineage ordered based on fractional neuron identity. Growth curves based on a model-free spline method were fitted to the data.

(f) qRT-PCR of selected TRs from initiation and maturation subnetworks from Figure 4d. Uninfected MEF controls and d2-d12 Ascl1-infected cells were assayed for all selected TRs, and d22 Ascl1 alone and BAM infected cells were additionally assayed for maturation TRs. Cells for d5 to d22 samples were FAC-sorted into Tau-EGFP+ and Tau-EGFP− populations (n=4 for all populations, biological replicates, error bars: s.e.m.).

(g) Western blot for selected TRs from the initiation subnetwork presented in panel b. Beta-Actin was used as a loading control (Supplementary Data 8).

It has been suggested that direct somatic lineage reprogramming may not involve an intermediate progenitor cell state as seen during iPS cell differentiation[17,18,19]. However, our fractional analysis showed that the identity of intermediate reprogramming cells could not be explained by a simple linear mixture of the differentiated fibroblast and neuron identities, as revealed by an intermediary increase of Lagrangian residuals (Figure 4a). Therefore, we tested whether a neural precursor cell (NPC) state is transiently induced by adding NPC bulk transcriptome data along with that of MEFs and neurons into the quadratic programming analysis (Figure 4e). We found that the fractional NPC identity of cells increased specifically for cells at intermediate positions on the MEF-to-iN cell lineage path, and then decreased as a function of iN cell maturation. In addition, several NPC genes (i.e. Gli3, Sox9, Nestin, Fabp7, Hes1) are expressed in intermediates of the iN cell reprogramming path[20] (Figure 4f). However, canonical NPC marker genes such as Sox2 and Pax6 were never induced. This indicates that cells do not go through a canonical NPC stage, yet a unique intermediate transcriptional state is induced transiently that is unrelated to donor and target cell program similar to what was observed for iPS cell reprogramming[21-23] A fundamental question in cell reprogramming is whether there are pre-determined mechanisms that prevent the majority of the fibroblasts from reprogramming or whether all donor cells are competent to reprogram but the reprogramming procedure is inefficient. We did not observe any MEF subpopulations, other than cell cycle variation, that suggested differences in the capacity to initiate reprogramming. Furthermore, we observed that 48 hours after infection the majority of the cells induced Ascl1 target genes and silenced MEF-associated genes. This does not preclude the possibility that underlying epigenetic variation in donor cells influences reprogramming outcomes; however, our analysis suggests that it is unlikely that MEF heterogeneity contributes significantly to reprogramming efficiency. We found that divergence from the neuronal differentiation path into an alternative myogenic fate, as well as Ascl1 transgene silencing, were both significant factors contributing to reprogramming efficiency. Though Ascl1 induces lineage conversion, it is inefficient to restrict cells to the neuronal fate. This suggests that intermediate stages of iN cell progression are unstable, perhaps due to epigenetic barriers, and additional factors promote cells to permanently acquire neuron-like identity, rather than revert to MEF-like or diverge toward the alternative myocyte-like fate. In summary, we present a single-cell transcriptomic approach that can be used to dissect direct cellular reprogramming pathways or developmental programs in which cells transform their identity through a series of intermediate states.

Methods

Cell derivation, cell culture and iN cell generation

Tau-EGFP reporter MEFs, tested negative for mycoplasma contamination, were isolated, infected with dox-inducible lentiviral constructs and reprogrammed into iN cells as previously described[8]. Day 0 (d0) cells were uninfected MEFs that served as a negative control. Day 2 (d2) cells were infected with Ascl1 and harvested 2 days after dox-induction. Day 5 (d5) cells were infected with Ascl1, FAC-sorted for Tau-EGFP+ and Tau-EGFP− cells 5 days after dox-induction and the two cell populations were mixed again in a 1:1 ratio. Day 20 or 22 (d20/d22) cells were infected either with Ascl1 alone, or combined with Brn2 and Myt1l, plated with glia 7 days post-dox induction, and FAC-sorted for Tau-EGFP+ iN cells 20 or 22 days after dox-induction. Each of these groups was then loaded onto separate microfluidic mRNA-seq chips for preparation of preamplified cDNA from single cells. Clonal Ascl1-inducible MEFs were derived as previously described[13]. 12-well plates were coated with Matrigel and incubated at 37°C overnight. 350 thousand cells were then plated per well and kept in MEF media. Doxycycline (dox) was added a day after plating. For single-cell RNA-seq, cells were harvested 2 days post-dox induction and loaded onto a microfluidic mRNA-seq chip. To evaluate efficiency in reprogramming, MEF+dox media was switched out for N3+dox media after 48 hrs, and cells were fixed for immunostaining 12 days post-dox.

Capturing of single cells and preparation of cDNA

Single cells were captured on a medium-sized (10–17 μm cell diameter) microfluidic RNA-seq chip (Fluidigm) using the Fluidigm C1 system. Cells were loaded onto the chip at a concentration of 350–500 cells/μl, stained for viability (LIVE/DEAD cell viability assay, Molecular Probes, Life Technologies) and imaged by phase-contrast and fluorescence microscopy to assess number and viability of cells per capture site. For d5 and d22 experiments, cells were only stained with the dead stain ethidium homodimer (emission ~635 nm, red channel) and Tau-EGFP fluorescence was imaged in the green channel. Only single, live cells were included in the analysis. cDNAs were prepared on chip using the SMARTer Ultra Low RNA kit for Illumina (Clontech). ERCC (External RNA Controls Consortium) RNA spike-in Mix (Ambion, Life Technologies)[26,27] was added to the lysis reaction and processed in parallel to cellular mRNA. Tau-EGFP fluorescence intensity of each single cell was determined using CellProfiler[28] by first identifying the outline of the cell in the image of the respective capture site and then integrating over the signal in the EGFP channel.

RNA-seq library construction and cDNA sequencing

Size distribution and concentration of single cell cDNA was assessed on a capillary electrophoresis based fragment analyzer (Advanced Analytical) and only single cells with high quality cDNA were further processed. Sequencing libraries were constructed in 96 well plates using the Illumina Nextera XT DNA Sample Preparation kit according to the protocol supplied by Fluidigm and as described previously[29]. Libraries were quantified by Agilent Bioanalyzer using High Sensitivity DNA analysis kit as well as fluorometrically using Qubit dsDNA HS Assay kits and a Qubit® 2.0 Fluorometer (Invitrogen, Life Technologies). Up to 110 single cell libraries were pooled and sequenced 100 bp paired-end on one lane of Illumina HiSeq 2000 or Illumina NextSeq 500 to a depth of 1–7 million reads. CASAVA 1.8.2 was used to separate out the data for each single cell using unique barcode combinations from the Nextera XT preparation and to generate *.fastq files. In total, the transcriptome of a total of 405 cells was measured from the following 8 independent experiments: d0 (73 cells, 1experiment), d2 (Ascl1-only in regular MEFs, 81 cells, 1 experiment; Ascl1-only in clonal MEFs, 47 cells, 1 experiment), d5 (Ascl1-only, 55 cells, 1 experiment) and d20 (Ascl1-only, 33 cells, 1 experiment) and d22 (BAM, 43 cells, 1 experiment; Ascl1-only, 34 and 39 cells, 2 independent experiments). See Supplementary Data 1 for the transcriptome data for all 405 cells with annotations (quantification in log2(FPKM)).

Processing, analysis and graphic display of single cell RNA-seq data

Raw reads were pre-processed with sequence grooming tools FASTQC[30], cutadapt[31], and PRINSEQ[32] followed by sequence alignment using the Tuxedo suite (Bowtie[33], Bowtie2[34],TopHat[35] and SAMtools[36]) using default settings. Transcript levels were quantified as Fragments Per Kilobase of transcript Per Million mapped reads (FPKM) generated by TopHat/Cufflinks[37]. After 7 days of reprogramming, TauGFP reporter MEFs (with C57BL/6J and 129S4/SvJae background) are co-cultured with glia derived from CD-1 mice. To determine if any feeder cells contaminated the 20–22d time points, we used the single cell RNA-seq reads to identify positions that differ from the mouse reference genome (mm10, built from strain C57BL/6J mice). We used the mpileup fuction in samtools to generate a multi-sample variant call format file (vcf), and a custom python script to genotype the cells by requiring coverage in all cells for all positions, with a coverage depth of 5 reads, a phred GT likelihood == 0 for called genotype and >= 40 for next-best genotype. This resulted in 95 informative sites distinguishing more than 1 cell from the reference genome. We clustered cells based on their genotype (homozygous reference, heterozygous, homozygous alternate), and identified cells that were strongly different from the reference genome. These cells expressed either astrocyte (GFAP) or microglia marker genes suggesting they were contaminants from the feeder cell culture. We removed these cells from subsequent analyses. Approximate number of transcripts was calculated from FPKM values by using the correlation between number of transcripts of exogenous spike-in mRNA sequences and their respective measured mean FPKM values (Extended Data Figure 2). The number of spike-in transcripts per single cell lysis reaction was calculated using the concentration of each spike-in provided by the vendor (Ambion, Life Technologies), the approximate volume of the lysis chamber (10 nl) as well as the dilution of spike-in transcripts in the lysis reaction mix (40,000x). Transcript levels were converted to the log-space by taking the logarithm to the base 2 (Supplementary Data 1). R studio[38] (https://www.rstudio.com/) was used to run custom R[39] scripts to perform principal component analysis (PCA, FactoMineR package), hierarchical clustering (stats package), variance analysis and to construct heatmaps, correlation plots, box plots, scatter plots, violin plots, dendrograms, bar graphs, and histograms. Generally, ggplot2 and gplots packages were used to generate data graphs. The Seurat package[40,41] implemented in R was used to identify distinct cell populations present at d22 of Ascl1-only and BAM reprogramming (Figure 3a–d). T-distributed stochastic neighbor embedding (tSNE) was performed on all d20/d22 cells using the most significant genes (p-value <10−3, with a maximum of 100 genes per principal component) that define the first 3 principal components of a PCA analysis on the data set. To further estimate the identity of each cell on the tSNE plot, we color coded cells based on Pearson correlation of each single cell’s expression profile with the expression profile of bulk cortical neurons [13,24], myocytes[25], and MEFs[13] (Figure 3). The Monocle package[15] was used to order cells on a pseudo-time course during MEF to iN cell reprogramming (Extended Data Figure 7). Covariance network analysis and visualizations were done using igraph implemented in R[42] (http://igraph.sf.net). To generate PCA plots and heatmaps in Figures 1c–e, 2a, 3a and 4c, PCA was performed on cells using all genes expressed in more than two cells and with a variance in transcript level (log2(FPKM)) across all single cells greater than 2. This threshold resulted generally in about 8,000–12,000 genes. Subsequently, genes with the highest PC loadings (highest (top 50–100) positive or negative correlation coefficient with one of the first one to two principal components) were identified and a heatmap was plotted with genes ordered based on their correlation coefficient with the respective PC (Figures 1e, 2a, 4c). Cells in rows were ordered based on unsupervised hierarchical clustering using Pearson correlation as distance metric (Figures 1e, 2a) or based on their fractional identity as determined by quadratic programming (Figure 4c, see below) Gene ontology enrichment analyses were performed using DAVID informatics Resources 6.7 of the National Institute of Allergy and Infectious Diseases [43]. Functional annotation clustering was performed and GO terms representative for top enriched annotation clusters are shown in Figure 1f, Extended Data Figures 1e and 4a with their Bonferroni corrected p-values. In addition, results of GO enrichment analyses are provided in the Supplementary Data. To express a single cell transcriptome as a linear combination of primary cell type transcriptomes, we used published bulk RNA-seq data sets for primary murine neurons[24], myocytes[25], and embryonic fibroblasts[13] (Extended Data Figure 6b,c), neurons[24] and embryonic fibroblasts[13] (Figure 4a) or neurons[24], embryonic fibroblasts[13] and neuronal progenitor cells[13] (Figure 4e). In each quadratic programming analysis, we first identified genes that were specifically (log2-fold change of 3 or higher) expressed in each of the bulk data sets compared to the respective others (Supplementary Data 6). Using these genes, we then calculated the fractional identities of each single cell using quadratic programming (R package “quadprog”). The resulting fractional neuron identities of cells on the MEF-to-iN cell reprogramming path (265 cells in total, excluding cells that were Tau-EGFP-negative at d5 or myocyte- and fibroblast-like cells at d22) were used to order cells in a pseudo-temporal manner (Figure 4a,b,c,e,f). We compared this fractional neuron identity based cell ordering with pseudo-temporal ordering of cells based on Monocle (Extended Data Figure 7b–d), an algorithm that combines differential dimension reduction using independent component analysis with minimal spanning tree construction to link cells along a pseudotemporally ordered path[15]. Monocle analysis was performed using genes differentially expressed between neuron[24] and embryonic fibroblast[13] bulk RNA-seq data (same gene set that was used when calculating fractional neuron and fibroblast identities in Figure 4a, genes listed in Supplementary Data 6). For the transcription factor network analysis (Figure 4d), we computed a pairwise correlation matrix (Pearson correlation, visualized in correlogram in Extended Data Figure 8a) for transcriptional regulators annotated as such in the “Animal Transcription Factor Database” (http://www.bioguo.org/AnimalTFDB/)[44] and identified those transcriptional regulators (TRs) with a Pearson correlation of greater than 0.35 with at least 5 other TRs (82 TRs, shown in Extended Data Figure 8b). We used a permutation approach to determine the probability of finding TRs meeting this threshold by chance. We performed 500 random permutations of the expression matrix of all TRs across cells on the MEF-to-iN cell lineage, and calculated the pairwise correlation matrix for each permutation of the input data frame. All randomized data frames resulted in 0 TRs that met our threshold. This shows that our correlation threshold is strict, and all nodes and connections that we present in the TR network are highly unlikely to be by chance. We used the pairwise correlation matrix for the selected TRs as input into the function graph.adjacency() of igraph implemented in R[42] (http://igraph.sf.net) to generate a weighted network graph, in which the selected TRs are presented as vertices and all pairwise correlations >0.25 are presented as edges linking the respective vertices. The network graph was visualized using the fruchterman-reingold layout and the three clear subnetworks (MEF, initiation, maturation) were manually color-coded. We used Pearson correlation of each single cell expression profile with the expression profile of bulk cortical neurons [13,24], myocytes[25], and MEFs[13] to further estimate the identity of each single cell and to estimate when alternative fates emerge (Figure 3c, Extended Data Figure 6d,e). For this analysis, we considered the same cell type specific gene sets that were used in the quadratic programming analysis, i.e. were genes specifically expressed (log2-fold change of 3 or higher) in a respective bulk RNA-seq data set compared to the others (Supplementary Data 6). To estimate intercellular heterogeneity of d0 MEFs, we calculated the variance for each gene across all MEF cells as well as across mouse embryonic stem cells under 2iLIF culture conditions[45] and across glioblastoma cells[46]. We then plotted the distribution of variances for all genes per cell population as boxplot.

Quantitative RT-PCR and Immunostaining

Ascl1 infected Tau-EGFP reporter MEFs were FAC-sorted 5, 7, 10, 12 or 22 days post-Ascl1 induction with dox. RNA was then extracted from both Tau-EGFP positive and negative populations from each time point, as well as uninfected control MEFs and unsorted d2 Ascll infected MEFs using the TRIzol RNA isolation protocol (Invitrogen, 15596-018). Reverse transcription into cDNA was performed using the SuperScript III First-strand Synthesis System (Invitrogen, 18080-051) and qRT-PCR was performed using Sybr Green (Thermo Fisher Scientific, 4309155). Immunostaining was performed as previously described[8]. Antibodies and qRT-PCR primers are listed in the Supplementary Information.

Time-lapse imaging of Ascl1 expression

MEFs were isolated from E13.5 CD-1 embryos (Charles River) and infected with a dox-inducible, N-terminal-tagged EGFP-Ascl1 fusion construct using the protocol previously described[1]. Cells were plated on 35 cm glass bottom dishes (MatTek), coated with polyorthinine (Sigma P3655) and laminin (Invitrogen 23017-015). Imaging experiments were performed between 3 and 6 days post-dox induction, in a temperature and CO2-controlled chamber. Images were taken for up to 10 positions per dish, for 3 dishes, every 45 minutes with a Zeiss AxioVert 200M microscope with an automated stage using an EC Plan-Neofluar 5x/0.16NA Ph1 objective or an A-plan 10x/0.25NA Ph1 objective. Cells were fixed at 6 days and immunostained using Tuj1 antibodies recognizing the neuronal β-3-tubulin Tubb3 (Covance MRB-435P) to confirm neuronal identity. We used ImageJ to segment individual cells and measure the level of GFP for 7 Tuj1+ cells and 7 Tuj1− cells over time. Average intensity was obtained by normalizing the average intensity of a cell segment by the average background intensity of an adjacent segment of the same size. A t-test was performed comparing Tuj1+ and Tuj1− cells at each time-point to evaluate significance.

Antibodies

Rabbit anti-Ascl1 (Abcam ab74065), chicken anti-GFP (Abcam ab13970), rabbit anti-Tubb3 (Covance MRB-435P), mouse anti-Tubb3 (Covance MMS-435P), mouse anti-Map2 (Sigma M4403), rabbit anti-Myh3 (Santa Cruzsc-20641), goat anti-Dlx3 (Santa Cruz sc-18143), mouse anti-β-Actin (Sigma A5441), rabbit anti-Tcf12 (Bethyl A300-754A).

Primers

Gapdh (forward: AGGTCGGTGTGAACGGATTTG, reverse: TGTAGACCATGTAGTTGAGGTCA); Ascl1 (TetO) (forward: CCGAATTCGCTAGCCACCAT, reverse: AAGAAGCAGGCTGCGGG) Atoh8 (forward: GCCAAGAAACGGAAGGAGTGA, reverse: CTGAGAGATGGTACACGGGC); Dlx3 (forward: CGCCGCTCCAAGTTCAAAAA, reverse: GTGGTACCAGGAGTTGGTGG); Hes6 (forward: TACCGAGGTGCAGGCCAA, reverse: AGTTCAGCTGAGACAGTGGC); Sox11 (forward: CCTGTCGCTGGTGGATAAGG, reverse: CTGCGCCTCTCAATACGTGA); Sox9 (forward: CGAGCACTCTGGGCAATCTCA, reverse: ATGACGTCGCTGCTCAGTTC); Tcf4 (forward: CAGTGCGATGTTTTCGCCTC, reverse: ATGTGACCCAAGATCCCTGC); Tcf12 (forward: GTCTCGAATGGAAGACCGCT, reverse: GTTCCGACCATCGAAGCTGA) Camta1 (forward: CCCCTAAGACAAGACCGCAG, reverse: ACATAGCAGCCGTACAAGCA); Insm1 (forward: GACCCGGCACATCAACAAGT, reverse: GAAGCGAAGCGAAGAGGACA); Myt1l (forward: ATGTTCCCACAACCACACCA, reverse: TACCGCTTGGCATCGTCATA); St18 (forward: TGCCAAGGGAGCTGAGATAGA, reverse: GAAGGCTGCTTGCGTTGAAT) Gria2 (forward: GGGGACAAGGCGTGGAAATA, reverse: GTACCCAATCTTCCGGGGTC); Map2 (forward: CAGAGAAACAGCAGAGGAGGT, reverse: TTTGTTCTGAGGCTGGCGAT); Nrxn3 (forward: TGTGAACCAAGTACAGATAAGAGT, reverse: CAGCTCAGGGGACAAAGAGG); Snap25 (forward: TTCATCCGCAGGGTAACAAA, reverse: GTTGCACGTTGGTTGGCTT); Stmn3 (forward: AGCACCGTATCTGCCTACAAG, reverse: TGGTAGATGGTGTTCGGGTG); Tubb3 (forward: CAGATAGGGGCCAAGTTCTGG, reverse: GTTGTCGGGCCTGAATAGGT) Acta1 (forward: CTAGACACCATGTGCGACGA, reverse: CATACCTACCATGACACCCTGG); Myh3 (forward: AAATGAAGGGGACGCTGGAG, reverse: CAGCTGGAAGGTGACTCTGG); Myo18b (forward: TGCCCTCTTCAGGGAAGGTA, reverse: GAGCTTCTCCACTGACACCC); Tnnc2 (forward: CAACCATGACGGACCAACAG, reverse: GTGTCTGCCCTAGCATCCTC) Col1a2 (forward: AGTCGATGGCTGCTCCAAAA, reverse: ATTTGAAACAGACGGGGCCA); Dcn (forward: GCAAAATCAGTCCAGAGGCA, reverse: CGCCCAGTTCTATGACAAGC)

The majority of MEFs are actively undergoing cell cycle, but exit cell cycle upon Ascl1 induction

(a) Live cell imaging of Tau-EGFP reporter over the course of BAM-mediated iN cell reprogramming. Tau-EGFP fluorescence normalized to the maximum expression is shown in relation to days post-BAM induction. Tau-EGFP expression began at d5 and reached a peak at d8 after induction. Shown are representative images from d0, d5 and d9 (b) Boxplots of intercellular transcriptome variance showed that MEFs are more heterogeneous than mouse embryonic stem cells under 2iLIF culture conditions[45] and less heterogeneous than glioblastoma cells[46]. (c) PCA of genes with most variance in d0 MEFs revealed MEF heterogeneity (blue, A). Density plot showing the distribution of number of cells along PC1 loading is shown above the PCA plot. (d) Heatmap and hierarchical clustering of genes used for the PCA in panel (c) shows 2 major MEF subpopulations. Each column represents a single cell, and each row a gene. Subpopulation A is highlighted in blue in the dendrogram. (e) GO enrichment for genes in (c) shows that MEF subpopulation A is distinguished by the low or lack of expression of genes enriched for cell cycle terms. (f–g) PCA and heatmap of the same genes used in panels (c–e), this time including d0 MEFs (circles, light green) and d2 cells (squares, dark green), showed that most of the d2 cells had the same cell cycle signature as MEF subpopulation A. Cells in columns of both heatmaps are ordered based on PC1 loading.

Total number of transcripts per cell decreases during MEF-to-iN cell reprogramming

(a) Average detected transcript levels (mean FPKM, log2) for 92 ERCC RNA spike-ins as a function of provided number of molecules per lysis reaction for each of the 8 independent single-cell RNA-seq experiments. Linear regression fits through data points are shown. The length of each ERCC RNA spike-in transcript is encoded in the size of the data points. No particular bias towards the detection of shorter versus longer transcripts is observed. The linear regression fit was used to convert FPKM values to approximate number of transcripts. (b) Boxplots showing the distribution of the total number of transcripts per single cell for each experiment. Number of transcripts per cell were calculated from the FPKM values of all genes in each cell using the correlation between number of transcripts of exogenous spike-in mRNA sequences and their respective measured mean FPKM values (calibration curves are shown in panel a). The total number of transcripts expressed by a single cell and detected by single-cell RNA-seq is highest in MEFs and is more than 2-fold decreased upon overexpression of Ascl1 or BAM. (c) Boxplots showing the distribution of the median transcript number per gene across all cells of one experiment. The distributions are similar over the course of iN cell reprogramming.

Clonal MEFs reprogram successfully into iN cells and Ascl1-only and BAM induce similar responses during early iN cell reprogramming

(a) Immunostaining of heterogenous Ascl1-infected MEFs and clonal MEFs with homogenous Ascl1 transgene insertions, fixed 12 days after Ascl1 induction, using rabbit anti-Tubb3 (red) and mouse anti-Map2 (cyan) antibodies and dapi (blue) as a nuclear stain. Reprogramming efficiencies are comparable regardless of variation in Ascl1 copy numbers. Images are representative for one reprogramming experiment. (b) Barplots showing expression of Ascl1 target genes (Hes6, Zfp238, Snca, Cox8b, Bex1, Dner) and MEF marker genes averaged across single cells from d0 MEFs and d2 Ascl1-only cells, as well as from bulk RNA-seq data from MEFs, d2 BAM, and d2 Ascl1-only cells. This data shows that the initiation of reprogramming at d2 is similar for Ascl1 alone- and BAM-mediated reprogramming.

Failed reprogramming at d5 correlates with silencing of Ascl1

(a) Bonferroni corrected p-values for gene ontology enrichments are shown for each group of genes from Figure 2a, with representative genes listed (Supplementary Data 4). (b) Biplot showing Tau-EGFP fluorescence intensity as a function of Ascl1 transcript level in d5 cells. Point size is proportional to EGFP transcript levels in log2(FPKM). There is a positive correlation (R2=0.49) indicating that cells with higher Ascl1 expression are more likely to reprogram. (c) Heatmap of EGFP-Ascl1 expression in 14 individual cells (columns) during live cell imaging. Rows represent time post Ascl1 induction in 45 min intervals.

Live cell imaging shows diminishing of EGFP-Ascl1 signal in cells that fail to reprogram

(a) Immunostaining for Tubb3 and Map2 at d12 post induction of Ascl1, C-terminal tagged Ascl1-EGFP and N-terminal tagged EGFP-Ascl1 in CD-1 MEFs. EGFP-Ascl1 has comparable reprogramming efficiency with untagged Ascl1 while Ascl1-EGFP has a much reduced reprogramming efficiency, so EGFP-Ascl1 was chosen for live-cell imaging. Images are representative for one reprogramming experiment per condition. (b) Representative images from live cell imaging showing an example of diminishing of EGFP signal in a cell that failed to reprogram (i.e. cell was Tuj1-negative at d6). (c) Live cell imaging of EGFP signal of EGFP-Ascl1 infected MEFs between 3–6 days post-dox induction. (d) EGFP imaging of live cells 6 days post induction of Ascl1 and corresponding immunostaining for Tubb3 after fixation.

Brn2 and Myt1l repress alternative fates that compete with the iN cell fate during advanced Ascl1 reprogramming

(a) Scatterplot showing PC1 and PC2 loadings from principal component analysis (PCA) of single cells from all time points with experimental time point and reprogramming condition (Ascl1 vs BAM) encoded in point shape and color. (b) Overview of quadratic programming. Fractional identities are calculated assuming a linear combination of different cell fates. (c) Biplots showing the fractional fibroblast identity as a function of fractional neuron (left) and fractional myocyte (right) identity for each cell with points shaped and colored-coded based on reprogramming timepoint and condition. (d) Correlation of transcriptomes from d0, d2, d5, and d20/22 cells (Ascl1-only and BAM induced) with bulk RNA-seq from MEFs, cortical neurons and myocytes. Bottom bars show Tau-EGFP fluorescence intensity. (e) Barplot quantifying the number of cells with a maximum correlation to bulk RNA-seq data from each of the observed fates (panel d). (f) Immunofluorescent detection of Tau-EGFP (green), DAPI (blue), Myh3 (red) and Tubb3 (cyan) for d22 cells that were infected with Ascl1 co-infected with Brn2 or Myt1l. See Figure 3e for respective data for cells infected with Ascl1-only or all three BAM factors. Images are representative for 4 biological replicates. Right: Mean fractions of EGFP+ cells that express either Tubb3 or Myh3. Only Tubb3+ cells with a neuronal morphology were counted. Co-expression of Ascl1 with Brn2 and/or Myt1l increases fraction of Tau-EGFP+ cells that are also Tubb3+, while decreasing the number of cells that are Myh3+. 6–7 images were analyzed for each of 4 biological replicates. Error bars: s.e.m. (g–i) qRT-PCR of selected (g) myogenic, (h) neuronal, and (i) fibroblast markers using d22 cells that are infected with Ascl1 only or co-infected with Brn2 or Myt1l or both and FAC-sorted by Tau-EGFP (n=3, biological replicates, error bars: s.e.m.). Myogenic genes were significantly downregulated in TauGFP+ cells that were co-infected with Brn2 and/or Myt1l compared to those infected with Ascl1 alone, while some neuronal genes are significantly upregulated (Map2, Gria) (* p < 0.05, ** p < 0.01, *** p < 0.001, two-tailed t-test).

Comparison of Monocle and quadratic programming with respect to ordering of neuronal cells through the reprogramming path

(a) Biplot showing the total number of transcripts per cell for all cells on the MEF-to-iN cell lineage as a function of the fraction neuron identity of each cell (see Figure 4). The total number of transcripts decreases during the reprogramming process. (b) Cells (depicted as circles) are arranged in the 2-D independent component space based on the expression of genes used for quadratic programming in Figure 4a. Lines connecting cells represent the edges of a minimal spanning tree with the bold black line indicating the longest path. Time points are color-coded. (c) Monocle plots with single cells colored based on gene expression that distinguishes the stages of iN cell reprogramming. (d) Biplot shows the correlation between ordering of cells based on pseudotime (Monocle) and fractional identity (Quadratic programming). Time points are color-coded. Pearson correlation coefficient = 0.91.

Neuronal maturation proceeds through expression of distinct transcriptional regulators

(a) Correlogram showing transcriptional regulators (TRs) highly correlated within MEFs as well as the initiation phase and the maturation phase of reprogramming. (b) Heatmap shows expression of TRs that control the two stages of MEF to iN cell reprogramming (Figure 4d) in cells ordered based on fractional neuron identity. Each row represents a single cell, each column a gene. Experimental time-point (green/blue sidebar) and fractional neuron identity (yellow/red sidebar) are shown to the top. (c–e) Pseudo-temporal expression dynamics of exemplary TRs marking (c) the initiation stage and (d) the maturation stage of iN cell reprogramming as well as (e) MEF identity. Transcript levels of the TRs are shown across all single cells on the MEF to iN cell lineage ordered based on fractional neuron identity. Growth curves based on a model-free spline method were fitted to the data. (f) qRT-PCR of selected TRs from initiation and maturation subnetworks from Figure 4d. Uninfected MEF controls and d2-d12 Ascl1-infected cells were assayed for all selected TRs, and d22 Ascl1 alone and BAM infected cells were additionally assayed for maturation TRs. Cells for d5 to d22 samples were FAC-sorted into Tau-EGFP+ and Tau-EGFP− populations (n=4 for all populations, biological replicates, error bars: s.e.m.). (g) Western blot for selected TRs from the initiation subnetwork presented in panel b. Beta-Actin was used as a loading control (Supplementary Data 8).

40 in total

1. Synthetic spike-in standards for RNA-seq experiments.

Authors: Lichun Jiang; Felix Schlesinger; Carrie A Davis; Yu Zhang; Renhua Li; Marc Salit; Thomas R Gingeras; Brian Oliver
Journal: Genome Res Date: 2011-08-04 Impact factor: 9.043

2. Fast gapped-read alignment with Bowtie 2.

Authors: Ben Langmead; Steven L Salzberg
Journal: Nat Methods Date: 2012-03-04 Impact factor: 28.547

3. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma.

Authors: Anoop P Patel; Itay Tirosh; John J Trombetta; Alex K Shalek; Shawn M Gillespie; Hiroaki Wakimoto; Daniel P Cahill; Brian V Nahed; William T Curry; Robert L Martuza; David N Louis; Orit Rozenblatt-Rosen; Mario L Suvà; Aviv Regev; Bradley E Bernstein
Journal: Science Date: 2014-06-12 Impact factor: 47.728

4. Modeling human disease with pluripotent stem cells: from genome association to function.

Authors: Florian T Merkle; Kevin Eggan
Journal: Cell Stem Cell Date: 2013-06-06 Impact factor: 24.633

Review 5. Brains in metamorphosis: reprogramming cell identity within the central nervous system.

Authors: Paola Arlotta; Benedikt Berninger
Journal: Curr Opin Neurobiol Date: 2014-05-04 Impact factor: 6.627

6. Hierarchical mechanisms for direct reprogramming of fibroblasts to neurons.

Authors: Orly L Wapinski; Thomas Vierbuchen; Kun Qu; Qian Yi Lee; Soham Chanda; Daniel R Fuentes; Paul G Giresi; Yi Han Ng; Samuele Marro; Norma F Neff; Daniela Drechsel; Ben Martynoga; Diogo S Castro; Ashley E Webb; Thomas C Südhof; Anne Brunet; Francois Guillemot; Howard Y Chang; Marius Wernig
Journal: Cell Date: 2013-10-24 Impact factor: 41.582

7. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells.

Authors: Daniel Ramsköld; Shujun Luo; Yu-Chieh Wang; Robin Li; Qiaolin Deng; Omid R Faridani; Gregory A Daniels; Irina Khrebtukova; Jeanne F Loring; Louise C Laurent; Gary P Schroth; Rickard Sandberg
Journal: Nat Biotechnol Date: 2012-08 Impact factor: 54.908

8. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

Authors: Cole Trapnell; Brian A Williams; Geo Pertea; Ali Mortazavi; Gordon Kwan; Marijke J van Baren; Steven L Salzberg; Barbara J Wold; Lior Pachter
Journal: Nat Biotechnol Date: 2010-05-02 Impact factor: 54.908

9. TopHat: discovering splice junctions with RNA-Seq.

Authors: Cole Trapnell; Lior Pachter; Steven L Salzberg
Journal: Bioinformatics Date: 2009-03-16 Impact factor: 6.937

10. CellProfiler: image analysis software for identifying and quantifying cell phenotypes.

Authors: Anne E Carpenter; Thouis R Jones; Michael R Lamprecht; Colin Clarke; In Han Kang; Ola Friman; David A Guertin; Joo Han Chang; Robert A Lindquist; Jason Moffat; Polina Golland; David M Sabatini
Journal: Genome Biol Date: 2006-10-31 Impact factor: 13.583

163 in total

1. Generation of inner ear hair cells by direct lineage conversion of primary somatic cells.

Authors: Louise Menendez; Talon Trecek; Suhasni Gopalakrishnan; Litao Tao; Alexander L Markowitz; Haoze V Yu; Xizi Wang; Juan Llamas; Chichou Huang; James Lee; Radha Kalluri; Justin Ichida; Neil Segil
Journal: Elife Date: 2020-06-30 Impact factor: 8.140

Review 2. Using Large Datasets to Understand Nanotechnology.

Authors: Kalina Paunovska; David Loughrey; Cory D Sago; Robert Langer; James E Dahlman
Journal: Adv Mater Date: 2019-08-20 Impact factor: 30.849

3. Reconstructing complex lineage trees from scRNA-seq data using MERLoT.

Authors: R Gonzalo Parra; Nikolaos Papadopoulos; Laura Ahumada-Arranz; Jakob El Kholtei; Noah Mottelson; Yehor Horokhovsky; Barbara Treutlein; Johannes Soeding
Journal: Nucleic Acids Res Date: 2019-09-26 Impact factor: 16.971

4. Rapid Chromatin Switch in the Direct Reprogramming of Fibroblasts to Neurons.

Authors: Orly L Wapinski; Qian Yi Lee; Albert C Chen; Rui Li; M Ryan Corces; Cheen Euong Ang; Barbara Treutlein; Chaomei Xiang; Valérie Baubet; Fabian Patrik Suchy; Venkat Sankar; Sopheak Sim; Stephen R Quake; Nadia Dahmane; Marius Wernig; Howard Y Chang
Journal: Cell Rep Date: 2017-09-26 Impact factor: 9.423

5. Continuous-state HMMs for modeling time-series single-cell RNA-Seq data.

Authors: Chieh Lin; Ziv Bar-Joseph
Journal: Bioinformatics Date: 2019-11-01 Impact factor: 6.937

Review 6. Cycling through developmental decisions: how cell cycle dynamics control pluripotency, differentiation and reprogramming.

Authors: Abdenour Soufi; Stephen Dalton
Journal: Development Date: 2016-12-01 Impact factor: 6.868

7. Deconstructing Olfactory Stem Cell Trajectories at Single-Cell Resolution.

Authors: Russell B Fletcher; Diya Das; Levi Gadye; Kelly N Street; Ariane Baudhuin; Allon Wagner; Michael B Cole; Quetzal Flores; Yoon Gi Choi; Nir Yosef; Elizabeth Purdom; Sandrine Dudoit; Davide Risso; John Ngai
Journal: Cell Stem Cell Date: 2017-05-11 Impact factor: 24.633

8. SingleCellNet: A Computational Tool to Classify Single Cell RNA-Seq Data Across Platforms and Across Species.

Authors: Yuqi Tan; Patrick Cahan
Journal: Cell Syst Date: 2019-07-31 Impact factor: 10.304

9. Targeted Epigenetic Remodeling of Endogenous Loci by CRISPR/Cas9-Based Transcriptional Activators Directly Converts Fibroblasts to Neuronal Cells.

Authors: Joshua B Black; Andrew F Adler; Hong-Gang Wang; Anthony M D'Ippolito; Hunter A Hutchinson; Timothy E Reddy; Geoffrey S Pitt; Kam W Leong; Charles A Gersbach
Journal: Cell Stem Cell Date: 2016-08-11 Impact factor: 24.633

10. Combined Numerical and Experimental Investigation of Localized Electroporation-Based Cell Transfection and Sampling.

Authors: Prithvijit Mukherjee; S Shiva P Nathamgari; John A Kessler; Horacio D Espinosa
Journal: ACS Nano Date: 2018-11-27 Impact factor: 15.881