Literature DB >> 30673604

Transcriptional Heterogeneity in Naive and Primed Human Pluripotent Stem Cells at Single-Cell Resolution.

Tobias Messmer¹, Ferdinand von Meyenn², Aurora Savino³, Fátima Santos³, Hisham Mohammed³, Aaron Tin Long Lun⁴, John C Marioni⁵, Wolf Reik⁶.

Abstract

Conventional human embryonic stem cells are considered to be primed pluripotent but can be induced to enter a naive state. However, the transcriptional features associated with naive and primed pluripotency are still not fully understood. Here we used single-cell RNA sequencing to characterize the differences between these conditions. We observed that both naive and primed populations were mostly homogeneous with no clear lineage-related structure and identified an intermediate subpopulation of naive cells with primed-like expression. We found that the naive-primed pluripotency axis is preserved across species, although the timing of the transition to a primed state is species specific. We also identified markers for distinguishing human naive and primed pluripotency as well as strong co-regulatory relationships between lineage markers and epigenetic regulators that were exclusive to naive cells. Our data provide valuable insights into the transcriptional landscape of human pluripotency at a cellular and genome-wide resolution. Crown

Entities: Chemical

Keywords: early embryonic development; evolution; heterogeneity; human embryonic stem cells; naive; pluripotency; primed; single-cell RNA-seq

Mesh：

Year: 2019 PMID： 30673604 PMCID： PMC6344340 DOI： 10.1016/j.celrep.2018.12.099

Source DB: PubMed Journal: Cell Rep Impact factor: 9.423

Introduction

Human and mouse embryonic stem cells (ESCs) are both derived from the inner cell mass (ICM) of the pre-implantation epiblast but differ in transcriptomic, epigenetic, and morphological features that correspond to consecutive stages of ontogeny. Mouse ESCs (mESCs) are marked by early developmental characteristics such as expression of the core pluripotency network, including Oct4, Klf4, or Dppa3; the activity of both X chromosomes in females; global DNA hypomethylation; and apolar morphology of the dome-shaped mESC colonies and, therefore, show the characteristics of naive pluripotency (Boroviak and Nichols, 2017). In contrast, primed or conventional human ESCs (hESCs) are developmentally more advanced and resemble murine post-implantation epiblast or mouse epiblast stem cells, thus they are considered to be primed pluripotent (Brons et al., 2007, Tesar et al., 2007). Several groups have aimed to capture naive pluripotency in humans and to establish culture conditions closely recapitulating the signature of human ICM cells. These studies attempted to induce a naive state in hESCs by reprogramming primed hESCs with cytokines or small molecules (Gafni et al., 2013, Hanna et al., 2010, Takashima et al., 2014, Theunissen et al., 2014, Ware et al., 2014) or by directly culturing hESCs isolated from pre-implantation ICM cells under conditions that favor naive stemness (Guo et al., 2016). Among these, the stimulation of NANOG and KLF2 expression in 2 inhibitors (PD0325901 and CHIR99021) + Leukemia inhibitory factor (2i+Lif) conditions (inhibition of mitogen-activated protein extracellular signal-regulated kinase [ERK] and glycogen synthase kinase-3 beta) and subsequent restriction of protein kinase C (PKC) activity yielded hESCs with a close resemblance to the human blastocyst (Guo et al., 2017, Huang et al., 2014, Takashima et al., 2014). These reprogrammed naive hESCs express naive pluripotency markers, including OCT4, SOX2, and KLF2 and KLF4 (Boroviak and Nichols, 2017), and their metabolic and epigenetic profiles resemble the phenotype of mESCs rather than the primed state of conventional hESCs (Takashima et al., 2014). There is still incomplete understanding of the transcriptional features that drive naive and primed pluripotency in ESCs (Ware, 2017, Weinberger et al., 2016). Studies exploring transcriptional identity and heterogeneity in mESCs have found significant variability associated with different states of pluripotency (Klein et al., 2015, Kolodziejczyk et al., 2015, Kumar et al., 2014). In a recent in vivo study of early mouse development (Mohammed et al., 2017), transcriptional noise was suggested to contribute to cell fate decision-making. However, although certain key pluripotency genes are much less variably expressed in the naive state (e.g., NANOG), single-cell RNA sequencing (scRNA-seq) suggests that overall heterogeneity in gene expression in mESC lines is independent of the respective culture condition and pluripotency state (Kolodziejczyk et al., 2015). Our understanding of in vivo lineage commitment in humans is much more limited. By studying transcriptional profiles of developmental stages embryonic day 3 (E3) to E7 of human preimplantation embryos, the first lineage decisions between trophectoderm, primitive endoderm, and epiblast have been described (Petropoulos et al., 2016, Stirparo et al., 2018). Furthermore, a recent study has investigated the primed-to-naive cellular state transition process and found that genes related to hemogenic endothelium development were overrepresented in naive hESCs, resulting in higher differentiation potency into hematopoietic lineages (Han et al., 2018). Nonetheless, the extent and details of hESC heterogeneity have not been systematically characterized, and it is unclear whether the variability in gene expression is important for differentiation. To address these questions, we performed scRNA-seq of primed hESCs and reprogrammed naive hESCs to investigate the heterogeneity within each subpopulation and to compare their molecular phenotypes with in vivo transcriptome studies of embryogenesis.

Results

We assayed the transcriptomes of single primed and naive hESCs (WiCell WA09-NK2) to investigate gene expression heterogeneity and to identify potential subpopulations within different human pluripotency states. In total, we collected 480 hESCs grown under naïve titrated 2 inhibitors (PD0325901 and CHIR99021) + Leukemia inhibitory factor + inhibitor Gö6983 (t2iL+Gö) conditions (Takashima et al., 2014) and 480 hESCs grown under primed (E8) culture conditions (Chen et al., 2011). Single cells were separated and collected using fluorescence-activated cell sorting (FACS), and full-length cDNAs were prepared using the switch mechanism at the 5′ end of RNA templates (Smart-seq2) protocol (Picelli et al., 2014), followed by Nextera XT library preparation (Figure 1A). We removed low-quality cells and normalized for cell-specific bias prior to further analyses (STAR Methods; Figure S1A).

Figure 1

Naive and Primed Human ESCs Exhibit Strong Differences in Gene Expression

(A) Naive and primed human ESCs were cultured in N2B27 supplemented with t2iL+Gö or in E8 medium, dissociated into single cells, and sorted into 96-well plates loaded with RLT lysis buffer and External RNA Controls Consortium (ERCC) spike-ins. RNA-seq libraries were prepared using the SmartSeq2 protocol and submitted for sequencing.

(B) PCA plot of hESC expression profiles, constructed from batch-corrected and normalized log expression values of highly variable genes detected across the entire dataset. Cells are colored by their condition, and the percentage of variance explained by the first two principal components is shown.

(C) Smear plot of log2-fold changes in expression between the naive and primed conditions, where differential expression (DE) genes were detected using edgeR at a false discovery rate (FDR) of 5%.

Naive and Primed hESCs Form Distinct Phenotypic Clusters

To confirm that scRNA-seq can recapitulate known differences between naive and primed conditions, we performed dimensionality reduction on all cells in the dataset using principal-component analysis (PCA) on highly variable genes (STAR Methods). We observed strong separation between naive and primed cells on the first principal component (Figure 1B), indicating that the difference between conditions is the dominant factor of variation. Differential expression analysis between naive and primed conditions identified a number of genes that were strongly upregulated under each condition (Figure 1C). This included the previously reported naive pluripotency and ground state marker genes KLF17, DPPA5, DNMT3L, GATA6, TBX3, IL6ST, DPPA3, and KLF5 (Blakeley et al., 2015, Dunn et al., 2014, Guo et al., 2017, Shahbazi et al., 2016, Theunissen et al., 2016, Yan et al., 2013). Although KLF4 has been described as a marker for both naive and primed cells (Ware, 2017), we only observed its expression in naive hESCs, consistent with other studies (Weinberger et al., 2016). In primed hESCs, we observed upregulation of established marker genes of primed pluripotency, such as CD24, ZIC2, and SFRP2, but not OTX2 or TFT (Buecker et al., 2014, Guo et al., 2016, Shakiba et al., 2015). Shared pluripotency markers, including SOX2, OCT4, and NANOG, did not significantly differ between the naive and primed population. We also identified additional (and only recently suggested; Collier et al., 2017) markers of naive and primed hESCs (Table 1; Figure S1B; Table S1). The naive markers included genes that have been implicated in germ cell function (e.g., HORMAD1 for meiotic progression; Chen et al., 2005); KHDC3L as a regulator of imprinting (Parry et al., 2011); the alkaline phosphatases ALPP and ALPPL2, which are generally used as markers of pluripotent cells (Martí et al., 2013); as well as putative regulatory genes such as ZNF729. Some of these are also expressed in the early embryo; e.g., TRIM60 (Choo et al., 2002) and HORMAD1 (Chen et al., 2005). Primed markers included a number of genes related to later developmental stages; e.g., SOX11 for neuronal development (Bergsland et al., 2006), CYTL1 for chondrogenesis and expressed at implantation (Ai et al., 2016), HMX2 (an NK-like [NKL] homeobox gene) for organogenesis (Wang et al., 2001), and THY1 for hematopoietic stem cells (Majeti et al., 2007). We also found regulators of key signaling pathways, such as DUSP6 (a negative regulator of mitogen-activated protein kinase [MAPK] signaling) (Muda et al., 1996) and the receptor tyrosine phosphatase PTPRZ1 (Levy et al., 1993). We validated a number of these genes at the protein level using proteomics (Figure S1C) and in bulk RNA-seq data of the hESC lines UCLA1, WIBR3, and SHEF6 under naive and primed conditions (Table 1; Figure S1D; Pastor et al., 2016, Theunissen et al., 2016, Guo et al., 2017).

Table 1

Markers of Naive and Primed Pluripotency in hESCs

Naive	logFC (WA09-NK2)	logFC∗ (UCLA1)a	logFC∗ (WIBR3)b	logFC∗ (SHEF6)c	Primed	logFC∗ (WA09-NK2)	logFC∗ (UCLA1)a	logFC∗ (WIBR3)b	logFC∗ (SHEF6)c
KHDC1L	14.49	8.82	13.23	9.36	DUSP6	−13.88	−6.29	−9.69	−7.09
FAM151A	14.13	7.57	10.49	9.20	FAT3	−9.80	−8.44	−10.76	−7.65
HORMAD1	15.03	7.09	10.94	10.25	THY1	−13.60	−8.78	−8.47	−6.26
ALPPL2	20.17	7.32	11.26	8.82	STC1	−11.82	−8.79	−12.14	−7.57
ZNF729	14.07	4.25	11.15	4.83	KLHL4	−12.58	−7.10	−13.17	−5.09
KHDC3L	19.58	6.88	12.78	9.54	ZDHHC22	−15.53	−9.02	−7.86	−9.44
TRIM60	18.53	7.90	12.19	8.33	NEFM	−13.20	−4.64	−7.33	−5.26
MEG8	17.57	8.23	8.89	not DE	HMX2	−10.58	not DE	not DE	not DE
OLAH	10.80	7.58	12.87	7.97	PLA2G3	−15.29	−6.31	−5.74	−8.07
LYZ	17.31	5.47	7.94	4.70	PTPRZ1	−10.55	−8.33	−12.72	−9.63
HYAL4	17.01	5.92	9.13	5.66	CYTL1	−14.54	−7.60	−9.43	−9.09
ALPP	16.58	4.55	10.03	9.29	SOX11	−10.20	−6.33	−8.60	−4.97

Log fold change between the primed and naive population; adjusted p < 0.005 for all shown DE genes.

Pastor et al. (2016)

Theunissen et al. (2016)

Guo et al. (2017)

Markers of Naive and Primed Pluripotency in hESCs Log fold change between the primed and naive population; adjusted p < 0.005 for all shown DE genes. Pastor et al. (2016) Theunissen et al. (2016) Guo et al. (2017)

Identification of a Subcluster in the Naive hESCs Population

We observed a small group of naive cells between the main naive and primed clusters (Figure 1B). We identified these cells by hierarchical clustering within the naive population, yielding a separate cluster of 9 cells. Despite being labeled as naive, this cluster was distinguishable from the other cells in the naive population as well as from the primed population (Figure 2A). These cells expressed some naive markers (DPPA3 and TFCP2L1; Figure 2B) but also exhibited primed-like characteristics (downregulation of KLF4 and KLF7) (Figure S2A); thus we labeled them “intermediate.” This subpopulation does not consist of doublets from the single-cell sorting procedure because they uniquely express genes that are absent in the primed population and other naive cells.

Figure 2

The Naive Subpopulation Is Transcriptionally Distinct from the Other Naive and Primed Cells

(A) Heatmap of the top 50 genes with the strongest differential expression between the naive and intermediate cells (top) or between the intermediate and primed cells (bottom). The box for each cell (column) and gene (row) is colored according to the log2-fold change from the average expression for each gene.

(B) Log2 expression profiles of selected marker genes across cells in the naive, intermediate, and primed populations. Each point represents a cell in the corresponding population.

(C) Normalized protein expression of DPPA5 against ABCG2 in naive and primed hESCs. Protein expression was determined using immunofluorescence staining of cytospin-fixed cells.

(D) Representative immunofluorescence images of naive and primed hESCs using DPPA5 and ABCG2 antibodies. The scale bar represents 20 μm.

Subclusters with Lineage-Specific Gene Expression Profiles Are Not Present in Naive or Primed hESCs

To study transcriptional heterogeneity within the naive and primed conditions, we applied t-distributed stochastic neighbor embedding (t-SNE) (van der Maaten and Hinton, 2008) to the cells under each condition after removing all intermediate population cells from the naive condition. We did not observe any distinct clustering within each condition; instead, the major driver of heterogeneity in each condition was the cell cycle (Figure S3A). To focus only on heterogeneity related to embryonic development, we constructed the t-SNE plots using only a set of 184 endoderm-, ectoderm-, and mesoderm-specific markers (Table S3). The aim was to enrich for any weak population structure related to early fate commitment. However, we still did not observe any clusters corresponding to the different germ layers in either the naive or primed populations (Figures 3A, 3B, and S3B). This suggests that the primed cells remain in a mostly homogeneous undifferentiated state and have yet to begin the process of committing to differentiate into the germ layers.

Figure 3

Naive and Primed Populations Do Not Exhibit Lineage-Associated Structure, but Correlations between Lineage Markers and Epigenetic Regulators Are Stronger under the Naive Condition

(A and B) Gene expression of germ layer-specific marker genes in the (A) naive and (B) primed population, visualized using tSNE on the batch-corrected normalized log expression values. Each point in the scatterplot represents a cell, which is colored by the expression of respective mesoderm (SNAI1), ectoderm (ITGA6), or endoderm (PAF1) markers.

(C and D) Heatmaps of the strongest correlation values between selected pluripotency and lineage markers (rows) and epigenetic markers (columns) for the naive (C) and the primed (D) population. The correlation values were bound at [−0.5, 0.5].

See also Figure S3 and Tables S3 and S4.

Naive and Primed Populations Do Not Exhibit Lineage-Associated Structure, but Correlations between Lineage Markers and Epigenetic Regulators Are Stronger under the Naive Condition (A and B) Gene expression of germ layer-specific marker genes in the (A) naive and (B) primed population, visualized using tSNE on the batch-corrected normalized log expression values. Each point in the scatterplot represents a cell, which is colored by the expression of respective mesoderm (SNAI1), ectoderm (ITGA6), or endoderm (PAF1) markers. (C and D) Heatmaps of the strongest correlation values between selected pluripotency and lineage markers (rows) and epigenetic markers (columns) for the naive (C) and the primed (D) population. The correlation values were bound at [−0.5, 0.5]. See also Figure S3 and Tables S3 and S4. The homogeneity of both the naive and primed conditions suggests that it is possible to explore co-regulatory relationships via gene-gene correlations within each population. In particular, we focused on epigenetic modulators because of their importance in controlling cellular memory and their relevance for early embryonic development. Within each condition, we computed pairwise correlations between the expression profiles of 704 epigenetic modulators with a set of 94 developmental markers (Table S4; Figure S3C). We observed strong correlations in the naive population (Figure 3C) that were much weaker in the primed population (Figure 3D). This indicates that the expression of the epigenetic machinery is more distinctly linked to the naive gene expression network and particularly to regulators related to de novo DNA and histone methylation (e.g., DNMT3A/B and EHMT1).

A Naive-to-Primed Axis Can Identify Pluripotency Transitions in Other Species

To integrate our data with previous in vivo studies, we defined a naive-to-primed axis based on empirically defined marker genes that were strongly differentially expressed between the two conditions (STAR Methods). Cells from other scRNA-seq datasets were mapped onto this axis based on the proportion of naive-primed markers (or homologous equivalents in non-human data) they expressed. As a proof of concept, we mapped the previously described intermediate population onto the naive-to-primed signature map (Figure S4A). The subpopulation hESCs were located close to the naive axis but expressed a lower proportion of signature markers than the residual naive population. This is consistent with our hypothesis that the intermediate population originates from naive cells but has lost some features of naive pluripotency. Next we mapped published scRNA-seq datasets of pre-implantation embryos from mice (Mohammed et al., 2017), cynomolgus monkeys (Nakamura et al., 2016), and humans (Petropoulos et al., 2016) onto our naive-to-primed axis. For mouse and monkey embryos, we observed a gradual loss of naive marker expression and an increase in primed marker expression (Figures 4A and 4B). This is consistent with the transition from naive to primed pluripotency and suggests that the relevant genes are conserved across species. Equal proportions of naive-primed markers were expressed at approximately E5 (mice) and E9–E13 (monkeys). In contrast, we did not observe any clear shift to primed pluripotency in humans before E7 (Figure 4C), consistent with the similarity of in vivo naive pluripotency with in vitro reprogrammed naive pluripotency under the applied culture conditions (Takashima et al., 2014).

Figure 4

Cells Shift from a Naive-like to a Primed-like Expression Pattern during Early Embryonic Development

Naive and primed markers were identified from the DE analysis of the hESC data. New cells are mapped onto the naive-primed axis based on the proportions of naive and primed markers that they express. This was performed for cells derived from mouse embryos (A), cynomolgus monkey embryos (B), and human pre-implantation embryos (C). For each plot, the density of cells is represented by the color of the pixels. Cells on the red line have equal proportions of expressed primed and naive markers.

Discussion

By sequencing the transcriptomes of single naive and primed hESCs, we identified discrete expression signatures of the two pluripotency states. In addition to recovering existing markers (Ware, 2017, Weinberger et al., 2016), we defined genes that are highly specific to each population. These expression markers are well conserved across species, as we were able to show by mapping mouse, monkey, and human sequencing data onto our naive-to-primed signature axis. Another aim of this study was to clarify the heterogeneity and developmental progression of each pluripotency state in hESCs. We found that both naive and primed states of cultured hESCs were comparably homogeneous, except for a small subpopulation of cells in the naive state with transcriptional features of primed pluripotency. This was surprising because the primed state was expected to be more differentiated and possibly showing signatures of early lineage commitments, as suggested by in vivo work in mice (Mohammed et al., 2017). However, the comparably low levels of heterogeneity of naive and primed pluripotency in vitro have also been observed in mESC lines (Kolodziejczyk et al., 2015) and could be a reflection of the medium favoring one particular cellular phenotype. Therefore, these artificial states may be a misleading representation of primed pluripotency, which is more heterogeneous in vivo. We observed that cell cycle-related effects were the most prominent source of variability within both the naive and the primed population. It is possible that specific cell cycle states may play a major role in contributing to cell fate decisions by introducing transcriptional noise. We also found that naive hESCs showed stronger correlations of pluripotency and lineage markers to epigenetic regulators than primed hESCs. Given the major epigenetic resetting observed during early embryonic development (i.e., from fertilization to the formation of the naive ICM cells; Iurlaro et al., 2017), the naive transcriptional state may have a unique need to be tightly coupled to the expression of the epigenetic machinery. In contrast, primed hESCs represent a later developmental stage in which the epigenetic machinery may be less strictly controlled as the epigenome is re-established in a more heterogeneous and cell type-specific manner. Future work exploring the epigenetic dynamics in early mouse and human embryonic development by single-cell epigenomics will help to dissect these mechanisms in more detail. We also identified a subpopulation of naive hESCs that showed both features of naive and primed states. Indeed, we assume that the naive state in hESCs is temporally limited and that cells are prone to exit it. The existence of “formative” pluripotency has recently been suggested (Smith, 2017). This state may represent a cellular phase where cells acquire differentiation competency and are marked by the expression of early post-implantation factors such as OTX2, SOX3, and POU3F1 and the transient loss of NANOG expression. Interestingly, the intermediate population is characterized by significantly decreased NANOG transcription, although we did not detect significant upregulation of OTX2, SOX3, and POU3F1. It remains to be seen whether this subpopulation corresponds to cells exiting naive pluripotency toward formative pluripotency and whether this represents a real in vivo state or arises because of culture-specific conditions. Our study provides important insights into the transcriptomic heterogeneity of naive and primed hESCs. The identification of specific markers may contribute to studying the reprogramming dynamics during the primed-to-naive transitions and delineate key transcriptional events leading to human naive pluripotency. Finally, we catalog and compare pluripotency identity across species to characterize transitions between different pluripotency states that mark specific temporal windows of embryonic development.

STAR★Methods

Key Resources Table

Contact for Reagent and Resource Sharing

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Wolf Reik (wolf.reik@babraham.ac.uk).

Experimental Model and Subject Details

Human WA09-NK2 ESCs (Takashima et al., 2014) were kindly provided by Austin Smith and grown under naive or primed conditions.

Method Details

Cell culture and collection

Naive hESCs were grown in 6-well dishes on mouse embryonic fibroblasts in N2B27 supplemented with human LIF, 1μM Chiron, 1μM PD03 and 2μM Gö6983. 1 passage before sorting, cells were plated on 6-well plates coated with Matrigel (growth-factor reduced). Primed hESCs were grown in 6-well dishes coated with Vitronectin in E8 media. For collection, hESCs were dissociated with Accutase and sorted in 96 well plates containing lysis buffer on a BD Aria Cell sorter, gating for cell size and granularity. In each plate, 4 wells were left empty as negative controls. Plates were immediately spun down and frozen at −80°C until subsequent processing. This was performed in two batches – the first batch contained 96 cells from each condition, while the second batch contained 384 cells from each condition (480 cells in total per condition).

Library preparation and sequencing

Single-cells were sorted in 2uL of Lysis Buffer (0.2% v/v Triton X-100 (Sigma-Aldrich, cat. no. T9284) with 2U/ul RNase Inhibitor (Clontech, cat. no. 2313A)) in 96 well plates, spun down and immediately frozen at −80°C. cDNA from sorted single cells was prepared following the SmartSeq2 protocol  (Picelli et al., 2014). Briefly, oligo-dT primer, dNTPs (ThermoFisher, cat. no. 10319879) and ERCC RNA Spike-In Mix (1:25,000,000 final dilution, Ambion, cat. no. 4456740) were added to the single-cell lysates, and Reverse Transcription and PCR were performed. The cDNA libraries for sequencing were prepared using Nextera XT DNA Sample Preparation Kit (Illumina, cat. no. FC-131-1096), according to the protocol supplied by Fluidigm (PN 100-5950 B1). Libraries from 96 single cells were pooled and purified using AMPure XP beads (Beckman Coulter). Pooled samples were sequenced on an Illumina HiSeq 2500 instrument, using paired-end 100-bp reads. On average, we obtained 2.1 × 106 reads per cell in batch 1 and 0.5 × 106 reads per cell in batch 2.

Immunofluorescence Analysis

Antibody staining was performed as previously described (Santos et al., 2003). Briefly, hESCs were cytospun, after fixation with 2% PFA for 30 minutes at room temperature. Cells were permeabilised with 0.5% Triton X-100 in PBS for 1h; blocked with 1% BSA in 0.05%Tween20 in PBS (BS) for 1h; incubation of the appropriate primary antibody diluted in BS; followed by wash in BS and secondary antibody. Secondary antibody was Alexa Fluor conjugated (Molecular Probes) diluted 1:1000 in BS and incubated for 30 minutes. Incubations were performed at room temperature unless otherwise stated. DNA was counterstained with 5 μg/mL DAPI in PBS. Single optical sections were captured with a Zeiss LSM780 microscope (63x oil-immersion objective). Fluorescence semi-quantification analysis was performed with Volocity 6.3 (Improvision).

Quantification and Statistical Analysis

Alignment and read counting

Read pairs were aligned to a reference consisting of the hg38 build of the human genome as well sequences for the ERCC spike-in transcripts. This was performed using the subread aligner v1.6.0 (Liao et al., 2013) in paired-end mode with unique alignment. Each read pair was then assigned to a gene in the Ensembl GRCh38 v91 annotation or to the spike-in transcripts. This was done using the featureCountsfunction in the Rsubread package v1.28.1 (Liao et al., 2014). Only reads with mapping quality scores above 10 were used for counting. Read counts from technical (sequencing) replicates of the same cell were added together prior to further analysis. On average, over 71% of reads mapped to the genome with over 59% mapped to exons.

Quality control on cells and genes

A range of quality metrics were computed for each cell (Figure S1A) using the calculateQCMetrics function in the scater package v1.6.3 (McCarthy et al., 2017). For each metric, outlier values were identified as those that were more than three median absolute deviations from the median. Low quality cells were identified in each batch, as those with small outlier values for the log-transformed total count; small outliers for the log-transformed number of expressed genes; large outliers for the proportion of read pairs assigned to mitochondrial genes; or large outliers for the proportion of read pairs assigned to spike-in transcripts. These cells were removed from the dataset prior to further analysis, leaving 414 naive and 423 primed cells remained for downstream analysis. The cell cycle phase for each cell was identified using the cyclone classifier (Scialdone et al., 2015) implemented in the scran package v1.6.9. This was performed with a set of human marker genes, identified by training the classifier on a pre-existing hESC dataset (Leng et al., 2015).

Normalization of cell-specific biases

For the endogenous genes, cell-specific size factors were computed using the deconvolution method (Lun et al., 2016a) with pre-clustering. For each gene, the count for each cell was divided by the appropriate size factor. A pseudo-count of 1 was added, and the value was log2-transformed to obtain log-normalized expression values. This was repeated using the spike-in transcripts, where the size factor for each cell was proportional to the sum of counts for all spike-in transcripts (Lun et al., 2017).

Feature selection and dimensionality reduction

Feature selection was performed by computing the variance of the normalized log-expression values across cells for each endogenous gene or spike-in transcript. To represent technical noise, a mean-dependent trend was fitted to the variances of the spike-in transcripts using the trendVar function in the scran package (Lun et al., 2016b). This was done separately for each batch of cells to ensure that large variances were not driven by uninteresting batch effects. The decomposeVar function was used to obtain the biological component of the variance by subtracting the fitted value of the trend (i.e., the technical component) from the total variance of each gene. The combineVar function was then used to consolidate statistics across batches. Batch effects were removed from the log-expression matrix using the removeBatchEffects function from the limma package v3.34.9 (Ritchie et al., 2015). This involved performing a linear regression on the log-expression profile of each gene and setting the blocking term for the batch to zero, which was possible for this data due to the balanced naive-primed composition of each batch. Principal component analysis was applied to the corrected expression matrix, only using the genes with positive biological components. This was performed with the denoisePCA function from scran to determine the number of principal components to retain. t-SNE was performed on the retained PCs using the Rtsne package v0.13.

Testing for differential expression between conditions

Counts for the naive and primed cells within each batch were pooled to obtain four sets of pseudo-bulk counts (Lun and Marioni, 2017). Low-abundance genes with average counts below 5 were removed and normalization was performed on the remainders with the trimmed mean of M-values method (Robinson and Oshlack, 2010). Genes were tested for differential expression (DE) between naive and primed conditions using the quasi-likelihood framework in the edgeR package v3.20.9 (Y. Chen et al., 2016). The experimental design was parameterized using an additive design containing a condition term and the batch blocking factor. DE genes were defined as those with significant differences between conditions at a FDR of 5%. To validate the identified marker genes, DE genes in three different bulk RNA-seq datasets (Pastor et al., 2016, Theunissen et al., 2016, Guo et al., 2017) were identified using the quasi-likelihood framework. This used the procedure described above with the only difference being that DE genes were defined as having absolute log-fold changes significantly greater than 0.5 at a FDR of 5% (McCarthy and Smyth, 2009). This ensured that the DE analysis focused on genes with strong differences in expression. The results of the analysis for each dataset were visualized using volcano plots. For comparison, we highlighted the top 200 naive markers and the top 200 primed markers from our single-cell data (ranked by p value) on each plot.

Detecting the intermediate population

Dimensionality reduction was performed as previously described using only the cells in the naive condition. The retained principal components were used for hierarchical clustering of the cells with the hclust function in R, using Ward linkage on the Euclidean distances. Clusters of cells were identified using a simple tree cut, where the optimal number of clusters was determined by maximizing the average silhouette width. The cluster of cells located between the bulk of cells from the naive and primed conditions in the PCA plot was denoted as the intermediate population. The intermediate population was characterized by testing for differential expression relative to the other naive cells or to the primed cells. This was done using t tests on the log-expression values (Soneson and Robinson, 2018) after blocking on the batch. For each contrast, several candidates were chosen from the top set of DE genes for further validation by immunofluorescence staining.

Exploring lineage-related heterogeneity

Dimensionality reduction was performed for each condition as previously described. For the naive condition, cells in the intermediate population were removed. PCA plots were colored according to the expression of CDK1 to represent cell cycle activity (Figure S3A). Dimensionality reduction in each condition was also repeated using only genes that were specific for the germ layers (Table S3) to detect potential early lineage commitment. Again, cells in the intermediate population were excluded. The resulting t-SNE plots were colored by expression of the mesoderm marker SNAI1 (Evseenko et al., 2010), the ectoderm marker ITGA6 (Brafman et al., 2013) or the endoderm marker PAF1 (Ponnusamy et al., 2009) to visualize any lineage-related substructure and by the expression of cell-cycle marker CDK1 (Figure S3C).

Correlations with epigenetic modulators

Pairwise correlations of selected lineage and pluripotency markers to epigenetic modulators (Table S4) were calculated using the correlatePairs function from the scran package. This involved computing Spearman’s rank correlation coefficients between the log-expression profiles of lineage marker genes (endoderm, ectoderm, mesoderm, trophectoderm, core pluripotency, naive pluripotency, formative pluripotency, primed pluripotency and germline) and genes comprising the epigenetic machinery. P-values for all pairs were combined and corrected for multiple testing. To visualize the correlations, we computed the average absolute correlation across naive and primed conditions for each gene pair. We then selected the top 25 lineage/pluripotency markers and the top 25 epigenetic modifiers with the largest average absolute correlations. For each condition, a heatmap of the correlation values between all pairs of the selected genes was constructed using the pheatmap function from the pheatmap package v1.0.8.

Mapping temporal trajectories in early embryos

Naive marker genes were defined from our data as those that were DE relative to primed cells (using the pseudo-bulk statistics, above) at a FDR of 5% and with a log2-fold change of 10; were present in at least 25% of naive cells; and were present in no more than 5% of primed cells. Similarly, primed marker genes were defined as those that were DE relative to naive cells at a FDR of 5% and with a log2-fold change of −10; were present in at least 25% of primed cells; and were present in no more than 5% of naive cells. A marker gene was considered to be expressed in a cell from a different dataset if its (normalized) count was greater than 10. For each cell, we calculated the proportion of naive markers that were expressed. This was repeated for the primed markers. Cells were mapped onto the “naïve-primed axis” based on these proportions. Large naive proportions and small primed proportions indicate that the cell is naive, and vice versa for primed cells. Mapping onto the naive-primed axis was performed for cells collected from human pre-implantation embryos (Petropoulos et al., 2016), mouse embryos (Mohammed et al., 2017), and cynomolgus monkey embryos (Nakamura et al., 2016). Mouse homologs for the marker genes were identified using the getLDS function from the biomaRt package (Durinck et al., 2005), using the homology relationships predicted by Ensembl. Monkey homologs for marker genes were identified as those with the same gene symbol. As a control, we also performed remapping using the naive, primed and intermediate population cells in our own dataset. To assess the expression of intermediate population genes in the human pre-implantation embryos (Petropoulos et al., 2016), an intermediate-naive axis was constructed similarly to the naive-primed axis. Intermediate population marker genes were considered uniquely expressed in the subpopulation by a log2-fold change of 5 against both the naive and the primed population at a FDR of 5%, and by their presence in less than 25% of the naive and the primed cells.

Data and Software Availability

Code availability

All analysis code is available at https://github.com/MarioniLab/NaiveHESC2016.

Deposition of sequencing data

The accession number for the scRNA-seq data reported in this paper is ArrayExpress: E-MTAB-6819.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Antibodies

Human DPPA5/ESG1 Antibody	R&D Systems	AF3125-SP; RRID: AB_2094168
Human ABCG2 Antibody	R&D Systems	MAB995-SP; RRID: AB_2220316

Chemicals, Peptides, and Recombinant Proteins

Human LIF	Stem Cell Institute, Cambridge	N/A
CHIR99021	Stem Cell Institute, Cambridge	N/A
PD0325901	Stem Cell Institute, Cambridge	N/A
Gö6983	TOCRIS	2285
Penicillin/Streptomycin	Thermo Fisher Scientific	15140122
MEM non-essential amino acids	Thermo Fisher Scientific	11140050
TeSR-E8	STEMCELL Technologies	05990
B27	Thermo Fisher Scientific	17504044
N2	Stem Cell Institute, Cambridge	N/A
DMEM-F12	Thermo Fisher Scientific	11320-033
Neurobasal	Thermo Fisher Scientific	21103049
Accutase	STEMCELL Technologies	07920
Triton X-100	Sigma-Aldrich	T9284
Tween20	Sigma-Aldrich	P9416
DAPI	Thermo Fisher Scientific	62248
RNase Inhibitor	Clontech	2313A
ERCC RNA Spike-In Mix	Ambion	4456740
dNTPs	Thermo Fisher Scientific	10319879

Critical Commercial Assays

AMPure XP beads	Beckman Coulter	A63880
Nextera XT DNA Sample Preparation Kit	Illumina	FC-131-1096
Nextera XT Index Kit	Illumina	FC-131-1001
Superscript II reverse transcriptase	Invitrogen	18064-014

Deposited Data

Raw sequence data and count tables	This study	E-MTAB-6819

Experimental Models: Cell Lines

Human WA09 Embryonic stem cells	Takashima et al., 2014	N/A

Oligonucleotides

Oligo-dT30VN	Picelli et al., 2014	N/A
AAGCAGTGGTATCAACGCAGAGTACT30VN	Picelli et al., 2014	N/A
Template Switching Oligo	Picelli et al., 2014	N/A
AAGCAGTGGTATCAACGCAGAGTACATrGrG+G	Picelli et al., 2014	N/A
ISPCR oligo	Picelli et al., 2014	N/A
AAGCAGTGGTATCAACGCAGAGT	Picelli et al., 2014	N/A

Software and Algorithms

R version 3.5.1 (Feather Spray)	The R Project	https://www.r-project.org
scran	Lun et al., 2016b	https://bioconductor.org/packages/release/bioc/html/scran.html
scater	McCarthy et al., 2017	https://bioconductor.org/packages/release/bioc/html/scater.html
Rtsne	van der Maaten and Hinton, 2008	https://cran.r-project.org/web/packages/Rtsne/index.html
edgeR	Robinson et al., 2010	https://bioconductor.org/packages/release/bioc/html/edgeR.html
subread	Liao et al., 2013	http://subread.sourceforge.net
Rsubread	Liao et al., 2014	https://bioconductor.org/packages/release/bioc/html/Rsubread.html
limma	Ritchie et al., 2015	https://bioconductor.org/packages/release/bioc/html/limma.html

45 in total

1. Single-Cell Sequencing and Organoids: A Powerful Combination for Modelling Organ Development and Diseases.

Authors: Yuebang Yin; Peng-Yu Liu; Yinghua Shi; Ping Li
Journal: Rev Physiol Biochem Pharmacol Date: 2021 Impact factor: 5.545

Review 2. Innovative Molecular Testing Strategies for Adjunctive Investigations in Hemostasis and Thrombosis.

Authors: Elham Ghorbanpour; David Lillicrap
Journal: Semin Thromb Hemost Date: 2019-08-12 Impact factor: 4.180

3. Modeling human embryo development with embryonic and extra-embryonic stem cells.

Authors: Bailey A T Weatherbee; Tongtong Cui; Magdalena Zernicka-Goetz
Journal: Dev Biol Date: 2020-12-24 Impact factor: 3.582

Review 4. Epigenetic aberrations in human pluripotent stem cells.

Authors: Shiran Bar; Nissim Benvenisty
Journal: EMBO J Date: 2019-05-14 Impact factor: 11.598

Review 5. Epigenetics as "conductor" in "orchestra" of pluripotent states.

Authors: Ishita Baral; Pallavi Chinnu Varghese; Debasree Dutta
Journal: Cell Tissue Res Date: 2022-07-15 Impact factor: 4.051

6. Single cell transcriptome profiling of the human alcohol-dependent brain.

Authors: Eric Brenner; Gayatri R Tiwari; Manav Kapoor; Yunlong Liu; Amy Brock; R Dayne Mayfield
Journal: Hum Mol Genet Date: 2020-05-08 Impact factor: 6.150

Review 7. All models are wrong, but some are useful: Establishing standards for stem cell-based embryo models.

Authors: Eszter Posfai; Fredrik Lanner; Carla Mulas; Harry G Leitch
Journal: Stem Cell Reports Date: 2021-05-11 Impact factor: 7.765

8. In Situ Expansion, Differentiation, and Electromechanical Coupling of Human Cardiac Muscle in a 3D Bioprinted, Chambered Organoid.

Authors: Molly E Kupfer; Wei-Han Lin; Vasanth Ravikumar; Kaiyan Qiu; Lu Wang; Ling Gao; Didarul B Bhuiyan; Megan Lenz; Jeffrey Ai; Ryan R Mahutga; DeWayne Townsend; Jianyi Zhang; Michael C McAlpine; Elena G Tolkacheva; Brenda M Ogle
Journal: Circ Res Date: 2020-03-31 Impact factor: 17.367

Review 9. Single-Cell Transcriptome Analysis as a Promising Tool to Study Pluripotent Stem Cell Reprogramming.

Authors: Hyun Kyu Kim; Tae Won Ha; Man Ryul Lee
Journal: Int J Mol Sci Date: 2021-06-01 Impact factor: 5.923

Review 10. Unraveling the Spatiotemporal Human Pluripotency in Embryonic Development.

Authors: Daniela Ávila-González; Wendy Portillo; Guadalupe García-López; Anayansi Molina-Hernández; Néstor E Díaz-Martínez; Néstor F Díaz
Journal: Front Cell Dev Biol Date: 2021-06-23