Literature DB >> 27383781

Resolving early mesoderm diversification through single-cell expression profiling.

Antonio Scialdone^1,2, Yosuke Tanaka^3,4, Wajid Jawaid^3,4, Victoria Moignard^3,4, Nicola K Wilson^3,4, Iain C Macaulay², John C Marioni^1,2,5, Berthold Göttgens^3,4.

Abstract

In mammals, specification of the three major germ layers occurs during gastrulation, when cells ingressing through the primitive streak differentiate into the precursor cells of major organ systems. However, the molecular mechanisms underlying this process remain unclear, as numbers of gastrulating cells are very limited. In the mouse embryo at embryonic day 6.5, cells located at the junction between the extra-embryonic region and the epiblast on the posterior side of the embryo undergo an epithelial-to-mesenchymal transition and ingress through the primitive streak. Subsequently, cells migrate, either surrounding the prospective ectoderm contributing to the embryo proper, or into the extra-embryonic region to form the yolk sac, umbilical cord and placenta. Fate mapping has shown that mature tissues such as blood and heart originate from specific regions of the pre-gastrula epiblast, but the plasticity of cells within the embryo and the function of key cell-type-specific transcription factors remain unclear. Here we analyse 1,205 cells from the epiblast and nascent Flk1(+) mesoderm of gastrulating mouse embryos using single-cell RNA sequencing, representing the first transcriptome-wide in vivo view of early mesoderm formation during mammalian gastrulation. Additionally, using knockout mice, we study the function of Tal1, a key haematopoietic transcription factor, and demonstrate, contrary to previous studies performed using retrospective assays, that Tal1 knockout does not immediately bias precursor cells towards a cardiac fate.

Entities: Chemical

Mesh：

Substances：

Year: 2016 PMID： 27383781 PMCID： PMC4947525 DOI： 10.1038/nature18633

Source DB: PubMed Journal: Nature ISSN： 0028-0836 Impact factor: 49.962

Traditional experimental approaches for genome-scale analysis rely on large numbers of input cells and therefore cannot be applied to study early lineage diversification directly in the embryo. To address this, we used single cell transcriptomics to investigate mesodermal lineage diversification towards the haematopoietic system in 1,205 single cells covering a timecourse from early gastrulation at embryonic day E6.5 to the generation of primitive red blood cells at E7.75 (Figure 1a, Extended Data Fig. 1a,2a). Using previously published metrics (Methods), we observed that the data were of high quality. 501 single cell transcriptomes were obtained from dissected distal halves of E6.5 embryos sorted for viability only, which contain all of the epiblast cells, including the developing PS, and a limited number of visceral endoderm and extra-embryonic ectoderm cells. From E7.0, embryos were staged according to anatomical features (Methods) as primitive streak (S), neural plate (NP) and head fold (HF). The VEGF receptor Flk1 (Kdr) was used to capture cells as it marks much of the developing mesoderm4. During subsequent blood development, Flk1 is down-regulated and CD41 (Itga2b) is up-regulated5. We therefore also sampled cells expressing both markers and CD41 alone at the NP and HF stages (Figure 1a; Extended Data Fig. 1b, 2a), giving a total of 138 cells from E7.0 (S), 259 from E7.5 (NP) and 307 from E7.75 (HF).

Figure 1

Single cell transcriptomics identifies 10 populations relevant to early mesodermal development.

a) Whole mount images and schematics of E6.5 to E7.75 embryo sections. Colors indicate approximate locations of sorted cells. Anterior, left; posterior, right. Scale bars: 200µm. b) Heatmap showing key genes distinguishing 10 clusters. Colored bars indicate assigned cluster (top), stage (middle: turquoise, E6.5; purple, primitive streak (S, E7.0); green, neural plate (NP, E7.5); red, headfold (HF, E7.75)) and the sorted population (bottom: green, E6.5 epiblast; blue, Flk1+; turquoise, Flk1+CD41+; red, Flk1-CD41+). c) t-SNE of all 1205 cells colored by embryonic stage, and d) according to clusters in (b).

Extended Data Figure 1

Fluorescence activated cell sorting of single cells.

a) Flk1+ cells were sorted from 3 embryos at S and HF stages and 4 embryos at NP stage. Labels such at ‘S4’ refer to the embryo number in the metadata available online at http://gastrulation.stemcells.cam.ac.uk/scialdone2016. b) CD41+Flk1- cells (red gate) and CD41+Flk1+ (green gate) cells were sorted from 8 embryos each at NP and HF stages. Each stage was sorted on two occasions. Labels above FACS plots refer to the sort and embryo number in the metadata available online, as above. In all plots, pink text indicates the percentage of cells in that gate.

Extended Data Figure 2

Quality control of single cell RNA-seq data.

a) Table showing numbers and estimates of numbers of cells of different phenotypes present in embryos between E6.5 and E7.75 (HF stage) and numbers sorted for this study. (*) Total cell numbers for E6.5 are from Beddington and Robertson (1999)58 and (§) total numbers and numbers of Flk1+ cells are from Moignard et al., (2015)15. Percentages of cells expressing Flk1 and/or CD41 at NP and HF stages are the average values from the embryos used in this study and were used to calculate the estimated numbers present in embryos from the total cell numbers. ND, not done. b) t-SNE representation of the five metrics used to assess the quality of the samples. Only cells that passed all criteria (blue circles) were used for downstream analysis. c) Squared coefficient of variation (CV2) as a function of the mean normalised counts (µ) for genes across all cells. The green line shows the fit CV2 = a1/μ + α0. All highly variable genes (with an adjusted p-value < 0.1) are marked by red circles. d) Number of genes detected (i.e., with more than 10 normalised read counts) in cells across the different clusters in the WT (left panel) and the Tal1-/- (right panel) mice.

Following rigorous quality control, 2,085 genes were identified as having significantly more heterogeneous expression across the 1,205 cells than expected by chance (Extended Data Fig. 2b-d). Unsupervised hierarchical clustering in conjunction with a dynamic hybrid cut (Methods) yielded 10 robust clusters with varying contributions from the different embryonic stages (Figure 1b, Extended Data Fig. 3, Methods, cell numbers in Extended Data Fig. 3h). Using t-Stochastic Neighbor Embedding (t-SNE) dimensionality reduction to visualise the data, three major groups were observed, one comprising almost all E6.5 cells, another mainly consisting of earlier S and NP stage cells, and a third containing predominantly later HF stage cells (Figure 1c). Importantly, clusters were coherent with the t-SNE visualization except for the small cluster 5 (Figure 1d).

Extended Data Figure 3

Identifying cell clusters.

The Dynamic Hybrid Cut algorithm was used with all the possible values of the “deepSplit” parameter on 100 bootstrapped subsamples. To assess the quality of the clustering, the Pearson gamma (a) and the average silhouette width (b) were calculated. Higher values of these parameters correspond to better clustering. The Pearson gamma represents the correlation between the dissimilarity of samples and a binary variable that equals 1 for pairs of samples in the same cluster and 0 for samples in different clusters. The average silhouette width measures the average separation between neighboring clusters43,44. At “deepSplit”=2 the Pearson gamma is highest whereas the average silhouette width begins to decrease. This suggests that at such a value of the “deepSplit” parameter a good compromise between robustness and sensitivity is achieved. The Pearson gamma and the average silhouette width were computed with the R function “cluster.stats” in the “fpc” package (version 2.1-9). (c-f) Examples of marker genes for four clusters: Mesp1 for cluster 4 (top-ranked marker) (c), Sox18 for cluster 6 (second-ranked) (d), Hbb-bh1 for cluster 8 (fourth-ranked) (e) and Alx1 for cluster 10 (top-ranked) (f).The y-axis shows the log10 normalised expression of the genes. g) Dendrogram showing the clustering of the cells in the first experiment. The colors at the bottom indicate the cluster each cell was assigned to by the dynamic hybrid cut algorithm. Cluster assignment was used to sort cells in Figure 1b. h) Identities were assigned to the 10 clusters in Figure 1c based on the expression of key genes associated with various mesodermal lineages or spatial locations within the embryo.

The expression of key marker genes allowed us to assign identities to each cluster: visceral endoderm, extra-embryonic ectoderm, epiblast, early mesodermal progenitors, posterior mesoderm, endothelium, blood progenitors, primitive erythrocytes, allantoic mesoderm, pharyngeal mesoderm (Figure 1b, Extended Data Fig. 4 and Extended Data Fig. 3h). Because of the limited cell numbers and lack of markers for their prospective isolation, conventional bulk transcriptome analysis of these key populations has never before been attempted.

Extended Data Figure 4

Expression of key marker genes in E7.0-7.75 embryos.

Schematic representations of expression patterns were generated from published in situ hybridization data (see citations) for key markers of clusters 1 (magenta, visceral endoderm59), 2 (pink, extra-embryonic ectoderm61), 6 (red, YS endothelium62), 9 (black, allantois63,64) and 10 (green, second heart field65,66). Anterior is shown on the left and posterior on the right. Also shown is the t-SNE for all cells or cells from E7.0 onwards (S, NP and HF stages) indicating expression of each gene (white, low; purple, high).

Since the T-box transcription factor Brachyury - encoded by the T gene – marks the nascent PS6, we investigated the gene expression programs associated with T induction in the E6.5 cells (cluster 3). T expression was restricted to a distinct subset of epiblast cells found closest to cluster 4 (Figure 1d, Extended Data Fig. 5b), with rare isolated cells within the bulk of the epiblast population also expressing moderate levels, consistent with priming events for single gastrulation-associated genes. T expression correlated with other gastrulation-associated genes including Mixl1 and Mesp1 (Figure 2a), with Mesp1 highly expressed only in the small subset of cells situated at the pole of the E6.5 epiblast cluster (association of T and Mesp1 expression: p-value 3x10-15, Fisher’s exact test). We also observed a subset of cells distinct from the T+/Mesp1+ population, which expressed Foxa2, suggestive of endodermal priming7 (Extended Data Fig. 5d).

Extended Data Figure 5

Expression of key genes used for sorting single cells.

a)t-SNE as in Figure 1 showing the sorting strategy for each of the 1,205 cells. b)Expression of Flk1 (Kdr), CD41 (Itga2b), Scl (Tal1), Gata2 and T (Brachyury) superimposed onto the t-SNE. c) t-SNE showing only the cells from S, NP and HF stages, colored according to cluster as in Figure 1c and E. d) t-SNE for the 481 E6.5 cells in cluster 3, as in Figure 2a. Each point is colored by expression of T and Foxa2.

Figure 2

Transcriptional program associated with T induction in E6.5 epiblast cells.

a) t-SNE of the 481 E6.5 cells in cluster 3. Points are colored by expression of T (Brachyury) and Mixl1, Mesp1 and Frzb. b) Heatmap showing the 10 genes most highly positively and negatively correlated with T (Supplementary Information Table 1). c) Forward scatter (FSC) for the 481 E6.5 epiblast cells in cluster 3, with cells grouped according to T/Mesp1 expression. Boxplots indicate the median and interquartile range. P-values were calculated using a two-sided Welch’s t-test for samples with unequal variance, with FDR correction for multiple testing.

We next identified genes displaying correlated expression with T, which identified known markers and regulators such as Mixl1, and also genes not previously implicated in mammalian gastrulation, such as Slc35d3, an orphan member of a nucleotide sugar transporter family8 and the retrotransposon-derived transcript Cxx1c9 (Figure 2b and Supplementary Information Table 1). Genes negatively correlated with T were consistently expressed across the majority of epiblast cells, suggesting that cells outside the PS have not yet committed to a particular fate, consistent with the known plasticity of epiblast cells in transplant experiments10. Ingressing epiblast cells undergo an EMT, turning from pseudo-stratified epithelial cells into individual motile cells, a conformational change associated with alterations in cell size and shape11. Our E6.5 epiblast cells were isolated using index sorting thus providing a forward scatter (FSC) value for each cell. As shown in Figure 2c, T+/Mesp1+ co-expressing cells showed a significant reduction in FSC values compared to T+/Mesp1- and T cells. Since FSC correlates positively with cell size, this observation provides a direct link between specific transcriptional programs and characteristic physical changes associated with gastrulation. As T+/Mesp1+ cells also express Mesp2, this observation was consistent with the known EMT defect in Mesp1/Mesp2 double knock-out embryos12. Index sorting therefore linked expression changes with dynamic physical changes similar to those recognised to occur during chicken gastrulation13. We next focused on mesodermal lineage divergence during and immediately after gastrulation. We reasoned that approaches analogous to those used to order single cells in developmental pseudotime could be used to infer the location of cells in pseudospace, specifically with respect to the anterior-posterior axis of the primitive streak (Figure 3a). To this end, we used diffusion maps14, a dimensionality reduction technique particularly suitable for reconstructing developmental trajectories15. We identified the diffusion-space direction that most likely represents true biological effects (see Methods), which we interpreted as the pseudospace coordinate (red line in Figure 3b, Extended Data Fig. 6a-d). Hierarchical clustering revealed three groups of genes (Figure 3c, Extended Data Fig. 6e, Supplementary Information Table 4) showing a gradient of expression along the pseudospace axis. These were assigned as anterior (darker blue, 334 genes) and posterior (lighter blue, 87 genes) due to the enrichment of genes with known differential expression along the anterior-posterior axis of the PS (Figure 3d, Extended Data Fig. 6f-h, 7). A third cluster was expressed highly at either end of the pseudospace axis (turquoise, 41 genes). Interestingly, the more posterior Flk1+ mesodermal cells are associated with the allantois, blood and endothelial clusters (Figure 1d, Extended Data Fig. 5c), which are known to arise from the posterior primitive streak. Gene ontology analysis revealed that the putative anterior genes were associated with terms relating to somite development, endoderm development and Notch signaling, consistent with a more anterior mesoderm identity16 (Supplementary Information Table 2a; Extended Data Fig. 6h). Conversely, the putative posterior mesoderm cluster was associated with BMP signaling, hindlimb development and endothelial cell differentiation, which consistent with the posterior portion of the streak17.

Figure 3

Dimensionality reduction reveals transcriptional profiles associated with cell location in the embryo.

a) Schematic of tissue emergence along the anterior-posterior primitive streak, derived from 29. Mesodermally- and endodermally-derived tissues are marked by a red and green line, respectively. b) Diffusion map of 216 cells in cluster 4 with pseudospace axis in red. Projections onto this axis represent pseudospace coordinates. c) Heatmap for differentially expressed genes along the pseudospace axis, showing genes more highly expressed in the anterior (dark blue) and posterior region (light blue), or highly expressed at either end (aquamarine). d) Expression profiles for example genes.

Extended Data Figure 6

Pseudospace analysis of cluster 4 correlates with anterior-posterior position along the primitive streak.

a) Diffusion plot of cells in cluster 4. Different colors correspond to different plates and different lanes of flow cells. b) Table showing the number of cells in each stage analysed on the different lanes of flow cells (S, Primitive streak; NP, Neural plate; HF, Head fold). c) A direction in the diffusion space can be identified by the angle α that it forms with the first diffusion component (left panel). For each value of α the right panel shows the percentage of variance explained by the batch effect associated to plates SLX-8408 and SLX-8409 (orange line) and plates SLX-8410 and SLX-8411. α1 and α2 are the angles corresponding to directions that correlate the least with the batch effect (i.e., variance explained by the batch effect is minimum). d) The use of alternative dimensionality reduction techniques results in the identification of highly correlated pseudo-space coordinates. A t-SNE projection of the dissimilarity matrix was carried out (perplexity set to 50), and the direction corresponding to the pseudo-space coordinate was estimated by minimizing the correlation with the batch effect (left panel; Spearman correlation between the two pseudo-space coordinates 0.79, p-value < 2.2x10-16). Independent Component Analysis was performed on the dissimilarity matrix with the “fastICA” R function, and 3 independent components (corresponding to the 2 batch effects and the biological effect) were estimated. The presumptive pseudo-space coordinate is the component having the smallest correlation with the batch effects (right panel; Spearman correlation coefficient is 0.97, p-value < 2.2x10-16). e) Plots showing the average expression of genes in clusters 1-3 of Figure 3c along the pseudospace axis. Gene expression levels are normalised between 0 and 1. Black lines indicate the normalised mean expression levels of genes in each cluster as obtained from the fitting procedure and red shaded area indicates standard deviation. f) Expression of T as function of the pseudospace coordinate. g) Gene expression levels for example genes showing high-low-high expression pattern across the blue cluster. In B and C, putative anterior cells are to the left and posterior to the right as in C. Each dot represents a cell and red lines indicate fits based on local polynomial functions (see Methods). h) We performed Principal Component Analysis (PCA) on the cells in cluster 4 by using markers of presomitic mesoderm as anterior mesoderm markers and genes expressed in haemato-vascular and allantoic mesoderm as posterior markers67–80, as well as Podxl which was shown to separate distinct Flk1+ mesodermal lineages81. The first component explained 36% of the total variance and was highly correlated with the pseudo-space coordinate (left panel; Spearman rank correlation 0.84, p-value < 2.2x10-16). All the anterior markers were negatively correlated with the pseudo-space coordinate, whereas all posterior markers had a positive correlation (right panel).

Extended Data Figure 7

Expression of key genes along the anterior-posterior axis of the primitive streak in E7.0-7.75 embryos.

Schematic representations of gene expression were generated from published in situ hybridization data (see citations) for key markers of clusters 4 (blue, mesoderm) and 7 (yellow, posterior mesoderm/blood progenitors). Expression of T (Brachyury)82 and Flk1 (Kdr – from in house data) are shown to illustrate the extent of the primitive streak at E7.5. Lefty2 and Tbx659 are expressed in the putative anterior portion of cluster 4 and in more anterior regions of the primitive streak in in situ analysis. Tbx383 and Bmp484 are expressed in the more posterior portion of cluster 4 and in the embryo are expressed in the more posterior region of the primitive streak around the amnion and into the extra-embryonic mesoderm. Tek and Fli1 (from in-house data) are expressed in cluster 7 and in the embryo are found exclusively in the extra-embryonic portion. Also shown is the t-SNE for the cells from E7.0 onwards (S, NP and HF stages) indicating expression of each gene (white, low; purple, high).

Although derived from the same embryonic stages as the mesodermal progenitor cells, cluster 7 lacks expression of genes such as Mesp1, yet expresses Tal1, Sox7, Tek (Tie2) and Fli1 which are vital for extra-embryonic mesoderm formation (Figure 1b, Extended Data Fig. 5, 7). Expression of Kdr and Itga2b (Extended Data Fig. 5b) further highlights clusters 7 and 8 (brown) as corresponding to the developmental journey towards blood (Figure 4a), with a transition to mostly HF stage cells in cluster 8 and increasing expression of embryonic haemoglobin Hbb-bh1 (Figure 4b). Given the apparent trajectory of blood development from cluster 7 to 8, we used an analogous approach to that described above to recover a pseudotemporal ordering of cells (Figure 4a, Extended Data Fig. 8a-d and Methods). 803 genes were down-regulated, including the haematovascular TF Sox7 which is known to be down-regulated during blood commitment15 (Figure 4c,d, Extended Data Fig. 8e,f). 67 genes were up-regulated including the erythroid-specific TFs Gata1 and Nfe2, and embryonic globin Hbb-bh1 (Figure 4b, d, e, Extended Data Fig. 8). 27 genes were transiently expressed, including the known erythroid regulator Gfi1b. Significant GO terms associated with the up-regulated genes were indicative of erythroid development, while down-regulated genes were associated with other mesodermal processes including vasculogenesis and osteoblast differentiation (Supplementary Information Table 2b).

Figure 4

Inferring the transcriptional program underlying primitive erythropoiesis.

a) Diffusion map of 271 cells in clusters 7 and 8 displaying the inferred pseudotime axis (blue). b) Expression of Hbb-bh1 ordered by pseudotime (local polynomial fit). c) Heatmap ordered along the pseudo-time axis. Horizontal bars indicate cluster and developmental stage. Genes shown were repressed (grey), activated (green) or transiently expressed (blue). d) Examples of activated and repressed genes and e) Gata1 as in (b). f) UCSC Browser tracks for Gata1 ChIP-seq and input in Runx1+Gata1+ cells; the Nfe2 locus is shown. g) Percentage of genes in each group identified in C overlapping Gata1 targets.

Extended Data Figure 8

Pseudotime analysis of primitive erythroid development.

a) Diffusion plot of cells in cluster 7 and 8. Different colors correspond to different plates and lanes of flow cells. b) Table showing the number of cells in each stage collected on the different plates (S, primitive streak; NP, Neural plate; HF, Head fold). c) Analogously to Extended Data Fig. 6, the angle α identifies a direction in the diffusion space (left panel). The percentage of variance explained by the batch effect associated to plates SLX-8344 and SLX-8345 is plotted as function of α in the right panel. d) The pseudo-time coordinate is robust to the use of different dimensionality reduction techniques, as shown in the left panel with t-SNE (Spearman correlation 0.92, p-value < 2.2x10-16) and in the right panel with Independent Component Analysis (Spearman correlation 0.97, p-value < 2.2x10-16; same procedure described in Extended Data Fig. 6d). e) Plots showing the average expression of genes in clusters 1-3 of Figure 4c along the pseudotime axis. Gene expression levels are normalised between 0 and 1. Black lines are the average expression levels of genes in each cluster as obtained from the fitting procedure, after normalization. Red shaded areas indicate standard deviation. f) Principal Component Analysis was carried out on the expression pattern of genes known from previous studies to be up-regulated or down-regulated along the blood developmental trajectory15,85–90. The first principal component (explaining 44% of total variance) showed a very strong correlation with the pseudo-time coordinate (left panel; Spearman correlation coefficient 0.91, p-value < 2.2x10-16). All up-regulated (down-regulated) genes positively (negatively) correlate with the pseudotime coordinate (right panel).

Gata1 null embryos die at around E10.5 due to the arrest of yolk sac erythropoiesis18. We generated genome-wide ChIP-seq data for Gata1 in haematopoietic cells derived following 5 days of ESC in vitro differentiation (Extended Data Fig. 9a-c). The group of up-regulated genes from the pseudotime analysis showed a pronounced overlap with Gata1 targets (p < 2.2x10-16, Fisher’s test) including known targets such as Nfe2 and Zfpm1 (Figure 4f,g, Extended Data Fig. 9d,e, Supplementary Information Table 6). Integration of single cell transcriptomics with complementary TF binding data therefore predicts likely in vivo targets of developmental regulators such as Gata1.

Extended Data Figure 9

ChIP-seq for Gata1 in ESC-derived haematopoietic cells.

a) Flow cytometry for Gata1-mCherry and Runx1-IRES-GFP knock-in reporter genes in embryoid body cells after 5 days of haematopoietic differentiation. Cells were sorted for the expression of both Runx1-IRES-GFP and Gata1-mCherry knock-in reporter genes to provide in vitro equivalents of the developing primitive erythrocytes assayed by RNA-seq. The gate used for sorting is shown in red. b) Numbers of reads and peaks identified for Gata1 and an input sample after mapping and peak calling. 4,135 Gata1 peaks were identified. c) Distribution of Gata1 peaks between promoter, intragenic and intergenic sequences. d) UCSC genome browser tracks for Gata1 and input sample at the Zfpm1 (Fog1) locus known to be a target of Gata1, indicating the quality of the ChIP-seq data. e) Expression of Gata1 target Zfpm1 during the pseudotimecourse for erythroid development, as in Figure 4.

Two contrasting mechanisms are commonly invoked to explain how drivers of cell fate determination regulate cell type diversification. The first involves fate restriction through a step-wise sequence of binary fate choices and is supported by mechanistic investigations using embryonic stem cell (ESC) differentiation2,19. The alternative invokes acquisition of diverse fates from independent precursor cells and is commonly supported by cell transplantation and lineage tracing analysis (Figure 5a)1,10,20,21. In contrast to the retrospective nature of transplantation and lineage tracing experiments where measurements are typically obtained a day or more after cell fate decisions are made, single cell transcriptomics allows cellular states to be determined at the moment when fate decisions are executed since low cell numbers are not a limiting factor.

Figure 5

Analysis of Tal1-/- embryos suggests independent fate acquisition.

a) Schematic of two cell fate diversification models. b) Tal1 in situ hybridization at HF stage. Scale bar: 200µm. c) Flow cytometry of WT and Tal1-/- mice at HF and E8.25. d) Blood program genes are differentially expressed between nascent mesoderm (blue) and endothelial (red) and blood cells (brown). Differential expression between 45 Tal1-/- and 59 WT endothelial cells (lower left t-SNE) identified 50 down-regulated genes. Gene set overlap (central panel) indicates failure to induce the blood program in Tal1-/- endothelium (p < 2.2 x 10-16, Fisher's test). On the right are expression distributions for selected genes in wild type (black) or Tal1-/- (grey) endothelial cells. e) For genes previously reported3 to be bound and activated (left panel) or bound and repressed (right panel) by Tal1, fold change between Tal1-/- and WT endothelium (defined in D) is plotted against average expression. Red circles: genes with a fold change > 1.5 and an FDR < 0.05.

The bHLH TF Tal1 (also known as Scl) is essential for the development of all blood cells22,23 with strong expression in posterior mesodermal derivatives (Figure 5b). Tal1-/- bipotential blood/endothelial progenitors cannot progress to a haemogenic endothelial state19, Tal1 overexpression drives transdifferentiation of fibroblasts into blood progenitors24 and Tal1-/- mesodermal progenitors from the yolk sac give rise to aberrant cardiomyocyte progenitors when cultured in vitro2. However, the precise nature of the molecular defect within Tal1-/- mesodermal progenitors within the embryo has remained obscure, because cell numbers are too small for conventional analysis. We profiled single Flk1+ cells from 4 wild type and 4 Tal1-/- embryos obtained from E7.5 (NP) to E8.25 (4 somite stage) (256 wild type and 121 Tal1-/- cells; Figure 5c, Extended Data Fig. 10), and computationally assigned cells to the previously defined 10 clusters (Methods). Cells from wild type embryos contributed to all clusters, while Tal1-/- embryos did not contain any cells corresponding to the blood progenitor and primitive erythroid clusters (yellow and brown, Figure 5d) consistent with the known failure of primitive erythropoiesis in Tal1-/- embryos23 and their lack of CD41 expression (Figure 5c).

Extended Data Figure 10

Collection of embryos from Tal1 LacZ/+ crosses.

a) Genotyping PCR for embryos from Tal1 LacZ/+ crosses. Lower band is the wild type allele and upper band is the mutant allele carrying a neomycin knock in. Presence of both bands indicates heterozygosity. Embryos from which sequencing data were obtained are indicated with a red star and the number given corresponds to embryo identity in the metadata available online with the sequencing data. b) t-SNE as in Figure 5d showing Tal1 data (triangles) and original wild type data (grey circles). Tal1 data are colored according to the embryo stage from which they were collected: green, NP; red, HF; orange, 4SP. c) As in Figure 5d, showing the complete list of genes. d) Gene Set Control Analysis91 (GSCA) was used to identify statistically significant overlaps between genes significantly down-regulated in Tal1-/- compared with WT cells in the endothelial cluster (see Figure 5) and Tal1 targets identified by ChIP-seq. GSCA identified an enrichment of our gene set with Tal1 ChIP-seq in ESC-derived haemangioblasts and haemogenic endothelium92, but not ESC-derived haematopoietic progenitors92 or a haematopoietic progenitor cell line52.

Fifty-one Tal1-/- cells were mapped to the endothelial (red) cluster, which therefore allowed us to investigate the early consequences of Tal1 deletion in this key population for definitive haematopoietic development (Figure 5d, Supplementary Information Table 7, 8). Fifty genes were down-regulated in Tal1-/- endothelial cells (fold change < 0.67, 5% FDR). These include known regulators of early blood development (Itga2b, Lyl1, Cbfa2t3, Hhex, Fli1, Ets2, Egfl7, Sox7, Hoxb5), consistent with Tal1 specifying a haematopoietic fate in embryonic endothelial progenitor cells19, and in particular Hoxb5, which has recently emerged as a powerful marker for definitive blood stem cells25. Single cell profiling also identified genes with altered distributions of expression. For example, Sox7 changed from a largely unimodal pattern in wild-type cells to a bimodal on/off pattern in Tal1-/- endothelial cells, while Cbfa2t3 showed the opposite pattern (Figure 5d). However, we did not observe up-regulation of cardiac markers in Tal1-/- endothelial cells (Figure 5e, Supplementary Information Table 8, 9). Previously, this up-regulation had been observed in YS endothelial cells collected 1-1.5 days later than our data2, and had been taken as evidence that Tal1 acts as a gatekeeper controlling the balance between alternative cardiac and blood/endothelial fates within single multipotent mesodermal progenitors3. Our results however suggest that the primary role of Tal1 is induction of a blood program, and the subsequent ectopic expression of cardiac genes may be the result of secondary induction events acting on a still relatively plastic mesodermal cell blocked from executing its natural developmental program. Here we have used single cell transcriptomics to obtain a comprehensive view of the transcriptional programs associated with mammalian gastrulation and early mesodermal lineage diversification. Further technological advances to resolve epigenetic processes at single cell resolution26 and match single cell expression profiles with spatial resolution27,28 are likely key drivers of future progress in this field. Finally, our analysis of Tal1-/- embryos illustrates how the phenotypes of key regulators can be re-evaluated at single cell resolution to advance our understanding of early mammalian development.

Methods

Timed matings and embryo collection

All procedures were performed in strict adherence to United Kingdom Home Office regulations. Timed matings were set up between CD1 mice (which produce large litters). Embryos were staged according to the morphologic criteria of Downs and Davies30, and classified broadly as primitive streak (S), neural plate (NP) or head fold (HF) stage. Suspensions of cells from individual embryos were prepared by incubating with TrypLE Express dissociation reagent (Life Technologies) at 37°C for 10 minutes and quenching with heat inactivated serum. All cells were stained with DAPI for viability. At E6.5, the distal half of the embryo was dissected and dissociated into a single cell suspension, and live cells were sorted. For E7.0 and older, suspensions consisted of the whole embryo and were also stained with Flk1-APC (AVAS12 at 1:400 dilution; BD Bioscience) and only Flk1+ cells were collected. For cell sorting of CD41+Flk1- and CD41+Flk1+ cells from NP stage and HF stages, suspensions were stained with Flk1-APC, PDGFRa-PE (APA5 at 1:200 dilution; Biolegend) and CD41-PEcy7 (MWReg30 at 1:400 dilution; Biolegend) for 20 minutes at 4°C as described31. Cells were sorted from 7 E6.5 embryos. Flk1+ cells were sorted from 3 S stage, 4 NP stage and 3 HF stage embryos (Extended Data Fig. 1a). CD41+Flk1- and CD41+Flk1+ cells were sorted from the same embryos, an additional 8 each at NP and HF stages (Extended Data Fig. 1b). Cell sorting was performed with a BD Influx cell sorter in single cell sort mode with index sorting to confirm the presence of a single event in each well. Additional cells were sorted into tissue culture plates to visually confirm the presence of single events. To obtain Tal1-/- cells, timed matings were set up between Tal1 mice32. Flk1+ cells were sorted as above from 4 embryos for each genotype: from 1 embryo for each genotype at NP and 4 somite (4S) stages, from 2 HF stage embryos for Tal1 (designated Tal1), 1 HF stage WT embryo and 1 WT embryo intermediate between NP and HF stages. Genotyping PCR using 1/20 suspension cells was performed as described previously32.

Single-cell RNA sequencing

Library preparation and mapping of reads

scRNA-seq analysis was performed using the Smart-seq2 protocol as previously described33. Single cells were sorted by FACS into individual wells of a 96-well plate containing lysis buffer (0.2 % (v/v) Triton X-100 and 2 U/μl RNase inhibitor (Clontech)) and stored at -80°C. Libraries were prepared using the Illumina Nextera XT DNA preparation kit and pooled libraries of 96 cells were sequenced on the Illumina Hi-Seq 2500. Reads were mapped simultaneously to the M. musculus genome (Ensembl version 38.77) and the ERCC sequences using GSNAP (version 2014-10-07) with default parameters. HTseq-count34 was used to count the number of reads mapped to each gene (default options).

Identification of poor quality cells

To assess data quality35, five metrics were used: (i) total number of mapped reads, (ii) fraction of total reads mapped to endogenous genes, (iii) fraction of reads mapped to endogenous genes that are allocated to mitochondrial genes, (iv) fraction of total reads mapped to ERCC spike-ins and (v) level of sequence duplication (as estimated by FastQC, version 0.11.4, http://www.bioinformatics.babraham.ac.uk/projects/fastqc). For all downstream analyses, we only retained samples that had (i) more than 200,000 reads mapped (either to ERCC spike-ins or endogenous mRNA), (ii) more than 20% of total reads mapped to mRNA, (iii) less than 20% of mapped reads allocated to mitochondrial genes, (iv) less than 20% of reads mapped to ERCC spike-ins and (v) less than 80% of duplicated sequences. Out of the 2,208 cells that were captured across the two experiments, 1,582 (i.e., ~72% of the total) passed our quality check. A t-SNE projection36 of the values of these five metrics (Extended Data Fig. 2b) shows that most of discarded cells tend to cluster together and fail at least two criteria. All metrics were standardised before applying t-SNE with the “RtSNE” function (default parameters) from the R package “RtSNE” (version 0.1) 37.

Normalization of read counts

The data was normalised for sequencing depth using size factors38 calculated on endogenous genes. By doing so, we also normalise for the amount of RNA obtained from each cell39, which is itself highly correlated with cell cycle stage40.

Highly variable genes and GO enrichment analysis

Highly variable genes were identified by using the method described in Brennecke et al.39. In brief, we fitted the squared coefficient of variation as function of the mean normalised counts39. In the fitting procedure, in order to minimise the skewing effect due to the lowly expressed genes39, only genes with a mean normalised count greater than 10 were used. Genes with an adjusted p-value (Benjamini-Hochberg method) less than 0.1 were considered significant (red circles in Extended Data Fig. 2c). This set of highly variable genes was used for the clustering analysis discussed below. The GO enrichment analysis was carried out using TopGO in its “elimination mode” with Fisher’s exact test; we considered as significant GO categories with an unadjusted p-value below 10-4 .

Differentially expressed genes

In order to find genes differentially expressed between two groups of cells we used edgeR41 (version 3.12). Before running edgeR, we excluded genes annotated as pseudogenes in Ensembl, sex-related genes (Xist and genes on the Y chromosome) and genes that were not detected or were expressed at very low levels (we considered only genes that had more than 10 reads per million in a number of cells greater than 10% of the cells in the smaller group being compared). The function “glmTreat” was then used to identify the genes having a fold change significantly greater than 1.5 at a FDR threshold equal to 0.05.

Clustering analysis

Clustering analysis was performed on the 1,205 WT cells from the first experiment that passed the QC. The Spearman correlation coefficient, ρ, was computed between the transcriptome of each pair of cells, which was then used to build a dissimilarity matrix defined as (1-ρ)/2. Hierarchical clustering was carried out (“hclust” R function with the “average” method) on the dissimilarity matrix and clusters were identified by means of the Dynamic Hybrid Cut algorithm42. The R function “cutreeDynamic” with the “hybrid” method and a minimum cluster size equal to 10 cells was used (“dynamicTreeCut” package, version 1.62). This function allows the user to specify the “deepSplit” parameter that controls the sensitivity of the method: higher values of this parameter correspond to higher sensitivity and can result in more clusters being identified, but also entail an increased risk of overfitting the data. The optimal trade-off between robustness of clustering and sensitivity was found by analyzing the results of the algorithm with all possible values of the deepSplit parameter (i.e. integer values from 0 to 4) on 100 subsamples of our data. In particular, in each subsample, we removed 10% of genes randomly selected before computing the dissimilarity matrix and applying the clustering algorithm. The statistics of the Pearson gamma and the average silhouette width (computed with the “cluster.stats” function included in the R package “fpc”, version 2.1-10)43,44 of the subsamples (see Extended Data Fig. 3a,b) suggest that with “deepSplit=2” a good compromise is reached between robustness and sensitivity for our data. We identified 10 different clusters as well as two outlier cells that, although similar in gene expression to the mesodermal progenitor cells (cluster 4), were not assigned to any cluster by the algorithm, probably due to their relatively poor quality. We then evaluated how specifically each gene is expressed in any given cluster. First, we found the differentially expressed genes (as described above) between all pairs of clusters. Marker genes for cluster i are expected to be significantly up-regulated in i across all pairwise comparisons involving cluster i. The average rank of a marker gene across the pairwise comparisons provides a measure of how specifically the marker is expressed in the cluster. Extended Data Fig. 3c-f show the expression values of marker genes for four different clusters. We provide the full list of markers in Supplementary Information Table 3. The clusters were visualised by using t-SNE (as implemented in the “RtSNE” R package) on the dissimilarity matrix.

Single-cell trajectories in pseudo-space: the anterior/posterior axis of the primitive streak

As discussed in the main text, cells allocated to cluster 4 (Figure 1b-d) are cells that have likely exited the primitive streak only recently. We sought to align the cells along a pseudospatial trajectory representing the anterior-posterior axis of the primitive streak, which would allow us to identify the likely original locations of each cell along such an axis. To do this we adopted an unsupervised approach: we did not use any prior information about marker genes, but selected the strongest signal present in this cluster of cells (controlling for potential batch effects) and later verified its biological meaning. We first used a diffusion map-based technique to reduce the dimensionality of the dataset. Diffusion maps have recently been successfully applied to identify developmental trajectories in single-cell qPCR and RNA-seq data14,15. We used the implementation of the “destiny” R package (“DiffusionMap” function) developed by Angerer et al.45. We restricted the analysis to genes that are highly variable among cells in the blue cluster and have an average expression above 10 normalised read counts. The centered cosine similarity was used (“cosine” option in the “DiffusionMap” function) and only the first two diffusion components (DC1 and DC2) were retained for downstream analysis. In addition to biologically meaningful signals, batch effects (due to cells being sorted and processed on different plates) can also be present and induce structure within the data. While in our dataset the batch effect does not strongly influence the definition of different populations of cells, it might become relevant when finer structures within a single cluster of cells are considered (see Extended Data Fig. 6a). In order to tease apart the signals due to biological and batch effects, we computed the fraction of variance attributable to the batch effect along each direction in the diffusion space using a linear regression model. The direction “orthogonal” to the batch effect, i.e., the direction associated with the smallest fraction of variance explained by the batch effect, was considered as mostly driven by a biologically relevant signal. Hence, all cells were projected on this direction to obtain a “pseudo-coordinate” representing the state of a cell relative to the biological process captured by the diffusion map. The direction was identified by the angle α that it formed with the DC1 axis (Extended Data Fig. 6c). Cells considered here are mostly from 2 batches including cells from the Primitive Streak stage (plate SLX-8408 and SLX-8409) and 2 batches including cells from the Neural Plate stage (plate SLX-8410 and SLX-8411; Extended Data Fig. 6b). For each of these two sets of batches, we computed the fraction of variance that can be explained by the batch covariate along any possible direction in the diffusion plot. The angles α1 and α2 corresponding to the directions orthogonal to the two batch effects are very close to each other (Extended Data Fig. 6c); we took the average ᾱ between these two angles to approximate the direction orthogonal to both batch effects. Cells’ coordinates in the diffusion space were projected along the direction identified by ᾱ, and this projection was interpreted as a “pseudospace” coordinate representing the position of cells along the primitive streak (see main text and Figure 3). We tested the robustness of such a pseudospace coordinate by repeating the same analysis with alternative dimensionality reduction techniques (t-SNE and Independent Component Analysis), which gave highly correlated coordinates (see Extended Data Fig. 6d). A Principal Component Analysis carried out with a set of previously known markers for the anterior and posterior regions of the primitive streak also yielded a first component highly correlated with the pseudospace coordinate (see Extended Data Fig. 6d left panel). Moreover, the pseudospace coordinate had a positive (negative) correlation with the posterior (anterior) markers used (see Extended Data Fig. 6d right panel). These results strongly support the robustness of the signal we identified as well as its biological interpretation. Once the pseudospace trajectory was defined, we selected genes that were differentially expressed along the trajectory. First, we removed all genes that were not detected in any cell. Then, for each gene, we fitted the log10-expression levels (adding a pseudocount of 1) by using two local polynomial models: one with degree 0 and another with degree 2 (“locfit” function in “locfit” R package, nearest neighbor component parameter equal to 1). The first, simpler model is better suited for genes that do not change their expression level along the trajectory. The second model has a greater number of parameters and is able to reproduce the more complex dynamics of genes that are differentially expressed. We evaluated these two models by using the Akaike information criterion (AIC), a score that measures how well the data are reproduced by the model and includes a penalization for more complex models46. Better models according to this criterion correspond to smaller AIC scores. To compute the AIC scores for the two models, we used the “aic” function available in the “locfit” R package, and then calculated the difference: ΔAIC = AIC(degree = 2) – AIC(degree = 0) Negative values indicate that the more complex model with degree 2 local polynomials perform better, and therefore correspond to genes that are more likely to be differentially expressed. Genes having a ΔAIC < -2 were considered to be significantly differentially expressed along the trajectory46. A hierarchical tree was built with the normalised expression patterns of the 462 differentially expressed genes (function “hclust” with average linkage method and dissimilarity based on Spearman correlation) and a dynamic hybrid cut algorithm (“cutreeDynamic” function, minimum cluster size equal to 5) split this set of genes in 3 clusters according to the type of dynamics they have (see Figure 3, Extended Data Fig. 6e and Supplementary Information Table 4).

Single-cell trajectories in pseudo-time: the blood developmental trajectory

As discussed in the main text, clusters 7 and 8 (yellow and brown clusters in Figure 1b and 1d) include blood progenitors at different stages of differentiation. By using a procedure analogous to the one described above, we aligned these cells along a trajectory representing embryonic blood development. Extended Data Fig. 8a shows the diffusion plot with cells from the yellow and the brown clusters. Most of these cells come from plates SLX-8344 and SLX-8345 that were collected from embryos at Neural Plate and late Head Fold stages (see Extended Data Fig. 8b). With a linear regression model, where we controlled for biological parameters like stage and sorting, we found the direction that correlates the least with the batch effect associated to these two plates and projected all cells onto it (Extended Data Fig. 8c). Note that the minimum correlation with the batch effect is achieved at a very small value of α (~10°, see Extended Data Fig. 8c), suggesting that the first diffusion component is mainly driven by a biologically meaningful signal and the batch effect plays a minor role here even at this more detailed scale of analysis. The new cell coordinate obtained from the projection was interpreted as a “pseudotime” coordinate, which represents the differentiation stage of each cell along their journey towards erythroid fate. As expected, cells in the yellow cluster have a smaller pseudotime coordinate compared to the brown cluster that is mainly composed of more differentiated primitive erythroid cells. An analysis with alternative dimensionality reduction techniques yielded highly correlated pseudotime coordinates, suggesting the robustness of the signal (Extended Data Fig. 8d). Furthermore, our biological interpretation of the pseudotime coordinate is supported by the expression pattern of genes that are known to be up-regulated or down-regulated along the blood developmental trajectory, as is clear via Principal Component Analysis (see Extended Data Fig. 8f). By using the filtering and clustering procedure described in the previous section, we were able to detect 897 genes that were differentially expressed along the trajectory, which were divided in 3 clusters, each displaying a different type of dynamics (see Extended Data Fig. 8e and Supplementary Information Table 5).

Random Forest to allocate cells to previously identified clusters

Cells captured in the Tal1 experiment (testing dataset) were allocated to the clusters we previously identified by using a Random Forest algorithm47 (R package “randomForest”, version 4.6-12)48 trained on the cells captured in the first experiment (training dataset). The rank-normalised expression levels of all highly variable genes in the training dataset were used as variables (the R function “rank” was used for normalization, ties were averaged). The random forest algorithm was first used on the training data to assess variable importance with 1,000 classification trees. The 25% most important variables were selected to grow another set of 1,000 trees that were then used for the classification of the testing dataset. With this filtered set of variables the out-of-bag error estimate was ~4.8%. The quality of allocation of each cell in the testing dataset was verified by computing the median of pairwise dissimilarities (defined as (1-ρ)/2, with ρ being the Spearman correlation) of that cell to all other cells in the training data allocated to the same cluster. Cells in the testing dataset having a median pairwise dissimilarity larger than the maximum of the medians of pairwise dissimilarities of cells in the training data were considered to be “unclassified” (~1.8% of all cells from the testing dataset). For the identification of differentially expressed genes between clusters in the testing data, only cells that were confidently allocated to the clusters (i.e., cells with a minimum difference of 10% probability between the best and the second best cluster allocation) were used.

Generation, maintenance and haematopoietic differentiation of Runx1-GFP/Gata1-mCherry ESCs

Runx1 ESCs were generated from morulae as described previously49,50. ESCs were grown on gelatinised plates (0.1 % gelatin in water) at 37°C and 5 % CO2 in ESC media (Knockout DMEM (Life Technologies) with 15 % FCS (batch-tested for ESC culture; Life Technologies), 2mM L-glutamine (PAA Laboratories), 0.5 % P/S, 0.1mM β-mercaptoethanol (Life Technologies) and 103 U/ml recombinant LIF (ORF Genetics)). Cells were passaged with TrypLE Express dissociation reagent (Life Technologies) every 1-3 days. ESCs were differentiated as EBs as previously described31,51. EBs were harvested into Falcon tubes after 5 days of culture and dissociated with TrypLE Express dissociation reagent and prepared for FACS.

ChIP-seq

ChIP was performed as described52 with modifications for low cell numbers53. Approximately 7x106 FACS-sorted day 5 EB cells (Runx1-ires-GFP+/Gata1-mCherry+; Extended Data Fig. 9a) per ChIP were cross-linked using formaldehyde to a final concentration of 1%. As samples were pooled from several sorts, isolated nuclei were frozen on dry ice-cold isopropanol and stored at -80°C. During the immunoprecipitation step, 4 μl recombinant histone 2B (New England Biolabs) and 1 μl of mouse RNA (Qiagen; diluted 1/5 in IP dilution buffer) were added as carriers, followed by 7 μg of primary antibody (rabbit anti-Gata1, Abcam ab11963). Sequencing libraries were prepared using the TruSeq Kit (Illumina) for high throughput sequencing on an Illumina HiSeq 2500, according to manufacturer’s instructions, with size selection for fragments of 150-400 bp.

ChIP-seq mapping and analysis

Alignment of the ChIP-seq reads to the mouse mm10 genome, quality control and peak calling were performed according to the data pipeline set out by Sanchez-Castillo et al54. Peak calling was performed using MACS255 with p=1e-6. Post-processing using in-house scripts converted the peak coordinates to 400 bp based on peak summits given in the MACS output. Coordinates of genomic regions that lie at the end of chromosomes and/or in repeat regions were discarded from the final high-confidence peak lists. PolyAPeak56 was run in R to remove abnormally shaped peaks. Peaks were assigned to genes using an in-house script according to whether they overlapped with a known TSS or fell within 50 kb each side of a gene.

In situ hybridization

Whole-mount in situ hybridization for Tal1 was performed as described previously57. In situ hybridization probe for Tal1 was synthesised using published sequence (Tal1 860-1428, accession number M59764) with the DIG RNA labeling kit (Roche).

Code availability

All data were analysed with standard programs and packages, as detailed above. Code is available on request.

Fluorescence activated cell sorting of single cells.

Quality control of single cell RNA-seq data.

Identifying cell clusters.

Expression of key marker genes in E7.0-7.75 embryos.

Expression of key genes used for sorting single cells.

Pseudospace analysis of cluster 4 correlates with anterior-posterior position along the primitive streak.

Expression of key genes along the anterior-posterior axis of the primitive streak in E7.0-7.75 embryos.

Pseudotime analysis of primitive erythroid development.

ChIP-seq for Gata1 in ESC-derived haematopoietic cells.

Collection of embryos from Tal1 LacZ/+ crosses.

97 in total

1. Acquisition of Hox codes during gastrulation and axial elongation in the mouse embryo.

Authors: Sylvie Forlani; Kirstie A Lawson; Jacqueline Deschamps
Journal: Development Date: 2003-08 Impact factor: 6.868

2. Arrested development of embryonic red cell precursors in mouse embryos lacking transcription factor GATA-1.

Authors: Y Fujiwara; C P Browne; K Cunniff; S C Goff; S H Orkin
Journal: Proc Natl Acad Sci U S A Date: 1996-10-29 Impact factor: 11.205

3. Single-lineage transcriptome analysis reveals key regulatory pathways in primitive erythroid progenitors in the mouse embryo.

Authors: Joan Isern; Zhiyong He; Stuart T Fraser; Sonja Nowotschin; Anna Ferrer-Vaquer; Rebecca Moore; Anna-Katerina Hadjantonakis; Vincent Schulz; David Tuck; Patrick G Gallagher; Margaret H Baron
Journal: Blood Date: 2011-01-24 Impact factor: 22.113

Review 4. Computational and analytical challenges in single-cell transcriptomics.

Authors: Oliver Stegle; Sarah A Teichmann; John C Marioni
Journal: Nat Rev Genet Date: 2015-01-28 Impact factor: 53.242

Review 5. Dose-dependent Nodal/Smad signals pattern the early mouse embryo.

Authors: Elizabeth J Robertson
Journal: Semin Cell Dev Biol Date: 2014-04-01 Impact factor: 7.727

6. A global role for EKLF in definitive and primitive erythropoiesis.

Authors: Denise Hodge; Elise Coghill; Janelle Keys; Tina Maguire; Belinda Hartmann; Alasdair McDowall; Mitchell Weiss; Sean Grimmond; Andrew Perkins
Journal: Blood Date: 2005-12-27 Impact factor: 22.113

7. Mouse Lefty2 and zebrafish antivin are feedback inhibitors of nodal signaling during vertebrate gastrulation.

Authors: C Meno; K Gritsman; S Ohishi; Y Ohfuji; E Heckscher; K Mochida; A Shimono; H Kondoh; W S Talbot; E J Robertson; A F Schier; H Hamada
Journal: Mol Cell Date: 1999-09 Impact factor: 17.970

8. The bicoid-related homeoprotein Ptx1 defines the most anterior domain of the embryo and differentiates posterior from anterior lateral mesoderm.

Authors: C Lanctôt; B Lamolet; J Drouin
Journal: Development Date: 1997-07 Impact factor: 6.868

9. Expression of CD41 marks the initiation of definitive hematopoiesis in the mouse embryo.

Authors: Hanna K A Mikkola; Yuko Fujiwara; Thorsten M Schlaeger; David Traver; Stuart H Orkin
Journal: Blood Date: 2002-09-19 Impact factor: 22.113

10. HTSeq--a Python framework to work with high-throughput sequencing data.

Authors: Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Journal: Bioinformatics Date: 2014-09-25 Impact factor: 6.937

121 in total

Review 1. Microfluidics cell sample preparation for analysis: Advances in efficient cell enrichment and precise single cell capture.

Authors: Liang Huang; Shengtai Bian; Yinuo Cheng; Guanya Shi; Peng Liu; Xiongying Ye; Wenhui Wang
Journal: Biomicrofluidics Date: 2017-02-06 Impact factor: 2.800