Florian Buettner, Fabian J Theis.
Abstract
MOTIVATION: Single-cell experiments on cells from the early mouse embryo yield gene expression data for different developmental stages from zygote to blastocyst. To better understand cell fate decisions during differentiation, it is desirable to analyse the high-dimensional gene expression data and assess differences in gene expression patterns both between and within developmental stages. Conventional methods include univariate analyses of the distributions of genes at different stages and multivariate linear methods such as principal component analysis (PCA). However, these approaches often fail to resolve important differences, because each lineage has a unique gene expression pattern that changes gradually over time, yielding differences in gene expression between developmental stages as well as heterogeneous distributions within a specific stage. Furthermore, to date, no approach taking the temporal structure of the data into account has been presented.
Year: 2012 PMID: 22962491 PMCID: PMC3436812 DOI: 10.1093/bioinformatics/bts385
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The totipotent blastomere differentiates first into inner and outer cells. Next, after approximately 3.5 days, the ICM differentiates into PE cells and EPI cells (A). The data-driven illustration is shown on the right-hand side. For PCA (panel B), differentiation into ICM and TE can be seen, followed by differentiation from ICM into PE and EPI. ICM and PE/EPI as well as early cell stages could not be resolved. For our novel approach (bottom right), all developmental stages could be resolved and a new TE-like sub-population at the 16-cell stage was discovered. The dashed arrows reflect that the lower subpopulation at the 16-cell stage is significantly more TE-like than the other
Fig. 2. Standard PCA (a) and ICA (b) for all cells from the 1- to 64-cell stage
Fig. 3. (a) GPLVM for all cells from the 1- to 64-cell stage. The uncertainty corresponding to the probabilistic mapping from latent space to data space is colour-coded (high SD dark, low SD light); (b) nearest-neighbour errors for the original high-dimensional space and three embeddings in 2D of all cells from the 1- to 64-cell stage
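The nearest-neighbour error reported in these captions can be illustrated with a minimal sketch. It assumes the error is the number of cells whose closest neighbour (Euclidean distance) carries a different cell-stage label; this record does not state the definition, so that reading, the function name `nearest_neighbour_error` and the toy data are illustrative assumptions only.

```python
import numpy as np

def nearest_neighbour_error(points, labels):
    """Count points whose nearest neighbour has a different label.

    Assumed definition, not taken from the record itself.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A point must not be counted as its own neighbour.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    return int((labels[nearest] != labels).sum())

# Toy 2D embedding (hypothetical values): the last cell is labelled
# stage 4 but lies next to the stage-2 cells.
embedding = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.25, 0.0]])
stages = np.array([2, 2, 4, 4, 4])
error = nearest_neighbour_error(embedding, stages)
print(error)
```

Under this definition a lower count means the embedding keeps cells of the same stage together, and the same function can be applied to the original high-dimensional data for comparison.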
Fig. 4. GPLVM for all cells from the 2- to 64-cell stage. (a) Standard GPLVM. The nearest-neighbour error was 11. (b) Structure-preserving GPLVM for all cells from the 2- to 64-cell stage with locality parameter γ = 10^4 for all cell stages. The nearest-neighbour error was 11
Fig. 5. Structure-preserving GPLVM for all cells from the 2- to 64-cell stage with different values of γ. (a) γ = 100 for cell stages 2 to 8, γ = 15000 for the 16-cell stage and γ = 20000 for the 32- and 64-cell stages. Cells assigned to the TE-like subcluster are within the purple triangle. The nearest-neighbour error was 6. (b) γ = 100 for cell stages 2 to 8, γ = 20000 for the 16-cell stage and γ = 30000 for the 32- and 64-cell stages. The nearest-neighbour error was 5
Fig. 6. Difference in gene expression between the two subclusters at the 16-cell stage for different mappings. The error bars show the variation of gene expression within the smaller subcluster (1 SD in each direction). For convenience, genes with the strongest differences are labelled in the plots. The order of all genes from top to bottom is Actb, Ahcy, Aqp3, Atp12a, Bmp4, Cdx2, Creb312, Cebpa, Dab2, DppaI, Eomes, Esrrb, Fgf4, Fgfr2, Fn1, Gapdh, Gata3, Gata4, Gata6, Grhl1, Grhl2, Hand1, Hnf4a, Id2, Klf2, Klf4, Klf5, Krt8, Lcp1, Mbnl3, Msc, Msx2, Nanog, Pdgfa, Pdgfra, Pecam1, Pou5f1, Runx1, Sox2, Sall4, Sox17, Snail, Sox13, Tcfap2a, Tcfap2c, Tcf23, Utf1 and Tspan8
Fig. 7. Relevance map showing the greatest norm of the gradient across the entire map (left) and the norm of the gradient for all genes at the centre of the ICM cluster (right). (a) Gene relevance map corresponding to the mapping in Figure 5b. The region of the map corresponding to early cell stages, including the 16-cell stage, is shown in more detail (middle). Here, the gradient of Gata4 with respect to x is shown: the colour illustrates the norm of the gradient, the arrows illustrate the direction. Considerably greater changes in Gata4 occur between the 8-cell stage and the TE-like subcluster at the 16-cell stage than between the 8-cell stage and the non-TE-like subcluster. For convenience, the corresponding part of the embedding in Figure 5b is also shown (middle, top). (b) Gradient at the centre of the ICM cluster; the error bars reflect the uncertainty of the mapping (1 SD in each direction)
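The idea of a per-gene gradient norm at a point of the latent map can be sketched as follows. This is only an illustration: it approximates the gradient with central finite differences on a hypothetical `toy_mapping` from a 2D latent position to three gene expression values, whereas the figure is based on the trained GPLVM mapping, which is not reproduced here. The names `gradient_norms` and `toy_mapping` are assumptions.

```python
import numpy as np

def gradient_norms(mapping, x, eps=1e-5):
    """Norm of each output's gradient with respect to the latent position x.

    Uses central finite differences; `mapping` takes a 1D latent vector
    and returns a 1D array with one value per gene.
    """
    x = np.asarray(x, dtype=float)
    n_genes = np.asarray(mapping(x)).size
    jacobian = np.empty((n_genes, x.size))
    for d in range(x.size):
        step = np.zeros_like(x)
        step[d] = eps
        jacobian[:, d] = (mapping(x + step) - mapping(x - step)) / (2 * eps)
    # One norm per gene: how strongly that gene changes around x.
    return np.linalg.norm(jacobian, axis=1)

def toy_mapping(x):
    # Hypothetical mapping: gene 0 depends on x[0] only, gene 1 is
    # constant, gene 2 depends on both latent dimensions.
    return np.array([3.0 * x[0], 1.0, x[0] + x[1]])

norms = gradient_norms(toy_mapping, [0.5, -0.2])
print(norms)
```

Evaluating such norms over a grid of latent positions and keeping the largest value per position would give a map in the spirit of the left-hand panel, while evaluating them at a single position corresponds to the per-gene view on the right.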