Michael Thompson1, Maika Matsumoto1, Tianqi Ma1, Anne Senabouth2, Nathan J Palpant1, Joseph E Powell2,3, Quan Nguyen1.
Abstract
Finding cell states and their transcriptional relatedness is a main outcome from analysing single-cell data. In developmental biology, determining whether cells are related in a differentiation lineage remains a major challenge. A seamless analysis pipeline from cell clustering to estimating the probability of transitions between cell clusters is lacking. Here, we present Single Cell Global fate Potential of Subpopulations (scGPS) to characterise the transcriptional relationship between cell states. scGPS decomposes mixed cell populations in one or more samples into clusters (SCORE algorithm) and estimates the pairwise transitioning potential (scGPS algorithm) of any pair of clusters. SCORE allows for the assessment and selection of stable clustering results, a major challenge in clustering analysis. scGPS implements a novel approach, with machine learning classification, to flexibly construct trajectory connections between clusters. scGPS also has a feature selection functionality, using network and modelling approaches, to find biological processes and driver genes that connect cell populations. We applied scGPS in diverse developmental contexts and show superior results compared to a range of clustering and trajectory analysis methods. scGPS is able to identify the dynamics of cellular plasticity in a user-friendly workflow that is fast and memory efficient. scGPS is implemented in R with optimised functions using C++ and is publicly available in Bioconductor.
Keywords: cell fate; clustering; machine learning; single cell; trajectory analysis
Year: 2021 PMID: 34349778 PMCID: PMC8326972 DOI: 10.3389/fgene.2021.666771
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. scGPS workflow. Brown boxes represent inputs and green boxes show the main scGPS analysis. Dotted boxes represent optional inputs. The input for scGPS analysis can be either a single expression matrix or two expression matrices from two different cell populations (shown as MixedPop 1 and MixedPop 2). scGPS provides the functionality to determine stable clusters within a cell population using the SCORE algorithm. Alternatively, users can provide their predetermined subpopulation list (clusters). The scGPS class object is based on the widely adopted SingleCellExperiment class (Lun and Risso, 2019). Based on the subpopulation input, scGPS performs gene feature selection by training an elastic net regularised model. Informative genes are then used in a logistic regression classifier to predict cell transition probability between subpopulations. When the input is one mixed sample (MixedPop 1), scGPS computes the transition scores between different subpopulations within the same sample. When the inputs are two mixed samples (MixedPop 1 and MixedPop 2), scGPS computes the transition scores from the subpopulations in population one to those in population two. scGPS also predicts marker genes and their contribution to the transition between two subpopulations (refer to Supplementary Figure 1).
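The two modelling steps in this workflow (elastic net feature selection, then a logistic classifier that scores transitions) can be illustrated with a minimal sketch. It is not the scGPS implementation, which is written in R and C++. It assumes that a classifier is trained to separate one source subpopulation from the remaining cells and that the transition score is the percentage of target cells predicted as the source class. For brevity it folds selection and classification into a single elastic net penalised logistic model. All function names, parameter values and the toy expression data are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that cell x belongs to the source subpopulation."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrinks v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.1, epochs=300):
    """Proximal gradient descent on logistic loss with an elastic net penalty.

    alpha mixes L1 (alpha=1) and L2 (alpha=0); the intercept b is unpenalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        errors = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        b -= lr * sum(errors) / n
        for j in range(p):
            grad = sum(e * xi[j] for e, xi in zip(errors, X)) / n
            grad += lam * (1.0 - alpha) * w[j]  # gradient of the L2 part
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam * alpha)
    return w, b

def transition_score(w, b, target_cells):
    """Percentage of target cells classified as the source subpopulation."""
    hits = sum(1 for x in target_cells if predict_proba(w, b, x) > 0.5)
    return 100.0 * hits / len(target_cells)

# Hypothetical toy data: gene 0 separates the source cluster, genes 1-2 are noise.
rng = random.Random(0)
source = [[rng.gauss(2.0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
others = [[rng.gauss(-2.0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
X = source + others
y = [1] * 20 + [0] * 20

w, b = fit_elastic_net_logistic(X, y)

# Hypothetical target cluster: half of the cells resemble the source cluster.
target = [[2.0, 0.0, 0.0]] * 5 + [[-2.0, 0.0, 0.0]] * 5
score = transition_score(w, b, target)
print("gene weights:", [round(v, 3) for v in w])
print("transition score (%):", score)
```

In this sketch the L1 part of the penalty tends to shrink the weights of uninformative genes towards zero, which is the sense in which an elastic net acts as a feature selector.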
Figure 2. Selection of stable clustering results. (A) SCORE cluster dendrogram. Coloured branches and bars (underneath the dendrogram) represent clusters. Each coloured row displays the result from one clustering resolution (40 resolutions are shown). (B) Bagging cluster estimations. Each dot is one bootstrap result. The x-axis shows the order of the bootstraps from 1 to 100. The y-axis shows the number of clusters. Blue dots are the cumulative running average of the cluster number over consecutive bootstraps. (C) UMAP plot representing the final chosen clustering result. Each colour represents one cluster.
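The cumulative running average plotted in panel (B) can be computed as in the following sketch. The cluster counts are hypothetical values standing in for the per-bootstrap results, and the function name is an illustrative choice.

```python
from itertools import accumulate

def running_average(counts):
    """Cumulative mean of the cluster count after each successive bootstrap."""
    totals = accumulate(counts)
    return [total / i for i, total in enumerate(totals, start=1)]

# Hypothetical number of clusters found in six consecutive bootstraps.
cluster_counts = [4, 5, 4, 4, 3, 4]
averages = running_average(cluster_counts)
print(averages)
```

A running average that flattens out as bootstraps accumulate suggests that the estimated number of clusters has stabilised.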
Figure 3. scGPS trajectory analysis. scGPS trajectory analysis can predict transitions between two populations (this figure) or within a heterogeneous population (Figure 4). In (A), edges show pairwise connections between two clusters, where each node (a vertical bar) represents a cluster and the edge width is proportional to the transitioning score from one cluster to another. MP represents a mixed population (i.e., the total dataset for one mixed sample, containing multiple clusters). Each coloured bar represents one subpopulation (a cluster). (A) Transitioning from three subpopulations in mixed population 1 (MP1) to four subpopulations in mixed population 2 (MP2). (B) The number of cells in each cluster shown in (A). (C) Bootstrapping results displaying the summary scores from one hundred bootstraps for the same data shown in (A).
Figure 4. scGPS trajectory comparisons with Slingshot and Monocle on the full-length total RNA sequencing dataset from Petropoulos et al. (2016), processed by Saelens et al. (2019). (A) PCA dimensionality reduction of the dataset. The cells are labelled with timing information; they were collected at 5 time points. (B) scGPS trajectory analysis. Arrows show direction and numbers show transitioning scores. For example, the number 50 on the arrow from cluster 2 to cluster 4 indicates a score of 50% of total cells transitioning from 2 to 4. (C) Monocle trajectory graph. (D) Slingshot trajectory graph. The black, smoothed curve shows the predicted lineage, connecting clusters represented by coloured dots.
Benchmarking of clustering results and running time.
| Baron | 0.613 | 0.613 | 0.265 | 23.934 min | 137.735 min | 9 | 9 | 54 |
| Klein | 0.800 | 0.800 | 0.636 | 3.342 min | 15.891 min | 6 | 6 | 16 |
| Camp | 0.559 | 0.597 | 0.556 | 0.544 min | 2.693 min | 4 | 5 | 10 |
| Koh | 0.565 | 0.661 | 0.824 | 0.696 min | 3.239 min | 7 | 8 | 18 |
| Kumar | 0.574 | 1.000 | 0.994 | 0.281 min | 0.496 min | 2 | 3 | 4 |
| Yan | 0.588 | 0.588 | 0.650 | 0.108 min | 0.247 min | 3 | 3 | 6 |
Individual tests can be found through the links at: .
SCORE Algorithm
| 1 Create a dendrogram tree using |
| 2 Create a vector b |
| 3 Populate b |
| 4 Create a new matrix, |
| 5 Generate a new dendrogram tree and clustering of cells; |
| 6 Record result from optimal stability of subsampled tree; |
| 7 Vote on most commonly occurring result; |
| 8 Choose most stable result from the original dendrogram tree |
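Steps 6 to 8 above (recording a result per subsample, voting, and picking a stable result) can be sketched as follows. The stability criterion used here, the longest run of consecutive clustering resolutions that give the same number of clusters, is an assumption made for illustration and is not taken from the source. The function names and input values are hypothetical.

```python
from collections import Counter

def vote(results):
    """Return the most commonly occurring result across subsampled runs."""
    return Counter(results).most_common(1)[0][0]

def most_stable_count(counts_by_resolution):
    """Find the cluster count held over the longest run of consecutive resolutions.

    Assumed stability criterion (not from the source): a result is more stable
    the more consecutive resolutions leave the number of clusters unchanged.
    Returns (cluster_count, run_length).
    """
    best_count, best_len = None, 0
    run_count, run_len = None, 0
    for count in counts_by_resolution:
        if count == run_count:
            run_len += 1
        else:
            run_count, run_len = count, 1
        if run_len > best_len:
            best_count, best_len = run_count, run_len
    return best_count, best_len

# Hypothetical optimal cluster counts recorded from eight subsampled trees.
bootstrap_results = [4, 4, 5, 4, 3, 4, 5, 4]
voted = vote(bootstrap_results)

# Hypothetical cluster counts of the original tree across twelve resolutions.
original_tree_counts = [9, 7, 6, 5, 5, 4, 4, 4, 4, 3, 3, 2]
stable_count, run_length = most_stable_count(original_tree_counts)
print(voted, stable_count, run_length)
```

In this toy input the voted result and the most stable result of the original tree agree on four clusters; when they disagree, how the two are reconciled depends on details of the algorithm that are not given in the steps above.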
scGPS trajectory analysis