Semisoft clustering of single-cell data.

Lingxue Zhu1, Jing Lei1, Lambertus Klei2, Bernie Devlin2, Kathryn Roeder3,4.   

Abstract

Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semisoft clustering that can classify both pure and intermediate cell types from data on gene expression from individual cells. Called semisoft clustering with pure cells (SOUP), this algorithm reveals the clustering structure for both pure cells and transitional cells with soft memberships. SOUP involves a two-step process: Identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure in the expression similarity matrix. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. By modeling cells as a continuous mixture of K discrete types we obtain more parsimonious results than obtained with standard clustering algorithms. Moreover, using soft membership estimates of cell type cluster centers leads to better estimates of developmental trajectories. The strong performance of SOUP is documented via simulation studies, which show its robustness to violations of modeling assumptions. The advantages of SOUP are illustrated by analyses of two independent datasets of gene expression from a large number of cells from fetal brain.
Copyright © 2019 the Author(s). Published by PNAS.

Keywords:  developmental trajectories; neuronal lineages; single-cell RNA-seq; soft clustering

Year:  2018        PMID: 30587579      PMCID: PMC6329952          DOI: 10.1073/pnas.1817715116

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


Development often involves pluripotent cells transitioning into other cell types, sometimes in a series of stages. For example, early in development of the cerebral cortex (1), one progression begins with neuroepithelial cells differentiating to apical progenitors, which can develop into basal progenitors, which will transition to neurons. Moreover, there are diverse classes of neurons, some arising from distinct types of progenitor cells (2, 3). By the human midfetal period there are myriad cell types and the foundations of typical and atypical neurodevelopment are already established (4). While the challenges for neurobiology in this setting are obvious, some of them could be alleviated by statistical methods that permit cells to be classified into pure or transitional types. We develop such a method here. Similar scenarios arise with the development of bone-marrow–derived immune cells, cancer cells, and disease cells (5); hence we envision broad applicability of the proposed modeling tools. Different types of cells have different transcriptomes or gene expression profiles (4). Thus, they can be identified by these profiles (6), especially by expression of certain genes that tend to have cell-specific expression (marker genes). Characterization of these profiles has recently been facilitated by single-cell RNA sequencing (scRNA-seq) techniques (7, 8), which seek to quantify expression for all genes in the genome. For single cells, the number of possible sequence reads is limited and therefore the data can be noisy. Nonetheless, cells of the same and different cell types can be successfully clustered using these data (6, 9–12). What is missing from the clustering toolbox is a method that recognizes development, with both pure type and transitional cells. In this paper, we develop an efficient algorithm for semisoft clustering with pure cells (SOUP). 
SOUP recovers the set of pure cells by exploiting the block structures in a cell–cell similarity matrix and also estimates the soft memberships for transitional cells. We also incorporate a gene selection procedure to identify the informative genes for clustering. This selection procedure is shown to retain fine-scaled clustering structures in the data and to substantially enhance clustering accuracy. Incorporating soft-clustering results into methods that estimate developmental trajectories yields less biased estimates of developmental courses. We first document the performance of SOUP via extensive simulations. These show that SOUP performs well in a wide range of contexts; it is superior to natural competitors for soft clustering; and it performs as well as, if not better than, other clustering methods in settings ideal for hard clustering. Next, we apply it to two single-cell datasets from fetal development of the prefrontal cortex of the human brain. In both settings SOUP produces results congruent with known features of fetal development.

Results

Model Overview.

Suppose we observe the expression levels of $n$ cells measured on $p$ genes and let $X \in \mathbb{R}^{n \times p}$ be the cell-by-gene expression matrix. Consider the problem of semisoft clustering, where we expect the existence of both (i) pure cells, each belonging to a single cluster and requiring a hard cluster assignment, and (ii) mixed cells (transitional cells) that are transitioning between two or more cell types and hence should obtain soft assignments. With $K$ distinct cell types, to represent the soft membership, let $\Theta \in \mathbb{R}_{+}^{n \times K}$ be a nonnegative membership matrix. Each row of the membership matrix, $\Theta_{i\cdot} = (\theta_{i1}, \ldots, \theta_{iK})$, contains nonnegative numbers that sum to one, representing the proportions of cell $i$ in the $K$ clusters. In particular, a pure cell in type $k$ has $\theta_{ik} = 1$ and zeros elsewhere. Let $C \in \mathbb{R}^{p \times K}$ denote the cluster centers, which represent the expected gene expression for each pure cell type. When a cell is developing or transitioning from one category to another, it may exhibit properties of both subcategories, which is naturally viewed as a combination of the two cluster centers. Weights in the membership matrix reflect the stage (early or late) of the transition. Here we formulate a simple probability model that is convenient for analysis and highly robust to expected violations of the assumptions. Let

$$X = \Theta C^{T} + E, \qquad [1]$$

where $E \in \mathbb{R}^{n \times p}$ is a zero-mean noise matrix with $\mathbb{E}[E E^{T}] = \sigma^{2} I$. It follows directly that the cell–cell similarity matrix takes a convenient form,

$$A := \mathbb{E}[X X^{T}] = \Theta Z \Theta^{T} + \sigma^{2} I, \qquad [2]$$

where $Z := C^{T} C \in \mathbb{R}^{K \times K}$ represents the association among different cell types. In practice, many genes will not follow the developmental trajectory described by Eq. 1; however, it is expected that the expression of many marker genes and other highly informative genes will transition smoothly between cluster centers during development (for example, the genes featured in ref. 13). In particular, one can empirically check the plausibility of Eq. 1 for marker genes; see below for details. Moreover, because SOUP’s inferences are based on the empirical cell–cell similarity matrix $\hat{A}$, it is sufficient that this matrix approximately follows the form specified in Eq. 2, a weaker assumption than Eq. 1. Indeed, similar assumptions are implicit in many algorithms that estimate developmental trajectories (14–17). Gene expression is also likely to have nonconstant variance, depending on gene and cell type. However, our pure cell search algorithm does not depend on the diagonal entries of $A$, and our estimate of $\Theta$ is based on spectral decomposition of $A$, so the method remains robust to moderate fluctuation of diagonal entries of $A$ unless the magnitude of noise is unrealistically large. As a graphical illustration of the SOUP model, we simulate an example with a developmental trajectory of type1 → type2 → type3. A fraction of the genes were chosen to have differential expression across cell types, and of these a fraction change nonlinearly between cell types (Fig. 1). Regardless of the violations of Eq. 1, the cells depict a smooth transition between cell types (Fig. 1).
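As a rough illustration of the linear mixture model and the similarity structure it implies, the sketch below simulates a small dataset with NumPy. The names `theta` (membership matrix), `centers` (cluster centers), the helper `simulate_linear_mixture`, and all sizes and noise levels are assumptions made for this example; they are not taken from the SOUP package.

```python
import numpy as np

def simulate_linear_mixture(theta, centers, sigma=0.0, seed=0):
    """Draw X = theta @ centers.T + E, with i.i.d. Gaussian noise of s.d. sigma."""
    rng = np.random.default_rng(seed)
    n, p = theta.shape[0], centers.shape[0]
    noise = sigma * rng.standard_normal((n, p))
    return theta @ centers.T + noise

# Hypothetical toy setup: K = 3 cell types and p = 50 genes.
rng = np.random.default_rng(1)
K, p = 3, 50
centers = rng.normal(size=(p, K))            # columns are cluster centers
pure = np.repeat(np.eye(K), 4, axis=0)       # 4 pure cells per type (one-hot rows)
mixed = rng.dirichlet(np.ones(K), size=6)    # 6 mixed cells, rows sum to one
theta = np.vstack([pure, mixed])

X_clean = simulate_linear_mixture(theta, centers, sigma=0.0)
X_noisy = simulate_linear_mixture(theta, centers, sigma=0.5)

Z = centers.T @ centers                      # K x K association among cell types
A_clean = X_clean @ X_clean.T                # equals theta @ Z @ theta.T without noise
A_noisy = X_noisy @ X_noisy.T                # same structure, perturbed by noise
```

Without noise the cell–cell similarity matrix factors exactly through the memberships; with noise the factorization holds only approximately, which is the setting the method is designed for.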
Fig. 1.

Illustration of the SOUP framework for three cell types with simulated developmental trajectory of type1 → type2 → type3. (A) Example of four differentially expressed genes along the developmental trajectory, with potentially nonlinear differentiation patterns. (B) Simulation of 300 pure cells and 200 mixed cells, visualized in the leading principal component space.

Similar factorization problems to that of Eq. 2 have appeared in previous literature under different settings. The most popular are the mixed-membership stochastic block model (MMSB) (18) and topic modeling (for example, refs. 19–21). However, it is nontrivial to extend these algorithms to our scenario. A similar formulation also appeared in nonnegative matrix factorization (NMF), where nonnegative rank-$K$ matrices, here the membership matrix $\Theta$ and the center matrix $C$, are estimated such that $X \approx \Theta C^{T}$, for example, by minimizing the Euclidean distance (22). However, traditional NMF differs from our setting in two important ways: (i) The NMF problem is nonidentifiable without introducing nontrivial assumptions, and (ii) SOUP does not rely on the nonnegativeness of $C$, which makes it more broadly applicable to scRNA-seq data after certain preprocessing steps, such as batch-effect corrections, which can result in negative values. Recent work in ref. 23 considered the problem of overlapping variable clustering under latent factor models. Despite the different setup, the model comes down to a problem similar to Eq. 2, and the authors proposed the latent-model approach to overlapping clustering (LOVE) algorithm to recover the variable allocation matrix, which can be treated as a generalized membership matrix. LOVE consists of two steps: (i) finding pure variables and (ii) estimating the allocations of the remaining overlapping variables. Both steps rely on a critical tuning parameter that corresponds to the noise level, which can be estimated using a cross-validation procedure. 
When we applied the LOVE algorithm to our single-cell datasets, however, we found it sensitive to noise, leading to poor performance. Nonetheless, inspired by the LOVE algorithm, SOUP works in a similar two-step manner, while adopting different approaches in both parts. Most importantly, SOUP parameters are intuitive to set, and SOUP shows robust performance in both simulations and real data.

SOUP algorithm.

The SOUP algorithm involves finding the set of pure cells and then estimating the membership matrix $\Theta$. Pure cells play a critical role in this problem. Intuitively, they provide valuable information from which to recover the cluster centers, which further guides the estimation of $\Theta$ for the mixed cells. In fact, it has been shown in ref. 23 that the existence of pure cells is essential for model (2) in ref. 23 to be identifiable, and we restate the theorem below.

Theorem 1 (Identifiability).

Model (2) is identifiable up to the permutation of labels, if (a) $\Theta$ is a membership matrix; (b) there exist at least two pure cells per cluster; and (c) the cell-type association matrix $Z$ is full rank. These assumptions are minimal, because in most single-cell datasets, it is natural to expect the existence of at least a few pure cells in each type, and $Z$ usually has larger entries along the diagonal. The details of SOUP are presented in Methods and the supporting information. As an overview, to recover the pure cells the key is to notice the special block structure formed by the pure cells in the similarity matrix $A$. SOUP exploits this structure to calculate a purity score for each cell. This calculation requires two tuning parameters: the fraction of most similar neighbors to be examined for each cell, and the fraction of cells declared as pure after ranking the purity scores. After selection, the pure cells are partitioned into $K$ clusters by standard clustering algorithms such as K-means. The choice of $K$ is guided by empirical investigations, including a sample splitting procedure. To recover $\Theta$, consider the top $K$ eigenvectors of the similarity matrix $A$, denoted as $V \in \mathbb{R}^{n \times K}$. There exists a matrix $Q \in \mathbb{R}^{K \times K}$ such that $\Theta = VQ$. If we have identified the set of pure cells $\mathcal{I}$ and their partitions $\{\mathcal{I}_k\}$, we essentially know their memberships, $\Theta_{\mathcal{I}}$. Then it is straightforward to recover the desired $Q$ from the submatrix $V_{\mathcal{I}}$, which further recovers the full membership matrix $\Theta = VQ$ (Theorem 2). In practice, we plug in the sample similarity matrix $\hat{A}$ to obtain an estimate $\hat{\Theta}$, and we can further estimate $C$ by minimizing $\|X - \hat{\Theta} C^{T}\|_F^2$.

Theorem 2 (SOUP clustering).

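In model (2), let $V \in \mathbb{R}^{n \times K}$ be the top $K$ eigenvectors of $A$ and $\mathcal{I}$ be the set of pure cells. Under the same assumptions as those of Theorem 1, the optimization problem

$$\min_{Q \in \mathbb{R}^{K \times K}} \|V_{\mathcal{I}} Q - \Theta_{\mathcal{I}}\|_F^2$$

has a unique solution $Q^{*}$ such that $\Theta = V Q^{*}$. The majority membership probability of cell $i$ is $\max_k \theta_{ik}$, and the majority type is the class that achieves the maximum.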
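The second step, recovering memberships from the leading eigenvectors once pure cells and their clusters are known, can be sketched as follows. This is a minimal illustration under stated assumptions, not the SOUP package's code: the function names, the use of `numpy.linalg.eigh` and `numpy.linalg.lstsq`, and the final clipping and row renormalization of the estimate are all choices made for this example.

```python
import numpy as np

def estimate_membership(A, pure_idx, pure_labels, K):
    """Estimate the membership matrix from a similarity matrix and labeled pure cells."""
    eigvals, eigvecs = np.linalg.eigh(A)              # eigenvalues in ascending order
    V = eigvecs[:, np.argsort(eigvals)[-K:]]          # top-K eigenvectors, n x K
    theta_pure = np.eye(K)[np.asarray(pure_labels)]   # one-hot memberships of pure cells
    # Least-squares solve of V[pure] @ Q = theta_pure for the K x K matrix Q.
    Q, *_ = np.linalg.lstsq(V[pure_idx], theta_pure, rcond=None)
    theta = np.clip(V @ Q, 0.0, None)                 # assumption: truncate negatives
    row_sums = np.maximum(theta.sum(axis=1, keepdims=True), 1e-12)
    return theta / row_sums                           # assumption: renormalize rows

def estimate_centers(X, theta):
    """Least-squares cluster centers: minimize ||X - theta @ C.T||_F over C (p x K)."""
    Ct, *_ = np.linalg.lstsq(theta, X, rcond=None)
    return Ct.T
```

On noiseless data generated from the linear mixture model, both functions recover the true quantities up to numerical error; with noisy data they return approximate estimates whose quality depends on how well the pure cells were identified.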

Developmental trajectories.

SOUP provides two outcomes not available from hard-clustering procedures such as in refs. 24–26: soft membership probabilities, $\hat{\Theta}$, and soft cluster centers, $\hat{C}$. The next step is to estimate one or more developmental trajectories from the cells. Various algorithms have been developed that can identify multibranching developmental trajectories in single-cell data (14–17, 27), and one successful direction is to estimate the lineages from cell clusters, usually by fitting a minimum spanning tree (MST) to the cluster centers in a low-dimensional space (15–17) and then fitting a smooth branching curve to the inferred lineages (17). It is straightforward to extend this idea to SOUP, where we identify the MST using SOUP-estimated soft cluster centers, $\hat{C}$. Following the common practice, $\hat{C}$ can be projected to a low-dimensional space for MST estimation. Notably, soft clusters provide an alternative input for Slingshot (17), which yields more refined insights into development by providing less-biased estimates of cluster centers in developing cells.
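To make the MST step concrete, here is a small standard-library sketch that connects projected cluster centers with Prim's algorithm under Euclidean distance. The coordinates in `centers_2d` are invented for illustration, and this is not how Slingshot or the SOUP package implements lineage inference; it only shows the tree-fitting idea.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns sorted edges (i, j)."""
    n = len(points)
    in_tree = {0}                       # start growing the tree from the first center
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)    # shortest edge leaving the current tree
        _, i, j = best
        in_tree.add(j)
        edges.append((min(i, j), max(i, j)))
    return sorted(edges)

# Hypothetical 2-D projections of four soft cluster centers.
centers_2d = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (1.0, 1.5)]
lineage_edges = minimum_spanning_tree(centers_2d)
```

In this toy layout the second center is the branching point: the tree links it to each of the other three centers, which would correspond to a lineage that splits after the second cluster.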

Performance Evaluation.

Simulations.

There are no direct competitors of SOUP for semisoft clustering in the single-cell literature, and here we use the following three candidates for comparison: NMF, where we use the standard algorithm from ref. 22 to solve for the nonnegative factors; Fuzzy C-Means (FC) (28), a generic soft-clustering algorithm; and DIMMSC (Dirichlet mixture model for clustering droplet-based single cell) (29), a probabilistic clustering algorithm for single-cell data based on Dirichlet mixture models. All algorithms are applied to the log-transformed data, except for DIMMSC, which is developed under a multinomial model for count data. NMF can also be applied to the raw count data, which usually yields slightly worse performance. Although SOUP is derived from a linear model, it is robust and applicable to general scRNA-seq data. To illustrate this, we use the splat algorithm in the Splatter R package (30) to conduct simulations. Splatter is a single-cell simulation framework that generates synthetic scRNA-seq data with hyperparameters estimated from a real dataset. The algorithm incorporates expected violations of the model assumptions. We simulate 500 genes and 300 pure cells from four clusters. Mixed cells are simulated along a developmental path and the number varies from 100 to 500. For comparable evaluation across different scenarios with different cell numbers, we present the average loss per cell, i.e., the loss between the estimated and true membership matrices, computed with the usual vector norm after vectorization and divided by the number of cells. SOUP achieves the best performance under all scenarios (Fig. 2). In particular, with 100, 300, and 500 mixed cells, the true proportions of pure cells in the data are 75%, 50%, and 37.5%, respectively. Note that we always use the same pure-cell fraction for SOUP, which represents a prior guess of the proportion of pure cells, and we see that SOUP remains stable even when this guess clearly overestimates or underestimates the pure proportion.
Fig. 2.

Boxplot of the average losses of estimating the membership matrix $\Theta$ in 10 repetitions. Using the splat algorithm in the Splatter package, expression levels of 500 genes are simulated for 300 pure cells from four clusters, as well as {100, 300, 500} mixed cells along the trajectory of type1 → type2 → {type3 or type4}. (A) Without dropout. (B) With dropout.

One of the biggest challenges in single-cell data is the existence of dropouts (31), where the mRNA for a gene fails to be amplified before sequencing, producing a “false” zero in the observed data. We see that SOUP remains robust and outperforms all other algorithms (Fig. 2).

SOUP as hard clustering.

Although SOUP aims at recovering the full membership matrix $\Theta$, it can also be used as a hard-clustering method by labeling each cell as the majority type. We benchmark SOUP as a hard-clustering method on seven labeled public single-cell datasets (refs. 6 and 10; details in the supporting information). We compare SOUP to three popular single-cell clustering algorithms: (i) SC3, or single-cell consensus clustering (24); (ii) CIDR, or clustering through imputation and dimensionality reduction (25); and (iii) Seurat, named for Georges Seurat (26). Because we aim at hard clustering, here we use a tuning-parameter setting for SOUP that differs from the default. We give the true $K$ as input to SC3, CIDR, and SOUP. For Seurat, we follow the choices in ref. 32, setting the resolution parameter to 0.9 and using the estimated number of principal components (nPC) from CIDR. Even for hard clustering, SOUP is among the best-performing methods [Fig. 3, showing adjusted Rand index (ARI)]. Finally, when using the default setting, SOUP also achieves sensible performance, sometimes with even higher ARI.
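The hard-clustering evaluation above combines two simple operations: assigning each cell to its majority type and scoring the result with the adjusted Rand index. The standard-library sketch below shows both; the function names are chosen for this example, and the ARI follows the usual pair-counting definition rather than any particular package's implementation.

```python
from collections import Counter
from math import comb

def majority_types(theta):
    """Hard labels: index of the largest membership in each row (first index on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    cells = Counter(zip(labels_true, labels_pred))            # contingency table
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)               # chance agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                                 # degenerate partitions
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because the index is invariant to relabeling, a clustering that matches the reference up to a permutation of cluster names scores exactly 1, while splitting one true cluster in two lowers the score.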
Fig. 3.

ARI on seven labeled public datasets (6, 10), using (i) SC3, (ii) CIDR, (iii) Seurat, and (iv) SOUP.


Case Studies.

Fetal brain cells I.

We apply SOUP to a fetal brain scRNA-seq dataset, with 220 developing fetal brain cells between 12 and 13 gestational weeks (GW) (9). Guided by marker genes, these single cells are labeled with seven types in the original paper: two subtypes of apical progenitors (AP1, AP2), two subtypes of basal progenitors (BP1, BP2), and three subtypes of neurons (N1, N2, N3). We refer to these as Camp labels after the lead author of ref. 9. At this age many cells are still transitioning between different types, providing valuable information regarding brain development. Therefore, instead of the traditional hard-clustering methods, SOUP can be used to recover the fine-scaled soft-clustering structure. We run SOUP with several values of $K$ on the log-transformed transcript counts and examine the clusters of cells, initially treating this as a hard-clustering problem and focusing on the dominating type for each cell. For $K = 6$ and 7, some clusters have no cells assigned to them, which is indicative of a misspecified $K$. For $K = 5$, the algorithm identifies cell types that correspond to AP1, AP2, BP1, N2, and N3 in Camp’s nomenclature (Fig. 4). For these data, when cells are in various developmental stages, hard clustering appears to overfit the data.
Fig. 4.

Expression levels of five anchor genes, visualized in log scale, where the 220 fetal brain cells are ordered by a SOUP unilineal developmental trajectory.

Next, we examine the soft assignments. For each cluster $k$, we label it by an anchor gene, which is the marker gene defined in ref. 9 that has the largest anchor score, a score that compares the center value of the gene in cluster $k$ with its center values on the $(K-1)$ clusters other than $k$. The expression levels of the five anchor genes along the SOUP trajectory vary smoothly over developmental time (Fig. 4), consistent with Eq. 1. In the space of the top three PCs, the cells show a smooth developmental trajectory between clusters (Fig. 5), which is also consistent with Eqs. 1 and 2.
Fig. 5.

Two hundred twenty fetal brain cells, cluster centers, lineages, and branching curves in the top three PCs space. Cells are colored according to their SOUP major types, but annotated using Camp labels based on the largest overlap (Fig. 4). (A and B) MST of softSOUP and hardSOUP cluster centers. (C and D) Smooth branching curves fitted by Slingshot based on MST in A and B, respectively.

To model the developmental trajectories we plot the cluster centers determined directly by SOUP (softSOUP) and by hard clustering (hardSOUP). Fitting an MST to the cluster centers, softSOUP identifies two lineages, AP-BP-N and AP-N (Fig. 5), both of which were previously described in ref. 9, while hardSOUP identifies less intuitive BP-AP-N and AP-N lineages (Fig. 5). Using Slingshot to fit smooth branching curves to these lineages via simultaneous principal curves, hardSOUP recovers AP-N and BP-N transitions, and the artificial BP1 → AP2 transition in the initial MST fit is dropped (Fig. 5). However, the AP–BP transition is still missing. The softSOUP MST successfully reveals AP-N and AP-BP-N transitions (Fig. 5), thus capturing the true transition of cell types leading to neurons by accounting for the soft membership structures.

Fetal brain cells II.

We next applied SOUP to a richer dataset with 2,309 single cells from human embryonic prefrontal cortex (PFC) from 8 GW to 26 GW (33). Using the Seurat package (26) the authors identified six major clusters: neural progenitor cells (NPC), excitatory neurons (EN), interneurons (IN), astrocytes (AST), oligodendrocyte progenitor cells (OPC), and microglia (MIC), which are referred to as Zhong labels after the lead author of ref. 33. Our objective is to evaluate the developmental trajectories of the major cell types, after excluding IN and MIC, which are known to originate elsewhere and migrate to the PFC (33). After several iterations of hard clustering by SOUP to remove IN and MIC cells, 1,503 cells remain, and these are clustered into multiple types. These types correspond fairly well with the Zhong labels (Fig. 6); however, many cells have low majority membership probabilities and do not strongly favor a particular cluster. To illustrate this feature we display cells assigned to clusters 3 (NPC) and 7 (EN), color coded by the majority membership probability (Fig. 6). The two clusters divide the PC space evenly, with the pure cells identifying the cluster centers, while many nonpure cells can be best described as transitioning between clusters. SOUP captures the transitional nature by soft clustering.
Fig. 6.

(A) Contingency table of Zhong labels and major SOUP labels excluding IN and MIC. (B) Distribution of cluster 7 (EN) and cluster 3 (NPC) cells and their majority membership probabilities.

The SOUP trajectories reveal two developmental paths (Fig. 7): a neuronal lineage showing NPCs evolving to ENs and a glial lineage showing NPCs evolving to OPCs and then to ASTs. Projecting the cells onto the lineages can provide pseudotime estimates of development. The lineages correspond roughly with sampled GWs. Our results are similar to those in ref. 33; however, we found that NPCs evolve to OPCs and then to ASTs. The latter transitional step, which differs from the published analysis, is consistent with the literature (34). Finally, cluster 5, which consists of a mixture of cells Zhong labeled as EN and NPC, is placed at the end of the neuronal lineage, suggesting that some of the NPC labels are incorrect and that this cluster constitutes a distinct class of ENs.
Fig. 7.

Developmental trajectories of 1,503 Zhong cells delineate glial and neuronal pathways. Cluster labels are defined in Fig. 6.

Additional strengths of SOUP are highlighted by analyses described in the supporting information, which investigate gene expression as a function of cell membership to cluster and the proximity of cells to the neuronal trajectory (Fig. 7). In particular, we evaluate the final clusters of the neuronal lineage, clusters 5 and 6. In terms of gene expression, cells in cluster 6 show all of the hallmarks of neuronal development, including low expression of neuronal markers in immature neurons and much higher expression in maturing neurons. There is also some evidence of heterogeneity of expression of genes marking neurons in some cells, consistent with differentiation into different neuronal subtypes. For cells from cluster 5, the evidence is far less clear: The majority of cells manifest neuronal markers at high levels, consistent with maturing neurons; yet, there is also expression of a substantial set of NPC markers in these neurons, a puzzling feature that could be either a technical artifact or an unanticipated developmental feature of deep-layer projection neurons.

Discussion

We develop SOUP, a semisoft clustering algorithm for single-cell data. SOUP fills the gap of modeling uncertain cell labels, including cells that are transitioning between cell types, which is ubiquitous in single-cell datasets. SOUP outperforms generic soft-clustering algorithms and, if treated as hard clustering, it also achieves comparable performance to that of state-of-the-art single-cell clustering methods. By using soft-clustering input, it can provide an estimate of developmental trajectories that is less biased and these results reflect valuable information regarding developmental patterns. We present the results from two case studies based on expression of human fetal brain cells and find SOUP reveals patterns of development not apparent in prior published analyses. As is typical for clustering algorithms, selecting the optimal number of clusters, $K$, is challenging. We recommend balancing input from several empirical approaches and iterating over a range of $K$ to determine a good choice. Notably, applying SOUP to different numbers of clusters reveals hierarchical structure among the cell types. To determine fine-scale structure within major cell types, SOUP can be applied iteratively to subsets of cells. Using SOUP to obtain soft membership probabilities and then estimate developmental trajectories provides two complementary views of the data. Some cells can be reliably assigned to a cluster and these cells constitute pure types, which can be highly informative. Other cells are transitioning and estimated membership will fall within two, or even more, cell types. Examining the membership probabilities, and the placement on a developmental trajectory, provides critical information about the developmental processes and offers a parsimonious and scientifically meaningful alternative to estimating a large number of discrete cell types. 
Notably, although SOUP is derived under a generic additive noise model and does not explicitly model the technical noise such as dropouts, we find it to be robust when applied to realistic simulations and to a variety of single-cell datasets. Moreover, it is computationally efficient. SOUP takes less than 15 min for 3,600 cells and 20,000 genes, benchmarked on a Linux computer equipped with an AMD Opteron Processor 6320 at 2.8 GHz. Therefore, SOUP is a versatile tool for single-cell analyses.

Methods

SOUP.

Our SOUP algorithm contains two steps: (i) Find the set of pure cells and (ii) estimate the membership matrix $\Theta$. Pure cells play a critical role in this problem. Intuitively, they provide valuable information from which to recover the cluster centers, which further guides the estimation of $\Theta$ for the mixed cells. Once the pure cells are identified, the algorithm proceeds as described in Results.

Find Pure Cells.

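Denote the set of pure cells in cluster $k$ as

$$\mathcal{I}_k = \{ i : \theta_{ik} = 1 \}$$

and the set of all pure cells as $\mathcal{I} = \bigcup_{k=1}^{K} \mathcal{I}_k$. To recover $\mathcal{I}$, the key is to notice the special block structure formed by the pure cells in the similarity matrix $A$. In particular, under Eq. 2, the pure cells form $K$ blocks in $A$, where the entries in these blocks are also the maxima in their rows and columns, ignoring the diagonal. Specifically, define

$$m_i = \max_{j \neq i} |A_{ij}|, \qquad S_i = \{ j \neq i : |A_{ij}| = m_i \},$$

and we call $S_i$ the extreme neighbors of cell $i$. It can be shown that if cell $i$ is pure, then $|A_{ij}| = m_j$ for all $j \in S_i$. On the contrary, for a mixed cell $i$, there exist some cells $j \in S_i$ where $|A_{ij}| < m_j$. Inspired by these observations, we define a purity score of each cell,

$$p_i = \frac{1}{|S_i|} \sum_{j \in S_i} \frac{|A_{ij}|}{m_j},$$

and then naturally $p_i \in [0, 1]$. Furthermore, the pure cells have the highest purity scores; that is, $\mathcal{I} = \{ i : p_i = 1 \}$. In practice, we plug in the sample similarity matrix $\hat{A}$ and estimate $S_i$ and $p_i$ by their sample counterparts, taking as extreme neighbors the chosen fraction of cells most similar to cell $i$, and we estimate $\mathcal{I}$ with the top-scoring fraction of cells. Finally, these pure cells are partitioned into $K$ clusters, $\{\hat{\mathcal{I}}_k\}$, by standard clustering algorithms such as K-means. The complete algorithm is summarized in the supporting information.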
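The pure-cell search can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the package's implementation: the function and parameter names (`purity_scores`, `select_pure_cells`, `n_neighbors`, `pure_frac`) are hypothetical, extreme neighbors are taken to be a fixed number of most similar cells per row, and each cell's score is the average ratio of its similarity to a neighbor over that neighbor's own largest off-diagonal similarity.

```python
import numpy as np

def purity_scores(A, n_neighbors):
    """Purity score per cell from a symmetric cell-cell similarity matrix A."""
    S = np.abs(np.asarray(A, dtype=float))       # new array; the input is not modified
    np.fill_diagonal(S, -np.inf)                 # ignore diagonal entries
    row_max = S.max(axis=1)                      # largest off-diagonal similarity per cell
    # Extreme neighbors: the n_neighbors most similar cells in each row.
    neighbors = np.argsort(-S, axis=1)[:, :n_neighbors]
    ratios = np.take_along_axis(S, neighbors, axis=1) / row_max[neighbors]
    return ratios.mean(axis=1)                   # equals 1 when every neighbor agrees

def select_pure_cells(scores, pure_frac):
    """Indices of the top pure_frac fraction of cells, ranked by purity score."""
    n_pure = max(1, int(round(pure_frac * len(scores))))
    return np.sort(np.argsort(-scores, kind="stable")[:n_pure])
```

The selected cells would then be split into clusters with a standard method such as K-means. The sketch assumes every cell has at least one nonzero off-diagonal similarity, so the ratios are well defined.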

Tuning Parameters.

The two tuning parameters of SOUP are two quantiles, the pure-cell fraction and the extreme-neighbor fraction, both intuitive to set. The pure-cell fraction should be an estimate of the proportion of pure cells in the data, of which we usually have prior knowledge. In practice, we find that SOUP remains stable even when this fraction is far from the true pure proportion, and it is helpful to use a generous choice. Throughout this paper, we always use the same value for it and obtain sensible results. As for the extreme-neighbor fraction, it corresponds to the smallest proportion of per-type pure cells, and it suffices if it does not exceed that proportion, so that the extreme neighbors of a pure cell are themselves pure cells of the same type. This choice does not need to be exact, as long as it is a reasonable lower bound. In practice, we find it often beneficial to use a smaller value that corresponds to fewer than 100 pure cells per type. By default, its value depends on the dataset size, with separate defaults for datasets with fewer than 1,000 cells, for 1,000–2,000 cells, and for even larger datasets.

Gene Selection.

It is usually expected that not all genes are informative for clustering. For example, housekeeping genes are unlikely to differ across cell types and hence provide limited information for clustering other than introducing extra noise. Therefore, it is desirable to select a set of informative genes before applying SOUP clustering. Here, we combine two approaches for gene selection: (i) the DESCEND algorithm proposed in ref. 35 based on the Gini index and (ii) the Sparse PCA (SPCA) algorithm (36). The R package of SOUP is available at https://github.com/lingxuez/SOUPR.
