Single-Cell RNA Sequencing to Disentangle the Blood System.

Jean Acosta¹, Daniel Ssozi¹, Peter van Galen¹.

Abstract

The blood system is often represented as a tree-like structure with stem cells that give rise to mature blood cell types through a series of demarcated steps. Although this representation has served as a model of hierarchical tissue organization for decades, single-cell technologies are shedding new light on the abundance of cell type intermediates and the molecular mechanisms that ensure balanced replenishment of differentiated cells. In this Brief Review, we exemplify new insights into blood cell differentiation generated by single-cell RNA sequencing, summarize considerations for the application of this technology, and highlight innovations that are leading the way to understand hematopoiesis at the resolution of single cells. Graphic Abstract: A graphic abstract is available for this article.

Keywords: blood cells; computational biology; hematopoiesis; stem cells; technology

Year: 2021 PMID: 33441024 PMCID: PMC7901535 DOI: 10.1161/ATVBAHA.120.314654

Source DB: PubMed Journal: Arterioscler Thromb Vasc Biol ISSN: 1079-5642 Impact factor: 8.311

The hematopoietic hierarchy is being redrawn by single-cell sequencing. Technological developments have made single-cell assays widely available. Computational analysis is often the rate-limiting step in generating new biological insight.

Revision of the Hematopoietic Hierarchy

The blood system has long served as a model for hierarchical tissue organization. In 1909, the idea was introduced that a common stem cell could generate all blood cell types.[1] In the 1960s, irradiation of mouse bone marrow cells was used to introduce unique chromosomal markers, followed by transplantation and clonal analysis of spleen colony-forming units. These experimental advances enabled the identification of progenitor cells with the capacity for limited self-renewal, proliferation, and differentiation into multiple lineages.[2] Over the next decades, refinement of assays and the introduction of technologies such as flow cytometry delineated how hematopoietic stem cells undergo a series of differentiation steps to give rise to mature cell types of the megakaryocyte/erythroid, myeloid, and lymphoid lineages (Figure [A]). A limited number of phenotypically defined cell types with predictable lineage potential were considered to represent relatively homogeneous molecular states.[3,4] This orderly model allowed increasingly detailed descriptions of the factors that regulate hematopoietic differentiation.
Figure. Different approaches to describe the hematopoietic hierarchy. A, Hematopoietic cell types can be separated, for example, using flow cytometry to sort based on surface markers. In vitro differentiation and transplantation inform the lineage potential of single cells or sorted populations. These assays played a major role in establishing the hierarchical relationships between cell types. B, Molecular characterization of individual cells provides an additional method to study cellular heterogeneity. Computational analysis of these datasets indicates that cell types are more heterogeneous and that differentiation trajectories are more gradual than appreciated previously. B/NK indicates B/natural killer cell progenitor; CMP, common myeloid progenitor; DC, dendritic cell; ETP, early T-cell progenitor; GMP, granulocyte/macrophage progenitor; HSC, hematopoietic stem cell; MEP, megakaryocyte/erythroid progenitor; MLP, multilymphoid progenitor; MPP, multipotent progenitor; Prog, progenitor; RNA-seq, RNA sequencing. A, Hierarchy derived from Doulatov et al.[4] B, K nearest neighbor graph derived from van Galen et al.[47]
Over the past decade, single-cell technologies such as single-cell RNA sequencing (scRNA-seq) have developed at a fast pace.[5] The blood is an attractive system to study because it is readily available as a single-cell suspension and because there are well-established molecular markers and functionally validated cell types. Considering all this prior knowledge, it may come as a surprise that scRNA-seq is significantly revising our fundamental understanding of the blood system. Rather than an orderly hierarchy of a defined set of cell types, scRNA-seq studies consistently recover more continuous differentiation trajectories (Figure [B]).[6-8] It is important to keep in mind that these 2-dimensional representations are the result of measuring thousands of genes, representing a multitude of regulatory processes. Although the reduction of all these processes to a few dimensions results in a continuous trajectory, not all individual regulatory mechanisms are marked by gradual change.[9,10] For example, some cell states can be resolved by a molecular switch, like Irf8 and Gfi1 that compete in a bipotential state to direct myeloid fate decisions.[11] Other consequential decision points are irreversible, such as enucleation during erythroid differentiation or DNA recombination during B- and T-cell differentiation.
Considering such variety in processes that control cellular differentiation, the gradual commitment trajectories detected by scRNA-seq should be viewed as a compendium of molecular mechanisms that each contribute to cell state changes in unique ways. Increasingly, single-cell analysis is not used in isolation but as a complementary tool. Its combination with orthogonal models and assays often reveals previously unappreciated complexity in the maintenance of immune cell homeostasis.[12] One example is that lineage differentiation does not just consist of branching points that lead to further diversification. In fact, separate differentiation trajectories can converge to replenish similar mature cell types, as shown in the myeloid lineage.[13-15] A second example is a deeper understanding of progenitor cells. Single-cell analyses of classically defined megakaryocyte-erythroid progenitors showed that this population contains bipotent and unipotent progenitors and that cell fate is regulated by cell cycle duration, reminiscent of myeloid-lymphoid fate choices.[16-19] In addition to the stepwise differentiation through megakaryocyte-erythrocyte progenitor stages, single-cell analyses have strengthened evidence that megakaryocytes can also branch directly from stem cells.[20-23] A third area where single-cell analyses will have high impact is the deconvolution of macrophages. These cells can originate from embryonic precursors or from adult bone marrow stem cells and reside in various tissues with concomitant environmental interactions. scRNA-seq has been used to characterize macrophages from the aorta, adipose tissue, liver, kidney, and airways.[24-29] With integrative studies, a more complete picture of macrophage biology will start to emerge.[30] A refined understanding of differentiation trajectories and cellular heterogeneity will inform strategies to modulate specific cell states in health and disease.

Technological Innovations

A large variety of scRNA-seq technologies have been developed that differ in critical ways. First, plate-based methods in which cells are separated by flow cytometry before processing include MARS-Seq, CEL-Seq, and Smart-Seq.[31-33] The high sensitivity and full-length coverage of current plate-based methods enable robust detection of genes with low expression levels, splice variants, and allele-specific expression. Second, high-throughput droplet- or nanowell-based approaches have enabled thousands of cells to be analyzed in a single experiment. Drop-Seq and InDrop capture cells and barcoded beads in droplets, which allows for the generation of barcoded cDNA for thousands of cells in a single tube.[34,35] Using similar chemistry but without droplets, Seq-Well captures cells and beads into nanowells on 90 000-well plates that are handled without specialized equipment.[36] In the commercial space, 10× Genomics provides a popular platform that has made droplet-based scRNA-seq accessible to the wider research community.[37] These methods yield 3′-biased reads and fewer transcripts per cell compared with plate-based methods, although recent improvements have increased the sensitivity.[38] The analysis of large cell numbers is advantageous for complex tissues and identification of rare populations. Third, combinatorial indexing provides another approach to high-throughput scRNA-seq.[39] This split-pool barcoding strategy has been applied to whole organisms due to its advantages in terms of throughput, cost, and broad cell type compatibility. Because extracted nuclei are used as the starting material, single-cell combinatorial-indexing RNA-seq (sci-RNA-seq3) is less sensitive to differences in cell size, stickiness, and mechanical fragility, but it captures only nuclear mRNAs. With this variety of available technologies and the use of carefully considered experimental designs,[40-42] scRNA-seq can be applied to a range of biological questions.
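To make the throughput argument for split-pool barcoding concrete, the following minimal sketch computes how many distinct barcode combinations a given number of indexing rounds produces and how often a cell would be expected to share its combination with another cell by chance. The values of 96 barcodes per round and 3 rounds are illustrative assumptions, not parameters reported in this review, and the function names are hypothetical.

```python
def barcode_combinations(barcodes_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after `rounds` split-pool steps."""
    return barcodes_per_round ** rounds


def expected_collision_fraction(n_cells: int, n_combinations: int) -> float:
    """Probability that a given cell shares its full barcode combination with
    at least one other cell, assuming every cell draws a combination
    uniformly at random (an idealization of the split-pool procedure)."""
    if n_cells < 1:
        return 0.0
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


if __name__ == "__main__":
    # Assumed, illustrative design: 96 barcodes per round, 3 rounds.
    combos = barcode_combinations(barcodes_per_round=96, rounds=3)
    print("combinations:", combos)  # 96 ** 3 = 884736
    for n in (1_000, 10_000, 100_000):
        print(n, "cells ->", round(expected_collision_fraction(n, combos), 4))
```

Under these assumptions, the collision fraction grows with the number of cells loaded, which is one reason the number of barcodes or rounds would need to scale with the intended throughput.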
An area of development is simultaneous acquisition of several layers of information (modalities) from the same single cell. First, simultaneous mRNA and protein capture can resolve situations where transcript levels do not correlate with protein levels. Plate-based scRNA-seq methods allow for the quantification of proteins using indexed sorting, but droplet methods cannot be combined with flow cytometry. To enable high-throughput scRNA-seq with detection of protein levels, CITE-seq uses oligonucleotide-labeled antibodies.[43] This approach was used to describe cellular heterogeneity in mixed-phenotype acute leukemia,[44] and commercial kits are currently available. Second, to combine RNA-seq with the detection of genetic variants such as somatic mutations, specific sites can be enriched with high sensitivity using TARGET-seq (compatible with Smart-Seq) or with high throughput using Genotyping of Transcriptomes (compatible with 10×). These approaches have been primarily applied to link leukemia-associated mutations to gene expression changes.[45-48] Third, several strategies have been developed to combine RNA-seq with lineage tracing, which provides insight into clonal relationships between cells. Applications range from tracking stem cell progeny during native hematopoiesis to identifying clones that develop resistance to cancer therapy.[49-52] Additional developments include combined RNA-seq and ATAC-seq (assay for transposase-accessible chromatin using sequencing),[53-56] RNA-seq with tissue localization,[57,58] and single-cell DNA-sequencing combined with surface protein expression.[59] While this list is by no means exhaustive, it demonstrates that the field is rapidly moving toward multiomic strategies to comprehensively map tissue heterogeneity.

Computational Challenges

Typically, analysis of a single sample yields hundreds to thousands of cells with thousands of sequencing reads per cell. An initial goal of the analysis is to detect which genes were expressed in each individual cell (Figure [B]). This requires a number of computational steps, including demultiplexing of reads, alignment to a reference genome, and grouping reads by cell barcodes. Fortunately, several software tools are available to facilitate these procedures, such as zUMIs, Alevin, and 10× CellRanger.[60,61] These programs allow bioinformaticians to generate a count matrix containing the number of transcripts that were detected per gene in each cell. After the generation of a count matrix, the greatest challenge is to interpret the data in a biologically meaningful way. The following procedures will start to acquaint the user with the data. High-quality cells can be selected on the basis of a minimum number of detected transcripts (thresholds range from 200 to 5000) and a maximum fraction of mitochondrial transcripts, which correlates with cell death. Additional strategies to clean up the data include subtraction of contaminating transcripts[62,63] and the removal of doublets, that is, multiple cells that were captured together. To remove doublets, agreement between tools such as DoubletDecon, DoubletFinder, and Scrublet can prevent the erroneous classification of doublets as a new rare cell type.[64-66] The next steps will start to reveal structure in gene expression profiles. Transcript counts are normalized to a total of 10 000 per cell, and the data are log-transformed. To simplify the analysis, only genes with high variability between cells are selected. To reduce the noise associated with each individual gene, linear dimension reduction by principal component analysis identifies sets of correlated genes that are the primary source of heterogeneity in the data.
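The quality control, normalization, and variable-gene steps described above can be sketched in a few lines of plain Python. This is a toy illustration, not the implementation used by any of the cited packages: the count matrix, the gene names, the mitochondrial threshold of 0.2, and the `MT-` gene prefix are assumptions made for the example, and genes are ranked by raw variance, whereas dedicated packages typically correct for the relationship between mean and variance.

```python
import math


def qc_filter(counts, gene_names, min_transcripts=200,
              max_mito_fraction=0.2, mito_prefix="MT-"):
    """Return indices of cells (rows) that pass both quality thresholds."""
    mito_cols = [j for j, g in enumerate(gene_names) if g.startswith(mito_prefix)]
    keep = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0 or total < min_transcripts:
            continue  # too few detected transcripts
        if sum(row[j] for j in mito_cols) / total > max_mito_fraction:
            continue  # high mitochondrial fraction, suggestive of a dying cell
        keep.append(i)
    return keep


def normalize_log(counts, target_sum=10_000):
    """Scale each cell to `target_sum` transcripts, then apply log(1 + x).
    Expects cells that passed QC, so every row has a non-zero total."""
    normalized = []
    for row in counts:
        scale = target_sum / sum(row)
        normalized.append([math.log1p(c * scale) for c in row])
    return normalized


def top_variable_genes(matrix, gene_names, n_top=2):
    """Rank genes by variance across cells (a simplification, see lead-in)."""
    n_cells = len(matrix)
    scored = []
    for j, gene in enumerate(gene_names):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_cells
        variance = sum((x - mean) ** 2 for x in column) / n_cells
        scored.append((variance, gene))
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:n_top]]


if __name__ == "__main__":
    # Toy data: rows are cells, columns are genes (values are invented).
    genes = ["GATA1", "CD3E", "ACTB", "MT-CO1"]
    counts = [
        [300, 0, 200, 20],    # passes QC
        [0, 250, 200, 30],    # passes QC
        [5, 3, 40, 10],       # fails: too few transcripts
        [50, 20, 130, 300],   # fails: mitochondrial fraction too high
    ]
    kept = qc_filter(counts, genes)
    expr = normalize_log([counts[i] for i in kept])
    print("cells kept:", kept)
    print("variable genes:", top_variable_genes(expr, genes, n_top=2))
```

In a real analysis the thresholds would be chosen per dataset, and principal component analysis would then be run on the selected genes.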
At this point, datasets from different technologies or donors can be combined to minimize technical differences and facilitate the identification of biological differences between samples. Various approaches are available, including Harmony, Seurat integration, and batch-balanced k nearest neighbors.[67-70] Next, nonlinear dimensionality reduction generates a 2-dimensional representation of the data, wherein transcriptionally similar cells are placed close together (Figure [B]). This step is often performed by generating a k nearest neighbor graph, t-SNE (t-distributed stochastic neighbor embedding), or UMAP (uniform manifold approximation and projection).[71,72] Effectively, the expression of thousands of genes has now been reduced to x and y coordinates. This intuitive visualization enables superimposition of gene expression or other cell-specific features, which is helpful to begin to interpret the data. All the steps in this section are facilitated by freely available software packages such as Seurat, Scanpy, Monocle, or CellHarmony.[39,69,73,74] Still, these analyses require expertise in R and Python programming languages, biological insight, dedication, and time. In combination with deeper analyses and orthogonal validation, biological meaning can now start to be uncovered. Clustering cells is required for many downstream analyses, such as differential gene expression analysis between cell types, developmental stages, or perturbations. The goal of clustering is to group cells with similar gene expression patterns and inferred similar functional properties together.[75] The aforementioned packages include clustering algorithms, such as optimized graph-based methods to partition cells into interconnected communities.[76] The annotation of clusters can be fairly straightforward for distinct cell types that express canonical markers such as T cells and B cells or challenging for similar cell types such as progenitor or T-cell subsets. 
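As a rough illustration of the graph-based steps in this section, the sketch below builds a k nearest neighbor graph from low-dimensional coordinates and then groups cells. The grouping step is a deliberate simplification: it takes connected components of the mutual nearest neighbor graph rather than the optimized community detection used by the packages named above. The coordinates, the choice of k = 2, and the function names are assumptions for the example.

```python
import math


def knn_graph(points, k):
    """For each cell, return the indices of its k nearest neighbors
    by Euclidean distance (ties are broken by cell index)."""
    graph = []
    for i, p in enumerate(points):
        distances = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        distances.sort()
        graph.append([j for _, j in distances[:k]])
    return graph


def mutual_knn_components(graph):
    """Label connected components, keeping an edge only when two cells
    list each other as neighbors. A stand-in for community detection."""
    neighbor_sets = [set(neighbors) for neighbors in graph]
    labels = [-1] * len(graph)
    current = 0
    for start in range(len(graph)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in graph[i]:
                if i in neighbor_sets[j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Invented coordinates, e.g. the first two principal components of six cells.
    cells = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
             (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
    graph = knn_graph(cells, k=2)
    print("neighbors:", graph)
    print("cluster labels:", mutual_knn_components(graph))
```

With real data, k is typically much larger, and the choice of k and of the clustering resolution affects how many clusters are recovered.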
The automation of cluster annotation could substantially facilitate data analysis, but a comprehensive and widely accepted cell type atlas to use as a reference is currently lacking for most tissues.[77] When a reference dataset is available, automated cell type annotation can reveal gene expression differences between normal and perturbed tissues that are not compromised by developmental state heterogeneity or contaminating cell types. For example, (1) in acute myeloid leukemia patient samples, we used a Random Forest algorithm to classify malignant cell states by their similarity to annotated cell types from healthy donors,[47] (2) in pluripotent stem cell–derived cultures, Artificial Neural Networks identified hematopoietic stem cell–like cells similar to human fetal liver hematopoietic stem cells,[78] and (3) using a community clustering strategy, alignment of Gfi1 mutant cells to wild-type controls showed that Gfi1-target genes are altered sequentially, as cells traverse successive differentiation states.[79] Analytical innovations such as cellular trajectory analysis and inference of dynamic information using splice variants (RNA velocity) continue to expand the scope of single-cell genomics.[80,81] Ultimately, it is up to the researcher analyzing the data to apply these tools appropriately and to develop creative approaches to derive new insights.
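The idea of annotating cells by their similarity to a reference can be sketched as follows. Note that this example swaps in a nearest-centroid classifier based on Pearson correlation; it is not the Random Forest or neural network approach used in the cited studies, and it is meant only to show the shape of reference-based annotation. The gene order, expression values, and cell type labels are invented for the illustration.

```python
import math


def centroid(profiles):
    """Mean expression profile across a list of reference cells."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def pearson(x, y):
    """Pearson correlation between two expression vectors (0.0 if either is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def annotate(query_cells, reference):
    """Assign each query cell the reference label whose centroid it correlates with best."""
    centroids = {label: centroid(cells) for label, cells in reference.items()}
    return [
        max(centroids, key=lambda label: pearson(cell, centroids[label]))
        for cell in query_cells
    ]


if __name__ == "__main__":
    # Hypothetical normalized expression for three genes: CD34, GATA1, CD14.
    reference = {
        "HSC-like": [[5, 1, 0], [6, 0, 1]],
        "Erythroid": [[0, 7, 0], [1, 6, 0]],
        "Monocyte": [[0, 0, 8], [1, 1, 7]],
    }
    query = [[4, 1, 1], [0, 5, 1], [1, 0, 6]]
    print(annotate(query, reference))
```

A classifier of this kind always returns some label, so in practice a confidence threshold or an "unassigned" category would be needed for cell states that are absent from the reference.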

Outlook

Single-cell genomics opens up avenues to study complex tissues and to discover molecular mechanisms at a new level of resolution. Using these technologies, we learned that blood cell types are more diverse than previously appreciated and that differentiation is best represented as a continuous trajectory rather than discrete steps. Single-cell analysis has been used to identify rare cell types in heterogeneous populations, such as hematopoietic stem cell–like cells in pluripotent stem cell–derived cultures. It also enables comparisons between control and perturbed cells across stages of differentiation. For example, single-cell sequencing has been used to study deregulation in malignant cell types compared with healthy controls without being obscured by cell type heterogeneity. Extensive efforts in technology development have resulted in robust scRNA-seq protocols that are accessible for widespread adoption, whereas multiomic strategies like combined DNA and RNA analysis are at the leading edge. Computational tools allow us to structure the data and generate intuitive visualizations. Creative data analysis is generally the greatest challenge and is essential for deriving biological insight. Software packages that perform complex tasks like dimensionality reduction and clustering are making computational analysis more accessible. Further standardization of procedures such as cell type annotation is imminent. Together with initiatives like the Human Cell Atlas, these innovations will accelerate our progress toward a comprehensive understanding of biological systems.

Acknowledgments

We thank A. Kreso for critical feedback. Figures were created with biorender.com.

Sources of Funding

P. van Galen is supported by the National Institutes of Health, National Cancer Institute (R00CA218832), the Ludwig Center at Harvard, the Harvard Medical School Epigenetics and Gene Dynamics Initiative, the Glenn Foundation for Medical Research and American Federation for Aging Research, the Leukemia & Lymphoma Society, and by Gilead Sciences as a Research Scholar in Hematology/Oncology.

Disclosures

None.

79 in total

1. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data.

Authors: Matthew D Young; Sam Behjati
Journal: Gigascience Date: 2020-12-26 Impact factor: 6.524

2. Challenges in unsupervised clustering of single-cell RNA-seq data. (Review)

Authors: Vladimir Yu Kiselev; Tallulah S Andrews; Martin Hemberg
Journal: Nat Rev Genet Date: 2019-05 Impact factor: 53.242

3. Generalizing RNA velocity to transient cell states through dynamical modeling.

Authors: Volker Bergen; Marius Lange; Stefan Peidli; F Alexander Wolf; Fabian J Theis
Journal: Nat Biotechnol Date: 2020-08-03 Impact factor: 54.908

4. Fundamental limits on dynamic inference from single-cell snapshots.

Authors: Caleb Weinreb; Samuel Wolock; Betsabeh K Tusi; Merav Socolovsky; Allon M Klein
Journal: Proc Natl Acad Sci U S A Date: 2018-02-20 Impact factor: 11.205

5. Simultaneous epitope and transcriptome measurement in single cells.

Authors: Marlon Stoeckius; Christoph Hafemeister; William Stephenson; Brian Houck-Loomis; Pratip K Chattopadhyay; Harold Swerdlow; Rahul Satija; Peter Smibert
Journal: Nat Methods Date: 2017-07-31 Impact factor: 28.547

6. Somatic mutations and cell identity linked by Genotyping of Transcriptomes.

Authors: Anna S Nam; Kyu-Tae Kim; Ronan Chaligne; Franco Izzo; Chelston Ang; Justin Taylor; Robert M Myers; Ghaith Abu-Zeinah; Ryan Brand; Nathaniel D Omans; Alicia Alonso; Caroline Sheridan; Marisa Mariani; Xiaoguang Dai; Eoghan Harrington; Alessandro Pastore; Juan R Cubillos-Ruiz; Wayne Tam; Ronald Hoffman; Raul Rabadan; Joseph M Scandura; Omar Abdel-Wahab; Peter Smibert; Dan A Landau
Journal: Nature Date: 2019-07-03 Impact factor: 49.962

7. DoubletDecon: Deconvoluting Doublets from Single-Cell RNA-Sequencing Data.

Authors: Erica A K DePasquale; Daniel J Schnell; Pieter-Jan Van Camp; Íñigo Valiente-Alandí; Burns C Blaxall; H Leighton Grimes; Harinder Singh; Nathan Salomonis
Journal: Cell Rep Date: 2019-11-05 Impact factor: 9.423

8. The stem/progenitor landscape is reshaped in a mouse model of essential thrombocythemia and causes excess megakaryocyte production.

Authors: Daniel Prins; Hyun Jung Park; Sam Watcham; Juan Li; Michele Vacca; Hugo P Bastos; Alexander Gerbaulet; Antonio Vidal-Puig; Berthold Göttgens; Anthony R Green
Journal: Sci Adv Date: 2020-11-25 Impact factor: 14.136

9. A molecular cell atlas of the human lung from single-cell RNA sequencing.

Authors: Kyle J Travaglini; Ahmad N Nabhan; Lolita Penland; Rahul Sinha; Astrid Gillich; Rene V Sit; Stephen Chang; Stephanie D Conley; Yasuo Mori; Jun Seita; Gerald J Berry; Joseph B Shrager; Ross J Metzger; Christin S Kuo; Norma Neff; Irving L Weissman; Stephen R Quake; Mark A Krasnow
Journal: Nature Date: 2020-11-18 Impact factor: 49.962