Yunjin Li1, Qiyue Xu1, Duojiao Wu2, Geng Chen1.
Abstract
Single-cell RNA-seq (scRNA-seq) technologies are broadly applied to dissect cellular heterogeneity and expression dynamics, providing unprecedented insights into single-cell biology. Most scRNA-seq studies have focused on the dissection of cell types/states, developmental trajectories, gene regulatory networks, and alternative splicing. However, besides these routine analyses, many other valuable scRNA-seq investigations can be conducted. Here, we first review cell-to-cell communication exploration, RNA velocity inference, identification of large-scale copy number variations and single nucleotide changes, and chromatin accessibility prediction based on single-cell transcriptomics data. Next, we discuss the identification of novel genes/transcripts through transcriptome reconstruction approaches, as well as the profiling of long non-coding RNAs and circular RNAs. Additionally, we survey the integration of single-cell and bulk RNA-seq datasets for deconvoluting the cell composition of large-scale bulk samples and linking single-cell signatures to patient outcomes. These additional analyses could greatly facilitate the corresponding basic science and clinical applications.
Keywords: RNA velocity; cell-to-cell communication; cell-type deconvolution; copy number variations; non-coding RNAs; single-cell RNA-seq
Year: 2020 PMID: 33335900 PMCID: PMC7736616 DOI: 10.3389/fcell.2020.593007
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
FIGURE 1. Overview of common and additional valuable analyses of scRNA-seq data. Heterogeneous cells can be sequenced with full-length transcript or 3′/5′-end capturing scRNA-seq protocols. The expression count matrix for all genes in each cell can then be quantified from the scRNA-seq data. Before downstream analysis, a series of preprocessing steps is needed, including quality control (e.g., elimination of low-quality cells), normalization, and correction (if needed, e.g., for batch effects). The common scRNA-seq data analyses in most studies include cell type identification, differential expression calling, trajectory inference, gene regulatory network reconstruction, and alternative splicing detection. Besides these routine explorations, other valuable analyses can be carried out, such as cell-to-cell communication exploration, RNA velocity inference, detection of large-scale copy number variations and single nucleotide changes, chromatin accessibility prediction, transcriptome reconstruction for novel gene/isoform identification, lncRNA and circRNA profiling, cell type decomposition, and patient outcome prediction.
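The preprocessing steps named in the Figure 1 caption (quality control followed by normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of any specific tool: the function name `preprocess`, the `min_genes` threshold, the scale factor of 10,000, and the gene and cell names are all hypothetical choices. Batch-effect correction is not shown.

```python
import math

def preprocess(counts, min_genes=2, scale=10_000):
    """Filter low-quality cells, then library-size normalize and log-transform.

    counts: dict mapping cell id -> dict mapping gene -> raw count.
    min_genes and scale are illustrative defaults, not recommended values.
    """
    normalized = {}
    for cell, genes in counts.items():
        detected = sum(1 for c in genes.values() if c > 0)
        total = sum(genes.values())
        # Quality control: drop cells with too few detected genes.
        if detected < min_genes or total == 0:
            continue
        # Normalization: scale to a common library size, then log1p.
        normalized[cell] = {
            g: math.log1p(c / total * scale) for g, c in genes.items()
        }
    return normalized

# Hypothetical toy count matrix (3 cells x 3 genes).
counts = {
    "cell_a": {"GeneX": 5, "GeneY": 0, "GeneZ": 15},
    "cell_b": {"GeneX": 0, "GeneY": 0, "GeneZ": 1},   # only one gene detected
    "cell_c": {"GeneX": 50, "GeneY": 150, "GeneZ": 0},
}
norm = preprocess(counts)
```

In this toy input, `cell_b` is removed by the quality filter, and `GeneX` receives the same normalized value in `cell_a` and `cell_c` because it makes up the same fraction of each library.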
FIGURE 2. Cell-to-cell communication exploration of different cell types. (A) Network view of autocrine and paracrine cell-to-cell communications within and across cell types. In autocrine signaling, the signaling cells and the target cells are the same or similar cells (e.g., they belong to the same cell type), whereas paracrine signaling involves interactions between different cell types in a microenvironment. The sizes of the circles and the widths of the edges are proportional to the counts of ligand–receptor interaction pairs. (B) Heatmap showing the interaction scores of ligand–receptor pairs in each cell type. Interaction scores can be the significance (e.g., P-value) or the weighted scores of ligand–receptor interactions.
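One simple way to obtain a weighted interaction score of the kind shown in panel (B) is to multiply the mean ligand expression in the sender cell type by the mean receptor expression in the receiver cell type. The sketch below assumes exactly that product-of-means score; it is an illustrative choice and is not claimed to be the scoring rule of any tool listed in this article. The function name, the gene names `LIG` and `REC`, and the cell labels are hypothetical.

```python
from statistics import mean

def interaction_scores(expr, cell_types, pairs):
    """Score every (sender type, receiver type, ligand, receptor) combination.

    expr: dict cell -> dict gene -> normalized expression.
    cell_types: dict cell -> cell type label.
    pairs: list of (ligand, receptor) gene-name tuples.
    """
    by_type = {}
    for cell, ctype in cell_types.items():
        by_type.setdefault(ctype, []).append(cell)

    def avg(ctype, gene):
        return mean(expr[c].get(gene, 0.0) for c in by_type[ctype])

    scores = {}
    for sender in by_type:
        for receiver in by_type:  # sender == receiver is the autocrine case
            for ligand, receptor in pairs:
                scores[(sender, receiver, ligand, receptor)] = (
                    avg(sender, ligand) * avg(receiver, receptor)
                )
    return scores

# Hypothetical toy data: type "T" expresses the ligand, type "M" the receptor.
expr = {
    "t1": {"LIG": 4.0, "REC": 0.0},
    "t2": {"LIG": 2.0, "REC": 0.0},
    "m1": {"LIG": 0.0, "REC": 3.0},
    "m2": {"LIG": 0.0, "REC": 1.0},
}
cell_types = {"t1": "T", "t2": "T", "m1": "M", "m2": "M"}
scores = interaction_scores(expr, cell_types, [("LIG", "REC")])
```

A significance-based score would additionally require a null distribution, for instance from shuffling cell type labels; that step is omitted here.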
Computational approaches for additional analyses of scRNA-seq data.
| Categories | Suitable scRNA-seq protocols | Tools | URL | References |
|---|---|---|---|---|
| Cell to cell communication | Full-length transcript or 3′/5′-tag sequencing | PyMINEr | | |
| | | scTensor | | |
| | | iTALK | | |
| | | CellPhoneDB | | |
| RNA velocity | Full-length transcript or 3′/5′-tag sequencing | velocyto | | |
| | | scVelo | | |
| Copy number variation | Full-length transcript or 3′/5′-tag sequencing | inferCNV | | |
| | | HoneyBADGER | | |
| Chromatin accessibility | Full-length transcript sequencing or 3′/5′-tag sequencing | BIRD | | |
| Single nucleotide variants | Full-length transcript sequencing | SAMtools | | |
| | | Strelka2 | | |
| | | FreeBayes | | |
| RNA editing | Full-length transcript sequencing | GIREMI | | |
| | | REDItools | | |
| Transcriptome reconstruction | Full-length transcript sequencing | TransComb (genome-guided) | | |
| | | StringTie (genome-guided and | | |
| | | Cufflinks (genome-guided) | | |
| | | Trinity ( | | |
| | | rnaSPAdes ( | | |
| Coding potential assessment | Full-length transcript sequencing | CPAT | | |
| | | LncRNA-ID | | |
| | | LGC | | |
| Circular RNA identification | Total RNA (poly (A+) and poly (A−) RNAs) sequencing | find_circ2 | | |
| | | CircExplorer2 | | |
| | | CIRI2 | | |
| Cell composition deconvolution | Full-length transcript or 3′/5′-tag sequencing | CMP | | |
| | | MuSiC | | |
| | | DWLS | | |
| | | CIBERSORTx | | |
| Survival analysis | Full-length transcript or 3′/5′-tag sequencing | Cox regression | | |
FIGURE 3. Inference of large-scale copy number alterations and single nucleotide changes based on scRNA-seq data. (A) Representative heatmap displaying the large-scale copy number variations (CNVs) identified in different cell types with scRNA-seq data. The top panel shows that no significant large-scale CNVs were identified in the reference normal cells, whereas chromosomal-scale deletions (blue) and gains (red) were observed on several chromosomes in different subtypes of tumor cells (second panel). The heatmap was created with inferCNV. (B) Graphic view of single nucleotide variations (SNVs) and RNA editing events. Reads from scRNA-seq data generated with full-length transcript sequencing protocols are first mapped to the reference genome. SNV calling tools or RNA-editing detection approaches can then be applied to the alignment result to determine SNVs or RNA-editing events. Both SNV and RNA-editing identification are mainly suited to scRNA-seq methods that generate full-length transcripts. Moreover, sequencing depth can be an important factor influencing detection accuracy.
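The general idea behind expression-based CNV heatmaps such as panel (A) is that averaging expression over many neighboring genes, relative to reference normal cells, suppresses gene-specific variation and exposes chromosomal-scale gains and losses. The sketch below illustrates that idea with a plain moving average; it is an assumption-laden simplification and not the actual inferCNV procedure. The function name, the window size, and the toy values are hypothetical.

```python
def smoothed_relative_expression(tumor, reference, window=3):
    """Moving average of (tumor - reference) expression along the genome.

    tumor, reference: per-gene mean log-expression values, with genes
    ordered by genomic position on one chromosome.
    window: number of neighboring genes averaged (illustrative value;
    real analyses typically use far larger windows).
    """
    diff = [t - r for t, r in zip(tumor, reference)]
    half = window // 2
    profile = []
    for i in range(len(diff)):
        lo = max(0, i - half)
        hi = min(len(diff), i + half + 1)  # window is truncated at the ends
        profile.append(sum(diff[lo:hi]) / (hi - lo))
    return profile

# Hypothetical toy chromosome: the last three genes are elevated in tumor cells.
reference = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
tumor = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
profile = smoothed_relative_expression(tumor, reference)
```

Values near zero suggest no change relative to the reference, sustained positive values suggest a gain, and sustained negative values suggest a deletion.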
FIGURE 4. RNA velocity and chromatin accessibility analyses. (A) RNA velocity inference of single cells to predict their future transcriptional states. The velocity of gene expression can be represented as the rate of change of mRNA abundance over time, which enables prediction of the future transcriptional state of cells (the arrows denote the directionality). (B) Graphic view of chromatin accessibility prediction with scRNA-seq data. Because feedback couples the transcriptome and the regulome in both directions, scRNA-seq has the potential to predict the chromatin accessibility of transcribed regions using a corresponding computational approach. However, it is worth noting that the chromatin accessibility of non-transcribed regions cannot be predicted with scRNA-seq.
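RNA velocity as described in panel (A) is commonly estimated from unspliced and spliced read counts. In a steady-state formulation, spliced abundance s changes as ds/dt = u − γs, where u is unspliced abundance and γ is a degradation rate (with the splicing rate set to 1). The sketch below follows that formulation under simplifying assumptions: γ is fitted by least squares through the origin on cells presumed to be at steady state, which is a simplification chosen for illustration rather than the fitting procedure of velocyto or scVelo. All function names and numbers are hypothetical.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin for u ~ gamma * s.

    Assumes the supplied cells are at steady state for this gene,
    where ds/dt = 0 and therefore u = gamma * s.
    """
    num = sum(u * s for u, s in zip(unspliced, spliced))
    den = sum(s * s for s in spliced)
    return num / den

def velocity(unspliced, spliced, gamma):
    """Per-cell velocity ds/dt = u - gamma * s for one gene."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]

# Hypothetical steady-state cells for one gene.
gamma = fit_gamma(unspliced=[1.0, 2.0], spliced=[2.0, 4.0])

# Two further hypothetical cells with the same spliced level:
# excess unspliced RNA implies up-regulation, a deficit implies down-regulation.
v = velocity(unspliced=[3.0, 1.0], spliced=[4.0, 4.0], gamma=gamma)
```

A positive velocity indicates that spliced mRNA is expected to increase, and a negative velocity that it is expected to decrease; combining such values across genes gives the arrows drawn in the figure.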
FIGURE 5. Transcriptome reconstruction and identification of lncRNAs and circRNAs. (A) Schematic of single-cell transcriptome reconstruction with genome-guided and genome-independent approaches. Genome-guided strategies first map the sequencing reads to the reference genome, whereas genome-independent (de novo assembly) methods assemble the sequencing reads directly without using the reference genome. (B) Novel lncRNAs can be identified by assessing the protein-coding potential of the transcripts assembled by transcriptome reconstruction methods. Since lncRNAs can occur with or without poly (A) tails, full-length transcript scRNA-seq technologies that capture both poly (A+) and poly (A–) RNAs are preferred for comprehensively profiling lncRNAs. Moreover, sufficient sequencing depth also benefits lncRNA identification, considering that lncRNAs are usually expressed at lower levels than mRNAs. (C) Profiling circRNAs with scRNA-seq data. Unlike linear RNAs, circRNAs are formed by back-splicing. Linear RNAs can be captured with standard poly-A enriched methods, whereas circRNAs are covalently closed and usually need to be profiled with rRNA-depleted total RNA protocols. Furthermore, sequencing depth is also important to ensure the accuracy of circRNA identification and quantification.
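The back-splicing mentioned in panel (C) leaves a recognizable signature in split-read alignments: the part of the read that comes later in the read maps upstream of the part that comes earlier, which is the reverse of a linear splice junction. The sketch below classifies a two-segment alignment on that basis. It assumes a plus-strand transcript and ignores splice-site motifs, annotation, and mapping quality, so it illustrates the principle only and is not the algorithm of any tool listed in this article. The `Segment` type and all coordinates are hypothetical.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chrom: str
    start: int  # 0-based genomic start
    end: int    # exclusive genomic end

def classify_junction(first: Segment, second: Segment) -> str:
    """Classify a split read from the order of its two aligned parts.

    first, second: alignments of the 5' part and the 3' part of the read,
    assuming the transcript lies on the plus strand.
    """
    if first.chrom != second.chrom:
        return "other"
    if second.start >= first.end:
        return "linear"       # 3' part maps downstream: ordinary splicing
    if second.end <= first.start:
        return "back-splice"  # 3' part maps upstream: circRNA candidate
    return "other"            # overlapping segments are left unclassified

# Hypothetical coordinates.
linear_call = classify_junction(Segment("chr1", 500, 550), Segment("chr1", 900, 950))
circ_call = classify_junction(Segment("chr1", 900, 950), Segment("chr1", 500, 550))
```

Because only reads spanning the back-splice junction are informative, the number of such reads grows with sequencing depth, which is consistent with the caption's point about depth.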
FIGURE 6. Integration of single-cell and bulk RNA-seq data for cell-type decomposition and survival analysis. (A) Deconvolution of cell-type composition in bulk samples with single-cell reference signatures. Computational approaches for deconvoluting the cell-type compositions or proportions of bulk samples often need a reference expression profile of the markers for specific cell types. Performing scRNA-seq on a few samples is an efficient and cost-effective way to generate the cell-type-specific gene expression profile used as the reference. (B) Linking the expression of single-cell signatures to patient outcomes with related large-scale bulk datasets. The signatures or markers identified in diverse types of single-cell analyses (such as cell type identification, alternative splicing inference, cell–cell communication exploration, and gene regulatory network reconstruction) can be further investigated with relevant large-scale bulk RNA-seq datasets to check whether they are associated with different patient outcomes (e.g., patient survival), based on the available clinical information of the patients involved. Such analysis can provide further insights into the associations of cell-type-specific markers with certain patient phenotypes across a multitude of samples.
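The deconvolution in panel (A) can be framed as a constrained regression: bulk expression is modeled as a non-negative mixture of cell-type reference profiles, and the mixture weights are the estimated proportions. The sketch below solves that problem with plain non-negative least squares via projected gradient descent, then rescales the weights to sum to one. This is a deliberately simple stand-in and is not claimed to be the algorithm of any tool listed in this article. The function name, the iteration count, and the toy signature matrix are hypothetical.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate cell-type proportions by projected gradient descent.

    signature: list of rows (genes) x columns (cell types), the reference
    expression profile derived from scRNA-seq.
    bulk: per-gene expression of one bulk sample, same gene order.
    Minimizes 0.5 * ||signature @ p - bulk||^2 subject to p >= 0.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[g][k] * resid[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    return [v / total for v in p] if total > 0 else p

# Hypothetical reference: 3 marker genes x 2 cell types.
signature = [
    [10.0, 0.0],
    [0.0, 10.0],
    [5.0, 5.0],
]
# Bulk sample simulated as a 70% / 30% mixture of the two cell types.
bulk = [7.0, 3.0, 5.0]
proportions = deconvolve(signature, bulk)
```

With noise-free toy data the estimate recovers the simulated 70/30 mixture; real bulk data would call for careful marker selection and for handling technical differences between single-cell and bulk measurements.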