Ángeles Arzalluz-Luque, Ana Conesa.
Abstract
Single-cell RNAseq and alternative splicing studies have recently become two of the most prominent applications of RNAseq. However, the combination of both is still challenging, and few research efforts have been dedicated to the intersection between them. Cell-level insight on isoform expression is required to fully understand the biology of alternative splicing, but it is still an open question to what extent isoform expression analysis at the single-cell level is actually feasible. Here, we establish a set of four conditions that are required for a successful single-cell-level isoform study and evaluate how these conditions are met by these technologies in published research.
Year: 2018 PMID: 30097058 PMCID: PMC6085759 DOI: 10.1186/s13059-018-1496-z
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Comparison of published single-cell RNAseq isoform studies
| Sequencing technology | Reference | Main focus of the study | Full-length isoforms? | Computational method | Aim | Organism, cell type | Library prep | Feature or event targeted |
|---|---|---|---|---|---|---|---|---|
| Illumina | Ramsköld et al. [ | Single-cell RNAseq, genes | ✗ | MISO | Experimental protocol development | Human, cancer cells | Smart-seq | Exon inclusion quantification |
| Illumina | Shalek et al. [ | Single-cell RNAseq, genes and isoforms | ✗ | MISO | Single-cell heterogeneity in immune response | Mouse, BMDCs | Smart-seq | Exon inclusion quantification |
| Illumina | Zhang et al. [ | Bulk RNA-seq, isoforms | ✗ | WemIQ | Computational method development | Mouse, BMDCs | Smart-seq | Single-cell bias in differential isoform detection |
| Illumina | Marinov et al. [ | Single-cell RNAseq, genes and isoforms | ✗ | Pervouchine et al. [ | Single-cell isoform and gene expression heterogeneity | Mouse, lymphoblastoid cells | Smart-seq | Novel splice junctions, exon inclusion quantification |
| Illumina | Velten et al. [ | Single-cell RNAseq, isoforms | ✗ | BATBayes | 3′ UTR variability among genes and cells | Mouse, ESCs | BATSeq | Alternative poly(A) sites |
| Illumina | Welch et al. [ | Single-cell RNAseq, isoforms | ✗ | SingleSplice | Computational method development | Mouse, ESCs | Smart-seq/C1 | Differential isoform usage |
| Illumina | Karlsson et al. [ | Single-cell RNAseq, isoforms | ✗ | Alignment to FANTOM 5 database | Single-cell isoform expression heterogeneity | Mouse, brain cells | STRT-seq/C1 | Alternative TSS |
| Illumina | Song et al. [ | Single-cell RNAseq, isoforms | ✗ | Expedition | Computational method development | Human, iPSCs, NPCs and MNs | Smart-seq/C1 | Exon inclusion quantification |
| Illumina | Huang et al. [ | Single-cell RNAseq, isoforms | ✗ | BRIE | Computational method development | Human HCT116 cells + mESCs | Smart-seq + Smart-seq2 | Exon inclusion quantification |
| Single-molecule (Oxford Nanopore) | Byrne et al. [ | Single-cell RNAseq, isoforms | ✓ | Mandalorion | Computational method development | Mouse, B1 cells | Smart-seq2 | TSS, TTS, exon inclusion, intron retention, alt. 3′ and 5′ splice sites |
| Single-molecule (PacBio) | Karlsson and Linnarsson [ | Single-cell RNAseq, isoforms | ✓ | Self-designed pipeline | Single-cell isoform expression heterogeneity | Mouse, oligodendrocytes and VLMCs | STRT-seq/C1 | TSS, TTS, exon inclusion, alt. 3′ and 5′ splice sites |
Illumina involves short-read sequencing, and single-molecule sequencing involves long-read technologies. Studies are classified per ‘focus’, either bulk-RNAseq, single-cell RNAseq for gene expression or isoform single-cell RNAseq (or both). Only ‘computational methods’ used for isoform identification/quantification are specified. ‘Full-length’ is only considered as such when isoforms were reconstructed end-to-end, regardless of whether library preparation was full-length or not. Text in italics adds complementary information on the aim of the computational method/library protocol developed. When specified, the study was performed on data generated by other authors. ‘Feature/event targets’ refer to the approach taken to study isoform diversity, or to a specific aspect of it that is tackled. For more information, readers should refer to this review’s analysis or to the referenced papers.
BMDC bone-marrow-derived dendritic cell, ESC embryonic stem cell, iPSC induced pluripotent stem cell, mESC murine embryonic stem cell, MN motor neuron, NPC neural progenitor cell, TSS transcription start site, TTS transcription termination site, UTR untranslated region, VLMC vascular and leptomeningeal cell
Fig. 1 Single-cell mRNA sequencing methods and sources of mRNA variation. a Methodological approaches to single-cell isoform studies. The combination of library preparation and sequencing technologies yields three distinct methods to capture isoform diversity. UMI-based methods are limited to sequencing of the 3′ (or 5′) end, which enables the use of UMIs to efficiently capture PCR bias in addition to early cell barcoding, although they are best suited to quantifying expression at the gene level. Smart-based methods produce short reads across the entire transcript length, although they require late cell barcoding (barcodes inserted during tagmentation), cannot accommodate UMIs, and produce reads that might be difficult to assign unambiguously to an isoform. Single-molecule sequencing allows sequencing of each transcript molecule in a single read and provides full isoform connectivity, although it suffers from a high prevalence of sequencing errors. b Sources of transcript variation that yield alternative isoforms and their position along the transcript. When compared with a reference isoform (for convenience, the one including all exons, no introns and the complete UTRs), alternative TSSs (transcription start sites) and TTSs (transcription termination sites) are generated during the transcription process by shortening of the UTRs. Processing of the pre-mRNA eliminates or retains introns and exons, adding variability to the isoforms that can be generated from the gene. In addition, more than one event can be present simultaneously in the same isoform, and consequently isoform diversity will increase with the number of possible combinations of AS events. Alt. alternative, RT reverse transcription, UMI unique molecular identifier
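The combinatorial point in panel b can be made concrete with a minimal sketch. It assumes that every alternative splicing event is binary (present or absent) and independent of the others, which gives an upper bound of 2^n isoforms for n events; real genes usually realise far fewer combinations. The function name and event labels are hypothetical.

```python
from itertools import product

def enumerate_isoforms(events):
    """List every present/absent combination of the given AS events.

    Assumption: events are binary and independent, so the result is an
    upper bound on isoform diversity, not a prediction for a real gene.
    """
    return [dict(zip(events, choice))
            for choice in product([False, True], repeat=len(events))]

# Hypothetical event labels for illustration only
events = ["alt_TSS", "exon_skipping", "intron_retention"]
isoforms = enumerate_isoforms(events)
n_isoforms = len(isoforms)  # 2 ** 3 combinations
```

Adding one more independent event doubles the count, which is why resolving full isoform connectivity becomes harder as the number of events per gene grows.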
Fig. 2 Summary of limitations of the four ideal conditions for successful studies of single-cell RNAseq isoforms. From left to right, the importance and current limitations of full-length transcript sequencing, capture efficiency and sequencing depth, the number of cells sequenced, and sequencing errors and artefacts for isoform detection are presented in the diagram. Each is discussed in the main text. Alt. alternative, RT reverse transcription, UMI unique molecular identifier
Summary of number of cells sequenced in studies of single-cell isoforms (short reads)
| Reference | Ramsköld et al. [ | Shalek et al. [ | Marinov et al. [ | Velten et al. [ | Welch et al. [ | Karlsson et al. [ | Song et al. [ |
|---|---|---|---|---|---|---|---|
| Reference for data | – | – | – | – | Buettner et al. [ | Zeisel et al. [ | – |
| Total number of cells | 12 | 18 | 15 | 144 | 96 | 2816 | 206 |
| Library preparation method | Smart-seq | Smart-seq | Smart-seq | BATSeq | Fluidigm C1/Smart-seq | Fluidigm C1/STRT-seq | Fluidigm C1/Smart-seq |
Fig. 3 Qualitative performance comparison of the three main single-cell RNAseq methods for isoform detection. From the inside to the outside of the graph, the three dotted lines represent ‘low’, ‘medium’ and ‘high’ levels of each characteristic. The most prominent features of long reads (red) are high isoform resolution potential but a high occurrence of errors. Smart-based methods (yellow) provide high sequencing depth and medium isoform resolution power and number of cells. UMI-based methods (blue) can process high numbers of cells with medium to low sequencing depth and accurately quantify isoform expression, although their isoform resolution potential is strongly limited. UMI unique molecular identifier
Fig. 4 Simulation of short- and long-read workflows and the modelling of a UMI-based library preparation strategy. a Short-read simulation workflow. Transcript sequences from the Tardaguila et al. 2018 neural transcriptome [66] were trimmed, and reads were simulated from the resulting fragments to recreate the limitations of UMI library preparation in the transcript length covered. Full-length reads were also simulated. Reads were aligned to the mouse genome using STAR, and isoform expression was quantified using RSEM. For UMI simulations, the number of isoforms resolved using Smart-seq reads was used as the 100% reference to calculate the percentage of resolution of each MIG. For the Smart-seq simulation, the annotated number of isoforms per gene (in Tardaguila et al. [66]) was used as the 100% reference. b Long-read simulation workflow. The Illumina quantification of isoform expression available in Tardaguila et al. [66] was scaled to one million reads (TPM) to recreate a Sequel run of one million long reads in which a single cell is sequenced. Values were downsampled to simulate scenarios in which an increasing number of cells (2, 6, 10, 16, 20) are sequenced together in a similar run, so that the number of reads per cell gradually decreases. The number of MIGs in the Tardaguila et al. annotation was compared with the number of MIGs detected in the simulated scenarios. Then, the number of isoform switches detected in the Tardaguila et al. data was compared with the number detected in the simulated scenarios. c Short-read length simulated for each simulation scenario (represented for 3′ UMIs only). PacBio transcript sequences in the Tardaguila et al. dataset [66] were trimmed as described. To ensure that coverage was even when capturing growing lengths of the transcripts in simulated UMI-based protocols, the length of the simulated reads was increased for longer fragments (100 and 200 bp: 25 bp reads; 300 and 500 bp: 50 bp reads; 1000 bp: 100 bp reads; full length: 250 bp reads, paired-end).
MIG multi-isoform gene, NSC neural stem cell, RSEM RNA-seq by expectation maximization, TPM transcripts per million, UMI unique molecular identifier
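The long-read workflow in panel b can be sketched roughly as follows. This is not the authors' pipeline: it replaces random downsampling with expected read counts (the one-million-read run divided evenly among cells) and treats an isoform as detected when it is expected to receive at least one read per cell. The function names, the `min_reads` threshold and the toy expression values are all assumptions made for illustration.

```python
from collections import defaultdict

def scale_to_reads(tpm, total_reads=1_000_000):
    """Rescale expression values so that they sum to total_reads."""
    total = sum(tpm.values())
    return {iso: value * total_reads / total for iso, value in tpm.items()}

def count_migs(expected_reads, isoform_to_gene, n_cells=1, min_reads=1.0):
    """Count genes with more than one detected isoform when one run
    is shared evenly among n_cells (assumed detection rule)."""
    detected = defaultdict(int)
    for iso, reads in expected_reads.items():
        if reads / n_cells >= min_reads:
            detected[isoform_to_gene[iso]] += 1
    return sum(1 for n in detected.values() if n > 1)

# Toy, made-up expression values: gene A is highly expressed, gene B is not
tpm = {"A.1": 500_000.0, "A.2": 499_985.0, "B.1": 10.0, "B.2": 5.0}
isoform_to_gene = {"A.1": "A", "A.2": "A", "B.1": "B", "B.2": "B"}

reads = scale_to_reads(tpm)
migs_by_cells = {n: count_migs(reads, isoform_to_gene, n_cells=n)
                 for n in (1, 2, 6, 10, 16, 20)}
```

In this toy case the lowly expressed gene B stops counting as a multi-isoform gene once six or more cells share the run, which mirrors the trend the simulation is designed to expose.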
Comparative summary of five computational approaches used to study splicing in single-cell RNAseq
| | SingleSplice [ | MISO [ | BRIE [ | Expedition [ | RSEM [ |
|---|---|---|---|---|---|
| Observation level | Gene | Exon | Exon | Exon | Isoform (full transcript) |
| Measure of expression | Differentially alternatively spliced (yes/no) | PSI | PSI | PSI | Read counts per isoform |
| Single-cell specific | ✓ | ✗ | ✓ | ✓ | |
| Includes interpretation of changes | ✓ | ✗ | ✗ | ✓ | ✗ |
PSI percentage spliced-in, RSEM RNA-seq by expectation maximization
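For readers unfamiliar with PSI, the quantity can be illustrated with a naive read-ratio estimate. MISO, BRIE and Expedition each estimate PSI with their own statistical models; the sketch below is not any of those, applies no length normalisation, and uses hypothetical argument names.

```python
def psi(inclusion_reads, exclusion_reads):
    """Naive percentage spliced-in for one exon, as a fraction in [0, 1].

    Returns None when the exon has no informative reads, which is
    common in sparse single-cell data.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total

# 30 reads support exon inclusion and 10 support skipping
example_psi = psi(30, 10)
```

Returning `None` rather than 0 keeps "no coverage" distinct from "exon always skipped", a distinction that matters when many exons are unobserved in a given cell.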
Fig. 5 Simulation results. a Short-read simulations: proportion of transcript length left uncovered as longer fragments are simulated in a UMI-based library preparation scenario. Short fragments (100–200 bp) leave most of the transcript uncovered by the reads (> 0.75 proportion), while the simulation of longer (> 300 bp) fragments affects transcripts differently depending on their length, hence the growing distributions in the boxplot. b Short-read simulations: multi-isoform genes (MIGs) detected in each 3′ and 5′ end simulation, as well as in the Smart-seq simulation, are classified into four intervals according to their individual percentage of resolution. Results are shown for neural stem cells (NSCs) only. Intervals gather MIGs for which 0–25, 25–50, 50–75 and 75–100% of their isoforms are resolved. The 3′ end and 5′ end labels only refer to unique molecular identifier (UMI) simulations. Note that Smart-seq data have been plotted twice, in both the 3′ end and 5′ end bar-graph rows, for completeness and to ease visual comparison. c Long-read simulations: the number of multi-isoform genes detected as sequencing depth per cell is progressively lost. The dashed line indicates the number of multi-isoform genes present in the original neural cell transcriptome. A decrease in depth per cell decreases the number of genes for which more than one isoform can be observed. d Long-read simulations: the number of isoform switches detected between neural stem cells and oligodendrocytes in a similar scenario, assuming half of the cells belong to each cell type (i.e. two cells equate to one oligodendrocyte and one NSC). A decrease in sequencing depth per cell not only prevents detection of changes in isoform expression ratios (which constitute the majority of differences in isoform expression), but also reduces the number of isoform switches that can be observed. The dashed line indicates the number of NSC versus oligodendrocyte isoform switches detected in the original transcript expression data
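The classification used in panel b can be sketched as a simple binning of each MIG by the share of its isoforms that were resolved. The caption does not say how values falling exactly on an interval boundary are handled, so assigning them to the lower interval is an assumption here, as are the function name and the toy counts.

```python
from collections import Counter

def resolution_interval(resolved, reference):
    """Return the resolution interval for one multi-isoform gene.

    resolved:  isoforms resolved in the simulated scenario
    reference: isoforms taken as the 100% reference for that gene
    Assumption: boundary values fall in the lower interval.
    """
    pct = 100.0 * resolved / reference
    if pct <= 25:
        return "0-25%"
    if pct <= 50:
        return "25-50%"
    if pct <= 75:
        return "50-75%"
    return "75-100%"

# Toy (resolved, reference) pairs for four hypothetical genes
genes = {"g1": (1, 4), "g2": (2, 4), "g3": (3, 4), "g4": (4, 4)}
summary = Counter(resolution_interval(r, ref) for r, ref in genes.values())
```

The resulting counts per interval correspond to the bar heights in the figure, computed once per simulated fragment length.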