| Literature DB >> 34149808 |
Simon Haile1, Richard D Corbett1, Veronique G LeBlanc1, Lisa Wei1, Stephen Pleasance1, Steve Bilobram1, Ka Ming Nip1, Kirstin Brown1, Eva Trinh1, Jillian Smith1, Diane L Trinh1, Miruna Bala1, Eric Chuah1, Robin J N Coope1, Richard A Moore1, Andrew J Mungall1, Karen L Mungall1, Yongjun Zhao1, Martin Hirst1, Samuel Aparicio2, Inanc Birol1,3, Steven J M Jones1,3, Marco A Marra1,3.
Abstract
RNA sequencing (RNAseq) has been widely used to generate bulk gene expression measurements collected from pools of cells. Only relatively recently have single-cell RNAseq (scRNAseq) methods provided opportunities for gene expression analyses at the single-cell level, allowing researchers to study heterogeneous mixtures of cells at unprecedented resolution. Tumors tend to be composed of heterogeneous cellular mixtures and are frequently the subjects of such analyses. Extensive method developments have led to several protocols for scRNAseq but, owing to the small amounts of RNA in single cells, technical constraints have required compromises. For example, the majority of scRNAseq methods are limited to sequencing only the 3' or 5' termini of transcripts. Other protocols that facilitate full-length transcript profiling tend to capture only polyadenylated mRNAs and are generally limited to processing only 96 cells at a time. Here, we address these limitations and present a novel protocol that allows for the high-throughput sequencing of full-length, total RNA at single-cell resolution. We demonstrate that our method produced strand-specific sequencing data for both polyadenylated and non-polyadenylated transcripts, enabled the profiling of transcript regions beyond only transcript termini, and yielded data rich enough to allow identification of cell types from heterogeneous biological samples.Entities:
Keywords: RNAseq; cellenONE; full-length; single-cell; total RNA
Year: 2021 PMID: 34149808 PMCID: PMC8209500 DOI: 10.3389/fgene.2021.665888
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1DLP-scRNAseq workflow. Following single-cell isolation using the CellenONE automated cell spotter and lysis, RNA was fragmented using magnesium ion-dependent heating. Adapters containing 5′- and 3′-end sequencing primer targets were introduced sequentially as part of the cDNA synthesis steps, thereby achieving strand-specificity. Cell-specific barcodes were introduced in the first round of PCR (Index PCR). All steps up to Index PCR were performed in nanoliter-scale wells (Nanoliter platform). PCR products were then pooled and subsequent steps including rRNA depletion were performed in 96-well plate format (Microliter platform). Figure was created using biorender.com.
FIGURE 2Comparisons of DLP-scRNAseq data and bulk RNAseq data and benchmarking using orthogonally generated data. (A) Alignment-based metrics of scRNAseq (DLP-scRNAseq) data vs. bulk (SMARTer) RNAseq data. 80 million reads were used for each data set. (B) Number of genes detected in DLP-scRNAseq data vs. bulk RNAseq data. 80 million reads were used. (C) Number of genes detected by at least one read in each of the 90 uniquely barcoded single cells (blue dots). Cells are sorted in ascending order based on number of reads. (D) Evaluation of sequencing saturation. Reads were down-sampled to numbers between 0.125 and 1.25 million and the number of genes with >0 reads was enumerated at each sampling depth. Curve slopes are indicative of the yield of new genes sampled as a function of sequencing depth, with steeper slopes indicative of lower saturation levels. (E) Pearson correlation values comparing expression values from bulk-based RNAseq (SMARTer) data with DLP-scRNAseq data for UHR and NHA data. (F) Pearson correlations comparing DLP-scRNAseq and qPCR data (UHR) and known synthetic RNA measurements (ERCC).
FIGURE 3DLP-scRNAseq profiles full-length RNAs (A) A screen shot of an Integrative Genomics Viewer image of the genomic region spanning the ACTB (left) and FTL (right) genes. DLP-scRNAseq _1 is a single-cell library with a read number (710,000) representative of that obtained for other single cells (mean = 757,791 reads). Genomic location-specific read depth ranges are indicated within each plot, and the total number of reads for each library is shown between the plots. (B) Comparison of the normalized coverage of transcript bodies, from 5′ (left) to 3′ (right) of all annotated termini (3′ being the location of the polyadenylation site), achieved using DLP-scRNAseq and bulk RNAseq data. The left panel shows data from NHA cells and the right panel shows data that were generated from PBMCs. For the PBMC plot, data that were generated using the 3′-end profiling 10× Chromium protocol are also shown, illustrating the 3′ end bias expected from the 10X platform.
FIGURE 4Exon level quantification of gene expression. (A) Comparisons of sensitivity of exon-level detection between DLP-scRNAseq and bulk protocols. Violin plots show the distributions of the density of the data representing various fraction of exons covered by one or more reads (Y-axis) for various ranges of transcript lengths in Ensembl annotations. Shown are data for all exonic regions (left panel), for full and partial exons falling within 5′ untranslated regions (UTRs) of transcripts (middle panel), and for full and partial exons falling within 3′ UTRs of transcripts (right panel). The coverage across coding regions of transcripts ranging in length from 200 to 5,000 nucleotides (178,348 transcripts in total) was similar between data from the DLP-scRNAseq pool of single cells and bulk libraries generated using SMARTer and RNaseH methods. Transcripts that are shorter than 200 nt (9,750 in total) showed more variable coverage, particularly at the 3′- and 5′-UTR regions. (B) A log-log plot of exon-level expression values comparing DLP-scRNAseq to bulk SMARTer data. Correlation values were calculated for exons with one or more reads in both datasets. The Spearman correlation was 0.93, indicating high similarity of expression of 333,517 exons. Exons captured to a higher extent with DLP-scRNAseq than SMARTer (∼459 exons, blue dots), falling below the diagonal (using the formula y–1.28× < –5), spanned all chromosomes and mapped to 354 genes.
FIGURE 5DLP-scRNAseq can be used to detect fusion transcripts. (A) Reads from 62 UHR (5 pg total RNA) libraries that were generated using the DLP-scRNAseq protocol were pooled and analyzed for intergenic transcript fusion junctions, previously identified and validated using qPCR (Sakarya et al., 2012). Black boxes indicate events that were confirmed by de-novo transcript sequence assembly (Nip et al., 2019). The number on the black boxes indicate the number of contiguous reads covering the fusion transcript. The fraction of down-sampled reads is indicated in the legend (e.g., 1× corresponds to 250 million reads, 0.1× corresponds to 25 million reads). The fewest total reads corresponds to 0.4 million/cell and the highest total number of reads represents 4 million reads per cell. (B) Comparison of the sensitivity of detection of fusion transcripts between the pool of UHR libraries that were generated using DLP-scRNAseq data and data from UHR bulk libraries (100 ng total RNA) that were generated using the RNaseH protocol. The number on the black boxes indicate frequencies of detection for each fusion event.
FIGURE 7Classification of PBMC cell types based on expression profiles that were generated using DLP-scRNAseq. (A) tSNE plot with DLP-scRNAseq cells colored by cluster. (B) tSNE plots with cells colored by normalized expression of the indicated marker gene. (C) Proportions of cells identified as the indicated cell types in the DLP-scRNAseq and 10X PBMC datasets. (D) Heatmap showing proportion of all possible event pairs that were found to be alternatively spliced between indicated cell types. The number of cells assigned to each cell type is indicated on the right: the total number of possible event pairs was calculated by (# of cell type 1 cells × # of cell type 2 cells × total number of transcripts tested). Absolute numbers of AS events between cell type pairs are also shown on the heatmap. (E) Example of a cell type-specific AS event (HIPK3, BRIE transcript ID ENSG00000110422.7.AS2). Left: sashimi plots showing read densities (in RPKM) within pools of cells assigned to the same cell type. Junction reads linking exons are also indicated with lines and labeled by their count. The outside exons are exons 3 (left) and 4 (right) in most Gencode v19 HIPK3 transcripts (16 exons total in ENST00000525975.1, ENST00000379016.3, and ENST00000456517.1; 17 exons total in ENST00000303296.4); the middle exon, which is more frequently retained in dendritic cells compared to the other cell types shown, is specific to transcript ENST00000534262.1 (exon 2 of 4). Right: posterior distributions (blue curve, histogram in black) learned by BRIE for each cell type. Red bar depicts the mean, and the 95% confidence interval is indicated by dashed lines. The posterior (Ψ) is a measure of the frequency of exon inclusion (0–never; 1–always).
FIGURE 6Demonstration of the capacity of DLP-scRNAseq to capture both polyadenylated and non-polyadenylated RNAs. (A) Detection of various RNA biotypes. The proportion of various classes of detected transcripts is shown for a pool of single cell libraries generated using DLP-scRNAseq and for bulk libraries that were generated using the SMARTer and RNaseH protocols. Total UHR RNA was used as input. (B) Detection of histone mRNAs. The proportion of histone transcripts is shown for a pool of single cell libraries generated using DLP-scRNAseq and for bulk libraries that were generated using the SMARTer and RNaseH protocols. Total UHR RNA was used as input. (C) Detection and quantification of polyA– RNAs in scRNAseq data and bulk RNAseq data from NHA cells. Pearson correlations between expression profiles generated by DLP-scRNAseq or SMARTer and expression values of genes whose expression was enriched in polyA– and polyA+ fractions are shown.