Alan Derr, Chaoxing Yang, Rapolas Zilionis, Alexey Sergushichev, David M Blodgett, Sambra Redick, Rita Bortell, Jeremy Luban, David M Harlan, Sebastian Kadener, Dale L Greiner, Allon Klein, Maxim N Artyomov, Manuel Garber.
Abstract
RNA-seq protocols that focus on transcript termini are well suited for applications in which template quantity is limiting. Here we show that, when applied to end-sequencing data, analytical methods designed for global RNA-seq produce computational artifacts. To remedy this, we created the End Sequence Analysis Toolkit (ESAT). As a test, we first compared end-sequencing and bulk RNA-seq using RNA from dendritic cells stimulated with lipopolysaccharide (LPS). As predicted by the telescripting model for transcriptional bursts, ESAT detected an LPS-stimulated shift to shorter 3'-isoforms that was not evident by conventional computational methods. Then, droplet-based microfluidics was used to generate 1000 cDNA libraries, each from an individual pancreatic islet cell. ESAT identified nine distinct cell types, three distinct β-cell types, and a complex interplay between hormone secretion and vascularization. ESAT, then, offers a much-needed and generally applicable computational pipeline for either bulk or single-cell RNA end-sequencing.
Year: 2016 PMID: 27470110 PMCID: PMC5052061 DOI: 10.1101/gr.207902.116
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. End-sequencing libraries for bulk RNA. (A) Schematic representation of end-sequencing library methods. (B) Aggregation of the location of each aligned read, using the annotated transcription start sites (TSSs) and transcription termination sites (TTSs) as reference, for 5′ (green) and 3′ libraries (red).
Figure 2. Common problems in end-sequence analysis. (A,B) Scatter plots of gene expression computed from 3′ (A) and 5′ (B) libraries made from technical replicates of mouse bone marrow–derived dendritic cells (mBMDCs) 2 h after LPS stimulation. Red dots highlight outliers (at least 10-fold difference between replicates in 3′ libraries). (C,D) Examples of annotated TSSs and TTSs that do not correspond to observed start and end sites in our samples. Read coverage is normalized to library size. (E) Distance from the most highly enriched window within each gene to the annotated TTS for the 2-h 3′ library (left) and to the TSS for the 5′ library (right). (F) Fraction of repetitive sequence in 3′ UTRs, 5′ UTRs, and coding sequence (CDS) as estimated by RepeatMasker, downloaded from the UCSC Genome Browser (Smit et al. 2004; Rosenbloom et al. 2015), in mouse (black) and rat (gray) annotated genes.
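The outlier criterion in panels A and B (at least a 10-fold difference between technical replicates) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name, the dictionary input format, the pseudocount, and the gene names and values are all assumptions made for illustration.

```python
def fold_outliers(rep1, rep2, min_fold=10.0, pseudocount=1.0):
    """Return genes whose expression differs by >= min_fold between replicates.

    rep1, rep2: dicts mapping gene name -> expression value (hypothetical format).
    pseudocount: added to both values so zeros do not cause division by zero
    (an assumption, not a detail taken from the paper).
    """
    outliers = []
    # Only genes measured in both replicates can be compared.
    for gene in sorted(set(rep1) & set(rep2)):
        a = rep1[gene] + pseudocount
        b = rep2[gene] + pseudocount
        fold = max(a, b) / min(a, b)  # symmetric fold difference, always >= 1
        if fold >= min_fold:
            outliers.append(gene)
    return outliers


# Hypothetical expression values for two technical replicates.
rep1 = {"GeneA": 99.0, "GeneB": 50.0, "GeneC": 0.0}
rep2 = {"GeneA": 9.0, "GeneB": 45.0, "GeneC": 4.0}
print(fold_outliers(rep1, rep2))
```

Taking the larger value over the smaller makes the test symmetric, so a gene is flagged regardless of which replicate carries the higher value.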
Figure 3. Schematic representation of the ESAT pipeline.
Figure 4. Global switch to shorter 3′ UTR expression in stimulated DCs. (A) Boxplots of the fraction of transcripts expressing the shortest UTR for genes with detectable expression of at least two distinct 3′ UTRs in unstimulated DCs (total of 1807). P-values were computed using a Mann-Whitney rank sum test between the unstimulated distribution and each of the time points shown. (B) Illustrative example showing the subtle yet reproducible increase in the expression of the shorter isoform of Tmem248 in stimulated DCs. Read coverage is normalized to library size.
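The two quantities behind panel A, the per-gene fraction of expression from the shortest 3′ UTR and a rank-sum comparison between conditions, can be sketched as follows. This is an illustrative sketch only: the function names and input format are hypothetical, the numbers are made up, and the code computes just the Mann-Whitney U statistic, not the P-values reported in the figure.

```python
def shortest_utr_fraction(counts_by_utr_length):
    """Fraction of a gene's reads assigned to its shortest 3' UTR.

    counts_by_utr_length: dict mapping 3' UTR length -> read count
    (a hypothetical input format).
    """
    total = sum(counts_by_utr_length.values())
    if total == 0:
        return None  # gene not detected
    shortest = min(counts_by_utr_length)
    return counts_by_utr_length[shortest] / total


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y, with average ranks for ties."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    pos = 0
    while pos < len(pooled):
        end = pos
        # Extend the group over all values tied with pooled[pos].
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(pos, end + 1):
            ranks[pooled[k][1]] = avg_rank
        pos = end + 1
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical per-gene fractions in unstimulated and stimulated cells.
unstimulated = [0.2, 0.3, 0.4]
stimulated = [0.5, 0.6, 0.7]
print(shortest_utr_fraction({300: 30, 1200: 70}))
print(mann_whitney_u(unstimulated, stimulated))
```

A U of zero for the first sample means every unstimulated value ranks below every stimulated value, which is the direction of the shift the caption describes. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would supply the P-value.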
Figure 5. Single-cell analysis of rat pancreatic islets. (A) Summary of the study. (B) Two-dimensional view of a nine-component independent component analysis (ICA) projection using t-distributed stochastic neighbor embedding (t-SNE). Cells are colored according to the clusters obtained after spectral clustering (Methods) of the nine-component ICA projection. (C) Violin plots showing the distribution of hormone expression across cells in each cluster.
Figure 6. Classification of 1000 islet cells. (A) Hierarchical clustering of the 940 genes that are differentially expressed between any pair of the nine cell clusters. Gray and black rectangles indicate major gene clusters identified by hierarchical clustering, and Gene Ontology terms significantly enriched (adjusted P-value <0.001) for each group are indicated (data available in Supplemental Table S3). Each of the cell clusters (same order and color as in Fig. 5B) is indicated by the rectangles at the top of the figure. Hand-picked genes of interest are highlighted at the right of the figure. (B) Violin plot showing Vegfa expression and that of its receptors. (C) Boxplots showing the differences in the distribution of total UMIs per cell for each of the cell clusters. (D) Violin plots showing the normalized expression of amylin (Iapp) in UMIs per million across the cells in each cluster.
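The "UMIs per million" unit in panel D can be illustrated with a short Python sketch: each gene's UMI count is scaled by the cell's total UMI count, so cells with different totals (panel C) become comparable. The function name and the counts below are hypothetical, and ESAT's actual normalization may differ in detail.

```python
def umis_per_million(umi_counts):
    """Scale one cell's per-gene UMI counts to UMIs per million.

    umi_counts: dict mapping gene name -> UMI count for a single cell
    (a hypothetical input format).
    """
    total = sum(umi_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in umi_counts}  # empty cell: nothing to scale
    return {gene: count * 1_000_000 / total for gene, count in umi_counts.items()}


# Made-up UMI counts for a single cell.
cell = {"Iapp": 50, "Ins1": 400, "Gcg": 50}
normalized = umis_per_million(cell)
print(normalized["Iapp"])
```

Here the cell has 500 UMIs in total, so its 50 Iapp UMIs correspond to 100,000 UMIs per million.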