Luyi Tian, Jafar S Jabbari, Rachel Thijssen, Quentin Gouil, Shanika L Amarasinghe, Oliver Voogd, Hasaru Kariyawasam, Mei R M Du, Jakob Schuster, Changqing Wang, Shian Su, Xueyi Dong, Charity W Law, Alexis Lucattini, Yair David Joseph Prawer, Coralina Collar-Fernández, Jin D Chung, Timur Naim, Audrey Chan, Chi Hai Ly, Gordon S Lynch, James G Ryall, Casey J A Anttila, Hongke Peng, Mary Ann Anderson, Christoffer Flensburg, Ian Majewski, Andrew W Roberts, David C S Huang, Michael B Clark, Matthew E Ritchie.
Abstract
A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing, together with a new computational pipeline (FLAMES), is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improves correlation with protein expression data, and links mutations known to confer drug resistance to transcriptome heterogeneity.
Keywords: Long-read sequencing; Single-cell gene expression; Single-cell multi-omics; Splicing
Year: 2021 PMID: 34763716 PMCID: PMC8582192 DOI: 10.1186/s13059-021-02525-6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Overview of experimental design, modified 10x protocol, FLAMES method, and basic summary statistics. A Summary of the study design, with an overview of the modified 10x protocol and FLAMES data processing pipeline. B UMAP visualization of cells in each sample; cells colored in red are sampled for long-read sequencing. scmixology1 and scmixology2 were integrated and are shown together in one plot. All UMAP visualizations are based on short-read data. C The number of nanopore reads generated from each sample, and the percentage of reads that were assigned a cell barcode. D Distribution of UMI counts per cell for Illumina and nanopore data in each sample. E Correlations between gene UMI counts generated from nanopore long-read and Illumina short-read data. F Density scatter plot showing the relationship between transcript-level counts and scATAC-seq read counts around the TSS regions. The horizontal orange line shows the calculated threshold that separates open chromatin from the background. The percentage indicates the proportion of transcripts that have their TSSs in open chromatin regions.
Fig. 2 Overview of the single-cell isoform-level analysis from FLAMES. A Classification of transcripts according to their splice sites when compared to reference annotations. B Summary of transcripts in the categories defined in A, both in numbers (left) and in the percentage of UMI counts (right). C UpSet plot showing overlap of transcripts in human datasets, where the number of transcripts shared by different sets of samples is indicated in the top bar chart, colored by the categories specified in A. D UMAP visualization of the CLL2 and MuSC datasets for the cells sampled for long-read sequencing, colored by percentage of UMI counts of transcripts in the FSM (top) and NNC (bottom) categories. CLL cells and quiescent MuSCs are annotated on the plot. E Bar plot of the number of distinct transcripts expressed per gene. Genes with more than five distinct transcripts are merged. F Box plot showing the percentage of transcript abundance relative to gene abundance for genes that express multiple transcripts. Transcripts are ranked by abundance, shown on the x-axis. G Summary of the type of alternative splicing between the two most abundant transcripts of each gene. The “Complex splicing changes” category represents transcripts with more than one type of constitutive alternatively spliced event.
Fig. 3 Summary of differential transcript usage results from FLAMES. A Summary of results from the statistical testing of DTU, which detected many significant genes per sample (adjusted P-value < 0.01). B Table of common functional categories among different samples from the functional enrichment analysis of gDTU. C Top 4 most abundant isoforms of RPS24 in human and heatmap of their expression at the single-cell level in the scmixology1, scmixology2, and CLL2 samples. D Top 4 most abundant isoforms of RPS24 in mouse and heatmap of their expression at the single-cell level in MuSCs. E UMAP of cells in CLL2, colored by RPS24 gene expression and transcript expression. Two transcripts with differential expression in different populations were selected. Transcript expression in each cell is colored by scaled relative expression to highlight the difference between populations. F Similar to E, UMAP of cells in the MuSC sample, colored by RPS24 gene expression and transcript expression. G Top 4 most abundant isoforms of CD45 in CLL2 and UMAP visualizations of the cells colored by (from left to right) gene expression, transcript expression, and corresponding protein expression. H Top 4 most abundant isoforms of CD82 in MuSC, with UMAP visualization of cells colored by expression of two isoforms that are differentially expressed between quiescent and activated MuSCs. I scATAC-seq read coverage for PRDX1 with cells from each cell line aggregated and plotted together. UMAP plots show isoform expression, with each cell colored by scaled transcript expression.
Fig. 4 Summary of differential allele frequency analysis to detect coding mutations. A Summary of variation filtering and analysis pipeline implemented in FLAMES. Candidate variants are filtered based on allele frequency first, then based on per cell allele frequency to remove technical artifacts. The remaining variants are used for differential allele frequency analysis. B PCA on alternative allele matrix using the variants after filtering, colored by unsupervised clustering results using top PCs and annotated with cell lines. C Manhattan plot of P-values from a differential allele frequency analysis with Benjamini–Hochberg adjustment. Genes with significant variants are labeled. D UMAP visualization highlighting two CLL populations that have differential allele frequency for the significant variants. E UMAP visualization of cells colored by Gly101Val mutation status.
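As an illustration of the two generic steps named in the Fig. 4 caption (an allele-frequency filter followed by Benjamini–Hochberg adjustment of P-values), the following is a minimal Python sketch. It is not the FLAMES implementation: the function names, the input layout, and the `min_af`/`max_af` thresholds are hypothetical and chosen only for this example, and the per-cell filtering step and the statistical test itself are omitted.

```python
def filter_by_allele_frequency(variants, min_af=0.1, max_af=0.9):
    """Keep variant names whose pooled alternative-allele frequency is in [min_af, max_af].

    `variants` is a list of (name, ref_count, alt_count) tuples.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    kept = []
    for name, ref_count, alt_count in variants:
        total = ref_count + alt_count
        if total == 0:
            continue  # no coverage, so no frequency can be computed
        allele_frequency = alt_count / total
        if min_af <= allele_frequency <= max_af:
            kept.append(name)
    return kept


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values stay monotone in rank and never exceed 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    candidates = [("var_a", 80, 20), ("var_b", 100, 0), ("var_c", 50, 50)]
    print(filter_by_allele_frequency(candidates))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A variant would then be called significant when its adjusted P-value falls below a chosen cutoff; the cutoff used for Fig. 4C is not stated in the caption.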