ScisorWiz: Visualizing Differential Isoform Expression in Single-Cell Long-Read Data.

Alexander N Stein^1,2, Anoushka Joglekar^1,2, Chi-Lam Poon^1,2, Hagen U Tilgner^1,2.

Abstract

RNA isoforms contribute to the diverse functionality of the proteins they encode within the cell. Visualizing how isoform expression differs across cell types and brain regions can inform our understanding of disease, including gains or losses of function caused by alternative splicing. However, the extent to which this occurs in specific cell types and brain regions is largely unknown. ScisorWiz plots can provide this kind of information in an informative and easily communicable manner. ScisorWiz allows users to visualize specific genes across any number of cell types and offers several sorting options, each giving a different view of the data. ScisorWiz provides a clear picture of differential isoform expression through various clustering methods and highlights features such as alternative exons and single nucleotide variants (SNVs). Tools like ScisorWiz are key for interpreting single-cell isoform sequencing data. This tool applies to single-cell long-read RNA sequencing data from any cell type, tissue or species.

Keywords: Computational Neuroscience; Differential Isoform Expression; Genetics; RNA Splicing

Year: 2022 PMID: 35604081 PMCID: PMC9237735 DOI: 10.1093/bioinformatics/btac340

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931

1 Introduction

Differential isoform expression between cell types and across conditions plays a major role in diversifying the proteome (Nilsen and Graveley, 2010) and in the functionality of transcripts in the cell (Yang et al.). Long-read sequencing has become widely used to study such differences (Au et al.; Bolisetty et al.; Koren et al.; Leung et al.; Oikonomopoulos et al.; Ruiz-Reche et al.; Schulz et al.; Sharon et al.; Tilgner et al.), including in single-cell isoform sequencing studies (Arzalluz-Luque et al.; Gupta et al.; Hardwick et al.; Joglekar et al.; Volden and Vollmers, 2022). These approaches have been reviewed in Hardwick et al. Such data require informative single-gene visualizations so that the impact of alternative exons and exon combinations, as well as of transcription start sites (TSS) and PolyA sites, can be easily appreciated. Here, we present ScisorWiz, a streamlined tool that visualizes isoform expression differences across single-cell clusters in an informative and easily communicable manner. ScisorWiz offers an easy, fast and reliable way to visualize differential isoform expression across multiple clusters and is run from the command line using R (R Core Team, 2018).

2 Usage

ScisorWiz visualizes pre-processed single-cell long-read RNA sequencing data. For a user-specified gene, reads from any number of cell types can be visualized; within each cell type, reads are clustered by intron chain (the ordered list of a read's introns), TSS and/or PolyA site. We have used such plots in our long-read (Sharon et al.; Tilgner et al., 2015) and single-cell long-read publications (Gupta et al.; Hardwick et al.; Joglekar et al.). However, producing such a plot to publication standards involves read mapping, shrinking introns and recalculating coordinates, identifying alternative exons, adjusting the plot area to the number of reads and cell types, and plotting single-nucleotide variants (SNVs), insertions and deletions. This process was previously not automated and was carried out only for publication figures. ScisorWiz now performs it with a single command in R and offers many user-specified options, including exploratory, interactive outputs and several ways to sort isoforms within each cell type: by intron chain, TSS, PolyA site, or all three combined. ScisorWiz can be run on output generated by scisorseqr (Joglekar et al.) or on a similarly formatted dataset, which in turn can be based on diverse mappers, including STAR (Dobin et al.) and minimap2 (Li, 2018). Two input types are supported. The first consists of GFF files for the mappings together with read-to-gene assignment files, both of which scisorseqr generates automatically; users are, however, free to generate these standardized files by other means. The second is a more specific file intrinsic to scisorseqr, which already contains the assigned gene, TSS, PolyA site and the intron and exon mappings for each read. ScisorWiz is therefore tightly integrated with scisorseqr. Additionally, the MismatchFinder function compares the dataset against the reference genome to locate SNVs, insertions and deletions for display in the plot.
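The per-cell-type sorting options described above can be sketched in a few lines. The Python below is a hypothetical illustration, not ScisorWiz's R code: the `Read` record, its fields and the mode names (`"intron_chain"`, `"tss"`, `"polya"`, `"all"`) are assumptions, and coordinates are taken as half-open genomic intervals. A read's intron chain is derived as the gaps between its consecutive aligned exon blocks, and reads are then sorted within each cell type so that reads sharing a key sit next to each other. Dropping reads that lack a TSS or PolyA assignment in the corresponding modes is also an assumption of this sketch.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# Hypothetical read record; field names are illustrative, not a ScisorWiz/scisorseqr format.
Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


@dataclass
class Read:
    read_id: str
    cell_type: str
    exons: List[Interval]        # aligned exon blocks, sorted by start
    tss: Optional[int] = None    # assigned TSS, or None if unassigned
    polya: Optional[int] = None  # assigned PolyA site, or None if unassigned


def intron_chain(read: Read) -> Tuple[Interval, ...]:
    """Ordered introns = gaps between consecutive aligned exon blocks."""
    return tuple((prev_end, next_start)
                 for (_, prev_end), (next_start, _) in zip(read.exons, read.exons[1:]))


def sort_key(read: Read, mode: str) -> tuple:
    chain = intron_chain(read)
    if mode == "intron_chain":
        return (chain,)
    if mode == "tss":
        return (read.tss,)
    if mode == "polya":
        return (read.polya,)
    if mode == "all":  # TSS, intron chain and PolyA site combined
        return (read.tss, chain, read.polya)
    raise ValueError(f"unknown sort mode: {mode}")


def order_reads(reads: List[Read], mode: str = "intron_chain") -> Dict[str, List[Read]]:
    """Group reads by cell type and sort each group so identical keys are adjacent."""
    # Assumption: modes that use TSS/PolyA drop reads lacking that assignment.
    need_tss = mode in ("tss", "all")
    need_polya = mode in ("polya", "all")
    by_cell: Dict[str, List[Read]] = {}
    for r in reads:
        if (need_tss and r.tss is None) or (need_polya and r.polya is None):
            continue
        by_cell.setdefault(r.cell_type, []).append(r)
    return {ct: sorted(rs, key=lambda r: sort_key(r, mode)) for ct, rs in by_cell.items()}


def cluster_sizes(ordered: List[Read], mode: str) -> List[int]:
    """Number of reads in each run of identical keys (one displayed cluster each)."""
    return [len(list(group)) for _, group in groupby(ordered, key=lambda r: sort_key(r, mode))]
```

Because Python's sort is stable, reads with identical keys keep their input order, so the same input always yields the same stacking of reads within a cluster.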

3 Approach

To display exons separated by introns that can be up to ∼100-fold longer, each purely intronic region is shrunk to 100 bases, while regions containing annotated or novel exons are displayed at their real size. A drawback of this approach is that short introns (<1 kb) that are fully retained in a long read are drawn to scale. However, very large introns (≫10 kb), which long reads are unlikely to capture in retained form, are shrunk to 100 bases. By default, the package clusters reads by intron chain, so reads with identical intron chains are displayed together and form exonic blocks. Alternatively, when scisorseqr-generated files are used as input, clustering can take into account any combination of TSS, PolyA site and intron chain; in this case, only reads with an assigned TSS and/or PolyA site are plotted. By clustering reads, ScisorWiz gives a clear picture of differential isoform expression for genes in any dataset and makes patterns such as alternative exon usage across and within cell types easier to see.
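The intron-shrinking step amounts to a piecewise-linear map from genomic to plot coordinates. The Python sketch below is an illustration under stated assumptions, not ScisorWiz's implementation. Exonic intervals (annotated or observed in reads) are merged, and every gap between merged exonic blocks is treated as purely intronic and compressed to at most `intron_len` plot units (default 100). Exonic blocks keep their real length. Keeping gaps shorter than `intron_len` at their true size is our assumption, and the class name `ShrunkAxis` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into sorted, disjoint blocks."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


class ShrunkAxis:
    """Hypothetical genomic-to-plot coordinate map with compressed introns."""

    def __init__(self, exonic: List[Interval], intron_len: int = 100):
        blocks = merge_intervals(exonic)
        # Each segment: (genomic_start, genomic_end, plot_start, plot_length)
        self.segments: List[Tuple[int, int, int, int]] = []
        cursor = 0
        for i, (s, e) in enumerate(blocks):
            if i > 0:
                gap_start = blocks[i - 1][1]
                # Purely intronic gap: shrink to intron_len (assumption: never enlarge)
                shrunk = min(s - gap_start, intron_len)
                self.segments.append((gap_start, s, cursor, shrunk))
                cursor += shrunk
            # Exonic block: drawn at real size
            self.segments.append((s, e, cursor, e - s))
            cursor += e - s
        self.length = cursor  # total plot width

    def to_plot(self, pos: int) -> float:
        """Linearly map a genomic position into its segment's plot range."""
        for gs, ge, ps, plen in self.segments:
            if gs <= pos <= ge:
                return ps + (pos - gs) * plen / (ge - gs)
        raise ValueError("position outside the plotted region")
```

For example, `ShrunkAxis([(1000, 1200), (5000, 5050)])` has a plot width of 350: 200 units for the first exon, 100 for the 3,800-base intron and 50 for the second exon. Because the map is monotone and continuous, read blocks, variant positions and annotation features can all pass through the same `to_plot` call and stay aligned with one another.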

4 Output

ScisorWiz's output visualizes isoforms read by read for any number of cell types and any user-specified gene. Figure 1 shows Snap25 isoforms across six cell types. Each colored box is an exon in a single read, and within each cell type reads are ordered by intron chain. Orange exons are alternatively spliced exons, defined as exons included in at least 5% and at most 95% of overlapping reads across the entire dataset, irrespective of cell type; this range can also be set by the user. Consistent with previous observations (Joglekar et al.; Johansson et al.), we find that two neighboring alternative exons in Snap25 are mutually exclusive. Importantly, we observe this mutual exclusivity in multiple cell types. For platforms with higher error rates, such as current Oxford Nanopore sequencing, cutoffs of 20% and 80% give a clearer picture of alternative exons. Multicolored dots mark the locations of SNVs, insertions and deletions. By default, only SNVs, insertions and deletions present in at least 5% and at most 95% of overlapping reads are highlighted, to avoid plotting random sequencing errors. However, users can adjust these cutoffs, for example to display every nucleotide that disagrees with the reference genome. This may be useful for low-error-rate sequencing such as Pacific Biosciences (Eid et al.). In addition, mismatches within the first or last 20 bases of an alignment are not shown, to avoid artifacts at alignment ends. The bottom section shows the part of the GENCODE annotation covered by long reads. ScisorWiz also generates a file containing all single-cell long reads that can be uploaded to and inspected on the UCSC Genome Browser (Kent, 2002).
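Both the alternative-exon rule and the variant filter reduce to the same test: is a feature present in a fraction of overlapping reads that falls inside a [lo, hi] band? The Python below sketches that logic and is not ScisorWiz's code. It rests on three simplifying assumptions of our own:

- A read "overlaps" a feature if its aligned span (first block start to last block end) covers it.
- An exon counts as included if any aligned block of the read overlaps it.
- Variant calls arrive per read as `(position, kind)` pairs, rather than being derived by comparison with the reference genome (the job of MismatchFinder in ScisorWiz).

`AlignedRead`, `alternative_exons` and `variants_to_plot` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


@dataclass
class AlignedRead:
    """Hypothetical read: aligned blocks plus (position, kind) variant calls."""
    blocks: List[Interval]  # sorted aligned exon blocks
    calls: Set[Tuple[int, str]] = field(default_factory=set)  # kind: "SNV", "INS" or "DEL"

    @property
    def span(self) -> Interval:
        return (self.blocks[0][0], self.blocks[-1][1])


def in_band(count: int, total: int, lo: float, hi: float) -> bool:
    """True if count/total lies within [lo, hi]; False when nothing overlaps."""
    return total > 0 and lo <= count / total <= hi


def alternative_exons(exons: List[Interval], reads: List[AlignedRead],
                      lo: float = 0.05, hi: float = 0.95) -> List[Interval]:
    """Exons whose inclusion (psi) among overlapping reads, pooled over all cell types, is in band."""
    alt = []
    for s, e in exons:
        overlapping = [r for r in reads if r.span[0] < e and s < r.span[1]]
        included = sum(1 for r in overlapping
                       if any(bs < e and s < be for bs, be in r.blocks))
        if in_band(included, len(overlapping), lo, hi):
            alt.append((s, e))
    return alt


def variants_to_plot(reads: List[AlignedRead], lo: float = 0.05, hi: float = 0.95,
                     end_trim: int = 20) -> List[Tuple[int, str]]:
    """Variants seen in a lo..hi fraction of covering reads, ignoring alignment ends."""
    counts: Counter = Counter()
    for r in reads:
        start, end = r.span
        for pos, kind in r.calls:
            # Skip calls within end_trim bases of either alignment end
            if start + end_trim <= pos < end - end_trim:
                counts[(pos, kind)] += 1
    kept = []
    for (pos, kind), n in counts.items():
        covering = sum(1 for r in reads if r.span[0] <= pos < r.span[1])
        if in_band(n, covering, lo, hi):
            kept.append((pos, kind))
    return sorted(kept)
```

For Oxford Nanopore data, the cutoffs suggested above would correspond to `alternative_exons(exons, reads, lo=0.2, hi=0.8)`. Passing `lo=0.0, hi=1.0, end_trim=0` to `variants_to_plot` would keep every call inside the aligned spans.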

Fig. 1.

The isoforms of the Snap25 gene present in each read of a given cell type are displayed one above the other, giving a consistent picture of the gene's expression in each cell type. The orange exon is considered alternative because its inclusion level (Ψ) lies between 5% and 95%, computed irrespective of cell type. The multicolored dots represent SNVs (blue), insertions (green) and deletions (red); all displayed SNVs, insertions and deletions are present in at least 5% and at most 95% of overlapping reads. The black tracks at the bottom represent the part of the GENCODE annotation for Snap25. (A color version of this figure appears in the online version of this article.)

1. Characterization of the human ESC transcriptome by hybrid sequencing.

Authors: Kin Fai Au; Vittorio Sebastiano; Pegah Tootoonchi Afshar; Jens Durruthy Durruthy; Lawrence Lee; Brian A Williams; Harm van Bakel; Eric E Schadt; Renee A Reijo-Pera; Jason G Underwood; Wing Hung Wong
Journal: Proc Natl Acad Sci U S A Date: 2013-11-26 Impact factor: 11.205

2. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells.

Authors: Ishaan Gupta; Paul G Collier; Bettina Haase; Ahmed Mahfouz; Anoushka Joglekar; Taylor Floyd; Frank Koopmans; Ben Barres; August B Smit; Steven A Sloan; Wenjie Luo; Olivier Fedrigo; M Elizabeth Ross; Hagen U Tilgner
Journal: Nat Biotechnol Date: 2018-10-15 Impact factor: 54.908

3. STAR: ultrafast universal RNA-seq aligner.

Authors: Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R Gingeras
Journal: Bioinformatics Date: 2012-10-25 Impact factor: 6.937

4. Expansion of the eukaryotic proteome by alternative splicing.

Authors: Timothy W Nilsen; Brenton R Graveley
Journal: Nature Date: 2010-01-28 Impact factor: 49.962

5. Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events.

Authors: Hagen Tilgner; Fereshteh Jahanbani; Tim Blauwkamp; Ali Moshrefi; Erich Jaeger; Feng Chen; Itamar Harel; Carlos D Bustamante; Morten Rasmussen; Michael P Snyder
Journal: Nat Biotechnol Date: 2015-05-18 Impact factor: 54.908

6. An ancient duplication of exon 5 in the Snap25 gene is required for complex neuronal development/function.

Authors: Jenny U Johansson; Jesper Ericsson; Juliette Janson; Simret Beraki; Davor Stanić; Slavena A Mandic; Martin A Wikström; Tomas Hökfelt; Sven Ove Ogren; Björn Rozell; Per-Olof Berggren; Christina Bark
Journal: PLoS Genet Date: 2008-11-28 Impact factor: 5.917

7. Getting the Entire Message: Progress in Isoform Sequencing.

Authors: Simon A Hardwick; Anoushka Joglekar; Paul Flicek; Adam Frankish; Hagen U Tilgner
Journal: Front Genet Date: 2019-08-16 Impact factor: 4.599

8. A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain.

Authors: Anoushka Joglekar; Andrey Prjibelski; Ahmed Mahfouz; Paul Collier; Susan Lin; Anna Katharina Schlusche; Jordan Marrocco; Stephen R Williams; Bettina Haase; Ashley Hayes; Jennifer G Chew; Neil I Weisenfeld; Man Ying Wong; Alexander N Stein; Simon A Hardwick; Toby Hunt; Qi Wang; Christoph Dieterich; Zachary Bent; Olivier Fedrigo; Steven A Sloan; Davide Risso; Erich D Jarvis; Paul Flicek; Wenjie Luo; Geoffrey S Pitt; Adam Frankish; August B Smit; M Elizabeth Ross; Hagen U Tilgner
Journal: Nat Commun Date: 2021-01-19 Impact factor: 14.919

9. acorde unravels functionally interpretable networks of isoform co-usage from single cell data.

Authors: Angeles Arzalluz-Luque; Pedro Salguero; Sonia Tarazona; Ana Conesa
Journal: Nat Commun Date: 2022-04-05 Impact factor: 17.694

10. Direct long-read RNA sequencing identifies a subset of questionable exitrons likely arising from reverse transcription artifacts.

Authors: Laura Schulz; Manuel Torres-Diz; Mariela Cortés-López; Katharina E Hayer; Mukta Asnani; Sarah K Tasian; Yoseph Barash; Elena Sotillo; Kathi Zarnack; Julian König; Andrei Thomas-Tikhonenko
Journal: Genome Biol Date: 2021-06-28 Impact factor: 13.583