ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization.

Diego Garrido-Martín, Emilio Palumbo, Roderic Guigó, Alessandra Breschi.

Abstract

We present ggsashimi, a command-line tool for the visualization of splicing events across multiple samples. Given a specified genomic region, ggsashimi creates sashimi plots for individual RNA-seq experiments as well as aggregated plots for groups of experiments, a feature unique to this software. Compared with existing programs that generate sashimi plots, it uses popular bioinformatics file formats, is annotation-independent, and allows the visualization of splicing events even for large genomic regions by scaling down the genomic segments between splice sites. ggsashimi is freely available at https://github.com/guigolab/ggsashimi. It is implemented in Python and internally generates R code for plotting.

Year:  2018        PMID: 30118475      PMCID: PMC6114895          DOI: 10.1371/journal.pcbi.1006360

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


This is a PLoS Computational Biology Software paper.

Introduction

Alternative splicing is the process through which different combinations of exons of the same gene are selected to produce a variety of mature coding and non-coding transcripts [1]. The genome-wide landscape of alternative splicing can be readily profiled by RNA sequencing (RNA-seq), and tens of thousands of RNA-seq experiments are now publicly available. While visualization of RNA-seq data is crucial for exploratory data analysis, visualization of splicing events is currently not dynamically integrated in common genome browsers, and stand-alone tools are annotation-dependent.

Visualizing splicing events is particularly challenging because such events usually occur between two regions, known as splice sites, that are not contiguous on the genome sequence and can be tens or even hundreds of kilobases apart in linear space. Representing a splicing event therefore requires drawing a connective element that illustrates the presence of a splice junction between two splice sites. The sashimi plot [2] is an effective and established diagram that combines read coverage along a gene (a signal track) with curves connecting splice sites supported by RNA-seq data.

A tool for drawing sashimi plots was initially developed as part of the MISO suite [3], a software package that quantifies and compares alternative splicing from RNA-seq experiments. Current popular implementations include a stand-alone utility that creates sashimi plots specifically for MISO-indexed splicing events [2] and a built-in utility within the Integrative Genomics Viewer, IGV [4]. The former relies on a compatible annotation of the event, while the latter requires installing IGV and the time-consuming loading of voluminous alignment files. Moreover, both represent splicing events for each RNA-seq experiment on a separate line, which hinders the comparison of more than a dozen samples.

Design and implementation

Like the original tool for sashimi plots [3], the data processing part of ggsashimi is implemented in Python. In contrast to the original tool, ggsashimi internally generates an R script that uses the ggplot2 library [5] for the graphical rendering. To ensure reproducibility, it is distributed in a Docker image, which includes the ggsashimi Python script and all the required dependencies.

In its simplest usage, ggsashimi generates a publication-ready image with a read coverage histogram and curves connecting splice sites, from a single RNA-seq experiment. Curves have variable widths, proportional to the relative number of reads supporting the splice junction. In line with the most widely used bioinformatics file formats, the required input is a standard alignment BAM file (with no special aligner-dependent features) and the genomic coordinates of the region to display. The BAM file must be coordinate-sorted and indexed so that reads from a given genomic region can be extracted efficiently. Splice junctions are inferred directly from the BAM file, and no prior knowledge of the splicing event is needed. The output of ggsashimi is available in both vector (SVG, PDF) and raster (PNG, JPEG, TIFF) formats. For the latter, the resolution in pixels per inch can be defined by the user.

To allow comparisons across multiple experiments, a list of files can be specified, and the signal for each experiment is plotted on a separate line. However, as the number of samples increases, visual comparison of separate plots becomes overwhelming and some form of aggregation is essential. To this end, ggsashimi can aggregate data for hundreds of experiments and represent plots only for the aggregated measures (see Features). Finally, an annotation plot is optionally generated to visualize transcript structures in the specified region. Again, in line with current standards, this requires only a Gene Transfer Format (GTF) file, with no additional description of the splicing events.
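As an illustration of how splice junctions can be inferred from alignments alone, the following minimal sketch counts junctions from the CIGAR strings of aligned reads, where each `N` operation marks a skipped region (an intron). It is not ggsashimi's actual code: the function names `junctions_from_cigar` and `count_junctions` are hypothetical, and reads are assumed to be given as `(POS, CIGAR)` pairs already extracted from a BAM file.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(start, cigar):
    """Return one (intron_start, intron_end) pair per N operation in a CIGAR string.

    `start` is the 1-based leftmost aligned position of the read (SAM POS).
    The returned coordinates are 1-based and inclusive: the first and last
    skipped (intronic) base.
    """
    junctions = []
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((pos, pos + length - 1))
        # Only these operations consume positions on the reference.
        if op in "MDN=X":
            pos += length
    return junctions


def count_junctions(reads):
    """Count supporting reads per junction for an iterable of (POS, CIGAR) pairs."""
    counts = Counter()
    for start, cigar in reads:
        counts.update(junctions_from_cigar(start, cigar))
    return counts


# Hypothetical reads: three of them span the same 200 bp intron.
example_reads = [
    (100, "50M200N50M"),
    (120, "30M200N70M"),
    (100, "100M"),
    (90, "10S60M200N20M5I10M"),
]
for junction, n_reads in count_junctions(example_reads).items():
    print(junction, n_reads)
```

The resulting per-junction read counts are the kind of quantity to which the curve widths described above are proportional.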
Because splicing events often involve short exons flanked by proportionally very large introns, the genomic regions between two splice sites (inferred from the alignments and not from the annotation) can optionally be shrunk for better graphical representation. We observed that rescaling the length of each such region to its original length raised to the power of 0.7 usually gives a good balance between the lengths of introns and exons.
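The effect of this rescaling can be sketched as a piecewise-linear coordinate map. The example below is only an illustration under simplifying assumptions, not the ggsashimi implementation: introns are assumed to be non-overlapping, half-open `(start, end)` intervals inside the region, and the name `build_shrink_map` is hypothetical. Only the exponent 0.7 comes from the text.

```python
def build_shrink_map(region_start, region_end, introns, exponent=0.7):
    """Build a genomic-to-plot coordinate map that compresses introns.

    Each intron of length L is drawn with length L ** exponent; everything
    else keeps its original length. Returns (to_plot, plot_width), where
    to_plot maps a genomic position to its plot coordinate.
    Assumes non-overlapping introns with end > start, inside the region.
    """
    segments = []  # (genomic_start, genomic_end, plot_start, scale)
    cursor = region_start
    plot_pos = 0.0
    for i_start, i_end in sorted(introns):
        if i_start > cursor:
            # Segment outside any intron: kept at full length.
            segments.append((cursor, i_start, plot_pos, 1.0))
            plot_pos += i_start - cursor
        length = i_end - i_start
        shrunk = length ** exponent
        segments.append((i_start, i_end, plot_pos, shrunk / length))
        plot_pos += shrunk
        cursor = i_end
    if cursor < region_end:
        segments.append((cursor, region_end, plot_pos, 1.0))
        plot_pos += region_end - cursor

    def to_plot(pos):
        for g_start, g_end, p_start, scale in segments:
            if g_start <= pos <= g_end:
                return p_start + (pos - g_start) * scale
        raise ValueError(f"position {pos} is outside the region")

    return to_plot, plot_pos


# Hypothetical region: two 1 kb exonic segments separated by a 10 kb intron.
to_plot, plot_width = build_shrink_map(0, 12000, [(1000, 11000)])
print(plot_width)      # total plotted width after shrinking
print(to_plot(11000))  # plot coordinate of the end of the intron
```

With an exponent below 1, long introns are compressed far more strongly than short ones, which is why short exons remain visible next to them.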

Features

ggsashimi presents several unique features that distinguish it from its predecessors and make it a useful tool, especially for large-scale projects:

- Annotation-independent: no need for annotation of the splicing events.
- Stand-alone command-line tool: no need for CPU-intensive applications (e.g. IGV).
- Scales to a large number of samples through multiple aggregation methods:
  - overlay: the signal of each individual sample is placed upon the others, using transparency to enhance visualization. The number of reads supporting each event is shown for all samples. Transparency can be modified through the --alpha parameter. This is suitable when the number of samples per group is relatively small (≤10).
  - mean: the mean signal and the mean number of reads supporting each event across individual samples are shown.
  - median: the median signal and the median number of reads supporting each event across individual samples are shown.
  Both the mean and the median number of reads supporting an event can also be displayed in combination with the signal overlay.
- Focuses on informative regions, by compressing the length of long intronic segments with no splicing events.
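The mean and median aggregation methods can be illustrated with a short sketch. It is written under stated assumptions and is not ggsashimi's own code: coverage is assumed to be a per-base list of equal length for every sample in a group, junction counts are assumed to be dictionaries keyed by junction coordinates, and a junction absent from a sample is assumed to count as 0 reads. The function names are hypothetical.

```python
from statistics import mean, median

AGGREGATORS = {"mean": mean, "median": median}


def aggregate_coverage(samples, method="mean"):
    """Aggregate per-base coverage across samples of one group.

    `samples` is a list of equal-length coverage lists, one per sample.
    """
    agg = AGGREGATORS[method]
    return [agg(column) for column in zip(*samples)]


def aggregate_junctions(samples, method="mean"):
    """Aggregate junction read counts across samples of one group.

    `samples` is a list of dicts mapping a junction to its read count.
    Assumption: a junction missing from a sample contributes 0 reads.
    """
    agg = AGGREGATORS[method]
    junctions = set().union(*samples)
    return {j: agg([s.get(j, 0) for s in samples]) for j in sorted(junctions)}


# Hypothetical group of three samples.
coverage = [[10, 20, 30], [20, 20, 0], [60, 20, 30]]
junction_counts = [
    {(150, 349): 4, (150, 599): 2},
    {(150, 349): 6},
    {(150, 349): 8, (150, 599): 1},
]
print(aggregate_coverage(coverage, "mean"))
print(aggregate_coverage(coverage, "median"))
print(aggregate_junctions(junction_counts, "median"))
```

Treating a missing junction as 0 is one possible design choice; skipping missing values instead would raise the aggregated counts of junctions seen in only a few samples.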

Results

To illustrate how ggsashimi performs and to compare it with existing implementations, we obtained a set of 12 RNA-seq samples from the ENCODE project [6], publicly available at www.encodeproject.org. Samples were classified into three cell type groups: endothelial, epithelial and mesenchymal. We focused on a single cassette exon (chr10:27044584-27044670) with different levels of inclusion across the three cell type groups (mesenchymal > epithelial > endothelial). For comparison purposes, the genomic region containing the selected splicing event was represented using both ggsashimi and the built-in sashimi plot utility of the IGV browser (Fig 1). In the case of ggsashimi, aggregation of samples belonging to the same group (through the --overlay option) and shrinkage of intron lengths were applied (see Features), enhancing the visualization of the event.
Fig 1. Comparison of sashimi plots generated by ggsashimi and IGV.

Sashimi plots of 12 ENCODE samples belonging to 3 cell type groups (endothelial, epithelial and mesenchymal) for the region chr10:27040584-27048100, obtained by ggsashimi (A) and the sashimi plot utility within IGV (B). The inclusion level of the exon chr10:27044584-27044670 is clearly higher in mesenchymal cells (blue), followed by epithelial (green) and endothelial cells (red). This trend is barely observable in the IGV sashimi plot, which draws one plot per sample and therefore becomes complex and confusing with multiple samples. In contrast, the --overlay option of ggsashimi aggregates samples belonging to the same group, providing a much better overview of the event. In addition, the long introns flanking the exon of interest substantially enlarge the connective elements and hinder visualization in the IGV sashimi plot. ggsashimi avoids this problem through its --shrink option, which rescales the original intron lengths and enhances visualization.


Availability and future directions

Although the sashimi plot is one of the standard representations for splicing visualization, current implementations present several limitations that substantially narrow its application. We believe that our implementation solves many of these issues: in particular, it eliminates the need for specialized annotation formats and supports summarized views for hundreds of samples. Since ggsashimi uses the most popular file formats and has very few dependencies, it can be easily integrated into any splicing analysis pipeline and can facilitate the interrogation of alternative splicing in large-scale RNA sequencing projects, such as ENCODE [6] and GTEx [7]. ggsashimi is freely available at https://github.com/guigolab/ggsashimi. Planned extensions of ggsashimi include incorporating spread metrics to accompany the mean and median aggregation methods, allowing the user to select which type of reads to plot (e.g. uniquely mapped), and optionally showing only the aggregated coverage.

References

1.  Alternative splicing: a pivotal step between eukaryotic transcription and translation.

Authors:  Alberto R Kornblihtt; Ignacio E Schor; Mariano Alló; Gwendal Dujardin; Ezequiel Petrillo; Manuel J Muñoz
Journal:  Nat Rev Mol Cell Biol       Date:  2013-02-06

2.  Quantitative visualization of alternative exon expression from RNA-seq data.

Authors:  Yarden Katz; Eric T Wang; Jacob Silterra; Schraga Schwartz; Bang Wong; Helga Thorvaldsdóttir; James T Robinson; Jill P Mesirov; Edoardo M Airoldi; Christopher B Burge
Journal:  Bioinformatics       Date:  2015-01-22

3.  Analysis and design of RNA sequencing experiments for identifying isoform regulation.

Authors:  Yarden Katz; Eric T Wang; Edoardo M Airoldi; Christopher B Burge
Journal:  Nat Methods       Date:  2010-11-07

4.  Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.

Authors:  Helga Thorvaldsdóttir; James T Robinson; Jill P Mesirov
Journal:  Brief Bioinform       Date:  2012-04-19

6.  An integrated encyclopedia of DNA elements in the human genome.

Journal:  Nature       Date:  2012-09-06

7.  Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans.

Journal:  Science       Date:  2015-05-07