ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2.

Emil K Gustavsson, David Zhang, Regina H Reynolds, Sonia Garcia-Ruiz, Mina Ryten.

Abstract

MOTIVATION: The advent of long-read sequencing technologies has increased demand for the visualization and interpretation of transcripts. However, tools that perform such visualizations remain inflexible and lack the ability to easily identify differences between transcript structures. Here, we introduce ggtranscript, an R package that provides a fast and flexible method to visualize and compare transcripts. As a ggplot2 extension, ggtranscript inherits the functionality and familiarity of ggplot2, making it easy to use.

AVAILABILITY: ggtranscript is an R package available at https://github.com/dzhang32/ggtranscript (DOI: https://doi.org/10.5281/zenodo.6374061) under an open-source MIT license. Further documentation is available at https://dzhang32.github.io/ggtranscript/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press.

Year:  2022        PMID: 35751589      PMCID: PMC9344834          DOI: 10.1093/bioinformatics/btac409

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


1 Introduction

Alternative splicing is a crucial post-transcriptional step through which introns are excised from messenger RNA (mRNA) precursors, and exons are spliced together to form mature mRNA isoforms. In fact, ∼95% of human genes undergo alternative splicing, resulting in various forms of mature mRNA (Wang et al., 2008). This process is often regulated in a tissue-specific, disease-specific or developmental manner, resulting in multiple different transcripts being generated from the same gene. It is well recognized that identifying full-length transcript structures from standard transcriptomic assays relying on short-read RNA-sequencing is challenging, as short reads rarely span multiple splice junctions, which makes it difficult to infer transcript structures (Conesa et al., 2016). However, long-read sequencing platforms such as PacBio and Oxford Nanopore have transformed the field and enabled the discovery of new transcript isoforms that could not have been recognized by the assembly of short reads. In addition, long reads facilitate better transcript quantification and improve mapping of highly homologous sequences.
Current tools to visualize transcript structures are often inflexible, allowing users very limited control over plot aesthetics, or lack the ability to compare transcript structures. For example, the UCSC Genome Browser (Kent et al., 2002), the IGV browser (Robinson et al., 2011) and Gviz (Hahne and Ivanek, 2016) provide genome-based tracks that allow for visualization of transcripts, but are not accessible programmatically. IsoformSwitchAnalyzeR (Vitting-Seerup and Sandelin, 2019), wiggleplotr and ggsashimi (Li et al., 2017) offer limited customization of plot aesthetics and comparisons of transcript structures. SWAN (Reese and Mortazavi, 2021) does offer some customizable transcript visualization functions, but has limited functionality to highlight differences and is implemented in Python.
Here, we introduce the R package ggtranscript which makes it easy to both visualize and compare transcript structures using ggplot2 (Wickham, 2016), a popular R-based framework for data visualization based upon an intuitive grammar system that permits flexibility via combination of independent components. As a ggplot2 extension, ggtranscript inherits a vast amount of flexibility when determining the plot aesthetics, as well as interoperability with existing ggplot2 geoms and ggplot2 extensions. Furthermore, the input data for ggtranscript matches widely used formats in genetic and transcriptomic analyses.

2 Implementation

ggtranscript is an R package that extends the popular tool ggplot2 (RRID: SCR_014601 version: 3.3.5, https://cran.r-project.org/web/packages/ggplot2/index.html) for visualizing transcript structure and annotation. As a ggplot2 extension, ggtranscript requires its input data to be a data.frame with columns specifying the start and end positions of each feature (e.g. exon or intron) as well as identifiers for the transcript(s) to be plotted. This data format is widely used across transcriptomic and genetic analyses and matches annotation and data structures such as GTF/GFF3 files or GenomicRanges objects. To enable the visualization of transcript structures, ggtranscript introduces five new geoms [geom_range(), geom_half_range(), geom_intron(), geom_junction() and geom_junction_label_repel()] and several helper functions designed to facilitate the visualization of transcript structure and annotation. geom_range() and geom_intron() enable the plotting of exons and introns, the core components of transcript annotation (Fig. 1A). ggtranscript also provides the helper function to_intron(), which converts exon co-ordinates to the corresponding introns. Together, these functions enable users to plot transcript structures with only exons as the required input and only a few lines of code. geom_range() is designed to be used for any range-based genomic annotation. For instance, when plotting protein-coding transcripts, geom_range() can be used to visually distinguish the coding regions from untranslated regions (Fig. 1A).
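The exon-to-intron conversion described above can be illustrated with a minimal sketch. This is written in Python purely to show the logic and is not the ggtranscript implementation or its R API; the function name `exons_to_introns`, the dictionary keys and the example co-ordinates are all hypothetical. It also assumes that an intron spans from the end of one exon to the start of the next; whether boundaries are shared or offset by one base is a convention that a real tool must define.

```python
from collections import defaultdict


def exons_to_introns(exons):
    """Derive introns from exon ranges, one transcript at a time.

    `exons` is a list of dicts with hypothetical keys
    'transcript_id', 'start' and 'end'.
    """
    # Group exons by the transcript they belong to.
    by_transcript = defaultdict(list)
    for exon in exons:
        by_transcript[exon["transcript_id"]].append(exon)

    introns = []
    for transcript_id, tx_exons in by_transcript.items():
        # Order exons by genomic position so neighbours are adjacent.
        ordered = sorted(tx_exons, key=lambda e: e["start"])
        # Each pair of consecutive exons defines one intron
        # (assumed convention: exon end -> next exon start).
        for left, right in zip(ordered, ordered[1:]):
            introns.append(
                {
                    "transcript_id": transcript_id,
                    "start": left["end"],
                    "end": right["start"],
                }
            )
    return introns


# Made-up example: tx1 has three exons (given out of order), tx2 has one.
exons = [
    {"transcript_id": "tx1", "start": 100, "end": 200},
    {"transcript_id": "tx1", "start": 500, "end": 600},
    {"transcript_id": "tx1", "start": 300, "end": 400},
    {"transcript_id": "tx2", "start": 100, "end": 250},
]
introns = exons_to_introns(exons)
```

In this sketch a single-exon transcript yields no introns, which is why exons alone are sufficient input for drawing a full transcript structure.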
Fig. 1. ggtranscript enables a fast and flexible method to visualize and compare transcript isoforms. ggtranscript is a ggplot2 extension that introduces five new geoms and a set of helper functions: (A) geom_range() and geom_intron() enable the plotting of exons and introns, the core components of transcript annotation. In addition, geom_range() has been used to visually distinguish coding regions from untranslated regions. (B) geom_half_range() enables users to plot only half of a range on the top or bottom of a transcript structure; one use case is to visualize the differences between two transcripts (SOD201 and SOD202). (C) geom_junction() enables the plotting of junction curves, which can be overlaid across transcript structures to annotate them with supporting short-read RNA-sequencing data. The numbers represent junction usage. (D) Longer, more complex transcripts, with small differences between exons of interest, can be more difficult to visualize. (E) For this reason, ggtranscript includes a helper function shorten_gaps() which shortens regions that do not overlap an exon to a fixed, user-specified width. Transcripts in D and E are coloured by their transcript biotype. (F) In addition, the function to_diff() facilitates visualization of longer transcripts by highlighting differences in comparison to a reference transcript.

geom_half_range() takes advantage of the vertical symmetry of transcript annotation by plotting only half of a range on the top or bottom of a transcript structure; one use case of which is to visualize the differences between two transcripts more clearly (Fig. 1B). As a ggplot2 extension, ggtranscript inherits the familiarity and functionality of ggplot2. For instance, by leveraging ggforce::facet_zoom() users can zoom in on regions of interest (Fig. 1B). geom_junction() enables the plotting of junction curves, which can be overlaid across transcript structures to annotate them with supporting short-read RNA-sequencing data (Fig. 1C).
geom_junction_label_repel() adds a label to junction curves, which can often be useful to mark junctions with a metric of their usage such as read counts (Fig. 1C). For longer, more complex transcripts, small differences between exons of interest can be more difficult to visualize (Fig. 1D). For this reason, ggtranscript includes a helper function shorten_gaps() which shortens regions that do not overlap an exon to a fixed, user-specified width. Plotting the rescaled exons and introns enables easier comparison between transcript structures when genes are long (Fig. 1E). In addition, the function to_diff() facilitates such comparisons by highlighting differences relative to a reference transcript (Fig. 1F). Together, these features simplify the process of visualizing and comparing transcript structures, facilitating the exploration, analysis and interpretation of long-read sequencing and transcriptomic data.
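The idea of shortening regions that do not overlap an exon can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the algorithm used by shorten_gaps() in the R package: the function names, the tuple layout `(transcript_id, start, end)`, the parameter `target_gap_width` and the example co-ordinates are hypothetical. The sketch merges exons across all transcripts, finds the gaps between the merged blocks, and shrinks every gap wider than the target to exactly the target width while leaving exon widths unchanged.

```python
def merge_ranges(ranges):
    """Merge overlapping or touching (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def shorten_gaps_sketch(exons, target_gap_width=100):
    """Rescale exon co-ordinates so no exon-free gap exceeds the target.

    `exons` is a list of (transcript_id, start, end) tuples.
    """
    # Exon-covered blocks across all transcripts; the space between
    # consecutive blocks overlaps no exon in any transcript.
    blocks = merge_ranges([(start, end) for _, start, end in exons])

    # For each gap, record where it ends and how much has been
    # trimmed in total up to and including that gap.
    shifts = []
    trimmed = 0
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        excess = (next_start - prev_end) - target_gap_width
        if excess > 0:
            trimmed += excess
        shifts.append((next_start, trimmed))

    def rescale(position):
        # Subtract the trimming accumulated from all gaps that lie
        # to the left of this position.
        shift = 0
        for gap_end, cumulative in shifts:
            if position >= gap_end:
                shift = cumulative
            else:
                break
        return position - shift

    return [(tx, rescale(start), rescale(end)) for tx, start, end in exons]


# Made-up example with two long exon-free gaps shared by both transcripts.
exons = [
    ("tx1", 100, 200),
    ("tx1", 5000, 5100),
    ("tx2", 100, 250),
    ("tx2", 5000, 5100),
    ("tx2", 9000, 9200),
]
rescaled = shorten_gaps_sketch(exons, target_gap_width=100)
```

Because the same shift is applied to every transcript, exons that were aligned in genomic co-ordinates stay aligned after rescaling, which is what makes the shortened plot comparable across isoforms.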

3 Conclusion

ggtranscript enables a fast and simplified way to visualize, explore and interpret transcript isoforms. It allows users to combine data from both long-read and short-read RNA-sequencing technologies, making systematic assessment of transcript support easier. Finally, by being a ggplot2 extension it is highly flexible and can easily generate high-quality and publication-ready plots.
References

1.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

2.  IsoformSwitchAnalyzeR: analysis of changes in genome-wide patterns of alternative splicing and its functional consequences.

Authors:  Kristoffer Vitting-Seerup; Albin Sandelin
Journal:  Bioinformatics       Date:  2019-11-01       Impact factor: 6.937

3.  Visualizing Genomic Data Using Gviz and Bioconductor.

Authors:  Florian Hahne; Robert Ivanek
Journal:  Methods Mol Biol       Date:  2016

4.  Integrative genomics viewer.

Authors:  James T Robinson; Helga Thorvaldsdóttir; Wendy Winckler; Mitchell Guttman; Eric S Lander; Gad Getz; Jill P Mesirov
Journal:  Nat Biotechnol       Date:  2011-01       Impact factor: 54.908

5.  Annotation-free quantification of RNA splicing using LeafCutter.

Authors:  Yang I Li; David A Knowles; Jack Humphrey; Alvaro N Barbeira; Scott P Dickinson; Hae Kyung Im; Jonathan K Pritchard
Journal:  Nat Genet       Date:  2017-12-11       Impact factor: 38.330

6.  Swan: a library for the analysis and visualization of long-read transcriptomes.

Authors:  Fairlie Reese; Ali Mortazavi
Journal:  Bioinformatics       Date:  2021-06-09       Impact factor: 6.937

7.  Alternative isoform regulation in human tissue transcriptomes.

Authors:  Eric T Wang; Rickard Sandberg; Shujun Luo; Irina Khrebtukova; Lu Zhang; Christine Mayr; Stephen F Kingsmore; Gary P Schroth; Christopher B Burge
Journal:  Nature       Date:  2008-11-27       Impact factor: 49.962

8.  A survey of best practices for RNA-seq data analysis.

Authors:  Ana Conesa; Pedro Madrigal; Sonia Tarazona; David Gomez-Cabrero; Alejandra Cervera; Andrew McPherson; Michał Wojciech Szcześniak; Daniel J Gaffney; Laura L Elo; Xuegong Zhang; Ali Mortazavi
Journal:  Genome Biol       Date:  2016-01-26       Impact factor: 13.583
