svist4get: a simple visualization tool for genomic tracks from sequencing experiments.

Artyom A Egorov, Ekaterina A Sakharova, Aleksandra S Anisimova, Sergey E Dmitriev, Vadim N Gladyshev, Ivan V Kulakovskiy.

Abstract

BACKGROUND: High-throughput sequencing often provides a foundation for experimental analyses in the life sciences. For many such methods, an intermediate layer of bioinformatics data analysis is the genomic signal track constructed by short read mapping to a particular genome assembly. There are many software tools to visualize genomic tracks in a web browser or with a stand-alone graphical user interface. However, there are only a few command-line applications suitable for automated usage or production of publication-ready visualizations.
RESULTS: Here we present svist4get, a command-line tool for customizable generation of publication-quality figures based on data from genomic signal tracks. Similarly to generic genome browser software, svist4get visualizes signal tracks at a given genomic location and can aggregate data from several tracks on a single plot along with the transcriptome annotation. The resulting plots can be saved as vector or high-resolution bitmap images. We demonstrate practical use cases of svist4get for Ribo-Seq and RNA-Seq data.
CONCLUSIONS: svist4get is implemented in Python 3 and runs on Linux. The command-line interface of svist4get allows for easy integration into bioinformatics pipelines in a console environment. Extra customization is possible through configuration files and a Python API. For convenience, svist4get is provided as a PyPI package. The source code is available at https://bitbucket.org/artegorov/svist4get/.

Keywords:  Genome browser; Genomic tracks; High-throughput sequencing; Next-generation sequencing; Python; RNA-Seq; Ribo-Seq; Visualization

Year:  2019        PMID: 30841857      PMCID: PMC6404320          DOI: 10.1186/s12859-019-2706-8

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Next-generation sequencing gave birth to multiple high-throughput methods in the life sciences, many of which are based on mapping short sequence reads to an existing genome assembly. Visualization of mapped read densities and computationally derived genome signal tracks is one of the most routine tasks in the analysis of sequencing data. One approach is to use dedicated genome browsers. The most popular universal tools, such as UCSC Genome Browser [1] or Zenbu [2], are web-based and allow interactive exploration of existing genome annotation along with uploaded user data. However, in some cases it is not convenient to upload user data to a remote server, and the data can instead be visualized and explored with stand-alone applications with a graphical user interface, such as Integrative Genomics Viewer [3] and Integrated Genome Browser [4], or even directly in the console environment [5]. Finally, there are hybrid approaches; for example, the BioUML bioinformatics platform [6] provides genome browsing functionality in both web-based and stand-alone versions. Web-based genome browsers are great for exploratory analysis of processed public data, but stand-alone tools are better suited for generating custom paper-quality graphics and for exploring data from ongoing or private experiments. In addition, it is often convenient to have a programmatic way to generate multiple images for several genomic loci. Several existing tools are aimed specifically at this task via scripting or a command-line interface (Table 1). Among the most advanced are Gviz [7] and ggbio [8], the Bioconductor R packages dedicated to the production of paper-quality figures of genomic tracks and annotations. Users preferring command-line utilities can use fluff [9] and ngs.plot [10].
These tools provide some advanced functions for data analysis but allow only a minimalistic approach to the visualization of genomic track segments in particular genomic windows. Here we present svist4get, a software tool allowing detailed paper-quality visualization of signal tracks along transcriptome annotation at a particular genomic location.
Table 1

An overview of existing programmatic visualization tools for genomic signal tracks

| Feature | svist4get | fluff | ngs.plot | ggbio | Gviz | ASCIIGenome |
|---|---|---|---|---|---|---|
| Command-line interface | yes | yes | yes | no | no | yes |
| Programming language | Python | Python | R, Python | R | R | Java |
| API | yes | no | no | yes | yes | no |
| Output | pdf, png | png | png, tiff | pdf, png | pdf, png | console (text) |
| Vector graphics output | yes | no | no | yes | yes | no |
| Reference | | [9] | [10] | [8] | [7] | [5] |

All the listed tools can function in a Linux environment and support the bed or bedGraph format for genomic signal tracks and gtf or gff for genomic annotation. Most of the tools are not focused on visualization of genomic windows and include advanced functions for data analysis or exploration.

Implementation

Svist4get is implemented in Python 3 and uses multiple PyPI packages (argparse, biopython [11], configs, reportlab, pybedtools [12], wand, wheel). The PyPI ‘wand’ package and ImageMagick are used for pdf-to-png conversion. Svist4get was developed and tested in a Linux environment. The svist4get Python package is available from PyPI (python3 -m pip install svist4get); the source code and example data are provided in Additional file 1, and installation details are given in Additional file 2. As input, svist4get supports the bedGraph format for genomic signals and the gtf format for the genome annotation. As output, svist4get generates vector graphics in pdf and exports raster graphics in png. Given a particular genomic window and a set of genomic signal tracks, svist4get automatically performs moving-average smoothing of the signal tracks, if necessary, taking into account the image width and the visible length of the genomic window. However, svist4get is purely a visualization tool; technical data conversion and pre-processing, such as read depth normalization, should be performed with external tools such as deeptools [13], bedtools [14], or the UCSC utilities [15]. To facilitate application of svist4get in standard scenarios and data exploration, the command-line interface covers several practical use cases that arise in transcriptomic studies without requiring user-side scripting. Furthermore, svist4get provides a Python API allowing additional customization and programmatic usage from within a Python program. The use cases and examples of svist4get results are described in the next section.
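
To illustrate the smoothing step mentioned above, the following is a minimal sketch of how a bedGraph track could be cut to a genomic window and smoothed with a moving average whose width depends on how many bases must fit into the image. It is not taken from the svist4get source: the function names, the `max_bars` parameter, and the rule for choosing the smoothing width are assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end, value); bedGraph coordinates are 0-based, half-open.
BedGraphRecord = Tuple[str, int, int, float]


def parse_bedgraph(lines: Iterable[str]) -> List[BedGraphRecord]:
    """Parse bedGraph text lines, skipping blanks, comments and header lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return records


def window_signal(records: List[BedGraphRecord], chrom: str,
                  win_start: int, win_end: int) -> List[float]:
    """Per-base signal for [win_start, win_end); uncovered bases are 0."""
    signal = [0.0] * (win_end - win_start)
    for rec_chrom, start, end, value in records:
        if rec_chrom != chrom:
            continue
        for pos in range(max(start, win_start), min(end, win_end)):
            signal[pos - win_start] = value
    return signal


def smoothing_width(window_length: int, max_bars: int) -> int:
    """Hypothetical rule: smooth only if the window has more bases than
    the image can show as separate bars (ceiling division)."""
    return max(1, -(-window_length // max_bars))


def moving_average(signal: List[float], width: int) -> List[float]:
    """Centered moving average using width // 2 neighbours on each side;
    the averaging window is truncated at the edges of the signal."""
    if width <= 1:
        return list(signal)
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed
```

With such a rule, a short window drawn on a wide image would be left unsmoothed (width 1), which corresponds to the single-nucleotide resolution case, while a long window would be averaged over neighbouring positions.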

Results and discussion

Svist4get capabilities are demonstrated in [16], where figures were produced with the svist4get Python API. Here we show several practical use cases of the command-line interface by visualizing particular genomic windows related to genes and transcripts using existing genome annotation. The command-line parameters to reproduce the presented images are provided in Additional file 2. The basic cases (Figs. 1 and 2) are illustrated using yeast ribosome profiling (Ribo-Seq) and RNA-Seq data from [17] downloaded from GWIPS-viz [18], and the yeast genome reference annotation from Ensembl [19]. For convenience, the svist4get package includes truncated sample data for demonstration purposes. Visualization of tissue-specific expression of different transcript isoforms (Fig. 3) uses mouse Ribo-Seq data [20, 21] downloaded from GWIPS-viz [18, 22] and the GENCODE M13 [23] mouse genome annotation.
Fig. 1

Transcript-centric selection and visualization of a genomic window. The top track shows the YFL031W transcript structure with the collapsed intronic region (short red bar on the right). The tracks in the middle show Ribo-Seq (ribosome A-sites and aggregated read density) and RNA-Seq (aggregated read density) signals. The bottom track shows the 0, +1, and +2 reading frames with the start and stop codons marked by green and red bars, respectively. The transcript open reading frame is highlighted. The data is taken from [17]

Fig. 2

Ribo-Seq (ribosome A-sites and aggregated coverage) and RNA-Seq (aggregated coverage) signals in the vicinity of the translation initiation site of the DFG16 gene. The upstream ORF in the 5′ region is highlighted. In comparison to Fig. 1, the genomic window is shorter and the image uses a wider template, allowing single-nucleotide resolution. The data is taken from [17]

Fig. 3

Ribo-Seq and RNA-Seq aggregated coverage signals in mouse kidney and liver data. The genomic window is centered on overlapping annotated transcripts displaying tissue-specific ribosome occupancy (Ribo-Seq tracks) and transcript abundance (RNA-Seq tracks). The red marks on the transcript structure track (on top) correspond to the collapsed intronic regions which are reconcilable for both shown transcripts. The data is taken from [20, 21]


Basic visualization of genomic windows

We employed svist4get to generate a visualization of the genomic window containing the YFL031W transcript of the HAC1 gene (Fig. 1). Based on the genome annotation and a transcript identifier, svist4get selects a genomic window that includes a particular transcript. Alternative scenarios include the selection of a genomic window based on a gene identifier and visualization of all transcripts in a given window (Additional file 2). Svist4get renders the transcript structure (based on the genome annotation) as the top track, places the signal tracks (based on data in bedGraph format) below it, and shows the structure of open reading frames (0, +1, +2, based on the nucleotide sequence of the displayed window) at the bottom.
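
The reading-frame track described above can be understood through a short sketch that scans the window sequence in the three forward frames and records the positions of start and stop codons. This is an illustration only, not the svist4get code: it assumes the standard ATG start codon and the three standard stop codons, considers only the forward strand, and numbers frames relative to the first base of the window.

```python
from typing import Dict, List

START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_marks(sequence: str) -> Dict[int, Dict[str, List[int]]]:
    """Return start/stop codon offsets (0-based, relative to the window
    start) for reading frames 0, +1 and +2 of the given sequence."""
    seq = sequence.upper()
    marks = {frame: {"start": [], "stop": []} for frame in range(3)}
    for frame in range(3):
        # Step through complete codons of this frame only.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in START_CODONS:
                marks[frame]["start"].append(pos)
            elif codon in STOP_CODONS:
                marks[frame]["stop"].append(pos)
    return marks


marks = codon_marks("ATGAAATAGCATGTGA")
# Frame 0 contains an ATG at offset 0 followed by an in-frame TAG at offset 6.
print(marks[0])
```

In a plot, the offsets in `start` and `stop` would be drawn as green and red bars, respectively, on the row of the corresponding frame.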

Visualizing a genomic window at the single-nucleotide resolution

We also used svist4get to show the region surrounding the translation initiation site of the yeast DFG16 gene (Fig. 2), including an upstream open reading frame (ORF). The general layout of tracks in Fig. 2 is similar to that of Fig. 1. An additional track is used to show arbitrary genomic segments with user-defined labels (upstream ORF and CDS). A smaller genomic region surrounding the DFG16 translation initiation site was selected based on the transcript ID. A wider template (a predefined configuration file) allowed single-nucleotide resolution.

Visualizing ribosome occupancy in overlapping transcripts

We also show a multi-track visualization illustrating differential ribosome occupancy in mouse kidney and liver Ribo-Seq data (Fig. 3). Reconcilable parts of introns of two annotated transcripts are collapsed (red vertical marks on the transcript structure tracks) to facilitate a non-interrupted view of the translated shortened open reading frame that is specific to the liver.
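
One way to picture the intron collapsing described above is sketched below. The sketch assumes that a region may be collapsed only if it is not covered by an exon of any displayed transcript, and that collapsing simply removes such regions from the plotted coordinate axis; both points are interpretations for illustration and may differ from the actual svist4get behaviour. Exons are given as 0-based, half-open (start, end) pairs, and all function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted list."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def collapsible_gaps(transcripts: List[List[Interval]]) -> List[Interval]:
    """Regions between the merged exons of all displayed transcripts,
    i.e. positions not covered by an exon of any of them."""
    exons = merge_intervals(exon for tr in transcripts for exon in tr)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


def collapse_position(pos: int, gaps: List[Interval]) -> int:
    """Map a genomic position to the axis with the gaps removed.
    Positions inside a gap map to the point where the gap was cut."""
    removed = 0
    for start, end in gaps:
        if pos >= end:
            removed += end - start
        elif pos > start:
            removed += pos - start
    return pos - removed


# Two overlapping transcripts with different exon boundaries.
transcripts = [[(100, 200), (500, 600)], [(150, 250), (480, 600)]]
gaps = collapsible_gaps(transcripts)
print(gaps, collapse_position(480, gaps))
```

Because exonic positions from both transcripts pass through the same mapping, signal tracks and transcript structures stay aligned after the collapse.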

Advanced features and customization

A basic bedGraph track is potentially useful to display various transcriptomic and genomic signals, such as DNase-Seq or ChIP-Seq. However, it is often necessary to visually separate signals on the primary and the reverse complementary DNA strands. To this end, svist4get provides paired bedGraph tracks, which use a single Y-axis to plot signals from a given pair of bedGraph files in the positive and negative value ranges (Fig. 4). Figure 4 also demonstrates multiple highlighting by showcasing translated segments of the MATa locus transcripts.
Fig. 4

Ribo-Seq and RNA-Seq aggregated coverage of MAT locus in MATa yeast strain. Translated segments of transcripts are highlighted. Paired bedGraph tracks with custom colors are used to show coverage of two DNA strands separately. The data is taken from [16]
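
The idea of a paired bedGraph track sharing a single Y-axis can be sketched as follows: the signal of one strand is kept in the positive range and the signal of the other strand is mirrored into the negative range. The helper names below are hypothetical and the sketch works on per-base value lists rather than on bedGraph files; it is an illustration of the concept, not the svist4get implementation.

```python
from typing import List, Tuple


def paired_track(forward: List[float],
                 reverse: List[float]) -> Tuple[List[float], List[float]]:
    """Return (upper, lower) per-base values for one shared Y-axis:
    forward-strand signal is plotted upwards, reverse-strand downwards."""
    if len(forward) != len(reverse):
        raise ValueError("both strands must cover the same window")
    upper = [abs(value) for value in forward]
    lower = [-abs(value) for value in reverse]
    return upper, lower


def y_limits(upper: List[float], lower: List[float]) -> Tuple[float, float]:
    """Axis range that fits both halves of the paired track."""
    return (min(lower, default=0.0), max(upper, default=0.0))


upper, lower = paired_track([1.0, 0.0, 2.5], [0.5, 3.0, 0.0])
print(y_limits(upper, lower))
```

Taking absolute values makes the sketch tolerant of reverse-strand files that already store negative values, which is a common convention but an assumption here.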

The visualization of svist4get is highly customizable. Some essential options, such as custom track coloring, are available directly through the command-line interface. Other parameters, such as the color palette, bitmap DPI setting, font typeface, and page size, are defined in configuration files (see Additional file 2 for details). The package includes a default color palette and editable configuration files for generating figures to fit the one- and two-column layouts of an A4 page.

Conclusions

Data from high-throughput sequencing requires specialized visualization tools. Here, we present svist4get, which produces publication-quality images of signal tracks along transcript structure in arbitrary genomic windows. We believe svist4get provides a reasonable compromise between tools with advanced R APIs and user-friendly graphical interfaces and can be useful as a component of bioinformatics pipelines as well as a stand-alone tool for data exploration.

Availability and requirements

Project name: svist4get.
Project home page: https://bitbucket.org/artegorov/svist4get
Operating system(s): Linux.
Programming language: Python 3.
Other requirements: pypi packages (argparse, biopython, configs, reportlab, pybedtools, wand, wheel), ImageMagick (OS-level requirement for wand).
License: WTFPL http://www.wtfpl.net

Additional file 1: Svist4get source code and sample data. Python code and samples of yeast data used for generation of figures. (TGZ 5787 kb)
Additional file 2: Svist4get installation instructions and command-line examples. Installation instructions and console commands to reproduce the figures. (PDF 393 kb)

  22 in total

1.  Interactive visualization and analysis of large-scale sequencing datasets using ZENBU.

Authors:  Jessica Severin; Marina Lizio; Jayson Harshbarger; Hideya Kawaji; Carsten O Daub; Yoshihide Hayashizaki; Nicolas Bertin; Alistair R R Forrest
Journal:  Nat Biotechnol       Date:  2014-03       Impact factor: 54.908

2.  Biopython: freely available Python tools for computational molecular biology and bioinformatics.

Authors:  Peter J A Cock; Tiago Antao; Jeffrey T Chang; Brad A Chapman; Cymon J Cox; Andrew Dalke; Iddo Friedberg; Thomas Hamelryck; Frank Kauff; Bartek Wilczynski; Michiel J L de Hoon
Journal:  Bioinformatics       Date:  2009-03-20       Impact factor: 6.937

3.  GENCODE: the reference human genome annotation for The ENCODE Project.

Authors:  Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal:  Genome Res       Date:  2012-09       Impact factor: 9.043

4.  Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.

Authors:  Helga Thorvaldsdóttir; James T Robinson; Jill P Mesirov
Journal:  Brief Bioinform       Date:  2012-04-19       Impact factor: 11.622

5.  The UCSC Genome Browser database: update 2011.

Authors:  Pauline A Fujita; Brooke Rhead; Ann S Zweig; Angie S Hinrichs; Donna Karolchik; Melissa S Cline; Mary Goldman; Galt P Barber; Hiram Clawson; Antonio Coelho; Mark Diekhans; Timothy R Dreszer; Belinda M Giardine; Rachel A Harte; Jennifer Hillman-Jackson; Fan Hsu; Vanessa Kirkup; Robert M Kuhn; Katrina Learned; Chin H Li; Laurence R Meyer; Andy Pohl; Brian J Raney; Kate R Rosenbloom; Kayla E Smith; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2010-10-18       Impact factor: 16.971

6.  Pybedtools: a flexible Python library for manipulating genomic datasets and annotations.

Authors:  Ryan K Dale; Brent S Pedersen; Aaron R Quinlan
Journal:  Bioinformatics       Date:  2011-09-23       Impact factor: 6.937

7.  ggbio: an R package for extending the grammar of graphics for genomic data.

Authors:  Tengfei Yin; Dianne Cook; Michael Lawrence
Journal:  Genome Biol       Date:  2012-08-31       Impact factor: 13.583

8.  The UCSC Genome Browser database: extensions and updates 2013.

Authors:  Laurence R Meyer; Ann S Zweig; Angie S Hinrichs; Donna Karolchik; Robert M Kuhn; Matthew Wong; Cricket A Sloan; Kate R Rosenbloom; Greg Roe; Brooke Rhead; Brian J Raney; Andy Pohl; Venkat S Malladi; Chin H Li; Brian T Lee; Katrina Learned; Vanessa Kirkup; Fan Hsu; Steve Heitner; Rachel A Harte; Maximilian Haeussler; Luvina Guruvadoo; Mary Goldman; Belinda M Giardine; Pauline A Fujita; Timothy R Dreszer; Mark Diekhans; Melissa S Cline; Hiram Clawson; Galt P Barber; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2012-11-15       Impact factor: 16.971

9.  GWIPS-viz: development of a ribo-seq genome browser.

Authors:  Audrey M Michel; Gearoid Fox; Anmol M Kiran; Christof De Bo; Patrick B F O'Connor; Stephen M Heaphy; James P A Mullan; Claire A Donohue; Desmond G Higgins; Pavel V Baranov
Journal:  Nucleic Acids Res       Date:  2013-10-31       Impact factor: 16.971

10.  ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases.

Authors:  Li Shen; Ningyi Shao; Xiaochuan Liu; Eric Nestler
Journal:  BMC Genomics       Date:  2014-04-15       Impact factor: 3.969

