
VING: a software for visualization of deep sequencing signals.

Marc Descrimes, Yousra Ben Zouari, Maxime Wery, Rachel Legendre, Daniel Gautheret, Antonin Morillon.

Abstract

BACKGROUND: Next generation sequencing (NGS) data treatment often requires mapping sequenced reads onto a reference genome for further analysis. Mapped data are commonly visualized using genome browsers. However, such software is not suited to a publication-ready and versatile representation of NGS data coverage, especially when multiple experiments are treated simultaneously.
RESULTS: We developed 'VING', a stand-alone R script that takes as input NGS mapping files and genome annotations to produce accurate snapshots of the NGS coverage signal for any specified genomic region. VING offers multiple viewing options, including strand-specific views and a special heatmap mode for representing multiple experiments in a single figure.
CONCLUSIONS: VING produces high-quality figures for NGS data representation in a genome region of interest. It is available at http://vm-gb.curie.fr/ving/. We also developed a Galaxy wrapper, available in the Galaxy tool shed with installation and usage instructions.

Year:  2015        PMID: 26346985      PMCID: PMC4562374          DOI: 10.1186/s13104-015-1404-5

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


Findings

Background

NGS is now widely used to study all aspects of gene expression, from chromatin conformation (Hi-C) to protein-DNA binding (chromatin immunoprecipitation sequencing, ChIP-seq), transcription (native elongating transcript sequencing, NET-seq), RNA abundance (RNA-seq) and translation (ribosome profiling). A common step in most NGS approaches is the mapping of sequenced reads to a reference genome and analysis of the resulting signal. Multiple tools have been developed for quantitative analysis of NGS data. However, data visualization remains difficult because of the large quantity of information to display. Genome browsers such as Artemis [1], IGV [2] or GBrowse [3] enable rapid navigation along the genome and coverage visualization, but are suited neither to producing accurate, publication-quality images nor to displaying multiple libraries. Alternatively, combinations of software such as BEDtools [4] and R or Matlab functions can produce customized plots, but require programming skills. Likewise, the Gviz R package [5], which enables customized display of a variety of genome annotation tracks, including NGS data, requires mastering the R environment and R objects. Here, we describe ‘VING’, an R package dedicated to the custom visualization of NGS data that can be easily launched using a single Unix command line, or within the Galaxy environment. VING introduces functionalities to handle data produced by the most recent NGS protocols, in a strand-specific manner. The code is optimized to enable fast figure generation, even for the largest mapping files and genomes.

VING components

VING produces snapshots of genomic regions from any set of mapping and annotation files, using a single command line. VING combines loading of bam mapping files with optional user-provided normalization factors, loading of gff annotation file(s), and plotting of the signal and annotated genomic features.

Inputs

VING takes as input bam alignment files [6] and gff annotation files (a description of the gff format can be found at http://www.sanger.ac.uk/resources/software/gff/). VING loads bam files using the Bioconductor package “Rsamtools”. Single-end or paired-end data are allowed, and the library type can be specified as a parameter to assign reads to the proper strands. For paired-end data, each properly paired read is loaded as one single fragment. Users can also provide weights for normalization of each bam file. Annotation files are read by a custom function that only loads genomic features within coordinates defined by the users, which speeds up loading. Users can also select the features to display.
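The region-restricted annotation loading described above can be sketched as follows. This is an illustrative Python sketch, not VING's own R code: the `Feature` type, the `load_features` function and the toy records are hypothetical names introduced here, and the sketch assumes tab-separated, nine-column gff records with 1-based inclusive coordinates.

```python
from typing import Iterable, NamedTuple


class Feature(NamedTuple):
    seqid: str
    ftype: str
    start: int       # 1-based, inclusive (gff convention)
    end: int         # 1-based, inclusive
    strand: str
    attributes: str


def load_features(lines: Iterable[str], seqid: str, start: int, end: int,
                  types: frozenset = frozenset()) -> list:
    """Keep only features overlapping seqid:start-end, optionally filtered by type."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed nine-column record
        feat = Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], cols[8])
        if feat.seqid != seqid or feat.end < start or feat.start > end:
            continue  # other sequence, or no overlap with the requested region
        if types and feat.ftype not in types:
            continue  # feature type not selected for display
        kept.append(feat)
    return kept


# Hypothetical toy annotation, for illustration only.
gff = [
    "##gff-version 3",
    "chrI\tdemo\tgene\t100\t500\t.\t+\t.\tID=geneA",
    "chrI\tdemo\tgene\t900\t1200\t.\t-\t.\tID=geneB",
    "chrII\tdemo\tgene\t100\t500\t.\t+\t.\tID=geneC",
]
hits = load_features(gff, "chrI", 400, 950)
print([f.attributes for f in hits])
```

Discarding records outside the window while reading means that only the handful of features that will actually be drawn is kept in memory, which is the point of restricting the load to user-defined coordinates.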

Signal visualization

The coverage signal (number of reads covering each nucleotide) is only computed for the requested genome area. Users may provide optional normalization factors for weighting each signal. These factors should be computed independently, either based on library sizes (RPM normalization) or using a dedicated package such as DESeq [7] or edgeR [8]. The signal is plotted in a strand-specific manner using any of the three visualization modes: “classic” coverage plots using solid areas (each library in a distinct panel, Fig. 1a); “line” plots using lines of different colors and/or styles (one panel for all libraries, limited to 16 libraries, Fig. 1b, c); “heatmap” views based on a color code to reveal high/low-density coverage regions (one panel for each strand, libraries shown as lanes in each of the two panels, no limit on the number of samples, Fig. 1d, e). Output files can be produced in high-resolution (300 dpi) tiff, jpeg, png or pdf format.
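To make the coverage definition concrete, here is a minimal Python sketch of per-nucleotide, strand-specific coverage restricted to a requested region, with an optional weight. It is not VING's implementation: the function names, the `(start, end, strand)` read tuples and the RPM scaling of one million divided by the library size are assumptions made for illustration.

```python
def coverage(reads, start, end, strand, weight=1.0):
    """Per-nucleotide coverage of [start, end] (1-based, inclusive) on one strand.

    reads  : iterable of (read_start, read_end, read_strand) tuples
    weight : optional normalization factor applied to the raw counts
    """
    width = end - start + 1
    # Difference array: +1 where a read begins, -1 just after it ends.
    diff = [0] * (width + 1)
    for r_start, r_end, r_strand in reads:
        if r_strand != strand or r_end < start or r_start > end:
            continue  # wrong strand, or read entirely outside the region
        lo = max(r_start, start) - start   # clip the read to the region
        hi = min(r_end, end) - start
        diff[lo] += 1
        diff[hi + 1] -= 1
    signal, running = [], 0
    for delta in diff[:width]:
        running += delta                   # running sum gives the read depth
        signal.append(running * weight)
    return signal


def rpm_factor(library_size):
    """Assumed RPM-style weight: scale counts to reads per million mapped reads."""
    return 1_000_000 / library_size


# Hypothetical toy reads, for illustration only.
reads = [(1, 4, "+"), (3, 6, "+"), (5, 8, "-")]
plus = coverage(reads, 1, 6, "+")
minus = coverage(reads, 1, 6, "-")
scaled = coverage(reads, 1, 6, "+", weight=rpm_factor(2_000_000))
print(plus, minus, scaled)
```

Only reads overlapping the region contribute, so the cost scales with the size of the window and the number of reads in it rather than with the whole genome.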
Fig. 1

Examples of NGS signal visualization using VING. a Strand-specific “classic” visualization of 21–25-nucleotide small RNA densities along the SPAC167.03c locus in rdp1Δ Schizosaccharomyces pombe control cells (vector) or cells overexpressing Dcr1. Signal from each library is shown in a separate panel. Reads mapped on the + and − strands are shown above and below the 0 horizontal line, respectively (additional representation in different colors optional). Annotated genomic features are represented as “box” (ORF) and “line” (mRNA). Original data described in [9]. The Y axis (log2 tag densities) shows the log2 of the number of reads (or pairs of reads in case of paired-end sequencing) at each position. b Unstranded “line” visualization of the RNA Polymerase II ChIP-seq profile along the YDL140C (RPO21) locus in a wild-type strain of Saccharomyces cerevisiae. Signal intensity for each library is represented by a different colored line (IP, black; input, green). Strands are as in the “classic” view. Annotated ORFs are represented as “box”. Original data described in [10]. Y axis as in a. c Strand-specific “line” visualization of the NET-seq profile along the same region as in b in wild-type (black) and dst1Δ (red) cells of S. cerevisiae. Original data described in [11]. Y axis as in a. d Strand-specific “heatmap” visualization of the paired-end total RNA-seq signal along the YBR019C-YBR020W (GAL10-GAL1) locus in two biological replicates of S. cerevisiae wild-type cells grown in glucose-containing medium or shifted for 1 h to galactose-containing medium. Distinct panels are used for each strand. In each panel, each lane corresponds to one library. Signal intensities range from white (low) to dark blue (high). Annotated ORFs are represented as “box”. Original data described in [12]. e Strand-specific “heatmap” visualization of the paired-end total RNA-seq signal along the HOTAIR locus in MCF-7, HeLa-S3 and NHLF cell lines. Annotated transcripts and exons are represented as “arrow” and “rectangle”. Original data from the ENCODE project described in [13].


Annotation representation

Users can define a color and shape for each type of annotation feature (Fig. 1). Shapes include “box” (rectangle with an arrow at one side indicating the feature orientation), “rectangle” (plain rectangle), “arrow” (line with an arrow indicating the orientation) and “line” (straight line). VING automatically groups the different annotated features corresponding to the same ID such as untranslated regions (UTRs), exons and introns (or any other feature) from the same transcript, provided that these features are defined in the gff annotation file.
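The grouping of features that share an ID can be illustrated with a short Python sketch. It is not taken from VING: `parse_attributes` and `group_by_id` are hypothetical helpers, and the sketch assumes GFF3-style `key=value;key=value` attribute columns with the shared identifier stored under the `ID` key.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse a 'key=value;key=value' attribute column into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def group_by_id(features, key="ID"):
    """Group (type, start, end, attributes) records by a shared identifier.

    The attribute key used for grouping is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for ftype, start, end, column9 in features:
        ident = parse_attributes(column9).get(key)
        if ident is not None:
            groups[ident].append((ftype, start, end))
    return dict(groups)


# Hypothetical toy records, for illustration only.
features = [
    ("five_prime_UTR", 100, 150, "ID=tx1;Name=demo"),
    ("exon", 100, 400, "ID=tx1"),
    ("exon", 600, 900, "ID=tx2"),
]
grouped = group_by_id(features)
print(grouped)
```

Once features are grouped this way, all parts of one transcript can be drawn on the same row with the shape and color chosen for each feature type.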

Performance

VING was tested on a variety of NGS data from different species, including yeast small RNA-seq (Fig. 1a), ChIP-seq (Fig. 1b), NET-seq (Fig. 1c), total RNA-seq (Fig. 1d), and human total RNA-seq data (Fig. 1e). Execution time depends on the size of the input files. On an Intel Xeon 2.4 GHz processor with 32 GB RAM, runtime ranged from 5 s for the smaller datasets (such as for Fig. 1a) to 2 min for the larger ones (such as for Fig. 1d, e). Memory usage was under 500 megabytes in all cases.

Usage

VING can be operated as a single command line. For graphical interface operation, we wrote a Galaxy wrapper enabling the users to input all parameters through the user-friendly Galaxy interface (available in the Galaxy Tool Shed: https://testtoolshed.g2.bx.psu.edu/view/rlegendre/ving).

Conclusion

The VING program produces high-quality figures for NGS data representation in a genome region of interest. VING inputs and outputs have been rendered Galaxy-compatible so that automated coverage plots can be easily incorporated in Galaxy pipelines. The resulting integrated view of a genome region is immediately suitable for figure production.

Availability and requirements

Project name: VING. Project home page: http://vm-gb.curie.fr/ving/. Operating system(s): Linux. VING has also been successfully tested on MacOSX and Windows 7. Programming language: R. Other requirements: Bioconductor packages GenomicRanges and Rsamtools. License: GNU GPL (version 3, 29 June 2007). Any restrictions to use by non-academics: none.

Availability of supporting data

Original raw data used in Fig. 1a, c–e were retrieved from the NCBI Gene Expression Omnibus, accession numbers GSE52535, GSE25107, GSE63444 and GSE26284, respectively. Original raw data used in Fig. 1b were retrieved from the NCBI Sequence Read Archive, accession number SRA030505. Truncated bam and gff files used for figure generation are provided on the VING website.
References

1.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

2.  XUTs are a class of Xrn1-sensitive antisense regulatory non-coding RNA in yeast.

Authors:  E L van Dijk; C L Chen; Y d'Aubenton-Carafa; S Gourvennec; M Kwapisz; V Roche; C Bertrand; M Silvain; P Legoix-Né; S Loeillet; A Nicolas; C Thermes; A Morillon
Journal:  Nature       Date:  2011-06-22       Impact factor: 49.962

3.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

4.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

5.  Nascent transcript sequencing visualizes transcription at nucleotide resolution.

Authors:  L Stirling Churchman; Jonathan S Weissman
Journal:  Nature       Date:  2011-01-20       Impact factor: 49.962

6.  Integrative genomics viewer.

Authors:  James T Robinson; Helga Thorvaldsdóttir; Wendy Winckler; Mitchell Guttman; Eric S Lander; Gad Getz; Jill P Mesirov
Journal:  Nat Biotechnol       Date:  2011-01       Impact factor: 54.908

7.  Differential expression analysis for sequence count data.

Authors:  Simon Anders; Wolfgang Huber
Journal:  Genome Biol       Date:  2010-10-27       Impact factor: 13.583

8.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors:  Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal:  Bioinformatics       Date:  2009-11-11       Impact factor: 6.937

9.  Determinants of heterochromatic siRNA biogenesis and function.

Authors:  Ruby Yu; Gloria Jih; Nahid Iglesias; Danesh Moazed
Journal:  Mol Cell       Date:  2013-12-26       Impact factor: 17.970

10.  Landscape of transcription in human cells.

Authors:  Sarah Djebali; Carrie A Davis; Angelika Merkel; Alex Dobin; Timo Lassmann; Ali Mortazavi; Andrea Tanzer; Julien Lagarde; Wei Lin; Felix Schlesinger; Chenghai Xue; Georgi K Marinov; Jainab Khatun; Brian A Williams; Chris Zaleski; Joel Rozowsky; Maik Röder; Felix Kokocinski; Rehab F Abdelhamid; Tyler Alioto; Igor Antoshechkin; Michael T Baer; Nadav S Bar; Philippe Batut; Kimberly Bell; Ian Bell; Sudipto Chakrabortty; Xian Chen; Jacqueline Chrast; Joao Curado; Thomas Derrien; Jorg Drenkow; Erica Dumais; Jacqueline Dumais; Radha Duttagupta; Emilie Falconnet; Meagan Fastuca; Kata Fejes-Toth; Pedro Ferreira; Sylvain Foissac; Melissa J Fullwood; Hui Gao; David Gonzalez; Assaf Gordon; Harsha Gunawardena; Cedric Howald; Sonali Jha; Rory Johnson; Philipp Kapranov; Brandon King; Colin Kingswood; Oscar J Luo; Eddie Park; Kimberly Persaud; Jonathan B Preall; Paolo Ribeca; Brian Risk; Daniel Robyr; Michael Sammeth; Lorian Schaffer; Lei-Hoon See; Atif Shahab; Jorgen Skancke; Ana Maria Suzuki; Hazuki Takahashi; Hagen Tilgner; Diane Trout; Nathalie Walters; Huaien Wang; John Wrobel; Yanbao Yu; Xiaoan Ruan; Yoshihide Hayashizaki; Jennifer Harrow; Mark Gerstein; Tim Hubbard; Alexandre Reymond; Stylianos E Antonarakis; Gregory Hannon; Morgan C Giddings; Yijun Ruan; Barbara Wold; Piero Carninci; Roderic Guigó; Thomas R Gingeras
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962
