Literature DB >> 20624783

genoPlotR: comparative gene and genome visualization in R.

Lionel Guy1, Jens Roat Kultima, Siv G E Andersson.   

Abstract

UNLABELLED: The amount of gene and genome data obtained by next-generation sequencing technologies generates a need for comparative visualization tools. Complementing existing software for comparison and exploration of genomics data, genoPlotR automatically creates publication-grade linear maps of gene and genomes, in a highly automatic, flexible and reproducible way. AVAILABILITY: genoPlotR is a platform-independent R package, available with full source code under a GPL2 license at R-Forge: http://genoplotr.r-forge.r-project.org/.

Entities:  

Mesh:

Year:  2010        PMID: 20624783      PMCID: PMC2935412          DOI: 10.1093/bioinformatics/btq413

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Comparison of genes and genomes has increasingly been used to infer patterns and processes in evolution and to relate phenotypic differences to genomic changes. With the recent advent of modern high-throughput sequencing technologies, the need for methods and visualization tools in comparative genomics has vastly increased. As of 2010, more than 1000 bacterial and archaeal genomes are available in public databases, making the number of possible comparisons almost infinite. Several programs such as the Artemis Comparison Tool (ACT; Carver et al., 2005), UCSC Genome Browser (Rhead et al., 2010), Mauve (Darling et al., 2004), M-GCAT (Treangen et al., 2006) and Murasaki (Sakakibara et al., 2007) have extensive visualization functions. However, these tools lack the ability (i) to directly add annotations to the publication quality graphics they produce or (ii) to fully automate the production of comparative figures. GenomeGraphs (Durinck et al., 2009) is a R package, which allows the visualization of one genomic region with related datasets such as microarray data. It can display several annotations for the same region, but it cannot show several regions in a single plot. The R package genoPlotR is an attempt to fill in those gaps, by providing a flexible, automatable tool. It allows the user to graphically represent the comparison between several segments or subsegments of genomes in a linear fashion. It reads data stored in commonly used formats (EMBL, Genbank, BLAST and Mauve outputs) or in user-created tabular files and allows comparisons of one or several subsegments of a genome. A tree can be added to show the phylogenetic relationships between the segments, as can also scales and annotations to each subsegment. The use of R (R Development Core Team, 2009) and its grid package enables the use of its graphical power and flexibility to manipulate data and to integrate gene and genome maps into more complex graphics. The results can be saved either in high-quality raster or vector formats for further editing.

2 INPUT DATA

genoPlotR uses two main objects: dna_seg, which represent segments of DNA containing genes or other features, and comparison, which represent the relationships between two dna_seg. genoPlotR reads dna_seg objects from the widely used Genbank and EMBL formats (described here: http://www.ncbi.nlm.nih.gov/collab/FT/; Fig. 1A), or from tabular formats, either as protein table files (NCBI) or user-generated. Genbank or EMBL files can be generated by the user or downloaded from nucleotide databases, e.g. from NCBI's Nucleotide Entrez (http://www.ncbi.nlm.nih.gov/sites/entrez). By default, only CDS features are read, but any set of features can be specified for reading.
Fig. 1.

Examples of gene and genome comparisons with genoPlotR. The code used to generate this figure is available at http://genoplotr.r-forge.r-project.org/screenshots.php. (A) Example of minimal code to represent a three-way comparison in genoPlotR. (B) Comparison of four Bartonella genomes using the backbone output of Mauve. The features are colored according to their order along the second segment (BG), following a rainbow palette. (C) BLAST comparison of genes in several subsegments of the same four Bartonella genomes. Note the scale of the first subsegment on the top row, which is in reverse orientation. (D) Comparison of a 220 kb segment of the Y chromosome in Homo sapiens and Pan troglodytes. In (C and D), the E-value of the BLAST alignment is represented with shades of blue and red. All datasets used here are present in the package.

Examples of gene and genome comparisons with genoPlotR. The code used to generate this figure is available at http://genoplotr.r-forge.r-project.org/screenshots.php. (A) Example of minimal code to represent a three-way comparison in genoPlotR. (B) Comparison of four Bartonella genomes using the backbone output of Mauve. The features are colored according to their order along the second segment (BG), following a rainbow palette. (C) BLAST comparison of genes in several subsegments of the same four Bartonella genomes. Note the scale of the first subsegment on the top row, which is in reverse orientation. (D) Comparison of a 220 kb segment of the Y chromosome in Homo sapiens and Pan troglodytes. In (C and D), the E-value of the BLAST alignment is represented with shades of blue and red. All datasets used here are present in the package. genoPlotR reads comparison objects using the tabular output of, e.g. BLAST (Altschul et al., 1990) or from user-generated tabular files (Fig. 1A). Hit tables in text format produced by stand-alone or online BLAST programs are suitable for input in genoPlotR. For example, from the NCBI BLAST web page, sequence alignments of genes and genomes can be produced with the option ‘align two or more sequences’. The hit table can then be downloaded. The backbone file produced by Mauve, a multiple-genome alignment tool (Darling et al., 2004), can be transformed into both dna_seg and comparison objects (Fig. 1B). Both dna_seg and comparison objects can be filtered either by using arguments to the reading functions, or by using R functions, for example, to remove short genes or low-significance comparisons. Objects can be modified to specify color, size or appearance (arrows, blocks, lines, etc.) for each element of the DNA segments and of the comparisons. Intron-containing genes in EMBL and Genbank files are automatically recognized (Fig. 1D). In addition, a phylogenetic tree in Newick format can be parsed as a phylog object, using the package ade4 (Dray et al., 2007; Fig. 1A). Finally, annotation objects can be designed to add a legend to the DNA segments.

3 VISUALIZATION

After being read and modified, objects can be passed to the main graphical function plot_gene_map. The user can define, for each segment, several subsegments that will be represented in the plot. These subsegments can be represented in reverse orientation (e.g. see Fig. 1B, first subsegment on the top DNA segment). The placement of each subsegment on the plot is either automatically determined by minimizing the area of the comparisons, or fixed by the user. annotation and phylog objects are also passed to the main function plot_gene_map, and a scale can be added to any of the DNA segments, or placed at the bottom right of the plot. Colors of the comparisons can be set to gray scales or to shades of blue and red, depending on the e-value or the bit score of the hit, as in ACT. In addition to DNA segments and comparisons, other objects can be added to the plot, such as a phylogenetic tree—parsed into R using the package ade4 (Dray et al., 2007)—or annotations for each DNA segment. Using R embedded graphic generation functions, the figure can then be displayed on the screen or saved to various graphical formats, including both raster (functions png, jpeg, tiff, etc.) and vector formats (functions postscript, svg, pdf, etc.). The output of the graphical function can also be embedded in larger plots, allowing the user to create multi-panel figures. Numerous examples and datasets, including the ones in Figure 1, are included in the package, to guide the users' first steps.

4 CONCLUSIONS

By using the graphical power and flexibility of R, the package genoPlotR generates reproducible maps of genes and genomes that can be used to generate publication-ready figures, starting from a wide range of formats. Since all the instructions for drawing the figures are contained in R code, it is highly flexible, and thus it is straightforward to automate the process of drawing very similar figures for different datasets. The use of a scripting language makes it particularly suitable for integration into annotation and comparative genomics pipelines.
  7 in total

1.  Mauve: multiple alignment of conserved genomic sequence with rearrangements.

Authors:  Aaron C E Darling; Bob Mau; Frederick R Blattner; Nicole T Perna
Journal:  Genome Res       Date:  2004-07       Impact factor: 9.043

2.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

3.  ACT: the Artemis Comparison Tool.

Authors:  Tim J Carver; Kim M Rutherford; Matthew Berriman; Marie-Adele Rajandream; Barclay G Barrell; Julian Parkhill
Journal:  Bioinformatics       Date:  2005-06-23       Impact factor: 6.937

Review 4.  [Development of a large-scale comparative genome system and its application to the analysis of mycobacteria genomes].

Authors:  Yasubumi Sakakibara; Yasunori Osana; Kris Popendorf
Journal:  Nihon Hansenbyo Gakkai Zasshi       Date:  2007-09

5.  GenomeGraphs: integrated genomic data visualization with R.

Authors:  Steffen Durinck; James Bullard; Paul T Spellman; Sandrine Dudoit
Journal:  BMC Bioinformatics       Date:  2009-01-06       Impact factor: 3.169

6.  M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species.

Authors:  Todd J Treangen; Xavier Messeguer
Journal:  BMC Bioinformatics       Date:  2006-10-05       Impact factor: 3.169

7.  The UCSC Genome Browser database: update 2010.

Authors:  Brooke Rhead; Donna Karolchik; Robert M Kuhn; Angie S Hinrichs; Ann S Zweig; Pauline A Fujita; Mark Diekhans; Kayla E Smith; Kate R Rosenbloom; Brian J Raney; Andy Pohl; Michael Pheasant; Laurence R Meyer; Katrina Learned; Fan Hsu; Jennifer Hillman-Jackson; Rachel A Harte; Belinda Giardine; Timothy R Dreszer; Hiram Clawson; Galt P Barber; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2009-11-11       Impact factor: 16.971

  7 in total
  254 in total

1.  chromDraw: an R package for visualization of linear and circular karyotypes.

Authors:  Jan Janečka; Martin A Lysak
Journal:  Chromosome Res       Date:  2016-01-20       Impact factor: 5.239

2.  CrusView: a Java-based visualization platform for comparative genomics analyses in Brassicaceae species.

Authors:  Hao Chen; Xiangfeng Wang
Journal:  Plant Physiol       Date:  2013-07-29       Impact factor: 8.340

3.  Implications of Genome-Based Discrimination between Clostridium botulinum Group I and Clostridium sporogenes Strains for Bacterial Taxonomy.

Authors:  Michael R Weigand; Angela Pena-Gonzalez; Timothy B Shirey; Robin G Broeker; Maliha K Ishaq; Konstantinos T Konstantinidis; Brian H Raphael
Journal:  Appl Environ Microbiol       Date:  2015-06-05       Impact factor: 4.792

4.  Genome-Wide Transcription Factor Binding in Leaves from C3 and C4 Grasses.

Authors:  Steven J Burgess; Ivan Reyna-Llorens; Sean R Stevenson; Pallavi Singh; Katja Jaeger; Julian M Hibberd
Journal:  Plant Cell       Date:  2019-08-19       Impact factor: 11.277

5.  A fungal wheat pathogen evolved host specialization by extensive chromosomal rearrangements.

Authors:  Fanny E Hartmann; Andrea Sánchez-Vallet; Bruce A McDonald; Daniel Croll
Journal:  ISME J       Date:  2017-01-24       Impact factor: 10.302

6.  Genomic Diversity of Phages Infecting Probiotic Strains of Lactobacillus paracasei.

Authors:  Diego J Mercanti; Geneviève M Rousseau; María L Capra; Andrea Quiberoni; Denise M Tremblay; Simon J Labrie; Sylvain Moineau
Journal:  Appl Environ Microbiol       Date:  2015-10-16       Impact factor: 4.792

7.  Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads.

Authors:  Jiaqiang Dong; Yaping Feng; Dibyendu Kumar; Wei Zhang; Tingting Zhu; Ming-Cheng Luo; Joachim Messing
Journal:  Proc Natl Acad Sci U S A       Date:  2016-06-27       Impact factor: 11.205

8.  Cultivable, Host-Specific Bacteroidetes Symbionts Exhibit Diverse Polysaccharolytic Strategies.

Authors:  Arturo Vera-Ponce de León; Benjamin C Jahnes; Jun Duan; Lennel A Camuy-Vélez; Zakee L Sabree
Journal:  Appl Environ Microbiol       Date:  2020-04-01       Impact factor: 4.792

9.  Intra-species differences in population size shape life history and genome evolution.

Authors:  David Willemsen; Rongfeng Cui; Martin Reichard; Dario Riccardo Valenzano
Journal:  Elife       Date:  2020-09-01       Impact factor: 8.140

10.  Ubiquitous marine bacterium inhibits diatom cell division.

Authors:  Helena M van Tol; Shady A Amin; E Virginia Armbrust
Journal:  ISME J       Date:  2016-09-13       Impact factor: 10.302

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.