Circleator: flexible circular visualization of genome-associated data with BioPerl and SVG.

Jonathan Crabtree, Sonia Agrawal, Anup Mahurkar, Garry S Myers, David A Rasko, Owen White.

Abstract

SUMMARY: Circleator is a Perl application that generates circular figures of genome-associated data. It leverages BioPerl to support standard annotation and sequence file formats and produces publication-quality SVG output. It is designed to be both flexible and easy to use. It includes a library of circular track types and predefined configuration files for common use-cases, including: (i) visualizing gene annotation and DNA sequence data from a GenBank flat file, (ii) displaying patterns of gene conservation in related microbial strains, (iii) showing Single Nucleotide Polymorphisms (SNPs) and indels relative to a reference genome and gene set and (iv) viewing RNA-Seq plots.
AVAILABILITY AND IMPLEMENTATION: Circleator is freely available under the Artistic License 2.0 from http://jonathancrabtree.github.io/Circleator/ and is integrated with the CloVR cloud-based sequence analysis Virtual Machine (VM), which can be downloaded from http://clovr.org or run on Amazon EC2.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 25075113      PMCID: PMC4201160          DOI: 10.1093/bioinformatics/btu505

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

There are numerous circular genome visualization tools, with varying degrees of interactivity, usability and utility. They differ in the data types and formats they accept, the types of output they produce and the ease of customizing the mapping from input data to graphical display. Flexibility and ease-of-use are frequently at odds, with the most flexible tools often being the hardest to customize, particularly for non-programmers. To address this issue, Circleator provides multiple configuration options, allowing each user to choose the one that best suits his/her needs.
Similar tools include GenomePlot (Gibson and Smith, 2003), a Perl/Tk application, and DNAPlotter (Carver et al.), a Java application, both of which have graphical user interfaces and also support linear displays. GenoMap (Sato and Ehira, 2003) is a Tcl/Tk microarray data viewer, and the GeneWiz browser (Hallin et al.) is an interactive Java applet. Some combine analysis and visualization: BRIG (Alikhan et al.) incorporates a BLAST-based prokaryotic genome comparison algorithm, and CGView Server (Grant and Stothard, 2008) is a CGView-based (Stothard and Wishart, 2005) web service that runs on-the-fly BLAST comparisons. At the other extreme, some tools display only predefined datasets: the Microbial Genome Viewer (Kerkhoven et al.) and Genome Projector (Arakawa et al.) are web-based tools in this category. GenomeDiagram (Pritchard et al.) supports linear and circular displays, but requires Python programming expertise. Circos (Krzywinski et al.) is a popular and powerful stand-alone tool, but its use of complex hierarchical configuration files may put it out of reach of some users. The D3.js toolkit (Bostock et al.) has a Circos-like ‘chord diagram’, but does not, to our knowledge, accept standard bioinformatic file formats. Circster (Goecks et al.), which uses D3.js, adds circular drawing capabilities to Galaxy.
Circleator follows the computer design principle of making the common case fast: if a researcher has data in a standard format and needs a routine visualization, then he/she should not have to reformat the data or experiment with parameter values. Conversely, with sufficient time, it should be possible to create novel and intricately detailed figures. Circleator users who want a routine visualization can choose a predefined configuration file, whereas those seeking more flexibility can write their own. Circleator’s configuration file format supports several novel high-level abstractions (e.g. loops, symbolic track references, feature-based coordinate scaling) and reuses existing standards, e.g. SVG and CSS (Cascading Style Sheets), where possible.

2 IMPLEMENTATION

Circleator is a stand-alone Perl application that has also been incorporated into CloVR (Angiuoli et al.). It uses BioPerl (Stajich et al.) for internal data representation and produces SVG (Dahlström et al.) output, from which PDF, PNG and JPEG may be generated.
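To make the idea of drawing a genomic feature directly as SVG more concrete, the following is a minimal sketch in Python (used here only for illustration; Circleator itself is written in Perl). It maps a feature's coordinates to an annular wedge and returns an SVG path string. The function names, the 1000 x 1000 canvas and the convention of placing position 0 at 12 o'clock are assumptions made for this example, not Circleator's actual code or defaults.

```python
import math


def polar_to_xy(cx, cy, radius, fraction):
    """Map a fraction of the circle (0 = 12 o'clock, increasing clockwise)
    to SVG x,y coordinates. SVG's y axis points downward."""
    angle = 2.0 * math.pi * fraction
    return cx + radius * math.sin(angle), cy - radius * math.cos(angle)


def feature_arc_path(start, end, genome_len, r_inner, r_outer, cx=500.0, cy=500.0):
    """Return the 'd' attribute of an SVG <path> covering [start, end).

    Hypothetical helper: assumes 0 <= start < end < genome_len, i.e. the
    feature does not span the whole circle (a full ring would need two arcs).
    """
    f0, f1 = start / genome_len, end / genome_len
    large = 1 if (f1 - f0) > 0.5 else 0  # SVG large-arc flag
    x0o, y0o = polar_to_xy(cx, cy, r_outer, f0)
    x1o, y1o = polar_to_xy(cx, cy, r_outer, f1)
    x1i, y1i = polar_to_xy(cx, cy, r_inner, f1)
    x0i, y0i = polar_to_xy(cx, cy, r_inner, f0)
    return (
        f"M {x0o:.2f} {y0o:.2f} "
        f"A {r_outer} {r_outer} 0 {large} 1 {x1o:.2f} {y1o:.2f} "  # outer arc, clockwise
        f"L {x1i:.2f} {y1i:.2f} "
        f"A {r_inner} {r_inner} 0 {large} 0 {x0i:.2f} {y0i:.2f} Z"  # inner arc, back
    )


def render_svg(features, genome_len, r_inner=300, r_outer=350):
    """Wrap one wedge per (start, end) feature in a standalone SVG document."""
    paths = "\n".join(
        f'  <path d="{feature_arc_path(s, e, genome_len, r_inner, r_outer)}" '
        f'fill="steelblue" fill-opacity="0.6"/>'
        for s, e in features
    )
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="1000" height="1000">\n'
        f"{paths}\n</svg>"
    )
```

Calling `render_svg([(0, 250), (400, 900)], 1000)` yields a small SVG document with two semitransparent wedges; any SVG-capable viewer or converter can then turn it into PDF or PNG.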

2.1 Input data

Circleator accepts reference sequence(s) and annotation in any BioPerl-supported format, including GenBank format; Sequence Alignment/Map (SAM) and BGZF-compressed SAM (BAM) alignment files; output from Cufflinks (Trapnell et al.), Tandem Repeats Finder (TRF) (Benson, 1999) and the BLAST Score Ratio (Rasko et al.) utility; SNPs in Variant Call Format (VCF); and tab-delimited quantitative data, such as gene expression data.

2.2 Features

Circleator outputs SVG natively, rather than using a graphics library that supports only a subset of SVG. It can draw text along circular paths and display semitransparent and overlapping tracks. The scale may vary around the circle, as in Circos, but also along the radius of the circle (i.e. a single figure may combine both global context and local detail, as in Fig. 1B). Regions to scale may be selected with a user-defined filter, e.g. to magnify by 100× all SNP loci at which more than half of the genomes differ from the reference without having to explicitly list the relevant coordinate spans.
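One way to picture the scaling around the circle described above is as a piecewise-linear mapping in which every base inside a selected region counts for more angular space than a base outside it. The sketch below, in Python, is only an illustration under that assumption; the function name `build_scaled_mapper` and its interface are hypothetical and are not taken from Circleator.

```python
def build_scaled_mapper(genome_len, regions, factor):
    """Return a function mapping a genome position to a fraction of the circle.

    regions: non-overlapping, half-open (start, end) intervals to magnify.
    factor:  weight given to each base inside a region (bases outside weigh 1).
    Illustrative assumption: scaling is piecewise linear in the position.
    """
    segments = []  # (seg_start, seg_end, cumulative_weight_at_start, weight_per_base)
    pos, cum = 0, 0.0
    for start, end in sorted(regions):
        if start < pos or end > genome_len or end < start:
            raise ValueError("regions must be ordered, non-overlapping and in range")
        if start > pos:  # unscaled stretch before this region
            segments.append((pos, start, cum, 1.0))
            cum += start - pos
        segments.append((start, end, cum, float(factor)))
        cum += (end - start) * factor
        pos = end
    if pos < genome_len:  # unscaled tail
        segments.append((pos, genome_len, cum, 1.0))
        cum += genome_len - pos
    total = cum

    def to_fraction(p):
        for seg_start, seg_end, seg_cum, weight in segments:
            if seg_start <= p <= seg_end:
                return (seg_cum + (p - seg_start) * weight) / total
        raise ValueError(f"position {p} is outside 0..{genome_len}")

    return to_fraction
```

For example, with a 1000 bp sequence and a single 10 bp region magnified 100×, `build_scaled_mapper(1000, [(100, 110)], 100)` assigns that region roughly half of the circle (1000 of 1990 weight units), while the remaining 990 bp share the rest. In Circleator the regions would come from a user-defined filter rather than an explicit list, as the text notes.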
Fig. 1.

(A) The genome of Gardnerella vaginalis HMP9231 annotated with percent GC content (red), genes, GC-skew (green) and read coverage (blue) from five human metagenomic samples. (B) SNPs from an 80-genome Yersinia pestis SNP panel with the scale in the outer rings expanded to show the affected bases. The reference base and position are shown on the outside and SNPs are color-coded according to their predicted type. Additional details for these figures and others may be found in the supplementary information.

2.3 Configuration

Each line in the manually editable Circleator configuration file corresponds to a circular track. The configuration file supports loops, which allow the same set of tracks to be displayed for 80 genomes in a SNP comparison without repeating everything 80 times; pseudo-tracks, which do not appear in the figure but can load data or perform data transformations (e.g. the compute-deserts track, which identifies all regions of a specified length that do not contain any features of a specified type); track references, which allow tracks to reference each other by name, e.g. highlight each of the SNP deserts identified in track SD1 in red; and various feature filters, e.g. to draw only forward-strand genes whose gene product field contains the keyword ‘kinase’. Circleator supports the following configuration options, listed in order of increasing flexibility and decreasing ease-of-use: (i) reuse a predefined configuration file as is; (ii) customize a predefined configuration file; (iii) write a new configuration file using the predefined track types and (iv) define new track types, glyphs and/or filters.
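The two data operations mentioned above, finding feature "deserts" and filtering features by strand and keyword, can be sketched in a few lines of Python. This is an illustration of the underlying logic only: the `Feature` record, the function names and the decision to treat the sequence as linear (ignoring deserts that wrap around the origin) are assumptions for this example and do not reproduce Circleator's configuration syntax or implementation.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """Hypothetical minimal feature record (0-based, half-open coordinates)."""
    start: int
    end: int
    strand: int      # 1 = forward, -1 = reverse
    kind: str        # e.g. "gene" or "SNP"
    product: str = ""


def find_deserts(features: List[Feature], seq_len: int,
                 feature_kind: str, min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) gaps of at least min_len bases that contain no
    feature of the given kind. Treats the sequence as linear for simplicity."""
    spans = sorted((f.start, f.end) for f in features if f.kind == feature_kind)
    deserts = []
    cursor = 0  # end of the right-most feature seen so far
    for start, end in spans:
        if start - cursor >= min_len:
            deserts.append((cursor, start))
        cursor = max(cursor, end)
    if seq_len - cursor >= min_len:
        deserts.append((cursor, seq_len))
    return deserts


def forward_genes_with_keyword(features: List[Feature], keyword: str) -> List[Feature]:
    """Keep forward-strand genes whose product contains the keyword
    (case-insensitive)."""
    needle = keyword.lower()
    return [f for f in features
            if f.kind == "gene" and f.strand == 1 and needle in f.product.lower()]
```

A track reference such as "highlight the deserts identified in track SD1" then amounts to passing the list returned by `find_deserts` to whatever draws the highlight, instead of recomputing or hand-listing the coordinates.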

2.4 Documentation and test suite

The predefined configuration files and track types are well documented, and the HTML track documentation is automatically generated from the same Circleator configuration file that defines them. Circleator also has a set of regression tests that help to verify the correctness of the images it produces.

3 CONCLUSIONS

Circleator is a visualization tool that leverages BioPerl and SVG to produce publication-ready circular figures of genome-associated data. It is highly configurable but includes predefined configuration files and a library of well-documented circular track types, which together allow users to create complex figures without programming expertise.

References (10 of 18 shown)

1.  D³: Data-Driven Documents.

Authors:  Michael Bostock; Vadim Ogievetsky; Jeffrey Heer
Journal:  IEEE Trans Vis Comput Graph       Date:  2011-12       Impact factor: 4.579

2.  GenomeDiagram: a python package for the visualization of large-scale genomic data.

Authors:  Leighton Pritchard; Jennifer A White; Paul R J Birch; Ian K Toth
Journal:  Bioinformatics       Date:  2005-12-23       Impact factor: 6.937

3.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

4.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15       Impact factor: 16.971

5.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

Authors:  Cole Trapnell; Brian A Williams; Geo Pertea; Ali Mortazavi; Gordon Kwan; Marijke J van Baren; Steven L Salzberg; Barbara J Wold; Lior Pachter
Journal:  Nat Biotechnol       Date:  2010-05-02       Impact factor: 54.908

6.  GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes.

Authors:  Peter F Hallin; Hans-Henrik Stærfeldt; Eva Rotenberg; Tim T Binnewies; Craig J Benham; David W Ussery
Journal:  Stand Genomic Sci       Date:  2009-09-25

7.  CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

Authors:  Samuel V Angiuoli; Malcolm Matalka; Aaron Gussman; Kevin Galens; Mahesh Vangala; David R Riley; Cesar Arze; James R White; Owen White; W Florian Fricke
Journal:  BMC Bioinformatics       Date:  2011-08-30       Impact factor: 3.307

8.  Web-based visual analysis for high-throughput genomics.

Authors:  Jeremy Goecks; Carl Eberhard; Tomithy Too; Anton Nekrutenko; James Taylor
Journal:  BMC Genomics       Date:  2013-06-13       Impact factor: 3.969

9.  Genome Projector: zoomable genome map with multiple views.

Authors:  Kazuharu Arakawa; Satoshi Tamaki; Nobuaki Kono; Nobuhiro Kido; Keita Ikegami; Ryu Ogawa; Masaru Tomita
Journal:  BMC Bioinformatics       Date:  2009-01-23       Impact factor: 3.169

10.  DNAPlotter: circular and linear interactive genome visualization.

Authors:  Tim Carver; Nick Thomson; Alan Bleasby; Matthew Berriman; Julian Parkhill
Journal:  Bioinformatics       Date:  2008-11-05       Impact factor: 6.937
