VGSC: A Web-Based Vector Graph Toolkit of Genome Synteny and Collinearity.

Yiqing Xu, Changwei Bi, Guoxin Wu, Suyun Wei, Xiaogang Dai, Tongming Yin, Ning Ye.

Abstract

BACKGROUND: Synteny and collinearity analysis is a frequent task in comparative genomics research, used to understand the colocalization of genetic loci among species. However, many analysis software packages are not effective in visualizing results. Problems include a lack of graphical visualization, overly simple representation, and inextensible output formats. Moreover, higher-throughput sequencing technology requires higher-resolution image output. IMPLEMENTATION: To fill this gap, this paper presents VGSC, the Vector Graph toolkit of genome Synteny and Collinearity, and its online service, which visualize synteny and collinearity in common graphical formats, including both raster (JPEG, Bitmap, and PNG) and vector graphics (SVG, EPS, and PDF). RESULT: Users can upload sequence alignments from BLAST and collinearity relationships from synteny analysis tools. The website then generates vector or raster graphical results automatically. We also provide a Java-based executable for command-line use.

Year:  2016        PMID: 27006949      PMCID: PMC4783527          DOI: 10.1155/2016/7823429

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

Synteny refers to a collection of contiguous genes that is shared by the chromosomes of different species. Collinearity is a particular kind of synteny in which the genes are conserved in the same order [1]. Understanding this colocalization of genetic loci among species is a frequent task in comparative genomics research, and it often relies on the accuracy of homology identification within or across genomes. Over the course of evolution, the eukaryotic genomes of different species retain synteny and collinearity to varying degrees [2]. Structural variation has many causes over this long evolutionary history, such as whole-genome duplication (WGD), segmental duplication, inversions, and translocations [3, 4], so genomes have been shaped and restructured dynamically. Related applications [5] include annotation of newly sequenced genomes [6], identification of conserved noncoding sequences [7], estimation of whole-genome duplication events [1], and prediction of chromosomal rearrangements and the structure of ancestral genomes [8]. As a result, synteny and collinearity analysis has become a standard step in evolutionary biology for elucidating the evolutionary histories of both genomes and gene families.

To meet the requirements of synteny and collinearity analysis, most software focuses on detection and alignment starting from the original sequencing data. Several tools cluster neighboring matches of gene pairs, including ADHoRe [9], the Max-gap Clusters by Multiple Sequence Comparison (MCMuSeC) [10], and OrthoCluster [11, 12]. More recent methods apply dynamic programming to chain pairwise collinear genes: a scoring scheme rewards adjacent collinear gene pairs, known as anchor genes, and penalizes the distance between anchor genes. This method has been implemented in software tools such as ColinearScan [13], MCScan [1], SyMAP [6], FISH [14], and CYNTENATOR [15].

Besides the pairwise collinear relationships among chromosomal regions, the multialignment (alignment of three or more regions) of collinear chromosomal regions (referred to as collinear blocks) is more important, as it can reveal ancient WGD events [1] and complex chromosomal duplication/rearrangement relationships [16]. One of the early software packages providing analysis of collinearity within gene families is MicroSyn [17]. MCScan [1], Multiple Collinearity Scan, is another very popular algorithm for synteny and collinearity detection. It scans multiple genomes or subgenomes, identifies putative homologous chromosomal regions, and marks these gene regions with alignment anchors. The latest i-ADHoRe 3.0 [18] combines pairwise comparison with an iterative profile search, and it uses rigorous statistical tests to ensure that the regions found are significant. All these software packages focus on data processing rather than downstream analysis, and many of them do not provide graphical output at all.

Another class of synteny and collinearity tools works with general-purpose genome browsers, software that allows the user to view genome annotations in the context of a reference sequence. Most of them use vector graphics to enable scrolling and zooming through arbitrary regions of a genome. GBrowse-syn [19] is a plugin of GBrowse 2.0 [20, 21], one of the most powerful web-based applications for visualizing genomic data. It allows the comparison of collinear regions of multiple genomes on a GBrowse-styled web page, in which synteny and collinearity are displayed as a traditional connection diagram. General-purpose software packages of this kind, however, provide only very basic drawings, as they are not designed to meet the advanced visualization requirements of synteny and collinearity representation.
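The anchor-chaining idea mentioned earlier in this section (reward adjacent collinear anchor pairs, penalize the distance between them) can be illustrated with a small dynamic-programming sketch. This is a simplified illustration and not the actual algorithm of ColinearScan, MCScan, or any other tool cited here; the constants `MATCH_SCORE`, `GAP_PENALTY`, and `MAX_GAP` and the function name `best_collinear_chain` are assumptions chosen for the example, and only forward (same-orientation) chains are considered.

```python
from typing import List, Tuple

MATCH_SCORE = 50   # assumed reward for each anchor gene pair in a chain
GAP_PENALTY = 1    # assumed penalty per intervening gene between anchors
MAX_GAP = 25       # assumed largest distance (in genes) allowed between anchors


def best_collinear_chain(
    anchors: List[Tuple[int, int]]
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (score, chain) for the highest-scoring forward collinear chain.

    Each anchor is (gene order index on genome A, gene order index on genome B).
    """
    if not anchors:
        return 0, []
    pts = sorted(anchors)
    score = [MATCH_SCORE] * len(pts)   # best chain score ending at each anchor
    prev = [-1] * len(pts)             # back-pointer to the previous anchor
    for j, (xj, yj) in enumerate(pts):
        for i in range(j):
            xi, yi = pts[i]
            dx, dy = xj - xi, yj - yi
            # A predecessor must lie strictly before anchor j on both genomes
            # and must not be further away than MAX_GAP.
            if dx <= 0 or dy <= 0 or max(dx, dy) > MAX_GAP:
                continue
            gap = max(dx, dy) - 1
            candidate = score[i] + MATCH_SCORE - GAP_PENALTY * gap
            if candidate > score[j]:
                score[j], prev[j] = candidate, i
    # Trace back from the best-scoring end point.
    k = max(range(len(pts)), key=score.__getitem__)
    best = score[k]
    chain = []
    while k != -1:
        chain.append(pts[k])
        k = prev[k]
    return best, chain[::-1]
```

With the anchors `[(1, 1), (2, 2), (3, 3), (10, 4), (4, 50)]`, the sketch keeps the four anchors that rise together on both genomes and leaves out `(4, 50)`, whose distance on genome B exceeds the assumed cutoff.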
As synteny and collinearity visualization becomes increasingly important, many dedicated software programs have been developed recently. Most of these programs, such as SynChro [5], GSV [22], and Easyfig [23], however, inherit the linear tradition in this area, plotting synteny and collinearity relationships as lines and bars. A typical output style uses two bars for the chromosomes and lines for the colocational relationships. While a web-based interface makes it easy and convenient to generate a linear plot, the output is difficult to use in research reporting, especially in paper pipelines. MCScanX [24], the extension package of MCScan, implements 15 utility programs for display and analysis. However, MCScanX provides only a command-line plotter with PNG output. Similarly, i-ADHoRe 3.0 [18] extends ADHoRe [9] and provides a package that draws dot plots as SVG vector graphics and PNG raster images.

Circos [25] is a well-known visualization tool that uses a circular ideogram layout to facilitate the identification and analysis of similarities and differences found in genome comparisons. Raster or vector images can be created from GFF-style data inputs and hierarchical configuration files, both of which are popular in bioinformatics research, making Circos suitable for rapid reporting pipelines. A typical case is C-Sibelia [26], which focuses on synteny and collinearity analysis and outputs Circos-formatted files for plotting. Many recent genetic research reports in Nature and Science have used Circos-style figures, but Circos provides only circular plots.

Many online platforms for genome evolution are dedicated to synteny and collinearity analysis, and more and more researchers use the visualization services of these platforms in their research. Since the cost of calculation grows exponentially with the amount of data, particularly during analysis, most of these platforms provide dot or linear plots because they are much simpler and faster to produce. Examples of such platforms include Plant Genome Duplication DataBase [27], MIPS CrowsNest [28], and Yeast Gene Order Browser [29]. Only very few platforms, such as the well-known Ensembl [30, 31], can generate complex plots such as circular and multialignment plots. In plant comparative genomics, PLAZA 3.0 is one of the most powerful all-in-one solutions. It has collected a large quantity of data and developed a full set of utilities to support research from analysis to visualization [32]. Yet none of these platforms provides full support for vector graphic output. The gap for multistyled vector-based plots of synteny and collinearity remains to be filled.

In summary, synteny and collinearity analysis is a frequent task in comparative genomics research. Many analysis software packages are available, but they are not effective in visualizing results, as shown in Table 1. The problems include a lack of graphical visualization, overly simple representation, and inextensible output formats. On the other hand, general-purpose visualization tools are powerful but not specific to synteny and collinearity display. The need grows rapidly as higher-throughput datasets call for higher-resolution output.

Table 1

Software list for synteny and collinearity visualization.

Software name    Publishing year    Graphical synteny    Visualization types          Vector graphics
MCMuSeC          2009
OrthoCluster     2009                                    Dual bar, linear
i-ADHoRe 3       2011                                    Dotted, hierarchy
FISH             2003
ColinearScan     2006
MCScan           2008
CYNTENATOR       2010
SyMAP            2011                                    Dual bar
MCScanX          2012                                    Dotted, linear, circular
SynChro          2014                                    Dual bar
GSV              2011                                    Dual bar
Easyfig          2011                                    Dual bar
C-Sibelia        2013                                    Circos-format
GBrowse-syn      2010                                    Dual bar
KEGG             2000                                    Network
Circos           2009                                    Circular
WebLogo          2004                                    Textual
VGSC             2015                                    Dotted, linear, circular

In this paper, we introduce VGSC, a purpose-built toolkit for visualizing synteny and collinearity in general graphical formats, including both raster (JPEG, Bitmap, and PNG) and vector graphics (SVG, EPS, and PDF). Vector graphics are a computational representation of graphical objects using vectors, geometric objects with a magnitude and a direction. Vector graphics are therefore normally combinations of geometrical primitives, such as points, lines, curves, shapes, and polygons. In contrast, raster images use dot matrix data to represent a generally rectangular grid of pixels or points of color. The advantages of vectors are invariance under scaling, rotation, and other transformations: because the image is described geometrically rather than as a fixed pixel grid, it can be magnified without loss of quality. Therefore, vector graphics are widely used in scientific research, especially in bioinformatics, where the massive amount of data from the sequencing process generates various types of high-resolution graphs. A good case in point is WebLogo [33], a software package that generates sequence logos, the graphical representations of the patterns within a multiple sequence alignment. WebLogo is so popular that in some areas it has become the gold standard. The tool is effective and efficient because it provides both a command-line interface and a web interface, as well as both raster and vector graphics as outputs.
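To make the contrast between vector and raster output concrete, the following sketch writes a tiny dual-bar synteny diagram as SVG by hand. It is illustrative only and is not VGSC's renderer; the function name `dual_bar_svg`, the colors, and the layout constants are assumptions. The drawing consists solely of geometric primitives (`rect` and `line` elements) placed in a fixed `viewBox`, so requesting a larger `width` and `height` changes the display size without touching any coordinate.

```python
from typing import List, Tuple


def dual_bar_svg(len_a: int, len_b: int, links: List[Tuple[int, int]],
                 width: int = 400, height: int = 120) -> str:
    """Draw two chromosome bars and one line per collinear gene pair.

    len_a, len_b: chromosome lengths in bp; links: (position on A, position on B).
    All coordinates live in a fixed 1000 x 300 viewBox.
    """
    vb_w, vb_h, margin, bar_h = 1000, 300, 50, 20
    span = vb_w - 2 * margin
    longest = max(len_a, len_b)

    def x(pos: int) -> float:
        # Map a base-pair position to a horizontal viewBox coordinate.
        return margin + span * pos / longest

    y_a, y_b = 60, 220
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {vb_w} {vb_h}">',
        f'<rect x="{margin}" y="{y_a}" width="{x(len_a) - margin:.1f}" '
        f'height="{bar_h}" fill="steelblue"/>',
        f'<rect x="{margin}" y="{y_b}" width="{x(len_b) - margin:.1f}" '
        f'height="{bar_h}" fill="darkorange"/>',
    ]
    for pos_a, pos_b in links:
        # Connect the bottom of bar A to the top of bar B.
        parts.append(
            f'<line x1="{x(pos_a):.1f}" y1="{y_a + bar_h}" '
            f'x2="{x(pos_b):.1f}" y2="{y_b}" stroke="gray"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Calling the function twice with different `width` and `height` values yields files whose shapes are described identically; only the opening `svg` tag differs. A raster export at the two sizes would instead have to store two different pixel grids.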

2. Implementation and Result

2.1. Software Architecture

The Vector Graph toolkit of genome Synteny and Collinearity (VGSC) provides a new web-based interface for synteny and collinearity representation. Its software architecture is shown in Figure 1, which illustrates both the command-line toolkit and the web-based service. The plotting workflow is as simple as in most visualization tools: the end user prepares the required datasets and configures the basic parameters; the software then plots accordingly. This simplifies the drawing process, so that researchers can focus more on the analysis and interpretation of the data.
Figure 1

System architecture of VGSC.

2.2. Data Input and Configuration

For the plots in Figure 2, three inputs from end users are required: (1) a synteny and collinearity file, (2) a gene annotation file, and (3) a control file. They are explained as follows:

(1) Synteny and collinearity file: VGSC operates on preprocessed synteny and collinearity data. It is easy to convert results from all the common synteny and collinearity analysis software packages into the required format. The detailed requirements are given in the software manual.
(2) Gene annotation file: this annotation file, in the GFF3 format (http://www.gmod.org/wiki/GFF3) that is widely used by genome assembly software and gene databases, provides the fundamental map for plotting.
(3) Control file: the detailed configuration in this file sets the width, length, color, and so forth of the plot.

If end users run VGSC on the command line, these settings are passed as textual parameters. A Java Runtime Environment 1.8 is required, as the software is packaged as a Java executable. Web users should upload the synteny and collinearity file and the annotation file, and can configure the parameters of the control file directly in the web form. In addition, we have listed a set of data samples with preconfigured parameters in the "Example" section of the website to help end users carry out tests.

Figure 2

Four types of synteny and collinearity plot: (a) Circle Plot, (b) Bar Plot, (c) Dot Plot, and (d) Dual Synteny Plot. Chromosomes are labeled with the species abbreviation plus the chromosome ID. os, Oryza sativa; sb, Sorghum bicolor.
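To illustrate how a GFF3 annotation file can serve as the positional map for plotting, the sketch below collects the coordinates of `gene` features from GFF3 text. It relies only on the standard nine tab-separated GFF3 columns and the `ID` attribute; the names `Gene` and `read_gene_positions` are hypothetical, and the sketch does not reflect how VGSC itself reads its inputs.

```python
from typing import Dict, Iterable, NamedTuple


class Gene(NamedTuple):
    seqid: str   # chromosome or scaffold name (GFF3 column 1)
    start: int   # 1-based start coordinate (column 4)
    end: int     # 1-based end coordinate (column 5)
    strand: str  # "+", "-" or "." (column 7)


def read_gene_positions(lines: Iterable[str]) -> Dict[str, Gene]:
    """Collect gene features from GFF3 text, keyed by their ID attribute."""
    genes: Dict[str, Gene] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed gene features
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID")
        if gene_id is None:
            continue
        genes[gene_id] = Gene(cols[0], int(cols[3]), int(cols[4]), cols[6])
    return genes
```

Passing an open file handle to `read_gene_positions` returns a dictionary from gene ID to chromosome and coordinates, which is the kind of lookup needed to place each gene of a collinear pair on its chromosome.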

2.3. Output and Result

VGSC provides four different types of plots in six different file formats, with which synteny and collinearity information can be drawn as circle, bar, dot, and dual synteny plots. Figure 2 demonstrates the four plots generated from a sample dataset of synteny and collinearity between rice (Oryza sativa) and sorghum (Sorghum bicolor) from the MCScanX website (http://chibba.pgml.uga.edu/mcscan2). In the command-line executable, we have implemented a plot manager that integrates all types of plots into one command, which makes the selection much easier. We have also introduced a multiple file format adaptor that enables both raster and vector graphics, so that the output file formats extend to SVG, EPS, PDF, JPEG, and BMP, in addition to the popular PNG format. The same automatic configuration mechanism is applied to all parameter settings; the detailed list of settings is in the software manual, available at http://bio.njfu.edu.cn:8080/vgsc-web/static/downloads/vgsc-manual.pdf. One of the most important features of VGSC is its ability to produce vector graphics. As Figure 3 demonstrates, vector graphics (left) remain sharp when the image is magnified, whereas raster graphics (right) do not. This is particularly noticeable with high-throughput datasets. High-quality images are often a requirement for scientific research reports and papers.
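Of the four plot types described above, the circle plot has the least obvious geometry. The sketch below shows one generic way to lay it out: each chromosome receives an arc proportional to its length, and a gene position is converted to a point on the circle. The function names, the gap between chromosomes, and the orientation (0 degrees at the top, angles increasing clockwise, y growing downward as in SVG) are assumptions made for illustration; this is not VGSC's own layout code.

```python
import math
from typing import Dict, Tuple


def chromosome_arcs(lengths: Dict[str, int],
                    gap_deg: float = 2.0) -> Dict[str, Tuple[float, float]]:
    """Assign each chromosome a (start, end) angle in degrees.

    The circle minus one gap per chromosome is shared in proportion to length.
    """
    usable = 360.0 - gap_deg * len(lengths)
    total = sum(lengths.values())
    arcs: Dict[str, Tuple[float, float]] = {}
    angle = 0.0
    for name, length in lengths.items():
        sweep = usable * length / total
        arcs[name] = (angle, angle + sweep)
        angle += sweep + gap_deg
    return arcs


def point_on_circle(arcs: Dict[str, Tuple[float, float]],
                    lengths: Dict[str, int],
                    chrom: str, pos: int,
                    radius: float = 100.0,
                    center: Tuple[float, float] = (0.0, 0.0)) -> Tuple[float, float]:
    """Convert a position on a chromosome to (x, y) on the circle."""
    start, end = arcs[chrom]
    theta = math.radians(start + (end - start) * pos / lengths[chrom])
    # 0 degrees points up; angles increase clockwise; y grows downward.
    x = center[0] + radius * math.sin(theta)
    y = center[1] - radius * math.cos(theta)
    return x, y
```

A collinear gene pair can then be drawn as a curve between the two points returned for its genes, for example between a position on a chromosome labeled `os1` and one on `sb1` (labels chosen here only to follow the naming used in Figure 2).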
Figure 3

Resolution comparison between vector graphics and raster graphics.

For web users, there is a list of options, where end users can specify the type of plot. A dropdown menu is also available, where end users can choose the output file format. Once the settings are confirmed, results can be downloaded as a separate file when the “Download” button is clicked. In the online service, both vector graphics and raster images are provided.

2.4. Online System

In parallel with the command-line toolkit, we have published a web-based system, VGSC online, to provide the plotting service and to improve the plotting experience. It is available at http://bio.njfu.edu.cn:8080/vgsc-web. VGSC online uses Java web technology and is compatible with most web containers, including Tomcat and Jetty. Figure 4 shows a screenshot of the example pages in VGSC online, which list all types of plots with sample data, providing end users with a visual guide. We have also published the command-line executable for download, along with sample data and relevant documentation. All these resources are provided free of charge.
Figure 4

Screenshot of VGSC online.

3. Conclusion

While many synteny and collinearity tools have become available in recent years, their visual presentation has not developed accordingly. For this reason, users often have to write additional programs or redraw the synteny and collinearity output files in order to plot a representative high-quality image. This gap in visualization has reduced the efficiency of existing synteny and collinearity detection pipelines. VGSC has been created to fill this gap. A distinguishing feature of VGSC and its online service is that they bring together diverse tools for vector graphics of synteny and collinearity, which enables rapid and convenient conversion of synteny and collinearity information into graphical insights. Additional plots for downstream analysis, such as plots for gene families, will be implemented in the coming version of VGSC. VGSC will therefore also be an effective tool for the analysis of structural changes and evolution, the annotation of new genomes, and research on gene family history.
  32 in total

1.  Fast identification and statistical evaluation of segmental homologies in comparative maps.

Authors:  Peter P Calabrese; Sugata Chakravarty; Todd J Vision
Journal:  Bioinformatics       Date:  2003       Impact factor: 6.937

2.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

3.  Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates.

Authors:  Yoichiro Nakatani; Hiroyuki Takeda; Yuji Kohara; Shinichi Morishita
Journal:  Genome Res       Date:  2007-07-25       Impact factor: 9.043

4.  Using OrthoCluster for the detection of synteny blocks among multiple genomes.

Authors:  Ismael A Vergara; Nansheng Chen
Journal:  Curr Protoc Bioinformatics       Date:  2009-09

5.  Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids.

Authors:  Eric Lyons; Brent Pedersen; Josh Kane; Maqsudul Alam; Ray Ming; Haibao Tang; Xiyin Wang; John Bowers; Andrew Paterson; Damon Lisch; Michael Freeling
Journal:  Plant Physiol       Date:  2008-10-24       Impact factor: 8.340

6.  Easyfig: a genome comparison visualizer.

Authors:  Mitchell J Sullivan; Nicola K Petty; Scott A Beatson
Journal:  Bioinformatics       Date:  2011-01-28       Impact factor: 6.937

7.  MicroSyn: a user friendly tool for detection of microsynteny in a gene family.

Authors:  Bin Cai; Xiaohan Yang; Gerald A Tuskan; Zong-Ming Cheng
Journal:  BMC Bioinformatics       Date:  2011-03-18       Impact factor: 3.169

8.  i-ADHoRe 3.0--fast and sensitive detection of genomic homology in extremely large data sets.

Authors:  Sebastian Proost; Jan Fostier; Dieter De Witte; Bart Dhoedt; Piet Demeester; Yves Van de Peer; Klaas Vandepoele
Journal:  Nucleic Acids Res       Date:  2011-11-18       Impact factor: 16.971

9.  Using GBrowse 2.0 to visualize and share next-generation sequence data.

Authors:  Lincoln D Stein
Journal:  Brief Bioinform       Date:  2013-02-01       Impact factor: 11.622

10.  MIPS PlantsDB: a database framework for comparative plant genome research.

Authors:  Thomas Nussbaumer; Mihaela M Martis; Stephan K Roessner; Matthias Pfeifer; Kai C Bader; Sapna Sharma; Heidrun Gundlach; Manuel Spannagl
Journal:  Nucleic Acids Res       Date:  2012-11-29       Impact factor: 16.971
