Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes.

Matthieu Muffato, Alexandra Louis, Charles-Edouard Poisnel, Hugues Roest Crollius.

Abstract

UNLABELLED: Comparative genomics remains a pivotal strategy to study the evolution of gene organization, and this primacy is reinforced by the growing number of full genome sequences available in public repositories. Despite this growth, the bioinformatic tools available to visualize and compare genomes and to infer evolutionary events remain restricted to two or three genomes at a time, which limits the breadth and the nature of the questions that can be investigated. Here we present Genomicus, a new synteny browser that can represent and compare an unlimited number of genomes in a broad phylogenetic view. In addition, Genomicus includes reconstructed ancestral gene organization, which greatly facilitates the interpretation of the data. AVAILABILITY: Genomicus is freely available for online use at http://www.dyogen.ens.fr/genomicus while data can be downloaded at ftp://ftp.biologie.ens.fr/pub/dyogen/genomicus.

Year:  2010        PMID: 20185404      PMCID: PMC2853686          DOI: 10.1093/bioinformatics/btq079

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Four years ago, fewer than 20 metazoan genomes had been fully sequenced; nearly 80 species are now represented in a variety of centralized databases. This abundance of sequence data has reinforced the role of comparative genomics as the primary approach to gain insight into the organization of a genome. Comparing sequences from different species serves several purposes: (i) to outline conserved regions, a powerful guide to rapidly focus on functional regions; (ii) to document differences among these functional sequences as a first step to understand broader biological differences (metabolic, developmental, etc.) between organisms; and (iii) to identify evolutionary events that have interrupted the gene colinearity between the genomes of two species since their last common ancestor. To document and study the latter, the inference of ancestral gene orders starting from extant species provides important reference points; yet no visualization tool currently allows comparisons between an ancestral genome and one or more of its modern descendants. Existing software still limits the comparison to two or three extant genomes at a time, and is restricted to a limited range of species (Byrne and Wolfe, 2005; Courcelle et al., 2008; Derrien et al., 2007; Dong et al., 2009; Jensen et al., 2009; Lyons et al., 2008; Pan et al., 2005; Sinha and Meller, 2007). To address these issues, we have developed Genomicus, a browser dedicated to the study of synteny and the conservation of gene order among multiple genomes (currently 52 metazoan genomes and the yeast Saccharomyces cerevisiae). Importantly, Genomicus also integrates reconstructed ancestral synteny blocks at 44 ancestral nodes.

2 METHODS

2.1 Data integration

Most of the genome data displayed in Genomicus is already stored, integrated and publicly available in the Ensembl database (Hubbard et al., 2009), but without extensive synteny visualization tools. Genomicus requires two main types of information: the positions of genes in their respective genomes and the phylogenetic relationships (orthology, paralogy) between genes. Genomicus edits the Ensembl phylogenetic trees (Vilella et al., 2009) in three ways. First, duplication nodes whose Duplication Consistency Score (Vilella et al., 2009) falls below a threshold, which is optimized to increase the synteny between extant genomes, are selected. These duplication nodes are then shifted towards terminal branches unless stopped by an intermediate, strong duplication node. Second, we have added Boreoeutheria, Euarchontoglires and Atlantogenata ancestral nodes to the existing trees of placental mammals (Prasad et al., 2008). Third, we have added some extant species that are not currently referenced in Ensembl (Branchiostoma floridae, Nematostella vectensis and Oikopleura dioica), together with their respective ancestral nodes. For each of these new species, best reciprocal BLAST comparisons [best reciprocal hits (BRHs)] are performed between its predicted proteins and the proteins from a set of key species already referenced in Genomicus. Comparisons that are internally consistent (mutual orthology relationships are respected) allow a given protein to be added to the same phylogenetic tree as its BRH. In rare cases, a new protein may act as an outgroup to two existing trees and fuse them through a new duplication node.
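
The best-reciprocal-hit step can be illustrated with a minimal Python sketch. It assumes that the BLAST output has already been reduced to (query, subject, bit score) tuples; the function names, protein identifiers and scores are hypothetical and are not taken from the Genomicus code base. Ties are resolved in favour of the first hit encountered, which is also an assumption.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query protein, subject protein, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return, for each query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Strict comparison: on a tie, the first hit seen is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_reciprocal_hits(new_vs_ref: Iterable[Hit],
                         ref_vs_new: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (new protein, reference protein) that are each other's best hit."""
    forward = best_hits(new_vs_ref)
    backward = best_hits(ref_vs_new)
    return sorted((new, ref) for new, ref in forward.items()
                  if backward.get(ref) == new)


# Hypothetical scores: proteins of a newly added species (bf*) against
# proteins of a key species already in the database (hs*).
new_vs_ref = [("bf1", "hs1", 250.0), ("bf1", "hs2", 90.0), ("bf2", "hs2", 120.0)]
ref_vs_new = [("hs1", "bf1", 245.0), ("hs2", "bf1", 95.0), ("hs2", "bf2", 80.0)]

brh_pairs = best_reciprocal_hits(new_vs_ref, ref_vs_new)
print(brh_pairs)  # [('bf1', 'hs1')]
```

Here bf2 is not retained because its best hit, hs2, prefers bf1 in the reverse search. In the procedure described above, such a BRH pair would then be checked for consistency across the key species before the new protein is placed in a tree.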

2.2 Reconstruction method

Ancestral syntenic blocks are reconstructed by a complex procedure that will be described in detail elsewhere (M.Muffato et al., manuscript in preparation). Briefly, parsimonious scenarios are estimated from pairwise comparisons of gene order between all available sequenced genomes (1378 comparisons in Genomicus v56.01). For a given ancestor, all ancestral genes that are identified as conserved neighbours in at least one such comparison become linked nodes in a graph. Each link is then given a weight (between 1 and 1378) reflecting the number of comparisons in which this adjacency was observed. At this stage, inconsistencies may appear in the form of ancestral genes connected to more than two neighbours. To resolve these, the weighted graph is processed using a top-down greedy algorithm in which the links of highest weight are selected first and are used to choose the most likely gene-to-gene connection when several are possible. This produces a set of linear paths in the graph that connect ancestral genes according to the number of times their respective descendants are observed as extant neighbours. We performed extensive simulations and benchmarked our method against several alternative methods: MGR (Bourque and Pevzner, 2002), MGRA (Alekseyev and Pevzner, 2009) and InferCars (Ma et al., 2006). Our method is the only approach able to satisfactorily analyse data with the volume (53 species and 888 217 extant genes) and complexity (duplications, deletions) found in the complete set of sequenced vertebrate genomes. The reconstructed gene order is correct in >95% of the cases (specificity), and includes between 70% and 95% of the expected ancestral gene pairs (sensitivity).
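
The greedy resolution step can be sketched in Python as follows. This is an illustration under stated assumptions, not the published implementation, which the text says is described elsewhere: the gene names and weights are invented, ties are broken alphabetically, and links that would close a cycle are rejected so that the output consists only of linear paths. The last line checks that 1378 is the number of unordered pairs among 53 genomes, which is consistent with the figures quoted above.

```python
from math import comb
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, int]  # (ancestral gene a, ancestral gene b, weight)


def greedy_linear_paths(links: List[Link]) -> List[List[str]]:
    """Select links by decreasing weight; each gene keeps at most two neighbours."""
    neighbours: Dict[str, List[str]] = {}
    component: Dict[str, str] = {}  # union-find parent pointers

    def find(gene: str) -> str:
        while component[gene] != gene:
            component[gene] = component[component[gene]]  # path halving
            gene = component[gene]
        return gene

    for a, b, _ in links:
        for gene in (a, b):
            neighbours.setdefault(gene, [])
            component.setdefault(gene, gene)

    # Highest weight first; ties broken by gene names (an assumption).
    for a, b, _ in sorted(links, key=lambda link: (-link[2], link[0], link[1])):
        if len(neighbours[a]) >= 2 or len(neighbours[b]) >= 2:
            continue  # one of the genes already has two neighbours
        if find(a) == find(b):
            continue  # assumption: refuse links that would close a cycle
        neighbours[a].append(b)
        neighbours[b].append(a)
        component[find(a)] = find(b)

    # Walk every path from one of its ends (a gene with fewer than two neighbours).
    paths: List[List[str]] = []
    seen: Set[str] = set()
    for start in sorted(neighbours):
        if start in seen or len(neighbours[start]) == 2:
            continue
        path: List[str] = []
        previous, current = None, start
        while current is not None:
            path.append(current)
            seen.add(current)
            following = [g for g in neighbours[current] if g != previous]
            previous, current = current, (following[0] if following else None)
        paths.append(path)
    return paths


# Toy graph: gene B is linked to three neighbours (A, C and D), an inconsistency.
example_links = [("A", "B", 5), ("B", "C", 4), ("C", "D", 3), ("B", "D", 2), ("D", "E", 1)]
print(greedy_linear_paths(example_links))  # [['A', 'B', 'C', 'D', 'E']]
print(comb(53, 2))  # 1378
```

In the toy graph the weakest of the three links around B (B-D, weight 2) is discarded, which leaves a single linear path.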

2.3 Systems and technical aspects

Genomicus is composed of Perl scripts and modules, executed with mod_perl on an Apache2 server and querying a MySQL database. The pages embed inline SVG drawings in XHTML, while JavaScript usage is limited to an information panel retrieved with AJAX calls. Users with browsers that are not yet compliant with open web technologies require the Google Chrome Frame extension (http://www.google.com/chromeframe).

3 USAGE AND ‘VIEWS’

The home page invites users to enter their gene of interest and will by default show a graphical representation in PhyloView. PhyloView shows the chosen reference gene in the centre of the display with 15 neighbouring genes on each side, as well as orthologs and paralogs of the query gene in their own respective genomic regions, also with 15 neighbouring genes. When these neighbouring genes are orthologs or paralogs of genes in the reference species, they are shown with matching colours. Some species may appear twice if a copy of the reference gene underwent a duplication (shown as a red square) within the evolutionary range presented on the display.

AlignView shows an alignment between (i) the genes contained within the genomic region of the reference gene and (ii) all their respective orthologs in other species. Here also, the ‘query’ gene is centred and the colour code is used to indicate orthologs between different genomes. A species spanning multiple lines means that the reference gene content is distributed over multiple chromosomes (or scaffolds; see the case of dog in Fig. 1B).

In both views, the tree can be edited (by expanding, collapsing, hiding or showing chosen nodes) to clarify the view. Genomicus also displays orthologous conserved non-coding elements (CNEs) at three levels of conservation. Finally, gene and locus information can be reached through links to other browsers such as Ensembl, UCSC and NCBI.
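
The PhyloView layout described in this section (a gene flanked by a fixed number of neighbours, with matching colours for genes related to those of the reference region) can be sketched in Python. The gene identifiers, family labels, colour palette and function name are hypothetical, and a flank of 1 is used instead of 15 to keep the example short; this is an illustration of the idea, not the Genomicus rendering code.

```python
from typing import Dict, List, Optional, Tuple


def neighbourhood(chromosome: List[str], gene: str, flank: int = 15) -> List[str]:
    """Genes within `flank` positions of `gene`, clipped at the chromosome ends."""
    centre = chromosome.index(gene)  # raises ValueError if the gene is absent
    return chromosome[max(0, centre - flank):centre + flank + 1]


# Hypothetical gene orders and gene-family assignments.
horse = ["h1", "h2", "h3", "h4", "h5"]
dog = ["d1", "d2", "d3", "d4"]
family_of: Dict[str, str] = {
    "h2": "famA", "h3": "famB", "h4": "famC",
    "d1": "famC", "d2": "famB", "d3": "famX",
}

# One colour per gene family found in the reference window.
reference_window = neighbourhood(horse, "h3", flank=1)
palette = ["red", "green", "blue"]
family_colour = {family_of[g]: palette[i] for i, g in enumerate(reference_window)}

# Genes in another species inherit the colour of their family, if it has one.
dog_window = neighbourhood(dog, "d2", flank=1)
dog_colours: List[Tuple[str, Optional[str]]] = [
    (g, family_colour.get(family_of.get(g))) for g in dog_window
]
print(dog_colours)  # [('d1', 'blue'), ('d2', 'green'), ('d3', None)]
```

Gene d3 receives no colour because its family is absent from the reference window, which corresponds to a neighbour with no ortholog or paralog in the reference region.
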

Fig. 1. PhyloView (A) and AlignView (B) of the horse PHOX2B gene as reference. In both views, the horse PHOX2B gene and its orthologs are shown in light green over a thin vertical line. In (A), the right part of dog chromosome 13 is not syntenic with the horse and cow chromosomes (nor, therefore, with their ancestral chromosome in Laurasiatheria). In (B), AlignView shows that this region underwent a dog-specific translocation onto chromosome 3. Furthermore, the pig locus can be analysed in (B) but not in (A), because PhyloView is based on the phylogenetic tree of PHOX2B, which is not annotated in the pig genome, whereas AlignView shows genes that are orthologous to genes across the locus of the reference species, not just to the reference gene. Coloured circles between genes represent CNEs.


4 FUTURE DEVELOPMENTS

The main plans are to extend the functionality of Genomicus and the breadth of species it displays. In particular, a ‘chromosome painting’ view showing extant and ancestral karyotypes that are colour coded according to a species of interest is currently in development. Genomicus will also follow the ‘Ensembl Genomes’ project and will therefore extend its scope to include plant and fungal genomes.

REFERENCES (10 of 14 listed)

1.  Genome-scale evolution: reconstructing gene orders in the ancestral species.

Authors:  Guillaume Bourque; Pavel A Pevzner
Journal:  Genome Res       Date:  2002-01       Impact factor: 9.043

2.  SynBrowse: a synteny browser for comparative sequence analysis.

Authors:  Xiaokang Pan; Lincoln Stein; Volker Brendel
Journal:  Bioinformatics       Date:  2005-06-30       Impact factor: 6.937

3.  AutoGRAPH: an interactive web server for automating and visualizing comparative genome maps.

Authors:  Thomas Derrien; Catherine André; Francis Galibert; Christophe Hitte
Journal:  Bioinformatics       Date:  2006-12-04       Impact factor: 6.937

4.  Reconstructing contiguous regions of an ancestral genome.

Authors:  Jian Ma; Louxin Zhang; Bernard B Suh; Brian J Raney; Richard C Burhans; W James Kent; Mathieu Blanchette; David Haussler; Webb Miller
Journal:  Genome Res       Date:  2006-09-18       Impact factor: 9.043

5.  The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species.

Authors:  Kevin P Byrne; Kenneth H Wolfe
Journal:  Genome Res       Date:  2005-09-16       Impact factor: 9.043

6.  Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids.

Authors:  Eric Lyons; Brent Pedersen; Josh Kane; Maqsudul Alam; Ray Ming; Haibao Tang; Xiyin Wang; John Bowers; Andrew Paterson; Damon Lisch; Michael Freeling
Journal:  Plant Physiol       Date:  2008-10-24       Impact factor: 8.340

7.  STRING 8--a global view on proteins and their functional interactions in 630 organisms.

Authors:  Lars J Jensen; Michael Kuhn; Manuel Stark; Samuel Chaffron; Chris Creevey; Jean Muller; Tobias Doerks; Philippe Julien; Alexander Roth; Milan Simonovic; Peer Bork; Christian von Mering
Journal:  Nucleic Acids Res       Date:  2008-10-21       Impact factor: 16.971

8.  Confirming the phylogeny of mammals by use of large comparative sequence data sets.

Authors:  Arjun B Prasad; Marc W Allard; Eric D Green
Journal:  Mol Biol Evol       Date:  2008-05-02       Impact factor: 16.240

9.  Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms.

Authors:  Amit U Sinha; Jaroslaw Meller
Journal:  BMC Bioinformatics       Date:  2007-03-08       Impact factor: 3.169

10.  Narcisse: a mirror view of conserved syntenies.

Authors:  Emmanuel Courcelle; Yoann Beausse; Sébastien Letort; Olivier Stahl; Romain Fremez; Catherine Ngom-Bru; Jérôme Gouzy; Thomas Faraut
Journal:  Nucleic Acids Res       Date:  2007-11-02       Impact factor: 16.971
