UniMoG--a unifying framework for genomic distance calculation and sorting based on DCJ.

Rolf Hilker, Corinna Sickinger, Christian N S Pedersen, Jens Stoye.

Abstract

SUMMARY: UniMoG is a software tool that combines five genome rearrangement models: double cut and join (DCJ), restricted DCJ, Hannenhalli and Pevzner (HP), inversion and translocation. It can compute the pairwise genomic distances and a corresponding optimal sorting scenario for an arbitrary number of genomes. All five models can be unified through the DCJ model; the implementation is therefore based on DCJ and, where reasonable, uses the most efficient existing algorithms for each distance and sorting problem. Both textual and graphical output are available for visualizing the operations.
AVAILABILITY AND IMPLEMENTATION: The software is available through the Bielefeld University Bioinformatics Web Server at http://bibiserv.techfak.uni-bielefeld.de/dcj with instructions and example data.
CONTACT: rhilker@cebitec.uni-bielefeld.de.

Year:  2012        PMID: 22815356      PMCID: PMC3463123          DOI: 10.1093/bioinformatics/bts440

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Genome rearrangements describe the dynamics of evolution at an abstracted genomic level, in contrast to local mutations of single DNA base pairs. Very little is known about the exact mechanism of rearrangement events or about how and when they are triggered. More detailed knowledge of evolution could help to improve the understanding of the mechanisms important for survival and development of species. The evolutionary distance between two or more organisms with shared gene content can be estimated by solving the combinatorial problem of finding a most parsimonious sequence of rearrangement operations among their shared genes. To this end, all genes unique to one of the genomes are ignored and only one representative among duplicated genes is chosen for the comparison.
In recent years, large amounts of genomic data have become available and genome comparison has become a routine task. For example, the Chimpanzee Sequencing and Analysis Consortium (2005) compared chimpanzee and human genomes and developed a catalogue of genetic differences. Since the two species are closely related, only one fusion of two chromosomes and several inversions were identified. Another example is the comparison of human and mouse genomes by Pevzner and Tesler (2003). Among other methods, they used GRIMM (Tesler, 2002b) for the analysis, because automated methods allow for easier and faster analyses, no matter how divergent the investigated organisms are. GRIMM is based on the Hannenhalli and Pevzner (HP) model (Hannenhalli and Pevzner, 1995), so its set of rearrangement operations comprises inversions, translocations, fusions and fissions of linear genomes. However, phylogenetic distance can be investigated from different perspectives, and the HP model is only one of the common models. Besides the HP model, we consider four additional models.
The inversion model (Hannenhalli and Pevzner, 1999) allows for inversions of internal genomic regions in linear, uni-chromosomal genomes, while the translocation model (Hannenhalli, 1996) comprises the exchange of two linear chromosome ends. As already mentioned, HP combines both models and adds chromosome fusions and fissions to the repertoire of rearrangement operations. Among the included models, the most general is the double cut and join (DCJ) model (Bergeron et al.; Yancopoulos et al.), which allows for all common rearrangement operations: inversions, translocations, fusions, fissions, circularizations and decircularizations. Besides these operations, block interchanges, which describe the exchange of two DNA segments, can be mimicked through two operations by all models except the inversion model. Finally, the restricted DCJ model (Kováč et al.) allows the same operations as the DCJ model, but restricts them by requiring that any emerging circular chromosome be decircularized immediately in the next step.
In our software, UniMoG, the DCJ adjacency graph data structure (Bergeron et al.) serves as the basis for all calculations. In contrast to GRIMM, UniMoG implements all five distance models on top of DCJ and can return either the distance alone or the distance together with a corresponding optimal sorting scenario. For fast comparisons between the different distances, it is also possible to calculate all five distances and sorting scenarios at once, if applicable. Another advantage is that the input is not limited to two genomes at a time, and genes need not be represented by integers: gene names are converted to integers for the internal representation. In the case of multiple input genomes, all of them are compared pairwise with each other. The distance results are then returned in a matrix, which is also provided in PHYLIP format (Fig. 1, inset) and can be fed into distance-based phylogenetic tree reconstruction methods, possibly after applying distance correction models like the ones presented by Lin and Moret (2008).
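To make the role of the adjacency graph concrete, the following is a minimal Python sketch of the DCJ distance computed with the standard adjacency-graph formula d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths. It is an illustration only and not UniMoG's own code. The genome encoding (a list of `(signed_gene_list, is_circular)` tuples) and all function names are assumptions made for this example, and every gene is assumed to occur exactly once in each genome, in a non-empty chromosome.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene as it is read."""
    tail, head = (abs(gene), "t"), (abs(gene), "h")
    return (tail, head) if gene > 0 else (head, tail)


def partner_map(genome):
    """Map every extremity to its neighbour in the genome, or None at a telomere."""
    partner = {}
    for genes, circular in genome:
        ends = [extremities(g) for g in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            partner[right], partner[left] = left, right
        first_left, last_right = ends[0][0], ends[-1][1]
        if circular:
            partner[first_left], partner[last_right] = last_right, first_left
        else:
            partner[first_left] = partner[last_right] = None
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - (C + I/2), counted on the adjacency graph of A and B."""
    in_a, in_b = partner_map(genome_a), partner_map(genome_b)
    if in_a.keys() != in_b.keys():
        raise ValueError("both genomes must contain the same set of genes")
    seen = set()

    def walk_path(start, maps):
        """Walk from a telomere to the other end of its path; return the edge count."""
        edges, current = 0, start
        while current is not None:
            seen.add(current)
            current = maps[edges % 2][current]
            edges += 1
        return edges

    # Paths start at telomeres; a path with an odd number of edges joins
    # a telomere of A with a telomere of B.
    odd_paths = 0
    for own, other in ((in_a, in_b), (in_b, in_a)):
        for ext, neighbour in own.items():
            if neighbour is None and ext not in seen:
                odd_paths += walk_path(ext, (other, own)) % 2

    # Every extremity not on a path lies on a cycle.
    cycles = 0
    for ext in in_a:
        if ext in seen:
            continue
        cycles += 1
        current = ext
        while current not in seen:
            seen.add(current)
            mate = in_a[current]
            seen.add(mate)
            current = in_b[mate]
    return len(in_a) // 2 - (cycles + odd_paths // 2)


reference = [([1, 2, 3, 4], False)]
print(dcj_distance(reference, [([1, -3, -2, 4], False)]))             # 1 (one inversion)
print(dcj_distance([([1, 2], False), ([3, 4], False)], reference))    # 1 (one fusion)
```

Each extremity is visited once, so the sketch runs in time linear in the number of genes, in line with the linear-time distance computation described for the DCJ model.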
Fig. 1. Two of the three output levels of a restricted DCJ sorting scenario involving the common tRNA genes of four yeast genomes. The circular chromosome in step one is directly reincorporated in the next step according to the restricted DCJ definition.

UniMoG was implemented with a strong focus on computational efficiency. All five distance calculations and the DCJ sorting are therefore carried out in linear time, as explained in Bergeron et al., Erdős et al., Pevzner and Tesler (2003) and Tesler (2002a). For restricted DCJ sorting, we implemented the linearithmic-time algorithm of Kováč et al. The implemented translocation sorting algorithm, explained in Bergeron et al., was chosen even though its worst-case running time is cubic, because in practice it almost always runs in linear time. Our implementation of the inversion sorting algorithm is the sequence augmentation algorithm introduced by Tannier et al., with a quadratic worst-case running time, based on the data structures from Bergeron et al. This algorithm also determines the running time of the HP sorting algorithm, since HP sorting uses the preprocessing described in Tesler (2002a) and afterwards hands the concatenated genomes over to the inversion sorting algorithm. GRIMM still contains an error, revealed by Jean and Nikolski (2007); we therefore use their corrected capping and concatenation algorithm. Note that all of these algorithms return only one of possibly many sorting scenarios. Sampling uniformly among all scenarios will be the subject of a future version of UniMoG. Because of the efficient implementation, UniMoG can handle large genomes and was tested with genomes of up to 32 500 genes without encountering any problems. To further improve computational performance, regions with identical gene order can be merged into larger synteny blocks, since none of the considered models can break up conserved blocks.
For intuitive handling, the output of UniMoG is divided into three levels (Fig. 1). First, the graphical output is designed for closely studying the rearrangement scenarios, highlighting each performed operation and allowing three different zoom levels. Furthermore, when color mode is active, each chromosome is assigned a unique color for easier analysis of large genomes. Second, an optimal sorting scenario is returned in text format, which allows for easy reuse of intermediate genomes. Finally, the results are also returned as a list of adjacencies of each intermediate genome. The integrated save functions allow quick saving of graphical or textual output data.
Conflict of Interest: none declared.
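The merging of regions with identical gene order into synteny blocks, mentioned in the efficiency discussion above, can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not UniMoG's implementation: genomes are assumed to consist of linear chromosomes only, each given as a list of signed integers, every gene is assumed to occur exactly once per genome, and the function names are hypothetical. Two neighbouring genes are merged when the adjacency between them is present in every input genome.

```python
def adjacency(left_gene, right_gene):
    """Adjacency between two consecutive signed genes as an unordered extremity pair."""
    right_end_of_left = (abs(left_gene), "h" if left_gene > 0 else "t")
    left_end_of_right = (abs(right_gene), "t" if right_gene > 0 else "h")
    return frozenset((right_end_of_left, left_end_of_right))


def adjacencies(genome):
    """All adjacencies of a genome given as a list of linear chromosomes."""
    return {adjacency(a, b) for chrom in genome for a, b in zip(chrom, chrom[1:])}


def merge_synteny_blocks(genomes):
    """Collapse runs of genes whose adjacencies are conserved in all genomes.

    Returns the rewritten genomes (block ids instead of genes) and the list of
    blocks, where block id k corresponds to blocks[k - 1] in the first genome.
    """
    shared = set.intersection(*(adjacencies(g) for g in genomes))

    # Cut the first genome into maximal runs connected by shared adjacencies.
    blocks = []
    for chrom in genomes[0]:
        for i, gene in enumerate(chrom):
            if i > 0 and adjacency(chrom[i - 1], gene) in shared:
                blocks[-1].append(gene)
            else:
                blocks.append([gene])

    # A block is entered either at its first gene (read forwards) or at its
    # last gene with flipped sign (read backwards); all other genes are skipped.
    entry = {}
    for block_id, block in enumerate(blocks, start=1):
        entry[block[0]] = block_id
        entry[-block[-1]] = -block_id

    merged = [
        [[entry[g] for g in chrom if g in entry] for chrom in genome]
        for genome in genomes
    ]
    return merged, blocks


merged, blocks = merge_synteny_blocks([[[1, 2, 3, 4, 5]], [[1, -3, -2, 4, 5]]])
print(blocks)   # [[1], [2, 3], [4, 5]]
print(merged)   # [[[1, 2, 3]], [[1, -2, 3]]]
```

In this example five genes shrink to three blocks while the single inversion separating the two genomes is preserved, which is why such merging can reduce the input size without changing the rearrangement scenario.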
REFERENCES

1.  GRIMM: genome rearrangements web server.

Authors:  Glenn Tesler
Journal:  Bioinformatics       Date:  2002-03       Impact factor: 6.937

2.  Efficient sorting of genomic permutations by translocation, inversion and block interchange.

Authors:  Sophia Yancopoulos; Oliver Attie; Richard Friedberg
Journal:  Bioinformatics       Date:  2005-06-09       Impact factor: 6.937

3.  On sorting by translocations.

Authors:  Anne Bergeron; Julia Mixtacki; Jens Stoye
Journal:  J Comput Biol       Date:  2006-03       Impact factor: 1.479

4.  Restricted DCJ model: rearrangement problems with chromosome reincorporation.

Authors:  Jakub Kováč; Robert Warren; Marília D V Braga; Jens Stoye
Journal:  J Comput Biol       Date:  2011-09       Impact factor: 1.479

5.  Initial sequence of the chimpanzee genome and comparison with the human genome.

Authors:  Chimpanzee Sequencing and Analysis Consortium
Journal:  Nature       Date:  2005-09-01       Impact factor: 49.962

6.  Genome rearrangements in mammalian evolution: lessons from human and mouse genomes.

Authors:  Pavel Pevzner; Glenn Tesler
Journal:  Genome Res       Date:  2003-01       Impact factor: 9.043

7.  Estimating true evolutionary distances under the DCJ model.

Authors:  Yu Lin; Bernard M E Moret
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937
