baobabLUNA: the solution space of sorting by reversals.

Marília D V Braga

Abstract

SUMMARY: Computing the reversal distance and searching for an optimal sequence of reversals to transform a unichromosomal genome into another are useful algorithmic tools to analyse real evolutionary scenarios. Currently, these problems can be solved by at least two available software packages, the most prominent of which are GRAPPA and GRIMM. However, the number of different optimal sequences is usually huge, and taking only the distance and/or one example is often insufficient for a proper analysis. Here, we offer an alternative and present baobabLUNA, a framework that contains an algorithm to give a compact representation of the whole space of solutions for the sorting by reversals problem.
AVAILABILITY AND IMPLEMENTATION: Compiled code implemented in Java is freely available for download at http://pbil.univ-lyon1.fr/software/luna/. Documentation with methodological background, technical aspects, download and setup instructions, interface description and tutorial is available at http://pbil.univ-lyon1.fr/software/luna/doc/luna-doc.pdf.

Year:  2009        PMID: 19401401      PMCID: PMC2705226          DOI: 10.1093/bioinformatics/btp285

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Computing the reversal distance between two unichromosomal genomes without duplications, insertions and deletions, and finding one optimal sequence of reversals (that is, a sequence with a minimum number of reversals) that transforms one genome into the other, can both be done in polynomial time, thanks to Hannenhalli and Pevzner (1999). These two problems have been the topic of several works, such as Tannier et al. (2007), and their solutions are valuable tools to analyse evolutionary scenarios. Currently, at least two software packages are available to solve these problems: GRAPPA and GRIMM, described in Moret et al. (2001) and Tesler (2002), respectively. Nevertheless, there are many different solutions, each representing an optimal sequence of reversals that sorts one genome into another, and finding only one is often insufficient. Exploring the whole set of solutions is thus an interesting strategy for a more realistic analysis. The first step in this direction was the enumeration of all solutions, thanks to an algorithm proposed by Siepel (2003). However, since the number of solutions is usually huge, the whole set is very hard to handle, and enumerating it can be as uninformative as finding a single solution. Bergeron et al. (2002) then proposed a model to represent the solutions in a compact way, grouping them into equivalence classes. This reduces the set to be handled, and an algorithm to directly enumerate the classes was given by Braga et al. (2008). The number of non-equivalent solutions can still be too large; therefore, a method was proposed for filtering solutions using constraints (Braga, 2009). In this work, we describe baobabLUNA, a framework that contains the implementation of the algorithm developed by Braga et al. (2008) to directly enumerate all the classes of equivalent solutions, together with the further use of biological constraints to filter the classes.

2 DESCRIPTION

2.1 Permutations, reversals and sorting sequences

Genomes are represented by the list of homologous markers between them. These markers correspond to the integers $1, 2, \ldots, n$, with a plus or minus sign to indicate the strand they lie on. The order and orientation of the markers of one genome in relation to the other is represented by a signed permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_{n-1}, \pi_n)$ of size $n$ over $\{-n, \ldots, -1, 1, \ldots, n\}$, such that, for each value $i$ from 1 to $n$, exactly one of $i$ and $-i$ appears. The identity permutation $(1, 2, 3, \ldots, n)$ is denoted by $I_n$. A subset of numbers $\rho \subseteq \{1, 2, \ldots, n-1, n\}$ is said to be an interval of a permutation $\pi$ if there exist $i, j \in \{1, \ldots, n\}$, $1 \le i \le j \le n$, such that $\rho = \{|\pi_i|, |\pi_{i+1}|, \ldots, |\pi_{j-1}|, |\pi_j|\}$. Given a permutation $\pi$ and an interval $\rho$ of $\pi$, we can apply a reversal on the interval $\rho$ of $\pi$, that is, the operation that reverses the order and flips the signs of the elements of $\rho$, resulting in the permutation $(\pi_1, \ldots, \pi_{i-1}, -\pi_j, \ldots, -\pi_i, \pi_{j+1}, \ldots, \pi_n)$. If $s = \rho_1 \rho_2 \ldots \rho_k$ is a sequence of reversals for a permutation $\pi_O$, we say that $s$ sorts $\pi_O$ into $\pi_T$ if the result of the consecutive application of the reversals $\rho_1, \rho_2, \ldots, \rho_k$ on $\pi_O$ is $\pi_T$. The length of a shortest sequence sorting $\pi_O$ into $\pi_T$ is called the reversal distance of $\pi_O$ and $\pi_T$, denoted by $d(\pi_O, \pi_T)$. Let $s = \rho_1 \rho_2 \ldots \rho_k$ be a sequence of reversals sorting $\pi_O$ into $\pi_T$. If $d(\pi_O, \pi_T) = k$, then $s$ is said to be an optimal sorting sequence. As an example, the sequence {1}{2}{4}{1, 2, 3} sorts $(-3, 2, 1, -4)$ into $I_4$ and is optimal.
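The definitions above can be made concrete with a minimal Python sketch that represents a signed permutation as a tuple and a reversal as the set of unsigned labels of its interval. The function names apply_reversal and apply_sequence are illustrative choices and are not part of baobabLUNA; the sketch only replays the example sequence from this section.

```python
def apply_reversal(perm, rho):
    """Apply the reversal on interval rho (a set of unsigned labels) to perm."""
    positions = [k for k, x in enumerate(perm) if abs(x) in rho]
    if not positions or len(positions) != len(rho):
        raise ValueError(f"{set(rho)} contains labels absent from {perm}")
    i, j = positions[0], positions[-1]
    if j - i + 1 != len(rho):
        raise ValueError(f"{set(rho)} is not an interval of {perm}")
    # Reverse the order and flip the signs of the elements in positions i..j.
    flipped = tuple(-x for x in reversed(perm[i:j + 1]))
    return perm[:i] + flipped + perm[j + 1:]


def apply_sequence(perm, sequence):
    """Apply the reversals of sequence to perm, one after the other."""
    for rho in sequence:
        perm = apply_reversal(perm, rho)
    return perm


origin = (-3, 2, 1, -4)
example = [{1}, {2}, {4}, {1, 2, 3}]
print(apply_sequence(origin, example))  # prints (1, 2, 3, 4)
```

Identifying a reversal by its set of labels rather than by a pair of positions follows the notation of the example, where the same reversal {1, 2, 3} may sit at different positions depending on what was applied before it.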

2.2 Main functionalities

2.2.1 Computing traces

Given two permutations $\pi_O$ and $\pi_T$, the enumeration of all solutions (sequences) that sort $\pi_O$ into $\pi_T$ can be done by iterating an algorithm given by Siepel (2003). However, the number of solutions is huge and the complexity of enumerating all of them is O(n2) (Braga, 2009). Bergeron et al. (2002) introduced a more compact representation of the space of solutions, grouping them into equivalence classes called traces. All equivalent solutions in a trace are composed of the same reversals, applied in different orders. Note, however, that this is not the formal definition of a trace, which can be found in Braga (2009). Braga et al. (2008) later proposed an algorithm to directly give one representative solution and the number of solutions in each trace. The complexity of this algorithm is also exponential in a property of the traces called width (Braga, 2009), but, as the number of traces is usually much smaller than the number of solutions, enumerating traces runs considerably faster. The framework baobabLUNA contains the implementation of the algorithm developed by Braga et al. (2008). As a simple example of the gain of this algorithm over the enumeration of all solutions, the 28 solutions that sort $(-3, 2, 1, -4)$ into $I_4$ can be grouped into only two traces: one is represented by {1}{1, 2, 3}{2}{4} and has 24 solutions, while the other is represented by {1, 2, 4}{3}{1, 3, 4}{2, 3, 4} and has 4 solutions. More details on how the algorithm directly generates the traces and counts the number of solutions in each trace can be found in Braga (2009).
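The small example above can be checked by brute force. The following Python sketch is neither Siepel's algorithm nor the trace algorithm of Braga et al. (2008): it simply tries every reversal level by level until the target is reached, then groups the optimal sequences by the set of reversals they use, which is the informal description of a trace given above and not its formal definition. All function names are hypothetical, and the approach is only practical for very small permutations.

```python
from collections import defaultdict


def reversals(perm):
    """Yield (interval, result) for every reversal applicable to perm."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            rho = frozenset(abs(x) for x in perm[i:j + 1])
            flipped = tuple(-x for x in reversed(perm[i:j + 1]))
            yield rho, perm[:i] + flipped + perm[j + 1:]


def optimal_sequences(origin, target):
    """Return every shortest reversal sequence from origin to target."""
    frontier = [(origin, ())]
    while True:
        hits = [path for perm, path in frontier if perm == target]
        if hits:
            # First level at which the target appears: these paths are optimal.
            return hits
        frontier = [(nxt, path + (rho,))
                    for perm, path in frontier
                    for rho, nxt in reversals(perm)]


def group_by_reversal_set(sequences):
    """Group sequences that use the same reversals in different orders."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[frozenset(seq)].append(seq)
    return groups


solutions = optimal_sequences((-3, 2, 1, -4), (1, 2, 3, 4))
groups = group_by_reversal_set(solutions)
print(len(solutions), sorted(len(g) for g in groups.values()))  # prints 28 [4, 24]
```

The frontier grows by a factor of n(n+1)/2 at each level, which illustrates why enumerating all solutions quickly becomes impractical and why a direct enumeration of traces is attractive.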

2.2.2 Filtering traces with constraints

Biological constraints can be used to filter the traces of optimal sequences, as described in Braga (2009). Besides the two signed permutations $\pi_O$ and $\pi_T$, this approach requires a list C of compatible constraints, and selects the sequences that sort $\pi_O$ into $\pi_T$ and respect the given constraints. Frequently, only a subset of the sorting sequences of a trace agrees with the constraints in C; this subset is called a C-induced subtrace. The result of applying this method is the complete set of non-empty C-induced subtraces of sequences sorting $\pi_O$ into $\pi_T$. In general, there is no guarantee that a sorting sequence respecting all constraints exists, so this approach can lead to an empty result.

One of the considered constraints is the list of common intervals detected between the two initial permutations, which may correspond to clusters of co-localized genes in the considered genomes: an optimal sequence of reversals that does not break the common intervals may be more realistic than one that does. This approach was previously used in several studies [see, for instance, Diekmann et al. (2007)]. We used the common intervals initially detected and also a variation of this approach, described in Braga (2009), which uses the list of common intervals progressively detected when sorting one permutation into another by reversals.

Another constraint implemented in baobabLUNA is called strata and is specific to the evolution of the X and Y sex chromosomes in mammals and some other organisms. Although X and Y are usually very different, they still share an identical region (called the 'pseudo-autosomal' region) at one of their extremities and are believed to have evolved from an identical pair of chromosomes. This process is at the origin of sexual differentiation: the female XX and the male XY pairs. Current theories suggest that the pseudo-autosomal region, which originally covered the whole chromosomes, was successively pruned by a few big reversals on the Y chromosome (Lahn and Page, 1999). The successive limits of the pseudo-autosomal region on the X chromosome represent the limits of what have been called the 'evolutionary strata' of the X chromosome, and a sequence of reversals that could have created the strata on the human X chromosome is given by Ross et al. (2005). The use of the strata as a constraint to filter the space of solutions of the sorting by reversals problem is described in Braga et al. (2008) and is used by Lemaitre et al. (2009) to evaluate the scenario of reversals given by Ross et al. (2005).
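Returning to the common-interval constraint described in this section, the sketch below illustrates the idea on the example (−3, 2, 1, −4). It rests on an assumption that should be checked against Braga (2009): a reversal is taken to break a common interval when the two sets overlap, that is, they intersect and neither contains the other. The function names are hypothetical, only the initially detected common intervals are used, and the check is applied to single sequences, whereas baobabLUNA works on traces and subtraces.

```python
def intervals(perm):
    """All intervals of perm, each as a frozenset of unsigned labels."""
    n = len(perm)
    return {frozenset(abs(x) for x in perm[i:j + 1])
            for i in range(n) for j in range(i, n)}


def common_intervals(perm_a, perm_b):
    """Sets of labels that are intervals of both permutations."""
    return intervals(perm_a) & intervals(perm_b)


def overlaps(a, b):
    """True if a and b intersect and neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def preserves(sequence, constraints):
    """True if no reversal of sequence overlaps any constraint (assumed criterion)."""
    return not any(overlaps(frozenset(rho), c)
                   for rho in sequence for c in constraints)


origin, target = (-3, 2, 1, -4), (1, 2, 3, 4)
commons = common_intervals(origin, target)
trace_1 = [{1}, {1, 2, 3}, {2}, {4}]
trace_2 = [{1, 2, 4}, {3}, {1, 3, 4}, {2, 3, 4}]
print(preserves(trace_1, commons), preserves(trace_2, commons))  # prints True False
```

Under this assumed criterion, the representative of the first trace keeps every common interval intact, while the representative of the second is rejected because the reversal {1, 2, 4} overlaps the common interval {2, 3}.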

2.3 Experiments

To evaluate the performance of the algorithm that directly computes the traces, named traces, we compared it with the algorithm enumSol, which enumerates all solutions. We also tested the filters perfTrcs, which selects traces whose solutions do not break the common intervals initially detected; prgSubt, which selects subtraces whose solutions do not break the common intervals progressively detected; and strSubt, which selects subtraces whose solutions produce given strata in the origin permutation. The analysed permutations are $\pi_A$=(−12,11,−10,6,13,−5,2,7,8,−9,3,4,1) and $\pi_B$=(−12,11,−10,−1,16,−4,−3,15,−14,9,−8,−7,−2,−13,5,−6) (both fictitious); Rfe=(1,3,−2,−11,5,−9,−10,8,6,−7,−4,12) and R2=$I_{12}$ [the bacterium Rickettsia felis and its ancestor R2, reconstructed in Blanc et al. (2007)]; and X=$I_{12}$ and Y=(−12,11,−2,−1,−10,−9,8,−5,7,6,−4,3) [human X and Y chromosomes, as in the scenario proposed in Ross et al. (2005)]. The results are in Table 1 and show that computing traces directly indeed runs much faster than computing solutions. Moreover, the variants that take constraints into consideration usually run faster than computing all traces. Additional analyses and experimental results can be found in Braga (2009).
Table 1.

Computation results for each pair of permutations (the number of elements and reversal distance of each pair is given in the first column).

PERMUT.       Algorithm   N_S            N_T      Execution time
πA, I12       enumSol     8 278 540      -        13.5 min
n=12, d=10    traces      8 278 540      215127 sec
              perfTrcs    1 698 480      124 sec
              prgSubt     453 600        3        2 sec
πB, I16       enumSol     505 634 256    -        16 h
n=16, d=12    traces      505 634 256    21 902   7.3 min
              perfTrcs    122 862 960    17127 sec
              prgSubt     5 963 760      614 sec
Rfe, R2       enumSol     546 840        -        42 sec
n=12, d=9     traces      546 840        13       3 sec
              prgSubt     263 088        6        2 sec
X, Y          enumSol     31 752         -        5 sec
n=12, d=8     traces      31 752         6        1.3 sec
              strSubt     420            1        0.5 sec

The columns $N_S$ and $N_T$ give, respectively, the number of sorting sequences and traces computed by each algorithm. Experiments were made on a 64 bit personal computer with two 3 GHz CPUs and 2 GB of RAM.


2.4 Download, setup and tutorial

Download and setup instructions, interface description and tutorial for computing traces (including the versions that take constraints into consideration) are available at http://pbil.univ-lyon1.fr/software/luna.

3 FINAL REMARKS

The framework baobabLUNA contains the implementation of a method proposed by Braga et al. (2008) that gives a compact representation of the solution space of the sorting by reversals problem, grouping solutions into traces. This is an interesting alternative to most previous methods, which give either only one or all solutions and are provided by tools such as GRIMM (Tesler, 2002) and GRAPPA (Moret et al., 2001). However, although the number of traces is much smaller than the number of solutions, it may still be too large to be interpreted and, in some cases, too large to be computed. Indeed, we are currently unable to compute traces for permutations with a reversal distance of about 20 or higher. Different biological constraints can be used to filter the traces and reduce the universe to be handled. Nevertheless, there is no guarantee that a solution respecting the given constraints exists, so this approach may lead to empty results.
  9 in total

1.  A new implementation and detailed study of breakpoint analysis.

Authors:  B M Moret; S Wyman; D A Bader; T Warnow; M Yan
Journal:  Pac Symp Biocomput       Date:  2001

2.  GRIMM: genome rearrangements web server.

Authors:  Glenn Tesler
Journal:  Bioinformatics       Date:  2002-03       Impact factor: 6.937

3.  An algorithm to enumerate sorting reversals for signed permutations.

Authors:  Adam C Siepel
Journal:  J Comput Biol       Date:  2003       Impact factor: 1.479

4.  Evolution under reversals: parsimony and conservation of common intervals.

Authors:  Yoan Diekmann; Marie-France Sagot; Eric Tannier
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2007 Apr-Jun       Impact factor: 3.710

5.  Exploring the solution space of sorting by reversals, with experiments and an application to evolution.

Authors:  Marília D V Braga; Marie-France Sagot; Celine Scornavacca; Eric Tannier
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2008 Jul-Sep       Impact factor: 3.710

6.  Four evolutionary strata on the human X chromosome.

Authors:  B T Lahn; D C Page
Journal:  Science       Date:  1999-10-29       Impact factor: 47.728

7.  The DNA sequence of the human X chromosome.

Authors:  Mark T Ross; Darren V Grafham; Alison J Coffey; Steven Scherer; Kirsten McLay; Donna Muzny; Matthias Platzer; Gareth R Howell; Christine Burrows; Christine P Bird; Adam Frankish; Frances L Lovell; Kevin L Howe; Jennifer L Ashurst; Robert S Fulton; Ralf Sudbrak; Gaiping Wen; Matthew C Jones; Matthew E Hurles; T Daniel Andrews; Carol E Scott; Stephen Searle; Juliane Ramser; Adam Whittaker; Rebecca Deadman; Nigel P Carter; Sarah E Hunt; Rui Chen; Andrew Cree; Preethi Gunaratne; Paul Havlak; Anne Hodgson; Michael L Metzker; Stephen Richards; Graham Scott; David Steffen; Erica Sodergren; David A Wheeler; Kim C Worley; Rachael Ainscough; Kerrie D Ambrose; M Ali Ansari-Lari; Swaroop Aradhya; Robert I S Ashwell; Anne K Babbage; Claire L Bagguley; Andrea Ballabio; Ruby Banerjee; Gary E Barker; Karen F Barlow; Ian P Barrett; Karen N Bates; David M Beare; Helen Beasley; Oliver Beasley; Alfred Beck; Graeme Bethel; Karin Blechschmidt; Nicola Brady; Sarah Bray-Allen; Anne M Bridgeman; Andrew J Brown; Mary J Brown; David Bonnin; Elspeth A Bruford; Christian Buhay; Paula Burch; Deborah Burford; Joanne Burgess; Wayne Burrill; John Burton; Jackie M Bye; Carol Carder; Laura Carrel; Joseph Chako; Joanne C Chapman; Dean Chavez; Ellson Chen; Guan Chen; Yuan Chen; Zhijian Chen; Craig Chinault; Alfredo Ciccodicola; Sue Y Clark; Graham Clarke; Chris M Clee; Sheila Clegg; Kerstin Clerc-Blankenburg; Karen Clifford; Vicky Cobley; Charlotte G Cole; Jen S Conquer; Nicole Corby; Richard E Connor; Robert David; Joy Davies; Clay Davis; John Davis; Oliver Delgado; Denise Deshazo; Pawandeep Dhami; Yan Ding; Huyen Dinh; Steve Dodsworth; Heather Draper; Shannon Dugan-Rocha; Andrew Dunham; Matthew Dunn; K James Durbin; Ireena Dutta; Tamsin Eades; Matthew Ellwood; Alexandra Emery-Cohen; Helen Errington; Kathryn L Evans; Louisa Faulkner; Fiona Francis; John Frankland; Audrey E Fraser; Petra Galgoczy; James Gilbert; Rachel Gill; Gernot Glöckner; Simon G Gregory; Susan Gribble; Coline Griffiths; 
Russell Grocock; Yanghong Gu; Rhian Gwilliam; Cerissa Hamilton; Elizabeth A Hart; Alicia Hawes; Paul D Heath; Katja Heitmann; Steffen Hennig; Judith Hernandez; Bernd Hinzmann; Sarah Ho; Michael Hoffs; Phillip J Howden; Elizabeth J Huckle; Jennifer Hume; Paul J Hunt; Adrienne R Hunt; Judith Isherwood; Leni Jacob; David Johnson; Sally Jones; Pieter J de Jong; Shirin S Joseph; Stephen Keenan; Susan Kelly; Joanne K Kershaw; Ziad Khan; Petra Kioschis; Sven Klages; Andrew J Knights; Anna Kosiura; Christie Kovar-Smith; Gavin K Laird; Cordelia Langford; Stephanie Lawlor; Margaret Leversha; Lora Lewis; Wen Liu; Christine Lloyd; David M Lloyd; Hermela Loulseged; Jane E Loveland; Jamieson D Lovell; Ryan Lozado; Jing Lu; Rachael Lyne; Jie Ma; Manjula Maheshwari; Lucy H Matthews; Jennifer McDowall; Stuart McLaren; Amanda McMurray; Patrick Meidl; Thomas Meitinger; Sarah Milne; George Miner; Shailesh L Mistry; Margaret Morgan; Sidney Morris; Ines Müller; James C Mullikin; Ngoc Nguyen; Gabriele Nordsiek; Gerald Nyakatura; Christopher N O'Dell; Geoffery Okwuonu; Sophie Palmer; Richard Pandian; David Parker; Julia Parrish; Shiran Pasternak; Dina Patel; Alex V Pearce; Danita M Pearson; Sarah E Pelan; Lesette Perez; Keith M Porter; Yvonne Ramsey; Kathrin Reichwald; Susan Rhodes; Kerry A Ridler; David Schlessinger; Mary G Schueler; Harminder K Sehra; Charles Shaw-Smith; Hua Shen; Elizabeth M Sheridan; Ratna Shownkeen; Carl D Skuce; Michelle L Smith; Elizabeth C Sotheran; Helen E Steingruber; Charles A Steward; Roy Storey; R Mark Swann; David Swarbreck; Paul E Tabor; Stefan Taudien; Tineace Taylor; Brian Teague; Karen Thomas; Andrea Thorpe; Kirsten Timms; Alan Tracey; Steve Trevanion; Anthony C Tromans; Michele d'Urso; Daniel Verduzco; Donna Villasana; Lenee Waldron; Melanie Wall; Qiaoyan Wang; James Warren; Georgina L Warry; Xuehong Wei; Anthony West; Siobhan L Whitehead; Mathew N Whiteley; Jane E Wilkinson; David L Willey; Gabrielle Williams; Leanne Williams; Angela Williamson; Helen 
Williamson; Laurens Wilming; Rebecca L Woodmansey; Paul W Wray; Jennifer Yen; Jingkun Zhang; Jianling Zhou; Huda Zoghbi; Sara Zorilla; David Buck; Richard Reinhardt; Annemarie Poustka; André Rosenthal; Hans Lehrach; Alfons Meindl; Patrick J Minx; Ladeana W Hillier; Huntington F Willard; Richard K Wilson; Robert H Waterston; Catherine M Rice; Mark Vaudin; Alan Coulson; David L Nelson; George Weinstock; John E Sulston; Richard Durbin; Tim Hubbard; Richard A Gibbs; Stephan Beck; Jane Rogers; David R Bentley
Journal:  Nature       Date:  2005-03-17       Impact factor: 49.962

8.  Footprints of inversions at present and past pseudoautosomal boundaries in human sex chromosomes.

Authors:  Claire Lemaitre; Marilia D V Braga; Christian Gautier; Marie-France Sagot; Eric Tannier; Gabriel A B Marais
Journal:  Genome Biol Evol       Date:  2009-04-30       Impact factor: 3.416

9.  Reductive genome evolution from the mother of Rickettsia.

Authors:  Guillaume Blanc; Hiroyuki Ogata; Catherine Robert; Stéphane Audic; Karsten Suhre; Guy Vestris; Jean-Michel Claverie; Didier Raoult
Journal:  PLoS Genet       Date:  2007-01-19       Impact factor: 5.917
