Nan Gao, Ning Yang, Jijun Tang.
Abstract
Recent advances in technology have made it routine to obtain and compare gene orders within genomes. Rearrangements of gene orders by operations such as reversal and transposition are rare events that enable researchers to reconstruct deep evolutionary histories. An important application of genome rearrangement analysis is to infer the gene orders of ancestral genomes, which is valuable for identifying patterns of evolution and for modeling evolutionary processes. Among the available methods, parsimony-based methods (including GRAPPA and MGR) are the most widely used. Since the core algorithms of these methods are solvers for the so-called median problem, developing efficient and accurate median solvers has attracted considerable attention in this field. The "double-cut-and-join" (DCJ) model uses the single DCJ operation to account for all genome rearrangement events. Because the DCJ model is mathematically much simpler than handling each event type directly, parsimony methods using DCJ median solvers have better speed and accuracy. However, the DCJ median problem is NP-hard, and although several exact algorithms are available, they all have great difficulty when the given genomes are distant. In this paper, we present a new method that combines a genetic algorithm (GA) with genomic sorting to solve the DCJ median problem in limited time and space, especially on large and distant datasets. Our experimental results show that this new GA-based method can find optimal or near-optimal results for problems ranging from easy to very difficult. Compared to existing parsimony methods, which may severely underestimate the true number of evolutionary events, the sorting-based approach can infer ancestral genomes that are much closer to their true ancestors. The code is available at http://phylo.cse.sc.edu.
Year: 2013 PMID: 23658708 PMCID: PMC3642123 DOI: 10.1371/journal.pone.0062156
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The DCJ median problem and its bounding box formed by the three outer edges.
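For reference, the median problem named in this caption can be written compactly. The notation below (M for a candidate median, d for the DCJ distance, G_1 to G_3 for the three given genomes) is chosen here for illustration. The two bounds are one standard way to bracket the optimal median score: they follow from the triangle inequality alone, assuming d is a metric, and are not necessarily the exact bounding box drawn in the figure.

```latex
M^{*} = \arg\min_{M} \sum_{i=1}^{3} d(M, G_i),
\qquad
\frac{d(G_1,G_2) + d(G_1,G_3) + d(G_2,G_3)}{2}
\;\le\; \sum_{i=1}^{3} d(M^{*}, G_i)
\;\le\; \min_{j} \sum_{i \ne j} d(G_j, G_i)
```

The upper bound corresponds to simply picking one of the three input genomes as the median.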
Figure 2. Adjacency graph and DCJ distance of two genomes G1 = (3,−1,−4,2,5) and G2 = (1,2,3,4,5). The number of cycles C is 1, the number of (odd-length) paths I is 2, and with N = 5 genes the DCJ distance is $N - (C + I/2) = 5 - (1 + 2/2) = 3$.
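The cycle and path counts in Figure 2 can be checked with a short script. This is a minimal sketch, assuming both genomes are linear, share the same gene set, and that the distance is the standard DCJ formula N − (C + I/2), with I the number of odd-length paths. The function names are illustrative and are not taken from the authors' code.

```python
from collections import defaultdict


def extremities(genome):
    """Ordered gene extremities (tail 't', head 'h') of a signed linear genome."""
    ends = []
    for g in genome:
        tail, head = (abs(g), "t"), (abs(g), "h")
        ends.extend([tail, head] if g > 0 else [head, tail])
    return ends


def adjacencies(genome):
    """Map each extremity to the adjacency or telomere vertex that contains it."""
    ends = extremities(genome)
    vertices = [(ends[0],)]                      # left telomere
    for i in range(1, len(ends) - 1, 2):
        vertices.append((ends[i], ends[i + 1]))  # internal adjacency
    vertices.append((ends[-1],))                 # right telomere
    return {e: v for v in vertices for e in v}


def dcj_distance(a, b):
    """Return (distance, cycles, odd_paths) for two linear genomes a and b."""
    adj_a, adj_b = adjacencies(a), adjacencies(b)
    # Every extremity is one edge between its vertex in a and its vertex in b.
    neighbors = defaultdict(list)
    for e in adj_a:
        u, v = ("A", adj_a[e]), ("B", adj_b[e])
        neighbors[u].append(v)
        neighbors[v].append(u)

    seen, cycles, odd_paths = set(), 0, 0
    for start in neighbors:
        if start in seen:
            continue
        # Collect one connected component with a depth-first walk.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        edges = sum(len(neighbors[n]) for n in component) // 2
        has_telomere = any(len(n[1]) == 1 for n in component)
        if not has_telomere:
            cycles += 1        # no telomere: the component is a cycle
        elif edges % 2 == 1:
            odd_paths += 1     # path with an odd number of edges
    return len(a) - (cycles + odd_paths // 2), cycles, odd_paths


print(dcj_distance([3, -1, -4, 2, 5], [1, 2, 3, 4, 5]))  # (3, 1, 2)
```

For the genomes in the caption, the sketch finds one cycle and two odd paths, giving a distance of 3.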
Figure 3. Adjacency graphs of each stage of one DCJ sorting sequence that transforms (3 −1 −4 2 5) to (1 2 3 4 5).
Comparison of median scores.
| Median score (the lower the better) | r = 20 | r = 40 | r = 60 | r = 80 | r = 100 | r = 120 | r = 140 | r = 160 | r = 180 | r = 200 |
| Our GA Method | 53.7 | 109.8 | 155.5 | 180.9 | 232.1 | 247.1 | 279.4 | 287.7 | 281.6 | 309.1 |
| ASMedian | 53.7 | 109.8 | 154.8 | 175.5 | 228 | 242.3 | - | - | - | - |
| Perfect Score | 53.6 | 109.4 | 152.2 | 173.4 | 210.6 | 221.8 | 242.4 | 254.8 | 244.4 | 261.9 |
r is the average number of events per edge. “-” indicates that a method could not finish.
Comparison of the breakpoint distance from the inferred median to the true ancestor.
| Breakpoint distance to the true ancestor (the lower the better) | r = 20 | r = 40 | r = 60 | r = 80 | r = 100 | r = 120 | r = 140 | r = 160 | r = 180 | r = 200 |
| Our GA Method | 0.3 | 0.4 | 5.0 | 9.9 | 28 | 32.7 | 44.9 | 49.2 | 57.5 | 54.9 |
| ASMedian | 0.4 | 0.3 | 6.3 | 15.6 | 40.7 | 50.5 | - | - | - | - |
r is the average number of events per edge. “-” indicates that a method could not finish.
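The breakpoint distance used in this comparison can be illustrated with a small sketch. It assumes linear genomes, counts only internal adjacencies (telomeres are ignored), and treats the adjacency (a, b) as identical to (−b, −a). The exact variant and normalization used in the paper may differ, and the function names are hypothetical.

```python
def adjacency_set(genome):
    """Canonical signed adjacencies; (a, b) and (-b, -a) are the same adjacency."""
    pairs = set()
    for a, b in zip(genome, genome[1:]):
        pairs.add(min((a, b), (-b, -a)))
    return pairs


def breakpoint_distance(inferred, truth):
    """Number of adjacencies of the true genome that the inferred genome lacks."""
    return len(adjacency_set(truth) - adjacency_set(inferred))


print(breakpoint_distance([3, -1, -4, 2, 5], [1, 2, 3, 4, 5]))  # 4
```

Under this definition, a genome and its full reverse complement have distance 0, since they contain the same adjacencies.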
Number of generations to find the best genome.
| | r = 20 | r = 40 | r = 60 | r = 80 | r = 100 | r = 120 | r = 140 | r = 160 | r = 180 | r = 200 |
| Average | 7.9 | 27.3 | 43 | 50.6 | 94.3 | 128.6 | 99.4 | 142.8 | 172.2 | 180.4 |
| Max | 21 | 104 | 108 | 110 | 201 | 290 | 151 | 303 | 337 | 496 |