Zhaojuan Zhang, Wanliang Wang, Ruofan Xia, Gaofeng Pan, Jiandong Wang, Jijun Tang.
Abstract
BACKGROUND: Reconstructing ancestral genomes is one of the central problems in genome rearrangement analysis, since finding the most likely true ancestor is of significant importance in phylogenetic reconstruction. Large-scale genome rearrangements can provide essential insights into evolutionary processes. However, when the genomes are large and distant, classical median solvers fail to address these challenges adequately because the search space grows exponentially. Ancestral genome inference therefore remains a challenge for current methods, and the ongoing rapid accumulation of whole-genome data makes it harder still.
Keywords: Ancestral genome inference; DCJ sorting; Discrete optimization; Genome arrangement; Quantum-behaved particle swarm optimization
Year: 2020 PMID: 33176688 PMCID: PMC7656761 DOI: 10.1186/s12859-020-03833-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Adjacency graph of two genomes. The graph has 2 paths and 1 cycle, and the length of the genome is 5. Eq. (1) gives the DCJ distance between the two genomes from these counts
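As a worked illustration of the counts in this caption, the sketch below evaluates a DCJ distance from the number of genes, cycles, and odd-length paths. It assumes that Eq. (1) is the standard formula d = N - (C + I/2), where I is the number of odd-length paths. The function name `dcj_distance_from_counts` is hypothetical, and treating both paths of the figure as odd-length is an assumption that the caption does not confirm.

```python
def dcj_distance_from_counts(n_genes: int, n_cycles: int, n_odd_paths: int) -> float:
    """DCJ distance d = N - (C + I/2) from adjacency-graph counts.

    Assumes the standard formula; n_odd_paths counts odd-length paths only.
    """
    if min(n_genes, n_cycles, n_odd_paths) < 0:
        raise ValueError("counts must be non-negative")
    return n_genes - (n_cycles + n_odd_paths / 2)


# Counts from the caption: N = 5, C = 1, and 2 paths (assumed here to be odd).
print(dcj_distance_from_counts(5, 1, 2))
```

Under that assumption the call evaluates to 3; if both paths were even-length instead, the same formula would give 4.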
Fig. 2 The median problem. Given three genomes, find a median genome that minimizes the sum of its DCJ distances to the three
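The median score of a candidate can be sketched as follows. This is a minimal illustration, not the solver from the paper: it is restricted to circular, single-chromosome genomes written as signed gene lists, where the adjacency graph contains only cycles and the DCJ distance reduces to N - C. The names `extremity_pairs`, `dcj_distance_circular`, and `median_score` are hypothetical.

```python
def extremity_pairs(genome):
    """Map each gene extremity to its neighbour in a circular signed genome."""
    partner = {}
    n = len(genome)
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]
        right = (abs(a), "h" if a > 0 else "t")  # extremity where gene a ends
        left = (abs(b), "t" if b > 0 else "h")   # extremity where gene b starts
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance_circular(a, b):
    """DCJ distance N - C for two circular genomes on the same gene set."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("genomes must contain the same genes")
    pa, pb = extremity_pairs(a), extremity_pairs(b)
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:  # alternate adjacencies of a and b until the cycle closes
            seen.add(v)
            u = pa[v]
            seen.add(u)
            v = pb[u]
    return len(a) - cycles


def median_score(candidate, genomes):
    """Sum of DCJ distances from a candidate median to the given genomes."""
    return sum(dcj_distance_circular(candidate, g) for g in genomes)


g1, g2, g3 = [1, 2, 3, 4], [1, -2, 3, 4], [1, 2, -3, 4]
print(median_score(g1, [g1, g2, g3]), median_score(g2, [g1, g2, g3]))
```

In this toy instance `g1` is one inversion away from each of the other two genomes, so it scores lower than `g2` as a candidate median.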
Fig. 3 The procedure for updating mbest using two averages of the fitness value. Panel a (left) shows the distance of each particle to the first average fitness value, taken over all the particles; the particle closest to this average is selected as cbest. Panel b (right) shows the distance of each particle to the second average fitness value, taken over the top particles; the particle closest to this average is selected as mbest
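The two-average selection described in this caption might look like the sketch below. Several details are assumptions rather than the paper's definitions: fitness is taken as a median score where lower is better, the "top particles" are taken to be the best `top_fraction` of the population, and the function name `select_cbest_mbest` is hypothetical.

```python
def select_cbest_mbest(fitness, top_fraction=0.5):
    """Return (cbest_index, mbest_index) using two fitness averages.

    Assumptions: lower fitness is better, and the top particles are the
    best `top_fraction` of the population (at least one particle).
    """
    n = len(fitness)
    if n == 0:
        raise ValueError("population is empty")
    # First average over all particles; cbest is the particle closest to it.
    first_avg = sum(fitness) / n
    cbest = min(range(n), key=lambda i: abs(fitness[i] - first_avg))
    # Second average over the top particles; mbest is the top particle closest to it.
    k = max(1, int(n * top_fraction))
    top = sorted(range(n), key=lambda i: fitness[i])[:k]
    second_avg = sum(fitness[i] for i in top) / k
    mbest = min(top, key=lambda i: abs(fitness[i] - second_avg))
    return cbest, mbest


scores = [10, 12, 20, 30, 40, 50]
print(select_cbest_mbest(scores))
```

For these scores the first average is 27, so the particle with score 30 becomes cbest; the second average over the three best particles is 14, so the particle with score 12 becomes mbest.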
Fig. 4 Overview of the algorithm. Each generation is assumed to contain a set of M genomes together with mbest. The population of the next generation is created by sorting each genome toward the best genome
Performance of IDQPSO-Median with respect to size of population
| # Events | Population size | Median score | Distance to true ancestor | Adjacency accuracy | Mean time (s) |
|---|---|---|---|---|---|
| 100 | 20 | 299.45 | 0.20 | 1 | 2 |
| 100 | 40 | 299.45 | 0.20 | 1 | 2 |
| 100 | 60 | 299.45 | 0.15 | 1 | 5 |
| 200 | 20 | 599.95 | 1.50 | 0.998 | 11 |
| 200 | 40 | 599.95 | 1.05 | 0.999 | 15 |
| 200 | 60 | 598.85 | 0.95 | 0.999 | 25 |
| 300 | 20 | 922.20 | 63.50 | 0.905 | 73 |
| 300 | 40 | 921.15 | 62.05 | 0.909 | 81 |
| 300 | 60 | 920.50 | 61.75 | 0.910 | 84 |
| 400 | 20 | 1244.35 | 251.85 | 0.684 | 89 |
| 400 | 40 | 1242.25 | 250.80 | 0.686 | 91 |
| 400 | 60 | 1240.45 | 249.50 | 0.687 | 124 |
| 500 | 20 | 1462.70 | 430.05 | 0.507 | 89 |
| 500 | 40 | 1459.85 | 428.45 | 0.508 | 100 |
| 500 | 60 | 1458.65 | 425.05 | 0.511 | 121 |
| 600 | 20 | 1610.50 | 574.60 | 0.367 | 62 |
| 600 | 40 | 1608.25 | 572.85 | 0.370 | 87 |
| 600 | 60 | 1608.80 | 572.75 | 0.370 | 140 |
| 700 | 20 | 1697.95 | 670.50 | 0.288 | 84 |
| 700 | 40 | 1693.60 | 669.15 | 0.289 | 99 |
| 700 | 60 | 1693.55 | 668.65 | 0.290 | 142 |
| 800 | 20 | 1763.70 | 748.85 | 0.214 | 95 |
| 800 | 40 | 1759.85 | 748.15 | 0.216 | 125 |
| 800 | 60 | 1759.35 | 747.85 | 0.218 | 155 |
| 900 | 20 | 1800.45 | 802.10 | 0.172 | 48 |
| 900 | 40 | 1797.80 | 802.05 | 0.172 | 100 |
| 900 | 60 | 1798.00 | 800.80 | 0.173 | 131 |
| 1000 | 20 | 1827.55 | 848.05 | 0.134 | 62 |
| 1000 | 40 | 1823.35 | 846.25 | 0.134 | 106 |
| 1000 | 60 | 1820.45 | 845.70 | 0.136 | 130 |
Median score of IDQPSO-Median, SAMedian, GAMedian, and ASMedian
| Median solver | Median score | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | r = 100 | r = 200 | r = 300 | r = 400 | r = 500 | r = 600 | r = 700 | r = 800 | r = 900 | r = 1000 |
| IDQPSO-Median | | 598.9 | 920.5 | 1240.5 | | | | | | |
| SAMedian | 299.5 | 599.4 | 933.4 | 1284.4 | 1516.1 | 1664.0 | 1750.3 | 1811.4 | 1850.0 | 1876.2 |
| GAMedian | 333.0 | 747.8 | 1166.6 | 1464.1 | 1648.5 | 1764.8 | 1835.3 | 1890.4 | 1918.3 | 1940.0 |
| ASMedian | 299.5 | 1621.8 | 1719.3 | 1790.9 | 1830.2 | 1856.2 | | | | |
The best values of all the compared algorithms are indicated in italics
Distance to true ancestors of IDQPSO-Median, SAMedian, GAMedian, and ASMedian
| Median solver | Distance to true ancestors | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | r = 100 | r = 200 | r = 300 | r = 400 | r = 500 | r = 600 | r = 700 | r = 800 | r = 900 | r = 1000 |
| IDQPSO-Median | | 0.95 | 61.75 | 249.50 | | | | | | |
| SAMedian | 0.25 | 1.75 | 75.65 | 290.25 | 467.20 | 602.70 | 699.40 | 767.00 | 817.60 | 857.90 |
| GAMedian | 30.35 | 147.40 | 284.65 | 389.40 | 500.95 | 595.50 | 677.45 | 754.20 | 803.60 | 848.40 |
| ASMedian | 0.35 | 451.35 | 615.90 | 726.40 | 802.55 | 854.40 | 888.00 | | | |
The best values of all the compared algorithms are indicated in italics
Adjacency accuracy of IDQPSO-Median, SAMedian, GAMedian, and ASMedian
| Median solver | Adjacency accuracy | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | r = 100 | r = 200 | r = 300 | r = 400 | r = 500 | r = 600 | r = 700 | r = 800 | r = 900 | r = 1000 |
| IDQPSO-Median | | 0.999 | 0.910 | | | | | | | |
| SAMedian | 1.00 | 0.99 | 0.80 | 0.48 | 0.31 | 0.208 | 0.149 | 0.112 | 0.086 | 0.066 |
| GAMedian | 0.89 | 0.60 | 0.40 | 0.31 | 0.24 | 0.184 | 0.147 | 0.112 | 0.089 | 0.071 |
| ASMedian | 1.00 | 0.57 | 0.35 | 0.219 | 0.146 | 0.101 | 0.073 | 0.055 | | |
The best values of all the compared algorithms are indicated in italics
Mean running time of IDQPSO-Median, SAMedian, GAMedian, and ASMedian
| Median solver | Mean running time (s) | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | r = 100 | r = 200 | r = 300 | r = 400 | r = 500 | r = 600 | r = 700 | r = 800 | r = 900 | r = 1000 |
| IDQPSO-Median | 5 | 25 | | | | | | | | |
| SAMedian | 277 | 327 | 430 | 470 | 503 | 454 | 440 | 427 | 422 | 417 |
| GAMedian | 36,112 | 34,779 | 34,091 | 33,436 | 32,994 | 32,908 | 32,735 | 32,528 | 32,445 | 32,420 |
| ASMedian | 2725 | 12,875 | 17,787 | 48,625 | 96,020 | 123,077 | 131,510 | 142,356 | | |
The best values of all the compared algorithms are indicated in italics
Average Robinson–Foulds (RF) errors for IDQPSO, Simulated Annealing and GRAPPA
| Program | RF error (%) (No transposition) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | r = 20 | r = 40 | r = 60 | r = 80 | r = 100 | r = 120 | r = 140 | r = 160 | r = 180 |
| IDQPSO-Median | 0 | 0 | 0 | | | | | | |
| SA-Median | 0 | 0 | 0 | 2.5 | 0 | 2.5 | 6.3 | 5.0 | 10.0 |
| GRAPPA-Exact | 0 | 0 | 0 | – | – | – | – | – | |
| Neighbor-joining | 0 | 0 | 0 | 2.5 | 6.3 | 6.3 | 12.5 | 20 | 20 |
– indicates that a program could not finish within 5 days of computation. For the IDQPSO and Simulated Annealing methods, results for runs that reached this limit are from the best trees obtained within the 5 days
The best values of all the compared algorithms are indicated in italics
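For readers unfamiliar with the RF error reported in this table, the sketch below computes it for two trees given as sets of bipartitions. It assumes the error is the number of bipartitions found in only one of the trees, divided by the total number of bipartitions in both and expressed as a percentage; this is one common normalization, and the source does not state which it uses. The names `canonical_split` and `rf_error_percent`, and the five-taxon example, are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]


def canonical_split(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    all_taxa = frozenset(taxa)
    one_side = frozenset(side)
    reference = min(all_taxa)
    return all_taxa - one_side if reference in one_side else one_side


def rf_error_percent(true_splits: Set[Split], inferred_splits: Set[Split]) -> float:
    """Percentage of internal-edge bipartitions not shared by the two trees."""
    false_neg = len(true_splits - inferred_splits)  # true edges that were missed
    false_pos = len(inferred_splits - true_splits)  # inferred edges that are wrong
    total = len(true_splits) + len(inferred_splits)
    return 100.0 * (false_neg + false_pos) / total if total else 0.0


taxa = "ABCDE"
true_tree = {canonical_split("AB", taxa), canonical_split("DE", taxa)}
inferred_tree = {canonical_split("AB", taxa), canonical_split("CE", taxa)}
print(rf_error_percent(true_tree, inferred_tree))
```

Here the two trees share the AB|CDE edge and differ in the other, so half of the four bipartitions are unshared. Identical trees give an error of 0, which matches the zero entries in the table.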
Average score of the best tree for IDQPSO, Simulated Annealing and GRAPPA
| Program | Tree score (No transposition) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | r = 20 | r = 40 | r = 60 | r = 80 | r = 100 | r = 120 | r = 140 | r = 160 | r = 180 |
| IDQPSO-Median | 496.9 | | | | | | | | |
| SA-Median | 496.9 | 986.0 | 1459.9 | 1862.3 | 2086.4 | 2792.6 | 3705.7 | 4306.1 | 5210.9 |
| GRAPPA-Exact | 496.9 | – | – | – | – | – | | | |
– indicates that a program could not finish scoring any tree within 5 days of computation. For the IDQPSO and Simulated Annealing methods, results for runs that reached this limit are from the best trees obtained within the 5 days
The best values of all the compared algorithms are indicated in italics
Average running time for IDQPSO, Simulated Annealing and GRAPPA
| Program | Running time (s) (No transposition) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | r = 20 | r = 40 | r = 60 | r = 80 | r = 100 | r = 120 | r = 140 | r = 160 | r = 180 |
| IDQPSO-Median | 118.1 | 138.7 | 178.0 | | | | | | |
| SA-Median | 167.4 | 298.6 | 577.1 | 1361.7 | 7930.5 | 42,249.7 | | | |
| GRAPPA-Exact | 365.3 | – | – | – | – | – | | | |
Where they did not finish sooner, both IDQPSO and Simulated Annealing were stopped after 5 days of computation
The best values of all the compared algorithms are indicated in italics
Average distance between the inferred and true tree ancestors for IDQPSO, Simulated Annealing and GRAPPA
| Program | Distance to the true ancestor (No transposition) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | r = 20 | r = 40 | r = 60 | r = 80 | r = 100 | r = 120 | r = 140 | r = 160 | r = 180 |
| IDQPSO-Median | 0 | 0 | 0 | | | | | | |
| SA-Median | 0 | 0 | 0 | 0.8 | 1.9 | 2.0 | 7.3 | 27.8 | 55.0 |
| GRAPPA-Exact | 0 | 0 | 0 | – | – | – | – | – | |
– indicates that a program could not finish scoring any tree within 5 days of computation. For the IDQPSO and Simulated Annealing methods, results for runs that reached this limit are from the best trees obtained within the 5 days
The best values of all the compared algorithms are indicated in italics
Fig. 5 The topology of species. Panel a (left) shows the true topology of 10 Drosophila species; panel b (right) shows the topology inferred by the QPSO-GRAPPA method