Caroline Larlee, Chunfang Zheng, David Sankoff.
Abstract
BACKGROUND: The breakpoint median for a set of k ≥ 3 random genomes tends to approach (any) one of these genomes ("corners") as genome length increases, although there is a diminishing proportion of medians equidistant from all k ("medians in the middle"). Algorithms are likely to miss the latter, and this has consequences for the general case where input genomes share some or many gene adjacencies, where the tendency for the median to be closer to one input genome may be an artifact of the corner tendency.
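
The notions of breakpoint distance and median score used in the abstract can be illustrated with a minimal sketch. It assumes circular, unsigned gene orders and a distance normalized by genome length; these are simplifying assumptions made for illustration, and the function names `adjacencies`, `breakpoint_distance` and `median_score` are hypothetical, not taken from the paper.

```python
def adjacencies(genome):
    """Set of unordered adjacencies of a circular, unsigned gene order (assumed model)."""
    n = len(genome)
    return {frozenset((genome[i], genome[(i + 1) % n])) for i in range(n)}


def breakpoint_distance(g1, g2):
    """Fraction of g1's adjacencies that are absent from g2 (normalized distance)."""
    a1, a2 = adjacencies(g1), adjacencies(g2)
    return (len(a1) - len(a1 & a2)) / len(a1)


def median_score(candidate, genomes):
    """Sum of normalized breakpoint distances from a candidate to all input genomes."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)


# Three small example genomes on the same five genes.
g1 = [1, 2, 3, 4, 5]
g2 = [1, 3, 5, 2, 4]  # shares no adjacency with g1
g3 = [1, 2, 4, 3, 5]  # shares three adjacencies with g1

# Score of the "corner" candidate g1: its distance to itself contributes 0.
corner_score = median_score(g1, [g1, g2, g3])
print(corner_score)
```

Under these assumptions, two genomes that share no adjacencies are at normalized distance 1, so a corner candidate among k such genomes scores k − 1.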
Year: 2014 PMID: 25572274 PMCID: PMC4239572 DOI: 10.1186/1471-2164-15-S6-S1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Small phylogeny and median problems. a. Small phylogeny problem. Open dots represent ancestral nodes with unknown position in the metric space. Black dots are at given positions in the metric space. b. Median problem, with three given points and one to be inferred. c. Decomposition of the small phylogeny problem into several overlapping median problems to be solved simultaneously. N.B. open nodes may have degree ≥ 3.
Figure 2. Two-stage sampling scheme. a. First sampling of θn adjacencies from each of three genomes. b. Supplementary sampling of residual adjacencies consisting of two free ends.
Probability of at least θn adjacencies with two free ends in genome III.
| n \ θ | | 0.20 | 0.24 | 0.25 | 0.26 | 0.30 | 0.333 |
|---|---|---|---|---|---|---|---|
| | 0.9798 | 0.6101 | 0.3630 | 0.3028 | 0.2457 | 0.0716 | 0.0109 |
| 10 | 0.9994 | 0.7657 | 0.4445 | 0.3575 | 0.2750 | 0.0506 | 0.0031 |
| 50 | 1 | 0.9864 | 0.6273 | 0.4351 | 0.2530 | 0.0026 | 0 |
| 100 | 1 | 0.9994 | 0.7163 | 0.4540 | 0.2056 | 0 | 0 |
| 500 | 1 | 1 | 0.9761 | 0.4794 | 0.0521 | 0 | 0 |
| 1000 | 1 | 1 | 0.9834 | 0.4854 | 0.0119 | 0 | 0 |
| 2000 | 1 | 1 | 0.9988 | 0.4897 | 0.0008 | 0 | 0 |
| 10000 | 1 | 1 | 1 | 0.4954 | 0 | 0 | 0 |
Figure 3. Phase change at . Probability of at least θn adjacencies with two free ends in genome III.
Approach of normalized sum of distances to median score, with increasing k.
| ∑ | |
|---|---|
| 2 | 2.25 |
| 3 | 3.2457 |
| 4 | 4.2375 |
| 9 | 9.2026 |
| 49 | 49.1153 |
| 99 | 99.0864 |
| 499 | 499.0419 |
| 999 | 999.0302 |
| 1999 | 1999.0216 |
| 9999 | 9999.0099 |
Approach to true median for ψ = 0.625 as k increases
| k | distance | true median value | difference |
|---|---|---|---|
| 3 | 0.9375 | 0.750 | 0.1875 |
| 4 | 1.2989 | 1.125 | 0.1739 |
| 5 | 1.6634 | 1.500 | 0.1634 |
| 10 | 3.5067 | 3.375 | 0.1317 |
| 50 | 18.4468 | 18.375 | 0.0718 |
| 100 | 37.1785 | 37.125 | 0.0535 |
| 500 | 187.1507 | 187.125 | 0.0257 |
| 1000 | 374.6435 | 374.625 | 0.0185 |
| 2000 | 749.6383 | 749.625 | 0.0133 |
| 10000 | 3749.6310 | 3749.625 | 0.0060 |
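
The arithmetic in the table above can be checked with a short sketch. It takes the first column to be the number of genomes k (per the caption) and recomputes the difference column as distance minus true median value. The expression (k − 1)(1 − ψ) used for the true median value is an observation that reproduces every tabulated entry for ψ = 0.625; it is not a formula stated in this record, and the name `true_median_value` is hypothetical.

```python
PSI = 0.625

# (k, distance) pairs copied from the table.
ROWS = [
    (3, 0.9375), (4, 1.2989), (5, 1.6634), (10, 3.5067), (50, 18.4468),
    (100, 37.1785), (500, 187.1507), (1000, 374.6435), (2000, 749.6383),
    (10000, 3749.6310),
]


def true_median_value(k, psi=PSI):
    """Assumed closed form that matches the tabulated 'true median value' column."""
    return (k - 1) * (1 - psi)


def differences(rows, psi=PSI):
    """Recompute the 'difference' column: distance minus true median value."""
    return [(k, round(d - true_median_value(k, psi), 4)) for k, d in rows]


for k, diff in differences(ROWS):
    print(k, true_median_value(k), diff)
```

The recomputed differences shrink as k grows, matching the last column of the table.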