Caroline Anne Larlee, Alex Brandts, David Sankoff.
Abstract
BACKGROUND: The median of k≥3 genomes was originally defined to find a compromise genome indicative of a common ancestor. However, in gene order comparisons, the usual definitions based on minimizing the sum of distances to the input genomes lead to degenerate medians reflecting only one of the input genomes. "Near-medians", consisting of equal samples of gene adjacencies from all the input genomes, were designed to restore the idea of compromise to the median problem.
RESULT: We explore adjacency sampling constructions in full generality in the case k=3, with given overlapping sets of adjacencies in the three genomes, where all adjacencies in two-way or three-way overlaps are included in the sample. We require the construction to be maximal, in the sense that no additional proportion of adjacencies from any of the genomes may be added without violating the local linearity of the genome. We discover that in incorporating as many adjacencies as possible, evenly from all the input genomes, we are actually maximizing, rather than minimizing, the sum of distances over all other maximal sampling schemes.
Keywords: Breakpoint distance; Gene adjacency; Gene order; Median problem
Year: 2016 PMID: 28105920 PMCID: PMC5249007 DOI: 10.1186/s12859-016-1340-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
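The degenerate-median behaviour described in the abstract can be illustrated with a small sketch. It assumes unsigned genes on a single circular chromosome and takes the breakpoint distance to be the number of adjacencies of one genome that are absent from the other. The function names and the three toy gene orders are hypothetical and are not taken from the paper.

```python
def adjacencies(order):
    """Unordered pairs of neighbouring genes in a circular, unsigned gene order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def breakpoint_distance(a, b):
    """Number of adjacencies of genome a that do not occur in genome b."""
    return len(adjacencies(a) - adjacencies(b))


def distance_sum(candidate, genomes):
    """Sum of breakpoint distances from a candidate genome to all input genomes."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)


# Toy input genomes on six genes (illustrative only).
g1 = [1, 2, 3, 4, 5, 6]
g2 = [1, 2, 3, 6, 5, 4]
g3 = [1, 3, 5, 2, 4, 6]
inputs = [g1, g2, g3]

# Each input genome used as its own candidate median.
sums = [distance_sum(g, inputs) for g in inputs]
print(sums)
```

In this toy case the smallest sum among the three candidates is reached by simply reusing `g1`, which is the kind of non-compromise outcome the near-median construction is meant to avoid.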
Fig. 1 (a) First sampling of θn adjacencies from each of three genomes. (b) Supplementary sampling of residual adjacencies consisting of two free ends
Fig. 2 Surface of distance sum as a function of θ₁, θ₂, θ₃, in barycentric coordinates, when θ₃ is maximized as a function of θ₁ and θ₂
Maximizing θ₃ for various combinations of θ₁ and θ₂

| θ₁ | θ₂ | max θ₃ | distance sum |
|---|---|---|---|
| 0.15 | 0.15 | 0.4900 | 2.2100 |
| 0.15 | 0.20 | 0.4225 | 2.2275 |
| 0.15 | 0.25 | 0.3600 | 2.2400 |
| 0.15 | 0.30 | 0.3025 | 2.2475 |
| 0.15 | 0.35 | 0.2500 | 2.2500 |
| 0.20 | 0.15 | 0.4225 | 2.2275 |
| 0.20 | 0.20 | 0.3600 | 2.2400 |
| 0.20 | 0.25 | 0.3025 | 2.2475 |
| 0.20 | 0.30 | 0.2500 | 2.2500 |
| 0.20 | 0.35 | 0.2025 | 2.2475 |
| 0.25 | 0.15 | 0.3600 | 2.2400 |
| 0.25 | 0.20 | 0.3025 | 2.2475 |
| 0.25 | 0.25 | 0.2500 | 2.2500 |
| 0.25 | 0.30 | 0.2025 | 2.2475 |
| 0.25 | 0.35 | 0.1600 | 2.2400 |
| 0.30 | 0.15 | 0.3025 | 2.2475 |
| 0.30 | 0.20 | 0.2500 | 2.2500 |
| 0.30 | 0.25 | 0.2025 | 2.2475 |
| 0.30 | 0.30 | 0.1600 | 2.2400 |
| 0.30 | 0.35 | 0.1225 | 2.2275 |
| 0.35 | 0.15 | 0.2500 | 2.2500 |
| 0.35 | 0.20 | 0.2025 | 2.2475 |
| 0.35 | 0.25 | 0.1600 | 2.2400 |
| 0.35 | 0.30 | 0.1225 | 2.2275 |
| 0.35 | 0.35 | 0.0900 | 2.2100 |
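
The values in the table above can be reproduced with a short sketch. Both formulas below are inferred from the tabulated numbers rather than quoted from the text: the third column matches (1 − θ₁ − θ₂)², and the fourth column matches 3 − (θ₁ + θ₂ + θ₃). The function names are hypothetical.

```python
from itertools import product


def max_theta3(theta1, theta2):
    # Assumption: inferred from the table, not stated in the source text.
    return (1.0 - theta1 - theta2) ** 2


def fourth_column(theta1, theta2, theta3):
    # Assumption: the fourth column equals 3 minus the total sampled proportion.
    return 3.0 - (theta1 + theta2 + theta3)


grid = [0.15, 0.20, 0.25, 0.30, 0.35]
rows = []
for t1, t2 in product(grid, repeat=2):
    t3 = max_theta3(t1, t2)
    rows.append((t1, t2, t3, fourth_column(t1, t2, t3)))

# Print the rows in the same order and precision as the table.
for t1, t2, t3, total in rows:
    print(f"{t1:.2f}  {t2:.2f}  {t3:.4f}  {total:.4f}")
```

Under these inferred formulas the fourth column depends only on θ₁ + θ₂ and peaks at 2.25 when θ₁ + θ₂ = 0.5, which covers the even choice θ₁ = θ₂ = θ₃ = 0.25.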
Fig. 3 Sampling scheme showing variable proportions θ₁, θ₂, θ₃, given two-way intersections ω₁, ω₂, ω₃, and three-way intersection ψ. All these contributions lower s. The white area in genome h represents the randomly completed portion