Chunfang Zheng, Qian Zhu, Zaky Adam, David Sankoff.
Abstract
MOTIVATION: Some present-day species have incurred a whole genome doubling event in their evolutionary history, and this is reflected today in patterns of duplicated segments scattered throughout their chromosomes. These duplications may be used as data to 'halve' the genome, i.e. to reconstruct the ancestral genome at the moment of doubling, but the solution is often highly nonunique. To resolve this problem, we take account of outgroups, external reference genomes, to guide and narrow down the search.
Year: 2008 PMID: 18586750 PMCID: PMC2718624 DOI: 10.1093/bioinformatics/btn146
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (left) An even-size natural graph completed by adding two pairs of gray edges. (right) Two odd-size natural graphs, containing vertices x, y, z and a, b, c, respectively, combined into one supernatural graph so that three pairs of gray edges can be added.
Fig. 2. Halving a doubling descendant $T$, with one ($R$) or two ($R_1$, $R_2$) unduplicated outgroups. The double circles represent two copies of potential ancestral genomes, including solutions to the genome halving problem in S and those on best trajectories between S and the outgroups.
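The captions and tables compare a doubling descendant $T$, in which each gene has two copies, with a perfectly duplicated genome $X'\oplus X''$ obtained from a candidate ancestor $X$. As a minimal sketch of those two objects only, assuming a genome is stored as a list of chromosomes of signed gene identifiers (a representation chosen here for illustration; `perfect_double` and `is_duplicated` are hypothetical names), the following Python builds $X'\oplus X''$ and checks the two-copy property of $T$. It does not compute the rearrangement distance $d(\cdot,\cdot)$, which is not defined in this excerpt.

```python
from collections import Counter
from typing import List

# Representation assumed for illustration: a genome is a list of chromosomes,
# each a list of signed gene identifiers (the sign gives the strand).
Genome = List[List[int]]


def perfect_double(ancestor: Genome) -> List[List[str]]:
    """Perfectly duplicated genome X'⊕X'': every chromosome of X appears twice,
    and the two copies of gene g are labelled g' and g''."""
    first = [[f"{g}'" for g in chrom] for chrom in ancestor]
    second = [[f"{g}''" for g in chrom] for chrom in ancestor]
    return first + second


def is_duplicated(descendant: Genome) -> bool:
    """True if every gene occurs exactly twice (on either strand), which is
    the kind of input genome halving expects from a doubling descendant T."""
    counts = Counter(abs(g) for chrom in descendant for g in chrom)
    return all(c == 2 for c in counts.values())


X = [[1, -2], [3]]
print(perfect_double(X))
# → [["1'", "-2'"], ["3'"], ["1''", "-2''"], ["3''"]]
print(is_duplicated([[1, -2], [2, 3], [-1, -3]]))
# → True
```

Halving runs in the opposite direction: given $T$, it searches for ancestors $X$ whose doubled form is closest to $T$, and that set of solutions is what the outgroups are used to narrow down.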
Fig. 3. Priority levels of some pathgroups for guided genome halving (GGH) with one outgroup.
Fig. 4. Phylogeny of yeasts in YGOB (Yeast Gene Order Browser). The whole genome doubling event giving rise to the ancestor of S. cerevisiae and C. glabrata is indicated, followed by rediploidization, speciation and the divergence of these two species.
Performance comparison of the sampling method and the guided halving algorithm with one outgroup
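| $R$-$T$ | $n$ | $d(T,X'\oplus X'')$ | $\bar d(X,R)$ (sampling) | $d_{\min}(X,R)$ (sampling) | $d(A,R)$ (sampling) | Δ (sampling) | $d(A,A^*)$ (sampling) | Time, min (sampling) | $d(X,R)$ (GGH) | $d(A,R)$ (GGH) | Δ (GGH) | $d(A,A^*)$ (GGH) | Time, min (GGH) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|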
| AG-CG | 538 | 186 | 204 | 196 | 180 | −16 | 156 | 37 | 153 | 153 | 0 | 120 | 2.3 |
| AG-SC | 1012 | 119 | 237 | 229 | 208 | −21 | 53 | 158 | 184 | 183 | −1 | 32 | 5.3 |
| KL-CG | 546 | 186 | 210 | 203 | 184 | −19 | 154 | 50 | 160 | 160 | 0 | 120 | 3.5 |
| KL-SC | 1026 | 122 | 241 | 232 | 216 | −16 | 51 | 140 | 197 | 197 | 0 | 39 | 6.1 |
| KW-CG | 542 | 188 | 247 | 238 | 230 | −8 | 167 | 26 | 216 | 215 | −1 | 142 | 3.3 |
| KW-SC | 994 | 121 | 364 | 355 | 350 | −5 | 70 | 72 | 325 | 323 | −2 | 41 | 5.1 |
| A*-CG | 600 | 199 | 183 | 169 | 129 | −40 | 129 | 81 | 84 | 84 | 0 | 84 | 1.5 |
| A*-SC | 1062 | 124 | 79 | 70 | 37 | −33 | 37 | 114 | 5 | 5 | 0 | 5 | 0.3 |
| AG-V | 576 | 61 | 157 | 151 | 149 | −2 | 54 | 12 | 148 | 148 | 0 | 51 | 0.9 |
| KL-V | 584 | 62 | 167 | 160 | 158 | −2 | 53 | 12 | 157 | 157 | 0 | 51 | 0.9 |
| KW-V | 582 | 62 | 224 | 218 | 215 | −3 | 52 | 13 | 212 | 212 | 0 | 51 | 1.0 |
| A*-V | 600 | 62 | 57 | 49 | 39 | −10 | 39 | 14 | 29 | 29 | 0 | 29 | 0.2 |
The sampling method uses 2000 samples. $R$-$T$ denotes the outgroup $R$ and the doubling descendant $T$. $n$ is the number of genes available in that pair of genomes, with two copies in $T$. $d(T,X'\oplus X'')$ is the doubling distance; it is attained by every halving solution $X$ and is therefore the same for both methods. $\bar d(X,R)$ is the average, over all samples, of the distance estimate between the ancestor just before doubling and the outgroup, and the adjacent entry $d_{\min}(X,R)=\min_{\text{sampled}} d(X,R)$ is the minimum found. Δ is the change relative to $d(T,X'\oplus X'')+d(X,R)$ due to local searching, which allows $A$ to be found outside the set of halving solutions; negative values are improvements. $d(A,A^*)$ is the distance between the inferred ancestor and the 'ground truth' $A^*$. Time is in minutes, for 2000 samples of unrestricted halving or for one GGH run.
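For the one-outgroup table, the objective is $d(T,X'\oplus X'')+d(X,R)$, and the sampling method reports the average and minimum of $d(X,R)$ over its samples. A minimal Python sketch of that bookkeeping follows; the function names are hypothetical, not the authors' code. Because the table does not list $d(T,A\oplus A)$, the example assumes it equals $d(T,X'\oplus X'')$, so Δ reduces to the entry just before Δ minus $d_{\min}(X,R)$; the reported values are consistent with that reading (AG-CG: 180 − 196 = −16).

```python
from statistics import mean
from typing import Sequence, Tuple


def summarize_samples(sampled_d_XR: Sequence[int]) -> Tuple[float, int]:
    """Average and minimum of d(X, R) over sampled halving solutions X."""
    return mean(sampled_d_XR), min(sampled_d_XR)


def objective(d_doubling: int, d_to_outgroup: int) -> int:
    """One-outgroup guided halving objective: d(T, X'⊕X'') + d(X, R)."""
    return d_doubling + d_to_outgroup


def delta(d_doubling_X: int, d_XR: int, d_doubling_A: int, d_AR: int) -> int:
    """Change in the objective when the halving solution X is replaced by the
    locally improved ancestor A; negative values are improvements."""
    return objective(d_doubling_A, d_AR) - objective(d_doubling_X, d_XR)


# AG-CG row, sampling method: doubling distance 186, d_min(X, R) = 196, and 180
# for the entry after local search (read here as d(A, R)). Assumption: the
# unreported d(T, A⊕A) equals the doubling distance, so that term cancels.
print(delta(186, 196, 186, 180))  # → -16
# A*-CG row, sampling method (the outgroup is the ground truth A* itself).
print(delta(199, 169, 199, 129))  # → -40
```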
Results of the guided halving algorithm with two outgroups
| $R_1$-$R_2$-$T$ | $n$ | $d(T,X'\oplus X'')$ | Median cost ($X$) | $d(T,A\oplus A)$ | Median cost ($A$) | Δ | $d(A,A^*)$ | Time (min) |
|---|---|---|---|---|---|---|---|---|
| AG-KL-SC | 497 | 117 | 364 | 117 | 361 | −3 | 40 | 131 |
| AG-KW-SC | 478 | 116 | 502 | 116 | 498 | −4 | 41 | 204 |
| KL-KW-SC | 471 | 121 | 518 | 121 | 516 | −2 | 48 | 217 |
| AG-KL-CG | 265 | 183 | 300 | 183 | 297 | −3 | 124 | 48 |
| AG-KW-CG | 261 | 184 | 362 | 184 | 361 | −1 | 138 | 55 |
| KL-KW-CG | 259 | 184 | 368 | 184 | 366 | −2 | 136 | 62 |
| AG-KL-V | 283 | 61 | 278 | 61 | 275 | −3 | 47 | 38 |
| AG-KW-V | 280 | 61 | 340 | 62 | 339 | 0 | 51 | 41 |
| KL-KW-V | 277 | 62 | 354 | 62 | 352 | −2 | 54 | 54 |
Median cost is the sum of the three distances from $R_1$, $R_2$ and the inferred ancestor ($X$ or $A$) to their median. The objective is $d(T,X'\oplus X'')$ + median cost. Δ is the change of $d(T,A\oplus A)$ + median cost relative to $d(T,X'\oplus X'')$ + median cost due to local searching, which allows $A$ to move outside the set of halving solutions; negative values are improvements. Time is in minutes.
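Reading the columns of the two-outgroup table as $n$, $d(T,X'\oplus X'')$, median cost for $X$, $d(T,A\oplus A)$, median cost for $A$, Δ, $d(A,A^*)$ and time, each reported Δ should equal the difference between the two objectives. The sketch below checks that for all nine rows. The column reading is an interpretation of the caption, and `median_cost`, `objective` and `delta` are hypothetical helpers, not the authors' code. In the AG-KW-V row the doubling term rises from 61 to 62 while the median cost falls from 340 to 339, so Δ = 0 even though $A$ is no longer a halving solution.

```python
from typing import List, Tuple

# (R1-R2-T, d(T, X'⊕X''), median cost for X, d(T, A⊕A), median cost for A, reported Δ)
Row = Tuple[str, int, int, int, int, int]

ROWS: List[Row] = [
    ("AG-KL-SC", 117, 364, 117, 361, -3),
    ("AG-KW-SC", 116, 502, 116, 498, -4),
    ("KL-KW-SC", 121, 518, 121, 516, -2),
    ("AG-KL-CG", 183, 300, 183, 297, -3),
    ("AG-KW-CG", 184, 362, 184, 361, -1),
    ("KL-KW-CG", 184, 368, 184, 366, -2),
    ("AG-KL-V", 61, 278, 61, 275, -3),
    ("AG-KW-V", 61, 340, 62, 339, 0),
    ("KL-KW-V", 62, 354, 62, 352, -2),
]


def median_cost(d_r1_m: int, d_r2_m: int, d_anc_m: int) -> int:
    """Sum of the distances from R1, R2 and the ancestor (X or A) to a median M."""
    return d_r1_m + d_r2_m + d_anc_m


def objective(d_doubling: int, med_cost: int) -> int:
    """Two-outgroup objective: doubling distance plus median cost."""
    return d_doubling + med_cost


def delta(row: Row) -> int:
    """Objective at A minus objective at X (negative values are improvements)."""
    _, d_X, med_X, d_A, med_A, _ = row
    return objective(d_A, med_A) - objective(d_X, med_X)


for row in ROWS:
    assert delta(row) == row[-1], row[0]
print("all reported Δ values reproduced")
```

`median_cost` only spells out the caption's definition; the table reports the summed costs directly, so the check works on those totals.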
Fig. 5. The first three dimensions of a principal coordinate analysis of distances among 22 inferences of the ancestral genome, based on different configurations of outgroups. Left: dimensions 1 and 2. Right: dimensions 1 and 3. Dimension labels were assigned subjectively after the analysis. Genomes SC, CG, AG, KL and KW are further abbreviated in the displays to S, G, A (not to be confused with A for ancestor elsewhere in the text, nor with A*), L and W, respectively.
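Fig. 5 is built from a principal coordinate analysis of the distances among the 22 inferred ancestors. The sketch below shows the standard classical-scaling computation (double-centring $-\tfrac12 D^{(2)}$, where $D^{(2)}$ holds the squared distances, then taking the leading eigenvectors) in dependency-free Python, using cyclic Jacobi rotations for the eigendecomposition. The software the authors used is not stated here, `jacobi_eigen` and `pcoa` are hypothetical names, and the 3×3 matrix is a toy example, not data from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigen(S: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def pcoa(D: Matrix, k: int = 3) -> Tuple[Matrix, List[float]]:
    """Classical principal coordinate analysis of a symmetric distance matrix D.
    Returns the first k coordinates of each object and the matching eigenvalues."""
    n = len(D)
    # Double-centre A = -D^2/2: B_ij = A_ij - rowmean_i - rowmean_j + grandmean.
    A = [[-0.5 * D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in A]
    grand_mean = sum(row_mean) / n
    B = [[A[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigen(B)
    order = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    coords = [[0.0] * len(order) for _ in range(n)]
    for axis, idx in enumerate(order):
        # Axes with non-positive eigenvalues get zero length.
        scale = math.sqrt(vals[idx]) if vals[idx] > 0 else 0.0
        for i in range(n):
            coords[i][axis] = V[i][idx] * scale
    return coords, [vals[i] for i in order]


# Toy example (hypothetical distances): a 3-4-5 triangle is Euclidean,
# so two principal coordinates reproduce its pairwise distances.
D = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
coords, eigenvalues = pcoa(D, k=2)
```

Rearrangement distances need not be Euclidean, so some eigenvalues of the centred matrix can be negative; the sketch gives such axes zero length rather than failing, and only the leading positive axes (as in the three dimensions plotted in Fig. 5) are meant to be read.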