Wen-Chieh Chang, Gordon J Burleigh, David F Fernández-Baca, Oliver Eulenstein.
Abstract
BACKGROUND: The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore most analyses use heuristics that lack any performance guarantee.
Year: 2011 PMID: 21342543 PMCID: PMC3044268 DOI: 10.1186/1471-2105-12-S1-S14
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
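The quantity that the GD problem minimizes can be illustrated with a small sketch. The code below is not the ILP formulation from the paper; it is a minimal illustration, assuming the standard LCA-mapping definition of a duplication (a gene-tree node is a duplication if it maps to the same species-tree node as one of its children). The nested-tuple tree encoding and the function names `all_clusters`, `lca_cluster`, `count_duplications`, and `gd_cost` are hypothetical choices made for this example.

```python
def all_clusters(tree):
    """Return (leaf set, list of clusters) for a nested-tuple tree.

    A leaf is a species name (str); an internal node is a tuple of subtrees.
    A cluster is the set of leaf labels below a node.
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, clusters = frozenset(), []
    for child in tree:
        child_leaves, child_clusters = all_clusters(child)
        leaves |= child_leaves
        clusters.extend(child_clusters)
    clusters.append(leaves)
    return leaves, clusters


def lca_cluster(species, clusters):
    """Smallest species-tree cluster containing `species` (the LCA node)."""
    return min((c for c in clusters if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count duplication nodes of one gene tree under the LCA mapping."""
    _, clusters = all_clusters(species_tree)
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca_cluster(leaves, clusters)
        results = [visit(child) for child in node]
        leaves = frozenset().union(*(r[0] for r in results))
        mapping = lca_cluster(leaves, clusters)
        # Duplication: the node maps to the same species node as a child.
        if any(r[1] == mapping for r in results):
            dups += 1
        return leaves, mapping

    visit(gene_tree)
    return dups


def gd_cost(gene_trees, species_tree):
    """Total duplication cost of a candidate species tree."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


# Toy input: gene-tree leaves are labeled by the species they came from.
gene_trees = [(("A", "B"), "C"), (("A", "C"), "B"), (("A", "A"), "B")]
print(gd_cost(gene_trees, (("A", "B"), "C")))
```

Solving the GD problem exactly means finding the species tree that minimizes `gd_cost` over all candidate topologies, which is what makes the problem hard as the number of taxa grows.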
| Notation | Definition |
|---|---|
| Taxon-cluster representation of (the) species tree: | |
| Compatibility: | |
| Rooted triple: | |
| t-inconsistency: | |
Notation used in our ILP solution.
| k | time (s) | Dup | time (s) | Dup | time (s) | Dup | time (s) | Dup | time (s) | Dup |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.06 | 34.80 | 0.34 | 49.70 | 22.98 | 60.10 | 200.53 | 68.80 | 12597.21 | 78.40 |
| 50 | 0.03 | 189.50 | 1.26 | 265.00 | 8.74 | 280.00 | 159.26 | 346.40 | 2953.62 | 393.10 |
| 100 | 0.06 | 382.80 | 0.63 | 523.30 | 9.64 | 598.50 | 117.38 | 701.60 | 2191.65 | 825.70 |
| 200 | 0.05 | 788.20 | 0.54 | 994.90 | 11.03 | 1217.30 | 168.85 | 1372.50 | 2709.91 | 1627.70 |
| 500 | 0.25 | 1910.30 | 0.79 | 2458.60 | 13.92 | 2987.00 | 220.17 | 3678.80 | 4270.05 | 4001.70 |
| 1000 | 0.57 | 3842.60 | 0.96 | 5283.10 | 23.54 | 6140.90 | 330.34 | 7026.40 | 5014.61 | 8258.80 |
ILP running time and the optimal duplication cost using k simulated gene trees of n taxa as inputs. At each configuration, the result is the average of 10 trials. The running time is measured in seconds.
Figure 1. The optimal seed plant phylogeny. The unique optimal seed plant phylogeny based on 12 taxa and 6,084 genes under the GD model.