Paola Bonizzoni, Anna Paola Carrieri, Gianluca Della Vedova, Gabriella Trucco.
Abstract
BACKGROUND: The perfect phylogeny is a widely used model in phylogenetics, since it provides an efficient basic procedure for representing the evolution of genomic binary characters in several frameworks, for example in haplotype inference. The model, which is conceptually the simplest, is based on the infinite sites assumption, that is, no character can mutate more than once in the whole tree. A main open problem regarding the model is finding generalizations that retain the computational tractability of the original model but are more flexible in modeling biological data when the infinite sites assumption is violated, for example because of back mutations. A special case of back mutations that has been considered in the study of the evolution of protein domains (where a domain is acquired and then lost) is persistency, that is, the fact that a character is allowed to return to the ancestral state. In this model each character can be gained at most once and lost at most once. In this paper we consider the computational problem of explaining binary data by the Persistent Perfect Phylogeny model (referred to as PPP), and for this purpose we investigate the problem of reconstructing an evolution where some constraints are imposed on the paths of the tree.
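To make the infinite sites assumption concrete, the sketch below checks a binary species-by-character matrix with the classical pairwise test for a rooted perfect phylogeny with an all-zero ancestral state: the sets of species carrying any two characters must be disjoint or nested. This is the standard textbook criterion, not the algorithm of the paper, and the function names and the two small matrices are hypothetical illustrations. Pairs that fail the test are the conflicts that a model allowing a loss, such as the persistent one, might still be able to explain.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return pairs of character (column) indices that are in conflict.

    matrix is a list of rows, one per species, with entries 0 or 1.
    Two characters conflict when the sets of species carrying them
    overlap but neither set contains the other.
    """
    n_chars = len(matrix[0]) if matrix else 0
    # For each character, collect the species (row indices) that carry it.
    carriers = [
        {i for i, row in enumerate(matrix) if row[c] == 1}
        for c in range(n_chars)
    ]
    conflicts = []
    for a, b in combinations(range(n_chars), 2):
        overlap = carriers[a] & carriers[b]
        if overlap and overlap != carriers[a] and overlap != carriers[b]:
            conflicts.append((a, b))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True when no pair of characters is in conflict."""
    return not conflicting_pairs(matrix)


# Hypothetical example matrices (rows = species, columns = characters).
compatible = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
conflicting = [
    [1, 1],
    [1, 0],
    [0, 1],
]

print(admits_perfect_phylogeny(compatible))
print(conflicting_pairs(conflicting))
```

The test runs in time quadratic in the number of characters, which is enough for illustration; faster linear-time procedures exist for the same check.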
Year: 2014 PMID: 25572381 PMCID: PMC4240218 DOI: 10.1186/1471-2164-15-S6-S10
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Running times on unconstrained simulated instances.
| Species | Characters | Instances completed within 15 minutes | Min time (sec) | Max time (sec) | Average time (sec) | Standard deviation |
|---|---|---|---|---|---|---|
| 10 | 5 | 100 | 0.00 | 0.01 | 0.00 | 0.00 |
| 10 | 7 | 100 | 0.00 | 0.25 | 0.01 | 0.03 |
| 10 | 10 | 100 | 0.00 | 1.93 | 0.11 | 0.30 |
| 10 | 12 | 94 | 0.00 | 12.95 | 0.84 | 1.93 |
| 10 | 15 | 84 | 0.00 | 43.89 | 5.71 | 9.80 |
| 20 | 10 | 100 | 0.00 | 4.72 | 0.08 | 0.47 |
| 20 | 15 | 97 | 0.02 | 18.12 | 1.15 | 2.53 |
| 20 | 20 | 93 | 0.13 | 95.03 | 10.44 | 19.14 |
| 20 | 25 | 79 | 1.09 | 253.68 | 41.98 | 60.35 |
| 20 | 30 | 63 | 3.84 | 247.03 | 59.06 | 63.81 |
| 40 | 20 | 100 | 0.06 | 89.02 | 2.04 | 8.93 |
| 40 | 30 | 98 | 0.99 | 156.16 | 22.03 | 33.17 |
| 40 | 40 | 80 | 7.23 | 598.32 | 128.47 | 154.92 |
| 40 | 50 | 45 | 19.14 | 585.42 | 198.81 | 146.39 |
| 40 | 60 | 19 | 50.26 | 577.10 | 319.25 | 183.10 |
| 60 | 30 | 99 | 0.64 | 222.79 | 14.36 | 33.21 |
| 60 | 45 | 90 | 8.76 | 590.03 | 123.05 | 148.48 |
| 60 | 60 | 51 | 37.63 | 593.06 | 252.34 | 168.92 |
All times are in seconds.
Improvements of constrained simulated instances over unconstrained instances.
| Species | Characters | 1 added constraint: Fastest | 1 added constraint: Median | 16 added constraints: Fastest | 16 added constraints: Median |
|---|---|---|---|---|---|
| 10 | 5 | 0 | 0 | 0 | 0 |
| 10 | 7 | 1 | 0 | 1 | 1 |
| 10 | 10 | 7 | 5 | 7 | 7 |
| 10 | 12 | 7 | 5 | 7 | 6 |
| 10 | 15 | 8 | 3 | 9 | 8 |
| 20 | 10 | 9 | 4 | 10 | 10 |
| 20 | 15 | 10 | 9 | 10 | 10 |
| 20 | 20 | 9 | 1 | 10 | 10 |
| 20 | 25 | 9 | 7 | 9 | 9 |
| 20 | 30 | 7 | 2 | 10 | 9 |
| 40 | 20 | 9 | 7 | 10 | 10 |
| 40 | 30 | 10 | 7 | 10 | 10 |
| 40 | 40 | 8 | 1 | 10 | 9 |
| 40 | 50 | 10 | 0 | 10 | 10 |
| 40 | 60 | 1 | 0 | 9 | 6 |
| 60 | 30 | 8 | 7 | 10 | 10 |
| 60 | 45 | 10 | 8 | 10 | 10 |
| 60 | 60 | 7 | 6 | 8 | 7 |
For each choice of the number of species and of characters, we state the number of instances where at least one of the 10 random constrained instances is solved more quickly than the unconstrained instance (columns labeled Fastest). Moreover we state the number of instances where the median of the 10 random constrained instances is solved more quickly than the unconstrained instance (columns labeled Median).
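The two counts described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the function name `count_improvements` and all timings are hypothetical, and the example uses three constrained runs per instance for brevity where the study uses 10.

```python
from statistics import median


def count_improvements(runs):
    """Count instances improved by adding constraints.

    runs is a list of (unconstrained_time, constrained_times) pairs,
    where constrained_times holds the running times of the random
    constrained variants of that instance.
    Returns (fastest_count, median_count).
    """
    # "Fastest": at least one constrained run beats the unconstrained run.
    fastest_count = sum(1 for base, times in runs if min(times) < base)
    # "Median": the median constrained run beats the unconstrained run.
    median_count = sum(1 for base, times in runs if median(times) < base)
    return fastest_count, median_count


# Hypothetical timings in seconds.
example_runs = [
    (5.0, [4.0, 6.0, 7.0]),  # fastest improves, median does not
    (2.0, [1.0, 1.5, 3.0]),  # both improve
    (1.0, [2.0, 3.0, 4.0]),  # neither improves
]

print(count_improvements(example_runs))
```

Because the minimum of a set of times never exceeds its median, the Fastest count is always at least the Median count, which matches every row of the table.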