Marina Marcet-Houben, Toni Gabaldón.
Abstract
Comparisons of tree topologies provide relevant information in evolutionary studies. Most existing methods share the drawback of requiring a complete and exact mapping of terminal nodes between the compared trees. This severely limits the scope of genome-wide analyses, since trees containing duplications are pruned arbitrarily or discarded. To overcome this, we have developed treeKO, an algorithm that enables the comparison of tree topologies even in the presence of duplication and loss events. To do so, treeKO recursively splits gene trees into pruned trees containing only orthologs and then computes a distance based on the combined analysis of all pruned-tree comparisons. In addition, treeKO can compute phylome support values and reconciliation-based measures such as the number of inferred duplication and loss events.
Year: 2011 PMID: 21335609 PMCID: PMC3105381 DOI: 10.1093/nar/gkr087
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Example of how treeKO derives pruned trees from a tree containing duplications. The initial tree (on the left) contains two duplication nodes (in black), marked as node 1 and node 2. treeKO splits the tree at node 1 and generates two different trees, each containing one of the daughter partitions of node 1. This results in pruned tree 1 and an intermediate pruned tree that still contains a duplication (node 2). treeKO then scans these pruned trees for further duplications. In this case, the intermediate tree contains a second duplication, so it is split and reconstructed once again, resulting in pruned trees 2 and 3. treeKO repeats this process until no resulting subtree contains a duplication node.
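The splitting procedure described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the treeKO implementation: the `Node` class, the `pruned_trees` and `to_newick` functions, and the leaf labels are all hypothetical, and duplication nodes are assumed to be already flagged. The recursion works bottom-up rather than splitting top-down as in the figure, but under these assumptions it should produce the same set of duplication-free trees.

```python
from dataclasses import dataclass
from itertools import product
from typing import Tuple, Union


@dataclass(frozen=True)
class Node:
    """Internal node; leaves are plain strings (hypothetical representation)."""
    children: Tuple["Tree", ...]
    duplication: bool = False  # assumed to be annotated beforehand


Tree = Union[str, Node]


def pruned_trees(tree):
    """Return every duplication-free tree obtainable by keeping exactly
    one daughter partition at each duplication node."""
    if isinstance(tree, str):
        return [tree]
    # Pruned versions of each child, computed recursively.
    child_options = [pruned_trees(child) for child in tree.children]
    if tree.duplication:
        # A duplication node is replaced by one daughter partition at a time.
        return [t for options in child_options for t in options]
    # A speciation node is kept; combine every choice of pruned children.
    return [Node(tuple(combo)) for combo in product(*child_options)]


def to_newick(tree):
    """Render a tree as a Newick-like string without branch lengths."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree.children) + ")"


# Toy tree with two nested duplications, loosely mirroring Figure 1.
node2 = Node(("B2", "B3"), duplication=True)
node1 = Node((Node(("B1", "C1")), Node((node2, "C2"))), duplication=True)
initial = Node((node1, "D"))

for pruned in pruned_trees(initial):
    print(to_newick(pruned))
```

In this toy case the two duplication nodes yield three pruned trees, matching the count in the figure. How treeKO detects duplication nodes and handles multifurcations is not covered by this sketch.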
Figure 2. Distribution of distances between trees in the P12a phylome and three alternative species trees. The upper left part of the figure shows the three topologies used. The first is the T12a tree, while the other two represent changes to this topology. Alternative topology 1 represents a change in a poorly supported node, while Alternative topology 2 represents a change in a well-supported node. The two upper right graphs plot the distribution of distances for each alternative topology against that of the reference T12a topology. The lower panel shows the frequency graph for the three distance distributions.
Percentage of tractable trees by comparison algorithm
| Phylome | treeKO (%) | TOPD/FMTS (%) | RF (%) | RF + pruning (%) |
|---|---|---|---|---|
| P60 | 100 | 100 | 0 | 22 |
| P21 | 100 | 100 | 0 | 36 |
| P12a | 100 | 100 | 14 | 38 |
| P12b | 100 | 100 | 2 | 27 |
Percentage of gene trees in a given phylome that are suitable for comparison by each method. Columns represent the four compared programs: treeKO, TOPD/FMTS, RF and RF with an initial pruning step. Rows represent the four yeast phylomes with different taxonomic coverage that can be found in phylomeDB (32).
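To illustrate why methods that need a complete and exact mapping of terminal nodes can handle only a fraction of gene trees, the sketch below counts the trees whose leaves map one-to-one onto a species tree. It is an assumption-laden toy, not the procedure used to produce the table: the function names and species labels are hypothetical, and each gene tree is reduced to the list of species its leaves belong to.

```python
from collections import Counter


def has_exact_leaf_mapping(gene_tree_species, species_tree_species):
    """True if every species occurs exactly once in the gene tree and the
    two leaf sets are identical (no duplications, no missing species)."""
    counts = Counter(gene_tree_species)
    no_duplicates = all(n == 1 for n in counts.values())
    return no_duplicates and set(counts) == set(species_tree_species)


def tractable_percentage(gene_trees, species_tree_species):
    """Percentage of gene trees with a one-to-one leaf mapping."""
    if not gene_trees:
        return 0.0
    ok = sum(has_exact_leaf_mapping(t, species_tree_species) for t in gene_trees)
    return 100.0 * ok / len(gene_trees)


# Hypothetical species labels and gene trees, for illustration only.
species = ["Scer", "Cgla", "Calb"]
gene_trees = [
    ["Scer", "Cgla", "Calb"],          # exact mapping
    ["Scer", "Scer", "Cgla", "Calb"],  # contains a duplicated species
    ["Scer", "Cgla"],                  # one species missing
    ["Calb", "Cgla", "Scer"],          # exact mapping, different order
]
print(tractable_percentage(gene_trees, species))
```

Any tree with a duplicated or missing species fails this check, which is the situation that treeKO is designed to handle.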
Comparative performance of tree-comparison algorithms
| Measure | treeKO Set1 | treeKO Set2 | treeKO Set3 | TOPD/FMTS Set1 | TOPD/FMTS Set2 | TOPD/FMTS Set3 | RF Set1 | RF Set2 | RF Set3 |
|---|---|---|---|---|---|---|---|---|---|
| Percentage of trees compared (%) | 100 | 100 | 100 | 100 | 100 | 100 | 34 | 43 | 41 |
| Average time consumption per tree (s) | 1.31 | 1.65 | 1.95 | 10.38 | 8.84 | 9.57 | – | – | – |
| Average time consumption per single-gene tree (s) | 1.09 | 1.09 | 1.15 | 2.12 | 2.26 | 2.07 | 0.06 | 0.07 | 0.05 |
| Average time consumption per multi-gene tree (s) | 2.62 | 3.22 | 3.42 | 22.00 | 21.05 | 22.11 | – | – | – |
| Average distance | 0 | 0 | 0 | 0.45 | 0.36 | 0.43 | – | – | – |
| Average distance single gene trees | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Average distance multiple gene trees | 0 | 0 | 0 | 0.70 | 0.61 | 0.69 | – | – | – |
Comparison between three tree-comparison programs (treeKO, TOPD/FMTS and RF). Three sets, each containing 100 randomly chosen trees from the P12a phylome, were used for the comparison. Each column corresponds to one program and one set of trees. Rows contain the percentage of trees that were compared, the time consumption (in seconds) and the average distance between pairs of identical trees. Data separated by single-gene and multi-gene trees are also provided.