Duhong Chen, Oliver Eulenstein, David Fernández-Baca, J Gordon Burleigh.
Abstract
The utility of the matrix representation with flipping (MRF) supertree method has been limited by the speed of its heuristic algorithms. We describe a new heuristic algorithm for MRF supertree construction that improves upon the speed of the previous heuristic by a factor of n (the number of taxa in the supertree). This new heuristic makes MRF tractable for large-scale supertree analyses and allows the first comparisons of MRF with other supertree methods using large empirical data sets. Analyses of three published supertree data sets with between 267 and 571 taxa indicate that MRF supertrees are, on average, as similar or more similar to the input trees than matrix representation with parsimony (MRP) and modified min-cut (MMC) supertrees. The results also show that large differences may exist between MRF and MRP supertrees and demonstrate that the MRF supertree method is a practical and potentially more accurate alternative to the nearly ubiquitous MRP supertree method.
Keywords: Supertree; matrix representation with flipping; matrix representation with parsimony; phylogenetic trees; tree search heuristics
Year: 2007 PMID: 19455229 PMCID: PMC2674677
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. (a) The trees T_v and T − T_v obtained after a cut at node v. (b) The first rSPR neighbor tree T^(1) obtained by regrafting at the root. (c–d) The transformation from T^(i−1) to T^(i).
Figure 2. Internal node u and the three possible pairs of subtrees it may have, depending on the rooting. Each requires a different assignment.
Supertree data sets. The second column lists the total number of input trees in each data set, and the third column lists the number of taxa that are found in the set of all input trees. The last column lists the number of characters in the binary matrix representation of the set of input trees.
| Data set | Num. of input trees | Num. of taxa | Num. of characters |
|---|---|---|---|
| Marsupial | 158 | 267 | 1775 |
| Cetartiodactyla | 201 | 290 | 1975 |
| Legume | 20 | 571 | 765 |
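The "binary matrix representation" counted in the last column can be illustrated with a small sketch. It assumes the standard Baum-Ragan style coding, which this record does not spell out: each non-trivial cluster of an input tree becomes one character, taxa inside the cluster are scored 1, other taxa of that tree 0, and taxa absent from that tree "?". The nested-tuple tree encoding and the function names are hypothetical, chosen only for illustration.

```python
# Minimal sketch of a binary matrix representation of rooted input trees.
# Assumptions (not taken from the paper): trees are nested tuples of leaf
# names, and trivial clusters (single leaves, the full leaf set) are skipped.

def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Return leaf sets of all internal nodes except the root."""
    found = []

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found

def matrix_representation(trees):
    """Return {taxon: row string}, one column per non-trivial cluster."""
    taxa = sorted(set().union(*(leaves(t) for t in trees)))
    columns = []
    for t in trees:
        present = leaves(t)
        for cl in clusters(t):
            # 1 = in cluster, 0 = in this tree but outside cluster, ? = absent
            columns.append({x: "1" if x in cl else "0" if x in present else "?"
                            for x in taxa})
    return {x: "".join(col[x] for col in columns) for x in taxa}

# Two small input trees with overlapping taxon sets.
example_trees = [(("a", "b"), "c"), (("b", "c"), "d")]
example_matrix = matrix_representation(example_trees)
for taxon, row in example_matrix.items():
    print(taxon, row)
```

Under this coding, the number of characters equals the total number of non-trivial clusters across all input trees, which is why it depends on both the number and the resolution of the input trees.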
Results of the supertree analyses of three empirical data sets. The triplet-fit and MAST-fit columns show the average triplet-fit or MAST-fit distances of the input trees to the supertree. The Pars. score column shows the parsimony score of the supertree based on the binary matrix representation of input trees, and the Flip dist. column shows the minimum flip distance of the supertree based on the binary matrix representation of input trees. CPU time is the computational time for each supertree algorithm.
| Data set | Supertree | Triplet-fit | MAST-fit | Pars. score | Flip dist. | CPU time (sec) |
|---|---|---|---|---|---|---|
| Marsupial | MMC | 0.544 | 0.542 | 3891 | 3058 | 164 |
| | MRP | 0.823 | 0.713 | 2274 | 823 | 583 |
| | MRF(rSPR) | 0.823 | 0.717 | 2296 | 801 | 989 |
| | MRF(rTBR) | 0.823 | 0.717 | 2594 | 801 | 1398 |
| Cetartiodactyla | MMC | 0.489 | 0.508 | 5017 | 4339 | 144 |
| | MRP | 0.796 | 0.654 | 2510 | 904 | 805 |
| | MRF(rSPR) | 0.803 | 0.659 | 2524 | 893 | 2258 |
| | MRF(rTBR) | 0.804 | 0.659 | 2523 | 893 | 2895 |
| Legume | MMC | 0.713 | 0.711 | 1489 | 1567 | 39 |
| | MRP | 0.789 | 0.663 | 962 | 710 | 6884 |
| | MRF(rSPR) | 0.849 | 0.764 | 1043 | 397 | 4958 |
| | MRF(rTBR) | 0.856 | 0.764 | 1041 | 392 | 8099 |
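As a rough illustration of what a parsimony score on a binary matrix representation measures, the sketch below counts state changes for a single character with Fitch's algorithm. It is not the software or settings used in the analyses above: it assumes a rooted, fully binary tree encoded as nested tuples, unordered (reversible) character states, and missing entries ("?") treated as compatible with either state. The function name and tree encoding are hypothetical.

```python
# Minimal Fitch parsimony sketch for one binary character.
# Assumptions (illustrative only): rooted binary tree as nested 2-tuples of
# leaf names; states maps leaf name -> "0", "1" or "?" (missing).

def fitch_score(tree, states):
    """Return the minimum number of state changes for one character."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            state = states.get(node, "?")
            # A missing entry is compatible with either state.
            return {"0", "1"} if state == "?" else {state}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        # Disjoint child state sets force one change at this node.
        changes += 1
        return left | right

    visit(tree)
    return changes

example_tree = (("a", "b"), ("c", "d"))
# A character matching the cluster {a, b} needs a single change.
print(fitch_score(example_tree, {"a": "1", "b": "1", "c": "0", "d": "0"}))
# A conflicting character needs two changes.
print(fitch_score(example_tree, {"a": "1", "b": "0", "c": "1", "d": "0"}))
```

Summing this count over all characters of the matrix would give a total parsimony length for the tree under these assumptions; a lower total means fewer conflicts between the supertree and the input-tree clusters.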