Andre J Aberer, Denis Krompass, Alexandros Stamatakis.
Abstract
The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.
Year: 2012 PMID: 22962004 PMCID: PMC3526802 DOI: 10.1093/sysbio/sys078
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 15.683
Figure 1. Runtimes for the STA, BMA, and RNR algorithms with maximum dropset size l := 1 and l := 2. The x-axis shows the initial number of bipartitions |ℬ| in a bootstrap tree collection. Runtimes are for the majority-rule consensus (MRC) threshold; runtimes for the strict consensus (SC) threshold are similar.
Figure 2. Support improvement (in %) for optimization with an MRC threshold. RNR-l denotes RNR runs with l ∈ [1, 3]; BMA-mod is a less conservative modification of the BMA.
Figure 3. Support improvement (in %) for optimization with an SC threshold. RNR-l denotes RNR runs with l ∈ [1, 3]; BMA-mod is a less conservative modification of the BMA.
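The record does not describe the algorithm itself, so the following is only a minimal, naive sketch of the underlying optimization problem for a dropset size of l = 1, not the graph-based algorithm of the paper. Each bootstrap tree is assumed to be given as a collection of bipartitions (one side of each split), each candidate taxon is pruned in turn, and the consensus support under an MRC or SC threshold is recomputed. The score used here (sum of support values of consensus bipartitions, divided by n − 3 for the original n taxa) and all function names (`canonical`, `consensus_support`, `best_single_drop`) are hypothetical choices for illustration.

```python
from collections import Counter


def canonical(side, taxa):
    """Restrict a bipartition to `taxa`; return one canonical side, or None if trivial."""
    side = frozenset(side) & taxa
    other = taxa - side
    if len(side) < 2 or len(other) < 2:
        return None  # trivial splits carry no topological information
    ref = min(taxa)  # canonical side = the side NOT containing a fixed reference taxon
    return other if ref in side else side


def consensus_support(trees, taxa, strict=False):
    """Sum of support values of bipartitions that pass the MRC (>50%) or SC (100%) threshold."""
    counts = Counter()
    for splits in trees:
        # After pruning, distinct splits may collapse into one; count each once per tree.
        seen = {canonical(s, taxa) for s in splits}
        seen.discard(None)
        counts.update(seen)
    n = len(trees)
    total = 0.0
    for count in counts.values():
        passes = (count == n) if strict else (2 * count > n)
        if passes:
            total += count / n
    return total


def best_single_drop(trees, taxa, strict=False):
    """Naive search over all dropsets of size 1 (hypothetical normalized score)."""
    taxa = frozenset(taxa)
    norm = len(taxa) - 3  # non-trivial bipartitions of a fully resolved unrooted tree
    best = (None, consensus_support(trees, taxa, strict) / norm)
    for t in sorted(taxa):
        score = consensus_support(trees, taxa - {t}, strict) / norm
        if score > best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    # Backbone ((A,B),C,(D,E)) with a rogue taxon R attached at different positions.
    trees = [
        [{"A", "R"}, {"A", "B", "R"}, {"D", "E"}],  # R sister to A
        [{"A", "B"}, {"D", "R"}, {"D", "E", "R"}],  # R sister to D
        [{"A", "B"}, {"C", "R"}, {"D", "E"}],       # R sister to C
    ]
    print(best_single_drop(trees, "ABCDER"))  # selects "R"
```

Because the score is normalized by n − 3 of the original taxon set, pruning a taxon is only rewarded when the support gained outweighs the bipartitions lost. This brute-force search evaluates all C(n, l) dropsets, each requiring a full pass over the tree set, which hints at why naive approaches become expensive as l and the number of bipartitions grow.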