Bui Quang Minh, Minh Anh Thi Nguyen, Arndt von Haeseler.
Abstract
Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. UFBoot achieves a median speedup of 3.1 (range: 0.66-33.3) to 10.2 (range: 1.32-41.4) compared with RAxML RBS for real DNA and amino acid alignments, respectively. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference.
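The resampling estimated log-likelihood (RELL) idea named in the abstract can be illustrated with a minimal Python sketch. It assumes that per-site log-likelihoods have already been computed for a fixed set of candidate trees; the function name `rell_support`, its parameters, and the toy input are hypothetical and chosen only for illustration. This is not the IQ-TREE implementation, which also collects candidate trees during the tree search and applies a stopping rule that is not shown here.

```python
import random

def rell_support(site_lnl, tree_splits, n_boot=1000, seed=0):
    """Approximate bootstrap support with RELL (illustrative sketch).

    site_lnl[t][s]  : log-likelihood of site s under candidate tree t
    tree_splits[t]  : set of splits (frozensets of taxa) contained in tree t
    Returns {split: fraction of replicates whose best tree has the split}.
    """
    rng = random.Random(seed)
    n_sites = len(site_lnl[0])
    hits = {}
    for _ in range(n_boot):
        # Draw a bootstrap sample of site indices (with replacement).
        sample = rng.choices(range(n_sites), k=n_sites)
        # RELL step: score each tree by summing its stored site
        # log-likelihoods over the resampled sites (no re-optimisation).
        scores = [sum(lnl[s] for s in sample) for lnl in site_lnl]
        best = scores.index(max(scores))
        for split in tree_splits[best]:
            hits[split] = hits.get(split, 0) + 1
    return {split: count / n_boot for split, count in hits.items()}

# Toy input (made up): tree 0 fits every site better than tree 1.
site_lnl = [[-1.0, -1.2, -0.9], [-2.0, -2.5, -1.8]]
tree_splits = [{frozenset("AB")}, {frozenset("AC")}]
support = rell_support(site_lnl, tree_splits, n_boot=100)
```

Because tree 0 has the higher log-likelihood at every site in this toy input, it wins every replicate, so its split receives full support and the competing split never appears in the result.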
Year: 2013 PMID: 23418397 PMCID: PMC3670741 DOI: 10.1093/molbev/mst024
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Simulation Settings.
| True Tree | Data Type | No. Sequences | No. Sites | No. Alignments |
|---|---|---|---|---|
| Yule–Harding | DNA | 100 | 500 | 200 |
| Yule–Harding | DNA | 200 | 1,000 | 200 |
| Yule–Harding | DNA | 500 | 1,000 | 200 |
| Yule–Harding | Protein | 100 | 300 | 200 |
| Yule–Harding | Protein | 200 | 500 | 200 |
| PANDIT | DNA | 4–403 | 24–6,891 | 6,222 |
| PANDIT | Protein | 4–545 | 12–2,297 | 6,182 |
Accuracies of SBS, RBS with RAxML, SH-aLRT with PhyML, and UFBoot approximation from the Yule–Harding (left panel) and the PANDIT-based simulations (right panel).
Impact of moderate (JC + Γ) and severe model violations (JC) on the accuracies of SBS, SH-aLRT, and UFBoot in the PANDIT-based simulations.
Distributions of run-time ratios (log2 scale) between RBS and UFBoot for 300 DNA and AA PANDIT alignments. The percentages of alignments where UFBoot runs slower (left of the dashed line) or faster (right of the dashed line) than RBS are shown.
Schematic view of the tree space sampled by the IQPNNI algorithm. The solid curve reflects the log-likelihood surface on the tree space. The structure of tree space is defined by the NNI operations, where each n-taxon tree has exactly 2(n − 3) neighboring trees.
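The NNI neighborhood referred to in the last caption can be checked on a small example. An unrooted binary tree on n taxa has n − 3 internal branches, and each internal branch allows two nearest-neighbor interchanges, so 2(n − 3) neighboring topologies are expected. The Python sketch below enumerates them, assuming the tree is stored as a hypothetical adjacency dictionary with string node labels; the function name `nni_neighbors` and the example tree are illustrative and are not taken from IQPNNI.

```python
def nni_neighbors(adj):
    """Enumerate NNI neighbours of an unrooted binary tree (sketch).

    adj maps each node label (str) to the set of adjacent node labels;
    leaves have degree 1 and internal nodes degree 3.
    Each returned topology is encoded as a frozenset of its splits.
    """
    leaves = frozenset(n for n, nb in adj.items() if len(nb) == 1)

    def splits(a):
        # Encode a topology by its non-trivial bipartitions of the leaves.
        out = set()
        for u in a:
            for v in a[u]:
                if len(a[u]) == 3 and len(a[v]) == 3:
                    # Collect leaves reachable from v without crossing (u, v).
                    side, stack, seen = set(), [v], {u, v}
                    while stack:
                        x = stack.pop()
                        if x in leaves:
                            side.add(x)
                        for y in a[x]:
                            if y not in seen:
                                seen.add(y)
                                stack.append(y)
                    side = frozenset(side)
                    out.add(frozenset([side, leaves - side]))
        return frozenset(out)

    result = set()
    for u in adj:
        for v in adj[u]:
            # Visit each internal branch once (u < v).
            if len(adj[u]) == 3 and len(adj[v]) == 3 and u < v:
                b = sorted(adj[u] - {v})[1]  # fix one subtree on u's side
                for c in adj[v] - {u}:       # swap it with each subtree on v's side
                    new = {n: set(nb) for n, nb in adj.items()}
                    new[u].remove(b)
                    new[u].add(c)
                    new[v].remove(c)
                    new[v].add(b)
                    new[b].remove(u)
                    new[b].add(v)
                    new[c].remove(v)
                    new[c].add(u)
                    result.add(splits(new))
    return result

# Five-taxon example tree ((A,B),C,(D,E)) with internal nodes x, y, z.
tree5 = {
    "A": {"x"}, "B": {"x"}, "C": {"y"}, "D": {"z"}, "E": {"z"},
    "x": {"A", "B", "y"}, "y": {"x", "C", "z"}, "z": {"y", "D", "E"},
}
neighbors5 = nni_neighbors(tree5)
```

For the five-taxon example, the enumeration yields 2(5 − 3) = 4 distinct topologies, in line with the count above.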