Abstract
Many of the estimated topologies in phylogenetic studies are presented with the bootstrap support for each of their splits indicated. If phylogenetic estimation is unbiased, high bootstrap support for a split suggests a good deal of certainty that the split is actually present in the tree, while low bootstrap support suggests that one or more of the taxa on one side of the estimated split might in reality belong with taxa on the other side. In the latter case, the follow-up questions of how many and which taxa could reasonably be misplaced, and where they might alternatively be placed, are not answered by the presented bootstrap support. We present here an algorithm that finds the set of all trees whose minimum bootstrap support over their splits is greater than some given value. The output is a list of trees ranked by the minimum bootstrap support among their splits. The number of such trees and their topologies provide useful supplementary information in bootstrap analyses about the reasons for low bootstrap support for splits. We also present ways of quantifying low bootstrap support by treating the set of all topologies with minimum bootstrap support greater than some quantity as a confidence region of topologies. Using a double bootstrap, we are able to choose a cutoff so that the set of topologies with minimum bootstrap support for a split greater than that cutoff gives an approximate 95% confidence region. As with bootstrap support, one advantage of the methods is that they are generally applicable to the wide variety of phylogenetic estimation methods.
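The abstract does not spell out the enumeration algorithm. The following is a minimal brute-force sketch of the idea, not the authors' algorithm: it enumerates every unrooted binary topology by stepwise leaf insertion, scores each by the minimum bootstrap support of its nontrivial splits (a split never seen in the bootstrap sample scores 0), and returns the topologies whose minBP exceeds a cutoff, best first. The function names, the nested-tuple tree encoding and the toy bootstrap sample are hypothetical. Exhaustive enumeration is only practical for a handful of taxa, since there are (2n-5)!! unrooted binary topologies on n taxa.

```python
from collections import Counter

# Hypothetical encoding: an unrooted binary tree on taxa[0..n-1] is stored as a
# rooted binary tree (nested 2-tuples) on taxa[1:], with taxa[0] as the implicit root.


def insert_everywhere(tree, leaf):
    """Yield each tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # new edge above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def all_topologies(taxa):
    """All (2n-5)!! unrooted binary topologies on n >= 4 taxa, by stepwise addition."""
    trees = [taxa[1]]
    for leaf in taxa[2:]:
        trees = [t for old in trees for t in insert_everywhere(old, leaf)]
    return trees


def splits(tree, n_taxa):
    """Nontrivial splits, each encoded by its side that excludes taxa[0]."""
    clades = []

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = walk(node[0]) | walk(node[1])
        clades.append(clade)
        return clade

    walk(tree)
    return {c for c in clades if 2 <= len(c) <= n_taxa - 2}


def split_support(boot_trees, n_taxa):
    """Percentage of bootstrap trees that contain each split."""
    counts = Counter(s for t in boot_trees for s in splits(t, n_taxa))
    return {s: 100.0 * c / len(boot_trees) for s, c in counts.items()}


def rank_by_min_bp(taxa, boot_trees, cutoff=0.0):
    """Topologies whose minimum split support exceeds `cutoff`, best first."""
    support = split_support(boot_trees, len(taxa))
    ranked = []
    for tree in all_topologies(taxa):
        min_bp = min(support.get(s, 0.0) for s in splits(tree, len(taxa)))
        if min_bp > cutoff:
            ranked.append((min_bp, tree))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Toy bootstrap sample (hypothetical): 100 replicate trees on taxa A-E.
taxa = ["A", "B", "C", "D", "E"]
boot = ([(("B", "C"), ("D", "E"))] * 60
        + [((("B", "C"), "D"), "E")] * 30
        + [("B", ("C", ("D", "E")))] * 10)

for min_bp, tree in rank_by_min_bp(taxa, boot):
    print(f"minBP = {min_bp:5.1f}  tree = (A, {tree})")
```

With this toy sample, only three of the 15 topologies have positive minBP: the majority topology and two alternatives that each keep one of its two splits.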
Keywords: bootstrap support; confidence regions; phylogeny; splits; statistical tests
Year: 2007 PMID: 19455207 PMCID: PMC2674659
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. The estimated mammalian mitochondrial tree (first panel), with the top-ranked trees in terms of minimum bootstrap support given across rows. Bootstrap support is indicated for each branch. Since the ranked trees are constructed from splits alone, branch lengths are arbitrary and taken as equal. Min BP is the minimum bootstrap support among the splits in the tree. P gives the p-value, based on a double bootstrap procedure, for the null hypothesis that the tree is correct.
The p-values for the hypothesis that the tree is correct, for the 15 trees in which cow and harbour seal are split from the rest. Trees are ranked by log-likelihood as in Table 3 of Shimodaira (2002), based on fits using PAML (Yang, 1997), and are listed in Table 2. PP denotes approximate Bayesian posterior probabilities; KH, AU, SH and WSH denote p-values from the KH, AU, SH and weighted SH tests. The minBP value for each tree is given, as is the p-value based on bootstrapped minBP values from 100 bootstrap samples, each of which uses 100 bootstrap samples to obtain a minBP value.
| Tree | PP | BP | KH | AU | SH | WSH | GLS | minBP | p-value |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.934 | 0.579 | 0.039 | 0.789 | 0.944 | 0.948 | 0.0410 | 62.3 | 0.67 |
| 2 | 0.065 | 0.312 | 0.361 | 0.516 | 0.799 | 0.791 | 0.0380 | 29.2 | 0.31 |
| 3 | 0.001 | 0.036 | 0.122 | 0.114 | 0.575 | 0.422 | 0.0353 | 1.3 | 0.02 |
| 4 | 0.000 | 0.013 | 0.044 | 0.075 | 0.178 | 0.210 | 0.0024 | 6.5 | 0.05 |
| 5 | 0.000 | 0.035 | 0.066 | 0.128 | 0.149 | 0.299 | 0.0013 | 5.6 | 0.03 |
| 6 | 0.000 | 0.005 | 0.049 | 0.029 | 0.114 | 0.105 | 0.0050 | 5.6 | 0.03 |
| 7 | 0.000 | 0.017 | 0.051 | 0.101 | 0.112 | 0.252 | 0.0013 | 1.4 | 0.02 |
| 8 | 0.000 | 0.001 | 0.032 | 0.009 | 0.073 | 0.050 | 0.0050 | 1.0 | 0.01 |
| 9 | 0.000 | 0.000 | 0.003 | 0.000 | 0.032 | 0.015 | 0.0024 | 0.0 | 0.00 |
| 10 | 0.000 | 0.003 | 0.019 | 0.028 | 0.034 | 0.124 | 0.0013 | 1.0 | 0.01 |
| 11 | 0.000 | 0.000 | 0.010 | 0.003 | 0.018 | 0.069 | 0.0013 | 0.0 | 0.00 |
| 12 | 0.000 | 0.000 | 0.003 | 0.001 | 0.006 | 0.033 | 0.0013 | 1.3 | 0.02 |
| 13 | 0.000 | 0.000 | 0.003 | 0.001 | 0.006 | 0.034 | 0.0013 | 0.0 | 0.00 |
| 14 | 0.000 | 0.000 | 0.001 | 0.005 | 0.003 | 0.013 | 0.0013 | 1.0 | 0.01 |
| 15 | 0.000 | 0.000 | 0.001 | 0.002 | 0.002 | 0.009 | 0.0013 | 1.0 | 0.01 |
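The caption does not give the formula, but the reported p-values never decrease as minBP increases, which is consistent with one null distribution of bootstrapped minBP values being used for every tree in the data set. Under that assumption, a minimal sketch reads the p-value for a tree as the fraction of double-bootstrap minBP values at or below the tree's observed minBP, and the approximate 95% confidence region as the trees with p-value above 0.05. The null sample, tree labels and function names are made up for illustration; they do not reproduce the values in the table above.

```python
def min_bp_p_value(observed_min_bp, null_min_bps):
    """Fraction of double-bootstrap minBP values at or below the observed minBP."""
    return sum(v <= observed_min_bp for v in null_min_bps) / len(null_min_bps)


def confidence_region(tree_min_bps, null_min_bps, alpha=0.05):
    """Trees whose minBP p-value exceeds alpha: an approximate (1 - alpha) region."""
    return [tree for tree, mbp in tree_min_bps.items()
            if min_bp_p_value(mbp, null_min_bps) > alpha]


# Hypothetical null sample: the minBP of the estimated tree recomputed in each of
# 20 outer bootstrap samples, each value obtained from its own inner bootstrap.
null = [float(v) for v in range(0, 100, 5)]
observed = {"tree_a": 62.0, "tree_b": 29.0, "tree_c": 6.5, "tree_d": 1.0, "tree_e": 0.0}

for name, mbp in observed.items():
    print(name, mbp, min_bp_p_value(mbp, null))
print(confidence_region(observed, null))
```

Thresholding p-values at alpha is equivalent to thresholding minBP itself at a single cutoff, because the p-value is a non-decreasing function of minBP.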
The topologies for the 15 trees in Table 1.
| Tree | Topology |
|---|---|
| 1 | ((human,(seal,cow)),rabbit),mouse,opossum |
| 2 | (human,((seal,cow),rabbit)),mouse,opossum |
| 3 | (human,rabbit),(seal,cow),(mouse,opossum) |
| 4 | (human,(seal,cow)),(rabbit,mouse),opossum |
| 5 | human,((seal,cow),(rabbit,mouse)),opossum |
| 6 | human,(((seal,cow),rabbit),mouse),opossum |
| 7 | (human,(rabbit,mouse)),(seal,cow),opossum |
| 8 | (human,mouse),((seal,cow),rabbit),opossum |
| 9 | ((human,(seal,cow)),mouse),rabbit,opossum |
| 10 | ((human,mouse),rabbit),(seal,cow),opossum |
| 11 | ((human,rabbit),mouse),(seal,cow),opossum |
| 12 | ((human,mouse),(seal,cow)),rabbit,opossum |
| 13 | human,(((seal,cow),mouse),rabbit),opossum |
| 14 | (human,rabbit),((seal,cow),mouse),opossum |
| 15 | (human,((seal,cow),mouse)),rabbit,opossum |
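Because cow and harbour seal are held together, each tree above is effectively an unrooted binary tree on five units (human, {seal, cow}, rabbit, mouse, opossum), and there are (2·5 − 5)!! = 1·3·5 = 15 such topologies. The sketch below, with hypothetical helper names and a small hand-written parser for the parenthesis notation used in the table, checks that the 15 listed topologies are pairwise distinct and that each contains the {seal, cow} split.

```python
TABLE2 = [
    "((human,(seal,cow)),rabbit),mouse,opossum",
    "(human,((seal,cow),rabbit)),mouse,opossum",
    "(human,rabbit),(seal,cow),(mouse,opossum)",
    "(human,(seal,cow)),(rabbit,mouse),opossum",
    "human,((seal,cow),(rabbit,mouse)),opossum",
    "human,(((seal,cow),rabbit),mouse),opossum",
    "(human,(rabbit,mouse)),(seal,cow),opossum",
    "(human,mouse),((seal,cow),rabbit),opossum",
    "((human,(seal,cow)),mouse),rabbit,opossum",
    "((human,mouse),rabbit),(seal,cow),opossum",
    "((human,rabbit),mouse),(seal,cow),opossum",
    "((human,mouse),(seal,cow)),rabbit,opossum",
    "human,(((seal,cow),mouse),rabbit),opossum",
    "(human,rabbit),((seal,cow),mouse),opossum",
    "(human,((seal,cow),mouse)),rabbit,opossum",
]
TAXA = frozenset({"human", "seal", "cow", "rabbit", "mouse", "opossum"})


def parse(text):
    """Parse nested parentheses such as '(a,(b,c))' into nested tuples of names."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1
            children = [node()]
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


def splits(tree, taxa, ref="opossum"):
    """Nontrivial splits of an unrooted tree, each stored as the side without `ref`."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(leaves) <= len(taxa) - 2:
            found.add(leaves if ref not in leaves else taxa - leaves)
        return leaves

    walk(tree)
    return frozenset(found)


# The table omits the outermost parentheses, so add them before parsing.
split_sets = [splits(parse("(" + t + ")"), TAXA) for t in TABLE2]
print(len(set(split_sets)), "distinct topologies")
print(all(frozenset({"seal", "cow"}) in s for s in split_sets))
```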
The topologies with minBP larger than zero for the HIV data set.
| Topology | minBP | p-value |
|---|---|---|
| A1,(A2,(E1,E2)),(D,B) | 83.7 | 0.47 |
| A2,(A1,(E1,E2)),(D,B) | 9.5 | 0.01 |
| A1,A2,((D,B),(E1,E2)) | 6.8 | 0.01 |
| E2,(E1,(A1,A2)),(D,B) | 2.0 | 0.01 |
Figure 2. The estimated EF-1α tree (first panel), with the top-ranked trees in terms of minimum bootstrap support given across rows. Bootstrap support is indicated for each branch. Since the ranked trees are constructed from splits alone, branch lengths are arbitrary and taken as equal. Min BP is the minimum bootstrap support among the splits in the tree. P gives the p-value, based on a double bootstrap procedure, for the null hypothesis that the tree is correct.
Full names for the 13 taxa in the archaebacterial EF-1 data set.
| Abbreviation | Full name |
|---|---|
| S | |
| D | |
| Ao | |
| Pa | |
| Tc | |
| Ph | |
| Pw | |
| Af | |
| Mj | |
| Mv | |
| Hh | |
| Hm | |
| Ta | |
Figure 3. The EF-1α trees for the groups Af, DSAP {D, S, Ao, Pa}, H {Hm, Hh}, M {Mj, Mv}, Ta and P {Tc, Ph, Pw}. All trees with minBP greater than or equal to 1 are shown, ranked by minBP across rows.
Bootstrap support and minimum bootstrap support for trees arising in 1000 bootstrap replications for the HIV and mammalian mitochondrial data.
| HIV BP | HIV minBP | Mammal BP | Mammal minBP |
|---|---|---|---|
| 83.7 | 83.7 | 59.2 | 59.9 |
| 9.5 | 9.5 | 33.7 | 33.8 |
| 6.6 | 6.8 | 5.6 | 5.6 |
| 0.2 | 0.2 | 0.7 | 0.8 |
| | | 0.6 | 1.3 |
| | | 0.1 | 0.8 |
| | | 0.1 | 0.1 |
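Every replicate that reproduces a whole topology also contains each split of that topology, so a tree's minBP can never fall below its BP when both are computed from the same replicates; the table shows this (for example 59.2 vs 59.9 for the top mammalian tree). A minimal sketch of both quantities follows. Replicate trees are encoded as frozensets of their nontrivial splits, which is a hypothetical encoding, and the toy sample is made up.

```python
from collections import Counter


def bp_and_min_bp(replicates):
    """Map each topology seen in the replicates to (BP, minBP), both in percent.

    Each replicate is a frozenset of splits; each split is a frozenset of taxa
    (the side that excludes a fixed reference taxon).
    """
    n = len(replicates)
    topology_counts = Counter(replicates)
    split_counts = Counter(split for rep in replicates for split in rep)
    table = {}
    for topology, count in topology_counts.items():
        bp = 100.0 * count / n
        min_bp = min(100.0 * split_counts[split] / n for split in topology)
        table[topology] = (bp, min_bp)
    return table


# Toy sample of 100 replicates on taxa A-E (splits written as the side without A).
t1 = frozenset({frozenset("BC"), frozenset("DE")})
t2 = frozenset({frozenset("BC"), frozenset("BCD")})
t3 = frozenset({frozenset("DE"), frozenset("CDE")})
replicates = [t1] * 60 + [t2] * 30 + [t3] * 10

for topology, (bp, min_bp) in bp_and_min_bp(replicates).items():
    print(sorted("".join(sorted(s)) for s in topology), bp, min_bp)
```

Here the majority topology has BP 60 but minBP 70, because its weaker split {D, E} also appears in the 10 replicates of the third topology.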
Figure 4. A scatter plot and boxplot of the minBP and BP values for trees arising in 1000 bootstrap replicates for the EF-1α data.
Figure 5. Plots of the cumulative distribution of bootstrap support for the true tree. Each curve gives the probability that bootstrap support is less than or equal to the corresponding value on the x-axis. Curves are given both for the bootstrap support of the topology and for the minimum bootstrap support of the splits of that topology. The true generating tree was the same as the estimated tree for the mammalian mitochondrial data in Figure 1, but with branch lengths
((seal: 0.1, cow: 0.1): a, human: 0.1): b, (rabbit: 0.1, (mouse: 0.1, opossum: 0.1): a).
The internal branch lengths a and b were allowed to vary. The cumulative distribution functions were estimated from 1000 nucleotide data sets simulated under a Jukes-Cantor process, each with B = 100 bootstrap replicates.
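The caption describes the simulation only in outline. A minimal sketch of the data-generating step draws a uniform root sequence and evolves it down the generating tree site by site under the Jukes-Cantor (JC69) model. It assumes that branch lengths are in expected substitutions per site and roots the tree at the node where the two bracketed halves join; rooting does not change the leaf distribution, because JC69 is time-reversible and is started from its uniform stationary distribution. The function names, sequence length and example values of a and b are hypothetical, and tree estimation and the B = 100 bootstrap replicates per data set are not shown.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc_evolve(base, t, rng):
    """Evolve one base along a branch of length t under the Jukes-Cantor (JC69) model."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    if rng.random() < p_same:
        return base
    # The remaining probability is shared equally among the three other bases.
    return rng.choice([b for b in NUCLEOTIDES if b != base])


def evolve(node, parent_seq, rng, out):
    """node = (label, branch_length); label is a taxon name or a tuple of child nodes."""
    label, length = node
    seq = [jc_evolve(b, length, rng) for b in parent_seq]
    if isinstance(label, str):
        out[label] = "".join(seq)
    else:
        for child in label:
            evolve(child, seq, rng, out)


def figure5_tree(a, b):
    """The Figure 5 generating tree with internal branch lengths a and b."""
    def leaf(name):
        return (name, 0.1)
    seal_cow = ((leaf("seal"), leaf("cow")), a)
    mouse_opossum = ((leaf("mouse"), leaf("opossum")), a)
    left = ((seal_cow, leaf("human")), b)
    right = ((leaf("rabbit"), mouse_opossum), 0.0)  # no length given; root sits here
    return ((left, right), 0.0)


def simulate_alignment(tree, n_sites, seed=0):
    """Simulate one alignment; the root sequence is drawn from the uniform JC stationary law."""
    rng = random.Random(seed)
    root = [rng.choice(NUCLEOTIDES) for _ in range(n_sites)]
    out = {}
    evolve(tree, root, rng, out)
    return out


alignment = simulate_alignment(figure5_tree(a=0.05, b=0.05), n_sites=2000)
for name, seq in sorted(alignment.items()):
    print(f"{name:8s} {seq[:40]}")
```

Repeating this for many seeds, then estimating and bootstrapping a tree from each alignment, would give the empirical distributions of BP and minBP that the figure summarises.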