Abstract
BACKGROUND: Phylogenies are essential to many areas of biology, but phylogenetic methods may give incorrect estimates under some conditions. A potentially common scenario of this type is when few taxa are sampled and terminal branches for the sampled taxa are relatively long. However, the best solution in such cases (i.e., sampling more taxa versus more characters) has been highly controversial. A widespread assumption in this debate is that added taxa must be complete (no missing data) in order to save analyses from the negative impacts of limited taxon sampling. Here, we evaluate whether incomplete taxa can also rescue analyses under these conditions (empirically testing predictions from an earlier simulation study).
Year: 2012 PMID: 22900065 PMCID: PMC3416753 DOI: 10.1371/journal.pone.0042925
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Phylogeny of 16 sampled vertebrates.
Phylogeny of the 16 vertebrate taxa used in subsampling experiments. The same topology is estimated by Bayesian, likelihood, and parsimony methods, and the branch support from each method is shown (posterior probabilities for Bayesian analysis, bootstrap support for likelihood and parsimony). Branch lengths shown here were estimated using likelihood (absolute branch lengths from Bayesian analysis are somewhat longer, but relative branch lengths are effectively identical to those from likelihood).
Figure 2. Major results of subsampling experiments.
Major results of subsampling experiments from higher-level vertebrate phylogeny, showing that highly incomplete taxa can rescue analyses from the impacts of limited taxon sampling. Accuracy represents the proportion of replicates in which relationships among the 4 complete taxa are estimated correctly after adding 12 incomplete taxa, from among the set of replicates in which analysis of the 4 complete taxa alone yields an incorrect estimate. Thus, accuracy here represents the proportion of replicates in which the analysis of 4 complete taxa is initially incorrect but is “rescued” by addition of the 12 incomplete taxa (i.e. correct relationships among the original 4 taxa are restored).
The proportion of replicates in which analysis of the 4 complete taxa yields an incorrect phylogeny but addition of different numbers of complete or incomplete taxa leads to estimation of correct relationships among the original 4 complete taxa (i.e. the analysis is rescued).
| | Phylogenetic method | | |
| Sampling approach | Bayesian | Likelihood | Parsimony |
| 12 incomplete taxa (50% missing each) added | 82% | 86% | 38% |
| 4 incomplete taxa (50% missing each) added | 73% | 79% | 56% |
| 2 complete taxa added | 73% | 86% | 75% |
| 4 complete taxa added | 91% | 100% | 83% |
| Number of replicates | 11 | 7 | 12 |
Percentage of replicates in which estimated relationships among the 4 taxa alone are initially correct, but adding incomplete taxa yields an incorrect estimate (for the 4 complete taxa).
| | Phylogenetic method | | |
| Sampling approach | Bayesian | Likelihood | Parsimony |
| 12 incomplete taxa (50% missing each) added | 0.6% | 1.6% | 0% |
| 12 incomplete taxa (75% missing each) added | 0.6% | 1.0% | 1.1% |
| 12 incomplete taxa (90% missing each) added | 1% | 1.6% | 2.3% |
| Number of relevant replicates | 178 | 186 | 175, 175, 173 |
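The two tables above report conditional proportions: the rescue rate is computed only over replicates in which the 4 complete taxa alone give an incorrect estimate, and the second table's percentage only over replicates in which they give a correct one. The following is a minimal sketch of that bookkeeping, not the authors' code. The function name `conditional_rates`, the label "disruption rate" for the second quantity, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import Dict, Optional, Sequence


def conditional_rates(
    before: Sequence[bool], after: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Summarize per-replicate outcomes for the 4 complete taxa.

    before[i]: relationships correct when the 4 complete taxa are analyzed alone.
    after[i]:  relationships correct after extra taxa are added.
    (Hypothetical helper; names are illustrative assumptions.)
    """
    if len(before) != len(after):
        raise ValueError("before and after must have one entry per replicate")

    n_incorrect = sum(1 for b in before if not b)
    n_correct = len(before) - n_incorrect

    # Initially incorrect, correct after adding taxa ("rescued").
    rescued = sum(1 for b, a in zip(before, after) if not b and a)
    # Initially correct, incorrect after adding taxa.
    disrupted = sum(1 for b, a in zip(before, after) if b and not a)

    return {
        "n_initially_incorrect": n_incorrect,
        "n_initially_correct": n_correct,
        # None when the conditioning set is empty, to avoid dividing by zero.
        "rescue_rate": rescued / n_incorrect if n_incorrect else None,
        "disruption_rate": disrupted / n_correct if n_correct else None,
    }


# Toy data for 10 hypothetical replicates (not from the study).
before = [False, False, False, False, True, True, True, True, True, True]
after = [True, True, True, False, True, True, True, True, True, False]
rates = conditional_rates(before, after)
print(rates)
```

Because each rate is conditioned on a different subset of replicates, the denominators differ between the two tables, which is why each table reports its own replicate count.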
Accuracy of phylogenetic methods for the 4 complete taxa (before and after addition of incomplete or complete taxa), including all 200 replicates.
| | Phylogenetic method | | |
| Sampling approach | Bayesian | Likelihood | Parsimony |
| 4 complete taxa alone | 0.89 | 0.93 | 0.88 |
| 12 incomplete taxa (50% missing each) added | 0.98 | 0.98 | 0.93 |
| 12 incomplete taxa (75% missing each) added | 0.98 | 0.98 | 0.92 |
| 12 incomplete taxa (90% missing each) added | 0.92 | 0.94 | 0.89 |
| 2 complete taxa added | 0.96 | 0.99 | 0.96 |
| 4 complete taxa added | 0.99 | 1.00 | 0.97 |
Overall accuracy for estimated trees for all 16 taxa, when 12 of 16 taxa are incomplete, including all 200 replicates and including only the replicates in which the initial relationships of the 4 taxa are incorrect (accuracy = 0).
| | Phylogenetic method | | |
| Sampling approach | Bayesian | Likelihood | Parsimony |
| All 200 replicates | | | |
| 12 incomplete taxa (50% missing each) | 0.93 | 0.94 | 0.88 |
| 12 incomplete taxa (75% missing each) | 0.90 | 0.91 | 0.88 |
| 12 incomplete taxa (90% missing each) | 0.70 | 0.65 | 0.50 |
| Initial relationships incorrect | | | |
| 12 incomplete taxa (50% missing each) added | 0.92 | 0.91 | 0.86 |
| 12 incomplete taxa (75% missing each) added | 0.90 | 0.92 | 0.84 |
| 12 incomplete taxa (90% missing each) added | 0.70 | 0.60 | 0.48 |
| Number of replicates | 11 | 7 | 12 |