Abstract
A method was developed for simultaneous Bayesian inference of species delimitation and species phylogeny using the multispecies coalescent model. The method eliminates the need for a user-specified guide tree in species delimitation and incorporates phylogenetic uncertainty in a Bayesian framework. The nearest-neighbor interchange algorithm was adapted to propose changes to the species tree, with the gene trees for multiple loci altered in the proposal to avoid conflicts with the newly proposed species tree. We also modify our previous scheme for specifying priors for species delimitation models to construct joint priors for models of species delimitation and species phylogeny. As in our earlier method, the modified algorithm integrates over gene trees, taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. We conducted a simulation study to examine the statistical properties of the method using six populations (two sequences each) and a true number of three species, with values of divergence times and ancestral population sizes that are realistic for recently diverged species. The results suggest that the method tends to be conservative with high posterior probabilities being a confident indicator of species status. Simulation results also indicate that the power of the method to delimit species increases with an increase of the divergence times in the species tree, and with an increased number of gene loci. Reanalyses of two data sets of cavefish and coast horned lizards suggest considerable phylogenetic uncertainty even though the data are informative about species delimitation. We discuss the impact of the prior on models of species delimitation and species phylogeny and of the prior on population size parameters (θ) on Bayesian species delimitation.
Keywords: Bayesian species delimitation; guide tree; multispecies coalescent; nearest-neighbor interchange; reversible-jump MCMC; species tree
Year: 2014 PMID: 25274273 PMCID: PMC4245825 DOI: 10.1093/molbev/msu279
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. NNI on a rooted species tree. Each internal branch (say, X-Y) defines three possible trees relating three nodes A, B, and C. Given the current tree S1, the algorithm moves to one of the other two trees, S2 and S3, chosen at random.
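The move described in this legend can be sketched in a few lines of Python. This is only an illustration of NNI on a rooted binary tree, not the implementation used in the paper: the nested-tuple tree representation and the names `canon`, `nni_neighbors`, and `propose_nni` are assumptions made here, and the sketch picks uniformly among all neighbors, whereas the legend does not say how the internal branch itself is chosen. It also ignores branch lengths and gene trees.

```python
import random

def canon(tree):
    """Order the two children so that equal topologies compare equal."""
    if isinstance(tree, str):
        return tree
    left, right = (canon(child) for child in tree)
    return tuple(sorted((left, right), key=repr))

def nni_neighbors(tree):
    """Return every rooted topology one NNI move away from `tree`.

    A tree is a tip label (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return []
    left, right = tree
    neighbors = []
    # Internal branch X-Y: X is this node, Y = (a, b) is one child, c is Y's sibling.
    for y, c in ((left, right), (right, left)):
        if not isinstance(y, str):
            a, b = y
            neighbors.append(((a, c), b))  # b and c exchange places
            neighbors.append(((b, c), a))  # a and c exchange places
    # Moves across internal branches deeper in each subtree.
    neighbors.extend((n, right) for n in nni_neighbors(left))
    neighbors.extend((left, n) for n in nni_neighbors(right))
    return [canon(n) for n in neighbors]

def propose_nni(tree, rng=random):
    """Pick one neighbor uniformly at random (needs at least three tips)."""
    return rng.choice(nni_neighbors(tree))

s1 = (("A", "B"), "C")
print(nni_neighbors(s1))
```

For the three-tip tree `s1` the function returns the two alternative topologies, in which C is sister to A or to B. A rooted tree with n tips has n - 2 internal branches and therefore 2(n - 2) neighbors under this definition.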
Fig. 2. Some nodes on the gene tree are modified when the NNI algorithm is used to change species tree S1 to S2 in figure 1, that is, to prune species A and regraft it to branch C. A moved node (marked with •) lies in species AB and has exactly one daughter node with descendants in species A only. This, together with the subtree represented by the daughter node with descendants in species A only, is pruned and regrafted to a random contemporary branch in species C. In addition, four other kinds of “affected” nodes have their population IDs changed. They all have ages in the interval and reside in either species AB or C. Any node marked with ○ or △ has descendants in species A only and changes its population ID from AB to AC. Any node marked with ⋄ is in species C and changes its population ID from C to AC. Any node marked with □ is in species AB with each of the two daughter nodes having descendants in species B, and changes its population ID from AB to B.
Fig. 3. Models of species delimitation and species phylogeny for three populations A, B, and C. Models on the same row correspond to different species delimitation models given the same guide tree, formed by collapsing internal nodes on the guide tree (represented by short gray branches). The one-species model is represented three times, and there are nine models in our MCMC algorithm even though there are only seven biologically distinct models. The two priors constructed in this article assign equal probabilities (1/9) to the nine models. An NNI algorithm is used to move between the guide trees, whereas rjMCMC is used to move between species-delimitation models.
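The count of nine model representations versus seven biologically distinct models can be checked by brute-force enumeration. The sketch below is an illustration only: the function name `three_population_models` and the tuple encoding of a model are hypothetical, and it assumes that collapsing the root of a guide tree also collapses the node below it.

```python
from fractions import Fraction
from itertools import combinations

def three_population_models(pops=("A", "B", "C")):
    """Return (guide_tree, biological_model) for every model representation."""
    representations = []
    for pair in combinations(pops, 2):
        (third,) = set(pops) - set(pair)
        guide = (pair, third)  # guide tree ((x, y), z)
        sisters = frozenset(pair)
        # No node collapsed: three species, with `pair` as sister species.
        representations.append((guide, ("3 species", sisters)))
        # Inner node collapsed: `pair` merges into a single species.
        representations.append((guide, ("2 species", sisters)))
        # Root collapsed (and hence the inner node too): one species.
        representations.append((guide, ("1 species", frozenset(pops))))
    return representations

reps = three_population_models()
distinct = {model for _, model in reps}
prior_per_representation = Fraction(1, len(reps))
one_species_copies = sum(1 for _, model in reps if model[0] == "1 species")

print(len(reps), len(distinct))                       # representations, distinct models
print(one_species_copies * prior_per_representation)  # implied prior on one species
```

Because the one-species model appears once per guide tree, equal weight on the nine representations implies that the one-species model receives three times the prior mass of any single two- or three-species model.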
Fig. 4. The models of species delimitation and species phylogenies for four populations A–D. There should be 15 rows, but only two rows are shown here, to represent the two guide tree shapes. On the same row are the species delimitation models generated by collapsing internal nodes on the same guide tree. The pair of numbers next to each model is the number of species and the number of labeled histories for the species tree. rjMCMC moves between different species-delimitation models are shown, but most of the NNI moves changing species phylogenies are not shown here.
Fig. 5. The models of species delimitation and species phylogeny for five populations A–E. There should be 105 rows but only three are shown here, to represent the three different guide tree shapes. See legends to figures 3 and 4.
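The row counts quoted in these legends (15 and 105 guide trees, with two and three tree shapes) follow from standard counting formulas, which the short Python sketch below evaluates. The function names are chosen here for illustration; the formulas are the usual ones for rooted binary trees, (2n - 3)!! labeled topologies and n!(n - 1)!/2^(n - 1) labeled histories, together with the Wedderburn-Etherington recurrence for unlabeled shapes.

```python
from functools import lru_cache
from math import factorial

def rooted_trees(n):
    """Number of rooted binary tree topologies on n labeled tips: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

def labeled_histories(n):
    """Number of labeled histories (ranked rooted trees) on n labeled tips."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)

@lru_cache(maxsize=None)
def tree_shapes(n):
    """Number of unlabeled rooted binary tree shapes with n tips."""
    if n == 1:
        return 1
    # Split the tips into subtrees of sizes i < n - i ...
    total = sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, (n + 1) // 2))
    # ... plus, for even n, unordered pairs of equal-size subtrees.
    if n % 2 == 0:
        half = tree_shapes(n // 2)
        total += half * (half + 1) // 2
    return total

for n in range(3, 7):
    print(n, rooted_trees(n), tree_shapes(n), labeled_histories(n))
```

For four and five populations this gives 15 and 105 guide trees and 2 and 3 shapes, in agreement with the legends.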
Prior Probability for the Number of Delimited Species under Prior 1 (uniform distribution for rooted trees).
| Number of Populations (s) | Number of Delimited Species (d) | Number of Delimitations (configuration) | Number of Rooted Trees | Number of Guide Trees | Product | Probability |
|---|---|---|---|---|---|---|
| 3 | 1 | 1 | 1 | 3 | 3 | |
| 3 | 2 | 3 (1 2) | 1 | 1 | 3 | |
| 3 | 3 | 1 (1 1 1) | 3 | 1 | 3 | |
| 4 | 1 | 1 | 1 | 15 | 15 | |
| 4 | 2 | 3 (2 2) | 1 | 1 | 3 | |
| 4 | 2 | 4 (1 3) | 1 | 3 | 12 | |
| 4 | 3 | 6 (1 1 2) | 3 | 1 | 18 | |
| 4 | 4 | 1 | 15 | 1 | 15 | |
| 5 | 1 | 1 | 1 | 105 | 105 | |
| 5 | 2 | 5 (1 4) | 1 | 15 | 75 | |
| 5 | 2 | 10 (2 3) | 1 | 3 | 30 | |
| 5 | 3 | 10 (1 1 3) | 3 | 3 | 90 | |
| 5 | 3 | 15 (1 2 2) | 3 | 1 | 45 | |
| 5 | 4 | 10 (1 1 1 2) | 15 | 1 | 150 | |
| 5 | 5 | 1 | 105 | 1 | 105 | |
| 6 | 1 | 1 | 1 | 945 | 945 | |
| 6 | 2 | 6 (1 5) | 1 | 105 | 630 | |
| 6 | 2 | 15 (2 4) | 1 | 15 | 225 | |
| 6 | 2 | 10 (3 3) | 1 | 9 | 90 | |
| 6 | 3 | 15 (1 1 4) | 3 | 15 | 675 | |
| 6 | 3 | 60 (1 2 3) | 3 | 3 | 540 | |
| 6 | 3 | 15 (2 2 2) | 3 | 1 | 45 | |
| 6 | 4 | 20 (1 1 1 3) | 15 | 3 | 900 | |
| 6 | 4 | 45 (1 1 2 2) | 15 | 1 | 675 | |
| 6 | 5 | 15 (1 1 1 1 2) | 105 | 1 | 1,575 | |
| 6 | 6 | 1 | 945 | 1 | 945 | |
Note.—Number of delimitations is the number of ways that s populations can be partitioned into d delimited species with the given configuration shown in parentheses. The sum over all configurations is the Stirling number of the second kind, S(s, d). For s = 5 populations, this is 1, 15, 25, 10, 1 for d = 1, 2, 3, 4, 5, respectively; and for s = 6, this is 1, 31, 90, 65, 15, 1 for d = 1, 2, 3, 4, 5, 6, respectively. The total number of delimitations is given by the sum of S(s, d) over d, known as the Bell number. This is 5, 15, 52, 203 for s = 3, 4, 5, 6, respectively. Number of rooted trees R is the number of rooted tree topologies for d species. The total number of models (of species delimitation and species phylogeny) for s populations is then given by the product of the number of delimitations and the number of rooted tree topologies, summed over the different configurations. This is 7, 41, 346, 3,797, for s = 3, 4, 5, 6, respectively. Number of guide trees is the number of collapsed guide trees that are compatible with the delimitation model; those guide trees correspond to different representations of the same biological model in our algorithm. For example, with s = 5 populations, there are S(5, 3) = 25 possible ways of delimiting three species. Ten of them group three populations into one species with the other two as distinct species (i.e., configuration 1, 1, 3 in the table), such as . There are three rooted tree topologies for each such delimitation of d = 3 species, and each tree topology, such as , is compatible with three guide trees (which resolve the species ABC in different ways) and thus has three representations in our algorithm. Under prior 1, with s > 4 populations, for .
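The counts quoted in this note, and the Product column of the table, can be reproduced with a short Python sketch. All function names are chosen here for illustration. The final step, which normalizes the products within each s to obtain a prior on the number of species d, is an assumption about how the Probability column is defined; it is consistent with giving equal weight to every model representation, but the values themselves are not present in this copy of the table.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from math import factorial, prod

@lru_cache(maxsize=None)
def stirling2(s, d):
    """Stirling number of the second kind S(s, d)."""
    if s == d:
        return 1
    if d == 0 or d > s:
        return 0
    return d * stirling2(s - 1, d) + stirling2(s - 1, d - 1)

def rooted_trees(n):
    """Rooted binary topologies on n labeled tips: (2n - 3)!!."""
    return prod(range(3, 2 * n - 2, 2))

def configurations(s, d, smallest=1):
    """Integer partitions of s into exactly d parts, in ascending order."""
    if d == 1:
        if s >= smallest:
            yield (s,)
        return
    for first in range(smallest, s // d + 1):
        for rest in configurations(s - first, d - 1, first):
            yield (first,) + rest

def n_delimitations(config):
    """Ways to split sum(config) labeled populations into groups of these sizes."""
    ways = factorial(sum(config))
    for size in config:
        ways //= factorial(size)
    for multiplicity in Counter(config).values():
        ways //= factorial(multiplicity)
    return ways

def n_guide_trees(config):
    """Guide trees compatible with a model: resolve each species independently."""
    return prod(rooted_trees(size) for size in config)

def prior1_products(s):
    """Map d -> summed Product column (delimitations x rooted trees x guide trees)."""
    return {
        d: sum(n_delimitations(c) * rooted_trees(d) * n_guide_trees(c)
               for c in configurations(s, d))
        for d in range(1, s + 1)
    }

for s in range(3, 7):
    bell = sum(stirling2(s, d) for d in range(1, s + 1))
    models = sum(stirling2(s, d) * rooted_trees(d) for d in range(1, s + 1))
    products = prior1_products(s)
    total = sum(products.values())
    # Assumed definition: prior on d is the product normalized within s.
    prior_d = {d: Fraction(p, total) for d, p in products.items()}
    print(s, bell, models, products, prior_d)
```

Under this assumed normalization the three values of d are equally probable for s = 3, while for larger s the prior on d is no longer uniform.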
Fig. Two species trees used to simulate data, with (a) and (b) . Data simulated on tree (a) will be informative about species phylogeny but not about species delimitation, whereas the opposite is true for data simulated on tree (b). The five species on each tree (three contemporary, two ancestral) have the same population size parameter θ, and .
Average MAP Probability of Model versus Percent Correct.
| Number of Loci | Prob | % Correct | Prob | % Correct | Prob | % Correct |
|---|---|---|---|---|---|---|
| 1 | 0.53 | 0.64 | 0.28 | 0.30 | 0.17 | 0.06 |
| 2 | 0.77 | 0.92 | 0.40 | 0.52 | 0.24 | 0.32 |
| 5 | 0.83 | 0.88 | 0.53 | 0.76 | 0.34 | 0.42 |
| 10 | 0.89 | 1.0 | 0.61 | 0.70 | 0.39 | 0.52 |
| 20 | 0.93 | 1.0 | 0.78 | 0.94 | 0.57 | 0.78 |
Note.—Prob is the average probability of the MAP model over all the simulated data sets for each specific combination of simulation parameters and % correct is the proportion of these data sets for which the delimitation and phylogeny both matched the true model used in the simulation (i.e., the MAP model is the true model).
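One way to read the table is to compare, cell by cell, the average MAP probability with the proportion of data sets in which the MAP model was the true model. The Python sketch below does this with the values transcribed from the table. The parameter settings that label the three column pairs are not recoverable from this copy, so they are indexed 1 to 3 from left to right as a placeholder, and treating "Prob not exceeding % Correct" as the meaning of conservative is an interpretation made here, not a definition taken from the article.

```python
# Number of loci -> [(Prob, % Correct)] for the three column pairs, left to right.
# The column pairs are numbered 1-3 because their parameter labels are missing.
TABLE = {
    1: [(0.53, 0.64), (0.28, 0.30), (0.17, 0.06)],
    2: [(0.77, 0.92), (0.40, 0.52), (0.24, 0.32)],
    5: [(0.83, 0.88), (0.53, 0.76), (0.34, 0.42)],
    10: [(0.89, 1.0), (0.61, 0.70), (0.39, 0.52)],
    20: [(0.93, 1.0), (0.78, 0.94), (0.57, 0.78)],
}

def split_cells(table):
    """Separate cells where Prob <= % Correct from cells where it is larger."""
    conservative, overconfident = [], []
    for loci, pairs in table.items():
        for column, (prob, correct) in enumerate(pairs, start=1):
            target = conservative if prob <= correct else overconfident
            target.append((loci, column))
    return conservative, overconfident

conservative, overconfident = split_cells(TABLE)
print(len(conservative), "of", len(conservative) + len(overconfident), "cells")
print("exceptions (loci, column pair):", overconfident)
```

With the values as transcribed, the average MAP probability is at or below the proportion correct in 14 of the 15 cells; the single exception is the third column pair with one locus.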
Fig. Boxplot of posterior probabilities for the true delimitation (a–c) and true model (d–f) in data of different numbers of loci simulated with and (left panels, a and d), and (middle panels, b and e), and and (right panels, c and f). The median is represented by black horizontal lines, the 95% CI by rectangles, the 99% CI by dashed lines, and the outliers as open dots.
Fig. Posterior probabilities for six models of species delimitation and species phylogeny in the 95% credibility set for the coast horned lizard data, obtained from a BPP analysis under the prior and . Models (a′)–(c′) are identical to (a)–(c), respectively, except that 2.SCA and 3.NBC are one species. Note that species trees of (a), (b), (a′), and (b′) are consistent with the geographical distributions of the populations, but those of (c) and (c′) are not. The posterior probabilities in parentheses are for the top four models under the prior and .