Bryan C Carstens, Tanya A Dewey.
Abstract
Coalescent model-based methods for phylogeny estimation force systematists to confront issues related to the identification of species boundaries. Unlike conventional phylogenetic analysis, where species membership can be assessed qualitatively after the phylogeny is estimated, the phylogenies that are estimated under a coalescent model treat aggregates of individuals as the operational taxonomic units and thus require a priori definition of these sets because the models assume that the alleles in a given lineage are sampled from a single panmictic population. Fortunately, the use of coalescent model-based approaches allows systematists to conduct probabilistic tests of species limits by calculating the probability of competing models of lineage composition. Here, we conduct the first exploration of the issues related to applying such tests to a complex empirical system. Sequence data from multiple loci were used to assess species limits and phylogeny in a clade of North American Myotis bats. After estimating gene trees at each locus, the likelihood of models representing all hierarchical permutations of lineage composition was calculated and Akaike information criterion scores were computed. Metrics borrowed from information theory suggest that there is strong support for several models that include multiple evolutionary lineages within the currently described species Myotis lucifugus and M. evotis. Although these results are preliminary, they illustrate the practical importance of coupled species delimitation and phylogeny estimation.
Year: 2010 PMID: 20547777 PMCID: PMC2885268 DOI: 10.1093/sysbio/syq024
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 15.683
Geographic distributions of M. lucifugus/western long-eared Myotis subspecies
| Lineage | Distribution | No. |
| | Interior BC, SK, AB, MT, W ND, W SD, WY, CO, UT, NV, CA east of the Sierras, SE OR, S ID | 6 |
| | NE AZ, NW NM | 2 |
| | S AK, BC, coastal WA | 3 |
| | Central N Mexico through Sierra Madres to NM, AZ, CO, UT, ID, NV, E & S CA | 2 |
| | E WY, SW SD, W NE | 1 |
| | Coastal N CA, OR, WA, BC | 1 |
| | E North America (NF/NS south to GA, AL, MS), west to OK, E KS, NE, SD, ND, MB, AB, SK, YT, AK | 7 |
| | Coastal CA, OR, WA, BC | 4 |
| | W SD, W ND, MT, ID, E WA, E OR, NV, UT, CO, E CA, N AZ, N NM | 3 |
| | Central Sierras, W NV, ID, SW MT | 4 |
| | Baja Cal, N. Mexico, AZ, NM, CA, NV, OR, WA, ID, MT, SD, ND, AB, BC to SE AK | 1 |
Note: States and provinces are abbreviated with their postal abbreviations.
Figure: Species phylogenies used in the false lumping/splitting simulations. Branch lengths are in units of N generations, and genealogies were simulated using θ = 10.0.
Information about each locus
| Locus | bp | n | s | Indels | Model | – lnL clock | – lnL unc | LRT | df |
| 681a | 732 | 26 | 53 | 3 | K80+I | 1595.86433 | 1579.12674 | 33.47518 | 24 |
| 681b | 546 | 37 | 69 | None | K81uf+I | 1493.57263 | 1474.77693 | 37.5914 | 35 |
| 685a | 605 | 38 | 80 | 5 | K81+I+γ | 1619.8739 | 1595.43153 | 48.88474 | 36 |
| 734z | 718 | 37 | 110 | 8 | HKY+I | 2183.66386 | 2135.17373 | 96.98026* | 35 |
| 735b | 627 | 42 | 73 | 6 | TIMef+I+γ | 1809.92806 | 1755.56346 | 108.7292* | 40 |
| 735f | 523 | 46 | 43 | 3 | K80+I+γ | 1399.23408 | 1346.43538 | 105.5974* | 44 |
| cytb | 741 | 34 | 134 | None | HKY+γ | 2083.05541 | 2067.00842 | 32.09398 | 32 |
Note: Shown for each locus are the length (bp), the number of alleles (n), segregating sites (s), indels, the model of sequence evolution, the log-likelihoods with the molecular clock enforced and not enforced, the test statistic for the LRT, and the degrees of freedom (df).
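The LRT column can be recomputed from the two likelihood columns. Because both columns hold negative log-likelihoods, the statistic is twice the difference between the clock-enforced and unconstrained values; it is conventionally compared against a chi-square distribution with the listed degrees of freedom. The following is a minimal sketch of that arithmetic only, with the values copied from the table; the name `clock_lrt` is hypothetical and not taken from the source.

```python
# Negative log-likelihoods copied from the table above:
# locus -> (-lnL with clock enforced, -lnL unconstrained)
LOCI = {
    "681a": (1595.86433, 1579.12674),
    "681b": (1493.57263, 1474.77693),
    "685a": (1619.8739, 1595.43153),
    "734z": (2183.66386, 2135.17373),
    "735b": (1809.92806, 1755.56346),
    "735f": (1399.23408, 1346.43538),
    "cytb": (2083.05541, 2067.00842),
}


def clock_lrt(neg_lnl_clock, neg_lnl_unc):
    """Likelihood-ratio statistic for the molecular clock test.

    Both arguments are NEGATIVE log-likelihoods, so the constrained
    (clock) model has the larger value and the result is non-negative.
    """
    return 2.0 * (neg_lnl_clock - neg_lnl_unc)


if __name__ == "__main__":
    for locus, (clock, unc) in LOCI.items():
        print(f"{locus}: LRT = {clock_lrt(clock, unc):.5f}")
```

Turning the statistic into a significance decision requires a chi-square tail probability, which is left out here to keep the sketch free of third-party dependencies.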
Per-lineage estimates of θ from Migrate-n
| Group | Low | θ | High |
| Ave (species) | 8.383 | 10.127 | 12.362 |
| | 11.859 | 13.471 | 15.325 |
| | 6.970 | 8.721 | 11.054 |
| | 6.318 | 8.187 | 10.706 |
| Ave (subspecies) | 7.610 | 10.011 | 13.236 |
| | 9.652 | 12.284 | 15.468 |
| | 6.481 | 8.800 | 11.977 |
| | 6.522 | 8.919 | 12.407 |
| | 8.765 | 11.591 | 15.448 |
| | 6.630 | 8.460 | 10.877 |
Notes: Shown are the ML estimates of θ (average per locus) for each species and subspecies, as well as the averages at each taxonomic level. Also shown are the upper and lower 95% confidence intervals. All estimates for any partition fall well within the confidence intervals for all other partitions. Note that we have not included estimates for partitions that sampled three or fewer individuals.
Figure: Species trees estimated using STEM. a) Shows the species tree estimated using the species as OTUs and only the clocklike loci, whereas (b) shows the species tree estimated using the subspecies as OTUs. c) Shows the species tree estimated using all loci, including those that violate the molecular clock assumption. d) Shows the topology estimated when only the non-clocklike loci are used. Note that the trees shown in (a) and (b) are drawn to equivalent scales, as are the trees shown in (c) and (d).
Figure: Representation of the posterior distribution of species tree space from a BEST analysis. a) Shows the species tree estimated using species as OTUs and all loci, whereas (b) shows the species tree estimated using the 3 loci that were consistent with the molecular clock. c) Shows the phylogeny estimate from the clocklike loci and subspecies as OTUs. For each, the branch lengths are the mean values across all trees in the posterior distribution, the shaded bars represent the 95% highest posterior density interval of branch lengths, and the numbers above each branch depict the Bayesian posterior probabilities of each node.
Figure: Phylogeny estimated from the concatenated data. Unlike the other methods used to estimate phylogeny, individuals serve as the OTUs in this phylogeny. We randomly chose one of the phased alleles from the nuclear loci and combined them with the cytochrome b data; we then repeated the analysis with the other set of alleles. Shown is the tree from one run; the other was broadly similar. Samples are color-coded to subspecies using the colors shown to the left of the phylogeny.
Delimitation using LRTs
| Species | Lineage composition | – lnL | 2(Δ – lnL) |
| | Each spp. separate | 570.533 | |
| | | 570.533 | 0.000 |
| | | 570.533 | 0.000 |
| | | 570.533 | 0.000 |
| | | 573.375 | 5.686 |
| | | 570.533 | |
| | | 579.581 | 18.097* |
| | Each ssp. separate | 570.533 | |
| | | 648.174 | 155.282* |
| | | 620.935 | 100.804* |
| | | 588.732 | 36.398* |
| | | 637.752 | 134.440* |
| | | 635.230 | 129.395* |
| | | 573.375 | 5.686* |
| | | 613.120 | 85.175* |
| | | 583.284 | 25.504* |
| | | 611.458 | 81.850* |
| | | 582.416 | 23.767* |
| | | 602.181 | 63.298* |
| | | 605.059 | 69.052* |
| | | 630.831 | 120.597* |
| | | 630.017 | 118.969* |
Notes: Shown for each species are the comparisons, the highest likelihood given a particular model, and the test statistic. Within each species, we used a Bonferroni correction for multiple comparisons. Significant values are marked with an “*”. The first letters of the genus, species, and subspecies names are used to abbreviate each subspecies (e.g., ).
Results of species delimitation using BEST and the Bayes factor
| Lineage Composition | – lnL | BF | Interpretation |
| Mla, Mlc, Mll, Mlr, Mec, Mej, Mtp, Mtt, Mtv | 12502.211 | Por | |
| Mla_Mlc_Mll_Mlr, Mec_Mej, Mtp_Mtt_Mtv | 12857.375 | 0.9724 | Modest support for subspecies as lineages |
| Mla, Mlc, Mll, Mlr, Mec, Mej, Mtp_Mtv, Mtt | 11830.384 | 1.0568 | Modest support for |
| Mla, Mlc, Mll, Mlr, Mec, Mej, Mtt_Mtp, Mtv | 11851.727 | 1.0549 | Modest support for |
| Mla, Mlc, Mll, Mlr, Mec, Mej, Mtt_Mtv_Mtp | 12598.799 | 0.9923 | Modest support for |
Note: Shown are the details of the lineages combined for a given BEST analysis, the mean log-likelihood, the Bayes factor (BF), and the interpretation of these values.
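As a reading aid, the BF values in this table can be reproduced numerically by dividing the – lnL of the first row (all subspecies treated as separate lineages) by the – lnL of each alternative composition. This is an observation about the tabulated numbers, not a statement of how the authors define or compute the Bayes factor; the sketch below only checks the arithmetic, and the names `REFERENCE_NEG_LNL` and `tabulated_ratio` are hypothetical.

```python
# Mean -lnL of the first row of the table (all subspecies as separate lineages).
REFERENCE_NEG_LNL = 12502.211

# Mean -lnL of each alternative lineage composition, copied from the table.
ALTERNATIVES = {
    "Mla_Mlc_Mll_Mlr, Mec_Mej, Mtp_Mtt_Mtv": 12857.375,
    "Mla, Mlc, Mll, Mlr, Mec, Mej, Mtp_Mtv, Mtt": 11830.384,
    "Mla, Mlc, Mll, Mlr, Mec, Mej, Mtt_Mtp, Mtv": 11851.727,
    "Mla, Mlc, Mll, Mlr, Mec, Mej, Mtt_Mtv_Mtp": 12598.799,
}


def tabulated_ratio(reference_neg_lnl, model_neg_lnl):
    """Ratio of the reference -lnL to a model's -lnL.

    Assumption: this simple ratio is what matches the BF column;
    it is not the usual ratio of marginal likelihoods.
    """
    return reference_neg_lnl / model_neg_lnl


if __name__ == "__main__":
    for composition, neg_lnl in ALTERNATIVES.items():
        ratio = tabulated_ratio(REFERENCE_NEG_LNL, neg_lnl)
        print(f"{ratio:.4f}  {composition}")
```

Values above 1 correspond to compositions whose – lnL is lower than that of the reference row.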
Summary of information-theoretic assessment of species limits
| Lineage composition | – lnL | No. parameters | AIC | Δi | Relative likelihood | Model probability |
| Clocklike loci | ||||||
| | – 570.533 | 8 | 1157.065 | 0.000 | 1.000 | 0.685 |
| | – 570.533 | 9 | 1159.065 | 2.000 | 0.135 | 0.093 |
| | – 570.533 | 9 | 1159.065 | 2.000 | 0.135 | 0.093 |
| | – 570.533 | 9 | 1159.065 | 2.000 | 0.135 | 0.093 |
| | – 573.375 | 7 | 1160.751 | 3.686 | 0.025 | 0.017 |
| | – 570.533 | 10 | 1161.065 | 4.000 | 0.018 | 0.013 |
| | – 573.375 | 8 | 1162.751 | 5.686 | 0.003 | 0.002 |
| | – 573.375 | 8 | 1162.751 | 5.686 | 0.003 | 0.002 |
| | – 573.375 | 8 | 1162.751 | 5.686 | 0.003 | 0.002 |
| | – 573.375 | 9 | 1164.751 | 7.686 | 0.000 | 0.000 |
| All loci | ||||||
| | – 1242.720 | 8 | 2501.440 | 0.000 | 1.000 | 0.775 |
| | – 1242.716 | 9 | 2503.432 | 1.992 | 0.136 | 0.106 |
| | – 1242.720 | 9 | 2503.440 | 2.000 | 0.135 | 0.105 |
| | – 1242.720 | 10 | 2505.440 | 4.000 | 0.018 | 0.014 |
| | – 1248.517 | 7 | 2511.035 | 9.595 | 0.000 | 0.000 |
| | – 1248.513 | 8 | 2513.027 | 11.586 | 0.000 | 0.000 |
| | – 1248.517 | 8 | 2513.035 | 11.594 | 0.000 | 0.000 |
| | – 1248.517 | 9 | 2515.035 | 13.594 | 0.000 | 0.000 |
| | – 1249.002 | 9 | 2516.005 | 14.564 | 0.000 | 0.000 |
| | – 1251.395 | 7 | 2516.790 | 15.349 | 0.000 | 0.000 |
Note: Columns (from left) show the set used as OTUs (using the abbreviations shown in Table 1), the log-likelihood of the model given the data, the number of parameters, the AIC, AIC differences (Δi), relative likelihood of the model given the data, and the model probabilities.
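The columns of this table follow from one another by simple arithmetic, sketched below for the clocklike-loci rows. AIC is computed as 2k − 2 lnL, Δi is the difference from the smallest AIC, and the model probabilities are the relative likelihoods normalized to sum to 1. One caveat: the tabulated relative likelihoods match exp(−Δi) (for example, Δi = 2.000 gives 0.135), whereas the widely used Akaike-weight convention is exp(−Δi/2). The `divisor` argument is a hypothetical switch between the two; all function names are illustrative.

```python
import math

# (log-likelihood magnitude, number of parameters) for the clocklike-loci rows,
# copied from the table above in row order.
CLOCKLIKE = [
    (570.533, 8), (570.533, 9), (570.533, 9), (570.533, 9), (573.375, 7),
    (570.533, 10), (573.375, 8), (573.375, 8), (573.375, 8), (573.375, 9),
]


def aic(neg_lnl, n_params):
    """AIC = 2k - 2 lnL, written in terms of the negative log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_lnl


def model_probabilities(aics, divisor=1.0):
    """Return (deltas, relative likelihoods, normalized probabilities).

    divisor=1.0 reproduces the tabulated values, exp(-delta);
    divisor=2.0 gives the more common Akaike weights, exp(-delta / 2).
    """
    best = min(aics)
    deltas = [a - best for a in aics]
    rel = [math.exp(-d / divisor) for d in deltas]
    total = sum(rel)
    return deltas, rel, [r / total for r in rel]


if __name__ == "__main__":
    aics = [aic(nl, k) for nl, k in CLOCKLIKE]
    deltas, rel, probs = model_probabilities(aics)
    for a, d, r, p in zip(aics, deltas, rel, probs):
        print(f"AIC={a:.3f}  delta={d:.3f}  rel={r:.3f}  prob={p:.3f}")
```

Small differences in the third decimal place relative to the table (for example, 3.684 versus 3.686 for Δi) are expected because the log-likelihoods used here are already rounded.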
Figure: Example results from simulation exploration of species delimitation using STEM. a) Shows the average of species tree estimates made using 10 loci with 6 lineages as OTUs (phylogeny with thick lines) and the effects of falsely lumping 2 lineages together. b) Shows the effects of falsely splitting a single lineage. In all cases, the topology is consistent, and the species tree estimated using the correct assignment of individuals to lineages is essentially identical to that used for the simulations. Shaded boxes represent the 95% confidence interval of branch length for the species tree estimates.