João Tonini, Andrew Moore, David Stern, Maryia Shcheglovitova, Guillermo Ortí.
Abstract
Phylogeneticists have long understood that several biological processes can cause a gene tree to disagree with its species tree. In recent years, molecular phylogeneticists have increasingly forgone traditional supermatrix approaches in favor of species tree methods that account for one such source of error, incomplete lineage sorting (ILS). While gene tree-species tree discordance no doubt poses a significant challenge to phylogenetic inference with molecular data, researchers have only recently begun to systematically evaluate the relative accuracy of traditional and ILS-sensitive methods. Here, we report on simulations demonstrating that concatenation can perform as well as or better than methods that attempt to account for sources of error introduced by ILS. Based on these and similar results from other researchers, we argue that concatenation remains a useful component of the phylogeneticist's toolbox and highlight that phylogeneticists should continue to make explicit comparisons of results produced by contemporaneous and classical methods.
Year: 2015 PMID: 25901289 PMCID: PMC4391732 DOI: 10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be
Source DB: PubMed Journal: PLoS Curr ISSN: 2157-3999
Note that this list is not exhaustive.
| Method | Details |
|---|---|
| | Uses Metropolis–Hastings algorithms to estimate population histories and mutation rates. The ancestral DNA data are inferred at each node of the genealogical tree. Different types of sequence data can be analyzed. The program estimates demographic parameters, but does not allow for the effects of either recombination or selection. |
| | Explicitly models the multispecies coalescent, coestimating multiple gene trees within a shared species tree and effective population sizes of extant and extinct populations in a Bayesian Markov chain Monte Carlo framework. |
| | Estimates species tree topology, divergence times, and population sizes from gene trees under a multispecies coalescent model in a Bayesian framework; unlike *BEAST, BEST estimates each gene tree individually, then uses importance sampling to infer the species tree. |
| | Uses non-parametric clustering of genes to estimate concordance factors based on proportions of genes supporting a clade; these factors provide the basis for inferring species trees. Implemented in a Bayesian phylogenetic framework, and thus integrates over gene tree uncertainty. |
| | An estimator of pairwise divergence times under the multispecies coalescent. Applies single-linkage clustering to the pairwise estimates to generate species trees with branch lengths, using multiple unlinked loci and allowing the use of several alleles from each population. |
| | Improves GLASS estimation of divergence times by deriving the expected waiting time until the first interspecific coalescence occurs among independent loci for a pair of taxa. |
| | A Bayesian method for simultaneous estimation of species divergence times and current and ancestral population sizes. Uses multiple loci and assumes that the topology of the species tree is known; an MCMC algorithm integrates over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. It can handle any species tree and allows different numbers of sequences at different loci. |
| | Parsimony-based method that heuristically searches for the species tree that minimizes the number of deep coalescences implied by the gene trees, and thus does not model the coalescent process. |
| | Uses a pseudo-likelihood function of the species tree to obtain maximum pseudo-likelihood estimates of species trees, with branch lengths in coalescent units. It assumes no gene flow or horizontal gene transfer, but the method is robust to a small amount of the former. |
| | The distance between two species is defined as the average number of internodes between them across gene trees; the species tree is then estimated as the neighbor-joining tree built from the distance matrix. |
| | Uses a polynomial-time algorithm that computes the likelihood of a species tree directly from the sequences under a finite-sites model of mutation, effectively integrating over all possible gene trees. The method applies to unlinked biallelic markers and is implemented in a Markov chain Monte Carlo sampler for inferring species trees, divergence dates, and population sizes. |
| | Uses summary statistics of coalescence times, ordering the expected ranks of the coalescences among sequences; this ordering is consistent with the ancestral order of populations in the species tree. It is resistant to variable substitution rates along the branches in gene trees. |
| | Uses summary statistics of average coalescence times to estimate the species tree. The gene trees are generated from multilocus sequences using a consistent method for gene tree estimation (e.g. maximum likelihood) without the molecular clock assumption. |
| | Analytically derives the maximum likelihood species tree for a set of gene trees with branch-length information, explicitly modeling discord as a function of the coalescent process. |
| | Infers relationships among quartets of taxa under the coalescent model using techniques from algebraic statistics; it accounts for mutational and coalescent variance. Uncertainty in the estimated relationships is quantified using the nonparametric bootstrap. |
| | Uses a coalescent-based maximum likelihood approach for inferring the species tree from a set of gene tree topologies (no branch lengths), which are used to estimate branch lengths in coalescent units; the algorithm assumes free recombination between genes. It assumes the gene tree topologies are correctly inferred, and therefore does not model uncertainty in gene tree estimation from sequences. |
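
The internode-distance row in the table above (average number of internodes between two species across gene trees) can be illustrated with a small sketch. This is not code from any published implementation: the nested-tuple tree encoding, the function names, and the restriction to one sequence per species are all assumptions made for illustration. The sketch also counts internal nodes in each tree exactly as written, so a root with two children is counted as a node, whereas a strictly unrooted treatment would suppress it. The neighbor-joining step that would follow is not shown.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf name to the tuple of child indices leading to it from the root.

    Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
    """
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths


def internode_count(path_a, path_b):
    """Count the internal nodes on the path between two leaves."""
    shared = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1
    # Edges from each leaf up to their most recent common ancestor.
    edges = (len(path_a) - shared) + (len(path_b) - shared)
    # A path with k edges passes through k - 1 internal nodes.
    return edges - 1


def average_internode_distances(gene_trees):
    """Average the internode count for every species pair over all gene trees."""
    totals = {}
    counts = {}
    for tree in gene_trees:
        paths = leaf_paths(tree)
        for a, b in combinations(sorted(paths), 2):
            totals[(a, b)] = totals.get((a, b), 0) + internode_count(paths[a], paths[b])
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


if __name__ == "__main__":
    # Two hypothetical four-taxon gene trees with conflicting topologies.
    trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
    for pair, distance in sorted(average_internode_distances(trees).items()):
        print(pair, distance)
```

The resulting dictionary plays the role of the distance matrix mentioned in the table; any standard neighbor-joining routine could take it from there.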
Accuracy measured by the topological distance (Robinson-Foulds distance) between the true and the obtained tree. Significant differences denoted with asterisks. See also Figure 1.
| | 3 loci | 9 loci | 27 loci | Concatenation | MDC | STEM |
|---|---|---|---|---|---|---|
| 1N | <0.001* | 0.0255* | 0.6384 | <0.0001* | <0.0001* | <0.0001* |
| 10N | 0.3601 | 0.8233 | 0.8505 | <0.0002 | <0.0001* | <0.0011* |
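
For readers unfamiliar with the accuracy measure named in the caption, the Robinson-Foulds distance counts the bipartitions (splits) of the taxon set that appear in one tree but not the other. The following is a minimal sketch, not the code used in the study: the nested-tuple tree encoding and the function names are assumptions, and the distance is left unnormalized.

```python
def collect_clades(tree):
    """Return (leaf set of this subtree, leaf sets of every subtree within it).

    Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    collected = []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        collected.extend(child_clades)
    collected.append(leaves)
    return leaves, collected


def bipartitions(tree):
    """Return the set of non-trivial splits induced by the internal edges of the tree."""
    all_leaves, clades = collect_clades(tree)
    splits = set()
    for clade in clades:
        rest = all_leaves - clade
        # Splits with a single taxon on one side occur in every tree, so skip them.
        if len(clade) > 1 and len(rest) > 1:
            splits.add(frozenset([clade, rest]))
    return splits


def robinson_foulds(tree_a, tree_b):
    """Count the splits found in exactly one of the two trees."""
    if collect_clades(tree_a)[0] != collect_clades(tree_b)[0]:
        raise ValueError("trees must be defined on the same set of taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


if __name__ == "__main__":
    # Hypothetical "true" and "estimated" trees on five taxa.
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    estimated_tree = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(true_tree, estimated_tree))
```

Because splits are treated as unordered pairs of taxon sets, two trees that differ only in where the root is drawn receive a distance of zero under this sketch.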