Abstract
With the advent of more sophisticated models and increases in computational power, an ever-growing amount of information can be extracted from DNA sequence data. In particular, recent advances have allowed researchers to estimate the dates of historical events for a group of interest, including the time to most recent common ancestor (TMRCA), the dates of specific nodes in a phylogeny, and the date of divergence or speciation. Here I use coalescent simulations and re-analyze an empirical dataset to illustrate the importance of taxon sampling, in particular, for correctly estimating such dates. I show that the TMRCA of representatives of a single taxon is often not the same as the divergence date, due to issues such as incomplete lineage sorting. Critically, when estimating divergence or speciation dates, a representative from a different taxonomic lineage must be included in the analysis. Without considering these issues, studies may incorrectly estimate the times at which historical events occurred, which has profound impacts in both research and applied (e.g., public health) settings.
Year: 2015 PMID: 26273822 PMCID: PMC4537086 DOI: 10.1371/journal.pone.0128407
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Ten randomly selected genealogies from the coalescent simulations of two taxa, X (N = 10) and Y (N = 1), using the ms program [6] under a) the early divergence scenario, where the TMRCA is much more recent than divergence, and b) the late divergence scenario, where the TMRCA and divergence date overlap.
Visualization was performed using DensiTree v2.2.1 [14]. The red line and blue line approximate the range of estimates for divergence time and TMRCA, respectively. Histograms of the distribution of estimates of the date of divergence between X and Y and TMRCA of X run under the c) early divergence and d) late divergence scenarios. In panel (d) bars are offset (“dodge”) to illustrate that the distributions for each parameter are indistinguishable.
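The contrast between the two scenarios can be reproduced with a minimal sketch of a two-taxon coalescent, written here in plain Python rather than with the ms program used in the study. Everything in it is an illustrative assumption: the function name `simulate_genealogy`, a constant population size, time measured backward in coalescent units (each pair of lineages coalesces at rate 1), and the two split times. Note that `xy_time` is the age of the gene-tree node joining Y to X, which can only be as old as or older than the population split itself.

```python
import random


def simulate_genealogy(n_x, split_time, rng):
    """Simulate one gene genealogy for n_x samples of X (n_x >= 2) and one of Y.

    Times run backward from the present in coalescent units.
    Returns (tmrca_x, xy_time): the age of the most recent common ancestor
    of all X samples, and the age of the node where Y first joins an X lineage.
    """
    x_tips = frozenset(f"X{i}" for i in range(n_x))
    lineages = [frozenset([tip]) for tip in sorted(x_tips)]
    t = 0.0
    tmrca_x = None
    xy_time = None

    def coalesce(now):
        """Merge two randomly chosen lineages and record the events of interest."""
        nonlocal lineages, tmrca_x, xy_time
        i, j = rng.sample(range(len(lineages)), 2)
        merged = lineages[i] | lineages[j]
        lineages = [lin for k, lin in enumerate(lineages) if k not in (i, j)]
        lineages.append(merged)
        if tmrca_x is None and x_tips <= merged:
            tmrca_x = now
        if xy_time is None and "Y" in merged:
            xy_time = now

    # Phase 1: only X lineages can coalesce with each other until the split.
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2)
        if t + wait > split_time:
            break  # memoryless waiting times: safe to redraw after the split
        t += wait
        coalesce(t)

    # Phase 2: older than the split, Y shares an ancestral population with X.
    t = split_time
    lineages.append(frozenset(["Y"]))
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        coalesce(t)

    return tmrca_x, xy_time


if __name__ == "__main__":
    rng = random.Random(2015)
    # Hypothetical split times: one much older, one much younger than the
    # expected TMRCA of ten samples (1.8 coalescent units).
    for label, split in [("deep split", 10.0), ("recent split", 0.05)]:
        runs = [simulate_genealogy(10, split, rng) for _ in range(1000)]
        mean_tmrca = sum(r[0] for r in runs) / len(runs)
        mean_xy = sum(r[1] for r in runs) / len(runs)
        younger = sum(r[0] < split for r in runs) / len(runs)
        print(f"{label}: mean TMRCA(X) = {mean_tmrca:.2f}, "
              f"mean X-Y node = {mean_xy:.2f}, "
              f"share with TMRCA(X) younger than split = {younger:.2f}")
```

Under the deep split, the TMRCA of X should almost always be far younger than the X-Y node, so dating the X samples alone would badly underestimate divergence. Under the recent split, X lineages usually fail to coalesce before the split, and the two quantities become hard to tell apart.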
Metadata for the five samples of Salmonella enterica ssp. enterica serovars included in the empirical analyses.
| Serovar | Tree ID | Strain | Year | Country | Reads | Project | Assemblies | Source |
|---|---|---|---|---|---|---|---|---|
| Agona | CARC 1952 | WS0243 | 1952 | Ghana | ERS180381 | PRJEB1134 | CARC01000001-203 | Zhou et al. (2013) |
| Agona | CARK 2000 | DBS_20001356 | 2000 | Scotland | ERS180373 | PRJEB1126 | CARK01000001-169 | Zhou et al. (2013) |
| Agona | CART 2008 | MC_08–0610 | 2008 | Ireland | ERS180363 | PRJEB1116 | CART01000001-146 | Zhou et al. (2013) |
| Agona | CATS 2009 | MC_09–0426 | 2009 | Ireland | ERS180376 | PRJEB1129 | CATS01000001-167 | Zhou et al. (2013) |
| Soerenga | S. Soerenga 2003 | 695 | 2003 | USA | SRR652950 | PRJNA78407 | ASM48656v1 | Timme et al. (2013) |
Fig 2. Genealogies under the different BEAST analyses: a) only Agona samples included, analyzed with the best-fitting model identified in Zhou et al. [5]; b) with the outgroup Soerenga, analyzed with a strict molecular clock (note that the scale is in 10^3 ybp); c) with the outgroup Soerenga, analyzed under the best-fitting model from the publication; and d) a phylogeny inferred with MrBayes [12].