Joel O Wertheim, Sergei L Kosakovsky Pond.
Abstract
Statistical methods for molecular dating of viral origins have been used extensively to infer the time of the most recent common ancestor for many rapidly evolving pathogens. However, there are a number of cases in which epidemiological, historical, or genomic evidence suggests much older viral origins than those obtained via molecular dating. We demonstrate how pervasive purifying selection can mask the ancient origins of recently sampled pathogens, in part due to the inability of nucleotide-based substitution models to properly account for complex patterns of spatial and temporal variability in selective pressures. We use codon-based substitution models to infer the length of branches in viral phylogenies; these models produce estimates that are often considerably longer than those obtained with traditional nucleotide-based substitution models. Correcting the apparent underestimation of branch lengths suggests substantially older origins for measles, Ebola, and avian influenza viruses. This work helps to reconcile some of the inconsistencies between molecular dating and other types of evidence concerning the age of viral lineages.
Year: 2011 PMID: 21705379 PMCID: PMC3247791 DOI: 10.1093/molbev/msr170
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Mean tMRCA and 95% highest posterior density for the root of viral lineages inferred under different evolutionary models
| Evolutionary model | MeV/RPV/PPRV | EBOV | AIV |
| GTR | 333 (165–528) | 819 (274–1,514) | 333 (268–396) |
| GTR + Γ4 | 667 (333–1,062) | 1,492 (421–2,869) | 1,265 (965–1,618) |
| GTR + Γ4 (third position excluded) | 285 (110–515) | 498 (67–1,118) | 1,493 (938–2,089) |
| SRD06 | 646 (288–1,051) | 1,358 (378–2,592) | 1,103 (855–1,378) |
| Whelan and Goldman + Γ4 | 265 (94–477) | 1,243 (87–2,948) | 942 (619–1,334) |
| GY94 + Γ4 | 698 (353–1,088) | 2,247 (751–4,170) | 1,243 (972–1,528) |
Lengths of single branches simulated under a codon substitution model with variable selection pressures are underestimated when inferred under a GTR + Γ4 nucleotide substitution model. All quantities are means from 10,000 replicates. (A) Ancestral sequence (1,000 codons) was random (i.e., all codons equiprobable) with varied proportion of sites under strong purifying selection (βs=0) and sites evolving neutrally (αs=βs). (B–D) Branch lengths were inferred using GTR + Γ4 on single branches simulated under neutral selection regimes (αs=βs) and empirical site-by-site substitution rate (IFEL) profiles for MeV/RPV/PPRV, EBOV, and AIV. Black crosses represent the degree of bias, defined as the ratio between the true branch length and the one inferred under GTR + Γ4 on IFEL sequences. Horizontal gray lines show saturation asymptotes and diagonal dashed line depicts the behavior of an unbiased estimator.
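The saturation effect described in this caption can be illustrated with a much simpler toy calculation. The sketch below is not the authors' simulation: it swaps the codon model and GTR + Γ4 for the Jukes-Cantor (JC69) nucleotide model without rate variation, and it assumes a fraction `frac_constrained` of sites never changes while the remaining sites evolve neutrally. All function and parameter names are hypothetical. Under these assumptions the inferred length falls below the true mean length and levels off at a finite asymptote as the true length grows.

```python
import math


def expected_p_distance(t_neutral, frac_constrained):
    """Expected proportion of differing sites between ancestor and descendant.

    Assumption: constrained sites never change; neutral sites follow JC69
    with branch length t_neutral (substitutions per neutral site).
    """
    p_neutral = 0.75 * (1.0 - math.exp(-4.0 * t_neutral / 3.0))
    return (1.0 - frac_constrained) * p_neutral


def jc69_distance(p):
    """JC69 distance estimate from an observed proportion of differences p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def inferred_vs_true(t_neutral, frac_constrained):
    """Return (true mean branch length, JC69-inferred length, bias ratio).

    The bias ratio follows the caption's definition: true length divided
    by inferred length.
    """
    true_len = (1.0 - frac_constrained) * t_neutral
    inferred = jc69_distance(expected_p_distance(t_neutral, frac_constrained))
    bias = true_len / inferred if inferred > 0 else float("nan")
    return true_len, inferred, bias


def saturation_asymptote(frac_constrained):
    """Largest length JC69 can report in this toy setting as t_neutral grows."""
    if frac_constrained == 0.0:
        return math.inf
    return -0.75 * math.log(frac_constrained)


if __name__ == "__main__":
    # Hypothetical settings: half the sites constrained, increasing true length.
    for t in (0.1, 0.5, 1.0, 2.0, 5.0):
        true_len, inferred, bias = inferred_vs_true(t, 0.5)
        print(f"true={true_len:.3f} inferred={inferred:.3f} bias={bias:.2f}")
    print("asymptote:", saturation_asymptote(0.5))
```

With no constrained sites the estimator recovers the true length exactly; once some sites are held fixed, a model that treats all sites alike can no longer tell a long branch from a moderately long one.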
Branch lengths inferred under a Dual model provide reliable estimates of branches simulated under variable selection pressures (i.e., empirical IFEL profiles) for MeV/RPV/PPRV, EBOV, and AIV. The diagonal dashed lines depict the behavior of an unbiased estimator.
Long branches are disproportionately affected by evolutionary models that differ in their treatment of rate variation. Each datapoint represents the length of a single branch of the MeV/RPV/PPRV, EBOV, or AIV phylogeny inferred under GTR + Γ4 and an alternate evolutionary model. The extreme of the y axis represents infinite branch lengths under the Lineage+Dual model for EBOV and AIV. Dashed lines are an x=y reference.
Goodness of fit for various codon models and the effect of model choice on the estimates of the branch lengths of deep and recent lineages
| Taxa | Model | log L | AIC | T (deep) | T (recent) | E[β]/E[α] |
| MeV/RPV/PPRV | Constant | −12207.97 | 24909.95 | 1.29 | 0.91 | 0.10 |
| MeV/RPV/PPRV | Proportional | −12122.06 | 24746.11 | 1.46 | 0.93 | 0.10 |
| MeV/RPV/PPRV | Nonsynonymous | −11928.55 | 24359.11 | 1.93 | 0.92 | 0.12 |
| MeV/RPV/PPRV | Dual | −11919.35 | 24348.70 | 2.00 | 0.93 | 0.12 |
| MeV/RPV/PPRV | Lineage+Dual (two rate) | −11900.7 | | 3.61 | 0.92 | 0.04 (deep), 0.14 (recent) |
| MeV/RPV/PPRV | Lineage+Dual (four rate) | −11900.1 | 24316.20 | 3.95 | 0.92 | 0.03–0.06 (deep), 0.14 (recent) |
| EBOV | Constant | −6762.76 | 13757.52 | 2.37 | 0.17 | 0.05 |
| EBOV | Proportional | −6720.47 | 13676.94 | 3.00 | 0.17 | 0.04 |
| EBOV | Nonsynonymous | −6682.31 | 13600.62 | 2.56 | 0.17 | 0.06 |
| EBOV | Dual | −6679.79 | 13599.57 | 2.68 | 0.17 | 0.06 |
| EBOV | Lineage+Dual (two rate) | −6638.5 | 13420.99 | 4.84 | 0.17 | 0.03 (deep), 0.18 (recent) |
| EBOV | Lineage+Dual (eight rate) | −6631.24 | | 69.4 | 0.17 | 0.0005–0.05 (deep), 0.18 (recent) |
| AIV | Constant | −44860.13 | 90794.26 | 11.96 | 8.8 | 0.05 |
| AIV | Proportional | −44483.52 | 90049.05 | 12.80 | 9.04 | 0.05 |
| AIV | Nonsynonymous | −43862.45 | 88806.90 | 20.92 | 9.08 | 0.05 |
| AIV | Dual | −43730.4 | 88550.9 | 20.98 | 9.2 | 0.05 |
| AIV | Lineage+Dual (two rate) | −43711.00 | | 2230.9 | 9.2 | 0.0004 (deep), 0.05 (recent) |
| AIV | Lineage+Dual (16 rate) | −43710.97 | 88541.9 | 2278.23 | 9.2 | 0.0003–0.0004 (deep), 0.05 (recent) |
Note: AIC, Akaike information criterion.
The number of lineages (selected a priori) with their own E[β]/E[α] for the Lineage+Dual models are shown in parentheses.
The best fitting model for each data set is highlighted in boldface.
T shows the cumulative length of branches classified as recent or deep lineages a priori, measured in the expected number of substitutions per nucleotide site.
E[β]/E[α] reports the ratio of means of the expected nonsynonymous to expected synonymous rates (similar to ω for each model). For Lineage+Dual models, the values are stratified by branch class and ranges are reported when appropriate.
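As a reading aid for the AIC column, the following minimal sketch uses the standard definition AIC = 2k − 2 log L and compares models by their difference from the lowest AIC. The function names are hypothetical, and the parameter count `k = 116` in the usage line is not reported in the table: it is an assumption obtained by back-calculation from the EBOV Constant row. Only the four EBOV models without lineage-specific rates are included.

```python
def aic(log_likelihood, n_params):
    """Standard Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def delta_aic(aic_by_model):
    """Difference between each model's AIC and the lowest AIC in the set."""
    best = min(aic_by_model.values())
    return {model: value - best for model, value in aic_by_model.items()}


# AIC values copied from the EBOV rows of the table above.
ebov_aic = {
    "Constant": 13757.52,
    "Proportional": 13676.94,
    "Nonsynonymous": 13600.62,
    "Dual": 13599.57,
}

if __name__ == "__main__":
    # k = 116 is an assumed value (back-calculated, not stated in the source).
    print("AIC for Constant with assumed k:", aic(-6762.76, 116))
    for model, delta in delta_aic(ebov_aic).items():
        print(f"{model}: delta AIC = {delta:.2f}")
```

Lower AIC indicates a better trade-off between fit and parameter count, so within this subset the Dual model is preferred over the simpler alternatives.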
MCC phylogeny for MeV, RPV, and PPRV. Branch lengths were optimized under (A) GTR + Γ4 and (B) Lineage+Dual models. The Lineage+Dual model branch lengths were estimated assuming two different sets of synonymous and nonsynonymous substitution rates: one for short branches and another for long internal branches. Both trees are shown on the same scale.