Yan Zhou, Nicolas Rodrigue, Nicolas Lartillot, Hervé Philippe.
Abstract
BACKGROUND: The evolutionary rate at a given homologous position varies across time. When sufficiently pronounced, this phenomenon, called heterotachy, may produce artefactual phylogenetic reconstructions under the commonly used models of sequence evolution. These observations have motivated the development of models that explicitly recognize heterotachy, with research proceeding along two main axes: 1) the covarion approach, where sites switch between variable and invariable states; and 2) the mixture of branch lengths (MBL) approach, where alignment patterns are assumed to arise from one of several sets of branch lengths, under a given phylogeny.
Year: 2007 PMID: 17974035 PMCID: PMC2248194 DOI: 10.1186/1471-2148-7-206
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
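In the MBL approach summarized in the abstract, each site is assumed to arise from one of K sets of branch lengths on a shared topology, so its likelihood is a weighted sum over components. The sketch below illustrates that idea under this assumption: it combines per-site, per-component log-likelihoods (assumed to be precomputed elsewhere, e.g. by Felsenstein's pruning algorithm) into a total log-likelihood, and derives the posterior probability of each component at each site, the kind of quantity summarized per gene in Figure 3. The function names and toy numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mbl_log_likelihood(
    site_loglik: Sequence[Sequence[float]], weights: Sequence[float]
) -> Tuple[float, List[List[float]]]:
    """Total log-likelihood and per-site component posteriors for a K-component mixture.

    site_loglik[i][k] = ln P(site i | topology, branch-length set k); these values are
    assumed to be computed elsewhere (hypothetical input for this sketch).
    """
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    log_w = [math.log(w) for w in weights]
    total = 0.0
    posteriors: List[List[float]] = []
    for row in site_loglik:
        # ln(w_k) + ln P(site | component k), for each component k
        joint = [lw + ll for lw, ll in zip(log_w, row)]
        site_ll = logsumexp(joint)  # ln sum_k w_k P(site | k)
        total += site_ll
        # Bayes' rule: P(k | site) = w_k P(site | k) / sum_j w_j P(site | j)
        posteriors.append([math.exp(j - site_ll) for j in joint])
    return total, posteriors


# Toy input (hypothetical numbers): two sites, two equally weighted components.
ll = [[math.log(0.2), math.log(0.2)], [math.log(0.1), math.log(0.3)]]
total, post = mbl_log_likelihood(ll, [0.5, 0.5])
print(round(total, 4), [round(p, 2) for p in post[1]])  # -3.2189 [0.25, 0.75]
```

Working in log space with log-sum-exp avoids underflow, since per-site likelihoods on real alignments are far too small to multiply directly.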
Figure 1. Topology used for computer simulations. In Newick format, the tree is: ((((A:0.375, B:0.3):0.25, C:1):0.08, D:0.32):0.8,((E:0.42, F:0.31):0.24,(G:0.27,(H:0.2,(I:0.5, J:0.5):0.25):0.12):0.25):0.26); The scale bar indicates the expected number of changes per site.
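To make the simulation tree concrete, the following minimal sketch parses the Newick string of Figure 1 and computes the total tree length and the root-to-tip distance of each leaf. The parser is a small illustration written for this example (it handles nested clades, leaf names and branch lengths only, not quoted labels or comments); it is not a general Newick reader, and all names in it are hypothetical.

```python
from typing import Dict, List, Tuple

# (name, children, branch length) for each node; the root's length is 0.
Node = Tuple[str, List["Node"], float]

FIG1 = ("((((A:0.375, B:0.3):0.25, C:1):0.08, D:0.32):0.8,((E:0.42, F:0.31):0.24,"
        "(G:0.27,(H:0.2,(I:0.5, J:0.5):0.25):0.12):0.25):0.26);")


def parse_newick(text: str) -> Node:
    """Minimal Newick parser: nested clades, leaf names and ':length' only."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        children: List[Node] = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        name = read_until(",():")
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_until(",()"))
        return name, children, length

    return parse_node()


def tip_depths(node: Node, above: float = 0.0) -> Dict[str, float]:
    """Root-to-tip path length for every leaf."""
    name, children, length = node
    depth = above + length
    if not children:
        return {name: depth}
    out: Dict[str, float] = {}
    for child in children:
        out.update(tip_depths(child, depth))
    return out


def tree_length(node: Node) -> float:
    """Sum of all branch lengths (expected changes per site)."""
    _, children, length = node
    return length + sum(tree_length(c) for c in children)


root = parse_newick(FIG1)
print(round(tree_length(root), 3))  # 6.445
print({k: round(v, 3) for k, v in tip_depths(root).items()})
```

The root-to-tip distances range from about 0.78 (G) to 1.88 (C), so the simulation tree is far from ultrametric.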
Inferred values of α (the shape parameter of the discrete gamma distribution of rates across sites), the inferred weight (w) of one of the two components, and the Pearson correlation (r) between the inferred and true branch lengths of the corresponding component, for sequences simulated under various values of τ and α. Each cell gives α/w/r.
| τ = 0.2 | τ = 0.4 | τ = 0.6 | τ = 0.8 |
|---|---|---|---|
| 0.51/0.028/n.a. | 0.52/0.42/0.976 | 0.49/0.46/0.993 | 0.52/0.50/0.998 |
| 1.06/0.033/n.a. | 1.04/0.43/0.993 | 1.00/0.47/0.993 | 1.02/0.49/0.998 |
| 1.51/0.07/n.a. | 1.56/0.50/0.993 | 1.56/0.48/0.997 | 1.46/0.49/0.998 |
| 2.01/0.005/n.a. | 2.04/0.41/0.979 | 1.89/0.49/0.999 | 1.99/0.50/0.998 |
Note that the correlations between the true branch lengths of the two components are 0.86, 0.52, 0.19 and -0.16 for τ = 0.2, 0.4, 0.6 and 0.8, respectively. Two components were used for the inference. When τ = 0.2, the partition identity cannot be recovered, so the inferred branch lengths cannot be compared with the true ones.
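The r values above, the correlations between the true branch lengths of the two components quoted in the note, and the R values of Figure 4 are all Pearson correlations between two vectors of branch lengths. A minimal sketch, assuming the two vectors are already matched so that position i refers to the same branch (bipartition) in both trees; `pearson_r` is a hypothetical helper name.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two branch-length vectors matched branch by branch."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
print(pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # -1.0
```

Because r ignores overall scale, a component whose branches are all uniformly longer still correlates perfectly with the truth; r measures agreement in the pattern of branch lengths, not their absolute values.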
Optimal numbers of components determined by AIC, BIC or cross-validation (CV) on the simulated data with different levels of heterotachy (τ) and different levels of rate heterogeneity across sites (α).
Each cell gives the number of components selected by AIC/BIC/CV.

| τ = 0.2 | τ = 0.4 | τ = 0.6 | τ = 0.8 |
|---|---|---|---|
| 1/1/1 | 2/1/2 | 2/2/2 | 2/2/2 |
| 1/1/1 | 2/1/2 | 3/2/2 | 2/2/2 |
| 2/1/1 | 2/2/2 | 2/2/2 | 2/2/2 |
| 1/1/1 | 2/2/2 | 3/2/2 | 2/2/2 |
Cross-validation for the simulated datasets (α = 0.5)
| One component (homotachy) | Two-component | Three-component | Four-component | Covarion |
|---|---|---|---|---|
| 0 | 10.5 ± 5.5 | 18.6 ± 7.9 | 20.6 ± 10.9 | 0.8 ± 2.4 |
| 2.0 ± 8.7 | 0 | 4.7 ± 9.4 | 14.7 ± 8.7 | 2.0 ± 8.6 |
| 84.5 ± 12.4 | 0 | 10.0 ± 7.2 | 21.9 ± 10.1 | 85.2 ± 12.9 |
| 359.5 ± 30.0 | 0 | 8.1 ± 6.5 | 15.9 ± 9.3 | 359.6 ± 29.4 |
| 0 | 9.6 ± 4.3 | 18.5 ± 9.1 | 23.8 ± 9.0 | 0.6 ± 1.9 |
| 13.0 ± 5.9 | 0 | 10.6 ± 4.4 | 17.3 ± 8.1 | 14.6 ± 5.3 |
| 101.4 ± 8.6 | 0 | 11.0 ± 6.0 | 18.1 ± 9.2 | 101.7 ± 8.4 |
| 472.0 ± 13.9 | 0 | 10.2 ± 5.5 | 13.6 ± 5.6 | 453.4 ± 14.0 |
| 0 | 11.7 ± 6.3 | 7.4 ± 4.4 | 18.4 ± 12.1 | 0.7 ± 1.8 |
| 36.6 ± 5.9 | 0 | 12.1 ± 7.1 | 18.9 ± 9.2 | 34.9 ± 5.4 |
| 136.7 ± 12.8 | 0 | 7.7 ± 6.3 | 15.9 ± 9.6 | 135.3 ± 12.7 |
| 505.6 ± 23.8 | 0 | 10.8 ± 7.6 | 19.1 ± 8.8 | 490.9 ± 24.5 |
| 0 | 11.2 ± 5.3 | 17.7 ± 10.4 | 26.1 ± 9.9 | 1.7 ± 2.4 |
| 37.5 ± 17.5 | 0 | 9.3 ± 11.6 | 18.6 ± 15.7 | 39.2 ± 18.5 |
| 173.9 ± 12.6 | 0 | 10.6 ± 4.6 | 12.4 ± 5.3 | 169.5 ± 12.0 |
| 596.1 ± 22.2 | 0 | 8.0 ± 1.5 | 15.1 ± 6.9 | 588.0 ± 23.0 |
Each entry gives the mean (± SD) difference between the CV log-likelihood of the model with the highest CV log-likelihood and that of the given model, so the best model scores 0. Five random replicates of this two-fold CV were performed.
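The table can be read as follows: one reference model sits at exactly 0, and some cells have an SD larger than their mean (e.g. 2.0 ± 8.7), which suggests that differences were computed replicate by replicate against a single reference model, the one with the highest mean CV log-likelihood. That is an interpretation, not something the caption states. Under that assumption, a minimal sketch of the summary step (the function name and toy numbers are hypothetical; fitting the models on one half of the sites and scoring the other half is assumed to have been done already):

```python
import statistics
from typing import Dict, List, Tuple


def cv_differences(cv_ll: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and SD of (reference CV log-likelihood - model CV log-likelihood).

    cv_ll[model][r] is the CV log-likelihood of `model` in random replicate r, with all
    models scored on the same split within a replicate. Assumption: the reference is
    the model with the highest mean CV log-likelihood, so it scores exactly 0.
    """
    reference = max(cv_ll, key=lambda m: statistics.mean(cv_ll[m]))
    ref = cv_ll[reference]
    out: Dict[str, Tuple[float, float]] = {}
    for model, lls in cv_ll.items():
        diffs = [r - x for r, x in zip(ref, lls)]  # paired, replicate by replicate
        out[model] = (statistics.mean(diffs), statistics.stdev(diffs))
    return out


# Hypothetical CV log-likelihoods for three replicates (the study used five).
toy = {"one-component": [-100.0, -102.0, -101.0],
       "two-component": [-98.0, -99.0, -97.0]}
print(cv_differences(toy))  # {'one-component': (3.0, 1.0), 'two-component': (0.0, 0.0)}
```

Pairing by replicate matters: because both models are scored on the same random split, much of the split-to-split variation cancels in the difference.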
Comparison of the covarion model and MBL models with different numbers of components for three real datasets
| Model | -LnL | AIC | BIC | CV |
|---|---|---|---|---|
| Animal dataset (5,000 sites and 20 species) | | | | |
| one-component | 86468.5 | 86506.5 | 86630.3 | 82.1 ± 7.9 |
| two-component | 86302.7 | 86378.7 | 86626.4 | 37.8 ± 13.5 |
| three-component | 86222.7 | 86336.7 | 86708.2 | 47.9 ± 10.7 |
| four-component | 86167.6 | 86319.6 | 86814.9 | 69.0 ± 17.2 |
| five-component | 86126.8 | 86316.8 | 86936.0 | 82.2 ± 21.2 |
| six-component | 86087.1 | | 87058.1 | NC |
| covarion | 86300.7 | 86340.7 | | |
| Plastid dataset (3,754 sites and 22 species) | | | | |
| one-component | 78225.2 | 78267.2 | 78398.0 | 75.3 ± 8.8 |
| two-component | 78056.4 | 78140.4 | 78402.1 | 34.2 ± 24.5 |
| three-component | 77996.7 | 78122.7 | 78515.2 | 49.8 ± 15.6 |
| four-component | 77925.8 | | 78617.2 | 60.3 ± 21.0 |
| five-component | 77926.2 | 78136.2 | 78790.4 | 72.4 ± 22.0 |
| six-component | 77900.4 | 78152.4 | 78937.5 | NC |
| covarion | 78070.9 | 78114.9 | | |
| Mitochondrial mammal dataset (3,591 sites and 17 species) | | | | |
| one-component | 44285.9 | 44317.9 | 44416.9 | 45.9 ± 3.7 |
| two-component | 44154.8 | 44218.8 | 44416.8 | 16.6 ± 7.5 |
| three-component | 44127.6 | 44223.6 | 44520.5 | 34.2 ± 12.3 |
| four-component | 44081.2 | | 44605.1 | 38.2 ± 15.4 |
| five-component | 44071.9 | 44231.9 | 44726.8 | NC |
| six-component | 44072.3 | 44264.3 | 44858.2 | NC |
| covarion | 44187.1 | 44222.1 | | |
Only the CV values come with a standard deviation, since CV is repeated over random splits of the data.
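Working backwards from the MBL rows of this table, the AIC and BIC columns appear to be on a half scale, AIC/2 = -lnL + k and BIC/2 = -lnL + (k/2) ln n, where n is the number of sites and k = K(2T - 3) + (K - 1) + 1 for K components on an unrooted tree of T species (K sets of branch lengths, K - 1 free mixture weights, one gamma shape). This parameter count is inferred from the numbers and is not stated in the source; the helpers below are hypothetical and simply reproduce the animal one- and two-component rows.

```python
import math
from typing import Tuple


def n_params_mbl(n_taxa: int, n_components: int) -> int:
    """Assumed free-parameter count of a K-component MBL model:
    K sets of 2T-3 unrooted branch lengths + (K-1) weights + 1 gamma shape."""
    return n_components * (2 * n_taxa - 3) + (n_components - 1) + 1


def half_aic_bic(neg_lnl: float, k: int, n_sites: int) -> Tuple[float, float]:
    """AIC/2 = -lnL + k and BIC/2 = -lnL + (k/2) ln n (the scale the table appears to use)."""
    return neg_lnl + k, neg_lnl + 0.5 * k * math.log(n_sites)


# Animal dataset (20 species, 5,000 sites): -LnL values from the table above.
for comps, neg_lnl in [(1, 86468.5), (2, 86302.7)]:
    k = n_params_mbl(20, comps)
    aic, bic = half_aic_bic(neg_lnl, k, 5000)
    print(comps, k, round(aic, 1), round(bic, 1))
# 1 38 86506.5 86630.3
# 2 76 86378.7 86626.4
```

Under this reading, lower is better for both criteria: for the animal data, BIC is lowest for two components, while AIC keeps decreasing through five, reflecting BIC's heavier ln n penalty per parameter.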
Figure 2. Branch lengths of the two partitions for the mitochondrial alignment of mammals (3591 sites, 17 species). The shape parameter of the Γ distribution was estimated to be 0.4. The weights are 0.40 for component I (B) and 0.60 for component II (A).
Figure 3. Box-and-whisker plot of the average posterior probabilities of component I, per gene, for the two-component MBL model on the mitochondrial mammal dataset. A non-parametric Kruskal-Wallis test shows that these posterior probabilities differ significantly among genes (p < 0.0001).
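The Kruskal-Wallis test used in Figure 3 ranks all observations jointly (here, presumably per-site posterior probabilities grouped by gene) and compares the rank sums of the groups; under the null hypothesis of identical distributions, H is approximately χ²-distributed with (number of groups - 1) degrees of freedom. A minimal pure-Python sketch of the H statistic, without the tie correction that matters when many values are tied (for real analyses, `scipy.stats.kruskal` applies it and also returns the p-value); the helper names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """H = 12 / (N(N+1)) * sum_g R_g^2 / n_g - 3(N+1), no tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    acc, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])  # rank sum of this group
        acc += r_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * acc - 3 * (n + 1)


print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 4))  # 7.2
```

Strictly, the test compares whole distributions through ranks, so a significant result means that at least one gene's posterior probabilities tend to be systematically higher or lower than the others', not specifically that the means differ.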
Contingency table for the mitochondrial alignment
| | Cox+Cytb | Other genes |
|---|---|---|
| Component 1 | 142/278 | 583/447 |
| Component 2 | 1237/1101 | 1629/1765 |
Observed/expected numbers of positions are indicated.
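The expected counts in this table are what independence between component assignment and gene category would predict: row total × column total / grand total (the four observed counts sum to 3591, the number of sites). The sketch below recomputes them from the observed counts and adds Pearson's χ² statistic (1 degree of freedom for a 2×2 table, no continuity correction). The source does not say which test, if any, was applied to this table, so the χ² step is offered only as an illustration; the helper name is hypothetical.

```python
from typing import List, Tuple


def expected_and_chi2(observed: List[List[int]]) -> Tuple[List[List[float]], float]:
    """Expected counts under row/column independence and Pearson's chi-square."""
    row_tot = [sum(r) for r in observed]
    col_tot = [sum(c) for c in zip(*observed)]
    n = sum(row_tot)
    expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]
    chi2 = sum(
        (o - e) ** 2 / e
        for orow, erow in zip(observed, expected)
        for o, e in zip(orow, erow)
    )
    return expected, chi2


# Rows: components 1 and 2; columns: Cox+Cytb, other genes (observed counts above).
observed = [[142, 583], [1237, 1629]]
expected, chi2 = expected_and_chi2(observed)
print([[round(e) for e in row] for row in expected])  # [[278, 447], [1101, 1765]]
print(round(chi2, 1))  # 136.0
```

The rounded expected counts match those in the table, and a χ² of about 136 on 1 degree of freedom is far beyond conventional significance thresholds, consistent with Cox+Cytb positions being under-represented in component 1.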
Figure 4. Comparison of branch lengths from the two partitions for the nuclear (A), plastid (B) and mitochondrial (C) alignments. R = 0.63, 0.63 and 0.57, respectively.