Vivek Jayaswal, Lars S Jermiin, John Robinson.
Abstract
The non-homogeneous model of nucleotide substitution proposed by Barry and Hartigan (Stat Sci, 2: 191-210) is the most general model of DNA evolution assuming an independent and identical process at each site. We present a computational solution for this model, and use it to analyse two data sets, each violating one or more of the assumptions of stationarity, homogeneity, and reversibility. The log likelihood values returned by programs based on the F84 model (J Mol Evol, 29: 170-179), the general time reversible model (J Mol Evol, 20: 86-93), and Barry and Hartigan's model are compared to determine the validity of the assumptions made by the first two models. In addition, we present a method for assessing whether sequences have evolved under reversible conditions and discover that this is not so for the two data sets. Finally, we determine the most likely tree under the three models of DNA evolution and compare these with the one favoured by the tests for symmetry.
Keywords: Maximum Likelihood; Nucleotide Sequence Evolution; Phylogenetics; Reversibility; Tests for Symmetry
Year: 2007 PMID: 19325854 PMCID: PMC2658871
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Probabilities obtained from matched-pairs tests of symmetry, marginal symmetry and internal symmetry using 1st codon sites from the hominoid data
| Taxon | Test | Ppan | Ptro | Hsap | Ggor | Ppyg | Hlar |
|---|---|---|---|---|---|---|---|
| Ptro | Bowker | 0.206 | | | | | |
| Ptro | Stuart | 0.620 | | | | | |
| Ptro | Ababneh | 0.425 | | | | | |
| Hsap | Bowker | 0.217 | 0.709 | | | | |
| Hsap | Stuart | 0.312 | 0.867 | | | | |
| Hsap | Ababneh | 0.532 | 0.883 | | | | |
| Ggor | Bowker | 0.032 | 0.219 | 0.302 | | | |
| Ggor | Stuart | 0.024 | 0.227 | 0.243 | | | |
| Ggor | Ababneh | 0.769 | 0.994 | 0.387 | | | |
| Ppyg | Bowker | 0.440 | 0.579 | 0.614 | 0.139 | | |
| Ppyg | Stuart | 0.092 | 0.095 | 0.239 | 0.078 | | |
| Ppyg | Ababneh | 1.000 | 1.000 | 1.000 | 0.680 | | |
| Hlar | Bowker | 0.400 | 0.331 | 0.262 | 0.180 | 0.703 | |
| Hlar | Stuart | 0.517 | 0.419 | 0.576 | 0.106 | 0.696 | |
| Hlar | Ababneh | 0.268 | 0.404 | 0.127 | 0.688 | 0.499 | |
| Msyl | Bowker | 0.592 | 0.584 | 0.303 | 0.233 | 0.635 | 0.735 |
| Msyl | Stuart | 0.327 | 0.304 | 0.303 | 0.056 | 0.242 | 0.522 |
| Msyl | Ababneh | 0.759 | 0.786 | 0.313 | 0.914 | 0.989 | 0.913 |
Probabilities obtained from matched-pairs tests of symmetry, marginal symmetry and internal symmetry using 2nd codon sites from the hominoid data
| Taxon | Test | Ppan | Ptro | Hsap | Ggor | Ppyg | Hlar |
|---|---|---|---|---|---|---|---|
| Ptro | Bowker | 0.102 | | | | | |
| Ptro | Stuart | 0.206 | | | | | |
| Ptro | Ababneh | 1.000 | | | | | |
| Hsap | Bowker | 0.197 | 0.352 | | | | |
| Hsap | Stuart | 0.348 | 0.826 | | | | |
| Hsap | Ababneh | 1.000 | 0.754 | | | | |
| Ggor | Bowker | 0.264 | 0.323 | 0.361 | | | |
| Ggor | Stuart | 0.437 | 0.706 | 0.334 | | | |
| Ggor | Ababneh | 1.000 | 0.352 | 0.558 | | | |
| Ppyg | Bowker | 0.359 | 0.446 | 0.728 | 0.297 | | |
| Ppyg | Stuart | 0.154 | 0.243 | 0.401 | 0.088 | | |
| Ppyg | Ababneh | 0.720 | 0.653 | 0.879 | 0.867 | | |
| Hlar | Bowker | 0.157 | 0.444 | 0.126 | 0.331 | 0.165 | |
| Hlar | Stuart | 0.297 | 0.721 | 0.638 | 0.513 | 0.177 | |
| Hlar | Ababneh | 0.231 | 0.329 | 0.075 | 0.327 | 0.239 | |
| Msyl | Bowker | 0.710 | 0.957 | 0.890 | 0.605 | 0.46 | 0.801 |
| Msyl | Stuart | 0.881 | 0.996 | 0.940 | 0.940 | 0.494 | 0.948 |
| Msyl | Ababneh | 0.378 | 0.690 | 0.592 | 0.248 | 0.351 | 0.440 |
Probabilities obtained from matched-pairs tests of symmetry, marginal symmetry and internal symmetry using 3rd codon sites from the hominoid data
| Taxon | Test | Ppan | Ptro | Hsap | Ggor | Ppyg | Hlar |
|---|---|---|---|---|---|---|---|
| Ptro | Bowker | 0.670 | | | | | |
| Ptro | Stuart | 0.357 | | | | | |
| Ptro | Ababneh | 0.846 | | | | | |
| Hsap | Bowker | 0.517 | 0.504 | | | | |
| Hsap | Stuart | 0.511 | 0.452 | | | | |
| Hsap | Ababneh | 0.589 | 0.443 | | | | |
| Ggor | Bowker | 0.257 | 0.767 | 0.171 | | | |
| Ggor | Stuart | 0.568 | 0.947 | 0.459 | | | |
| Ggor | Ababneh | 0.349 | 0.398 | 0.092 | | | |
| Ppyg | Bowker | 0.019 | 0.028 | 0.016 | 0.046 | | |
| Ppyg | Stuart | 0.016 | 0.029 | 0.242 | 0.011 | | |
| Ppyg | Ababneh | 0.180 | 0.160 | 0.010 | 0.662 | | |
| Hlar | Bowker | 0.236 | 0.277 | 0.743 | 0.244 | 0.756 | |
| Hlar | Stuart | 0.083 | 0.135 | 0.623 | 0.093 | 0.535 | |
| Hlar | Ababneh | 0.715 | 0.584 | 0.627 | 0.678 | 0.748 | |
| Msyl | Bowker | 0.372 | 0.528 | 0.383 | 0.158 | 0.035 | 0.445 |
| Msyl | Stuart | 0.151 | 0.261 | 0.354 | 0.386 | 0.006 | 0.142 |
| Msyl | Ababneh | 0.996 | 0.986 | 0.567 | 0.100 | 0.811 | 0.948 |
Figure 1. The most likely tree of the hominoids inferred using F84 and GTR models (the bar corresponds to 0.01 substitutions per site).
Log Likelihood values for the three most likely trees returned by the BH program
| Tree | Log Likelihood |
|---|---|
| ((((((Ptro,Ppan),Ggor),Hsap),Ppyg),Hlar),Msyl) | −3540.684 |
| ((((((Ptro,Ppan),Hsap),Ggor),Ppyg),Hlar),Msyl) | −3545.508 |
| (((((Ptro,Ppan),(Hsap,Ggor)),Ppyg),Hlar),Msyl) | −3554.946 |
Shimodaira-Hasegawa (SH) Test and Approximately Unbiased (AU) Test
| Tree | SH Test | AU Test |
|---|---|---|
| ((((((Ptro,Ppan),Ggor),Hsap),Ppyg),Hlar),Msyl) | 0.811 | 0.716 |
| ((((((Ptro,Ppan),Hsap),Ggor),Ppyg),Hlar),Msyl) | 0.428 | 0.334 |
| (((((Ptro,Ppan),(Hsap,Ggor)),Ppyg),Hlar),Msyl) | 0.075 | 0.026 |
Comparison of edge lengths obtained using BH and PHYLIP for the hominoid tree ((((((Ptro,Ppan), Ggor), Hsap), Ppyg), Hlar), Msyl). Refer to Figure 1 for an explanation of node numbers.
| Edge | Distance using BH | Distance using DNAML | Confidence Interval (DNAML) |
|---|---|---|---|
| Ppyg, Node-2 | 0.058 | 0.061 | 0.046–0.077 |
| Node-2, Node-4 | 0.028 | 0.024 | 0.014–0.035 |
| Node-2, Node-3 | 0.018 | 0.020 | 0.011–0.030 |
| Node-4, Hlar | 0.037 | 0.039 | 0.027–0.053 |
| Node-4, Msyl | 0.108 | 0.109 | 0.088–0.129 |
| Node-3, Hsap | 0.032 | 0.029 | 0.019–0.040 |
| Node-3, Node-5 | 0.009 | 0.009 | 0.003–0.016 |
| Node-5, Ggor | 0.043 | 0.042 | 0.029–0.055 |
| Node-5, Node-6 | 0.010 | 0.009 | 0.003–0.015 |
| Node-6, Ptro | 0.017 | 0.017 | 0.009–0.025 |
| Node-6, Ppan | 0.016 | 0.015 | 0.007–0.022 |
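One simple way to read this comparison is to check whether each BH edge length falls inside the corresponding DNAML confidence interval. The sketch below is only an illustration of that check, not part of the authors' analysis; the variable names are hypothetical and the numbers are copied from the table above, with the edge "Node-3, Node-5" written in the same form as the other edges.

```python
# Each tuple: (edge, BH length, DNAML CI lower bound, DNAML CI upper bound).
# Values are transcribed from the table above.
edges = [
    ("Ppyg, Node-2", 0.058, 0.046, 0.077),
    ("Node-2, Node-4", 0.028, 0.014, 0.035),
    ("Node-2, Node-3", 0.018, 0.011, 0.030),
    ("Node-4, Hlar", 0.037, 0.027, 0.053),
    ("Node-4, Msyl", 0.108, 0.088, 0.129),
    ("Node-3, Hsap", 0.032, 0.019, 0.040),
    ("Node-3, Node-5", 0.009, 0.003, 0.016),
    ("Node-5, Ggor", 0.043, 0.029, 0.055),
    ("Node-5, Node-6", 0.010, 0.003, 0.015),
    ("Node-6, Ptro", 0.017, 0.009, 0.025),
    ("Node-6, Ppan", 0.016, 0.007, 0.022),
]

# Collect the edges whose BH length lies outside the DNAML interval.
outside = [name for name, bh, lower, upper in edges if not lower <= bh <= upper]
print("Edges outside the DNAML interval:", outside)
```

For the tabulated values the list is empty: every BH edge length lies within the DNAML interval for the same edge.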
Macaque-Bonobo divergence matrix for the seven taxa hominoid tree ((((((Ptro, Ppan), Ggor), Hsap), Ppyg), Hlar), Msyl) based on (a) observed values and (b) joint probability distribution values
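(a) Observed values
| | A | C | G | T |
|---|---|---|---|---|
| A | 306 | 11 | 18 | 15 |
| C | 10 | 279 | 2 | 47 |
| G | 20 | 4 | 142 | 2 |
| T | 6 | 40 | 2 | 302 |
(b) Joint probability distribution values
| | A | C | G | T |
|---|---|---|---|---|
| A | 303.7 | 12.8 | 21.6 | 11.9 |
| C | 10.2 | 270.8 | 2.1 | 54.8 |
| G | 21.3 | 7.5 | 138.1 | 1.1 |
| T | 6.8 | 42.8 | 2.2 | 298.2 |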
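Bowker's matched-pairs test of symmetry, one of the three tests named in the abstract, can be applied directly to the observed counts in part (a). The following is a minimal sketch, not the authors' implementation. The function names `bowker` and `chi2_sf_even` are hypothetical, the statistic is assumed to be the standard sum of (n_ij − n_ji)² / (n_ij + n_ji) over pairs i < j, and the degrees of freedom are assumed to be k(k − 1)/2, which is 6 for four nucleotides.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

def bowker(table):
    """Bowker's test of symmetry for a square table of paired counts."""
    k = len(table)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:  # a pair with no observations contributes nothing
                stat += (table[i][j] - table[j][i]) ** 2 / total
    df = k * (k - 1) // 2  # assumption: 6 for a 4 x 4 nucleotide table
    return stat, df, chi2_sf_even(stat, df)

# Observed Macaque-Bonobo divergence matrix, part (a); order A, C, G, T.
observed = [
    [306, 11, 18, 15],
    [10, 279, 2, 47],
    [20, 4, 142, 2],
    [6, 40, 2, 302],
]

stat, df, p = bowker(observed)
print(f"X2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```

For these counts the statistic is about 5.24 and the probability about 0.51, so symmetry is not rejected for the sites summarized in this matrix. The per-codon-position probabilities in the earlier tables are computed on different subsets of sites and need not match this value.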
Probability values for Bowker’s Test of Symmetry and Stuart’s Test of Marginal Symmetry for all the edges of the most likely hominoid tree ((((((Ptro, Ppan), Ggor), Hsap), Ppyg), Hlar), Msyl). See Figure 1 for an explanation of node numbers
| Edge | Bowker’s Test | Stuart’s Test |
|---|---|---|
| Ppyg, Node-2 | 0.113 | 0.035 |
| Node-2, Node-4 | 0.435 | 0.697 |
| Node-2, Node-3 | 0.241 | 0.282 |
| Node-3, Hsap | 0.000 | 0.000 |
| Node-3, Node-5 | 0.145 | 0.023 |
| Node-5, Ggor | 0.001 | 0.000 |
| Node-5, Node-6 | 0.088 | 0.012 |
| Node-6, Ptro | 0.085 | 0.013 |
| Node-6, Ppan | 0.097 | 0.013 |
| Node-4, Hlar | 0.454 | 0.140 |
| Node-4, Msyl | 0.135 | 0.080 |
Contingency table for the edge linking node 5 to the Gorilla leaf node. Rows correspond to internal node and columns to leaf node
| | A | C | G | T |
|---|---|---|---|---|
| A | 325.0 | 2.0 | 17.7 | 3.0 |
| C | 2.0 | 332.4 | 0.0 | 18.5 |
| G | 2.0 | 0.0 | 156.3 | 0.0 |
| T | 0.0 | 4.6 | 0.0 | 342.5 |
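Stuart's test of marginal symmetry compares the row and column totals of a table like this one. A hedged sketch follows, assuming the usual formulation: d holds the first k − 1 differences between row and column totals, V has diagonal entries (row total + column total − 2 n_ii) and off-diagonal entries −(n_ij + n_ji), and the statistic dᵀV⁻¹d is referred to a chi-square distribution with k − 1 degrees of freedom. The helper names `solve`, `stuart` and `chi2_sf_df3` are illustrative, and the entries are the rounded values printed in the table, so the result is approximate.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for col in range(c, n + 1):
                m[r][col] -= factor * m[c][col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        partial = sum(m[r][col] * x[col] for col in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x

def stuart(table):
    """Stuart's statistic for marginal symmetry; returns (statistic, df)."""
    k = len(table)
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]
    d = [rows[i] - cols[i] for i in range(k - 1)]
    v = [[rows[i] + cols[i] - 2 * table[i][i] if i == j
          else -(table[i][j] + table[j][i])
          for j in range(k - 1)] for i in range(k - 1)]
    stat = sum(di * xi for di, xi in zip(d, solve(v, d)))
    return stat, k - 1

def chi2_sf_df3(x):
    """Upper-tail chi-square probability for exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Node 5 (rows) versus the Gorilla leaf (columns); order A, C, G, T.
node5_ggor = [
    [325.0, 2.0, 17.7, 3.0],
    [2.0, 332.4, 0.0, 18.5],
    [2.0, 0.0, 156.3, 0.0],
    [0.0, 4.6, 0.0, 342.5],
]

stat, df = stuart(node5_ggor)
p = chi2_sf_df3(stat)
print(f"Stuart statistic = {stat:.2f}, df = {df}, p = {p:.2e}")
```

With the rounded table entries the statistic comes out near 23.6 and the probability is well below 0.001, in line with the 0.000 reported for the edge Node-5, Ggor.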
Probabilities obtained from matched-pairs tests of symmetry, marginal symmetry and internal symmetry using all sites from the bacterial data
| Taxon | Test | Apyr | Bsub | Drad | Tthe |
|---|---|---|---|---|---|
| Bsub | Bowker | 0.000 | | | |
| Bsub | Stuart | 0.000 | | | |
| Bsub | Ababneh | 0.295 | | | |
| Drad | Bowker | 0.000 | 0.995 | | |
| Drad | Stuart | 0.000 | 0.946 | | |
| Drad | Ababneh | 0.754 | 0.958 | | |
| Tthe | Bowker | 0.509 | 0.000 | 0.000 | |
| Tthe | Stuart | 0.731 | 0.000 | 0.000 | |
| Tthe | Ababneh | 0.263 | 0.544 | 0.863 | |
| Tmar | Bowker | 0.132 | 0.000 | 0.000 | 0.415 |
| Tmar | Stuart | 0.325 | 0.000 | 0.000 | 0.267 |
| Tmar | Ababneh | 0.095 | 0.417 | 0.297 | 0.546 |
Figure 2. The most likely bacterial tree inferred by BH (tree #1). The GC content of the sequences is included (based on Table 14a).
Figure 3. The most likely bacterial tree inferred using GTR (and F84) models (tree #2). The GC content of the sequences is included (based on Table 14a).
Comparison of edge lengths obtained using the BH program and DNAML for tree #1. Refer to Figure 2 for the tree diagram and an explanation of node numbers.
| Edge | Distance using BH | Distance using DNAML | Confidence Interval (DNAML) |
|---|---|---|---|
| Bsub, Node-2 | 0.122 | 0.127 | 0.104–0.150 |
| Node-2, Node-3 | 0.040 | 0.039 | 0.024–0.053 |
| Node-3, Tthe | 0.060 | 0.069 | 0.051–0.087 |
| Node-3, Drad | 0.131 | 0.120 | 0.098–0.143 |
| Node-2, Node-4 | 0.036 | 0.043 | 0.027–0.058 |
| Node-4, Tmar | 0.058 | 0.061 | 0.044–0.078 |
| Node-4, Apyr | 0.124 | 0.127 | 0.104–0.150 |
Divergence matrices for tree #1 for (a) Bacillus-Aquifex pair and (b) Bacillus-Deinococcus pair
(a) Bacillus-Aquifex pair: observed (Obs.) and estimated (Est.) divergence matrix values
| | Obs. A | Obs. C | Obs. G | Obs. T | Est. A | Est. C | Est. G | Est. T |
|---|---|---|---|---|---|---|---|---|
| A | 0.195 | 0.019 | 0.034 | 0.004 | 0.191 | 0.018 | 0.039 | 0.004 |
| C | 0.005 | 0.201 | 0.02 | 0.012 | 0.006 | 0.194 | 0.024 | 0.014 |
| G | 0.012 | 0.030 | 0.273 | 0.004 | 0.014 | 0.038 | 0.262 | 0.005 |
| T | 0.002 | 0.037 | 0.027 | 0.125 | 0.003 | 0.037 | 0.03 | 0.121 |
(b) Bacillus-Deinococcus pair: observed (Obs.) and estimated (Est.) divergence matrix values
| | Obs. A | Obs. C | Obs. G | Obs. T | Est. A | Est. C | Est. G | Est. T |
|---|---|---|---|---|---|---|---|---|
| A | 0.209 | 0.007 | 0.023 | 0.011 | 0.199 | 0.012 | 0.032 | 0.008 |
| C | 0.006 | 0.192 | 0.017 | 0.023 | 0.012 | 0.176 | 0.019 | 0.031 |
| G | 0.023 | 0.015 | 0.271 | 0.011 | 0.032 | 0.018 | 0.252 | 0.017 |
| T | 0.012 | 0.019 | 0.011 | 0.149 | 0.008 | 0.027 | 0.017 | 0.139 |
Goodness of Fit index for all pairs of bacteria
| Sequence Pair | Tree #1 | Tree #2 |
|---|---|---|
| Bsub-Tmar | 3.06 | 17.94 |
| Bsub-Apyr | 7.01 | 25.37 |
| Bsub-Tthe | 1.26 | 0.91 |
| Bsub-Drad | 34.92 | 3.41 |
| Tmar-Apyr | 0.52 | 0.43 |
| Tthe-Drad | 2.10 | 13.65 |
| Tmar-Drad | 3.14 | 4.59 |
| Apyr-Drad | 5.99 | 6.85 |
| Tmar-Tthe | 7.90 | 1.42 |
| Apyr-Tthe | 9.06 | 1.77 |
(a) Marginal probabilities at leaf nodes for bacterial data set.
| Taxon | A | C | G | T |
|---|---|---|---|---|
| Tthe | 0.219 | 0.278 | 0.354 | 0.149 |
| Tmar | 0.207 | 0.279 | 0.359 | 0.155 |
| Apyr | 0.214 | 0.287 | 0.354 | 0.145 |
| Drad | 0.250 | 0.233 | 0.321 | 0.195 |
| Bsub | 0.251 | 0.238 | 0.319 | 0.191 |
(b) Marginal probabilities at internal nodes for tree #1.
| Node | A | C | G | T |
|---|---|---|---|---|
| Node-2 | 0.216 | 0.272 | 0.358 | 0.154 |
| Node-3 | 0.218 | 0.269 | 0.357 | 0.156 |
| Node-4 | 0.210 | 0.282 | 0.36 | 0.148 |
(c) Marginal probabilities at internal nodes for tree #2.
| Node | A | C | G | T |
|---|---|---|---|---|
| Node-2 | 0.214 | 0.275 | 0.360 | 0.151 |
| Node-3 | 0.227 | 0.257 | 0.342 | 0.174 |
| Node-4 | 0.212 | 0.281 | 0.361 | 0.146 |
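The GC content shown in the bacterial tree figures can be recovered from the leaf marginal probabilities in part (a) by adding the C and G entries. The sketch below assumes the four value columns are in the order A, C, G, T. That order is not stated in the extracted table, but it agrees with the Bsub row sums of the divergence matrices given earlier. The dictionary name and layout are illustrative only.

```python
# Leaf marginal probabilities from part (a); assumed column order A, C, G, T.
leaf_marginals = {
    "Tthe": (0.219, 0.278, 0.354, 0.149),
    "Tmar": (0.207, 0.279, 0.359, 0.155),
    "Apyr": (0.214, 0.287, 0.354, 0.145),
    "Drad": (0.250, 0.233, 0.321, 0.195),
    "Bsub": (0.251, 0.238, 0.319, 0.191),
}

# GC content is the sum of the C and G marginal probabilities.
gc_content = {taxon: round(p[1] + p[2], 3) for taxon, p in leaf_marginals.items()}

for taxon, gc in sorted(gc_content.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{taxon}: GC = {gc:.3f}")
```

Under this reading Tthe, Tmar and Apyr have GC content between 0.63 and 0.64, while Drad and Bsub sit between 0.55 and 0.56. This is the kind of compositional difference that the tests of symmetry flag for pairs drawn from the two groups.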
Probability values for Bowker’s Test of Symmetry and Stuart’s Test of Homogeneity for all the edges of the bacterial tree
(a) Tree #1 (Figure 2).
| Edge | Bowker’s Test | Stuart’s Test |
|---|---|---|
| Bsub, Node-2 | 0.000 | 0.000 |
| Node-2, Node-3 | 0.427 | 0.835 |
| Node-2, Node-4 | 0.002 | 0.005 |
| Node-3, Tthe | 0.567 | 0.390 |
| Node-3, Drad | 0.000 | 0.000 |
| Node-4, Tmar | 0.646 | 0.359 |
| Node-4, Apyr | 0.135 | 0.742 |
(b) Tree #2 (Figure 3).
| Edge | Bowker’s Test | Stuart’s Test |
|---|---|---|
| Tthe, Node-2 | 0.568 | 0.607 |
| Node-2, Node-3 | 0.000 | 0.000 |
| Node-2, Node-4 | 0.167 | 0.264 |
| Node-3, Drad | 0.000 | 0.000 |
| Node-3, Bsub | 0.000 | 0.000 |
| Node-4, Tmar | 0.315 | 0.092 |
| Node-4, Apyr | 0.130 | 0.843 |
Contingency table for the edge linking node 3 to the Deinococcus leaf node. Rows correspond to internal node and columns to leaf node.
| | A | C | G | T |
|---|---|---|---|---|
| A | 260.2 | 12.8 | 37.9 | 0.0 |
| C | 2.1 | 280.7 | 6.2 | 6.0 |
| G | 4.7 | 9.7 | 378.0 | 2.6 |
| T | 0.5 | 32.9 | 21.2 | 182.3 |
Comparison of edge lengths obtained using the BH program and DNAML for tree #2. Refer to Figure 3 for the tree diagram and an explanation of node numbers.
| Edge | Distance using BH | Distance using DNAML | Confidence Interval (DNAML) |
|---|---|---|---|
| Tthe, Node-2 | 0.064 | 0.068 | 0.050–0.086 |
| Node-2, Node-3 | 0.050 | 0.050 | 0.033–0.066 |
| Node-3, Bsub | 0.106 | 0.105 | 0.083–0.126 |
| Node-3, Drad | 0.110 | 0.108 | 0.087–0.130 |
| Node-2, Node-4 | 0.046 | 0.047 | 0.031–0.063 |
| Node-4, Tmar | 0.059 | 0.063 | 0.046–0.079 |
| Node-4, Apyr | 0.122 | 0.122 | 0.099–0.145 |