Abstract
BACKGROUND: Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, various empirical amino acid substitution rate matrices have had to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mitochondrial proteins, cpREV10 and cpREV64 for chloroplast-encoded proteins, and FLU for influenza proteins. On the other hand, in a mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the ratio of fixation depending on the type of amino acid replacement, mutation rates and the strength of selective constraint on amino acids can be tailored to each protein family with 11 additional parameters. As a result, in the evolutionary analysis of codon sequences it outperforms codon substitution models equivalent to empirical amino acid substitution matrices. Is it superior even for amino acid sequences, among which synonymous substitutions cannot be identified?
Year: 2013 PMID: 24256155 PMCID: PMC4225520 DOI: 10.1186/1471-2148-13-257
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
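The mechanistic model described in the abstract (codon substitution rate proportional to a codon mutation rate times a fixation ratio that depends on the amino acid replacement) can be illustrated with a toy sketch. This is not the authors' implementation. The assumptions are: the standard genetic code is used (the mitochondrial data would need a different code table), only single-nucleotide changes are allowed, the transition/transversion ratio `ts_tv_ratio` and the single `constraint` value are hypothetical, and all function names are invented for illustration.

```python
import itertools

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop). Assumption: the
# mitochondrial data in this study would need a different code table.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO))

TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def mutation_rate(x, y, ts_tv_ratio=2.0):
    """Toy codon mutation rate: nonzero only for single-nucleotide changes."""
    diffs = [(a, b) for a, b in zip(x, y) if a != b]
    if len(diffs) != 1:
        return 0.0  # simplification: multi-nucleotide changes are ignored here
    return ts_tv_ratio if diffs[0] in TRANSITIONS else 1.0


def fixation_ratio(aa_x, aa_y, constraint=0.2):
    """Toy fixation ratio: 1 for synonymous changes, one hypothetical
    constant for every amino acid replacement."""
    return 1.0 if aa_x == aa_y else constraint


def codon_rate(x, y):
    """Unnormalized substitution rate from codon x to codon y."""
    aa_x, aa_y = CODE[x], CODE[y]
    if x == y or "*" in (aa_x, aa_y):
        return 0.0  # stop codons are excluded from the state space
    return mutation_rate(x, y) * fixation_ratio(aa_x, aa_y)


print(codon_rate("TTT", "TTC"))  # synonymous transition (Phe -> Phe)
print(codon_rate("TTT", "TTA"))  # transversion causing Phe -> Leu
```

With these toy values the synonymous transition gets rate 2.0 and the replacement transversion gets 0.2. In the study itself the fixation ratio depends on the specific amino acid pair, which is what the selective constraint matrices in the model names supply.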
Brief description of models: Amino acid substitution models
| Model | Description |
|---|---|
| mtREV-dG | The empirical amino acid rates of mtREV |
| JTT-F-dG | The empirical amino acid exchangeabilities of JTT |
The suffix "-dGmr" means that the variation of substitution rate is approximated by a discrete gamma distribution [38] with m categories of unequal probabilities; see Additional file 1 for details.
Brief description of models: Mechanistic codon substitution models
| Model | Description |
|---|---|
| Equal-Constraint- | Equal constraint irrespective of amino acid substitution type is assumed; |
| EI- | |
| JTT-ML91+- | Selective constraints |
| KHG-ML200- | Selective constraints |
The suffix "n" means the number of parameters optimized for the substitution rate matrix. The suffix "-F" means that equilibrium codon frequencies are assumed to be equal to codon frequencies in codon sequences; equal codon usage is assumed for amino acid sequences. The suffix "-dGm(r|s|sf)" denotes "-dGmr", "-dGms" or "-dGmsf". The suffixes "-dGmr" and "-dGms" mean the variation of mutation rate or selective constraint across sites, respectively, which is approximated by a discrete gamma distribution [38] with m categories of unequal probabilities; see Additional file 1 for details. The "f" following "-dGms" means that the posterior frequencies of amino acids in each category in the first run are used in the second run as the equilibrium frequencies for each category; see the Methods section.
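The naming convention above can be made concrete with a small parser for model names as they appear in the comparison tables. This is an illustrative sketch only: the regular expression, the `ModelName` class, and its field names are assumptions introduced here, not part of the original software.

```python
import re
from dataclasses import dataclass
from typing import Optional

_PATTERN = re.compile(
    r"(?P<matrix>.+?)"      # rate matrix or selective-constraint matrix name
    r"(?:-(?P<n>\d+))?"     # optional number of optimized matrix parameters
    r"(?P<f>-F)?"           # optional "-F" flag
    r"-dG(?P<m>\d+)"        # number of discrete gamma categories
    r"(?P<kind>sf|s|r)"     # what varies across sites
)


@dataclass
class ModelName:
    matrix: str
    n_matrix_params: Optional[int]
    uses_observed_frequencies: bool  # True when the "-F" suffix is present
    gamma_categories: int
    variation: str  # "r", "s", or "sf"


def parse_model_name(name: str) -> ModelName:
    """Split a model name such as 'JTT-ML91+-11-F-dG4sf' into its parts."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    n = match.group("n")
    return ModelName(
        matrix=match.group("matrix"),
        n_matrix_params=int(n) if n is not None else None,
        uses_observed_frequencies=match.group("f") is not None,
        gamma_categories=int(match.group("m")),
        variation=match.group("kind"),
    )


print(parse_model_name("JTT-ML91+-11-F-dG4sf"))
print(parse_model_name("mtREV-dG4r"))
```

Digits that belong to a matrix name, as in cpREV64, are not preceded by a hyphen, so the sketch keeps them in `matrix` and does not read them as a parameter count.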
Comparisons between various amino acid and codon substitution models for the reference phylogenetic tree of the mammalian-mtProt
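| Model (a) | No. of parameters (b) | Δℓ (c) | ΔAIC (c) | ΔBIC (c) | Parameter (d, e) | | Average over amino acid pairs (f) | Double/single and triple/double nucleotide change exchangeability ratio (g) | Transition/transversion exchangeability ratio (h) | Gamma shape parameter (i) |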
|---|---|---|---|---|---|---|---|---|---|---|
| Amino acid substitution models | ||||||||||
| mtREV-dG4r | 1 | -96.5 | 154.9 | 37.2 | | | | | | 0.471 |
| cpREV64-F-dG4r | 20 | -3733.4 | 7466.9 | 7466.9 | | | | | | 0.426 |
| WAG-F-dG4r | 20 | -2667.4 | 5334.7 | 5334.7 | | | | | | 0.443 |
| LG-F-dG4r | 20 | -2617.5 | 5235.1 | 5235.1 | | | | | | 0.438 |
| cpREV10-F-dG4r | 20 | -2316.2 | 4632.4 | 4632.4 | | | | | | 0.445 |
| FLU-F-dG4r | 20 | -2249.4 | 4498.8 | 4498.8 | | | | | | 0.433 |
| JTT-F-dG4r | 20 | -1255.8 | 2511.6 | 2511.6 | | | | | | 0.436 |
| mtREV-F-dG4r | 20 | 0.0 | 0.0 | 0.0 | | | | | | 0.469 |
| Mechanistic codon substitution models | ||||||||||
| Equal-Constraint-10-F-dG4r | 30 | -3356.4 | 6732.7 | 6794.6 | (0.0) | -0.000 | 1.000 | 0.338 | 2.887 | 0.407 |
| EI-11-F-dG4r | 31 | -1663.4 | 3348.8 | 3417.0 | 0.463 | 0.012 | 0.276 | 0.369 | 4.061 | 0.424 |
| WAG-ML91+-11-F-dG4r | 31 | 356.4 | -690.9 | -622.8 | 1.140 | 0.017 | 0.122 | 0.336 | 3.978 | 0.427 |
| LG-ML91+-11-F-dG4r | 31 | 621.5 | -1221.1 | -1152.9 | 0.962 | 0.585 | 0.194 | 0.269 | 4.029 | 0.418 |
| KHG-ML200-11-F-dG4r | 31 | 701.5 | -1380.9 | -1312.8 | 1.321 | 0.944 | 0.223 | 0.196 | 1.939 | 0.415 |
| JTT-ML91+-11-F-dG4r | 31 | 712.6 | -1403.2 | -1335.1 | 1.354 | 0.539 | 0.137 | 0.348 | 2.417 | 0.421 |
| JTT-ML91+-11-F-dG8r | 31 | 1328.0 | -2634.0 | -2565.8 | 1.363 | 0.483 | 0.129 | 0.304 | 2.480 | 0.302 |
| Equal-Constraint-10-F-dG4s | 30 | -3346.1 | 6712.1 | 6774.1 | (0.0) | -0.000 | 1.000 | 0.300 | 2.950 | 0.396 |
| EI-11-F-dG4s | 31 | -1164.7 | 2351.4 | 2419.5 | 0.553 | -0.511 | 0.136 | 0.344 | 3.772 | 0.288 |
| WAG-ML91+-11-F-dG4s | 31 | 509.8 | -997.6 | -929.4 | 1.355 | 0.147 | 0.106 | 0.403 | 3.534 | 0.418 |
| KHG-ML200-11-F-dG4s | 31 | 511.1 | -1000.2 | -932.1 | 1.259 | 0.069 | 0.115 | 0.192 | 2.044 | 0.485 |
| LG-ML91+-11-F-dG4s | 31 | 637.6 | -1253.2 | -1185.1 | 0.994 | -0.108 | 0.097 | 0.268 | 3.897 | 0.436 |
| JTT-ML91+-11-F-dG4s | 31 | 909.2 | -1796.5 | -1728.3 | 1.587 | 0.425 | 0.094 | 0.398 | 2.190 | 0.452 |
| JTT-ML91+-11-F-dG8s | 31 | 1712.7 | -3403.4 | -3335.2 | 1.739 | 0.409 | 0.078 | 0.348 | 2.250 | 0.328 |
| Equal-Constraint-10-F-dG4sf | 87 | -1878.8 | 3891.7 | 4306.7 | (0.0) | -0.000 | 1.000 | 0.283 | 2.967 | 0.390 |
| EI-11-F-dG4sf | 88 | 444.0 | -752.0 | -330.8 | 0.541 | -0.678 | 0.117 | 0.310 | 3.914 | 0.265 |
| JTT-ML91+-11-F-dG4sf | 88 | 1226.6 | -2317.2 | -1896.0 | 1.495 | 0.358 | 0.098 | 0.373 | 2.350 | 0.442 |
| WAG-ML91+-11-F-dG4sf | 88 | 1290.2 | -2444.5 | -2023.3 | 1.339 | 0.220 | 0.116 | 0.375 | 3.544 | 0.390 |
| KHG-ML200-11-F-dG4sf | 88 | 1328.4 | -2520.8 | -2099.6 | 1.406 | 0.986 | 0.208 | 0.181 | 2.062 | 0.574 |
| LG-ML91+-11-F-dG4sf | 88 | 1360.7 | -2585.4 | -2164.2 | 0.992 | 0.122 | 0.123 | 0.278 | 3.769 | 0.416 |
(a) "-F" means that the equilibrium frequencies are estimated to be equal to those in the alignment; equal codon usage is assumed. "-dGmr" and "-dGms" mean discrete gamma distributions with m categories of unequal probabilities for the rate variation and the variation of selective constraint across sites, respectively. "-dGmsf" means the equilibrium frequencies for respective categories are estimated from their posterior probabilities for sites. The number string in the model name indicates the number of parameters optimized for the substitution rate matrix, and the remaining strings denote a rate matrix or a selective constraint matrix used.
(b) The number of adjustable parameters.
(c) Difference from the reference state; Δℓ = ℓ + 122106.2, ΔAIC = AIC - 244252.3, and ΔBIC = BIC - 244376.2. The reference tree topology is Tree-6 in [34].
(d) ; is the one specified by the model name.
(e) The value parenthesized means that the parameter is fixed at the value specified.
(f) The average of over all amino acid pairs {a,b}; .
(g) The ratio of double to single and of triple to double nucleotide change exchangeability; .
(h) The ratio of mean transitional to mean transversional exchangeability; .
(i) The shape parameter of a discrete gamma distribution for the variation of mutation rate or selective constraint across sites.
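The Δ columns of this table can be cross-checked against one another. The sketch below assumes the standard definitions AIC = 2k - 2ℓ and BIC = k ln(n) - 2ℓ, where k is the number of adjustable parameters and n the sample size; the function names are invented for illustration. The reference model is the row with all differences equal to 0.0 (mtREV-F-dG4r, 20 parameters).

```python
def delta_aic(delta_loglik, k, k_ref):
    """ΔAIC from a log-likelihood difference, using AIC = 2k - 2ℓ."""
    return -2.0 * delta_loglik + 2.0 * (k - k_ref)


def implied_log_sample_size(d_aic, d_bic, k, k_ref):
    """ln(n) implied by one table row, using BIC = k ln(n) - 2ℓ.

    ΔBIC - ΔAIC = (k - k_ref) * (ln(n) - 2), so this requires k != k_ref.
    """
    return 2.0 + (d_bic - d_aic) / (k - k_ref)


K_REF = 20  # parameters of the reference model mtREV-F-dG4r

# Row JTT-ML91+-11-F-dG4r: k = 31, Δℓ = 712.6, ΔAIC = -1403.2, ΔBIC = -1335.1
print(round(delta_aic(712.6, 31, K_REF), 1))
print(round(implied_log_sample_size(-1403.2, -1335.1, 31, K_REF), 2))
```

The first value reproduces the tabulated ΔAIC of -1403.2. The second gives ln(n) of about 8.19, and the 88-parameter "-dG4sf" rows imply nearly the same value, so the ΔAIC and ΔBIC columns are mutually consistent under these definitions.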
Comparisons between various amino acid and codon substitution models for the reference phylogenetic tree of the cpProt-55
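| Model (a) | No. of parameters (b) | Δℓ (c) | ΔAIC (c) | ΔBIC (c) | Parameter (d, e) | | Average over amino acid pairs (f) | Double/single and triple/double nucleotide change exchangeability ratio (g) | Transition/transversion exchangeability ratio (h) | Gamma shape parameter (i) |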
|---|---|---|---|---|---|---|---|---|---|---|
| Amino acid substitution models | ||||||||||
| cpREV64-dG4r | 1 | 0.0 | 0.0 | 0.0 | | | | | | 0.292 |
| LG-F-dG4r | 20 | -9935.0 | 19908.0 | 20051.6 | | | | | | 0.339 |
| mtREV-F-dG4r | 20 | -7875.1 | 15788.1 | 15931.7 | | | | | | 0.259 |
| WAG-F-dG4r | 20 | -7649.6 | 15337.2 | 15480.7 | | | | | | 0.348 |
| FLU-F-dG4r | 20 | -5732.4 | 11502.7 | 11646.3 | | | | | | 0.269 |
| cpREV10-F-dG4r | 20 | -5649.9 | 11337.8 | 11481.3 | | | | | | 0.349 |
| JTT-F-dG4r | 20 | -4671.9 | 9381.8 | 9525.3 | | | | | | 0.347 |
| cpREV64-F-dG4r | 20 | -803.8 | 1645.6 | 1789.2 | | | | | | 0.345 |
| Mechanistic codon substitution models | ||||||||||
| Equal-Constraint-10-F-dG4r | 30 | 332.4 | -606.8 | -387.7 | (0.0) | -0.000 | 1.000 | 0.109 | 2.556 | 0.287 |
| EI-11-F-dG4r | 31 | 565.0 | -1070.0 | -843.3 | 0.101 | 0.101 | 0.782 | 0.119 | 2.686 | 0.285 |
| KHG-ML200-11-F-dG4r | 31 | 1150.9 | -2241.8 | -2015.1 | 0.386 | 0.139 | 0.491 | 0.102 | 2.249 | 0.287 |
| WAG-ML91+-11-F-dG4r | 31 | 1164.8 | -2269.5 | -2042.9 | 0.334 | 0.065 | 0.475 | 0.161 | 2.648 | 0.286 |
| LG-ML91+-11-F-dG4r | 31 | 1179.4 | -2298.7 | -2072.0 | 0.271 | 0.165 | 0.548 | 0.139 | 2.666 | 0.286 |
| JTT-ML91+-11-F-dG4r | 31 | 1426.3 | -2792.6 | -2565.9 | 0.430 | 0.132 | 0.421 | 0.187 | 2.234 | 0.287 |
| JTT-ML91+-11-F-dG8r | 31 | 1666.2 | -3272.3 | -3045.6 | 0.435 | 0.134 | 0.418 | 0.182 | 2.237 | 0.295 |
| Equal-Constraint-10-F-dG4s | 30 | 346.8 | -635.6 | -416.4 | (0.0) | -0.233 | 0.793 | 0.113 | 2.549 | 0.286 |
| EI-11-F-dG4s | 31 | 962.6 | -1865.2 | -1638.5 | 0.264 | -0.255 | 0.341 | 0.135 | 2.727 | 0.262 |
| KHG-ML200-11-F-dG4s | 31 | 1472.2 | -2884.3 | -2657.7 | 0.434 | -0.672 | 0.199 | 0.101 | 2.326 | 0.284 |
| WAG-ML91+-11-F-dG4s | 31 | 1632.9 | -3205.8 | -2979.1 | 0.607 | -0.344 | 0.189 | 0.167 | 2.633 | 0.258 |
| LG-ML91+-11-F-dG4s | 31 | 1742.9 | -3425.8 | -3199.1 | 0.544 | 0.005 | 0.248 | 0.148 | 2.630 | 0.276 |
| JTT-ML91+-11-F-dG4s | 31 | 1886.9 | -3713.7 | -3487.1 | 0.788 | 0.221 | 0.235 | 0.191 | 2.198 | 0.253 |
| JTT-ML91+-11-F-dG8s | 31 | 2176.2 | -4292.4 | -4065.7 | 0.854 | 0.257 | 0.218 | 0.200 | 2.170 | 0.275 |
| Equal-Constraint-10-F-dG4sf | 87 | 1224.3 | -2276.5 | -1626.7 | (0.0) | -0.174 | 0.840 | 0.115 | 2.537 | 0.276 |
| EI-11-F-dG4sf | 88 | 1920.6 | -3667.2 | -3009.8 | 0.279 | -0.231 | 0.335 | 0.135 | 2.665 | 0.251 |
| KHG-ML200-11-F-dG4sf | 88 | 2105.0 | -4036.1 | -3378.7 | 0.455 | -0.626 | 0.200 | 0.102 | 2.296 | 0.286 |
| WAG-ML91+-11-F-dG4sf | 88 | 2320.8 | -4467.5 | -3810.1 | 0.633 | 0.060 | 0.270 | 0.165 | 2.528 | 0.249 |
| LG-ML91+-11-F-dG4sf | 88 | 2369.0 | -4564.0 | -3906.7 | 0.523 | -0.007 | 0.256 | 0.147 | 2.557 | 0.269 |
| JTT-ML91+-11-F-dG4sf | 88 | 2542.1 | -4910.2 | -4252.8 | 0.787 | 0.308 | 0.255 | 0.188 | 2.168 | 0.249 |
(a) "-F" means that the equilibrium frequencies are estimated to be equal to those in the alignment; equal codon usage is assumed. "-dGmr" and "-dGms" mean discrete gamma distributions with m categories of unequal probabilities for the rate variation and the variation of selective constraint across sites, respectively. "-dGmsf" means the equilibrium frequencies for respective categories are estimated from their posterior probabilities for sites. The number string in the model name indicates the number of parameters optimized for the substitution rate matrix, and the remaining strings denote a rate matrix or a selective constraint matrix used.
(b) The number of adjustable parameters.
(c) Difference from the reference state; Δℓ = ℓ + 217554.4, ΔAIC = AIC - 435110.9, and ΔBIC = BIC - 435118.5. The reference tree topology is the one reported in [35].
(d) ; is the one specified by the model name.
(e) The value parenthesized means that the parameter is fixed at the value specified.
(f) The average of over all amino acid pairs {a,b}; .
(g) The ratio of double to single and of triple to double nucleotide change exchangeability; .
(h) The ratio of mean transitional to mean transversional exchangeability; .
(i) The shape parameter of a discrete gamma distribution for the variation of mutation rate or selective constraint across sites.
Comparisons between various amino acid and codon substitution models for the reference phylogenetic tree of the HA_Human-Flu-A-H1N1
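| Model (a) | No. of parameters (b) | Δℓ (c) | ΔAIC (c) | ΔBIC (c) | Parameter (d, e) | | Average over amino acid pairs (f) | Double/single and triple/double nucleotide change exchangeability ratio (g) | Transition/transversion exchangeability ratio (h) | Gamma shape parameter (i) |
|---|---|---|---|---|---|---|---|---|---|---|
| Amino acid substitution models | ||||||||||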
| FLU-dG4r | 1 | 0.0 | 0.0 | 0.0 | | | | | | 0.913 |
| mtREV-F-dG4r | 20 | -985.5 | 2009.0 | 2085.2 | | | | | | 0.809 |
| LG-F-dG4r | 20 | -885.1 | 1808.3 | 1884.5 | | | | | | 0.856 |
| WAG-F-dG4r | 20 | -777.1 | 1592.2 | 1668.4 | | | | | | 0.882 |
| cpREV10-F-dG4r | 20 | -695.8 | 1429.5 | 1505.8 | | | | | | 0.858 |
| JTT-F-dG4r | 20 | -386.3 | 810.5 | 886.7 | | | | | | 0.892 |
| cpREV64-F-dG4r | 20 | -167.9 | 373.8 | 450.0 | | | | | | 0.840 |
| FLU-F-dG4r | 20 | 8.1 | 21.7 | 98.0 | | | | | | 0.907 |
| Mechanistic codon substitution models | ||||||||||
| Equal-Constraint-10-F-dG4r | 30 | 203.4 | -348.7 | -232.4 | (0.0) | -1.109 | 0.330 | 0.010 | 4.768 | 0.828 |
| EI-11-F-dG4r | 31 | 332.7 | -605.4 | -485.1 | 0.311 | -0.609 | 0.212 | 0.013 | 4.835 | 0.880 |
| LG-ML91+-11-F-dG4r | 31 | 394.6 | -729.2 | -608.8 | 0.453 | -0.690 | 0.151 | 0.014 | 4.792 | 0.920 |
| WAG-ML91+-11-F-dG4r | 31 | 405.2 | -750.4 | -630.1 | 0.565 | -0.679 | 0.145 | 0.018 | 4.825 | 0.940 |
| KHG-ML200-11-F-dG4r | 31 | 410.0 | -760.0 | -639.7 | 0.676 | -0.214 | 0.202 | 0.009 | 3.287 | 0.923 |
| JTT-ML91+-11-F-dG4r | 31 | 418.3 | -776.6 | -656.2 | 0.636 | -0.425 | 0.162 | 0.027 | 3.725 | 0.923 |
| JTT-ML91+-11-F-dG8r | 31 | 441.2 | -822.3 | -702.0 | 0.641 | -0.446 | 0.157 | 0.026 | 3.745 | 0.923 |
| Equal-Constraint-10-F-dG4s | 30 | 206.3 | -354.7 | -238.4 | (0.0) | -1.434 | 0.238 | 0.010 | 4.754 | 0.823 |
| EI-11-F-dG4s | 31 | 328.5 | -596.9 | -476.6 | 0.332 | -0.495 | 0.225 | 0.015 | 4.741 | 0.887 |
| LG-ML91+-11-F-dG4s | 31 | 397.6 | -735.2 | -614.9 | 0.454 | -0.962 | 0.115 | 0.014 | 4.780 | 0.903 |
| KHG-ML200-11-F-dG4s | 31 | 412.5 | -765.1 | -644.7 | 0.676 | -0.662 | 0.129 | 0.009 | 3.300 | 0.923 |
| WAG-ML91+-11-F-dG4s | 31 | 415.0 | -770.0 | -649.6 | 0.627 | -0.303 | 0.190 | 0.021 | 4.620 | 0.890 |
| JTT-ML91+-11-F-dG4s | 31 | 421.1 | -782.2 | -661.9 | 0.635 | -0.761 | 0.116 | 0.027 | 3.722 | 0.918 |
| JTT-ML91+-11-F-dG8s | 31 | 457.7 | -855.4 | -735.1 | 0.731 | -0.317 | 0.152 | 0.029 | 3.630 | 0.911 |
| Equal-Constraint-10-F-dG4sf | 87 | 297.2 | -422.3 | -77.4 | (0.0) | -1.549 | 0.212 | 0.010 | 4.603 | 0.716 |
| EI-11-F-dG4sf | 88 | 405.8 | -637.7 | -288.7 | 0.313 | -0.526 | 0.229 | 0.014 | 4.366 | 0.856 |
| KHG-ML200-11-F-dG4sf | 88 | 428.1 | -682.2 | -333.2 | 0.565 | -0.674 | 0.155 | 0.010 | 3.397 | 0.920 |
| LG-ML91+-11-F-dG4sf | 88 | 439.7 | -705.5 | -356.5 | 0.369 | -1.050 | 0.128 | 0.016 | 4.575 | 0.885 |
| WAG-ML91+-11-F-dG4sf | 88 | 443.3 | -712.6 | -363.7 | 0.658 | -0.012 | 0.241 | 0.023 | 4.446 | 0.864 |
| JTT-ML91+-11-F-dG4sf | 88 | 447.8 | -721.6 | -372.7 | 0.686 | -0.200 | 0.185 | 0.032 | 3.520 | 0.871 |
(a) "-F" means that the equilibrium frequencies are estimated to be equal to those in the alignment; equal codon usage is assumed. "-dGmr" and "-dGms" mean discrete gamma distributions with m categories of unequal probabilities for the rate variation and the variation of selective constraint across sites, respectively. "-dGmsf" means the equilibrium frequencies for respective categories are estimated from their posterior probabilities for sites. The number string in the model name indicates the number of parameters optimized for the substitution rate matrix, and the remaining strings denote a rate matrix or a selective constraint matrix used.
(b) The number of adjustable parameters.
(c) Difference from the reference state; Δℓ = ℓ + 20059.7, ΔAIC = AIC - 40121.5, and ΔBIC = BIC - 40125.5. The reference tree topology is one inferred by FastTree-2 [36].
(d) ; is the one specified by the model name.
(e) The value parenthesized means that the parameter is fixed at the value specified.
(f) The average of over all amino acid pairs {a,b}; .
(g) The ratio of double to single and of triple to double nucleotide change exchangeability; .
(h) The ratio of mean transitional to mean transversional exchangeability; .
(i) The shape parameter of a discrete gamma distribution for the variation of mutation rate or selective constraint across sites.