Maryam Zaheri, Linda Dib, Nicolas Salamin.
Abstract
Models of codon evolution have attracted particular interest because of their unique ability to detect selection and their good fit when applied to sequence evolution. We describe here a novel approach for modeling codon evolution that is based on the Kronecker product of matrices. The 61 × 61 codon substitution rate matrix is built from the Kronecker product of three 4 × 4 nucleotide substitution matrices, the equilibrium frequencies of codons, and the selection rate parameter. The entries of the nucleotide substitution matrices and the selection rate are the parameters of the model, which are optimized by maximum likelihood. Our fully mechanistic model allows the instantaneous substitution matrix between codons to be fully estimated with only 19 parameters instead of 3,721 (61 × 61), by exploiting the biological interdependence between positions within codons. We illustrate the properties of our model using computer simulations and assess its relevance by comparing the AICc of our model with that of other models of codon evolution on simulations and a large range of empirical data sets. We show that our model fits most biological data better than current codon models. Furthermore, the parameters of our model can be interpreted in a similar way to the exchangeability rates found in empirical codon models.
Keywords: Kronecker product; Markov model; codon models; multiple substitutions; phylogenetics; positive selection
Year: 2014 PMID: 24958740 PMCID: PMC4137716 DOI: 10.1093/molbev/msu196
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
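The construction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unit diagonal of each nucleotide matrix, the symmetric six-parameter form (one reading of the 19 = 3 × 6 + 1 parameter count), the way codon frequencies and ω enter the rates, and all function names are assumptions made for illustration only.

```python
import itertools
import numpy as np

NUCS = "TCAG"
# Standard genetic code in TCAG order; '*' marks the three stop codons.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(NUCS, repeat=3)]
SENSE = [i for i, a in enumerate(AA) if a != "*"]  # indices of the 61 sense codons


def symmetric_nuc_matrix(rates):
    """Build a symmetric 4x4 matrix from 6 exchangeabilities.

    Assumption: the diagonal is set to 1 so that a change at a single codon
    position is weighted by that position's matrix alone.
    """
    q = np.eye(4)
    rows, cols = np.triu_indices(4, k=1)
    q[rows, cols] = rates
    q[cols, rows] = rates
    return q


def kcm_like_rate_matrix(q1, q2, q3, codon_freqs, omega):
    """Hypothetical 61x61 codon rate matrix built from a Kronecker product."""
    # 64x64 product; row/column index is (i1 * 4 + i2) * 4 + i3, matching CODONS.
    full = np.kron(np.kron(q1, q2), q3)
    k = full[np.ix_(SENSE, SENSE)]  # drop stop codons
    aa = np.array([AA[i] for i in SENSE])
    nonsyn = aa[:, None] != aa[None, :]
    # Assumption: target-codon frequency and omega scale each off-diagonal rate.
    rate = k * codon_freqs[None, :] * np.where(nonsyn, omega, 1.0)
    np.fill_diagonal(rate, 0.0)
    np.fill_diagonal(rate, -rate.sum(axis=1))  # rows of a rate matrix sum to zero
    return rate


rng = np.random.default_rng(0)
qs = [symmetric_nuc_matrix(rng.uniform(0.1, 1.0, size=6)) for _ in range(3)]
pi = np.full(61, 1 / 61)  # uniform codon frequencies, for illustration only
Q = kcm_like_rate_matrix(*qs, pi, omega=0.2)
print(Q.shape)
```

Because every entry of the Kronecker product is a product of three nucleotide terms, changes at two or three positions receive nonzero rates in this sketch, which is how a Kronecker formulation can accommodate multiple substitutions.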
Different Variants of the KCM Model.
| Model Description | Number of Parameters | Fixed Parameter |
|---|---|---|
| | 7 | |
| | 19 | |
| | 7 | |
| | 19 | |
| KCM19 | 18 | |
Note.—The symbol q refers to the type of substitution matrix at each nucleotide position.
Mean Delta AICc (standard deviation) Over the 50 Replicates for the Different Simulations Performed.
| Simulations | Model | Mean Delta AICc | ||||
|---|---|---|---|---|---|---|
| | | Factor = 0.1 | Factor = 0.5 | Factor = 1.0 | Factor = 2.0 | Factor = 10.0 |
| A | | −20.84(6.69) | −20.39(6.38) | −18.71(7.50) | −20.23(6.42) | −17.98(7.84) |
| | | −20.36(6.33) | −21.77(5.94) | −20.33(6.94) | −21.55(6.05) | −18.66(7.96) |
| | | −5.77(3.03) | −4.14(4.55) | −4.16(4.81) | −4.35(4.51) | −5.18(3.52) |
| | | −5.82(3.20) | −5.37(3.57) | −5.70(3.79) | −5.75(3.85) | −6.14(3.32) |
| B | | 3.78(18.35) | 40.60(23.19) | 44.49(21.63) | 55.23(26.96) | 65.80(29.51) |
| | | −22.04(4.84) | −20.22(4.73) | −19.81(5.42) | −19.46(5.70) | −19.97(6.11) |
| | | 18.63(17.77) | 55.87(22.88) | 59.94(21.80) | 70.63(27.70) | 79.54(28.93) |
| | | −8.00(1.74) | −5.71(3.07) | −4.61(3.31) | −4.23(3.26) | −5.43(3.04) |
| C | | 134.83(35.01) | 210.45(38.49) | 243.46(37.59) | 260.84(42.86) | 302.18(45.76) |
| | | −14.50(7.17) | −20.06(4.60) | −20.41(5.54) | −17.94(7.72) | −21.78(6.15) |
| | | 146.46(34.20) | 226.87(38.12) | 259.05(37.04) | 273.08(43.08) | 316.93(44.27) |
| | | −7.32(2.98) | −6.04(2.68) | −5.87(3.72) | −6.13(2.84) | −5.64(2.99) |
| D | | 213.24(52.33) | 320.78(57.01) | 343.11(46.81) | 407.34(55.21) | 449.84(53.37) |
| | | −11.48(32.64) | −14.77(27.79) | −18.55(6.76) | −20.07(7.67) | −21.28(5.73) |
| | | 227.22(53.05) | 336.20(57.97) | 357.53(46.39) | 421.73(55.51) | 466.92(52.47) |
| | | −2.30(30.35) | −2.35(26.98) | −6.80(2.48) | −7.04(2.73) | −5.65(3.76) |
| ECM | | 93.45(17.80) | 164.37(31.97) | 235.62(43.36) | 299.15(42.84) | 398.78(35.68) |
| | | 12.19(9.03) | 9.31(9.84) | 13.79(12.77) | 17.14(11.68) | 32.55(13.43) |
| | | 98.15(18.31) | 162.18(28.51) | 229.84(41.43) | 287.54(39.80) | 373.72(34.05) |
| | | 0.40(4.65) | −0.83(5.35) | −0.38(4.26) | 0.34(4.66) | 11.74(6.67) |
Note.—The analysis is reported for .The term factor refers to the constant used to multiply the ω parameter in the different simulations.
Mean ω (standard deviation) over the 50 Replicates for the Different Simulations Performed.
| Simulations | Model | Simulated Parameter | ||||
|---|---|---|---|---|---|---|
| | | Factor = 0.1 | Factor = 0.5 | Factor = 1.0 | Factor = 2.0 | Factor = 10.0 |
| A | M0 | 0.110(0.013) | 0.520(0.054) | 0.978(0.104) | 2.030(0.224) | 8.434(1.440) |
| | | 0.114(0.023) | 0.543(0.136) | 1.137(0.336) | 2.308(0.516) | 14.194(9.335) |
| | | 0.115(0.023) | 0.540(0.133) | 1.125(0.332) | 2.233(0.469) | 10.240(2.918) |
| | | 0.108(0.015) | 0.496(0.061) | 0.978(0.117) | 2.182(0.280) | 13.456(8.282) |
| | | 0.110(0.014) | 0.512(0.054) | 0.980(0.109) | 2.042(0.227) | 8.458(1.453) |
| B | M0 | 0.132(0.014) | 0.549(0.066) | 1.027(0.137) | 1.724(0.407) | 4.911(1.710) |
| | | 0.102(0.030) | 0.425(0.092) | 0.839(0.161) | 1.584(0.376) | 14.007(15.299) |
| | | 0.108(0.034) | 0.426(0.109) | 0.811(0.176) | 1.393(0.510) | 4.228(1.977) |
| | | 0.109(0.013) | 0.490(0.069) | 0.998(0.144) | 1.917(0.432) | 20.627(35.725) |
| | | 0.132(0.015) | 0.551(0.074) | 1.025(0.148) | 1.706(0.425) | 4.899(1.745) |
| C | M0 | 0.185(0.017) | 0.607(0.067) | 0.986(0.148) | 1.407(0.200) | 2.524(0.536) |
| | | 0.079(0.020) | 0.365(0.071) | 0.833(0.286) | 1.432(0.467) | 10.494(5.773) |
| | | 0.108(0.032) | 0.388(0.076) | 0.733(0.266) | 1.016(0.350) | 2.045(0.944) |
| | | 0.096(0.014) | 0.448(0.063) | 0.948(0.201) | 1.765(0.461) | 14.205(8.130) |
| | | 0.185(0.018) | 0.618(0.076) | 1.002(0.152) | 1.420(0.230) | 2.588(0.524) |
| D | M0 | 0.241(0.198) | 0.845(1.017) | 1.016(0.150) | 1.301(0.202) | 1.692(0.244) |
| | | 0.078(0.019) | 0.405(0.103) | 0.744(0.161) | 1.632(0.623) | 9.559(6.274) |
| | | 0.120(0.037) | 0.427(0.103) | 0.591(0.168) | 0.849(0.257) | 1.270(0.629) |
| | | 0.091(0.013) | 0.483(0.090) | 0.960(0.216) | 2.001(0.731) | 11.149(7.188) |
| | | 0.209(0.025) | 0.706(0.092) | 1.036(0.166) | 1.334(0.221) | 1.687(0.254) |
| ECM | M0 | 0.026(0.003) | 0.130(0.011) | 0.241(0.026) | 0.443(0.067) | 1.555(0.383) |
| | | 0.006(0.003) | 0.034(0.008) | 0.066(0.015) | 0.122(0.022) | 0.633(0.163) |
| | | 0.009(0.005) | 0.052(0.014) | 0.102(0.031) | 0.192(0.044) | 0.698(0.240) |
| | | 0.008(0.002) | 0.053(0.006) | 0.099(0.016) | 0.194(0.026) | 1.108(0.296) |
| | | 0.026(0.003) | 0.123(0.011) | 0.226(0.025) | 0.427(0.064) | 1.460(0.359) |
Note.—The values given for the and models are the uncorrected ones (see text). The analysis is reported for . The term factor refers to the constant used to multiply the ω parameter in the different simulations.
Fig. Delta AICc plots comparing the performance of the M0 model with the four KCM variants on 100 empirical data sets randomly selected from the Selectome database. For each empirical data set, we evaluated the maximum-likelihood value of the M0 model and of the KCM variants and compared them using delta AICc, which penalizes the 2 free parameters of the M0 model and the 7, 7, 19, and 19 free parameters of the KCM models, respectively. In each plot, a black horizontal line marks the mean delta AICc value over the empirical data sets. The codon frequencies used were the products of the observed nucleotide frequencies at each of the three codon positions (Yang and Bielawski 2000). Empirical data sets with delta AICc < 4 are shown in red.
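The delta AICc comparison can be sketched with the standard small-sample correction of AIC. Several things here are assumptions rather than facts from the source: the sign convention (M0 minus KCM, so positive values favor the KCM model), counting only substitution-model parameters in k, the sample size of 144 codons, and the function names. The log-likelihoods are the M0 and KCM19 values from the first column group of the likelihood table in this section.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion (standard textbook formula)."""
    aic = 2 * n_params - 2 * log_likelihood
    correction = 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


def delta_aicc(lnl_m0, k_m0, lnl_kcm, k_kcm, n_obs):
    """AICc(M0) - AICc(KCM); the sign convention is an assumption."""
    return aicc(lnl_m0, k_m0, n_obs) - aicc(lnl_kcm, k_kcm, n_obs)


# Assumed sample size (number of codons); not stated in this section.
N_CODONS = 144

m0 = aicc(-3815.5, 2, N_CODONS)      # M0 has 2 free parameters
kcm19 = aicc(-3659.9, 18, N_CODONS)  # KCM19 has 18 free parameters
delta = delta_aicc(-3815.5, 2, -3659.9, 18, N_CODONS)
print(round(m0, 1), round(kcm19, 1), round(delta, 1))
```

Under this convention a model with more parameters must gain enough log-likelihood to offset both the 2k penalty and the small-sample correction before its delta AICc turns positive.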
Estimated Log Likelihood (ln L), AICc, and ω Parameters (corrected for the model) for Vertebrate β-Globin and the Plants rbcL and pepC Genes.
| Models | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | −ln L | AICc | ω | −ln L | AICc | ω | −ln L | AICc | ω |
| M0 | 3,815.5 | 7,635.1 | 0.23685 | 4,362.7 | 8,729.4 | 0.10116 | 9,783.4 | 19,571 | 0.06597 |
| | 3,799.9 | 7,614.1 | 0.20640 | 4,336.70 | 8,687.5 | 0.08671 | 9,734.95 | 19,484 | 0.06093 |
| | 3,710.5 | 7,460.8 | 0.1409 | 4,301.86 | 8,642.3 | 0.0832 | 9,595.99 | 19,231 | 0.0443 |
| | 3,694.78 | 7,403.8 | 0.11417 | 4,297.31 | 8,608.7 | 0.06580 | 9,541.91 | 19,098 | 0.04428 |
| | 3,601.16 | 7,242.2 | 0.0706 | 4,263.39 | 8,565.4 | 0.0652 | 9,367.98 | 18,775 | 0.0291 |
| MECneutral | 3,840.4 | 7,697.9 | 1.129 | 4,557.3 | 9,130.9 | 1.189 | 10,504.9 | 21,026.9 | 1.139 |
| KCM19 | 3,659.9 | 7,361.3 | 1.000 | 4,328.5 | 8,693.5 | 1.000 | 9,649.0 | 19,335.0 | 1.000 |
Partitions of the Codons into 20 Categories Based on the Substitution Rate Matrix of 2 Genes Obtained Under and M0 Models Using AIS Algorithm.
| AIS Analysis | ||
|---|---|---|
| Gene | KCM | M0 |
Partitions of Codons into Seven Categories Based on Substitution Rate Matrix of Two Genes Obtained Under and M0 Models Using AIS Algorithm.
| Genes | AIS Analysis | |
|---|---|---|
| | KCM | M0 |