Zhang Zhang, Jun Li, Jun Yu.
Abstract
BACKGROUND: Approximate methods for estimating nonsynonymous and synonymous substitution rates (Ka and Ks) among protein-coding sequences have adopted different mutation (substitution) models. Several such methods have been proposed in the past two decades, but none has considered unequal transitional substitutions (between the two purines, A and G, or between the two pyrimidines, T and C), which become apparent when the sequence data being compared are vast and significantly diverged.
Year: 2006 PMID: 16740169 PMCID: PMC1552089 DOI: 10.1186/1471-2148-6-44
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Symbols used in estimating Ka and Ks
| Symbol | Definition |
| S | Number of synonymous sites |
| N | Number of nonsynonymous sites |
| Sd | Number of synonymous substitutions |
| Nd | Number of nonsynonymous substitutions |
| Ks | Synonymous substitution rate |
| Ka | Nonsynonymous substitution rate |
| ω | Estimator of selective strength, ω = Ka/Ks |
| t | Divergence time between two sequences, i.e., the expected number of nucleotide substitutions per codon |
| αR | Transitional rate between purines |
| αY | Transitional rate between pyrimidines |
| α | Transitional rate |
| β | Transversional rate |
| κR | Ratio of transitional rate between purines to transversional rate, κR = αR/β |
| κY | Ratio of transitional rate between pyrimidines to transversional rate, κY = αY/β |
| κ | Ratio of transitional rate/transversional rate, κ = α/β |
| | Frequency of nucleotide N, N ∈ {T, C, A, G} |
| πj | Frequency of codon j |
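The ratio definitions in the symbol table (κR = αR/β, κY = αY/β, ω = Ka/Ks) can be written out directly. The following is a minimal sketch only: the class name `SubstitutionRates`, its field names, and the rate values are hypothetical and chosen for illustration, while the Ka and Ks inputs are rounded expected values borrowed from the simulation table further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstitutionRates:
    """Hypothetical container for the rates named in the symbol table."""

    alpha_r: float  # transitional rate between purines (A <-> G)
    alpha_y: float  # transitional rate between pyrimidines (T <-> C)
    beta: float     # transversional rate

    @property
    def kappa_r(self) -> float:
        # kappa_R = alpha_R / beta
        return self.alpha_r / self.beta

    @property
    def kappa_y(self) -> float:
        # kappa_Y = alpha_Y / beta
        return self.alpha_y / self.beta


def omega(ka: float, ks: float) -> float:
    """Selective strength estimator, omega = Ka / Ks."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio Ka/Ks")
    return ka / ks


# Illustrative rates only (assumed, not taken from the paper).
rates = SubstitutionRates(alpha_r=5.0, alpha_y=1.5, beta=1.0)
print(rates.kappa_r, rates.kappa_y)

# Rounded Ka and Ks values, so the ratio is only approximately 0.3.
print(round(omega(0.021, 0.069), 3))
```

Because the tabulated Ka and Ks are rounded to three decimals, a ratio recomputed from them will generally differ slightly from the nominal ω.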
Figure 1. Percentage errors of estimated ω (= Ka/Ks) by YN and MYN when κ. The percentage error was calculated by the formula 100% × [(estimated value) - (expected value)]/(expected value). The canonical genetic code was used for simulated sequences with 2 million codons. Three sets of codon frequencies were used: equal (A to C), human (D to F) calculated from human protein-coding genes, and rice (G to I) calculated from rice protein-coding genes. ω = 0.3 (A, D, G), ω = 1 (B, E, H), and ω = 3 (C, F, I) were considered as representative values for purifying selection, neutral mutation, and positive selection, respectively.
Figure 2. Percentage errors of estimated Ks by YN and MYN when κ. The percentage error was calculated by the formula 100% × [(estimated value) - (expected value)]/(expected value). Sequences with 2 million codons were simulated with human codon frequencies. Three representative values of ω (0.3 in A, 1 in B, and 3 in C) were used for purifying selection, neutral mutation, and positive selection, respectively.
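The percentage-error formula quoted in both captions is simple enough to state as code. This is a small sketch, with a hypothetical function name; the inputs are borrowed purely for illustration from the table of average estimates (the row with expected ω = 0.3, κR = 10, κY = 1, and expected Ks = 0.682), which comes from 400-codon simulations rather than the 2-million-codon runs plotted in the figures.

```python
def percentage_error(estimated: float, expected: float) -> float:
    """100% x (estimated - expected) / expected, as given in the captions."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    return 100.0 * (estimated - expected) / expected


# Average omega estimates for one parameter setting (expected omega = 0.3).
yn_error = percentage_error(0.476, 0.3)
myn_error = percentage_error(0.366, 0.3)
print(f"YN: {yn_error:.1f}%  MYN: {myn_error:.1f}%")
```

A positive result indicates overestimation and a negative result underestimation, so the sign carries information that an absolute error would discard.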
Average estimates of Ka, Ks, and ω with YN and MYN
| Parameters | | | | Expected values | | YN | | | MYN | | |
| ω | t | κR | κY | Ka | Ks | Ka | Ks | ω | Ka | Ks | ω |
| 0.3 | 0.1 | 3 | 1.5 | 0.021 | 0.069 | 0.021 | 0.065 | 0.353 | 0.021 | 0.067 | 0.340 |
| 0.3 | 0.1 | 5 | 1.5 | 0.021 | 0.069 | 0.021 | 0.062 | 0.361 | 0.021 | 0.066 | 0.332 |
| 0.3 | 0.1 | 10 | 1 | 0.020 | 0.068 | 0.022 | 0.058 | 0.395 | 0.021 | 0.066 | 0.334 |
| 0.3 | 0.1 | 3.75 | 3.75 | 0.020 | 0.066 | 0.020 | 0.066 | 0.328 | 0.020 | 0.066 | 0.331 |
| 0.3 | 1 | 3 | 1.5 | 0.207 | 0.692 | 0.210 | 0.653 | 0.329 | 0.208 | 0.720 | 0.298 |
| 0.3 | 1 | 5 | 1.5 | 0.206 | 0.686 | 0.206 | 0.569 | 0.369 | 0.201 | 0.672 | 0.311 |
| 0.3 | 1 | 10 | 1 | 0.205 | 0.682 | 0.197 | 0.419 | 0.476 | 0.188 | 0.529 | 0.366 |
| 0.3 | 1 | 3.75 | 3.75 | 0.199 | 0.662 | 0.198 | 0.662 | 0.305 | 0.198 | 0.676 | 0.301 |
| 1 | 0.1 | 3 | 1.5 | 0.033 | 0.033 | 0.034 | 0.032 | 1.216 | 0.034 | 0.033 | 1.163 |
| 1 | 0.1 | 5 | 1.5 | 0.033 | 0.033 | 0.034 | 0.030 | 1.294 | 0.034 | 0.032 | 1.187 |
| 1 | 0.1 | 10 | 1 | 0.033 | 0.033 | 0.034 | 0.030 | 1.293 | 0.033 | 0.033 | 1.102 |
| 1 | 0.1 | 3.75 | 3.75 | 0.033 | 0.033 | 0.034 | 0.033 | 1.144 | 0.034 | 0.033 | 1.150 |
| 1 | 1 | 3 | 1.5 | 0.333 | 0.333 | 0.330 | 0.305 | 1.103 | 0.325 | 0.322 | 1.034 |
| 1 | 1 | 5 | 1.5 | 0.333 | 0.333 | 0.325 | 0.283 | 1.168 | 0.317 | 0.310 | 1.044 |
| 1 | 1 | 10 | 1 | 0.333 | 0.333 | 0.300 | 0.242 | 1.267 | 0.287 | 0.279 | 1.051 |
| 1 | 1 | 3.75 | 3.75 | 0.333 | 0.333 | 0.326 | 0.318 | 1.043 | 0.327 | 0.318 | 1.047 |
| 3 | 0.1 | 3 | 1.5 | 0.040 | 0.013 | 0.041 | 0.013 | 3.637 | 0.040 | 0.014 | 3.511 |
| 3 | 0.1 | 5 | 1.5 | 0.041 | 0.014 | 0.041 | 0.013 | 3.738 | 0.040 | 0.014 | 3.453 |
| 3 | 0.1 | 10 | 1 | 0.041 | 0.014 | 0.041 | 0.014 | 3.077 | 0.039 | 0.016 | 2.783 |
| 3 | 0.1 | 3.75 | 3.75 | 0.041 | 0.014 | 0.040 | 0.016 | 2.846 | 0.041 | 0.016 | 2.869 |
| 3 | 1 | 3 | 1.5 | 0.403 | 0.134 | 0.396 | 0.129 | 3.173 | 0.391 | 0.135 | 2.994 |
| 3 | 1 | 5 | 1.5 | 0.405 | 0.135 | 0.389 | 0.122 | 3.304 | 0.379 | 0.132 | 2.986 |
| 3 | 1 | 10 | 1 | 0.406 | 0.135 | 0.354 | 0.113 | 3.216 | 0.340 | 0.128 | 2.734 |
| 3 | 1 | 3.75 | 3.75 | 0.413 | 0.138 | 0.400 | 0.136 | 3.015 | 0.402 | 0.136 | 3.026 |
Note: The values were averaged over 1,000 pairs of simulated sequences that each had 400 codons.
Figure 3. Average ω estimates over 1,000 pairs of sequences when divergence time . Human codon frequencies were used for simulating sequences (400 codons each). The parameters are ω = 0.3, κR = 10, κY = 1 in A; ω = 1, κR = 10, κY = 1 in B; and ω = 3, κR = 1, κY = 10 in C. Note that the y-axis scales differ among the three panels.
Average estimates of ω with YN and MYN
| Number of codons | ω = 0.3 | | ω = 3 | |
| | YN | MYN | YN | MYN |
| 100 | 0.504 | 0.408 | 2.092 | 2.014 |
| 200 | 0.477 | 0.369 | 2.340 | 2.394 |
| 300 | 0.473 | 0.368 | 2.522 | 2.627 |
| 400 | 0.473 | 0.364 | 2.583 | 2.736 |
| 500 | 0.470 | 0.363 | 2.568 | 2.824 |
| 600 | 0.469 | 0.360 | 2.661 | 2.929 |
| 700 | 0.472 | 0.363 | 2.653 | 2.943 |
| 800 | 0.469 | 0.361 | 2.684 | 3.003 |
| 900 | 0.466 | 0.358 | 2.695 | 3.034 |
| 1000 | 0.473 | 0.361 | 2.671 | 3.006 |
Note: The parameters used were κR = 10, κY = 1 and t = 1 for purifying selection (ω = 0.3), and κR = 1, κY = 10, and t = 0.1 for positive selection (ω = 3). The ω values were averaged over 1,000 pairs of simulated sequences.
Figure 4. Cumulative percentage of κ. Dashed lines mark the cases κR - κY = -1 and κR - κY = 1.
Proportions of synonymous sites (S%) and estimates of Ka, Ks and ω
| Method | κR - κY > 1 | | | | κR - κY < -1 | | | | \|κR - κY\| ≤ 1 | | | |
| | S% | Ka | Ks | ω | S% | Ka | Ks | ω | S% | Ka | Ks | ω |
| human-mouse orthologs | ||||||||||||
| LPB | - | 0.069 | 0.463 | 0.148 | - | 0.071 | 0.449 | 0.159 | - | 0.105 | 0.500 | 0.209 |
| GY | 27.2% | 0.065 | 0.518 | 0.125 | 27.1% | 0.068 | 0.503 | 0.135 | 26.9% | 0.101 | 0.561 | 0.180 |
| YN | 27.4% | 0.064 | 0.527 | 0.121 | 27.2% | 0.067 | 0.505 | 0.133 | 26.6% | 0.099 | 0.588 | 0.169 |
| MYN | 26.1% | 0.063 | 0.597 | 0.105 | 28.5% | 0.068 | 0.474 | 0.144 | 26.5% | 0.099 | 0.591 | 0.168 |
| human-dog orthologs | ||||||||||||
| LPB | - | 0.055 | 0.309 | 0.176 | - | 0.057 | 0.296 | 0.192 | - | 0.081 | 0.348 | 0.233 |
| GY | 27.5% | 0.052 | 0.332 | 0.157 | 27.5% | 0.055 | 0.318 | 0.172 | 26.5% | 0.078 | 0.381 | 0.205 |
| YN | 27.8% | 0.052 | 0.329 | 0.158 | 27.8% | 0.055 | 0.310 | 0.176 | 26.4% | 0.077 | 0.387 | 0.200 |
| MYN | 26.5% | 0.051 | 0.357 | 0.143 | 29.1% | 0.056 | 0.294 | 0.189 | 26.3% | 0.077 | 0.389 | 0.199 |
| mouse-rat orthologs | ||||||||||||
| LPB | - | 0.030 | 0.176 | 0.173 | - | 0.030 | 0.170 | 0.179 | - | 0.048 | 0.196 | 0.245 |
| GY | 28.0% | 0.030 | 0.189 | 0.157 | 27.7% | 0.029 | 0.183 | 0.160 | 26.9% | 0.047 | 0.214 | 0.220 |
| YN | 28.2% | 0.029 | 0.186 | 0.157 | 27.9% | 0.029 | 0.180 | 0.162 | 26.6% | 0.046 | 0.216 | 0.215 |
| MYN | 26.6% | 0.029 | 0.201 | 0.142 | 29.1% | 0.030 | 0.171 | 0.173 | 26.5% | 0.046 | 0.216 | 0.214 |
Note: The values were calculated from concatenated sequences grouped according to the three scenarios of κR and κY. The methods compared in addition to YN and MYN were LPB, an approximate method developed independently by Li (1993) and by Pamilo and Bianchi (1993), and GY, a maximum likelihood method proposed by Goldman and Yang (1994). When GY was used, codon frequencies were calculated from the nucleotide frequencies at the three codon positions.
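The three scenarios used to group ortholog pairs in the table above (κR - κY > 1, κR - κY < -1, and |κR - κY| ≤ 1) amount to a simple three-way classification. The sketch below shows one way to express it; the function name, the label strings, and the `threshold` parameter are hypothetical, and the (κR, κY) inputs are the parameter combinations from the simulation settings in this record, not estimates from real orthologs.

```python
from collections import Counter


def kappa_scenario(kappa_r: float, kappa_y: float, threshold: float = 1.0) -> str:
    """Assign a (kappa_R, kappa_Y) pair to one of the three scenarios."""
    diff = kappa_r - kappa_y
    if diff > threshold:
        return "kR - kY > 1"
    if diff < -threshold:
        return "kR - kY < -1"
    return "|kR - kY| <= 1"


# Illustrative (kappa_R, kappa_Y) pairs taken from the simulation settings.
pairs = [(3, 1.5), (5, 1.5), (10, 1), (3.75, 3.75), (1, 10)]

counts = Counter(kappa_scenario(kr, ky) for kr, ky in pairs)
for scenario, n in counts.items():
    # Share of pairs falling into each scenario, as a percentage.
    print(f"{scenario}: {100.0 * n / len(pairs):.1f}%")
```

In an actual analysis the inputs would be per-gene estimates of κR and κY, and the sequences in each group would then be concatenated before Ka, Ks, and ω are computed, as the note describes.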