Dawit Nigatu, Attiya Mahmood, Werner Henkel.
Abstract
BACKGROUND: A number of evolutionary models have been widely used for sequence alignment, phylogenetic tree reconstruction, and database searches. These models focus on how sets of independent substitutions between amino acids or codons derive one protein sequence from its ancestral sequence during evolution. In this paper, we regard the Empirical Codon Mutation (ECM) matrix as a communication channel and compute the corresponding channel capacity.
Year: 2014 PMID: 24655606 PMCID: PMC3998026 DOI: 10.1186/1471-2105-15-80
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Probability distribution of codons (biological and optimal). The optimum codon distribution that maximizes mutual information, and the biological distribution of codons in the five vertebrates. Consecutive bins indicate codons that encode the same amino acid (one-letter symbol). The synonymous codons are arranged alphabetically.
Biological codon relative frequency
| First base | Second base T | Freq. | Second base C | Freq. | Second base A | Freq. | Second base G | Freq. | Third base |
|---|---|---|---|---|---|---|---|---|---|
| T | TTT | 0.0191 | TCT | 0.0171 | TAT | 0.0132 | TGT | 0.0110 | T |
| | TTC | 0.0196 | TCC | 0.0160 | TAC | 0.0160 | TGC | 0.0119 | C |
| | TTA | 0.0085 | TCA | 0.0133 | TAA | 0.0003 | TGA | 0.0003 | A |
| | TTG | 0.0141 | TCG | 0.0043 | TAG | 0.0001 | TGG | 0.0125 | G |
| C | CTT | 0.0150 | CCT | 0.0176 | CAT | 0.0116 | CGT | 0.0054 | T |
| | CTC | 0.0173 | CCC | 0.0150 | CAC | 0.0144 | CGC | 0.0087 | C |
| | CTA | 0.0080 | CCA | 0.0178 | CAA | 0.0137 | CGA | 0.0062 | A |
| | CTG | 0.0373 | CCG | 0.0059 | CAG | 0.0337 | CGG | 0.0085 | G |
| A | ATT | 0.0175 | ACT | 0.0144 | AAT | 0.0182 | AGT | 0.0136 | T |
| | ATC | 0.0200 | ACC | 0.0160 | AAC | 0.0206 | AGC | 0.0191 | C |
| | ATA | 0.0094 | ACA | 0.0169 | AAA | 0.0282 | AGA | 0.0135 | A |
| | ATG | 0.0219 | ACG | 0.0059 | AAG | 0.0319 | AGG | 0.0118 | G |
| G | GTT | 0.0136 | GCT | 0.0200 | GAT | 0.0252 | GGT | 0.0115 | T |
| | GTC | 0.0138 | GCC | 0.0213 | GAC | 0.0246 | GGC | 0.0176 | C |
| | GTA | 0.0084 | GCA | 0.0179 | GAA | 0.0311 | GGA | 0.0184 | A |
| | GTG | 0.0265 | GCG | 0.0060 | GAG | 0.0389 | GGG | 0.0133 | G |
The codon relative frequencies of the five vertebrate genomes (human, mouse, chicken, frog, and zebrafish), taken from the data presented by Schneider A., Cannarozzi G., and Gonnet G. [4].
Calculated codon relative frequency
| First base | Second base T | Freq. | Second base C | Freq. | Second base A | Freq. | Second base G | Freq. | Third base |
|---|---|---|---|---|---|---|---|---|---|
| T | TTT | 0.0257 | TCT | 0.0113 | TAT | 0.0207 | TGT | 0.0215 | T |
| | TTC | 0.0264 | TCC | 0.0150 | TAC | 0.0260 | TGC | 0.0247 | C |
| | TTA | 0.0097 | TCA | 0.0100 | TAA | * | TGA | * | A |
| | TTG | 0.0119 | TCG | 0.0066 | TAG | * | TGG | 0.0439 | G |
| C | CTT | 0.0118 | CCT | 0.0159 | CAT | 0.0141 | CGT | 0.0073 | T |
| | CTC | 0.0150 | CCC | 0.0162 | CAC | 0.0183 | CGC | 0.0129 | C |
| | CTA | 0.0054 | CCA | 0.0161 | CAA | 0.0144 | CGA | 0.0077 | A |
| | CTG | 0.0277 | CCG | 0.0085 | CAG | 0.0337 | CGG | 0.0065 | G |
| A | ATT | 0.0162 | ACT | 0.0071 | AAT | 0.0160 | AGT | 0.0130 | T |
| | ATC | 0.0205 | ACC | 0.0128 | AAC | 0.0212 | AGC | 0.0163 | C |
| | ATA | 0.0088 | ACA | 0.0093 | AAA | 0.0251 | AGA | 0.0157 | A |
| | ATG | 0.0330 | ACG | 0.0079 | AAG | 0.0261 | AGG | 0.0122 | G |
| G | GTT | 0.0096 | GCT | 0.0132 | GAT | 0.0234 | GGT | 0.0114 | T |
| | GTC | 0.0114 | GCC | 0.0172 | GAC | 0.0228 | GGC | 0.0162 | C |
| | GTA | 0.0060 | GCA | 0.0110 | GAA | 0.0235 | GGA | 0.0183 | A |
| | GTG | 0.0260 | GCG | 0.0048 | GAG | 0.0263 | GGG | 0.0126 | G |
The codon relative frequencies that maximize the mutual information between input and output, yielding a capacity close to what is required to preserve the information content of amino acids. An exponential factor of 0.26 is applied to the ECM matrix.
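An input distribution that maximizes mutual information over a discrete channel, as in the table above, is commonly computed with the Blahut-Arimoto iteration. The sketch below is only an illustration under assumptions: this record does not say which optimizer the authors used, the function name `blahut_arimoto` is hypothetical, and the 2x2 matrix is a toy stand-in for the codon channel derived from the ECM matrix.

```python
import math

def blahut_arimoto(P, tol=1e-9, max_iter=10000):
    """Estimate the capacity of a discrete memoryless channel.

    P[x][y] is Pr(output y | input x); every row must sum to 1.
    Returns (capacity in bits, capacity-achieving input distribution).
    """
    n, m = len(P), len(P[0])
    p = [1.0 / n] * n  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * P[x][y] for x in range(n)) for y in range(m)]
        # D[x]: KL divergence (nats) between row x of the channel and q.
        D = [sum(P[x][y] * math.log(P[x][y] / q[y])
                 for y in range(m) if P[x][y] > 0.0)
             for x in range(n)]
        # Standard lower and upper bounds on the capacity (nats).
        lower = math.log(sum(p[x] * math.exp(D[x]) for x in range(n)))
        upper = max(D)
        if upper - lower < tol:
            break
        # Multiplicative update: favor inputs whose rows stand out from q.
        w = [p[x] * math.exp(D[x]) for x in range(n)]
        total = sum(w)
        p = [v / total for v in w]
    return lower / math.log(2.0), p

# Toy asymmetric channel (NOT the ECM matrix): input 0 is noiseless,
# input 1 is flipped half of the time.
toy_channel = [[1.0, 0.0],
               [0.5, 0.5]]
capacity_bits, best_input = blahut_arimoto(toy_channel)
print(capacity_bits, best_input)
```

For an asymmetric channel such as this one, the maximizing input distribution is not uniform, which is the same qualitative effect that makes the calculated codon frequencies differ from one codon to the next.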
Figure 2. Capacity as a function of the exponential factor. The required rate for error-free transmission and the achievable capacity are plotted as a function of the exponent of the ECM matrix.
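The dependence of capacity on a matrix exponent can be illustrated with a binary symmetric channel, whose matrix powers have a closed form. This is a toy analogue only: the paper's channel is built from the ECM matrix over codons, and the helper names and the crossover probability 0.1 below are assumptions made for illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def crossover_after_power(p, t):
    """Crossover probability of [[1-p, p], [p, 1-p]] raised to the power t.

    The matrix has eigenvalues 1 and (1 - 2p), so its t-th power is again
    symmetric with crossover (1 - (1 - 2p)**t) / 2. Valid for 0 <= p < 0.5.
    """
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** t)

def capacity_after_power(p, t):
    """Capacity in bits of the powered binary symmetric channel."""
    return 1.0 - binary_entropy(crossover_after_power(p, t))

# Sweep the exponent for an assumed base crossover probability of 0.1.
for t in (0.1, 0.26, 0.5, 1.0, 2.0):
    print(t, capacity_after_power(0.1, t))
```

In this toy case a smaller exponent gives a less noisy channel and therefore a higher capacity; an exponent of 1 reproduces the original channel.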