Frida Belinky, Igor B Rogozin, Eugene V Koonin.
Abstract
Reconstruction of the evolution of start codons in 36 groups of closely related bacterial and archaeal genomes reveals purifying selection affecting AUG codons. The AUG starts are replaced by GUG and especially UUG significantly less frequently than expected under neutrality, where the neutral expectation is derived from the frequencies of the respective nucleotide triplet substitutions in non-coding regions and in 4-fold degenerate sites. Thus, AUG is the optimal start codon that is actively maintained by purifying selection. However, purifying selection on start codons is significantly weaker than the selection on the same codons in coding sequences, although the switches between the codons result in conservative amino acid substitutions. The only exception is the AUG to UUG switch, which is strongly selected against among start codons. Selection on start codons is most pronounced in evolutionarily conserved, highly expressed genes. Mutation of the start codon to a sub-optimal form (GUG or UUG) tends to be compensated by mutations in the Shine-Dalgarno sequence towards a stronger translation initiation signal. Together, these findings indicate that in prokaryotes, translation start signals are subject to weak but significant selection for maximization of initiation rate and, consequently, protein production.
Year: 2017 PMID: 28963504 PMCID: PMC5622118 DOI: 10.1038/s41598-017-12619-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Selection on start codon switches. The three panels compare the frequencies of each of the switches between the three start codons (blue), the respective codons in non-start positions of coding sequences (orange), the corresponding nucleotide triplets in non-coding sequences (dark grey), and the corresponding 4-fold degenerate sites that are followed by ‘UG’ (light grey). Error bars indicate standard errors of the frequencies.
Start codon switch counts and frequencies in 36 triples of prokaryotic genomes compared to the switches of the same nucleotide triplets in non-start coding regions and non-coding regions.
| Switch | Start codon switch count | Start codon switch frequency | Non-coding switch count | Non-coding switch frequency | 4-fold sites switch count | 4-fold sites switch frequency | Coding non-start switch count | Coding non-start switch frequency |
|---|---|---|---|---|---|---|---|---|
| AUG > GUG | 363 | 0.0059 | 2,273 | 0.0228 | 1,564 | 0.1107 | 2,623 | 0.004 |
| AUG > UUG | 113 | 0.0018 | 783 | 0.0079 | 601 | 0.0425 | 1,790 | 0.0027 |
| GUG > AUG | 202 | 0.0398 | 2,370 | 0.0279 | 2,807 | 0.0842 | 3,343 | 0.0046 |
| GUG > UUG | 37 | 0.0073 | 553 | 0.0065 | 649 | 0.0195 | 1,106 | 0.0015 |
| UUG > AUG | 43 | 0.0174 | 722 | 0.0061 | 628 | 0.0344 | 948 | 0.0047 |
| UUG > GUG | 35 | 0.0142 | 659 | 0.0056 | 573 | 0.0314 | 771 | 0.0038 |
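One way to read this table is to divide each start codon switch frequency by the frequency of the same triplet change in non-coding regions: values below 1 are consistent with the switch being selected against, values above 1 with the switch being favored. The sketch below does this with the frequencies copied from the table. The ratio itself and the names `SWITCH_FREQ` and `relative_rate` are illustrative assumptions, not a statistic defined by the authors.

```python
# Frequencies copied from the table above: (start codon, non-coding).
SWITCH_FREQ = {
    "AUG>GUG": (0.0059, 0.0228),
    "AUG>UUG": (0.0018, 0.0079),
    "GUG>AUG": (0.0398, 0.0279),
    "GUG>UUG": (0.0073, 0.0065),
    "UUG>AUG": (0.0174, 0.0061),
    "UUG>GUG": (0.0142, 0.0056),
}


def relative_rate(start_freq, neutral_freq):
    """Ratio of the start codon switch frequency to the neutral proxy."""
    return start_freq / neutral_freq


if __name__ == "__main__":
    for switch, (start_freq, neutral_freq) in SWITCH_FREQ.items():
        print(f"{switch}: {relative_rate(start_freq, neutral_freq):.2f}")
```

With these inputs the two switches away from AUG fall well below 1, while the switches towards AUG exceed 1, which matches the direction of the effect described in the abstract.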
Start codon switch frequencies in well-sampled prokaryotic phyla. In each paired cell, the first value refers to the first switch named in the column header and the second value to the reverse switch.
| Phylum | AUG > UUG / UUG > AUG, # | AUG > UUG / UUG > AUG, freq. | AUG > UUG / UUG > AUG, p Fisher | AUG > GUG / GUG > AUG, # | AUG > GUG / GUG > AUG, freq. | AUG > GUG / GUG > AUG, p Fisher | GUG > UUG / UUG > GUG, # | GUG > UUG / UUG > GUG, freq. | GUG > UUG / UUG > GUG, p Fisher | Ancestral AUG count | Ancestral GUG count | Ancestral UUG count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| α-proteobacteria | 24 / 4 | 0.003 / 0.012 | 0.007 | 62 / 34 | 0.005 / 0.043 | 1.2e-16 | 6 / 6 | 0.007 / 0.019 | 0.12 | 11440 | 789 | 323 |
| β-proteobacteria | 13 / 4 | 0.011 / 0.021 | 1.6e-4 | 62 / 15 | 0.005 / 0.029 | 1.3e-6 | 4 / 3 | 0.008 / 0.016 | 0.4 | 11347 | 520 | 191 |
| γ-proteobacteria | 19 / 1 | 0.001 / 0.004 | 0.25 | 63 / 37 | 0.004 / 0.030 | 7.2e-19 | 5 / 2 | 0.004 / 0.008 | 0.35 | 17733 | 1219 | 258 |
| δ/ε-proteobacteria | 1 / 3 | 3e-4 / 0.022 | 3e-4 | 24 / 14 | 0.008 / 0.019 | 0.01 | 0 / 2 | 0 / 0.014 | 0.03 | 3178 | 735 | 139 |
| Bacilli | 37 / 16 | 0.004 / 0.020 | 1.5e-6 | 61 / 29 | 0.006 / 0.04 | 1.3e-12 | 7 / 13 | 0.010 / 0.016 | 0.37 | 9861 | 725 | 820 |
| Clostridia | 9 / 7 | 0.004 / 0.022 | 0.002 | 43 / 24 | 0.019 / 0.103 | 9.6e-9 | 7 / 4 | 0.030 / 0.013 | 0.21 | 2224 | 233 | 315 |
| Actinobacteria | 4 / 2 | 0.002 / 0.051 | 0.0065 | 27 / 37 | 0.015 / 0.062 | 2.7e-8 | 1 / 1 | 0.002 / 0.026 | 0.12 | 1850 | 598 | 39 |
| Methanococci | 3 / 4 | 0.002 / 0.040 | 0.001 | 3 / 4 | 0.002 / 0.080 | 9.1e-5 | 0 / 1 | 0 / 0.010 | 1 | 1219 | 50 | 101 |
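The "p Fisher" columns can be approximated with a Fisher exact test on a 2x2 table of switched versus unswitched ancestral codons for the forward and reverse directions. The sketch below is a minimal standard-library implementation of the two-sided test. It assumes that this is how the table was built, which the text here does not state, and the function names are hypothetical. The usage example reads the α-proteobacteria AUG > UUG / UUG > AUG counts as 24 and 4, against 11440 ancestral AUG and 323 ancestral UUG codons.

```python
from math import exp, lgamma


def _log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    log_denom = _log_choose(total, col1)

    def log_prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return _log_choose(row1, x) + _log_choose(row2, col1 - x) - log_denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    log_observed = log_prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        exp(lp)
        for lp in (log_prob(x) for x in range(lowest, highest + 1))
        if lp <= log_observed + 1e-9
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Assumed layout: [switched, not switched] for AUG > UUG and UUG > AUG.
    p = fisher_exact_two_sided(24, 11440 - 24, 4, 323 - 4)
    print(f"alpha-proteobacteria AUG>UUG vs UUG>AUG: p = {p:.4f}")
```

The result can be compared with the tabulated p-value for that row. Where SciPy is available, `scipy.stats.fisher_exact` provides the same test without the hand-written sum.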
Figure 2. Selection on start codon switches in slow-evolving and fast-evolving genes. Error bars indicate standard errors of the frequencies.
Start codon switch counts and frequencies in slow vs. fast evolving genes.
| Start codon switches | All genes switch count | All genes switch frequency | Slow evolving genes switch count | Slow evolving genes switch frequency | Fast evolving genes switch count | Fast evolving genes switch frequency |
|---|---|---|---|---|---|---|
| AUG > GUG | 363 | 0.0059 | 126 | 0.0039 | 230 | 0.0086 |
| AUG > UUG | 113 | 0.0018 | 40 | 0.0012 | 69 | 0.0026 |
| GUG > AUG | 202 | 0.0398 | 70 | 0.027 | 124 | 0.0532 |
| GUG > UUG | 37 | 0.0073 | 19 | 0.0073 | 17 | 0.0073 |
| UUG > AUG | 43 | 0.0174 | 19 | 0.015 | 23 | 0.0204 |
| UUG > GUG | 35 | 0.0142 | 14 | 0.011 | 18 | 0.0159 |
Figure 3. Comparison of the evolutionary rate (A) and protein abundance (B) in Escherichia coli genes starting with AUG, GUG or UUG codons. The lower bound of dN/dS values is zero. To enable presentation on a log scale, the lower bound was shifted to a finite value, arbitrarily set to 0.01.
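The log-scale adjustment described in the caption can be sketched as follows. This assumes that "shifted" means raising any dN/dS value below 0.01 to 0.01 before taking the logarithm; the caption does not spell out the exact procedure, and the names `DNDS_FLOOR` and `log10_dnds` are hypothetical.

```python
import math

DNDS_FLOOR = 0.01  # finite lower bound stated in the caption


def log10_dnds(dnds, floor=DNDS_FLOOR):
    """Return log10(dN/dS), with values below `floor` raised to `floor`."""
    if dnds < 0:
        raise ValueError("dN/dS cannot be negative")
    return math.log10(max(dnds, floor))


if __name__ == "__main__":
    for value in (0.0, 0.005, 0.05, 1.0):
        print(value, log10_dnds(value))
```

Under this assumption, genes with dN/dS of zero are plotted at the floor instead of being dropped from the log-scale panel.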
Figure 4. Cumulative substitution frequencies in 29 base pair windows.
Coupled substitutions in start codons and neighboring sequences.
| Paired with a single substitution at position | Start codon substitutions | All other substitutions | Frequency | p-value of comparison to all other positions | p-value of comparison to −1 to −3 | p-value of comparison to −7 to −10 |
|---|---|---|---|---|---|---|
| −7 to −10 | 80 | 9741 | 0.0082 | 5.2 × 10⁻⁴ | 0.61 | n/a |
| −1 to −3 | 118 | 15529 | 0.0076 | 6.7 × 10⁻⁴ | n/a | 0.61 |
| All other positions | 405 | 76884 | 0.0053 | n/a | 6.7 × 10⁻⁴ | 5.2 × 10⁻⁴ |
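The frequencies in this table are reproduced by dividing the start codon substitution count by the count of all other substitutions paired with the same positions. The sketch below recomputes them and expresses each upstream region relative to the "all other positions" baseline. Treating that quotient as the authors' exact definition is an assumption, and the names `COUPLED_COUNTS`, `coupling_frequency` and `enrichment` are illustrative.

```python
# Counts copied from the table above:
# (start codon substitutions, all other substitutions).
COUPLED_COUNTS = {
    "-7 to -10": (80, 9741),
    "-1 to -3": (118, 15529),
    "all other positions": (405, 76884),
}


def coupling_frequency(start_subs, other_subs):
    """Start codon substitutions per other substitution (assumed definition)."""
    return start_subs / other_subs


def enrichment(region, baseline="all other positions"):
    """Coupling frequency of `region` relative to the baseline positions."""
    return (
        coupling_frequency(*COUPLED_COUNTS[region])
        / coupling_frequency(*COUPLED_COUNTS[baseline])
    )


if __name__ == "__main__":
    for region, counts in COUPLED_COUNTS.items():
        print(f"{region}: {coupling_frequency(*counts):.4f}")
    print(f"-7 to -10 vs baseline: {enrichment('-7 to -10'):.2f}")
    print(f"-1 to -3 vs baseline: {enrichment('-1 to -3'):.2f}")
```

Both upstream windows come out roughly one and a half times the baseline, consistent with the compensatory changes near the start codon described in the abstract.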