Xiaolong Wang, Quanjiang Dong, Gang Chen, Jianye Zhang, Yongqiang Liu, Yujia Cai.
Abstract
Frameshift mutations have long been considered important for the molecular evolution of proteins and their coding genes, whereas the frameshift protein sequences encoded in the alternative reading frames of coding genes have been regarded as meaningless. However, functional frameshifts have been found to be widespread. It has been puzzling how a frameshift protein retains its structure and function when its primary amino-acid sequence changes substantially. This study shows that the similarities between frameshifts and wild types are higher than random similarities and are determined at different levels. In the standard genetic code (SGC), frameshift substitutions are more conservative than random substitutions. The frameshift substitution score of the SGC ranks in the top 2.0-3.5% of alternative genetic codes, showing that the SGC is nearly optimal for frameshift tolerance. In many genes and certain genomes, frameshift-resistant codons and codon pairs appear more frequently than expected. This suggests that frameshift tolerance is achieved not only through the optimality of the genetic code but, more importantly, through further optimization of a specific gene or genome via its usage of codons and codon pairs, which sheds light on the role of frameshift mutations in molecular and genomic evolution.
Year: 2022 PMID: 35655139 PMCID: PMC9164415 DOI: 10.1186/s12864-022-08435-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
The readthrough rules derived from natural suppressor tRNAs for nonsense mutations
| Site | tRNA (AA) | Codon |
|---|---|---|
| | Ser (S) | UAG |
| | Gln (Q) | UAG |
| | Tyr (Y) | UAG |
| | Lys (K) | UAA |
| | Trp (W) | UGA |
Fig. 1 Different alignments of the three translations of zebrafish vegfaa. A The wild-type and frameshift translations of zebrafish vegfaa; B The ClustalW alignment of the three translations; C FrameAlign of the first and the second translations; D FrameAlign of the first and the third translations; E FrameAlign of the second and the third translations; F The color scheme of GeneDoc, which is used in (B-E) to color the amino acids by their physicochemical properties. CDS: coding sequence; F1: the first translation (wild type); F2: the second translation (+1 frameshift); F3: the third translation (+2 frameshift); F2R: F2 readthrough; F3R: F3 readthrough
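The three translations described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function name `translate`, the use of offsets 0, 1 and 2 for F1, F2 and F3, and the choice of Gln for UAG (one of the three options listed in the readthrough table) are all assumptions made here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed readthrough rules: UAA -> Lys and UGA -> Trp follow the table;
# UAG -> Gln is an arbitrary pick among the listed Ser/Gln/Tyr options.
READTHROUGH = {"TAG": "Q", "TAA": "K", "TGA": "W"}


def translate(cds, offset=0, readthrough=False):
    """Translate a CDS starting at `offset`; trailing partial codons are dropped.

    With readthrough=True every stop codon (including a final one) is
    replaced by the amino acid given in READTHROUGH.
    """
    cds = cds.upper().replace("U", "T")
    peptide = []
    for i in range(offset, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*" and readthrough:
            aa = READTHROUGH[codon]
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    toy_cds = "ATGATAGCCTAA"  # hypothetical sequence, not vegfaa
    for name, offset in (("F1", 0), ("F2", 1), ("F3", 2)):
        print(name, translate(toy_cds, offset), translate(toy_cds, offset, readthrough=True))
```

Reading from offsets 1 and 2 is only one way to model a frameshift; an actual insertion or deletion inside the CDS would shift the frame from the mutation site onward rather than from the start.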
The similarities of proteins and their frameshifts (aligned by ClustalW or MSA)
| Type | Species | Number of CDSs | Average Similarity | | | | | Number of Gaps |
|---|---|---|---|---|---|---|---|---|
| | | | δ | δ | δ | MAX | MIN | |
| Real CDSs (ClustalW) | | 71,853 | 0.474 ± 0.039 | 0.454 ± 0.046 | 0.433 ± 0.043 | 0.890 | 0.271 | 53.3 |
| | | 15,781 | 0.473 ± 0.04 | 0.452 ± 0.047 | 0.431 ± 0.042 | 0.657 | 0.309 | 48.9 |
| | | 27,208 | 0.469 ± 0.038 | 0.448 ± 0.046 | 0.43 ± 0.041 | 0.739 | 0.286 | 52.5 |
| | | 7,706 | 0.477 ± 0.038 | 0.455 ± 0.044 | 0.439 ± 0.042 | 0.638 | 0.320 | 36.8 |
| | | 14,151 | 0.465 ± 0.036 | 0.443 ± 0.043 | 0.433 ± 0.038 | 0.658 | 0.332 | 51.4 |
| | | 23,936 | 0.455 ± 0.039 | 0.432 ± 0.045 | 0.426 ± 0.039 | 0.702 | 0.250 | 69.4 |
| | | 29,227 | 0.475 ± 0.037 | 0.444 ± 0.042 | 0.441 ± 0.042 | 0.750 | 0.261 | 50.4 |
| | | 35,378 | 0.468 ± 0.038 | 0.439 ± 0.042 | 0.436 ± 0.043 | 0.828 | 0.217 | 47.6 |
| | | 5,889 | 0.482 ± 0.043 | 0.451 ± 0.042 | 0.463 ± 0.047 | 0.692 | 0.259 | 39.7 |
| | | 4,140 | 0.441 ± 0.039 | 0.415 ± 0.043 | 0.408 ± 0.042 | 0.614 | 0.280 | 45.6 |
| | Average | 235,269 | 0.468 ± 0.039 | 0.443 ± 0.044 | 0.434 ± 0.042 | 0.890ᵃ | 0.217ᵃ | 49.6 |
| Random CDSs (ClustalW) | Three frames | 100,000 × 3 | 0.475 ± 0.019 | 0.428 ± 0.020 | 0.427 ± 0.020 | 0.512 | 0.391 | 80.1 |
| | Three random CDSs | 100,000 × 3 | 0.476 ± 0.019 | 0.429 ± 0.020 | 0.428 ± 0.020 | 0.520 | 0.388 | 137.1 |
| Random CDSs (MSA) | Three frames | 100,000 × 3 | 0.409 ± 0.06 | 0.411 ± 0.059 | 0.448 ± 0.044 | 0.541 | 0.207 | 108.27 |
| | Three random CDSs | 100,000 × 3 | 0.411 ± 0.06 | 0.413 ± 0.059 | 0.447 ± 0.043 | 0.540 | 0.201 | 109.47 |
The similarities of proteins and their frameshifts (aligned by FrameAlign)
| Type | Species | Number of CDSs | Average Similarity | | | | | Number of Gaps |
|---|---|---|---|---|---|---|---|---|
| | | | δ | δ | δ | MAX | MIN | |
| Real CDSs (FrameAlign) | | 71,853 | 0.492 ± 0.043 | 0.472 ± 0.044 | 0.434 ± 0.040 | 0.713 | 0.194 | 2 |
| | | 15,781 | 0.491 ± 0.046 | 0.468 ± 0.046 | 0.431 ± 0.042 | 0.625 | 0.311 | 2 |
| | | 27,208 | 0.484 ± 0.046 | 0.469 ± 0.042 | 0.426 ± 0.040 | 0.739 | 0.286 | 2 |
| | | 7,706 | 0.481 ± 0.042 | 0.481 ± 0.041 | 0.439 ± 0.037 | 0.644 | 0.353 | 2 |
| | | 14,151 | 0.471 ± 0.044 | 0.468 ± 0.040 | 0.408 ± 0.040 | 0.614 | 0.314 | 2 |
| | | 23,936 | 0.475 ± 0.046 | 0.457 ± 0.044 | 0.362 ± 0.047 | 0.689 | 0.236 | 2 |
| | | 29,227 | 0.450 ± 0.047 | 0.475 ± 0.045 | 0.421 ± 0.043 | 0.634 | 0.224 | 2 |
| | | 35,378 | 0.442 ± 0.045 | 0.477 ± 0.044 | 0.412 ± 0.041 | 0.882 | 0.244 | 2 |
| | | 5,889 | 0.461 ± 0.041 | 0.510 ± 0.042 | 0.423 ± 0.038 | 0.692 | 0.259 | 2 |
| | | 4,140 | 0.435 ± 0.046 | 0.426 ± 0.047 | 0.372 ± 0.043 | 0.571 | 0.237 | 2 |
| | Average | 235,269 | 0.468 ± 0.045 | 0.470 ± 0.043 | 0.413 ± 0.041 | 0.882ᵃ | 0.194ᵃ | 2 |
| Random CDSs (FrameAlign) | Three frames | 100,000 | 0.394 ± 0.028 | 0.394 ± 0.028 | 0.395 ± 0.028 | 0.477 | 0.330 | 2 |
| | Three random CDSs | 100,000 × 3 | 0.383 ± 0.028 | 0.383 ± 0.028 | 0.383 ± 0.028 | 0.458 | 0.304 | 0 |
ᵃ Very large/small similarity values were observed in a few very short or repetitive peptides
The amino acid substitution scores for different kinds of codon substitutions
| Codon Substitution | Random | Frameshift (FF) | Frameshift (RF) | Interchangeable |
|---|---|---|---|---|
| Type of Codon Substitution | | | | |
| All | 4096 | 256 | 256 | 256 |
| Unchanged (%) | 64 (1.6%) | 4 (1.6%) | 4 (1.6%) | 64 (25%) |
| Changed (%) | 4032 (98.4%) | 252 (98.4%) | 252 (98.4%) | 192 (75%) |
| SS (%) | 230 (5.6%) | 14 (5.5%) | 14 (5.5%) | 192 (75%) |
| NSS-Positive (%) | 859 (21.0%) | 76 (29.7%) | 76 (29.7%) | 40 (15.6%) |
| NSS-Negative (%) | 3007 (73.4%) | 166 (64.8%) | 166 (64.8%) | 24 (9.4%) |
| Average Substitution Score | | | | |
| BLOSUM62 | −1.29 | −0.61 | −0.65 | 3.77 |
| PAM250 | −4.26 | −0.84 | −0.84 | 3.68 |
| GON250 | −10.81 | −1.78 | −1.78 | 35.60 |
SS/NSS: synonymous/nonsynonymous substitution; FF/RF: forward/reverse frameshift substitution
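The counts in the "All", "Unchanged" and "SS" rows of the frameshift columns can be reproduced by enumerating codon substitutions directly. The sketch below assumes that a forward frameshift turns codon `abc` followed by base `d` into `bcd`, and that a reverse frameshift turns `abc` preceded by base `d` into `dab`; these definitions and all function names are assumptions for illustration, and scoring with BLOSUM62, PAM250 or GON250 is left out because it needs an external matrix.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
CODONS = list(CODON_TABLE)


def forward_frameshift_pairs():
    """Assumed FF: codon abc followed by base d is read as bcd (64 x 4 pairs)."""
    return [(c, c[1:] + d) for c in CODONS for d in BASES]


def reverse_frameshift_pairs():
    """Assumed RF: codon abc preceded by base d is read as dab (64 x 4 pairs)."""
    return [(c, d + c[:2]) for c in CODONS for d in BASES]


def random_pairs():
    """Every codon substituted by every codon (64 x 64 pairs)."""
    return list(product(CODONS, repeat=2))


def summarize(pairs):
    """Count all, unchanged, and synonymous substitutions (unchanged included)."""
    unchanged = sum(old == new for old, new in pairs)
    synonymous = sum(CODON_TABLE[old] == CODON_TABLE[new] for old, new in pairs)
    return {"all": len(pairs), "unchanged": unchanged, "synonymous": synonymous}


if __name__ == "__main__":
    print("FF", summarize(forward_frameshift_pairs()))
    print("RF", summarize(reverse_frameshift_pairs()))
    print("Random total:", len(random_pairs()))
```

Under these definitions both frameshift directions give 256 substitutions, 4 unchanged and 14 synonymous. The synonymous count for random substitutions depends on how stop codons are treated, so it is not compared with the table here.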
Fig. 2 The distribution and the statistical analysis of the FSSs for the alternative genetic codes. A The frequencies of occurrence of the FSSs in the random codon tables and the compatible codon tables. B The means and standard deviations of the sum FSSs of different types of genetic codes. T-tests indicate that the sum FSS of the NGC is significantly higher than the mean FSS of the random or compatible genetic codes in all six comparisons (P ≈ 0) (Table S3). NGC: natural genetic code. FSSs were calculated using the matrices PAM250, BLOSUM62, and GON250. The probability densities were computed using a normal distribution function, and the diagrams were plotted in R
The frameshift substitution scores of the natural and alternative genetic codes
| Genetic codes (Number tested) | Scoring Matrix | FSS of the natural genetic code (NGC) | | | | | FSS of the alternative genetic codes | | |
|---|---|---|---|---|---|---|---|---|---|
| | | FSS Score | Rank | Rank% | STDEV | STDEV% | Average | STDEV | STDEV% |
| Random (1,000,000 × 100) | PAM250 | −344 | 132,586.79 | 13.26% | 1011.17 | 0.1011% | −504.88 | 0.54 | −0.1073% |
| | BLOSUM62 | −276 | 19,752.52 | 1.98% | 295.17 | 0.0295% | −450.53 | 0.27 | −0.0598% |
| | Gonnet250 | −912 | 29,447.26 | 2.94% | 398.72 | 0.0399% | −2872.95 | 4.16 | −0.1447% |
| Compatible (13,824) | PAM250 | −344 | 4273 | 30.91% | – | – | −401.25 | – | – |
| | BLOSUM62 | −276 | 481 | 3.48% | – | – | −436.75 | – | – |
| | Gonnet250 | −912 | 481 | 3.48% | – | – | −2736.13 | – | – |
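The Rank% column is consistent with the rank divided by the number of codes tested. The short sketch below checks that arithmetic against the table; the helper names are hypothetical, and the ranking convention in `rank_of` (rank 1 is the highest FSS, ties favor the natural code) is an assumption, since the source does not state how ties were handled.

```python
def rank_percent(rank, n_codes):
    """Rank expressed as a percentage of the number of genetic codes tested."""
    return 100.0 * rank / n_codes


def rank_of(score, alternative_scores):
    """Assumed convention: rank 1 is best; count alternatives with a strictly higher FSS."""
    return 1 + sum(s > score for s in alternative_scores)


if __name__ == "__main__":
    # (rank, number of codes) pairs taken from the table above.
    rows = {
        "Random / PAM250": (132586.79, 1_000_000),
        "Random / BLOSUM62": (19752.52, 1_000_000),
        "Random / Gonnet250": (29447.26, 1_000_000),
        "Compatible / PAM250": (4273, 13824),
        "Compatible / BLOSUM62": (481, 13824),
    }
    for label, (rank, n) in rows.items():
        print(f"{label}: {rank_percent(rank, n):.2f}%")

    # Toy example with made-up alternative scores.
    print(rank_of(-344, [-300, -400, -500]))
```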
The usage of codons and their weighted mean FSSs (Gon250)
| No | Species (Codon Usage) | Weighted mean FSS |
|---|---|---|
| 1 | | −9.82 |
| 2 | | −13.47 |
| 3 | | −12.75 |
| 4 | | −20.58 |
| 5 | | −19.43 |
| 6 | | −23.38 |
| 7 | | −22.52 |
| 8 | | −14.08 |
| 9 | | −28.59 |
| 10 | Equal usage | −22.27 |
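A usage-weighted mean of per-codon FSSs can be written as the sum of usage times score divided by total usage. The sketch below is only an illustration of that arithmetic: the function name, the dictionary inputs, and the toy numbers are hypothetical, and the source's exact weighting scheme may differ.

```python
def weighted_mean_fss(usage, fss):
    """Usage-weighted mean of per-codon scores.

    usage: codon -> count or frequency (hypothetical input format)
    fss:   codon -> frameshift substitution score for that codon
    """
    total = sum(usage.values())
    if total == 0:
        raise ValueError("usage must contain at least one non-zero weight")
    return sum(usage[codon] * fss[codon] for codon in usage) / total


if __name__ == "__main__":
    # Made-up scores for three codons, not values from the paper.
    toy_fss = {"AAA": -10.0, "CCC": -20.0, "GGG": -30.0}
    equal_usage = {codon: 1 for codon in toy_fss}
    biased_usage = {"AAA": 3, "CCC": 1, "GGG": 0}
    print(weighted_mean_fss(equal_usage, toy_fss))
    print(weighted_mean_fss(biased_usage, toy_fss))
```

With equal usage the result reduces to the plain mean of the scores, which is the role the "Equal usage" row plays as a baseline; a genome that favors higher-scoring codons moves above that baseline.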
The usage of codon pairs and their weighted mean FSSs (Gon250)
| No | Species | Number of codon pairs | | | Weighted mean FSS | | |
|---|---|---|---|---|---|---|---|
| | | Over-represented | Under-represented | Absent | Over-represented | Under-represented | All |
| 1 | | 1573 | 2523 | 50 | −1.52 | −7.80 | −3.06 |
| 2 | | 1505 | 2591 | 190 | −2.83 | −7.13 | −3.81 |
| 3 | | 1660 | 2436 | 148 | −3.12 | −6.98 | −3.80 |
| 4 | | 1493 | 2603 | 148 | −4.87 | −6.09 | −5.18 |
| 5 | | 1418 | 2678 | 140 | −5.33 | −5.86 | −5.02 |
| 6 | | 1469 | 2627 | 164 | −6.47 | −5.26 | −6.11 |
| 7 | | 1566 | 2530 | 15 | −6.30 | −5.35 | −6.37 |
| 8 | | 1493 | 2603 | 159 | −4.86 | −6.14 | −4.27 |
| 9 | | 1389 | 2707 | 197 | −6.76 | −5.11 | −6.82 |
| 10 | Equal usage | 0 | 0 | 0 | N/A | N/A | −5.67 |
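One common way to label codon pairs as over- or under-represented is to compare the observed count of each adjacent pair with the count expected from the individual codon frequencies. The sketch below uses that observed/expected ratio as an assumption; the source does not state its exact criterion here, and the function names and toy sequence are hypothetical.

```python
from collections import Counter


def codon_pair_ratios(cds_list):
    """Observed/expected ratio for every adjacent codon pair seen in the CDSs.

    Expected counts assume the two codons of a pair occur independently.
    Pairs that never occur are not returned (their ratio would be 0).
    """
    codon_counts = Counter()
    pair_counts = Counter()
    for cds in cds_list:
        codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))

    n_codons = sum(codon_counts.values())
    n_pairs = sum(pair_counts.values())
    ratios = {}
    for (first, second), observed in pair_counts.items():
        expected = n_pairs * (codon_counts[first] / n_codons) * (codon_counts[second] / n_codons)
        ratios[(first, second)] = observed / expected
    return ratios


def classify(ratios):
    """Split pairs into over-represented (ratio > 1) and under-represented (ratio < 1)."""
    over = {pair for pair, ratio in ratios.items() if ratio > 1}
    under = {pair for pair, ratio in ratios.items() if ratio < 1}
    return over, under


if __name__ == "__main__":
    toy = ["AAACCCAAACCCAAACCC"]  # hypothetical sequence
    ratios = codon_pair_ratios(toy)
    print(ratios)
    print(classify(ratios))
```

In the table, the over- and under-represented counts sum to 4096 for every species, which suggests absent pairs are counted among the under-represented ones there; this sketch simply leaves them out.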