Liat Rockah-Shmuel, Ágnes Tóth-Petróczy, Asaf Sela, Omri Wurtzel, Rotem Sorek, Dan S Tawfik.
Abstract
Short insertions and deletions (InDels) comprise an important part of the natural mutational repertoire. InDels are, however, highly deleterious, primarily because two-thirds result in frame-shifts. Bypass through slippage over homonucleotide repeats by transcriptional and/or translational infidelity is known to occur sporadically. However, the overall frequency of bypass and its relation to sequence composition remain unclear. Intriguingly, the occurrence of InDels and the bypass of frame-shifts are mechanistically related: both occur through slippage over repeats, by DNA or RNA polymerases or by the ribosome, respectively. Here, we show that the frequency of frame-shifting InDels, and the frequency by which they are bypassed to give full-length, functional proteins, are indeed highly correlated. Using laboratory genetic drift, we have exhaustively mapped all InDels that occurred within a single gene. We thus compared the naive InDel repertoire that results from DNA polymerase slippage to the frame-shifting InDels tolerated following selection to maintain protein function. We found that InDels repeatedly occurred, and were bypassed, within homonucleotide repeats of 3-8 bases. The longer the repeat, the higher the frequency of InDel formation and the more frequent their bypass. Besides an expected 8A repeat, other types of repeats, including short ones, and G and C repeats, were bypassed. Although obtained in vitro, our results indicate a direct link between the genetic occurrence of InDels and their phenotypic rescue, thus suggesting a potential role for frame-shifting InDels as bridging evolutionary intermediates.
Year: 2013 PMID: 24204297 PMCID: PMC3812077 DOI: 10.1371/journal.pgen.1003882
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Point mutation and InDel frequencies in the naive and selected repertoires (G0 and G17, respectively).
| Region | Mutation type | InDel length | G0 occurrence | G17 occurrence |
| Nucleotides 4–990; open reading frame amino acids 2–330 | Point mutations (S) | | 3×10⁻³ | 2×10⁻² |
| | InDels (I) | | 2.2×10⁻⁴ | 1.1×10⁻⁴ |
| | S/I | | 16 | 190 |
| | Insertions | | 6.7×10⁻⁵ | 1.7×10⁻⁵ |
| | Deletions | | 1.5×10⁻⁴ | 0.9×10⁻⁴ |
| | Distribution | >+3 | 0.00% | 0.03% |
| | | +3 | 0.15% | 0.18% |
| | | +2 | 1.8% | 0.4% |
| | | +1 | 28.9% | 9.2% |
| | | −1 | 69.1% | 90.2% |
| Nucleotides 969–990; C-terminus, amino acids 324–330 | Point mutations (S) | | 3×10⁻³ | 3×10⁻² |
| | InDels (I) | | 2.0×10⁻⁴ | 0.4×10⁻² |
| | S/I | | 16 | 9 |
| | Insertions | | 1.2×10⁻⁵ | 8.5×10⁻⁵ |
| | Deletions | | 1.9×10⁻⁴ | 3.5×10⁻³ |
| | Distribution | +1 | 6.3% | 1.4% |
| | | −1 | 93.7% | 98.6% |
The background frequencies found in the respective repertoire, G0 or G17, were subtracted from the observed frequencies of InDels and point mutations. The background was determined by sequencing a 98-nt region that was not subjected to the error-prone PCR mutagenesis. The background InDel frequencies were 2.1×10⁻⁵ for G0 and 1.6×10⁻⁵ for G17, and the background point-mutation frequencies were 1.7×10⁻³ for G0 and 8.6×10⁻⁴ for G17.
The percent values give the fraction of a particular InDel type out of all InDels in the respective repertoire, G0 or G17.
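The arithmetic behind the table can be sketched as follows. This is a minimal illustration, not the authors' analysis: the dictionary keys and function names are hypothetical, and the inputs are the rounded values printed in the table, so a recomputed S/I ratio need not match the reported S/I entries exactly (those were presumably derived from unrounded data).

```python
# Frequencies copied from the table above, rounded as printed there.
# Keys and function names are illustrative assumptions, not from the source.
TABLE = {
    "whole ORF (nt 4-990)": {
        "G0": {"point": 3e-3, "indel": 2.2e-4},
        "G17": {"point": 2e-2, "indel": 1.1e-4},
    },
    "C-terminus (nt 969-990)": {
        "G0": {"point": 3e-3, "indel": 2.0e-4},
        "G17": {"point": 3e-2, "indel": 0.4e-2},
    },
}


def s_over_i(freqs):
    """Ratio of point-mutation frequency (S) to InDel frequency (I)."""
    return freqs["point"] / freqs["indel"]


def indel_fold_change(region):
    """InDel frequency after selection (G17) relative to before (G0)."""
    return region["G17"]["indel"] / region["G0"]["indel"]


for name, region in TABLE.items():
    print(name)
    for repertoire in ("G0", "G17"):
        print(f"  {repertoire}: S/I from rounded values = "
              f"{s_over_i(region[repertoire]):.1f}")
    print(f"  InDel frequency, G17 relative to G0 = "
          f"{indel_fold_change(region):.1f}-fold")
```

With these rounded inputs, the InDel frequency falls about two-fold across the whole open reading frame after selection but rises about twenty-fold in the C-terminal region.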
Homonucleotide repeats in the M.HaeIII gene and the observed InDel frequencies within these repeats.
| Repeat length | nt | Occurrence | G0 average frequency (×10⁻³) | G0 fraction | G17 average frequency (×10⁻³) | G17 fraction |
| 8 | A | 1 | 50.25 | 1/1 | 18.04 | 1/1 |
| | T | - | - | - | - | - |
| | C | - | - | - | - | - |
| | G | - | - | - | - | - |
| 7 | A | - | - | - | - | - |
| | T | - | - | - | - | - |
| | C | - | - | - | - | - |
| | G | - | - | - | - | - |
| 6 | A | - | - | - | - | - |
| | T | 3 | 4.72 | 3/3 | 1.11 | 3/3 |
| | C | - | - | - | - | - |
| | G | 2 | 4.80 | 2/2 | 0.56 | 2/2 |
| 5 | A | 3 | 2.81 | 3/3 | 0.39 | 1/3 |
| | T | 2 | 2.95 | 2/2 | 0.76 | 1/2 |
| | C | 1 | 3.38 | 1/1 | 0.25 | 1/1 |
| | G | 1 | 2.04 | 1/1 | 0.16 | 0/1 |
| 4 | A | 12 | 1.93 | 12/12 | 0.15 | 4/12 |
| | T | 5 | 2.51 | 5/5 | 0.10 | 1/5 |
| | C | - | - | - | - | - |
| | G | 1 | 2.08 | 1/1 | 0.06 | 0/1 |
| 3 | A | 18 | 0.61 | 17/18 | 0.02 | 0/18 |
| | T | 16 | 0.66 | 15/16 | 0.03 | 0/16 |
| | C | 2 | 0.21 | 2/2 | 0.00 | 0/2 |
| | G | 3 | 0.50 | 3/3 | 0.03 | 0/3 |
Occurrence is the number of repeats of this type found in the nucleotide sequence of the M.HaeIII gene.
Fraction is the number of repeats in which InDels were found at ≥10-fold higher frequency than background, out of the total number of repeats of that type (i.e., the occurrence of that repeat).
The log values of the average frequencies are linearly correlated with repeat length (linear regression R² ≥ 0.97; see also Figure 2).
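The log-linear relationship noted above can be explored with a short sketch. It rests on assumptions that are not in the source: only the repeat lengths listed in the table (3 to 8) are used, the per-nucleotide averages are pooled into one value per length by weighting with the occurrence column, and an ordinary least-squares fit is applied to the log10 values. Because the published regression may rest on different pooling and additional repeat lengths, the R² obtained here need not reproduce the reported value.

```python
import math

# (repeat length, nucleotide, occurrence, G0 avg freq x10^-3, G17 avg freq x10^-3)
# Rows copied from the table above; repeat types that do not occur are omitted.
ROWS = [
    (8, "A", 1, 50.25, 18.04),
    (6, "T", 3, 4.72, 1.11),
    (6, "G", 2, 4.80, 0.56),
    (5, "A", 3, 2.81, 0.39),
    (5, "T", 2, 2.95, 0.76),
    (5, "C", 1, 3.38, 0.25),
    (5, "G", 1, 2.04, 0.16),
    (4, "A", 12, 1.93, 0.15),
    (4, "T", 5, 2.51, 0.10),
    (4, "G", 1, 2.08, 0.06),
    (3, "A", 18, 0.61, 0.02),
    (3, "T", 16, 0.66, 0.03),
    (3, "C", 2, 0.21, 0.00),
    (3, "G", 3, 0.50, 0.03),
]


def mean_by_length(rows, column):
    """Occurrence-weighted mean frequency per repeat length (an assumption)."""
    totals = {}
    for row in rows:
        length, occurrence = row[0], row[2]
        weighted, count = totals.get(length, (0.0, 0))
        totals[length] = (weighted + occurrence * row[column], count + occurrence)
    return {length: w / n for length, (w, n) in totals.items()}


def fit_log10(means):
    """Least-squares fit of log10(frequency) against repeat length."""
    xs = sorted(means)
    ys = [math.log10(means[x]) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


g0_fit = fit_log10(mean_by_length(ROWS, 3))
g17_fit = fit_log10(mean_by_length(ROWS, 4))
for label, (slope, intercept, r2) in (("G0", g0_fit), ("G17", g17_fit)):
    print(f"{label}: slope={slope:.3f} per base, intercept={intercept:.3f}, R^2={r2:.3f}")
```

A positive slope in both repertoires corresponds to the statement that longer repeats carry more InDels before selection and retain more of them after selection.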
Figure 3. Methylation activities of M.HaeIII variants carrying individual InDels.
The encoding plasmids of variants #1–19 (listed in Table 3) were extracted and the methylation activities were determined by the level of protection from digestion with HaeIII. Shown are the levels of protection for plasmids derived from E. coli cells grown with over-expression (with inducer; A) or with basal expression of the M.HaeIII variants (B).
Figure 1. InDel frequencies in the selected (G17) and unselected (G0) repertoires.
Panel A indicates the length of the homonucleotide repeat within which the InDels occurred. Panels B and C represent frequencies plotted per nucleotide position in each library. Panel D illustrates the conservation pattern in M.HaeIII as derived by Consurf [49]; the higher the score, the higher the conservation. Also indicated are the locations of the conserved motifs shared by all DNA methyltransferases of this class (C5 methyltransferases). The serial numbers #1–19 indicate individually tested InDels (listed in Table 3).
Individually tested InDels.
| # | nt | aa | Structural location | G0 frequency (×10⁻³) | G17 frequency (×10⁻³) | Repeat | InDel | Sequence | MTase activity, basal | MTase activity, OE | Conservation |
| 1 | 57 | 19 | SAM binding domain - between motif I and II | 2.79 | 0.86 | 5A | −A | | 62% | 78% | 1.22 |
| 2 | | | | | | | +A | | 13% | 36% | 1.22 |
| 3 | 204 | 68 | Catalytic domain - motif IV | 5.65 | 0.72 | 6G | −G | | 0% | 0% | −0.80 |
| 4 | | | | | | | +G | | 0% | 0% | −0.80 |
| 5 | 274 | 92 | Catalytic domain - motif V | 4.12 | 0.92 | 6T | −T | | 2% | 91% | 0.81 |
| 6 | 306 | 102 | Catalytic domain - between motif V and VI | 50.25 | 18.04 | 8A | −A | | 23% | 100% | 0.51 |
| 7 | | | | | | | +A | | 88% | 100% | 0.51 |
| 8 | | | | | | | +AA | | 94% | 100% | 0.51 |
| 9 | 422 | 141 | Catalytic domain - beginning of motif VIII | 3.23 | 0.03 | 4T | −T | | 0% | 0% | 0.08 |
| 10 | 472 | 158 | End of catalytic domain (after motif VIII) | 6.08 | 1.94 | 6T | −T | | 1% | 95% | −0.36 |
| 11 | | | | | | | +T | | 0% | 0% | −0.36 |
| 12 | 596 | 199 | TRD - 25 residues before the recognition residues | 3.76 | 0.17 | 5A | −A | | 1% | 39% | −0.55 |
| 13 | | | | | | | +A | | 5% | 92% | −0.55 |
| 14 | 666 | 222 | TRD - close to the recognition residues | 3.15 | 0.12 | 5T | −T | | 0% | 0% | −0.01 |
| 15 | 748 | 250 | TRD - 25 residues after the recognition residues | 3.38 | 0.25 | 5C | −C | | 32% | 100% | −0.68 |
| 16 | | | | | | | +C | | 2% | 12% | −0.68 |
| 17 | 804 | 268 | TRD - before motif IX | 2.14 | 0.47 | 4A | −A | | 100% | 100% | 1.26 |
| 18 | 876 | 292 | End of TRD - between motif IX and X | 2.74 | 1.40 | 5T | −T | | 1% | 17% | −0.70 |
| 19 | | | | | | | +T | | 59% | 75% | −0.70 |
Noted within the sequence column, in bold, is the inserted base, and in bold with strikethrough the deleted one.
The TRD recognition residues interact with the substrate DNA and are located in amino acids 219 to 244 [18], [50].
MTase activity relates to the fraction of plasmid DNA that was completely methylated and hence protected from digestion by HaeIII; the fraction was deduced from the gel image (Figure 3) using GelAnalyzer.
The conservation score was derived from Consurf [49] using the multiple sequence alignment of M.HaeIII's orthologs with standard parameters (see Figure 1D).
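The MTase activity columns of the table of individually tested InDels can be summarised with a short sketch. The tuples below assume that the two percentages in each row are, in order, the basal and the over-expression (OE) activities, as the column headers suggest. The function names and the cut-offs used to flag variants that depend on over-expression (basal ≤5%, OE ≥90%) are illustrative assumptions, not thresholds from the source.

```python
# (variant #, InDel, basal activity %, over-expression activity %)
# Values copied from the table of individually tested InDels.
VARIANTS = [
    (1, "-A", 62, 78), (2, "+A", 13, 36), (3, "-G", 0, 0), (4, "+G", 0, 0),
    (5, "-T", 2, 91), (6, "-A", 23, 100), (7, "+A", 88, 100),
    (8, "+AA", 94, 100), (9, "-T", 0, 0), (10, "-T", 1, 95),
    (11, "+T", 0, 0), (12, "-A", 1, 39), (13, "+A", 5, 92),
    (14, "-T", 0, 0), (15, "-C", 32, 100), (16, "+C", 2, 12),
    (17, "-A", 100, 100), (18, "-T", 1, 17), (19, "+T", 59, 75),
]

BASAL, OE = 2, 3  # tuple positions of the two activity columns


def active_ids(variants, column):
    """Variant numbers with any detectable (non-zero) activity."""
    return [v[0] for v in variants if v[column] > 0]


def oe_dependent_ids(variants, max_basal=5, min_oe=90):
    """Variants nearly inactive at basal expression but active when over-expressed.

    The cut-offs are arbitrary illustration values.
    """
    return [v[0] for v in variants if v[BASAL] <= max_basal and v[OE] >= min_oe]


inactive = [v[0] for v in VARIANTS if v[BASAL] == 0 and v[OE] == 0]
print("active with over-expression:", len(active_ids(VARIANTS, OE)), "of", len(VARIANTS))
print("inactive under both conditions:", inactive)
print("dependent on over-expression:", oe_dependent_ids(VARIANTS))
```

Separating the two expression levels in this way makes it easy to see which frame-shifted variants retain function only when the protein is produced in excess.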
Figure 2. Average InDel frequencies by repeat length.
The frequencies in the selected (G17, full circles) and in the unselected (G0, empty circles) repertoires were averaged for all InDels occurring within a given repeat length. The average frequencies, multiplied by 10³, are presented on a log scale. Repeat length = 1 corresponds to positions where both flanking bases differ from the base at that position. Error bars correspond to standard deviations of the InDel frequencies for each repeat length.
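The notion of repeat length used in this caption can be made concrete with a small sketch that splits a sequence into homonucleotide runs. The function name and the toy sequence are hypothetical; the M.HaeIII gene sequence itself is not reproduced here. A run of length 1 is a base whose two neighbours both differ from it, matching the definition above.

```python
from itertools import groupby


def homonucleotide_runs(seq):
    """Return (start, base, length) for every maximal run of identical bases.

    Positions are 0-based. A length of 1 means both flanking bases differ.
    """
    runs = []
    position = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        runs.append((position, base, length))
        position += length
    return runs


# Toy sequence (an assumption for illustration): one G, an 8A repeat, then TCCGTTT.
TOY = "G" + "A" * 8 + "TCCGTTT"

for start, base, length in homonucleotide_runs(TOY):
    print(f"position {start}: {length}{base}")
```

Grouping InDel frequencies by the `length` value of the run in which each InDel falls would give the per-length bins that the figure averages.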