Andrew E Firth, Wayne M Patrick.
Abstract
There are many methods for introducing random mutations into nucleic acid sequences. Previously, we described a suite of programmes for estimating the completeness and diversity of randomized DNA libraries generated by a number of these protocols. Our programmes suggested some empirical guidelines for library design; however, no information was provided regarding library diversity at the protein (rather than DNA) level. We have now updated our web server, enabling analysis of translated libraries constructed by site-saturation mutagenesis and error-prone PCR (epPCR). We introduce GLUE-Including Translation (GLUE-IT), which finds the expected amino acid completeness of libraries in which up to six codons have been independently varied (according to any user-specified randomization scheme). We provide two tools for assisting with experimental design: CodonCalculator, for assessing amino acids corresponding to given randomized codons; and AA-Calculator, for finding degenerate codons that encode user-specified sets of amino acids. We also present PEDEL-AA, which calculates amino acid statistics for libraries generated by epPCR. Input includes the parent sequence, overall mutation rate, library size, indel rates and a nucleotide mutation matrix. Output includes amino acid completeness and diversity statistics, and the number and length distribution of sequences truncated by premature termination codons. The web interfaces are available at http://guinevere.otago.ac.nz/stats.html.
Year: 2008 PMID: 18442989 PMCID: PMC2447733 DOI: 10.1093/nar/gkn226
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Completeness and diversity statistics for two hypothetical site-saturation mutagenesis libraries, in which two codons have been randomized according to different schemes (NNK + NDT or NNB + NAY)
| Library | No. of clones sampled | GLUE: No. of distinct DNA variants | GLUE: DNA completeness (a) | GLUE-IT: No. of distinct amino acid variants (b) | GLUE-IT: Amino acid completeness (c) |
|---|---|---|---|---|---|
| NNK + NDT | 100 | 88 | 0.23 | 77 | 0.32 |
| NNK + NDT | 500 | 280 | 0.73 | 196 | 0.82 |
| NNK + NDT | 1000 | 356 | 0.93 | 229 | 0.95 |
| NNK + NDT | 1500 | 376 | 0.98 | 237 | 0.99 |
| NNB + NAY | 100 | 88 | 0.23 | 53 | 0.66 |
| NNB + NAY | 500 | 280 | 0.73 | 78 | 0.98 |
| NNB + NAY | 1000 | 356 | 0.93 | 80 | 1.00 |
| NNB + NAY | 1500 | 376 | 0.98 | 80 | 1.00 |
(a) The fraction of all possible DNA sequence variants (384 for each library) that are represented in the sample.
(b) Not including variants with stop codons.
(c) The fraction of all possible amino acid variants (240 for Library 1, NNK + NDT; 80 for Library 2, NNB + NAY) that are sampled.
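The GLUE and GLUE-IT columns above can be approximated with a short calculation. The sketch below is not the web server's code. It assumes the standard genetic code, equiprobable codons within each degenerate codon, and a Poisson approximation in which a variant with probability p is expected to appear among L sampled clones with probability 1 − exp(−Lp). The function names (`codon_counts`, `expected_distinct`, `library_stats`) are illustrative only.

```python
import math
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(degenerate):
    """Count the codons encoding each amino acid ('*' = stop) for one degenerate codon."""
    expanded = product(*(IUPAC[base] for base in degenerate))
    return Counter(CODE["".join(codon)] for codon in expanded)


def expected_distinct(weights, total, n_clones):
    """Expected number of distinct variants among n_clones samples (Poisson approximation).

    Each variant i has probability weights[i] / total.
    """
    return sum(1.0 - math.exp(-n_clones * w / total) for w in weights)


def library_stats(codons, n_clones):
    """Return (distinct DNA variants, possible DNA variants,
    distinct amino acid variants, possible amino acid variants)."""
    per_site = [codon_counts(c) for c in codons]
    n_dna = math.prod(sum(c.values()) for c in per_site)
    # DNA level: assumed equiprobable sequences.
    dna = expected_distinct([1] * n_dna, n_dna, n_clones)
    # Protein level: weight = number of DNA sequences encoding the variant;
    # variants containing a stop codon are excluded.
    site_weights = [[n for aa, n in c.items() if aa != "*"] for c in per_site]
    weights = [math.prod(combo) for combo in product(*site_weights)]
    prot = expected_distinct(weights, n_dna, n_clones)
    return dna, n_dna, prot, len(weights)


for codons in (("NNK", "NDT"), ("NNB", "NAY")):
    for n in (100, 500, 1000, 1500):
        dna, n_dna, prot, n_prot = library_stats(codons, n)
        print(" + ".join(codons), n, round(dna), round(dna / n_dna, 2),
              round(prot), round(prot / n_prot, 2))
```

Under these assumptions the loop gives, for example, 88 distinct DNA variants and 77 distinct amino acid variants for NNK + NDT at 100 clones, matching the first row of the table. The NNB + NAY library reaches high amino acid completeness sooner because it has only 80 possible amino acid variants spread over the same 384 DNA sequences.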
Characteristics of an α-synuclein epPCR library, estimated by PEDEL-AA and by a previously-described Monte Carlo library diversity algorithm (23)
| Property | PEDEL-AA | Ref. (23) |
|---|---|---|
| Prematurely truncated variants (proportion of total library) | 16% | 15% |
| Number of full-length clones | 3.2 × 10⁶ | 3.1 × 10⁶ |
| Protein mutation frequency per amino acid | 0.016 | 0.016 |
| Mean number of mutations per protein | 2.1 | 2.1 |
| Unmutated (wild-type) sequences (proportion of total library) | 14% | 14% |
| Number of unique proteins in the library | 1.3 × 10⁶ | 1.3 × 10⁶ |
| Number of different point mutations in the library | 1989 | 1990 |
| Number of unique single-mutation variants in the library | 1618 | 1566 |
(a) The epPCR library was constructed by Volles and Lansbury (23), and consisted of 3.77 × 10⁶ clones with an average of 3.2 nucleotide mutations per clone. The template for randomization was 399 bp in length (coding for amino acids 8–140 of the α-synuclein protein). Table 1 of Volles and Lansbury (23) was used for the nucleotide mutation matrix.
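Several of the PEDEL-AA entries can be cross-checked against one another with simple arithmetic. The snippet below is only a consistency check on the rounded values quoted in the table and its footnote, not a reimplementation of PEDEL-AA; the variable names are illustrative.

```python
# Values quoted in the table and its footnote.
library_size = 3.77e6          # clones in the epPCR library
truncated_fraction = 0.16      # PEDEL-AA estimate of prematurely truncated variants
template_bp = 399              # length of the randomized template
aa_mutation_freq = 0.016       # protein mutation frequency per amino acid
nt_mutations_per_clone = 3.2   # average nucleotide mutations per clone

n_residues = template_bp // 3                           # residues 8-140 inclusive
full_length = library_size * (1 - truncated_fraction)   # clones without premature stops
mean_aa_mutations = aa_mutation_freq * n_residues       # mutations per protein
nt_mutation_freq = nt_mutations_per_clone / template_bp # mutations per bp

print(n_residues, f"{full_length:.1e}", f"{mean_aa_mutations:.1f}", f"{nt_mutation_freq:.4f}")
# prints: 133 3.2e+06 2.1 0.0080
```

The derived number of full-length clones (about 3.2 × 10⁶) and the mean of about 2.1 amino acid mutations per protein agree with the tabulated PEDEL-AA values to the precision shown.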
Figure 1. The estimated numbers of unique DNA sequence variants (C_DNA, dashed line) and protein sequence variants (C_protein, solid line) in a purF epPCR library (24), plotted as a function of the DNA mutation rate, λ. The epPCR comprised 30 thermal cycles, with a PCR efficiency (eff) of 0.41. The library contained 6.4 × 10⁵ clones. A total of 7549 bp of DNA sequence was obtained from randomly chosen library members, and 77 substitutions plus one single-nucleotide deletion were observed. These data were used to construct the nucleotide mutation matrix for PEDEL-AA. The library used for genetic selection experiments contained λ = 15.5 mutations per sequence; the estimated sequence diversity it contains is indicated by the vertical arrow on the right. The maximally diverse library is indicated by the vertical arrow at λ = 8. Note that, after peaking, the number of unique amino acid (but not nucleotide) variants decreases with increasing λ, due to an increasing number of truncated sequences.