Bastiaan P Kuiper, Rianne C Prins, Sonja Billerbeck.
Abstract
The construction of custom libraries is critical for rational protein engineering and directed evolution. Array-synthesized oligo pools of thousands of user-defined sequences (up to ∼350 bases in length) have emerged as a low-cost, commercially available source of DNA. These pools cost ≤10% as much as other commercial sources of custom DNA (depending on error rate and length), and this cost difference can determine whether an enzyme engineering project can be realized on a given research budget. However, although inexpensive, oligo pools suffer from a low concentration of individual oligos and relatively high error rates. Several powerful techniques that specifically make use of oligo pools have been developed and have proven valuable or even essential for next-generation protein and pathway engineering strategies, such as sequence-function mapping, enzyme minimization, or de novo design. Here we consolidate the knowledge on these techniques and their applications to facilitate the use of oligo pools within the protein engineering community.
Keywords: array-based oligonucleotides; gene synthesis; mutagenesis; protein engineering
Year: 2021 PMID: 34817110 PMCID: PMC9300125 DOI: 10.1002/cbic.202100507
Source DB: PubMed Journal: Chembiochem ISSN: 1439-4227 Impact factor: 3.461
Figure 1. Overview of the discussed methods for mutagenic library creation, gene and pathway assembly, and library analysis. A) Oligo pool-based methods to generate protein libraries, gene fragment libraries, or pathways. 1) Single-residue saturation mutagenesis libraries, such as those used for deep mutational scanning, can be created from oligo pools by nicking mutagenesis (NM), programmed allelic series (PALS), plasmid recombineering (PR), or CRISPR-enabled trackable genome engineering (CREATE). 2) Libraries in which short DNA stretches are comprehensively inserted, as required for protein minimization, have been created by PR and can likely be created by NM, PALS, and CREATE. Single-residue deletion libraries have been created by PALS. 3) Complex pools of many different protein fragments can be created by multiple pairwise assembly and DropSynth. 4) Larger genes and pathways can be assembled as shown by Wan et al. B) Library sequence analysis: the quality of libraries needs to be analyzed for the frequency of programmed mutations (red dot) and the frequency of off-target mutations (cross). 1) Short-read sequencing, as offered via Illumina-based services, is highly accurate, high in throughput, and widely accessible, but the short read length leads to a narrow resolution window. Libraries need to be tiled to be accurately quality controlled, as off-target mutations outside the resolution window are otherwise invisible. 2) Molecular barcoding and computational assembly can overcome the limited read length in NGS, as full-length sequences of mutagenized clones can be obtained from short NGS reads. 3) Single-molecule real-time (SMRT) long-read sequencing, followed by computational error correction or combined with variant concatenation, is starting to allow long-read sequencing for library quality control in protein engineering.
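The tiling requirement in panel B can be made concrete with a small sketch. It is only one possible reading of "tiling": the gene is split into windows no longer than the sequencing resolution window, so that every position is covered by at least one window. The function names, the 250 bp window, the 50 bp overlap, and the 1179 bp gene length (393 codons) are illustrative assumptions, not values taken from the methods discussed here.

```python
def tile_windows(gene_length, window, overlap=0):
    """Split [0, gene_length) into half-open (start, end) windows.

    Each window is at most `window` bp long and consecutive windows share
    `overlap` bp. All parameters are hypothetical illustration values.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + window, gene_length)
        tiles.append((start, end))
        if end == gene_length:  # the last window reaches the gene end
            break
        start += step
    return tiles


def tiles_covering(position, tiles):
    """Return the indices of all windows that contain a 0-based position."""
    return [i for i, (start, end) in enumerate(tiles) if start <= position < end]


if __name__ == "__main__":
    tiles = tile_windows(1179, 250, overlap=50)
    for index, (start, end) in enumerate(tiles):
        print(f"tile {index}: {start}-{end}")
    print("position 225 lies in tiles", tiles_covering(225, tiles))
```

A mutation programmed in one window can only be checked for off-target changes inside that same window, which is why each sub-library would be sequenced with reads spanning its own tile.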
Performance overview of single-site saturation mutagenesis methods (e.g. used for deep mutational scanning).

| Method | Protein (# of mutated codons) | % Coverage[a] | % 1 NSM[b] | % 0 NSM (wild-type)[c] | % >1 NSM (off-target)[d] | Ref. |
|---|---|---|---|---|---|---|
| Nicking mutagenesis (NM) | | 100 | 36.4 | 52.7 | 10.8 | [36] |
| | | 100 | 26.5 | 59.4 | 13.6 | [36] |
| | Anti-influenza human antibody variable heavy gene UCA9 (99)[e] | 97.4 | 14.1 | 71.3 | 15.0 | [36] |
| | Phage ΦX174 F capsid protein (421) | 100 | 41.8 | 28.6 | 29.5 | [50] |
| | Phage ΦX174 G spike protein (172) | 99.9 | 49.3 | 29.3 | 21.4 | [50] |
| PALS | | 99.9 | 47 | 24 | 21 | [37] |
| | Human p53 (393) | 93.4 | 33 | 30 | 35 | [37] |
| Plasmid recombineering (PR)[f] | | 99.8 | 28.8 | 60.7 | 10.5 | [41] |
| CREATE (genomic mutagenesis) | | 100 | 56.8[g] | 22.4[g] | n/a | [42] |
| | | 100 | 95 | 5 | n/a | [42] |
| | | 22.7 to 61.6 | n/a | n/a | n/a | [54] |

n/a: not available.
[a] Number of actually observed mutations per 100 designed mutations. Note: differences in the sequencing depth used for quality control in the different studies influence the number of observed mutations, and thus the apparent coverage.
[b] Percent of mutants that carry exactly one desired non-synonymous mutation (NSM).
[c] Percent of mutants that do not carry any NSM, thus being wild-type (non-edited) variants. Note that CREATE is the only method that counter-selects for wild-type via CRISPR-Cas9-mediated double-strand breaks.
[d] Percent of mutants that carry more than one NSM, e.g. off-target mutations.
[e] The average of two reported independent runs of NM is given (Table 1 in Ref. [36]).
[f] Here, hand-mixed pools of column-synthesized oligonucleotides were used. It was shown later that PR also works with array-synthesized oligo pools.
[g] Calculated based on 80% of clones being edited (based on colorimetric screen) and 71% of those 80% being correctly edited (based on sequencing, 0.8×0.71); and 20% of clones being wild-type (based on colorimetric screen) plus 3% of the 80% phenotypically edited clones (0.8×0.03) being still unedited at the programmed locus.
[h] CREATE was developed and applied for multiplexed pathway mutagenesis. The percent coverage refers to the observed coverage range of five test loci that were analyzed in depth (Table 1 in Ref. [54]).
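The column definitions in footnotes [a] to [d] and the arithmetic in footnote [g] can be written out as a short sketch. The function `summarize_library`, its input layout (one set of NSM labels per sequenced clone), and the extra `single_undesigned` bin are assumptions made for illustration; the cited studies may count variants differently. Only the numbers in the footnote [g] part (80%, 71%, 3%) come from the text.

```python
def summarize_library(variants, designed):
    """Compute table-style percentages for a sequenced mutant library.

    variants: one collection of non-synonymous mutations (NSMs) per clone;
              an empty collection stands for a wild-type clone.
    designed: the set of programmed single mutations.
    Assumption: clones with exactly one NSM that was not programmed are
    reported in a separate bin, which the footnotes do not define.
    """
    variants = [frozenset(v) for v in variants]
    designed = set(designed)
    if not variants or not designed:
        raise ValueError("need at least one variant and one designed mutation")

    total = len(variants)
    wild_type = sum(1 for v in variants if len(v) == 0)
    multi = sum(1 for v in variants if len(v) > 1)
    single_designed = [next(iter(v)) for v in variants
                       if len(v) == 1 and v <= designed]
    single_other = total - wild_type - multi - len(single_designed)

    return {
        # [a] observed designed mutations per 100 designed mutations
        "coverage": 100.0 * len(set(single_designed)) / len(designed),
        # [b] clones with exactly one desired NSM
        "one_nsm": 100.0 * len(single_designed) / total,
        # [c] clones without any NSM
        "zero_nsm": 100.0 * wild_type / total,
        # [d] clones with more than one NSM
        "multi_nsm": 100.0 * multi / total,
        "single_undesigned": 100.0 * single_other / total,
    }


# Footnote [g]: CREATE values derived from screen and sequencing fractions.
edited_by_screen = 0.80        # clones edited according to colorimetric screen
correct_among_edited = 0.71    # correctly edited among those, by sequencing
unedited_among_edited = 0.03   # phenotypically edited but unedited at locus

create_one_nsm = round(100 * edited_by_screen * correct_among_edited, 1)
create_wild_type = round(
    100 * ((1 - edited_by_screen) + edited_by_screen * unedited_among_edited), 1
)

if __name__ == "__main__":
    toy_designed = {"A1C", "A1D", "G2C", "G2D"}  # made-up mutation labels
    toy_variants = [{"A1C"}, {"A1C"}, set(), {"G2D"}, {"A1C", "K9R"},
                    set(), {"K9R"}, {"A1D"}, set(), set()]
    print(summarize_library(toy_variants, toy_designed))
    print(create_one_nsm, create_wild_type)
```

The last two values reproduce the 56.8 and 22.4 entries marked [g] in the table.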
Performance overview of double- and multi-site saturation mutagenesis methods (e.g. for targeted protein engineering).

| Method | Library size | % Coverage | % 1 NSM | % 0 NSM (wild type) | % 2 NSM | % >2 NSM | Ref. |
|---|---|---|---|---|---|---|---|
| Double-site mutagenesis | | | | | | | |
| Nicking mutagenesis (NM) | n/a | 79.2 | n/a | 60.0 | n/a | n/a | [36] |
| Plasmid recombineering | 5940 | 98.0 | 26.3 | 32.6 | 24.5 | 16.4 | [41] |
| Multi-site mutagenesis | | | | | | | |
| Optimized multi-site nicking mutagenesis (NM)[a] | 16 384 | 99.9 | n/a | 2.58 | n/a | n/a | [49] |
| | 32 768 | 99.4 | n/a | 0.25 | n/a | n/a | [49] |

n/a: not available; [a] The average of the two reported independent runs of optimized multi-site NM is given (Table 1 in Ref. [49]).
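To read the coverage percentages as absolute numbers, library size and coverage can be combined as in the sketch below. It assumes that "library size" is the number of designed variants and that coverage is defined as for the single-site table (observed per 100 designed); the helper name `observed_and_missing` is hypothetical. Because the multi-site values are averages of two runs, the results are not expected to be whole numbers.

```python
def observed_and_missing(library_size, coverage_percent):
    """Convert a coverage percentage into observed and missing variant counts.

    Assumes library_size is the number of designed variants and that
    coverage_percent is observed variants per 100 designed variants.
    """
    if library_size <= 0 or not 0 <= coverage_percent <= 100:
        raise ValueError("need library_size > 0 and 0 <= coverage <= 100")
    observed = library_size * coverage_percent / 100.0
    return observed, library_size - observed


if __name__ == "__main__":
    # (library size, % coverage) pairs taken from the table above
    rows = [(5940, 98.0), (16384, 99.9), (32768, 99.4)]
    for size, coverage in rows:
        observed, missing = observed_and_missing(size, coverage)
        print(f"{size}: about {observed:.0f} observed, {missing:.0f} missing")
```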
Performance overview of methods for gene library assembly (e.g. for testing various protein designs).

| Method | Gene size (bp) | Pool size[a] | % coverage[b] | % accurate assemblies[c] | Ref. |
|---|---|---|---|---|---|
| MPA | 192–252 | 131 to 250 | 72.7 to 96.4 | 11.8 to 31.3 | [38] |
| | 192–252 | 1212 | 84.2 | 11.8 to 31.3 | [38] |
| | 192–252 | 2271 | 70.6 | 11.8 to 31.3 | [38] |
| DropSynth 2.0 | 675 | 384 | 92.0 | 23.5 | [43] |
| | 675 | 1536 | 80.0 | 22.6 to 27.6 | [43] |

[a] Number of independent assemblies performed in one pool. [b] Number of assemblies (per 100 designed assemblies) that show at least one error-free molecule. [c] Number of error-free assembled molecules per 100 assembled molecules.
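The two assembly metrics in footnotes [b] and [c] are easy to confuse, so a minimal sketch of how they differ may help. It assumes the sequencing data have already been reduced to one record per assembled molecule, consisting of a design identifier and a flag for whether the molecule is error-free; the function name and the record layout are hypothetical.

```python
from collections import defaultdict


def assembly_metrics(molecules, n_designed):
    """Return (% coverage, % accurate assemblies) for a gene assembly pool.

    molecules: iterable of (design_id, is_error_free) pairs, one per
               sequenced assembled molecule (assumed input layout).
    n_designed: number of assemblies designed for the pool.
    """
    molecules = list(molecules)
    if not molecules or n_designed <= 0:
        raise ValueError("need at least one molecule and n_designed > 0")

    perfect_per_design = defaultdict(int)
    for design_id, is_error_free in molecules:
        if is_error_free:
            perfect_per_design[design_id] += 1

    # [b] designs with at least one error-free molecule, per 100 designs
    coverage = 100.0 * len(perfect_per_design) / n_designed
    # [c] error-free molecules per 100 assembled molecules
    accuracy = 100.0 * sum(perfect_per_design.values()) / len(molecules)
    return coverage, accuracy


if __name__ == "__main__":
    # Made-up toy data: four designed genes, eight sequenced molecules.
    toy = [("g1", True), ("g1", False), ("g2", False), ("g2", False),
           ("g3", True), ("g3", True), ("g1", False), ("g2", False)]
    print(assembly_metrics(toy, n_designed=4))
```

A pool can therefore show high coverage with modest accuracy, as in the table: one perfect molecule per design is enough for coverage, while accuracy counts every molecule.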