Adam J Bogdanove, Andrew Bohm, Jeffrey C Miller, Richard D Morgan, Barry L Stoddard.
Abstract
Protein engineering is used to generate novel protein folds and assemblages, to impart new properties and functions onto existing proteins, and to enhance our understanding of principles that govern protein structure. While such approaches can be employed to reprogram protein-protein interactions, modifying protein-DNA interactions is more difficult. This may be related to the structural features of protein-DNA interfaces, which display more charged groups, directional hydrogen bonds, ordered solvent molecules and counterions than comparable protein interfaces. Nevertheless, progress has been made in the redesign of protein-DNA specificity, much of it driven by the development of engineered enzymes for genome modification. Here, we summarize the creation of novel DNA specificities for zinc finger proteins, meganucleases, TAL effectors, recombinases and restriction endonucleases. The ease of re-engineering each system is related both to the modularity of the protein and the extent to which the proteins have evolved to be capable of readily modifying their recognition specificities in response to natural selection. The development of engineered DNA binding proteins that display an ideal combination of activity, specificity, deliverability, and outcomes is not a fully solved problem; however, each of the current platforms offers unique advantages, offset by behaviors and properties requiring further study and development.
Year: 2018 PMID: 29718463 PMCID: PMC6007267 DOI: 10.1093/nar/gky289
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of many of the significant attempts to engineer the DNA recognition properties of the protein systems discussed in this review. Although not comprehensive, this table is intended to help readers follow the details of the main text.
| Platform and year(s) | Targets and development | Engineering approach | References |
|---|---|---|---|
| Zinc finger proteins | | | |
| 1992–1993 | Novel DNA triplets | Structure-based modeling | ( |
| 1994–1995 | Novel DNA triplets | Phage Display | ( |
| 1999 | Novel sites with GNN triplets | | ( |
| 2000 | Novel 9 basepair targets | Bacterial two-hybrid selections | ( |
| 2001 | Novel triplets with ANN and CNN triplets | Phage Display | ( |
| 2001 | Novel 9 basepair targets | Phage Display; hybrid 3 finger library panning | ( |
| 2001 | Novel 12 to 18 basepair targets | Assembly of two-finger ZFP subunits | ( |
| 2002–2003 | Drosophila yellow gene target | Modular assembly | ( |
| 2005 | Human IL2Rg gene target | Zinc finger selections and assembly | ( |
| 2008 | Novel 9 basepair targets | Bacterial two-hybrid selections | ( |
| 2011 | Novel 9 basepair targets | Informatics-driven, Context-dependent ZFP assembly | ( |
| Meganucleases | | | |
| 2002 | Single basepair target variants (I-CreI) | Bacterial gene elimination assay/screen | ( |
| 2002 | Activity-based selection (I-SceI) | Bacterial gene elimination assay/screen | ( |
| 2002 | Hybrid nuclease generation (I-CreI/I-DmoI → H-DreI) | Structure-based computational redesign | ( |
| 2003 | Single basepair target variants (PI-SceI) | Bacterial two-hybrid selections | ( |
| 2006 | Single basepair target variant (I-MsoI) | Structure-based computational redesign | ( |
| 2006 | Multiple basepair target variants (I-CreI) | Bacterial gene elimination assay/screen | ( |
| 2006 | Multiple basepair target variants (I-CreI) | Eukaryotic gene recombination assay/screen | ( |
| 2009 | Individual and multiple base pair target variants (I-AniI) | Structure-based computational redesign and bacterial selections | ( |
| 2009–2010 | Monomerization of homodimeric meganuclease (I-CreI) | Structure-based modeling and activity-based selections | ( |
| 2009 | Activity-based selections (I-AniI) | Yeast surface display | ( |
| 2010 | Maize liguleless gene target (I-CreI) | Structure-based modeling and activity-based selections | ( |
| 2010 | Multiple basepair target variants (I-MsoI) | Structure-based computational redesign | ( |
| 2014 | Human Bruton's tyrosine kinase (Btk) gene target (I-AniI) | Structure-based computational redesign and yeast surface display | ( |
| 2007–2013 | Various eukaryotic gene targets (I-CreI) | Structure-based modeling; bacterial selections; eukaryotic selections | ( |
| 2014–2015 | Human TCRa and CCR5 gene targets (I-OnuI) | Yeast surface display meganuclease selections and MegaTAL | ( |
| 2014–2015 | Human CFTR gene target (I-OnuI) | In vitro compartmentalization | ( |
| 2017 | Various eukaryotic gene targets (I-OnuI) | Yeast surface display and bacterial selections | ( |
| TAL effectors | | | |
| 2009 | TAL effector code determination and first designer TALs | Tandem repeat assembly | ( |
| 2010–2011 | TAL nuclease creation and initial refinement | Tandem repeat assembly and FokI fusion | ( |
| 2012 | Improved design using additional RVDs; G-specific RVDs | Tandem repeat assembly using new specificity determinants and data | ( |
| 2012–2013 | Increasing mismatch tolerance of C-terminal repeats | Tandem repeat assembly and DNA substrate sequence variation | ( |
| 2013–2014 | Altered specificity at base 0 | Modification of cryptic repeat sequences; RVD at position 1, and context | ( |
| 2014 | Aberrant repeats that allow frameshift binding | Incorporation of natural repeat variants with small insertions or deletions | ( |
| 2014–2015 | Expanded repertoire of RVDs for fine-tuned targeting | Characterization of specificities and affinities of 400 RVDs | ( |
| 2016–2017 | Modulation of TAL effector binding strength and TALEN efficiency | Varying the backbone (non-RVD) sequences of the repeats | ( |
| 2017 | Optimized length for maximum specificity | Varying the number of repeats | ( |
| Recombinases | | | |
| 1999 | Circumvent need for accessory factors (Tn3 resolvase) | Error prone PCR, galK-based colored colony selection | ( |
| 2009 | Increased efficiency/selectivity (PhiC31 Integrase) | Error prone PCR, lacZ selection & GFP expression | ( |
| 1988 | Enhanced and altered activity (gin) | Chemical mutagenesis and bacterial selection | ( |
| 2000 | Circumvent need for accessory factors (lambda integrase) | GFP based fluorescence | ( |
| 2015 | Targeting CCR5 and AAVS1 safe harbor locus (Bin and Tn21 recombinases) | Site specific sequence randomization and error prone PCR, antibiotic selection | ( |
| 2003 | Altered loxP sequence (Cre) | Site specific sequence randomization, GFP expression and FACS | ( |
| 2001–2011 | Altered loxP sequence, HIV LTR sequences (Cre) | Substrate linked protein evolution | ( |
| 2013 | HIV LTR (improved activity) (Cre) | Molecular modeling and dynamics | ( |
| 2017 | HIV LTR (improved activity) (Cre) | Observations based on the crystal structure | ( |
| 2008 | Mutants that promote heterotetramers (Cre) | Structure-based selection of interfacial residues to be randomized | ( |
| 2015 | Mutants that promote heterotetramers (Cre) | Protein design via molecular modeling | ( |
| 2013 | Weakened protein–protein interactions to enhance specificity (Cre) | Random mutagenesis and bacterial selection | ( |
| 1988 | Enhanced activity (Flp) | Substrate linked protein evolution | ( |
| 2004 | Mutants that promote heterotetramers (Flp) | Error prone PCR, blue/white selection | ( |
| 2003–2006 | Altered FRT sequence, interleukin 10 target (Flp) | Error prone PCR and randomization of specific sites, LacZ and RFP reporters | ( |
| 2016 | Enhanced activity (R and TD recombinases) | Sequence truncation, random mutagenesis | ( |
| 1995 | Relaxed specificity (lambda integrase) | Analysis of chimeric integrases | ( |
| 2015 | Human genome target (lambda integrase) | Beta-lactamase inhibitor based screen | ( |
| 2001 | Human chromosome 8 target (PhiC31) | Blue/white selection | ( |
| 2003–2011 | Hybrid resolvase/ZFN targets (serine recombinases) | Truncated resolvases with zinc finger fusion (varied linkers) | ( |
| 2011–2014 | Mutants to promote heterodimers (resolvases) | Rational design and directed evolution | ( |
| 2011–2014 | Altered specificity of catalytic domains (serine recombinases) | Random mutagenesis of selected residues and directed evolution | ( |
| Restriction endonucleases | | | |
| 1987–1999 | First attempts to alter specificity (EcoRI, EcoRV, BamHI) | Structure-based modeling | ( |
| 2002–2006 | Additional attempts to alter specificity (BstYI, NotI) | Directed evolution and selection | ( |
| 2003 | Alteration of specificity of bifunctional RM enzyme (Eco57I) | Directed evolution and selection for altered methylation specificity | ( |
| 2009 | Alteration of specificity of type IIG enzyme (MmeI) | Informatics covariation analysis and structure-based modeling | ( |
Figure 1. Structure and mode of action of zinc fingers and zinc finger nucleases. Panel A: Structure of the Zif268-DNA complex showing the three zinc fingers of Zif268 bound in the major groove of the DNA. Fingers are spaced at 3-bp intervals. The DNA is grey; the zinc ions are dark teal spheres. The structure and primary DNA contacting residues of zinc finger #2 (ZF2) are indicated to the right. A sequence alignment of the 3 fingers of Zif268 is shown below. The zinc binding Cys2-His2 motif is indicated with blue bold font; the canonical DNA-contacting residues are indicated by arrows. Panel B: Modular assembly of a three-finger protein from individual fingers. To generate a zinc finger protein (ZFP) with specificity for the sequence GGGGGTGAC, three fingers are identified that each bind a component triplet. These fingers are then linked. Panel C: Sketch of a pair of zinc finger nuclease (ZFN) subunits bound to two halves of a DNA target. Each ZFN contains the cleavage domain of FokI linked to an array of three to six zinc fingers (four are shown here) that have been designed to specifically recognize sequences (blue and red boxes) that flank the cleavage site. A small number of bases separate the ZFN targets. The FokI nuclease domains transiently dimerize across those central bases and cleave each DNA strand to generate a double strand break with 5′ overhangs averaging 4 bases in length.
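As an illustration of the modular assembly described for Panel B, the first step is to split the 9-bp target into its three component triplets, each of which is then matched to an individual finger. The sketch below shows only that splitting step. The function name `split_into_triplets` is a hypothetical helper, not something from the review, and the sketch assumes a simple left-to-right reading of the target; it does not model finger orientation or context effects between neighboring fingers.

```python
from typing import List


def split_into_triplets(target: str) -> List[str]:
    """Split a DNA target into consecutive 3-bp subsites (hypothetical helper)."""
    target = target.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    # Step through the sequence three bases at a time.
    return [target[i:i + 3] for i in range(0, len(target), 3)]


# The 9-bp example sequence from the caption.
triplets = split_into_triplets("GGGGGTGAC")
print(triplets)  # ['GGG', 'GGT', 'GAC']
```

Each triplet would then be looked up in a collection of characterized fingers, which is the part of the workflow that this sketch deliberately leaves out.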
Figure 2. Structure and reprogramming of a meganuclease. Panel A: Structure and original target site of the I-OnuI meganuclease. The protein consists of a single protein chain of 290 residues and is bound to a 22 base pair DNA target site. The N- and C-terminal domains of the endonuclease, which possess the same overall protein fold related by a pseudo two-fold symmetry axis, recognize and interact with the 5′ and 3′ half-sites of the DNA target site, respectively. The interface between the target 5′ half-site and the protein N-terminal domain is indicated by the oval. Panel B: Schematic of immediate contacts between the DNA 5′ half-site and the meganuclease N-terminal domain (corresponding to the oval in panel A). Bases and protein residues in blue boxes correspond to elements shown in panel C. Panel C: Region corresponding to the contacts between two consecutive base pairs in the DNA target site (indicated by the blue box in panel B) and the six nearest neighboring protein side chains (also indicated in panel B with blue boxes). In a typical selection experiment, a cluster of at least six such residues is simultaneously randomized and incorporated into a combinatorial protein library for subsequent screening against a DNA substrate containing the desired base pairs at the corresponding nucleotide positions. Panel D: DNA-bound structure of a fully reprogrammed variant of the I-OnuI enzyme, harboring selected point mutations at 50 residues in the protein–DNA interface (corresponding to ∼17% of the total protein sequence; indicated with red spheres spanning the side chains of each altered residue). The engineered protein, which recognizes a DNA sequence that differs from the original target at over half of its base pair positions (12 out of 22; the altered basepairs are indicated by lower case letters), displays an rmsd across all backbone atoms of only 0.6 Å. A structural superposition of the wild-type enzyme and its fully redesigned variant is shown to the right (engineered enzyme is colored blue).
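The proportions quoted for Panel D can be checked directly from the numbers given in the caption (50 mutated residues out of 290, and 12 altered base pairs out of 22). The short calculation below is only an arithmetic check; the variable names are chosen here for illustration.

```python
# Values taken from the Figure 2 caption.
mutated_residues = 50
total_residues = 290
altered_base_pairs = 12
total_base_pairs = 22

# Fraction of the protein sequence that was mutated.
residue_fraction = mutated_residues / total_residues
# Fraction of target-site positions that differ from the original site.
base_pair_fraction = altered_base_pairs / total_base_pairs

print(f"{residue_fraction:.1%}")    # 17.2%
print(f"{base_pair_fraction:.1%}")  # 54.5%
```

The results agree with the caption's "∼17%" and "over half" of the base pair positions.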
Figure 3. Structure of the TAL effector–DNA association and the basis of specificity. Panels A and B: The structure of the PthXo1 binding region (composed of 22 TAL effector repeats distributed along a single monomeric protein chain) bound to its DNA target site is shown from the side of the DNA duplex and looking down the axis of the DNA. The effector contains 22.5 repeat modules, each colored separately. In the side view, the N-terminal end of the protein is leftmost. The structure also contains two cryptic N-terminal repeats that engage the DNA backbone via a series of basic residues, and that contact a strongly conserved thymine at the 5′ position of the binding site. Panel C illustrates the contacts made by the HD RVD (residues 12 and 13) in repeat number 14. The histidine at position 12 in the repeat forms a hydrogen bond to the backbone carbonyl oxygen of residue 8 in the first α-helix, while the aspartate at position 13 forms a hydrogen bond to the extracyclic amino nitrogen of the cytosine base. Panel D shows repeats 14, 15 and 16 interacting with the DNA, illustrating that consecutive RVDs (HD, NG and NN, respectively, in these repeats) contact consecutive bases (in this case cytosine, thymine, and guanine) on the same DNA strand. Figure adapted with permission from Figure 1 in Doyle et al. (2013) Trends in Cell Biology 23(8):390–398.
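The one-repeat-to-one-base relationship described for Panel D can be sketched as a simple lookup. The mapping below contains only the three RVD and base pairings named in the caption (HD with cytosine, NG with thymine, NN with guanine); other RVDs are deliberately omitted, and the function name `predict_target` is a hypothetical helper used for illustration rather than an established tool.

```python
from typing import Dict, List

# Only the three pairings stated in the caption; not a complete RVD code.
RVD_TO_BASE: Dict[str, str] = {"HD": "C", "NG": "T", "NN": "G"}


def predict_target(rvds: List[str]) -> str:
    """Map consecutive RVDs to consecutive bases on one DNA strand."""
    bases = []
    for rvd in rvds:
        if rvd not in RVD_TO_BASE:
            raise KeyError(f"RVD {rvd!r} is not in this illustrative table")
        bases.append(RVD_TO_BASE[rvd])
    return "".join(bases)


# Repeats 14, 15 and 16 from Panel D.
print(predict_target(["HD", "NG", "NN"]))  # CTG
```

This treats each repeat independently, which is a simplification: the table above records several studies on context, mismatch tolerance and backbone effects that a plain lookup does not capture.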
Figure 4. Site-specific recombinase (SSR) modes of action. Panel A: SSRs are capable of catalyzing excision, insertion, or inversion reactions. Panel B: Tyrosine recombinases such as Cre, Flp and λ integrase proceed through a Holliday junction intermediate. The catalytic domains have pseudo four-fold symmetry and engage palindromic sequences (arrows). Panel C: The topology of the λ integrase catalytic domains is similar to that of simple tyrosine recombinases like Cre. However, λ integrase also contains N-terminal DNA-binding domains (top set with arrows) that are critical for site-specific DNA recognition. Panel D: The catalytic domains (center) of serine integrases break both DNA strands before rotating relative to one another and religating the DNA. DNA binding domains (top and bottom) of the wild-type enzymes can be replaced with zinc fingers to customize specificity.
Figure 5. Regions of Cre recombinase particularly important for recognition. The DNA is shown in grey, and key regions of the protein involved in DNA recognition as described in the main text are highlighted with labels and side chain atom spheres.
Figure 6. Engineering altered DNA specificity of the MmeI restriction endonuclease via a bioinformatics-driven approach. Panel A: Sequence alignment of target sites and key specificity determining region of the C-terminal domain of MmeI and 19 homologues that have known specificities. The positions of base pair 6 in each enzyme's target site, and the residues in the enzyme that display significant covariation against that target position, are highlighted. Panel B: Ribbon diagram of the crystal structure of MmeI bound to its DNA target, demonstrating the distribution and separation of the enzyme's endonuclease catalytic site (‘REase’), methyltransferase active site (‘MTase’) and target recognition domain (‘TRD’). Panel C: Close-ups illustrating (left) the experimentally observed positions of residues E806 and R808 that were found to display direct contacts to base pair 6 in the wild-type DNA-bound crystal structure of MmeI, and (right) a corresponding model of the same two residues, after introduction of mutations (to K and D, respectively) that were predicted and later found to alter specificity from a G:C to a C:G base pair. The ability to systematically alter the specificity of this enzyme is facilitated by the availability of a large number of sequenced enzyme homologues with corresponding known target sites and by a protein architecture in which endonuclease catalytic activity is less intimately coupled to target recognition and binding.