Andrew Currin1,2, Jane Kwok2, Joanna C Sadler2, Elizabeth L Bell3, Neil Swainston1,2, Maria Ababi3,4, Philip Day3, Nicholas J Turner1,2, Douglas B Kell1,2.
1. Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom.
2. School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom.
3. Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, United Kingdom.
4. School of Computer Science, The University of Manchester, Manchester M13 9PL, United Kingdom.
Abstract
Directed evolution requires the creation of genetic diversity and subsequent screening or selection for improved variants. For DNA mutagenesis, conventional site-directed methods implicitly utilize the Boolean AND operator (creating all mutations simultaneously), producing a combinatorial explosion in the number of genetic variants as the number of mutations increases. We introduce GeneORator, a novel strategy for creating DNA libraries based on the Boolean logical OR operator. Here, a single library is divided into many subsets, each containing different combinations of the desired mutations. Consequently, the effect of adding more mutations on the number of genetic combinations is additive (Boolean OR logic) and not exponential (AND logic). We demonstrate this strategy with large-scale mutagenesis studies, using monoamine oxidase-N (Aspergillus niger) as the exemplar target. First, we mutated every residue in the secondary structure-containing regions (276 out of a total 495 amino acids) to screen for improvements in kcat. Second, combinatorial OR-type libraries permitted screening of diverse mutation combinations in the enzyme active site to detect activity toward novel substrates. In both examples, OR-type libraries effectively reduced the number of variants searched up to 10¹⁰-fold, dramatically reducing the screening effort required to discover variants with improved and/or novel activity. Importantly, this approach enables the screening of a greater diversity of mutation combinations, accessing a larger area of a protein's sequence space. OR-type libraries can be applied to any biological engineering objective requiring DNA mutagenesis, and the approach has wide ranging applications in, for example, enzyme engineering, antibody engineering, and synthetic biology.
Keywords:
biocatalysis; directed evolution; mutagenesis; protein engineering; synthetic biology
Natural evolution is based on
the random creation of genetic diversity and subsequent selection
of a desired fitness.[1] Directed evolution
attempts to improve on and speed up this process in the laboratory.
Genetic diversity is generated for a target gene, enabling the discovery,
selection and isolation of variants encoding an improvement in the
desired fitness (e.g., increased activity). This process can then
be repeated iteratively to improve the properties of the target molecule
until an adequate fitness is achieved.[2−11] This general process is widely used for the engineering of biocatalysts,
enabling the development of enzymes for applications in industrial
biotechnology.[10,12−15]

Since Sewall Wright’s
original conception,[16] the relationship
between a protein’s sequence and
its function(s) is often referred to as a “fitness landscape”.[17−21] It is conventionally visualized with the position in the sequence
space represented via Cartesian x- and y-coordinates and with fitness as a “height”. Proteins
are known to exhibit rugged landscapes, where a variety of constraints[22] means that sequences of high fitness are surrounded
by areas of lower or even negligible activity or fitness.[23−27] The objective of directed evolution is therefore to navigate this
landscape to discover the best fitnesses possible.[10,28] Unfortunately, the size of sequence space (the total number of possible
sequences) is vast and impossible to test exhaustively, even for short
peptides (full randomization of just 50 amino acids would produce
a library of 20⁵⁰, ∼1 × 10⁶⁵). As
with any combinatorial search problem,[29−32] the experimenter therefore needs
to devise a strategy that can search efficiently for improved variants
while at the same time making libraries of a size that can be screened
or selected for in the laboratory.

Some approaches have utilized
reduced[33] or “smart” library
strategies,[34−38] which decrease the redundancy of the mixed-base codons
used and hence the level of diversity at individual residues. However,
even with these methods, the library size quickly becomes too large
to test experimentally when looking to mutate multiple amino acids
(typically four or more). This problem arises because simultaneous
mutation of multiple amino acids leads to a combinatorial explosion,
as the size of the search space increases exponentially as the number
of residues increases.[10,32,39] This is equivalent to the AND operator of Boolean logic (e.g., mutate
{residue 1} AND {residue 2} AND {residue 3}, etc.).

Here, we
provide an experimental implementation of the Boolean
OR function for library generation for directed evolution (e.g., mutate
{residue 1} OR {residue 2} OR {residue 3}, etc.). The effect on library
size of mutating multiple residues in this way is therefore additive
and not multiplicative/exponential. We demonstrate that this strategy
can be employed to reduce the library size significantly (often by
many orders of magnitude), as well as decreasing its complexity, enabling
the mutation of a larger number of regions in the same library. This
has the highly desirable effect of significantly reducing the overall
size of the library, while still testing all the desired codons and
mutations.

We demonstrate the benefit of OR-type libraries through
two approaches
using monoamine oxidase-N (MAO-N) as the exemplar enzyme target. First,
a large-scale mutagenesis approach was adopted, mutating 276 amino
acids of MAO-N (of a total of 495 amino acids); these account for
every residue known to exhibit a secondary structure. Our approach
permitted several (typically up to 12) amino acids to be mutated in
a single library without the combinatorial explosion that would occur
when using AND-type libraries. We identified multiple variants with
increased kcat toward both native and
non-native amine substrates, including novel activity for new substrates.
Second, we created combinatorial OR-type libraries for a Combinatorial
Active-Site Saturation Test (CASTing[40,41]). Using this
approach, 10 active-site residues were mutated simultaneously, such
that many different combinations of two-residue mutations were tested
in one library. These combinatorial mutations reduced the library
size ∼4.4 × 10¹⁰-fold compared to simultaneous
randomization of all residues (AND mutations). This enabled the screening
of a library with more diverse mutations compared to conventional
methods, and the rapid discovery of a new variant exhibiting activity
toward two novel substrates.
Results and Discussion
Asymmetric PCR for the Generation of OR-Type Libraries
Numerous studies have utilized asymmetric PCR for
the purposes of site-directed mutagenesis.[39−43] In this two-step process, the first step
consists of an asymmetric PCR that generates a single-stranded DNA
(ssDNA) product, created by using an unequal concentration of DNA
oligonucleotide primers. The lower concentration (limiting) primer
encoding the mutations (termed “mutagenic primer”, MP)
becomes depleted during the early cycles of the PCR, after which the
corresponding high concentration (excess) primer continues to amplify
the amplicon. This generates a ssDNA product encoding all the mutations
encoded by the mutagenic primers. Following purification, this product
is then used as a “megaprimer” in a second PCR to amplify
the full-length gene encoding the library (Supporting Information S1).

An important advantage of using asymmetric
PCR is that, given that the mutagenic primers are depleted, it ensures
that all mutations encoded by the primers are present in the final
library. We exploit this by using multiple different mutagenic primers
in a single reaction to create mutations at different positions in
the DNA sequence. If these primers anneal to the same position in
the DNA, the final library will conform to Boolean OR logic (i.e.,
each DNA strand encoding a mutation from {MP1} OR {MP2}). All such
primers binding the same position on the DNA template are herein referred
to as a “set”. For example, for a simple “set”
containing three mutagenic primers, the library is therefore composed
of DNA strands with mutations from either MP1 OR
MP2 OR MP3 (Supporting Information S1).
Consequently, upon transformation into cells, each clone from this
library will encode a protein variant with these OR-type mutations
which can then be screened (Figure 1). Another benefit is that OR-type libraries simplify
the generation of multiple variant libraries by condensing the number
of samples required for synthesis, for example, combining three different
libraries into one tube (following the example in Figure 1). Extending this approach
to include multiple sets creates more complex combinatorial OR-type
mutations, discussed further below (see Active Site Mutagenesis Using Combinatorial OR-Type Mutations).
Figure 1
An example of OR-type mutations. When randomizing
multiple amino
acids (here residues “SIK”, 20 possible amino acids
for each position), conventional approaches mutate each residue simultaneously.
This “AND-type” mutagenesis approach creates large numbers
of variants, as the impact of each position is multiplicative (20
× 20 × 20 = 8000). In contrast, OR-type libraries can randomize
any one of these same amino acids, but not all together. In this example
the impact of each position is additive (20 + 20 + 20 = 60), thus
significantly reducing the size and complexity of the variant library.
From another perspective, this OR-type approach is simplifying the
generation of multiple libraries by synthesizing three different randomized
libraries in one tube.
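The arithmetic in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the counting argument; the function names are ours and are not part of any published GeneORator tooling.

```python
from math import prod


def and_library_size(variants_per_position):
    """AND-type: every position is randomized at once, so the counts multiply."""
    return prod(variants_per_position)


def or_library_size(variants_per_position):
    """OR-type: only one position is randomized per strand, so the counts add."""
    return sum(variants_per_position)


# Residues "SIK" from the caption, 20 possible amino acids at each position.
sik = [20, 20, 20]
print(and_library_size(sik))  # 8000
print(or_library_size(sik))   # 60
```

Adding a fourth position with 20 options would raise the AND-type count to 160000 but the OR-type count only to 80.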
OR-Type Libraries for Large-Scale Mutagenesis of Monoamine Oxidase-N
MAO-N is an important industrial biocatalyst
that oxidizes a variety of primary, secondary, and tertiary amines.[42−49] Wild type MAO-N (uniprot: P46882) exhibits strong activity toward
primary amine substrates (see MAO-N Activity to Native Primary Amine Substrates) that are believed to be similar to the
native substrates (rates referred to as “wild type speed”).
In contrast, the wild type enzyme exhibits very low activity (kcat = 0.17 min⁻¹) toward the
primary amine α-methylbenzylamine (α-MBA, chebi:CHEBI:670);
however, previous directed evolution studies have generated a variant
(I246M/N336S/M348K/T384N/D385S, termed D5[42,44,46]) with a kcat of 154 min⁻¹ for α-MBA. Hence we devised
a strategy to seek variants with a “wild type speed” kcat toward the non-native substrate α-MBA.

Our large-scale mutagenesis strategy is guided by the understanding
that amino acids throughout the protein structure, often distal to
the active site, have a significant effect on the efficiency of catalysis
(kcat/Km or kcat).[10] Hence, creating
mutations throughout the protein structure will enable us to detect
those variants with significantly increased kcat for a panel of native and non-native amine substrates.

Given that protein secondary structure often follows a regular
binary pattern of polar (P) and nonpolar (NP) residues (e.g., amphiphilic
helices can follow a P–NP–P–P–NP–NP–P
pattern[50,51]) one strategy to ensure that the majority
of the searched sequence space encodes proteins with similar secondary
structure is to follow this semiconservative binary pattern,[52] such that the tertiary structure is more-or-less
conserved in order to maximize the likelihood of preserving function.
Hence, we devised a novel codon mutagenesis approach to increase the
proportion of functional protein variants by binary patterning (Supporting Information S2). For example, when
Leu is the starting amino acid, we mutated it using the NTN codon
(N = A, T, G, or C) to encode Phe, Leu, Ile, Met, or Val. Similarly,
small side-chain amino acids were mutated to others with small side-chains
(Ala, Gly, Val) and polar residues with H-bonding potential were mutated
to other similar residues (Ser, Tyr, Cys, Thr). Secondary structure
predictions support the hypothesis that our variants maintain the
α-helical and β-sheet content of the native protein, significantly
more so than full amino acid randomization (Figure 2). Consequently, our strategy
is calculated to search a more “functional sequence space”.
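As an illustration of how an ambiguous codon maps to a restricted set of amino acids, the following Python sketch expands a degenerate codon and translates each resulting codon with the standard genetic code. It is not CodonGenie and does not reproduce the design rules of Supporting Information S2; the helper names are ours, and only the IUPAC symbols needed here (N and K) are included.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes (assumption: only N and K are needed here).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

# Standard genetic code, codons ordered T, C, A, G at each position; "*" is a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_amino_acids(degenerate_codon):
    """Sorted one-letter amino acids (and '*' for stop) a degenerate codon encodes."""
    return sorted({CODON_TABLE[c] for c in expand(degenerate_codon)})


print(encoded_amino_acids("NTN"))  # ['F', 'I', 'L', 'M', 'V']
print(len(expand("NNK")))          # 32
```

NTN therefore covers Phe, Ile, Leu, Met, and Val with 16 codons, whereas NNK uses 32 codons to cover all 20 amino acids plus one stop codon (TAG).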
Figure 2
Secondary structure predictions of a binary pattern library. Using
α-helix [188]–[198] as an example, the probability of forming an
α-helix was calculated (using NetSurfP) for every variant encoded
by our mutagenesis strategy. For each amino acid the
calculated probability is shown (mean and standard deviation), comparing
our approach to full randomization using the NNK codon. Our variants
are predicted to exhibit similar levels of secondary structure compared
to the parent sequence, significantly higher than using full randomization,
thus supporting our strategy for maintaining secondary structure in
our variants.
In this study, every
amino acid in MAO-N D5 exhibiting secondary
structure was mutated according to our mutagenesis design (see below),
totalling 276 amino acids. Mutagenic primers were limited to mutating
three amino acids or fewer, using our design of ambiguous codons (Supporting Information S2). These primers were
used as part of a “set” to mutate single strand α-helices
or β-sheets in one library. In one example (Figure 3, Supporting Information S3), four mutagenic primers were created in one
set to mutate 11 consecutive codons. Simultaneous mutation of all
11 codons together (the same mutations but using AND-type mutations)
would create 5.9 × 10¹¹ genetic combinations, whereas
a corresponding OR-type library encodes 5136 combinations, a 1.1 ×
10⁸-fold reduction.
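One way to read these numbers is that codons within a single mutagenic primer are mutated together (AND), while the primers in a set are alternatives (OR). The Python sketch below works through that reading. The per-codon counts are hypothetical placeholders, since the real values are given only in Figure 3 and Supporting Information S3, so the example does not reproduce the 5136 figure. The final line only checks the fold reduction reported in the text.

```python
from math import prod


def or_set_size(primers):
    """Size of one OR-type set: product within each primer, summed across primers."""
    return sum(prod(codon_counts) for codon_counts in primers)


def and_size(primers):
    """Size if every codon in every primer were mutated simultaneously."""
    return prod(prod(codon_counts) for codon_counts in primers)


# HYPOTHETICAL variant counts per codon for four primers covering 11 codons.
primers = [[5, 4, 3], [5, 5, 3], [4, 3, 5], [5, 4]]

print(or_set_size(primers))  # 215
print(and_size(primers))     # 5400000

# Fold reduction from the values reported in the text.
reported_fold = 5.9e11 / 5136
print(f"{reported_fold:.1e}")  # 1.1e+08
```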
Figure 3
An example of how OR-type libraries were
used to mutate the secondary
structure of MAO-N in this study. (A) The selected α-helix (residues
[188]–[198], 11 amino acids) was divided into four, each mutating
2 or 3 amino acids with a mutagenic primer (MP). The number of variants
per residue is highlighted, and these follow our mutagenesis strategy.
OR-type libraries produced by this approach encoded 5136 genetic combinations,
a 1.1 × 10⁸-fold reduction compared to simultaneous
AND-type mutations. (B) The design of DNA oligonucleotides for mutagenesis
is shown for this α-helix, aligned to the target amino acid
sequence and corresponding DNA sequence.
MAO-N Improved Variants to Non-native Amine Substrates
Using the previously described colony-based screening
method to analyze oxidase activity by detection of hydrogen peroxide,[42,53] we screened every OR-type library using α-MBA, attempting
to improve the kcat toward this non-native
substrate. For each library, the top (fastest) colonies were selected
and the DNA sequenced. Sequences that showed a clear selection for
a new variant (e.g., a mutation selected multiple times) were characterized.

We identified four variants with an elevated kcat compared to that of the D5 variant (Supporting Information S4). One variant, A289V (kcat = 242 min⁻¹), exhibited a 1.6-fold
increase over its parent D5. We compiled all the screening data
to understand the mutability of each mutated amino acid, providing
an insight into the in vitro selection of every amino
acid mutated in the study (Figure 4). We discovered strong selection for 120 residues,
where the amino acid encoded in the parent sequence was invariant.
Conversely, many amino acids were tolerant of several different mutations
while still maintaining good catalytic activity. In total, of those
assessed, 53 residues could accommodate one alternative residue, 44 could accommodate
two mutations, and 50 could accommodate three or more mutations. High-frequency
selection for a new mutation was discovered for nine amino acids and
each of these mutations was characterized (above). We also found that
a strong selection for native (parent D5) residues was more frequently
observed for amino acids closer to the protein core and to the FAD
cofactor.
Figure 4
Enzymatic improvements for selected MAO-N variants. (A) The most
significant improved activity to the primary target non-native substrate
α-methylbenzylamine was demonstrated by the D5 variant A289V,
exhibiting a 1.6-fold increase over D5. (B) Every amino acid
mutated in this study is shown, with its color denoting whether it
(i) showed strong selection for the wild-type amino acid (red); (ii)
exhibited robustness, where at least one alternative mutation could
be accommodated while still maintaining activity (green); and (iii)
exhibited strong selection for a new mutation that increased kcat (blue). (C) Amino acid selection (as in
panel B) showing the secondary structure elements. Images generated
using PyMol using MAO-N D5 structure (2vvm). (D) Improved activity
to three native amine substrates was shown by the D5 variant F128L,
with a kcat between 1.6- and 2.25-fold higher
than the WT, and between 2.2- and 3-fold higher than the parent D5 variant.
MAO-N Activity to Native Primary Amine Substrates
In addition to
characterizing MAO-N variants toward α-MBA,
we also tested our variants against the native WT substrates, where
several variants also exhibited increased activity (Figure 4, Supporting Information S5). Interestingly, the best α-MBA variant
(A289V) was not the fastest toward these substrates, but F128L was
faster for all three “native” substrates. F128L activity
to N-amylamine (AA, chebi:CHEBI:74848, 655 min⁻¹) is the highest kcat published
for MAO-N for any substrate to date, 1.7-fold higher than the WT and
3-fold faster than its parent D5 variant.
MAO-N Activity to Novel Substrates
No published MAO-N variants
to date (including WT and D5) exhibit
detectable activity toward the primary amine cyclohexylamine (CHA,
chebi:CHEBI:15773). However, we detected activity (kcat = 17 min⁻¹, Supporting Information S5) for one of our variants (A266V).
To improve this activity, we created double mutants combining A266V
with other mutations found in this study (both neutral and positive
for activity). Interestingly, combining A266V with other mutations
known to improve activity for other substrates (e.g., F128L) did not
improve activity for CHA (kcat = 15 min⁻¹). However, combining A266V with C50T did improve
activity over 2-fold (kcat = 38 min⁻¹). Interestingly, the C50T mutation alone does not
improve activity (kcat toward α-MBA,
AA, BTA, and BZA is not increased), thus demonstrating an unpredictable
epistatic interaction between A266V and C50T. Given that neither C50
nor A266 is positioned in the active site (29 and 16 Å from
the FAD amine where catalysis occurs, respectively, Supporting Information S6), such data show that residues distal
to the active site also contribute to substrate specificity, and
that mutagenesis of these residues can yield variants with activity
toward novel substrates.[54]
Active Site Mutagenesis Using Combinatorial OR-Type Mutations
We envisaged that the benefit of OR-type
mutations becomes more significant when this method is applied to
screening multiple combinatorial mutations, given that its additive
nature prevents the combinatorial explosion of mutation combinations
associated with conventional AND-type libraries. It is worth noting
that combinatorial OR-type mutations can also be described as AND–OR
mutations. To demonstrate this, we created OR-type combinatorial mutations
for 10 amino acids in and around the active site of MAO-N for CASTing.
The residues were divided into two sets (each containing five amino
acids) and the megaprimers for each set were pooled together in the
second PCR step to create combinatorial OR-type mutations (Figure 5, Supporting Information S7). Each amino acid was mutated using
the NNK codon (32 possible codons encoding all 20 amino acids).
Consequently, in this CASTing library every amino acid substitution
for all five amino acids in set 1 was combined with every amino acid
substitution in set 2. Mutation of all 10 amino acids together (AND-type
library) would create ∼1 × 10¹⁵ codon combinations
(= 32¹⁰), whereas our library encodes 25600 combinations,
a 4.4 × 10¹⁰-fold reduction in DNA library size. Alternatively,
to recreate each of these mutation combinations without OR-type libraries
would require the synthesis of 25 separate libraries. This demonstrates
the benefit of combinatorial OR-type mutations for the screening of
many mutation combinations, significantly reducing the experimental
effort of creating all the different mutations separately. Effectively
this strategy permits the screening of a more diverse number of mutation
combinations quickly in the search for improved and novel enzyme function.
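The library sizes quoted for the CASTing experiment follow directly from the design parameters given above (two sets, five residues per set, one NNK codon per mutagenic primer). A minimal Python check, with variable names chosen here for illustration only:

```python
NNK_CODONS = 32        # codons encoded by NNK
RESIDUES_PER_SET = 5   # one mutagenic primer per residue
N_SETS = 2

# AND-type: all 10 residues randomized simultaneously.
and_size = NNK_CODONS ** (RESIDUES_PER_SET * N_SETS)  # 32**10

# Combinatorial OR-type: within a set the primers are alternatives (sizes add),
# and the two sets are combined with each other (sizes multiply).
set_size = RESIDUES_PER_SET * NNK_CODONS  # 160
or_size = set_size ** N_SETS              # 25600

fold_reduction = and_size / or_size

# Number of separate two-residue libraries needed without the OR-type approach.
pairwise_libraries = RESIDUES_PER_SET ** N_SETS  # 25

print(f"{and_size:.1e}")        # 1.1e+15
print(or_size)                  # 25600
print(f"{fold_reduction:.1e}")  # 4.4e+10
print(pairwise_libraries)       # 25
```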
Figure 5
Combinatorial OR-type libraries for CASTing. (A) OR-type mutations
at two separate positions in a sequence generate combinatorial OR-type
mutations, where all different combinations of mutations from set
1 and set 2 are created, such that optimal paired mutations can be
discovered. Simultaneous mutagenesis with the NNK codon using conventional
AND-type mutations produces over 10¹⁵ genetic combinations,
while the corresponding combinatorial OR-type library encoded 25600
combinations, a 4.4 × 10¹⁰-fold reduction in library
size. (B) Sets 1 (residues [209]–[213]) and 2 ([241]–[245])
were selected as they sit on either side of the MAO-N active site
channel. Both sets contained five mutagenic primers, each randomizing
one amino acid. The library created every mutation combination between
sets 1 and 2, that is, {[MP1] AND [MP1]} OR {[MP1] AND [MP2]}, etc.
(C) The “hit” combination, exhibiting novel activity
to non-native substrates, encoded mutations at the [1] (A209S) and
[5] (L245C) positions.
Screening of the CASTing library identified a new variant
(D5 A209S/L245C)
with novel activity to two non-native substrates (1-(3-bromophenyl)ethan-1-amine
and 1-(3-methoxyphenyl)ethan-1-amine; Supporting Information S8).[55] These mutations
were encoded in positions [1] and [5] in sets 1 and 2 (respectively, Figure 5), a combination
that could not realistically have been predicted by structural or
sequence analysis, thus demonstrating the benefit of our approach.
Conclusions
In this study we demonstrate a methodology to
create a novel type
of variant library, whereby multiple discrete DNA regions can be mutated
in an OR-type fashion. The result is that each region contributes
an additive effect to the total library size (Boolean OR logic), in
contrast to conventional site-directed mutagenesis methods (utilizing
AND logic) where multiple mutations create a combinatorial explosion.
Boolean logic rules have recently been exploited in different biological
applications, most notably in synthetic biology to provide control
over cellular systems and pathways. Siuti et al.[56] have implemented logic gate functions in E. coli using recombinases, while others utilize small molecules.[57,58] However, to our knowledge GeneORator is the first application of
Boolean OR logic for the construction of variant protein libraries
for directed evolution.

Here, we exploit OR-type libraries to
implement a novel mutagenesis
scheme based on the binary patterning feature of protein secondary
structure. We devised an ambiguous codon design strategy and used
this to mutate every amino acid in the secondary structure-containing
regions of MAO-N. This design sought to conserve the pattern of polar
and nonpolar residues present in the MAO-N sequence, an approach predicted
to improve the proportion of variants with the secondary structure
required to create the tertiary fold required for catalysis. Taken
together, our mutagenesis methodology and library design enabled large-scale
mutagenesis studies to improve the search of “functional sequence
space”, in a way that is not economic (nor feasible) using
existing approaches. Regardless of the codon mutagenesis strategy,
we have demonstrated that our experimental approach was efficient
at generating the designed OR-type mutations for screening. A similar
strategy could therefore be employed for different enzyme targets
for which multiple mutations are to be created and screened.

In this study we discovered several residues distal to the active
site that conferred an increase in kcat in a manner that was not predictable from the knowledge of an amino
acid sequence, tertiary structure, or catalytic mechanism. Given that
these mutations are not predicted to alter the protein’s basic
secondary structure, it is expected that these mutations improve activity
through the alteration of protein dynamics during catalysis, rather
than via major ground-state structural changes (see also refs 59 and 60). Recently, Curado-Carballada
et al.[61] described molecular dynamics simulations
of MAO-N wild type and D5 variant, describing the presence of previously
unknown conformations during catalysis. Interestingly, the F128L variant
identified in this study is located close to a β-hairpin loop,
predicted to be involved in the recognition of the different substrates.

Given the knowledge of which variants had been screened in each
library we obtained sequence-activity data for every library that
was screened. Combining these data with those for the mutations that
increase kcat provides important information
on the selection pressure exerted on every residue in the secondary
structure during our screening. Interestingly, combining multiple
mutations known to improve activity together did not yield an additive
improvement; thus, no double mutants exhibited an increased kcat for α-MBA above the single mutants.
Accordingly, the highest activity variant for native substrates (F128L)
had a neutral effect on activity for CHA (variant D5 A266V/F128L)
while the neutral mutation C50T had an improved effect on CHA activity
(variant D5 A266V/C50T). These data serve to illustrate the highly
epistatic nature of this protein’s fitness landscape.

There is widespread interest in exploiting in silico learning algorithms for biological applications. Machine learning
provides the opportunity to learn complex sequence-activity relationships
and to predict variants with improved fitness.[17] Principled search algorithms such as “protein sequence
activity relationships” (ProSAR) have been used to help engineer
enzymes by creating partial least-squares (PLS) regression models,
and recent updates may accommodate epistatic interactions between
two residues.[28,62,63] We envisage that improved technology in DNA library synthesis and
“deep mutational scanning”[64,65] will empower learning algorithms to predict proteins with improved
fitness for a variety of directed evolution applications. Given the
complexity of protein sequence–activity relationships, especially
the importance of epistasis, learning algorithms require the ability
to design specific yet complex DNA libraries for screening. GeneORator
is capable of creating these libraries in a way that does not suffer
from the combinatorial explosion associated with conventional libraries,
and is a powerful tool in the rapid discovery of new biocatalysts
with improved and novel activity.
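The additive versus multiplicative scaling referred to above can be illustrated with a deliberately simplified counting model. In this sketch, an AND-type library contains every cross-region combination (sizes multiply), whereas an OR-type library holds each region's variants as a separate subset (sizes add). The region count and the use of three NNK codons per region are illustrative assumptions, not the libraries built in this study.

```python
from math import prod


def and_library_size(variants_per_region):
    """All regions mutated simultaneously: region sizes multiply."""
    return prod(variants_per_region)


def or_library_size(variants_per_region):
    """One region mutated per subset: region sizes add."""
    return sum(variants_per_region)


# Hypothetical example: 4 target regions, each with 3 codons randomized
# by NNK (32 codons per position), i.e. 32**3 DNA variants per region.
regions = [32 ** 3] * 4

n_and = and_library_size(regions)
n_or = or_library_size(regions)

print(f"AND-type library: {n_and:.3e} DNA variants")
print(f"OR-type library:  {n_or} DNA variants")
print(f"Fold reduction:   {n_and / n_or:.3e}")
```

Under these assumed numbers the OR-type library is many orders of magnitude smaller, at the cost of not sampling combinations across regions within a single subset.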
Materials and Methods
Design of Oligonucleotide Primers for OR-Type Libraries
The MAO-N D5 gene (uniprot:P46882) was designed using GeneGenie[66] and synthesized using the SpeedyGenes gene synthesis
method, as previously described.[67,68] In the design
of OR-type libraries, first the number of target regions and the number
of codons to be mutated were identified (typically up to four target
regions, each containing up to three codon mutations). Flanking sequences
to these target regions were selected, such that the annealing temperature
(Tm) was predicted to be 60 °C at
both the 5′ and 3′ termini. The relevant ambiguous codons
were designed by CodonGenie[38] then inserted
into the oligonucleotide sequence, depending on the amino acids present
in the parent D5 sequence. One mutagenic primer was designed for each
target region, such that every primer in a set shared the same 5′
and 3′ flanking sequences but encoded different target-region
mutations. Corresponding end PCR (nonmutagenic) primers were also
designed with a predicted annealing temperature (Tm) of 60 °C for the 5′ and 3′ termini
of the gene.
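The primer layout described above (5′ flank, ambiguous codons, 3′ flank) can be sketched as follows. This is a minimal illustration under stated assumptions: the gene fragment is a made-up sequence rather than MAO-N D5, NNK stands in for whatever ambiguous codon CodonGenie would propose, and the Wallace rule is used only as a crude stand-in for the Tm prediction, whose actual method is not specified here. All function names are hypothetical.

```python
from math import prod

# Number of bases encoded by each IUPAC nucleotide symbol.
IUPAC = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "S": 2, "W": 2,
         "K": 2, "M": 2, "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}


def wallace_tm(seq):
    """Rule-of-thumb Tm in deg C: 2*(A+T) + 4*(G+C). Crude stand-in only."""
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def flank(gene, start, step, target_tm=60):
    """Grow a flank from index `start` in direction `step` until Tm >= target."""
    out, i = "", start
    while 0 <= i < len(gene) and wallace_tm(out) < target_tm:
        out = out + gene[i] if step > 0 else gene[i] + out
        i += step
    return out


def mutagenic_primer(gene, codon_subs, target_tm=60):
    """codon_subs maps 0-based codon index -> ambiguous codon (e.g. 'NNK')."""
    first, last = min(codon_subs), max(codon_subs)
    region = "".join(codon_subs.get(c, gene[3 * c:3 * c + 3])
                     for c in range(first, last + 1))
    left = flank(gene, 3 * first - 1, -1, target_tm)
    right = flank(gene, 3 * last + 3, +1, target_tm)
    return left + region + right


def degeneracy(primer):
    """Number of distinct DNA sequences encoded by a degenerate primer."""
    return prod(IUPAC[base] for base in primer)


# Hypothetical 30-codon gene fragment (NOT the MAO-N D5 sequence).
GENE = "".join(("ATG GCT AGC AAG GTT CTG ACC GAT CCG GAA GCT TAC GGT CAG TTC "
                "ATC GAC GCT CTG AAA GGC ATT CCG GTT GAA GAC CTG CGT AAC TGG").split())

primer = mutagenic_primer(GENE, {14: "NNK", 15: "NNK"})
print(primer, degeneracy(primer))
```

Primers for other mutation sets in the same target region would reuse the same flanks and differ only in the substituted codons.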
Synthesis of OR-Type Libraries
DNA oligonucleotides were synthesized by Integrated DNA Technologies. For asymmetric PCR,
the reaction contained 25 nM mutagenic (limiting, forward read) primer
and 500 nM end (excess, reverse read) primer, with 0.5 ng μL⁻¹ template (MAO-N D5), 0.2 mM dNTP mix, Q5 reaction
buffer, and 0.02 U μL⁻¹ Q5 hot-start high-fidelity
polymerase (New England Biolabs) in 50 μL total volume. The
PCR consisted of denaturation at 98 °C for 30 s, then 25 cycles
of 98 °C for 20 s, 60 °C for 20 s, and 72 °C for 40
s. PCR products containing ssDNA were purified using a PCR purification
kit (Qiagen).

For symmetric PCR to assemble the full gene, the
ssDNA PCR product from asymmetric PCR was used as the megaprimer (reverse
read) together with the relevant end primer (forward read). The reaction
contained 16.5 μL of megaprimer, 500 nM end primer, and other
reagents as above. The PCR consisted of denaturation at 98 °C
for 30 s, then 25 cycles of 98 °C for 30 s, 60 °C for 20
s, and 72 °C for 40 s. For combinatorial OR-type libraries, megaprimers
were created for each set of mutations and pooled together in
the PCR above (also see ref 69). PCR products were visualized and purified by gel electrophoresis
and gel extraction (Qiagen kit) (Supporting Information S9). Purified libraries were ligated into a linearized expression
vector (pET16b, Novagen) using the In-Fusion cloning kit (Clontech),
following the manufacturer’s protocol. Quality control of the
synthesized libraries was performed using Sanger sequencing (Eurofins)
and next-generation DNA sequencing (Supporting Information S10–S11).
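For bench planning, the asymmetric PCR composition given above can be converted into absolute amounts per 50 μL reaction. The short sketch below does only this arithmetic, using the concentrations stated in the text; the helper name is hypothetical.

```python
REACTION_UL = 50.0  # total asymmetric PCR volume stated in the text


def pmol(conc_nM, volume_uL):
    """nM x uL gives femtomoles; divide by 1000 to obtain picomoles."""
    return conc_nM * volume_uL / 1000.0


limiting_pmol = pmol(25, REACTION_UL)   # mutagenic (limiting) primer
excess_pmol = pmol(500, REACTION_UL)    # end (excess) primer
primer_ratio = excess_pmol / limiting_pmol

template_ng = 0.5 * REACTION_UL         # 0.5 ng per uL template
polymerase_U = 0.02 * REACTION_UL       # 0.02 U per uL Q5 polymerase

print(f"limiting primer: {limiting_pmol} pmol")
print(f"excess primer:   {excess_pmol} pmol ({primer_ratio:.0f}:1 excess)")
print(f"template:        {template_ng} ng")
print(f"polymerase:      {polymerase_U} U")
```

The 20:1 primer ratio is what biases the reaction toward single-stranded product once the limiting primer is exhausted.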
Screening for MAO-N Activity
Ligation reactions were
transformed into E. coli competent cells (T7 Express,
New England Biolabs) and spread onto an LB agar plate (100 μg
mL⁻¹ ampicillin) covered with a Hybond-N membrane
(Amersham Biosciences). Following incubation overnight at 30 °C,
the membrane containing single colonies was transferred to an LB agar
plate (100 μg mL⁻¹ ampicillin and 1 mM IPTG)
and incubated for 2 h at 30 °C. Oxidase activity was then assayed
following the protocol outlined previously.[39,53] Briefly, the membrane containing colonies was transferred to a membrane
soaked in 0.1 mg mL⁻¹ HRP (Sigma) and 100 mM potassium
phosphate pH 7.7 for 30 min (the prescreen). Colonies were then transferred
to a membrane soaked in 0.1 mg mL⁻¹ HRP, DAB (Sigma),
2.5 mM α-methylbenzylamine (Sigma), and 100 mM potassium phosphate
pH 7.7. Oxidase activity was observed by the formation of a brown
DAB precipitate.

Colonies that exhibited the fastest color change
were picked and inoculated into LB (100 μg mL⁻¹ ampicillin) and grown overnight (37 °C, 180 rpm), and the plasmids
were extracted using a plasmid miniprep kit (Qiagen). The sequencing
of variants was performed as above.
Expression and Purification of MAO-N
Selected variants were overexpressed in E. coli BL21 (DE3) in
700 mL of LB medium with 100 μg mL⁻¹ ampicillin.
IPTG (0.5 mM) was added to the culture when the OD600 reached 0.6, and the culture was incubated at 25 °C,
180 rpm. Cells were harvested after 16–20 h, and the protein was purified using
a 5 mL HisTrap FF crude column (GE Healthcare) with an ÄKTA Explorer
100 protein purification system as described.[48]
Liquid Phase Kinetic Assay
Amine stock solutions
(α-methylbenzylamine, n-amylamine,
butylamine, benzylamine, or cyclohexylamine; all from Sigma) were
prepared in 0.1 M potassium phosphate pH 7.7. The final concentration
range of the substrate was between 0.5 mM and 100 mM. A colorimetric
assay solution was made up by dissolving Pyrogallol red (0.3 mM final
concentration) in 0.1 M potassium phosphate, pH 7.7. The assay was
conducted by combining 35 μL of substrate solution, 50 μL
of Pyrogallol red solution and 5 μL of horseradish peroxidase
(1 mg mL⁻¹) in a flat-bottom 96-well plate and started
by adding 110 μL of purified MAO-N. Assay progress was monitored
at 550 nm at 25 °C in a Molecular Devices SpectraMax M2 plate
reader. The data were analyzed using Prism 7 (GraphPad), which was
also used to calculate the kinetic parameters kcat, KM, and Vmax.
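To make the parameter estimation concrete, the sketch below fits Michaelis-Menten parameters to initial-rate data. It is not the Prism procedure used in this study: for a dependency-free illustration it swaps in a Hanes-Woolf linearization ([S]/v against [S]) solved by ordinary least squares. The rate values are synthetic and noise-free, and the enzyme concentration is a hypothetical number chosen only to show the conversion kcat = Vmax/[E].

```python
def michaelis_menten(s, vmax, km):
    """Initial rate at substrate concentration s."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rate):
    """Fit the line [S]/v = [S]/Vmax + KM/Vmax by ordinary least squares."""
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rate)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals KM / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free rates spanning the 0.5-100 mM range (illustrative values).
conc_mM = [0.5, 1, 2, 5, 10, 20, 50, 100]
true_vmax, true_km = 2.0, 5.0
rates = [michaelis_menten(s, true_vmax, true_km) for s in conc_mM]

vmax_fit, km_fit = fit_hanes_woolf(conc_mM, rates)

enzyme_conc = 0.01  # hypothetical [E], in the same concentration unit as the rate
kcat_fit = vmax_fit / enzyme_conc

print(f"Vmax = {vmax_fit:.3f}, KM = {km_fit:.3f} mM, kcat = {kcat_fit:.1f}")
```

With real, noisy data a direct nonlinear fit is generally preferable, because linearizations distort the error structure of the measurements.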
Data Availability
The data sets generated during the
current study are available from the corresponding author upon reasonable
request.