Joseph M Rogers¹, Toby Passioura¹, Hiroaki Suga²,³. 1. Department of Chemistry, Graduate School of Science, The University of Tokyo, Tokyo 113-0033, Japan. 2. Department of Chemistry, Graduate School of Science, The University of Tokyo, Tokyo 113-0033, Japan; hsuga@chem.s.u-tokyo.ac.jp. 3. Core Research for Evolutionary Science and Technology, Japan Science and Technology Agency, Saitama 332-0012, Japan.
Abstract
High-resolution structure-activity analysis of polypeptides requires amino acid structures that are not present in the universal genetic code. Examination of peptide and protein interactions with this resolution has been limited by the need to individually synthesize and test peptides containing nonproteinogenic amino acids. We describe a method to scan entire peptide sequences with multiple nonproteinogenic amino acids and, in parallel, determine the thermodynamics of binding to a partner protein. By coupling genetic code reprogramming to deep mutational scanning, any number of amino acids can be exhaustively substituted into peptides, and single experiments can return all free energy changes of binding. We validate this approach by scanning two model protein-binding peptides with 21 diverse nonproteinogenic amino acids. Dense structure-activity maps were produced at the resolution of single aliphatic atom insertions and deletions. This permits rapid interrogation of interaction interfaces, as well as optimization of affinity, fine-tuning of physical properties, and systematic assessment of nonproteinogenic amino acids in binding and folding.
The chemical structure of a polypeptide determines its activity, including any folding or binding. Specific changes to chemical structure (mutants) can be analyzed in large numbers (1). Deep mutational scanning methods, in particular, allow for the analysis of many thousands of mutants (2): mutant proteins or peptides are each coupled to their encoding DNA, libraries of pooled mutants are sorted for activity, mutants are counted via deep sequencing, and each mutant is scored. The throughput of these experiments is sufficient for exhaustive saturation mutagenesis; that is, testing all proteinogenic amino acids at all positions in a sequence and returning all effects on folding (3–5), binding (2, 5–8), or function (9). Extensive structure–activity maps are produced, but these methods are currently limited to the chemistry accessible within the universal genetic code: the 20 proteinogenic amino acids.
However, extending mutagenesis to include nonproteinogenic amino acids offers many advantages. The larger range of chemical structures allows for a finer dissection, or optimization, of molecular interactions, down to single aliphatic carbon insertions and deletions or functional group substitutions (10, 11). Certain nonproteinogenic amino acids can also improve the otherwise poor in vivo stability of short peptides and/or reduce the excessive polarity that prevents peptides crossing cell membranes (12, 13). Indeed, nonproteinogenic amino acids are abundant in peptide natural products (14) and peptides modified for in vivo use (10, 11).
Nonproteinogenic amino acid mutagenesis can also address longstanding questions about the fitness of the universal genetic code and its collection of amino acids relative to plausible prebiotic alternatives (15), as well as guide the development of synthetic polymers with the ability to fold (i.e., foldamers) (16, 17).
Exploration of nonproteinogenic amino acid mutants has previously been limited by the need to chemically synthesize and analyze peptides individually (10, 11). Recent advances in peptide synthesis have extended the numbers of nonproteinogenic mutants that can be constructed (18). Moreover, it is possible to combine parallel peptide synthesis with measures of function (19). However, these approaches cannot construct peptide libraries of the sequence length and number that deep mutational scanning can, which, at its core, uses high-fidelity nucleic acid-directed synthesis of polypeptides by the ribosome.
Ribosomal synthesis (i.e., translation) can be manipulated to include nonproteinogenic amino acids (20). In vitro genetic code reprogramming is particularly versatile, allowing for the incorporation of amino acids with diverse chemical structures (21). Flexizymes, flexible tRNA-acylation ribozymes, can load almost any (ester-activated) amino acid onto any tRNA, and these loaded tRNAs can be added to reconstituted in vitro translation systems to replace proteinogenic amino acids in the genetic code (Fig. 1 and ). These “reprogrammed” genetic codes allow for the one-pot synthesis of trillions of unique, nonproteinogenic amino acid-containing peptides derived from a pool of mRNA sequences. Members of these peptide libraries can be coupled to their encoding mRNA/cDNA, allowing for the isolation of functional, highly nonproteinogenic peptides; most notably, de novo macrocyclic peptides (22) from the random nonstandard peptide integrated discovery (RaPID) system (23–26).
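The idea of a “reprogrammed” genetic code can be illustrated with a toy translation model. This is a sketch under stated assumptions, not a model of the in vitro system: the codon table is deliberately partial, translation initiation is ignored, and simply reassigning AUG from Met to norvaline (Nva, one of the amino acids named in this study) stands in for supplying a flexizyme-charged tRNA. The function names and the mRNA are hypothetical.

```python
# Toy model of genetic code reprogramming (illustrative only; not the real system).
# Only the codons used below are listed; a full table would cover all 64 codons.
BASE_CODE = {
    "AUG": "Met", "GCU": "Ala", "GAU": "Asp", "CUG": "Leu", "UAA": "STOP",
}


def reprogram(code, codon, new_amino_acid):
    """Return a copy of `code` in which `codon` now encodes `new_amino_acid`."""
    new_code = dict(code)
    new_code[codon] = new_amino_acid
    return new_code


def translate(mrna, code):
    """Read an mRNA string three bases at a time until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


# Hypothetical mutant mRNA in which one codon has been changed to AUG.
mrna = "CUGGCUAUGGAUUAA"
print(translate(mrna, BASE_CODE))                           # ['Leu', 'Ala', 'Met', 'Asp']
print(translate(mrna, reprogram(BASE_CODE, "AUG", "Nva")))  # ['Leu', 'Ala', 'Nva', 'Asp']
```

The two peptides differ only at the AUG position, which is why one pool of site-saturation mRNAs carrying AUG at each position can be read out under many different reprogrammed codes.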
Here, we use flexizyme-based genetic code reprogramming to extend the reach of deep mutational scanning to examine any number of nonproteinogenic amino acids in peptide binding and folding.
Fig. 1.
Nonproteinogenic deep mutational scanning. (A) Essentially any nonproteinogenic amino acid [activated by cyanomethyl ester (CME) or dinitrobenzyl ester (DBE)] can be loaded onto tRNA by flexizymes and delivered to the ribosome for use during in vitro translation. Translation of a site-saturation mutagenesis mRNA library using genetic code reprogramming allows for a nonproteinogenic amino acid to be incorporated at all positions in a given peptide sequence, and mRNA display can link each mutant peptide to its encoding mRNA. This can be repeated for n nonproteinogenic amino acids (n = 5 shown). Initiating reverse transcription with a barcoded DNA primer allows for the resulting peptide–cDNA products to be pooled and nonproteinogenic mutants deconvoluted after DNA sequencing. (B) The large diverse library from A, containing proteinogenic and nonproteinogenic mutants, can be sorted for binding to a partner protein. Next-generation sequencing of cDNA before (input) and after (output) binding is then performed. For each mutant, i, the fraction of DNA reads, F, can be used to calculate an enrichment score, e. (C) Enrichment scores relative to wild-type, E, provide a map of beneficial and deleterious changes to the peptide chemical structure.
Results
Selection of Nonproteinogenic Amino Acids for Mutagenesis.
Hundreds of nonproteinogenic amino acids are accepted by flexizymes and the ribosome (23); a sample set was chosen for nonproteinogenic mutagenesis, focusing on simple, largely nonpolar structures absent from the universal genetic code. This set of 21 amino acids included both alternative side chains and backbone modifications such as N-methyl substitution, disubstitution, and d-stereochemistry (Fig. 2). We initially tested these in a natural protein–protein interaction that has already been studied using traditional mutagenesis: the interaction between the apoptosis-regulatory BH3 domain of PUMA and the folded protein MCL1 (27, 28). BH3 domains are relatively short and unstructured in isolation but fold into a single α-helix upon binding (Fig. 2). This simple structure has made PUMA and its homologs model systems for protein folding upon binding (29, 30) and for exploring the potential of nonproteinogenic amino acids in druglike peptides (16, 31).
Fig. 2.
Nonproteinogenic deep mutational scanning of linear PUMA and cyclic CP2. (A) Test set of 21 nonproteinogenic amino acids used in this study. (B) PUMA (blue), an intrinsically disordered protein, folds to an α-helix upon binding with MCL1 (white) [Protein Data Bank (PDB) ID code 2ROC]. (C) Deep mutational scanning scores (log2E) report on free energy changes of binding upon mutation, ∆∆G. Proteinogenic (black) and nonproteinogenic mutants (white) overlay. Errors in log2E represent SDs of repeats of library binding and DNA recovery. (D) Leave-one-out cross-validation of empirical fit of ∆∆G and log2E showing agreement between calculated ∆∆G of the left-out mutant (∆∆GScanning) and expected experimental value (∆∆GBiophysics). Greater than 90% of PUMA mutant ∆∆G values fall within the white region. (E) ∆∆G for all mutants of PUMA; stabilizing (red) and destabilizing (blue). (Left) Average ∆∆G projected onto PUMA/MCL1 structure. (F) De novo macrocyclic peptide CP2 (Left). Average ∆∆G projected onto CP2/KDM4A structure (Right) (PDB ID code 5LY1). (G) ∆∆G for all mutants of CP2 binding to KDM4A.
Nonproteinogenic Deep Mutational Scanning.
The flexizyme protocol allows for facile reprogramming of genetic codes (Fig. 1). We assembled in vitro translation systems in which methionine was replaced with one of the 21 nonproteinogenic amino acids (Fig. 2), with high-fidelity incorporation at AUG codons (). A site-saturation mutagenesis mRNA library for PUMA was added to each of these systems and translated into PUMA peptides. Similar to previous studies (29, 30), a pseudo-wild-type sequence containing the mutation M144A was used (henceforth referred to as the wild-type); this mutation prevents oligomerization of PUMA peptides at high (micromolar) concentrations. Each peptide was covalently attached to its encoding mRNA via a puromycin linker, and the pool of peptide–mRNA fusions was reverse transcribed into noncovalent cDNA complexes using barcoded DNA primers (Fig. 1) (32). Because the barcodes encoded the reprogrammed genetic codes themselves, they permitted an outsized 41-amino acid alphabet.
The resulting diverse PUMA peptide library was incubated with immobilized MCL1 and washed, and the bound fraction was recovered (Fig. 1). Populations before and after binding were enumerated by deep sequencing, and enrichment scores for binding (E) were calculated for every mutant (2) (Fig. 1). To validate this approach, previously reported KD/∆∆G values (30) were compared with E scores, which revealed that E is a smooth function of MCL1 binding affinity (Fig. 2). To validate that E is also a function of binding for nonproteinogenic mutants, we chemically synthesized additional PUMA mutants and measured their binding to MCL1 (). For synthetic convenience, the PUMA peptides in this collection were 27 amino acids in length, shorter than the 35-aa peptides used in previous studies (29, 30), and ∆∆G values were calculated relative to the binding of an equivalent 27-aa wild-type. The link between ∆∆G and E is maintained for this collection and, importantly, the proteinogenic and nonproteinogenic mutant data overlay (Fig. 2).
E scores alone are sufficient for analysis of deep mutational scanning data (2–9). However, to aid interpretation of the mutant data and any resulting structure–activity relationships, we chose to calibrate log2E against a handful of experimentally validated mutants and calculate true thermodynamic parameters, ∆∆G/KD, for each mutant. Rather than make assumptions about the function linking log2E and KD (1, 5), we used an empirical fit of E and ∆∆G to calibrate the deep mutational scanning data. The predictive power of this approach was assessed using leave-one-out cross-validation (Fig. 2).
An exhaustive map of PUMA mutant ∆∆G was constructed, including all mutations to all 41 proteinogenic and nonproteinogenic amino acids (Fig. 2). As expected for α-helical folding, mutations to proline or backbone N-methyl amino acids were highly destabilizing (29) (Fig. 2 and ). As expected for a BH3 domain (27, 28), mutations to the highly conserved D146 were not tolerated (Fig. 2 and ). A145G, which brings PUMA closer to the BH3 consensus of LXXXGD (28), was highly stabilizing. Interestingly, a nonproteinogenic amino acid substitution was the most favorable at many positions in the PUMA sequence ().
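The empirical calibration and leave-one-out cross-validation described above can be sketched in a few lines. The functional form of the fit is not given in this section, so the sketch assumes, purely for illustration, a straight line fitted by ordinary least squares to (log2E, ∆∆G) pairs from the synthesized calibration mutants; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two calibration points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration points need distinct log2E values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out(log2e, ddg):
    """Refit without each calibration mutant in turn and predict its ddG.

    Returns a list of (measured ddG, predicted ddG) pairs, one per mutant.
    """
    pairs = []
    for i in range(len(log2e)):
        xs = log2e[:i] + log2e[i + 1:]
        ys = ddg[:i] + ddg[i + 1:]
        slope, intercept = fit_line(xs, ys)
        pairs.append((ddg[i], slope * log2e[i] + intercept))
    return pairs
```

Swapping `fit_line` for a curved function would not change the cross-validation logic: each calibration mutant is predicted by a model that never saw it, and the spread of the resulting (measured, predicted) pairs is the kind of agreement that Fig. 2D summarizes.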
Nonproteinogenic Deep Mutational Scanning of the de Novo Macrocyclic Peptide CP2.
In principle, this scanning approach can be applied to sequences that already contain nonproteinogenic amino acids, such as the small de novo macrocyclic peptides discovered using the RaPID system (23–26). We reasoned that saturation mutagenesis could help explain the molecular details behind the potent activities of these macrocycles and might suggest modifications to amino acids known to improve protease resistance or membrane permeability (13). We chose to investigate the peptide CP2 (Fig. 2), a potent and isoform-selective inhibitor of the KDM4A histone demethylase, which contains a nonproteinogenic d-tyrosine and is cyclized via a nonreducible thioether bond. CP2 binds KDM4A with a KD of 30 nM, forms a small β-sheet–like secondary structure when bound (Fig. 2 and ), and exhibits inhibitory activity with an IC50 of ∼40 nM (26).
Nonproteinogenic deep mutational scanning was performed for CP2. The raw E scores for binding KDM4A correlated well with previously reported inhibitory IC50 values (). A small collection of CP2 mutants was synthesized, and their measured ∆∆G values for binding () were used to calibrate E and calculate ∆∆G for all proteinogenic and nonproteinogenic CP2 mutants (Fig. 2 and ). The map of ∆∆G shows that a critical RSG motif was intolerant to mutation (Fig. 2 and ) and confirms many of the observations from previous structure–activity studies (33). The RSG motif forms the turn of the β-sheet and is deeply buried in KDM4A (Fig. 2 and ).
Structure–Activity Relationships at Single-Atom Resolution.
The expanded range of amino acid structures and the systematic nature of this mutagenesis allowed detailed structure–activity relationships to be extracted at every position in each peptide sequence. For example, progressive deletion of aliphatic carbon atoms from the side chain of L141 in PUMA increasingly destabilizes the interaction with MCL1 (Fig. 3), whereas some atom insertions were stabilizing (e.g., the mutant L141Cpa; Fig. 3) (31). At position A144, where the PUMA sequence carries a nonnatural feature (the M144A mutation, see above), mutants with longer aliphatic side chains were increasingly stabilizing (Fig. 3). Thus, deep mutational scanning was able to identify the vacant hydrophobic pocket left by the M144A mutation. In the cyclic peptide CP2, R10 could be truncated to aliphatic side chains such as Nva without affecting affinity for KDM4A (Fig. 3), suggesting that it is the hydrophobic chain of the arginine, not the charged head group, that interacts favorably with the partner protein. Strikingly, at G8, adding a side chain with l-stereochemistry (Fig. 3) was not tolerated, but d-alanine was accepted (Fig. 2 and ). This led us to inspect the phi and psi angles of bound CP2, which identified G8 as lying in a region disallowed for l- but permissible for d-amino acids ().
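The single-atom graph used in Fig. 3B can be sketched for simple aliphatic side chains. This toy model makes loud simplifying assumptions: each side chain is a rooted tree of aliphatic carbons starting at Cβ (glycine has none); rings, heteroatoms, and stereochemistry are ignored; and the eight side chains listed (Gly, Ala, 2-aminobutyric acid "Abu", norvaline "Nva", norleucine "Nle", Val, Leu, Ile) were chosen for illustration rather than taken from the study's test set. Two side chains share an edge when deleting one terminal carbon from one gives the other; all function names are hypothetical.

```python
from collections import deque

# Each carbon is written as the tuple of its child carbons; the root is C-beta.
# None stands for glycine (no side-chain carbon).
SIDE_CHAINS = {
    "Gly": None,
    "Ala": (),           # methyl
    "Abu": ((),),        # ethyl
    "Nva": (((),),),     # n-propyl
    "Nle": ((((),),),),  # n-butyl
    "Val": ((), ()),     # isopropyl
    "Leu": (((), ()),),  # isobutyl
    "Ile": ((), ((),)),  # sec-butyl
}


def canon(tree):
    """Sort children recursively so that identical trees compare equal."""
    if tree is None:
        return None
    return tuple(sorted(canon(child) for child in tree))


def _delete_one_leaf(tree):
    """Yield every tree made by deleting one terminal carbon below the root."""
    for i, child in enumerate(tree):
        rest = tree[:i] + tree[i + 1:]
        if child == ():
            yield canon(rest)  # this child was a terminal carbon
        else:
            for smaller in _delete_one_leaf(child):
                yield canon(rest + (smaller,))


def one_carbon_smaller(tree):
    """Side chains with exactly one fewer aliphatic carbon than `tree`."""
    if tree is None:
        return set()
    if tree == ():
        return {None}  # removing C-beta of Ala gives Gly
    return set(_delete_one_leaf(tree))


def build_graph(side_chains):
    """Undirected graph in which an edge joins side chains one carbon apart."""
    by_shape = {canon(t): name for name, t in side_chains.items()}
    graph = {name: set() for name in side_chains}
    for name, tree in side_chains.items():
        for smaller in one_carbon_smaller(canon(tree)):
            other = by_shape.get(smaller)
            if other is not None:
                graph[name].add(other)
                graph[other].add(name)
    return graph


def carbon_steps_from(graph, start):
    """Breadth-first search: fewest single-carbon changes from `start`."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in steps:
                steps[neighbor] = steps[node] + 1
                queue.append(neighbor)
    return steps


graph = build_graph(SIDE_CHAINS)
steps = carbon_steps_from(graph, "Leu")
print(sorted(graph["Leu"]))                                # ['Nva']
print([steps[aa] for aa in ("Nva", "Abu", "Ala", "Gly")])  # [1, 2, 3, 4]
```

In this toy graph, the progressive deletion described for L141 corresponds to walking Leu → Nva → Abu → Ala → Gly, one edge per removed carbon.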
Fig. 3.
Structure–activity relationships at single-atom resolution. (A) ∆∆G for each proteinogenic and nonproteinogenic mutation at positions L141 and A144 in PUMA (Left), and G8 and R10 in CP2 (Right). (B, Left) Graph of amino acid side chains in which edges represent the insertion or deletion of a single aliphatic carbon atom. Other graphs, Left to Right, show nodes colored according to ∆∆G [stabilizing (red) and destabilizing (blue)] for mutations at L141 and A144 of PUMA, and G8 and R10 of CP2. Arrows highlight single carbon atom changes to the wild-type side chain.
Deep Mutational Scanning-Guided Redesign for Improved Affinity.
The information from these nonproteinogenic mutant scans can be used to engineer peptides for increased affinity, assuming that the effects of multiple mutants are, to some extent, additive. For example, five individual affinity-enhancing mutants of PUMA were combined into one redesigned peptide, “rPUMA” (Fig. 4). rPUMA bound MCL1 with improved affinity (KD < 0.16 vs. 4 nM for an equivalent 27-aa wild-type peptide), stronger than any of the individual mutants, and with markedly slower dissociation compared with the unmodified peptide (Fig. 4).
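The additivity assumption can be made concrete with the standard relation between dissociation constants and binding free energy. This sketch assumes the sign convention ∆∆G = ∆G(mutant) − ∆G(wild-type) = RT ln(KD,mutant/KD,wild-type), so that negative values are stabilizing, and a default temperature of 25 °C; neither is stated for these measurements in this section. Under strict additivity, the ∆∆G of a combined mutant is the sum of the single-mutant values. The function names are hypothetical.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal/(mol*K)


def ddg_from_kd(kd_mutant, kd_wild_type, temp_k=298.15):
    """ddG = RT ln(KD_mut / KD_wt) in kcal/mol; negative means tighter binding."""
    return R_KCAL * temp_k * math.log(kd_mutant / kd_wild_type)


def kd_from_ddg(ddg, kd_wild_type, temp_k=298.15):
    """Invert ddg_from_kd: the KD implied by a given ddG."""
    return kd_wild_type * math.exp(ddg / (R_KCAL * temp_k))


def additive_kd(single_ddgs, kd_wild_type, temp_k=298.15):
    """Predicted KD of a combined mutant if single-mutant ddG values simply add."""
    return kd_from_ddg(sum(single_ddgs), kd_wild_type, temp_k)


# rPUMA vs. the 27-aa wild-type (KD < 0.16 nM vs. 4 nM), same units for both.
print(round(ddg_from_kd(0.16, 4.0), 2))  # -1.91
```

Because the rPUMA KD is reported only as an upper bound, the ∆∆G obtained this way is also a bound: at most about −1.91 kcal/mol, i.e., at least this stabilizing. Comparing a measured combined KD with `additive_kd` is one way to check how far the additivity assumption holds.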
Fig. 4.
Nonproteinogenic deep mutational scanning-guided peptide optimization. (A, Left) Five PUMA mutants identified as stabilizing the interaction with MCL1 (A139tBu, Q140Y, A144Cha, A145G, and Y152Bzt) were combined to make rPUMA (segment shown, mutated amino acids in black). (A, Right) By surface plasmon resonance (SPR), rPUMA (blue) bound tighter to MCL1, with markedly slower dissociation than the equivalent PUMA wild-type (black). This result is notable because even one poorly chosen mutation can significantly weaken the interaction; for example, I137hSM (gray). (B, Left) Four mutants of CP2 that reduce polarity or steric bulk but individually do not affect binding to KDM4A (N4MeA, T5A, W9Bzt, and R10Nva) were combined to make rCP2 (structure shown, mutants in black). (B, Right) SPR amplitude analysis shows that rCP2 (blue) binds to KDM4A with similar affinity to CP2 (black). Compare with a single poorly chosen mutant, R6G (gray).
Deep Mutational Scanning-Guided Tuning of Peptide Physical Properties.
Mutations to nonproteinogenic amino acids can engender peptides with beneficial physical properties that can increase in vivo potency (13), provided they do not disrupt the main function of the molecule. For CP2, this has been attempted before using structure-based design (26) (), whereby a number of nonproteinogenic mutants were chosen to increase protease resistance and/or cellular activity without impacting the bound state, but modest reductions in affinity for KDM4A were observed (26). The deep mutational scanning data corroborated the structure-based design but also highlighted additional beneficial mutations (): mutants that reduced steric bulk, added N-methylation, removed charges or hydrogen bonding groups, and did not affect KDM4A binding. These mutations were combined to produce “rCP2” (Fig. 4) and remained energetically neutral in combination: rCP2 bound KDM4A with equivalent affinity to CP2 (Fig. 4) (KD = 7.0 vs. 6.6 nM for the wild-type), in contrast to the modified CP2 peptides generated through structure-based design.
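As a worked check that the combined rCP2 mutations are energetically neutral, the reported KD values can be converted to a free energy difference. This assumes the relation ∆∆G = RT ln(KD,mutant/KD,wild-type) and a temperature of 25 °C (RT ≈ 0.5925 kcal/mol); neither is specified for these measurements in this section:

```latex
\Delta\Delta G = RT \ln\frac{K_{D}^{\mathrm{rCP2}}}{K_{D}^{\mathrm{CP2}}}
  = \left(0.5925~\mathrm{kcal\,mol^{-1}}\right)\ln\frac{7.0~\mathrm{nM}}{6.6~\mathrm{nM}}
  \approx 0.035~\mathrm{kcal\,mol^{-1}}
```

The result is far smaller than RT itself, consistent with rCP2 and CP2 binding KDM4A equally well.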
Analysis of Nonproteinogenic Amino Acids in Binding and Folding.
The systematic nature of deep mutational scanning has allowed for global analysis of proteinogenic amino acid substitutions (34) (i.e., comparing the amino acids themselves). Here, we could extend this analysis to include nonproteinogenic amino acids. For example, scanning the α-helical PUMA with mutations to the nonproteinogenic MeB showed a pattern of ∆∆G almost identical to the scan with proteinogenic Pro (Fig. 5). As MeB is a noncyclic homolog of Pro, this emphasizes that N-substitution, rather than the cyclic structure, is the root cause of Pro destabilization of α-helices. Interestingly, certain nonproteinogenic amino acids closely mimic the behavior of a proteinogenic amino acid. At all positions in the PUMA and CP2 peptides, mutations to 3Th are energetically equivalent to mutations to the structurally similar Phe; likewise for tBu and Leu (Fig. 5 and ).
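One simple way to quantify whether two amino acids mimic each other across a scan (MeB versus Pro, 3Th versus Phe, tBu versus Leu) is to compare their position-by-position ∆∆G profiles. The sketch below computes a Pearson correlation and a root-mean-square difference between two profiles; these metrics are an assumption chosen for illustration, not necessarily the comparison behind Fig. 5, and the function name is hypothetical.

```python
import math


def compare_profiles(ddg_a, ddg_b):
    """Pearson r and RMS difference between two per-position ddG scans.

    Positions where either scan has no value (None) are skipped.
    """
    pairs = [(a, b) for a, b in zip(ddg_a, ddg_b) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared positions")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    r = cov / math.sqrt(var_a * var_b)
    rms_difference = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / n)
    return r, rms_difference
```

Both numbers matter: the correlation captures whether the two scans share the same shape along the sequence, while the RMS difference captures whether the magnitudes also agree, which is what "energetically equivalent" requires.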
Fig. 5.
Certain nonproteinogenic amino acids closely mimic those in the universal genetic code. Graph shows ∆∆G mutant scans for proteinogenic (black) and nonproteinogenic amino acids (rainbow). MeB shows a similar ∆∆G profile to Pro in the folding and binding of α-helical PUMA (Top), whereas the ∆∆G profile of 3Th mimics Phe (Middle) and tBu mimics Leu (Bottom), in both PUMA and β-sheet CP2. Errors in ∆∆G propagated from SDs in log2E.
Discussion
By coupling the synthetic abilities of genetic code reprogramming to the massively parallel analysis of deep mutational scanning, we describe a method to exhaustively trial multiple, diverse nonproteinogenic amino acids in the thermodynamics of protein–protein interactions. This approach permits the study of large numbers of nonproteinogenic amino acid mutants in a single experiment, on a scale that would not be possible using classical cycles of synthesis and biophysical analysis.
Understanding the binding of peptides and improving their affinity is a recurring problem in biomedical science. Here, we show how deep mutational scanning can provide exhaustive peptide structure–activity relationships, down to single-atom resolution. One use of these data is to assess the extent to which peptide affinity can be improved and whether there is any benefit to using nonproteinogenic chemistry. In the example of PUMA binding MCL1, many proteinogenic mutations stabilized the interaction with MCL1, suggesting that PUMA had not evolved solely for optimal affinity. Nonetheless, a nonproteinogenic amino acid substitution was the most favorable at many positions in the PUMA sequence, highlighting the value of chemistry beyond the universal genetic code for improving interaction interfaces. We used this information to redesign a PUMA peptide for greater affinity. Importantly, the bound structure was not required for this optimization, and some of the nonproteinogenic mutations (e.g., A139tBu) would have been difficult to predict, even with such structural information.
In contrast to PUMA, few mutations to the macrocyclic CP2 stabilized the interaction with its binding partner. This suggests that the trillion-member cyclic peptide library of the RaPID system, while insufficient to cover sequence space entirely, is large enough to discover molecules highly optimized for binding.
However, even when affinity is difficult to improve upon, the deep mutational scanning data are still a valuable resource. Many substitutions were energetically neutral, and many of these were to amino acids with physical properties that increase protease resistance and membrane permeability (13). Applied to the cyclic peptide CP2, this approach was significantly quicker and suggested more suitable modifications than previous structure-based design (26).
A longstanding basic biological question asks whether the collection of proteinogenic amino acids is optimal for making folded, functional proteins. Here, in the context of the two main protein-folding secondary structures (the α-helix of PUMA and the β-sheet of CP2), our deep mutational scanning identified and validated nonproteinogenic bioisosteric replacements: nonproteinogenic amino acids that closely mimicked the behavior of a proteinogenic amino acid. This finding suggests that alternative genetic codes are possible that are as “fit” as the universal genetic code (15) with regard to producing folding and binding polypeptides.
Here, we examined 1,360 mutants of PUMA (714 nonproteinogenic) and 468 mutants of CP2 (240 nonproteinogenic). This has not taken advantage of the maximum mRNA display library size (>10¹³) or the full throughput of next-generation DNA sequencing. Therefore, nonproteinogenic deep mutational scanning can be expanded to study combinations of mutations or longer protein sequences. This method can be applied to other natural, de novo discovered, or designed peptides or proteins that function through binding or through binding and folding.
The ability to rapidly generate high-resolution structure–activity maps will be particularly useful in the development of drug-candidate peptides, as these information-rich datasets can be used to guide modifications for greater binding and potency. In essence, this approach is a highly parallelized version of hit-to-lead exploration in small-molecule drug development.
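The room for expansion can be checked with simple counting. For a site-saturation scan of L positions over an alphabet of A amino acids, there are L(A − 1) single mutants and C(L, k)(A − 1)^k mutants carrying k substitutions. The reported PUMA counts are consistent with 34 scanned positions (34 × 40 = 1,360 in total; 34 × 21 = 714 nonproteinogenic); that position count is an inference from the arithmetic, not a figure stated in the text. The function name is hypothetical.

```python
from math import comb


def n_mutants(positions, alphabet_size, k=1):
    """Sequences carrying exactly k substitutions relative to the wild-type.

    Choose which k positions change, then one of (alphabet_size - 1)
    non-wild-type amino acids at each of them.
    """
    return comb(positions, k) * (alphabet_size - 1) ** k


# Assumption: 34 scanned PUMA positions and a 41-letter alphabet (20 + 21).
print(n_mutants(34, 41))                # 1360 single mutants
print(34 * 21)                          # 714 of them nonproteinogenic
print(n_mutants(34, 41, k=2))           # 897600 double mutants
print(n_mutants(34, 41, k=2) < 10**13)  # True
```

Even every double mutant of such a scan is many orders of magnitude below the >10¹³ mRNA display library size, so library size is not what limits an expansion to combinations of mutations.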
Lastly, this method can be extended to include any number of nonproteinogenic amino acids tolerated by the ribosome, a set that contains hundreds of structures and continues to expand (23, 35, 36). Deep mutational scanning can be used to comprehensively assess these nonproteinogenic amino acids for use in binding, folding, and protein engineering.
Materials and Methods
Nonproteinogenic amino acids for mutagenesis (ester-activated) were loaded onto tRNAEnGlu bearing a CAU anticodon using flexizymes (21) (). Nonproteinogenic amino acids for translation initiation (N-acetyl-l-phenylalanine for PUMA and N-chloroacetyl-d-tyrosine for CP2) were loaded onto tRNAfMet bearing a CAU anticodon using the flexizyme eFx (22). The loaded tRNAs were separately added to methionine-deficient in vitro translation systems, along with site-saturation mRNA libraries (with a C-terminal HA tag) for PUMA or CP2, and translated into peptides according to an mRNA display protocol (25). Reverse transcription was carried out using barcoded DNA primers unique to each reprogrammed genetic code, and the products from each genetic code were pooled. Nonproteinogenic amino acids were incorporated with high fidelity (), but to account for any differences in translation efficiency, cDNA-linked peptide mutants (with a C-terminal HA tag) were purified using anti-HA magnetic beads to remove any incompletely translated by-products.
Twenty microliters of an approximately 200 nM library of anti-HA–purified cDNA-linked peptide mutants (PUMA or CP2) was incubated with 200 nM of the protein binding partner (MCL1 or KDM4A) immobilized onto magnetic beads via biotin–streptavidin. A 200 nM concentration of partner protein was low enough to produce a broad distribution of log2E, yet high enough to avoid excessive PCR cycles during DNA recovery (). Binding was allowed to reach equilibrium (3 h at 25 °C), and the beads were washed twice with buffer (20 µL each), allowing binding to reach equilibrium after each buffer addition (three 3-h incubations in total). Samples of the anti-HA–purified cDNA libraries before and after binding were prepared for Illumina sequencing. DNA reads were analyzed using a modified version of the Enrich pipeline (2), calculating enrichment scores (E) for each proteinogenic and nonproteinogenic mutant.
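The enrichment calculation can be sketched from read counts using the quantities named in Fig. 1: F, a mutant's fraction of reads in a sample; e, its enrichment; and E, enrichment relative to wild-type. This is a minimal illustration, not the modified Enrich pipeline used in the study. It assumes, as in the Enrich approach, that e is the ratio of output to input read fractions; the mutant labels, the skipping of mutants absent after binding (instead of adding pseudocounts), and the absence of read-quality filtering are all assumptions of the sketch, and the read counts are invented.

```python
import math


def enrichment_scores(input_counts, output_counts, wild_type="WT"):
    """Return {mutant: log2 E} from read counts before and after binding.

    F_i = reads for mutant i / total reads in that sample
    e_i = F_i(output) / F_i(input)
    E_i = e_i / e_wild_type
    """
    total_in = sum(input_counts.values())
    total_out = sum(output_counts.values())

    def e(mutant):
        f_in = input_counts[mutant] / total_in
        f_out = output_counts.get(mutant, 0) / total_out
        return f_out / f_in

    e_wt = e(wild_type)
    scores = {}
    for mutant, n_in in input_counts.items():
        if n_in == 0:
            continue  # never observed before selection: cannot be scored
        e_i = e(mutant)
        # A mutant absent after binding has e_i = 0 and no finite log2 E;
        # it is skipped here (a real pipeline might add a pseudocount instead).
        if e_i > 0:
            scores[mutant] = math.log2(e_i / e_wt)
    return scores


# Toy read counts (invented for illustration, not data from this study).
before = {"WT": 1000, "mutant_1": 1000, "mutant_2": 500}
after = {"WT": 400, "mutant_1": 100, "mutant_2": 400}
print({m: round(s, 3) for m, s in enrichment_scores(before, after).items()})
# {'WT': 0.0, 'mutant_1': -2.0, 'mutant_2': 1.0}
```

In a pooled experiment like the one above, each mutant label would combine the barcode (which identifies the reprogrammed genetic code) with the mutated position, so proteinogenic and nonproteinogenic mutants are scored on the same scale.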
Repeat experiments of the incubation, washing, and PCR amplification (starting from the same library preparation; n = 3 for PUMA, n = 4 for CP2) were averaged to give the reported log2E values, and the SDs were used to estimate the error in log2E. Depending on the form of the function ∆∆G = f(log2E), errors were propagated appropriately to give the reported errors in ∆∆G.
A collection of proteinogenic and nonproteinogenic mutants of PUMA and CP2, covering the range of E scores, was synthesized using solid-phase peptide synthesis, and binding to MCL1 and KDM4A, respectively, was tested using surface plasmon resonance. The resulting KD/∆∆G values, plus any published values (30), were used to find empirical relationships between E and ∆∆G, and these functions were used to calculate KD/∆∆G for all PUMA and CP2 mutants.
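The averaging and error propagation described above can be sketched as follows: the mean and sample SD of log2E over repeats are computed, the calibration function f is applied to the mean, and the SD is propagated to first order, σ(∆∆G) ≈ |f′(log2E)| σ(log2E). The first-order propagation, the central-difference estimate of f′, and the function name are assumptions of this sketch; the text states only that errors were propagated according to the form of f.

```python
import statistics


def ddg_with_error(repeat_log2e, f, h=1e-6):
    """Mean ddG = f(mean log2E) and its first-order propagated error.

    repeat_log2e: log2E values for one mutant from independent repeats (>= 2).
    f: calibration function mapping log2E to ddG.
    h: step for the central-difference estimate of f'.
    """
    mean = statistics.mean(repeat_log2e)
    sd = statistics.stdev(repeat_log2e)  # sample SD across repeats
    slope = (f(mean + h) - f(mean - h)) / (2 * h)  # numerical f'(mean)
    return f(mean), abs(slope) * sd


# Hypothetical linear calibration, ddG = -0.6 * log2E, with three repeats.
value, error = ddg_with_error([-1.0, -1.2, -0.8], lambda x: -0.6 * x)
print(round(value, 3), round(error, 3))  # 0.6 0.12
```

For a linear calibration the propagated error is simply |slope| × SD; a curved f makes the error depend on where the mutant sits on the curve, which is why the propagation has to follow the form of f.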