
Prediction of distal residue participation in enzyme catalysis.

Heather R Brodkin, Nicholas A DeLateur, Srinivas Somarowthu, Caitlyn L Mills, Walter R Novak, Penny J Beuning, Dagmar Ringe, Mary Jo Ondrechen.

Abstract

A scoring method for the prediction of catalytically important residues in enzyme structures is presented and used to examine the participation of distal residues in enzyme catalysis. Scores are based on the Partial Order Optimum Likelihood (POOL) machine learning method, using computed electrostatic properties, surface geometric features, and information obtained from the phylogenetic tree as input features. Predictions of distal residue participation in catalysis are compared with experimental kinetics data from the literature on variants of the featured enzymes; some additional kinetics measurements are reported for variants of Pseudomonas putida nitrile hydratase (ppNH) and for Escherichia coli alkaline phosphatase (AP). The multilayer active sites of P. putida nitrile hydratase and of human phosphoglucose isomerase are predicted by the POOL log ZP scores, as is the single-layer active site of P. putida ketosteroid isomerase. The log ZP score cutoff utilized here results in over-prediction of distal residue involvement in E. coli alkaline phosphatase. While fewer experimental data points are available for P. putida mandelate racemase and for human carbonic anhydrase II, the POOL log ZP scores properly predict the previously reported participation of distal residues. © 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

Keywords:  alkaline phosphatase; catalytic efficiency; ketosteroid isomerase; nitrile hydratase; partial order optimum likelihood; phosphoglucose isomerase; remote residues

Year:  2015        PMID: 25627867      PMCID: PMC4420525          DOI: 10.1002/pro.2648

Source DB:  PubMed          Journal:  Protein Sci        ISSN: 0961-8368            Impact factor:   6.725


Introduction

Many chemical reactions that require high temperature and/or extreme pH in the laboratory are catalyzed by enzymes at physiological temperature and neutral pH. One of the central challenges in biochemistry is to understand how enzymes achieve this catalytic power under mild conditions. Structural and biochemical studies have now characterized the active sites of hundreds of enzymes; nearly all of these studies have focused on the amino acid residues in direct contact with the reacting substrate molecule(s). A typical enzyme might consist of hundreds of residues, but only a handful of them have been identified in the literature as catalytically important. In the manually curated portion of the Catalytic Site Atlas (CSA),1 a database of enzyme active site composition compiled from the experimental literature, an average of only 1.2% of the residues in the biological units of these 170 enzymes is listed as catalytically important. Virtually all of these reported catalytic residues are, in the active conformation of the enzyme, in direct contact with the reacting substrate molecule(s). A recent review surveys examples from the literature of proteins where distal residues have been shown to participate directly in the biochemical function.2 This work presents a model, based on computed chemical properties, that successfully predicts both first-shell and more distant residues that are important for function. Model predictions are compared with available experimental data and, for a few cases, new experimental results are presented. The ability to predict, with a simple calculation, the involvement of distal residues in catalytic and other processes is a very important capability for protein engineering and for understanding how nature builds catalytic sites. 
Here new computational and experimental results, together with experimental data from the literature, are presented to show that residues that are not in direct contact with the reacting substrate molecule often play significant roles in catalysis, but more importantly, that their participation in catalysis is predictable with a simple calculation. Previously we have shown3,4 that all of the residues in a protein structure may be assigned scores based on computed chemical properties and that these scores may be used to rank-order all of the residues according to their probability of functional importance. In the present work, these scores are transformed so that they are more comparable across proteins of different sizes. We now show that the pattern of these scores predicts whether an enzyme has a highly localized, compact active site, or a spatially extended, multilayer active site with extensive involvement of distal residues in catalysis and ligand binding. For purposes of the present study, residues interacting directly with the substrate molecule are referred to as first-shell residues; other residues interacting directly with one or more first-shell residues are called second-shell residues; and finally, other residues interacting with one or more second-shell residues are called third-shell residues. All residues defined as first-, second-, or third-shell thus may be thought of as members of respective spheres surrounding the reacting substrate molecule. In the present work, enzymes with available experimental data on the participation of distal residues in catalysis are the focus. THEoretical Microscopic Anomalous TItration Curve Shapes (THEMATICS) is a computational method for the prediction of catalytic and binding residues in proteins solely from the three-dimensional structure of the query protein.5–7 THEMATICS takes advantage of the unique chemical and electrostatic properties of catalytic and binding residues. 
In particular, the ionizable residues in active sites tend to have anomalously shaped theoretical titration curves. We have established metrics6 to quantify the degree of deviation from normal titration behavior and have shown how these metrics may be used to predict, with high sensitivity and specificity,7 the active site residues in a protein three-dimensional structure, just from a simple set of calculations. Partial Order Optimum Likelihood (POOL)3,4,8 is a machine learning method for the prediction of functionally important residues in protein structures. POOL, a multidimensional, monotonicity-constrained maximum likelihood technique, utilizes input for which the output variable (in this case the probability of functional importance of a residue) is a monotonic function of each of the input features. POOL starts with the hypothesis that the larger the THEMATICS metrics for a given residue, the higher the probability that that residue is important for the biochemical function. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues are active in biochemical function. All residues are then rank-ordered according to their probability of functional importance. Through the introduction of environment variables in POOL, THEMATICS metrics are used to characterize the chemical environments of all residues, not just the ionizable ones. Furthermore, the input to POOL is flexible, in that any property upon which the probability of the functional importance of a residue depends monotonically can be a POOL input feature. 
These include features based solely on the structure of the query protein, such as THEMATICS metrics6,7,9 and surface geometric properties,10 but may also include other features if they are available, such as phylogenetic tree-based scores.11 It has been demonstrated3 that POOL with purely structure-based features performs very nearly as well as the best methods that utilize both sequence alignments and structure. POOL with input features that include THEMATICS, geometric data, and phylogenetic scores has been shown4 to be an even higher-performing functional residue predictor; with these three input features, THEMATICS is the largest contributor to the POOL predictions. In this work, THEMATICS, the structure-only version of ConCavity,10 and INTREPID,11 are used as input to POOL to predict specific residues, including those outside the first shell, that are involved in catalysis; POOL scores are used to predict the degree of spatial extension of enzyme active sites and experimental evidence to support these predictions is presented. Numerous examples of distal residue involvement in enzyme specificity have been reported in the directed evolution literature.12 For instance the activity of human carbonic anhydrase II on the ester substrate 2-naphthyl acetate was increased 40-fold with a triple mutant obtained through three rounds of mutagenesis, selection and recombination. 
Notably, all three residues were located outside of the first shell.13 In the directed evolution of Escherichia coli d-sialic acid aldolase to l-3-deoxy-manno-2-octulosonic acid aldolase, changes in eight amino acids, all of which are located outside the first shell, were necessary to produce a mutant enzyme with a 1000-fold increase in specificity for the unnatural sugar substrate, l-d-3-deoxy-manno-2-octulosonic acid.14 Directed evolution techniques were also used to alter the specificity of aspartate aminotransferase.15 An enzyme variant containing 17 amino acid substitutions showed a 10^6-fold increase in catalytic rate for the non-native valine substrate.15 Interestingly, only one of the mutated residues interacted directly with the substrate; all others were remote residues. Random mutagenesis experiments on the flavoenzyme vanillyl-alcohol oxidase generated four single point mutants, each with enhanced reactivity for creosol and other ortho-substituted 4-methylphenols.16 In each mutant enzyme, the mutation was located outside of the first shell. Finally, directed evolution yielded a mutant metallo-β-lactamase with increased hydrolytic efficiency toward cephalexin.17 This mutant contained four amino acid substitutions, two in the second coordination sphere of the metal ion and two far removed from the annotated active site. There are also a few examples where site-directed mutagenesis studies have shown that distal residues contribute to reactivity. In the metalloenzymes alkaline phosphatase18–21 and mandelate racemase,22 rational mutations to residues in the second shell surrounding the metal ions have been reported previously, sometimes with dramatic results. 
Studies showing participation in catalysis by multiple THEMATICS-predicted second- and third-shell residues have been reported recently by us for Pseudomonas putida nitrile hydratase23 and human phosphoglucose isomerase,24 and also for the E. coli Y-family DNA polymerase DinB.25 Further analysis of these and additional enzymes is reported here, using the new POOL scores, new experimental data, and compilations of experimental data from the literature.
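The shell definitions used in this study (first shell: residues interacting directly with the substrate; second shell: other residues interacting with a first-shell residue; third shell: other residues interacting with a second-shell residue) amount to a breadth-first layering of a residue contact graph. The following is a minimal sketch of that layering, not the procedure used by the authors. The function name `assign_shells` and the toy contact map are hypothetical; the residue names are borrowed from ppNH only for illustration, and the contacts listed are assumed rather than taken from the crystal structure.

```python
from collections import deque


def assign_shells(contacts, substrate, max_shell=3):
    """Label each residue with its shell number (1, 2, 3) by breadth-first search.

    contacts: dict mapping a node name to the nodes it interacts with
              (assumed symmetric; how an interaction is defined is left open).
    substrate: name of the node representing the reacting substrate.
    """
    shells = {}
    seen = {substrate}
    queue = deque([(substrate, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_shell:
            continue  # do not expand beyond the outermost shell of interest
        for neighbor in contacts.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                shells[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return shells


# Hypothetical contact map, for illustration only.
toy_contacts = {
    "substrate": ["beta-Y68", "alpha-C115"],
    "beta-Y68": ["substrate", "alpha-K131"],
    "alpha-C115": ["substrate", "beta-Y63"],
    "alpha-K131": ["beta-Y68", "beta-Y69"],
    "beta-Y63": ["alpha-C115"],
    "beta-Y69": ["alpha-K131"],
}

shells = assign_shells(toy_contacts, "substrate")
for residue, shell in sorted(shells.items(), key=lambda item: item[1]):
    print(residue, shell)
```

Because each residue is labeled the first time it is reached, a residue that touches both the substrate and a first-shell residue is counted as first-shell, which matches the "other residues" wording of the definitions.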

Results

Optimizing POOL scores to predict distal residues

The top 8% of POOL-ranked residues typically have been taken to be the predicted set of functionally important residues because this gives the best performance against the CSA-100,11 the benchmark set of annotated catalytic residues, for purposes of predicting the known catalytic residues. The CSA-100 is a 100-enzyme subset of the original, manually curated portion of the Catalytic Site Atlas;1 the 100 enzymes were chosen to maximize sequence diversity with the goal of no detectable homology between any two members of the subset. However, virtually all of the CSA-100 annotated residues are first-shell residues and thus the cutoff that gives optimum performance on the benchmark set is not necessarily the best cutoff for the prediction of all of the residues that participate in catalysis, including the distal residues. For purposes of prediction of both first-shell and more distant residues, we introduce the POOL scores, which contain useful information and enable better predictions of functionally important residues, both inside and outside the first shell. Note that POOL scores are used to rank-order the residues within a given protein structure, according to their probability of participation in catalysis. However, the raw POOL scores are not comparable between two different proteins, as they depend on protein size. It is desirable to transform the POOL scores to a scale that is comparable across proteins. Therefore the POOL scores within each protein are linearly transformed into Z scores. Generally the few highest-ranked residues have Z scores that are many orders of magnitude larger than those of most of the other residues. Therefore, the mean and standard deviation σP of all of the POOL scores within a protein are calculated a first time and the residues with POOL scores greater than two standard deviations over the mean are deemed outliers. 
The trimmed set of POOL scores, excluding the outlier residues, is then used for the second calculation of the mean and standard deviation. The POOL scores for all residues are then expressed as Z scores, as:

$$Z_P(k) = \frac{P(k) - \bar{P}_t}{\sigma_{Pt}}$$

Here k is a residue index, P(k) is the POOL score of residue k, \bar{P}_t represents the mean POOL score calculated on the trimmed set, and σPt represents the standard deviation for the trimmed set. Because the range of POOL scores is over many orders of magnitude in a given protein, the Z scores are re-expressed on a logarithmic scale as log ZP. Using the new POOL scores, we analyze enzymes across different enzyme classes.
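The two-pass standardization described above can be sketched as follows: compute the mean and standard deviation over all residues, drop residues more than two standard deviations above the mean, recompute both statistics on the trimmed set, and standardize every residue against the trimmed values. This is an illustrative sketch only. The function name `trimmed_z_scores`, the use of the population standard deviation, and the example scores are assumptions; the subsequent logarithmic transformation to log ZP is not reproduced here.

```python
import statistics


def trimmed_z_scores(pool_scores, n_sigma=2.0):
    """Standardize POOL scores against a mean and SD computed without high outliers."""
    # First pass: statistics over all residues.
    mean_all = statistics.fmean(pool_scores)
    sd_all = statistics.pstdev(pool_scores)  # population SD: an assumption

    # Residues scoring more than n_sigma SDs above the mean are deemed outliers.
    cutoff = mean_all + n_sigma * sd_all
    trimmed = [p for p in pool_scores if p <= cutoff]

    # Second pass: statistics over the trimmed set only.
    mean_t = statistics.fmean(trimmed)
    sd_t = statistics.pstdev(trimmed)
    if sd_t == 0:
        raise ValueError("trimmed scores have zero spread; Z scores are undefined")

    # Z scores are reported for all residues, outliers included.
    return [(p - mean_t) / sd_t for p in pool_scores]


# Made-up scores: one dominant residue and seven near-background residues.
example_scores = [0.9, 0.001, 0.002, 0.003, 0.002, 0.001, 0.003, 0.002]
z = trimmed_z_scores(example_scores)
print([round(value, 2) for value in z])
```

Trimming before the second pass keeps a handful of very high scores from inflating the standard deviation, so the top-ranked residues end up with Z scores that are large relative to the bulk of the protein.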

Nitrile hydratase

Nitrile hydratase (E.C. 4.2.1.84, NHase) from P. putida utilizes low-spin, noncorrinoid Co(III) to catalyze the hydrolysis of nitrile substrates to their corresponding amides. The reaction scheme is shown in Figure 1(A). This type of low-spin Co(III) NHase is prevalent as a biocatalyst for the industrial production of commodity chemicals such as acrylamide on the kiloton scale, and also nicotinamide and 5-cyanovaleramide.26 Catalysis by NHase affords a number of advantages over conventional catalysts, including low energy consumption, less waste, high efficiency and product purity. P. putida NHase (ppNH) is a heterodimeric metalloenzyme. The two subunits, α and β, do not show mutual homology but both α and β do show a high degree of homology to other known NHases.
Figure 1

For Pseudomonas putida Nitrile hydratase: A) reaction scheme; B) stereo view of the extended active site. The Co ion is rendered as a pink ball; font color indicates location—red, first sphere—blue, second sphere—green, third sphere; C) POOL log ZP score as a function of POOL rank for the top 25 POOL ranked residues.

Six first-shell residues in ppNH were reported previously27–30 to be important for catalysis: the cobalt-coordinating α-C112, α-C115, and α-C117, the catalytic β-R52 and β-R149, and the ligand-binding β-Y68. The two arginine residues are thought to hydrogen bond to the cysteine residues which coordinate the metal ion. These arginine residues therefore appear to stabilize the claw setting in the active site. We previously tested,23 through site-directed mutagenesis (SDM) experiments, first-, second-, and third-shell residues in ppNH (PDB ID: 3QXE) that were predicted by THEMATICS7 to be important for catalysis. A total of eight residues were tested, five of which were predicted by THEMATICS and three of which were not; all eight residues are well conserved. Mutations, made as conservatively as possible, at four of the five THEMATICS-predicted residues showed significant loss of catalytic activity; one predicted residue, plus the three residues not predicted, did not show significant difference from wild type upon mutation. In the present work, we report additional experiments on distal residues, using the POOL scores as a guide. The effects of the mutations are summarized in Table 1. A stereo view of the extended active site of ppNH is shown in Figure 1(B). The cobalt ion is rendered as a pink ball. The first-sphere residues are labeled in red font, the second-sphere in blue, and the third sphere in green.
Table 1

Residue POOL Rank, POOL log ZP Score, and Loss of Catalytic Efficiency for ppNH Variants

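| Variant | Shell | POOL rank | POOL log ZP score | (kcat/KM)wild type / (kcat/KM)mutant | Reference |
|---|---|---|---|---|---|
| α-C115A | First | 1 | 2.0 | >1.0 × 10^5 | 27 |
| β-Y68F | First | 2 | 1.8 | 3.8 × 10^2 | 28 |
| α-C112A | First | 3 | 1.3 | >1.0 × 10^5 | 27 |
| β-R149Y | First | 4 | 1.0 | See text | 30; See text |
| β-Y69F | Third | 5 | 0.99 | 6 | This work |
| α-K131Q | Second | 6 | 0.94 | 11 | This work |
| β-Y63F | Second | 7 | 0.94 | 11 | This work |
| β-E56Q | Second | 8 | 0.89 | 2.3 × 10^2 | 23 |
| β-H147N | Second | 9 | 0.85 | 1.1 × 10^2 | 23 |
| α-C117A | First | 10 | 0.79 | >1.0 × 10^5 | 27 |
| α-D164N | Second | 11 | 0.56 | 20 | 23 |
| β-R52K | First | 12 | 0.52 | 1.3 × 10^3 | 29 |
| β-H71L | Third | 15 | 0.37 | 25 | 23 |
| α-Y118F | Second | 17 | 0.047 | 8 | This work |
| α-E168Q | Second | 22 | −0.029 | 3.7 | 23 |
| β-Y215F | Third | 32 | −0.054 | 3.3 | 23 |
| α-R170N | Second | 35 | −0.062 | 1.6 | 23 |
| α-Y171F | Third | 40 | −0.067 | 2.1 | 23 |
| α-R195Q | Third | 51 | −0.072 | 3 | This work |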

Note that the residue ranked fourth, β-R149, has been reported to be important for catalysis, but numerical kinetics data have not been reported.

Figure 1(C) shows the POOL log ZP scores as a function of POOL rank for ppNH. The top 19 residues, corresponding to the top 5% of all residues in the dimer, have positive log ZP scores. For ppNH, 10 out of the 15 top POOL-ranked residues were previously tested by site-directed mutagenesis, either reported in the prior literature or by us, and all 10 of them show a significant effect on catalysis. These 10 include the 6 first-shell residues reported previously (the cobalt-coordinating α-C112, α-C115, and α-C117, the catalytic β-R52 and β-R149, and the ligand-binding β-Y68).27–30 Note that in Fe-type nitrile hydratase, the residue equivalent to β-R149 of the Co-type nitrile hydratase analyzed here, β-R141, has been reported30 to show no activity upon mutation to Y or E, and significant decrease in activity upon mutation to K, although no specific kinetics values were provided. The remaining four residues corresponding to significant loss of activity upon mutation include three second-shell residues (α-D164, β-E56, and β-H147) and one third-shell residue (β-H71) reported by us to show a loss in catalytic efficiency of at least 20-fold.23 α-D164 interacts with α-C112 through a water molecule; mutation to N retains the relative size, shape and hydrogen bonding capability while removing the ability to form a negative charge. β-E56 is within hydrogen bonding distance to α-C115, β-R149 and β-H147, while the second-shell residue β-H147 is located behind both catalytic residues β-R52 and β-R149. The third-shell residue β-H71 interacts through water molecules with the metal coordinating residue, α-C115. Mutation of β-H71 to L removes hydrogen bond capabilities. 
In contrast, there are four variants reported by us which exhibit an insignificant change in catalytic efficiency: second-shell residues α-E168 (involved in a salt bridge to β-R52) and α-R170 (H-bonded to α-C117); and third-shell residues β-Y215 (H-bonded to the side chain of β-R149 and to distal residue β-D172) and α-Y171 (H-bonded to distal residue β-D210). These four residues are all ranked lower by POOL (ranking 22nd, 35th, 32nd, and 40th, respectively), with negative log ZP scores. Some additional residues with high POOL ranks, β-Y69, α-K131, and β-Y63, ranked fifth, sixth, and seventh, respectively, have now been studied by SDM. α-K131 is a second-shell residue located behind the first-shell residue β-Y68 with respect to the substrate. β-Y69 is a third-shell residue 11.9 Å from the Co ion and located behind α-K131 with respect to the substrate. β-Y63 is a second-shell residue behind the Co-coordinating α-C115. Results of these mutations are summarized, together with prior results, in Table 1 and illustrated in Figure 2. Figure 2 plots log10 of the ratio of the catalytic efficiency for wild-type ppNH to that of the variant, as a function of the POOL log ZP score of the residue at the substituted position in the variant. The second-shell variants α-K131Q and β-Y63F both show an 11-fold reduction in catalytic efficiency. The third-shell variant β-Y69F shows a small but significant loss in catalytic efficiency.
Figure 2

Log10 of the ratio of the catalytic efficiency of WT over variant as a function of the POOL log ZP score of the substituted position in the variant for ppNH.

Notice in Figure 2 that the effect on catalytic efficiency generally declines with declining POOL score and falls off to a kinetically small or insignificant effect for negative values of the POOL log ZP score. (A threefold or less decrease in catalytic efficiency is not considered significant.) It is important to note in Table 1 that there are second- and third-shell residues with positive POOL log ZP scores that do show a significant effect on catalytic efficiency. Thus the POOL log ZP scores give some guidance about the number of residues that participate significantly in catalysis; the residues with the higher, positive scores tend to show the most participation in catalysis whereas residues with negative scores tend not to show significant effects.
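As a rough numerical companion to Figure 2, the sketch below converts the ppNH efficiency ratios from Table 1 into log10 values and compares the mean for positions with positive and negative POOL log ZP scores. It is an illustration, not part of the original analysis. The variable and function names are hypothetical, β-R149Y is left out because no numerical ratio is reported, and the three ">1.0 × 10^5" entries are entered at their lower bound, so the mean for the positive group is if anything an underestimate.

```python
import math

# (variant, POOL log ZP score, (kcat/KM)WT / (kcat/KM)variant, ratio is a lower bound)
# Values as read from Table 1; beta-R149Y is omitted (no numerical ratio reported).
PPNH_VARIANTS = [
    ("alpha-C115A", 2.0, 1.0e5, True),
    ("beta-Y68F", 1.8, 3.8e2, False),
    ("alpha-C112A", 1.3, 1.0e5, True),
    ("beta-Y69F", 0.99, 6, False),
    ("alpha-K131Q", 0.94, 11, False),
    ("beta-Y63F", 0.94, 11, False),
    ("beta-E56Q", 0.89, 2.3e2, False),
    ("beta-H147N", 0.85, 1.1e2, False),
    ("alpha-C117A", 0.79, 1.0e5, True),
    ("alpha-D164N", 0.56, 20, False),
    ("beta-R52K", 0.52, 1.3e3, False),
    ("beta-H71L", 0.37, 25, False),
    ("alpha-Y118F", 0.047, 8, False),
    ("alpha-E168Q", -0.029, 3.7, False),
    ("beta-Y215F", -0.054, 3.3, False),
    ("alpha-R170N", -0.062, 1.6, False),
    ("alpha-Y171F", -0.067, 2.1, False),
    ("alpha-R195Q", -0.072, 3, False),
]


def figure2_points(variants):
    """Return (variant, log ZP score, log10 efficiency ratio) triples."""
    return [(name, score, math.log10(ratio)) for name, score, ratio, _ in variants]


def mean_log_ratio(points, positive):
    """Mean log10 ratio over positions whose log ZP score is positive (or not)."""
    values = [log_ratio for _, score, log_ratio in points if (score > 0) == positive]
    return sum(values) / len(values)


points = figure2_points(PPNH_VARIANTS)
mean_pos = mean_log_ratio(points, positive=True)
mean_neg = mean_log_ratio(points, positive=False)
print(f"mean log10 ratio, positive log ZP: {mean_pos:.2f}")
print(f"mean log10 ratio, negative log ZP: {mean_neg:.2f}")
```

A group mean is a crude summary; it is used here only because it makes the contrast between positive-score and negative-score positions visible without a plot.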

Phosphoglucose isomerase

Phosphoglucose isomerase (PGI; E.C. 5.3.1.9) is an example of a moonlighting protein31 that has multiple functions. Inside the cell, PGI catalyzes the reversible isomerization of glucose-6-phosphate to fructose-6-phosphate [Fig. 3(A)], an essential step in glycolysis. Extracellular PGI has many roles including neuroleukin,32,33 autocrine motility factor,34 and maturation factor.35 Human phosphoglucose isomerase (hPGI) is medically important because certain mutations in the gene that encodes for hPGI have been identified to cause nonspherocytic hemolytic anemia.36 In addition, hPGI is used as a biomarker because of its role as a tumor-secreted cytokine and angiogenic factor in certain types of cancer.37 The structure of hPGI is a dimer of two identical subunits.38
Figure 3

For human Phosphoglucose isomerase: A) reaction scheme; B) stereo view of the extended active site—font color indicates location—red, first sphere—blue, second sphere—green, third sphere; C) POOL log ZP score as a function of POOL rank for the 50 top-ranked residues.

Figure 3(B) shows a stereo view of the extended active site of human PGI (hPGI) and Figure 3(C) shows the POOL log ZP scores as a function of POOL rank for hPGI. As for ppNH, the POOL log ZP scores for hPGI exhibit a sharp fall-off after the first few residues, but then an elongated tail, indicating a prediction of an active site with many participating residues, including residues outside the first layer. The top 47 residues, corresponding to the top 8%, have positive log ZP scores. Experiments confirmed the prediction of a spatially extended active site, as shown in Table 2. The catalytic residues for PGI were originally established by site-directed mutagenesis experiments39,40 on the Bacillus stearothermophilus enzyme. Catalytically active residues for mammalian PGI were then identified by sequence alignment and by X-ray structure determination.38,41–43 Although the human and Bacillus stearothermophilus enzymes share only 21% sequence identity, their active site residues are well conserved. Kinetics data in Table 2 are for human PGI, except where noted. The seven cases in parentheses (R273, E358, E217, K519, Q512, K211, and K440) are for the homologous residues in the B. stearothermophilus enzyme (R207, E290, E150, K425, Q418, K144, and K356, respectively) and all except for K440 are presumed to be important also in human PGI. Figure 4 shows a plot of log10 of the ratio of the catalytic efficiency for wild-type PGI to that of the variant, as a function of the POOL log ZP score of the residue at the substituted position in the variant.
Table 2

Residue POOL Rankings, POOL log ZP Score, and Loss of Catalytic Efficiency for PGI Variants

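| Variant | Shell | POOL rank | POOL log ZP score | (kcat/KM)wild type / (kcat/KM)mutant | Reference |
|---|---|---|---|---|---|
| H389L | First | 1 | 2.0 | >10,000 | 24 |
| H396L | Third | 3 | 1.3 | 24 | 24 |
| (R273A) | First | 6 | 0.79 | (170,000) | 39 |
| Y341F | Third | 11 | 0.66 | 1.5 | 24 |
| (E358Q) | First | 13 | 0.59 | (2,100) | 40 |
| E495Q | Second | 14 | 0.58 | 250 | 24 |
| (E217Q) | First | 20 | 0.49 | (130,000) | 40 |
| H100L | Third | 21 | 0.48 | 630 | 24 |
| Y274F | Second | 23 | 0.34 | 2.2 | 24 |
| N386A | Second | 25 | 0.22 | 2.8 | 24 |
| K362A | Second | 26 | 0.27 | >10,000 | 24 |
| Q388A | Second | 29 | 0.18 | 19 | 24 |
| (K519A) | First | 39 | 0.06 | (390) | 39 |
| (Q512A) | First | 44 | 0.02 | (120) | 40 |
| (K211A) | First | 45 | 0.01 | (19) | 39 |
| D511N | Second | 57 | −0.03 | 230 | 24 |
| N154Q | Second | 135 | −0.08 | 3.8 | 24 |
| S185A | Second | 140 | −0.08 | 3.5 | 24 |
| (K440A) | Third | 162 | −0.08 | (1.0) | 39 |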

Parentheses indicate kinetics data reported for the homologous residue in the B. stearothermophilus enzyme (with hPGI numbering). All other kinetics data are for human PGI.

Figure 4

Log10 of the ratio of the catalytic efficiency of WT over variant as a function of the POOL log ZP score of the substituted position in the variant for human phosphoglucose isomerase.

For hPGI, important first-shell residues, H389, R273, E358, E217, K519, Q512, and K211, all rank among the top 45 residues and all have positive POOL log ZP scores. Second-shell residues, E495, K362, and Q388, and third-shell residues, H396 and H100, that have been shown to be important for catalysis,24 are also among the highest ranking residues. The second-shell residue D511, which has been shown24 to play a significant role in catalysis, is ranked 57th and falls just below the range of residues with positive scores. There are three highly ranked second- and third-shell residues, Y341, Y274, and N386, that proved to be false positives, at least with respect to single-site mutation. Residues that are highly conserved but ranked well below those with positive scores, N154, S185, and K440, are true negatives and the effects of mutation on catalysis are not significant.

Ketosteroid isomerase

Δ5-3-Ketosteroid isomerase (KSI; EC 5.3.3.1) is a well-studied44–51 enzyme with high catalytic activity, with a rate approaching the diffusion-controlled limit. KSI catalyzes the isomerization of a Δ5-3-ketosteroid to a Δ4-3-ketosteroid [Fig. 5(A)], involving C-H bond cleavage and formation through a dienolate intermediate. KSI has a metabolic role in the degradation of steroids in bacterial species that can live on steroids as their sole carbon source. High-resolution crystal structures of KSI from P. putida (ppKSI, PDB ID: 1OPY52) and Comamonas testosteroni (PDB ID: 1QJG53) reveal that, although these two enzymes share only 34% sequence identity, their active site residues are conserved and structurally superimposable. Figure 5(B) shows a stereo view of the active site of ppKSI.
Figure 5

For Pseudomonas putida Ketosteroid isomerase: A) reaction scheme; B) stereo view of the extended active site—font color indicates location—red, first sphere—blue, second sphere—green, third sphere; C) POOL log ZP score as a function of POOL rank for the top 20 POOL-ranked residues.

While ppKSI and hPGI both catalyze isomerization reactions and are both members of EC class 5.3, these two isomerases possess very different types of active sites.24 In the plot of POOL log ZP score as a function of POOL rank for the residues of ppKSI, shown in Figure 5(C), the residues of ppKSI exhibit a sharp fall-off after the first few residues and no extended tail, corresponding to a prediction of a compact active site. The 13 highest scoring residues, comprising 10% of this small protein, have positive log ZP scores. The top three residues in the POOL ranking, Y32, Y57, and Y16, form an important hydrogen bonding network in the active site of KSI;50 Y16 stabilizes the dienolate intermediate and serves as a proton donor. The Y16F variant shows a 550-fold reduction in kcat/KM relative to wild type;54 the Y16S variant shows an 18-fold reduction.55 While the Y32F and Y57F variants do not show major differences from wild type, with 1.3- and 2.9-fold reductions54 in kcat/KM, respectively, kcat/KM for the Y57S variant is 66-fold less55 than wild type (see Table 3). Table 3 and Figure 6 show that mostly first-shell residues participate in catalysis; residues outside the first shell have only small effects, as predicted.
Table 3

Residue POOL Rankings, POOL log ZP Score, and Change in Catalytic Efficiency for P. putida KSI Variants

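| Variant | Shell | POOL rank | POOL log ZP score | (kcat/KM)wild type / (kcat/KM)mutant | Reference |
|---|---|---|---|---|---|
| Y32F | Second | 1 | 1.9 | 1.3 | 54 |
| Y32S | | | | 3.6 | 55 |
| Y57F | First | 2 | 1.4 | 2.9 | 54 |
| Y57S | | | | 66 | 55 |
| Y16F | First | 3 | 1.4 | 550 | 54 |
| Y16S | | | | 18 | 55 |
| D40N | First | 4 | 1.1 | 334,000 | 56 |
| W120F | First | 5 | 0.20 | 13 | 57 |
| F56L | First | 6 | 0.13 | 3.6 | 57 |
| D103N | First | 9 | 0.074 | 27 | 51 |
| E39Q | Second | 12 | 0.015 | 2.4 | 24 |
| F86L | First | 14 | −0.006 | 9.2 | 57 |
| M105A | Third | 16 | −0.026 | 7.6 | 24 |
| M84A | Second | 23 | −0.05 | 0.98 | 24 |
| M31A | Second | 34 | −0.05 | 2.1 | 24 |
| C97S | Second | 39 | −0.05 | 1.3 | 58 |
| C81S | Second | 48 | −0.05 | 0.98 | 58 |
| C69S | Second | 58 | −0.05 | 0.92 | 58 |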
Figure 6

Log10 of the ratio of the catalytic efficiency of WT over variant as a function of the POOL log ZP score of the substituted position in the variant for P. putida ketosteroid isomerase.


Alkaline phosphatase

Alkaline phosphatase (AP, EC 3.1.3.1) has been widely studied using both directed evolution and rational protein design approaches.21 AP is a dimeric metalloenzyme containing two zinc ions and one magnesium ion per subunit and it catalyzes the reversible hydrolysis of phosphomonoesters to yield inorganic phosphate plus an alcohol; the reaction mechanism is shown in Figure 7(A). In the crystal structure of E. coli AP (PDB ID: 1ALK59), there is an inorganic phosphate ion bound to the two zinc ions (Zn1 and Zn2) and to the guanidinium group of R166.19 The active site region of the protein includes D101, S102, A103, the three metal ions and their coordinating residues, and R166;59 R166 has been shown to be important for transition state stabilization and for discrimination between phosphates and sulfates.60,61 There is a hydrogen bond network that includes hydrogen bonds between D101 and R166 and between R166 and a water molecule.62 K328 interacts with the phosphate group in the active center through a water mediated hydrogen bond. A stereo image of the active site is shown in Figure 7(B).
Figure 7

For E. coli alkaline phosphatase: A) reaction scheme; B) stereo view of the active site. The Zn ions are rendered as grey balls; Font color indicates location—red, first sphere—blue, second sphere—green, third sphere; C) POOL log ZP score as a function of POOL rank for the top 50 POOL-ranked residues.

Two interesting mutations, designed specifically to increase the catalytic activity of AP, are associated with the first-shell residue D153 and the second-shell residue D330. D153 is involved in an ion pair interaction with K328 and in a water-mediated interaction with the Mg2+ ion.21 D153 also serves to position the catalytic residue, R166. Using rational protein design methods, four different single-point mutations were made at the D153 position in an attempt to understand more clearly its role and to increase the catalytic efficiency of the protein.63–65 Three of the four mutations, D153G, D153E, and D153A, resulted in an increased catalytic rate. In an effort to increase the turnover rate to a level comparable to that of mammalian enzymes, while allowing the protein to retain its high thermostability, Muller et al.21 assumed that while mutations to residues in the active site most often had a negative effect on catalysis, residues outside the catalytic site may also influence catalysis. Using directed evolution approaches, they showed that variants with double or triple mutations of D153, K328, and D330 showed increased activity over wild type while still maintaining thermostability.21

Figure 7(C) shows the POOL log ZP scores as a function of POOL rank for E. coli AP. The top 36 residues, or 8% of all residues, have positive log ZP scores; these top 36 include the two catalytic residues S102 and R166, D101 in the active site, and the eight Zn- and Mg-coordinating residues D51, T155, E322, D327, H331, D369, H370, and H412. Note that the metal-coordinating residues H370 and E322 are ranked third and fifth, respectively, by POOL (kinetics data not available).
Also included is the second-shell residue H372, which hydrogen bonds to the Zn-coordinating residue D327. Mutation of H372 to alanine causes no change in Zn binding affinity; kcat of the H372A variant is nine-fold lower than that of wild type, but its catalytic efficiency (kcat/KM) is 3.7-fold higher.18 In this work, two additional second-shell residues in the top 36, H86 and M53, the third-shell residue E57 at rank 17, and the lower-ranked second-shell residues Q435, E150, and S105 were studied by site-directed mutagenesis. Additional variants were also made for H372 and D330, two second-shell residues previously shown to influence activity. Table 4 shows the POOL rank, POOL log ZP score, and change in catalytic efficiency for some E. coli AP variants. In this case, the log ZP = 0 cutoff for POOL scores results in over-prediction of the essential residues for catalysis. While all of the residues known to have significant, single-handed effects on activity are included in the top 20 predicted by POOL, the predicted residues H86, E57, and M53 proved to be false positives with respect to single-point mutations.
Table 4

Residue POOL Rankings, POOL log ZP Score, and Change in Catalytic Efficiency for E. coli Alkaline Phosphatase Variants

Variant | Shell | POOL rank | POOL log ZP score | (kcat/KM)wild type/(kcat/KM)mutant | Reference
D51N | First | 1 | 2.8 | 12 | 66
D369N | First | 2 | 2.6 | 95 | 19
D327N | First | 4 | 2.5 | 100 | 67
D327A | First | 4 | | >1,000,000 | 67
H412N | First | 6 | 2.0 | 130 | 68
H331E | First | 7 | 1.2 | 972 | 65
D101S | First | 8 | 1.0 | 0.2 | 63
H372A | Second | 9 | 1.0 | 0.27 | 18
H372D | Second | 9 | | 3.3 | This work
H372L | Second | 9 | | 2.5 | This work
D153G | First | 10 | 0.72 | 0.2 | 63
D153N | First | 10 | | 1.1 | 64
R166A | First | 11 | 0.60 | 313 | 69
K328A | First | 13 | 0.47 | 3.8 | 70
H86L | Second | 14 | 0.45 | 3.2 | This work
E57Q | Third | 17 | 0.17 | 1.8 | This work
T155M | First | 18 | 0.16 | 678 | 19
S102A | First | 19 | 0.16 | 150,000 | 71
M53A | Second | 21 | 0.13 | 1.5 | This work
M53T | Second | 21 | | 2.4 | This work
D330N | Second | 22 | 0.06 | 0.2 | 21
D330N | Second | 22 | | 2.1 | This work
Q435E | Second | 44 | −0.03 | 1.7 | This work
T100V | Second | 51 | −0.03 | 0.3 | 20
V99A | Second | 100 | −0.06 | 0.2 | 20
E150Q | Second | 106 | −0.06 | 3.7 | This work
S105A | Second | 136 | −0.06 | 2.1 | This work

Residue numbering follows the PDB file 1ALK.
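
The quantity plotted against the log ZP score in the figures is the base-10 logarithm of the wild type over variant efficiency ratio. The following Python sketch computes it for a handful of Table 4 rows and flags which positions fall above the log ZP > 0 cutoff. The row selection, the function name `summarize`, and the dictionary layout are illustrative choices, not part of the published analysis.

```python
import math

# (variant, POOL log ZP score, (kcat/KM)WT / (kcat/KM)variant), copied from Table 4.
# Only a subset of rows is used here, for illustration.
TABLE4_ROWS = [
    ("S102A", 0.16, 150000.0),
    ("H372A", 1.0, 0.27),
    ("K328A", 0.47, 3.8),
    ("H86L", 0.45, 3.2),
    ("E57Q", 0.17, 1.8),
    ("M53A", 0.13, 1.5),
    ("Q435E", -0.03, 1.7),
    ("E150Q", -0.06, 3.7),
    ("S105A", -0.06, 2.1),
]


def summarize(rows, cutoff=0.0):
    """Return, per variant, the cutoff flag and log10 of the WT/variant ratio."""
    summary = []
    for variant, log_zp, ratio in rows:
        summary.append({
            "variant": variant,
            "predicted": log_zp > cutoff,      # position scores above the cutoff
            "log10_ratio": math.log10(ratio),  # > 0 means the variant is less efficient
        })
    return summary


if __name__ == "__main__":
    for row in summarize(TABLE4_ROWS):
        print(f'{row["variant"]:6s} predicted={row["predicted"]!s:5s} '
              f'log10(WT/variant)={row["log10_ratio"]:+.2f}')
```

A negative log10 ratio, as for H372A, corresponds to a variant that is more efficient than wild type; how large a change counts as "significant" is left to the reader, since no numerical threshold is stated in the text.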


Other examples of the importance of distal residues

Mandelate racemase

Mandelate racemase (EC 5.1.2.2) catalyzes the interconversion of the (R) and (S) enantiomers of mandelic acid (Scheme 1) via abstraction of a proton from the α-carbon atom.72–74 It is a divalent cation-dependent protein, and in the P. putida crystal structure (PDB ID: 2MNR), the Mn2+ ion is coordinated by D195, E221, E222, and E247.75 The reaction proceeds via a two-base mechanism; H297 acts as the (R)-specific acid/base catalyst73 while K166 acts as the (S)-specific acid/base catalyst.74 The second-shell residue D270 forms a hydrogen bond with the catalytic H297. The single mutation D270N results in a 10,000-fold decrease in enzyme activity compared with wild type for both (R)- and (S)-mandelate substrates.22 The N270 side chain in the structure of the variant is superimposable on the D270 side chain in the structure of the wild type; the remainder of the two structures is identical except that the side chain of the catalytic H297 is “rotated and displaced toward the binding site” in the mutant structure.22 It was argued that D270 is necessary to impart the correct pKa to the catalytic H297.22 The two catalytic bases H297 and K166 and the important second-shell residue D270 are all highly ranked by POOL, with log ZP scores of 0.71, 0.65, and 0.65, respectively.
Scheme 1

Reaction catalyzed by mandelate racemase.


Carbonic anhydrase isoform II

Carbonic anhydrase isoform II (CAII, EC 4.2.1.1) is a zinc-dependent metalloenzyme that catalyzes the reversible hydration of carbon dioxide (Scheme 2).76 The hydration of carbon dioxide to the bicarbonate ion and a proton occurs via a two-step mechanism. In the first step, a zinc-bound hydroxide ion acts as the nucleophile that attacks CO2 to form a zinc-bound bicarbonate intermediate. This intermediate is then displaced by a water molecule, creating a zinc-H2O complex and a free bicarbonate ion. In the rate-determining step, the zinc-bound hydroxide is regenerated through the transfer of a proton to the solvent, facilitated by the active site histidine residue H64, which acts as a proton shuttle.77,78 In the crystal structure of CAII (PDB ID: 1CA279), the zinc ion is tetrahedrally coordinated by the side chains of H94, H96, and H119 and a hydroxide ion. T199 accepts a hydrogen bond from the zinc-bound hydroxide ion and donates a hydrogen bond to the side chain of E106. T199 also helps to orient the hydroxide ion properly for nucleophilic attack on the substrate, CO2.80 Mutation of the second-shell residue H107 to Y causes CAII deficiency syndrome in vivo.81 For another second-shell residue, E106, the loss-of-charge variants E106Q and E106A show catalytic rate constants decreased by factors of 850 and 110, respectively; the variant E106D, which maintains the negative charge, shows no change in catalytic rate from that of wild type.82
Scheme 2

Reaction catalyzed by carbonic anhydrase.

For CAII, only THEMATICS and ConCavity data were used in the POOL analysis because of residue numbering disparities in the INTREPID data. The top 19 residues have positive log ZP scores. The Zn-coordinating histidine residues H94, H119, and H96 are the top three ranking residues in POOL. The important second-shell residue E106 ranks fourth with a log ZP score of 1.9. The proton shuttle H64 ranks sixth with a log ZP score of 0.91. H107 in the second shell, mutation of which is associated with disease, ranks seventh with a log ZP score of 0.58. The hydroxide activator T199 ranks 10th with a log ZP score of 0.15.

Discussion

Distal residues can be important contributors to enzyme catalysis but their participation is hard to predict computationally by prior methods. Distal residues are clearly important in computationally driven enzyme design, but better understanding of how to use them is needed.83 While there are presently few enzymes with extensive data on variants with substitutions in distal positions, the examples presented here suggest that the log ZP POOL scores can predict the distal residues and provide useful guidance about the extent of distal residue participation. The log ZP scores provide a more accurate prediction of distal residue participation than the previously utilized 8 to 10% cutoff from the top of the POOL rank order. For most instances, the present log ZP score cutoff (log ZP > 0) generates fewer false positives than the 8 to 10% cutoff. More importantly, the log ZP method allows that some proteins have much more extensive distal residue participation than others. The multilayer active sites of P. putida nitrile hydratase and of human phosphoglucose isomerase are predicted by the log ZP scores, as is the single-layer active site of P. putida ketosteroid isomerase. The log ZP score cutoff utilized here results in over-prediction of distal residue involvement in E. coli alkaline phosphatase. This may be because of extensive collective interactions in AP, such that mutation at a single position may have no significant effect on catalysis because a sufficient number of additional important residues remain. We note that some of the positions predicted by POOL, such as D153 and D330, for which single-point variants show little difference from wild type, have been shown to be significant in two- and three-point variants.21 While fewer experimental data points are available for P. putida mandelate racemase and for human carbonic anhydrase II, the POOL log ZP scores properly predict the previously reported participation of distal residues. 
For ketosteroid isomerase, the two residues contributing the most to catalysis, D40 and Y16, are both ranked in the top four by POOL. The 66-fold loss in catalytic efficiency observed55 for the Y57S variant suggests that the aromatic ring of this residue, ranked second by POOL, is important for catalysis. From very recent X-ray solution scattering and NMR studies of the Y57S variant it was concluded that, while the overall size of the variant structure is reduced relative to wild type, the active site cavity is enlarged,84 suggesting that the bulky phenyl group is necessary to maintain the correct spatial arrangement within the active site. The number 1 POOL ranking for Y32 is an apparent false positive. It is possible that the very intense electric field85 inside the active site is not significantly diminished by the loss of one residue in the second shell. Another possible explanation is that, upon mutation of Tyr to Phe, a water molecule takes the place of the phenolic hydroxyl group and assumes its role, perhaps then allowing catalytic efficiency to be maintained. The log ZP scores provide richer information about the importance of amino acid residues in the function of the enzyme than do other computational predictors, including the simple POOL ranks used previously. The cutoff utilized in the present work corresponds to the best performance for the admittedly small available set of proteins with experimental mutagenesis data on distal residues. More examples with extensive experimental data on distal residue mutagenesis are needed to establish the optimum cutoff for the log ZP scores. Distal residues influence the polarity, proton transfer properties, electrostatic ambience, spatial arrangement, and flexibility of active site residues. 
While the present method predicts most of the distal residues that influence catalysis, it is unable to predict the effects of some substitutions that alter the orientation or relative positions of the catalytic residues by virtue of their steric bulk, for instance in the M105A variant of ketosteroid isomerase.24 Since distal residues can play key supporting roles in catalysis, and since the electrostatic properties of the residues in and around the active site are critically important, these factors need to be considered in rational protein design.
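
The difference between the two selection rules discussed above (a fixed log ZP > 0 cutoff versus a fixed top 8 to 10% of the POOL rank order) can be illustrated with a short Python sketch. The scores below are hypothetical, generated by the helper `toy_scores`; they are not POOL output for any real protein, and the function names are illustrative only.

```python
def select_by_log_zp(scores, cutoff=0.0):
    """Keep every residue whose log ZP score exceeds the cutoff."""
    return [res for res, z in scores if z > cutoff]


def select_by_top_fraction(scores, fraction=0.08):
    """Keep a fixed fraction of residues from the top of the rank order."""
    ranked = sorted(scores, key=lambda item: item[1], reverse=True)
    n_keep = max(1, round(fraction * len(ranked)))
    return [res for res, _ in ranked[:n_keep]]


def toy_scores(n_residues, n_positive):
    """Hypothetical scores: n_positive residues above zero, the rest slightly below."""
    scores = []
    for i in range(n_residues):
        z = 2.0 - 0.2 * i if i < n_positive else -0.05
        scores.append((f"res{i + 1}", z))
    return scores


if __name__ == "__main__":
    compact = toy_scores(50, 2)    # hypothetical single-layer active site
    extended = toy_scores(50, 9)   # hypothetical multilayer active site
    for name, scores in (("compact", compact), ("extended", extended)):
        print(name,
              "log ZP > 0:", len(select_by_log_zp(scores)),
              "top 8%:", len(select_by_top_fraction(scores)))
```

In this toy setting the percentage rule returns the same number of residues for both proteins, while the score cutoff returns a set whose size depends on the score distribution, which is the property the log ZP approach relies on.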

Materials and Methods

Computational methods

The protein structures used as the input data for all calculations were downloaded from the Protein Data Bank (PDB, http://www.rcsb.org/pdb/).86 Coordinates for all the proteins in the test set were analyzed by Theoretical Microscopic Titration Curves (THEMATICS)5,87 using the method of Wei.7 POOL calculations3 were performed as described previously.4 Unless otherwise noted, input features include THEMATICS metrics, INTREPID evolutionary scores88 and ConCavity10 surface geometry features. INTREPID scores were obtained using the Berkeley phylogenomics web server.89 The structure-only version of ConCavity was used. The Ligand Protein Contacts—Contacts of Structural Units (LPC-CSU) server was used to identify the ligand- and/or metal-binding residues and the residue-residue contacts,90,91 in order to assign residues to shells. Both the known catalytic and ligand/metal binding residues were considered first-shell for this study. Experimental mutagenesis data reported previously were obtained from individual literature references, with the help of the BRENDA enzyme database92 and from the Protein Mutant Database (http://www.genome.ad.jp/dbget-bin/www_bfind?pmd) within the DBGET database retrieval system (http://www.genome.jp/dbget/). Expression, purification, and kinetics analysis were performed exactly as described previously.23
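
One way to picture the shell assignment is as a breadth-first walk outward from the first-shell residues through the residue-residue contact list. The Python sketch below assumes that a residue's shell is one more than the lowest shell among the residues it contacts; this rule, the function name `assign_shells`, and the placeholder residue `resX` are assumptions for illustration, and the exact procedure applied to the LPC-CSU output is not specified here.

```python
from collections import deque


def assign_shells(first_shell, contacts):
    """Breadth-first shell assignment over a residue-residue contact graph.

    first_shell: residues treated as shell 1 (catalytic and ligand/metal binding).
    contacts: iterable of (residue_a, residue_b) pairs that are in contact.
    Returns a dict mapping each reachable residue to its shell number.
    """
    neighbors = {}
    for a, b in contacts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    shell = {res: 1 for res in first_shell}
    queue = deque(first_shell)
    while queue:
        res = queue.popleft()
        for other in neighbors.get(res, ()):
            if other not in shell:          # first visit gives the lowest shell
                shell[other] = shell[res] + 1
                queue.append(other)
    return shell


if __name__ == "__main__":
    # D327 coordinates Zn and H372 hydrogen bonds to D327 (both from the text);
    # "resX" is a made-up residue in contact with H372.
    contacts = [("D327", "H372"), ("H372", "resX")]
    print(assign_shells(["D327"], contacts))
```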

Site-directed mutagenesis

An expression plasmid pEK2969 generously provided by E.R. Kantrowitz (Boston College), encoding the E. coli alkaline phosphatase gene, was used for protein expression and mutagenesis. Mutations were constructed using a QuikChange site-directed mutagenesis kit (Agilent Technologies, Santa Clara, CA). The mutated genes were sequenced to verify the construct (Massachusetts General Hospital DNA core, Cambridge, MA). After confirmation of the intended mutation, the plasmid was transformed into SM547 (a gift from E.R. Kantrowitz)69 competent cells and selected on LB agar containing 100 µg/mL ampicillin.

Protein expression and purification

Cells were grown at 37°C in 1 L of 2× YT media containing ampicillin (100 µg/mL) for 12 h. The cells were harvested, washed, and osmotically shocked as previously described93 and then protein was precipitated, suspended, dialyzed, and purified on a HiTrap FastFlow Q column (GE Healthcare) as previously described.69 Protein fractions greater than 95% pure as judged by SDS-PAGE and Coomassie blue staining were pooled and stored at −20°C. The concentration of all proteins was determined by Bradford assay (Bio-Rad, Hercules, CA) using bovine serum albumin as a standard.

Kinetics

Alkaline phosphatase activity was measured spectrophotometrically utilizing p-nitrophenyl phosphate as the substrate. Formation of p-nitrophenolate was monitored at 410 nm at room temperature in Tris buffer (1.0 M Tris-HCl, pH 8.0). Initial velocities were calculated utilizing a molar absorptivity coefficient of 1.42 × 10^4 M−1 cm−1.69 Nonlinear regression was performed to calculate KM and kcat values using GraphPad Prism 5 version 5.02.
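
The two calculations described above, conversion of an absorbance slope to an initial velocity and a nonlinear least-squares fit of the Michaelis-Menten equation, can be sketched in Python using only the standard library. This is not the GraphPad Prism procedure used for the reported values: the 1 cm path length, the golden-section search over log KM (which assumes a single minimum inside the bracket), all function names, and the example concentrations are assumptions for illustration.

```python
import math

EPSILON_M_CM = 1.42e4   # molar absorptivity from the text, M^-1 cm^-1
PATH_LENGTH_CM = 1.0    # assumption: standard 1 cm cuvette


def initial_velocity(delta_a_per_min, epsilon=EPSILON_M_CM, path_cm=PATH_LENGTH_CM):
    """Beer-Lambert conversion of an A410 slope (per min) to a rate in M per min."""
    return delta_a_per_min / (epsilon * path_cm)


def _vmax_and_sse(km, substrate, velocity):
    """For a fixed KM, the best Vmax is a linear least-squares solution."""
    x = [s / (km + s) for s in substrate]
    vmax = sum(v * xi for v, xi in zip(velocity, x)) / sum(xi * xi for xi in x)
    sse = sum((v - vmax * xi) ** 2 for v, xi in zip(velocity, x))
    return vmax, sse


def fit_michaelis_menten(substrate, velocity, iterations=100):
    """Least-squares fit of v = Vmax*[S]/(KM + [S]); returns (Vmax, KM).

    Substrate concentrations must be positive.
    """
    # Golden-section search over log(KM), bracketed well outside the data range.
    lo = math.log(min(substrate) / 100.0)
    hi = math.log(max(substrate) * 100.0)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        sse_a = _vmax_and_sse(math.exp(a), substrate, velocity)[1]
        sse_b = _vmax_and_sse(math.exp(b), substrate, velocity)[1]
        if sse_a < sse_b:
            hi = b   # minimum lies in [lo, b]
        else:
            lo = a   # minimum lies in [a, hi]
    km = math.exp(0.5 * (lo + hi))
    vmax, _ = _vmax_and_sse(km, substrate, velocity)
    return vmax, km


def catalytic_efficiency(vmax, km, enzyme_conc):
    """kcat/KM, with kcat = Vmax / [E]; units follow the inputs."""
    return (vmax / enzyme_conc) / km


if __name__ == "__main__":
    # Synthetic, noise-free data generated from Vmax = 2.0 and KM = 0.1 (arbitrary units).
    s = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]
    v = [2.0 * si / (0.1 + si) for si in s]
    print(fit_michaelis_menten(s, v))
```

Fitting Vmax in closed form for each trial KM reduces the problem to a one-dimensional search, which keeps the sketch dependency-free; a general-purpose nonlinear least-squares routine would be the more usual choice for real data with noise.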