
Critical assessment of structure-based approaches to improve protein resistance in aqueous ionic liquids by enzyme-wide saturation mutagenesis.

Till El Harrar^1,2, Mehdi D Davari^3, Karl-Erich Jaeger^4,5, Ulrich Schwaneberg^1,6, Holger Gohlke^2,7.

Abstract

Ionic liquids (IL) and aqueous ionic liquids (aIL) are attractive (co-)solvents for green industrial processes involving biocatalysts, but they often reduce enzyme activity. Experimental and computational methods are applied to predict favorable substitution sites, and, most often, site-directed surface charge modifications are subsequently introduced to enhance enzyme resistance towards aIL. However, almost no studies evaluate the prediction precision against random mutagenesis or against simple data-driven filtering processes. Here, we systematically and rigorously evaluated the performance of 22 previously described structure-based approaches to increase enzyme resistance to aIL, based on an experimental complete site-saturation mutagenesis library of Bacillus subtilis Lipase A (BsLipA) screened against four aIL. We show that, surprisingly, most of the approaches yield low gain-in-precision (GiP) values, particularly for predicting relevant positions: 14 approaches perform worse than random mutagenesis. Encouragingly, exploiting experimental information on the thermostability of BsLipA or structural weak spots of BsLipA predicted by rigidity theory yields GiP = 3.03 and 2.39 for relevant variants and GiP = 1.61 and 1.41 for relevant positions. Combining five simple-to-compute physicochemical and evolutionary properties substantially increases the precision of predicting relevant variants and positions, yielding GiP = 3.35 and 1.29. Finally, combining these properties with predictions of structural weak spots identified by rigidity theory further improves GiP for relevant variants up to 4-fold, to ∼10, and sustains or increases GiP for relevant positions, resulting in a prediction precision of ∼90% compared to ∼9% for random mutagenesis. This combination should be applicable to other enzyme systems for guiding protein engineering approaches towards improved aIL resistance.


Keywords: Bacillus subtilis lipase A; Ionic liquids; Protein engineering; Protein stability; Site-saturation mutagenesis

Year: 2021 PMID: 35070165 PMCID: PMC8752993 DOI: 10.1016/j.csbj.2021.12.018

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

With the world population continuing to increase [1], studies forecast a shortage of natural resources, such as fresh water [2] and fossil fuels [3], [4]. Green industrial processes, such as the enzymatic production of biofuel and other valuable products from abundantly available plant material, attempt to address these problems [5], [6], [7], [8], [9], [10], [11]. However, current biofuel production in particular relies on environmentally unfriendly acid catalysis and requires large amounts of fresh water for the reaction workup [12], [13], [14]. Consequently, environmentally friendly alternatives for producing biofuel are needed. Ionic liquids (IL) are attractive solvents for this purpose, as some IL dissolve cellulosic plant material without heat activation or pretreatment with solvents such as strong acids or carbon disulfide [12], [15]. For instance, IL-pretreated holocellulose retains a high digestibility for enzymes after recrystallization in water [21]. However, pure IL often lead to enzyme activities that are impractical for industrial processes [16], [17], [18], [19], [20], and aqueous ionic liquids (aIL), e.g., the IL remnants in recrystallized holocellulose, still have a reduced yet marked impact on enzymatic activity [22], [23]. Hence, for using aIL in green industrial processes, it is of utmost importance to understand how aIL affect enzyme stability and activity and to use this knowledge to improve enzyme resistance against these solvents. To improve enzyme resistance to aIL, studies have frequently relied on straightforward and well-established approaches, such as directed evolution [24], [25], to generate aIL-resistant enzyme variants [26], [27], [28]. The low experimental effort, however, comes with the drawback that mutations are generated randomly (although this can be directed to a certain degree using, e.g., modified polymerases), leading predominantly to minor changes in the protein [29] and often to incomplete coverage of the sequence and position space [30].
More recently, approaches to increase aIL resistance have shifted towards data-driven protein engineering, which relies on prior knowledge to improve specific enzyme properties by introducing changes at selected sequence positions and can cover the whole sequence space. Here, variant libraries are designed by predicting advantageous positions based on, e.g., structure [31], [32], [33], [34], [35] or consensus information [36], [37], [38], [39], or by predicting substitutions (exchanges of one amino acid for another due to a mutation in the corresponding DNA sequence) at distinct positions with a specific goal in mind, e.g., in disulfide bond engineering [40] or surface charge modification approaches [17], [18], [19], [20], [41], [42], [43], [44]. Surface charge modification, in particular, is a widely proposed approach to increase aIL resistance, following the rationale that introducing charged, ion-repelling substitutions at the protein surface can prevent aIL interactions with enzymes and their subsequent effects [41], [17], [18], [19], [20], [44], [45], [46]. Over the years, this approach has become noticeably more specific, evolving from a global chemical modification of all lysine residues of a protein [18], [19], [20] via fractional substitutions of lysine residues [41] to an NMR-based site-specific approach targeting selected positions around perturbed protein residues [17]. However, the lack of systematic large-scale data has prevented evaluating the performance of such approaches against random mutagenesis or simple structure-based guidelines. For the model enzyme Bacillus subtilis Lipase A (BsLipA), a complete site-saturation mutagenesis library (termed “BsLipA SSM library” hereafter) is available that covers all 3620 potential single substitutions with natural amino acids (181 substitution sites with 20 possible substitutions at each site) [16].
The BsLipA SSM library was screened for thermostability [33], resistance to four detergents [33], [47], resistance to three organic solvents [30], and resistance to four imidazolium-based aIL (0.9 M 1-butyl-3-methylimidazolium bromide ([BMIM/Br]), 1.2 M 1-butyl-3-methylimidazolium chloride ([BMIM/Cl]), 0.6 M 1-butyl-3-methylimidazolium iodide ([BMIM/I]), and 0.7 M 1-butyl-3-methylimidazolium trifluoromethanesulfonate ([BMIM/TfO])) [16]. The concentrations of the individual aIL were chosen to result in residual activities of 30–40% with respect to the activity in buffer, allowing relative comparisons between the aIL [16]. BsLipA is particularly suited for such analyses, as it is a small lipase that does not show interfacial activation, it has often been used in similar experimental and computational studies [16], [17], [33], [48], [44], [45], and high-resolution X-ray crystal structures (PDB ID: 1I6W [50] and 1ISP [51]) are available. An initial analysis of the BsLipA SSM library showed that more than half of all amino acid positions contribute to the IL resistance of BsLipA. It further revealed substitution patterns with presumably high fractions of aIL-resistant variants, e.g., substitutions at specific secondary structure elements [16] or substitutions to chemically different amino acids [16]. Subsequent studies based on the BsLipA SSM library proposed surface charge engineering and increasing the polarity of the substrate cleft to improve aIL resistance [16], [45], [49]. However, in these cases, the results were not related to a priori probabilities, such that the performance of these guidelines for suggesting aIL-tolerant variants may be overrated (see also below). A previous large-scale analysis of the BsLipA SSM library with respect to thermostability and detergent resistance revealed significant improvements in prediction accuracy compared to random mutagenesis for a data-driven structural stability-based approach [33].
Additionally, data mining of the BsLipA SSM library [30], [33] and another large-scale library [52] showed that applying simple physicochemical properties to predict substitutions, such as the solvent accessibility (SA) or the change in unfolding free energy (ΔΔGunf), increases the prediction accuracy for thermostability or detergent resistance [30], [33], [52]. Hence, the BsLipA SSM library offers a unique opportunity to evaluate commonly applied approaches to increase aIL resistance with respect to their accuracy in predicting beneficial substitutions and substitution sites. Furthermore, the BsLipA SSM library can be used to systematically evaluate new guidelines aimed at time- and cost-efficient knowledge-driven protein engineering towards aIL resistance. In this work, we show for the BsLipA SSM library that the prediction accuracy of commonly used approaches and guidelines to improve the aIL resistance of enzymes is surprisingly low. We rigorously evaluate the approaches as binary classifiers and report the results relative to unbiased random mutagenesis. This way, we account for a priori probabilities. Furthermore, we introduce a rational approach that outperforms currently applied approaches, can be computed within a few hours, and requires only a protein structure as input.

Results

In total, 9% of all variants show significantly increased aIL resistance, and 57% of all positions harbor such variants

In total, the BsLipA SSM library contains 3620 variants at 181 positions that were tested for residual activity (RAaIL; Eq. S1) in 0.9 M [BMIM/Br], 1.2 M [BMIM/Cl], 0.6 M [BMIM/I], and 0.7 M [BMIM/TfO] and subsequently assessed with respect to the variance of the data and the significance of changes (see Section 3.1 in Supplementary Information) [16]. The aIL resistance of a variant was considered significantly improved when $RA_{\mathrm{variant,aIL}} \geq RA_{\mathrm{wildtype,aIL}} + 3\sigma_{\mathrm{aIL}}$, with $RA_{\mathrm{variant,aIL}}$ and $RA_{\mathrm{wildtype,aIL}}$ being the residual activities of the variant and the wildtype, respectively (activity in aIL relative to activity in buffer), and $\sigma_{\mathrm{aIL}}$ being the standard deviation of the assay in the respective aIL [16]. The limit of $3\sigma_{\mathrm{aIL}}$ was chosen because it corresponds to a p-value below 0.01, assuming a Gaussian distribution of RAaIL. Throughout this study, variants with significantly improved aIL resistance and positions harboring such substitutions are termed “relevant variants” and “relevant positions”, respectively. A graphical representation of the distribution of relevant variants and positions in BsLipA is shown in Fig. 1.
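As a minimal sketch of the relevance criterion, assuming RAaIL is expressed as a fraction of the activity in buffer, the following Python code flags relevant variants and checks the one-sided Gaussian tail probability behind the 3σ choice. The wildtype residual activity and σ are hypothetical values, and `is_relevant` and `upper_tail_p` are illustrative names, not part of the original analysis.

```python
import math

def is_relevant(ra_variant, ra_wildtype, sigma, k=3.0):
    """A variant counts as 'relevant' if RA_variant >= RA_wildtype + k * sigma."""
    return ra_variant >= ra_wildtype + k * sigma

def upper_tail_p(k):
    """Probability that a standard normal variable exceeds k (one-sided)."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

p3 = upper_tail_p(3.0)  # ~0.00135 for the 3-sigma limit used for aIL
p2 = upper_tail_p(2.0)  # ~0.0228 for the 2-sigma limit used for detergents
print(f"3 sigma: p = {p3:.5f}; 2 sigma: p = {p2:.4f}")

# Hypothetical wildtype residual activity and assay standard deviation in one aIL
ra_wt, sigma = 0.35, 0.05
for ra in (0.45, 0.55):
    print(ra, is_relevant(ra, ra_wt, sigma))  # 0.45 False, 0.55 True
```

Both the one-sided (∼0.0013) and the two-sided (∼0.0027) tail probabilities for 3σ lie below 0.01, so the stated criterion holds either way.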

Fig. 1

Distributions of relevant variants and positions for the four aIL in the BsLipA SSM library. Data are analyzed by focusing on relevant variants (A-B) and relevant positions (C-D). (A) Average number of relevant variants per position mapped onto the BsLipA structure, with blue (red) depicting a low (high) number of variants per position. The catalytic site residues S77, D133, and H156 are depicted as sticks and colored in green. (B) Average number of relevant variants per position. The majority of the positions yield fewer than one aIL-resistant variant, and few positions yield multiple (>4) aIL-resistant variants. (C) Number of positions that are relevant in n = 0 to 4 aIL. Almost half of all BsLipA positions (89 positions) yield relevant variants in three or more aIL, and only ∼20% (39 positions) yield no variant improved in any aIL. (D) Data of (C) mapped onto the BsLipA structure, with colors depicting the number of aIL (white: 0; light blue: 1; blue: 2; magenta: 3; red: 4). The catalytic site residues S77, D133, and H156 are depicted as sticks and colored in green.

Averaged over all four aIL, only 9% of all substitutions (310 variants) are relevant variants ([BMIM/Br]: 8% or 263 variants; [BMIM/Cl]: 13%/462; [BMIM/I]: 6%/206; [BMIM/TfO]: 9%/292). This proportion (9%) represents the chance of finding relevant variants using unbiased random mutagenesis, e.g., by error-prone PCR (epPCR) with equal probabilities for all variants; experimental biases, such as the preference of Taq polymerase [53] in epPCR for AT → GC transitions, are thus not considered [54]. This value is used in our subsequent analyses as the baseline for evaluating approaches to predict relevant variants. The percentage of relevant variants is comparable to that obtained for detergent resistance (∼12%) [33], [47].
The slightly lower percentage for aIL may be due to using $3\sigma_{\mathrm{aIL}}$ as the significance limit, whereas $2\sigma_{\mathrm{D}}$ was used for detergents [33], [47]. These conservative limits counterbalance experimental uncertainties in RAaIL that originate from enzyme activities measured in the supernatant [16], which may be influenced by differences in thermodynamic or kinetic protein stability [34], [55] or in protein expression [56]. The RAaIL distributions for the four aIL are shown in Fig. S1. In contrast to the small fraction of relevant variants, more than half of all substitution sites (57% or 103 positions) harbor relevant variants ([BMIM/Br]: 50% or 91 positions; [BMIM/Cl]: 69%/124; [BMIM/I]: 52%/95; [BMIM/TfO]: 57%/104). Interestingly, almost half of all BsLipA positions (89 positions) yield relevant variants in three or more aIL, and only ∼20% (39 positions) yield no variant improved in any aIL (Fig. 1C/D). Thus, more than twice as many positions in BsLipA yield relevant variants as for detergent resistance (∼27%) [33]. This proportion (57%) represents the chance of finding a position that harbors relevant variants using unbiased random mutagenesis and is used in our subsequent analyses as the baseline for evaluating approaches to predict relevant positions. Across all positions, the majority yield fewer than one aIL-resistant variant on average, and few positions yield multiple (>4) aIL-resistant variants (Fig. 1A/B). Hence, in BsLipA, each of the 103 relevant positions yields on average three relevant variants out of 20 possible substitutions. With unbiased random mutagenesis, identifying 10 unique relevant variants or positions therefore requires screening ∼117 and ∼18 variants on average, respectively.
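The baseline numbers above follow from simple ratios. The sketch below reproduces them in Python, assuming the expected screening effort is estimated as the number of desired hits divided by the hit probability (it ignores duplicates and sampling without replacement); the counts are those given in the text, and `expected_screens` is a hypothetical helper name.

```python
N_VARIANTS, N_POSITIONS = 3620, 181
RELEVANT_VARIANTS, RELEVANT_POSITIONS = 310, 103  # averaged over the four aIL

p_var = RELEVANT_VARIANTS / N_VARIANTS    # ~0.086: chance that a random variant is relevant
p_pos = RELEVANT_POSITIONS / N_POSITIONS  # ~0.57: chance that a random position is relevant

def expected_screens(n_hits, p):
    """Candidates to screen so that, on average, n_hits of them are relevant (simple n/p estimate)."""
    return n_hits / p

print(f"{p_var:.3f} {p_pos:.3f}")                        # 0.086 0.569
print(round(expected_screens(10, p_var)))                # 117
print(round(expected_screens(10, p_pos)))                # 18
print(round(RELEVANT_VARIANTS / RELEVANT_POSITIONS, 1))  # 3.0 relevant variants per relevant position
```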

Definition of measures for evaluating the predictive power of approaches

We defined two measures, based on binary classification, to evaluate the performance of a given approach for improving aIL resistance on the BsLipA SSM library: the gain-in-precision (GiP, Eq. S3, [57]) at the variant level (GiPvar) and at the position level (GiPpos). GiPvar and GiPpos describe the relative likelihood of correctly predicting relevant variants or relevant positions compared to unbiased random mutagenesis. Note that GiP is not affected by data prevalence and data imbalance, in contrast to other measures of binary classification, such as accuracy [58], which is important in view of the underrepresentation of relevant variants. Note also that we focus on precision and not recall [57] because, for our application, a high fraction of correctly classified instances among those classified as relevant is more important than high coverage of the relevant class: Substantially improved enzyme variants often incorporate only a few (1–3) substitutions [59], [60], [61], [62], and additional substitutions do not easily lead to further improvements, particularly when they interact [25]. This is because the majority of substitutions destabilize an enzyme, limiting how substitutions can be combined [46], [59], [63], [64], [65], [66]. Furthermore, despite state-of-the-art high-throughput selection [67], [68], [69], [70] and screening [71], [72], [73] techniques, protein engineering approaches are still limited to a small number of positions if the whole sequence space is to be investigated, as the library size increases exponentially (combining all possible substitutions at, e.g., six positions already leads to $20^6 = 6.4 \cdot 10^7$ variants) [74]. Hence, identifying a few relevant variants and positions is both necessary and sufficient for most protein engineering approaches.
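A minimal Python sketch of the measure follows. Eq. S3 is not reproduced here, so the code assumes GiP is the precision of an approach divided by the precision of random selection, i.e., the prevalence of the relevant class (310/3620 for variants, 103/181 for positions). Under this assumption, the GiP values reported for approach A2 in the next section reproduce the screening efforts quoted there. The 40-variant prediction is hypothetical, and all function names are illustrative.

```python
# Baseline prevalences from the text (averaged over the four aIL)
P_VAR = 310 / 3620   # relevant variants / all variants
P_POS = 103 / 181    # relevant positions / all positions

def precision(tp, fp):
    """Fraction of instances predicted as relevant that are truly relevant."""
    predicted = tp + fp
    if predicted == 0:
        raise ValueError("the approach predicts no instances")
    return tp / predicted

def gain_in_precision(tp, fp, prevalence):
    """GiP: precision of an approach relative to random selection (assumed definition)."""
    return precision(tp, fp) / prevalence

def screens_needed(n_hits, gip, prevalence):
    """Expected number of candidates to screen for n_hits at precision gip * prevalence."""
    return n_hits / (gip * prevalence)

# Hypothetical approach: 40 predicted variants, 12 of them relevant
print(f"GiP_var = {gain_in_precision(12, 28, P_VAR):.2f}")  # GiP_var = 3.50

# GiP values reported for approach A2 (GiPvar = 1.79, GiPpos = 1.2)
print(round(screens_needed(10, 1.79, P_VAR)))  # 65
print(round(screens_needed(10, 1.20, P_POS)))  # 15
```

A GiP of 1 recovers random mutagenesis, and a GiP below 1 means that more candidates must be screened than without any prediction.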
Because our analysis focuses on general applicability to several aIL rather than to individual solvents, the GiP values are averaged over the four aIL of the BsLipA SSM library. Yet, to provide an estimate of the data variance, the ranges of the numbers of relevant variants and positions across the four aIL, and their relation to the total numbers of considered variants and positions, are presented (Table 1, Table S1). Finally, we performed Boschloo’s exact test to determine whether the observed populations of relevant variants and positions of a given approach differed significantly (p ≤ 0.05) from those of random mutagenesis [75]. Here, we assessed the p-value of the test statistic for the populations of relevant versus not-relevant variants or positions compared to random mutagenesis; the p-value describes the probability of finding a sample statistic at least as extreme as the observed test statistic. Unless all p-values are <0.05, the lowest and highest p-values observed over the four aIL are shown in Table 1 and Table S1.
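To illustrate how Boschloo's test works, the following stdlib-only Python sketch implements its one-sided version (approach enriched in relevant instances relative to a random reference). It uses the Fisher exact p-value as the test statistic and maximizes the null probability of all at-least-as-extreme tables over the unknown common relevance probability. That maximization is approximated on a finite grid, so the sketch can slightly underestimate the exact p-value. The counts are hypothetical. SciPy (1.7 and later) provides `scipy.stats.boschloo_exact`, which is the practical choice and also supports two-sided alternatives. The text does not state which implementation or alternative the authors used.

```python
from math import comb

def fisher_greater(x1, n1, x2, n2):
    """One-sided Fisher exact p-value for 'group 1 has a higher success rate than group 2'."""
    k, n = x1 + x2, n1 + n2
    upper = min(k, n1)
    hits = sum(comb(n1, i) * comb(n2, k - i) for i in range(x1, upper + 1))
    return hits / comb(n, k)

def boschloo_greater(x1, n1, x2, n2, grid=400):
    """Boschloo's test: Fisher p-value as statistic, maximized over the nuisance parameter."""
    t_obs = fisher_greater(x1, n1, x2, n2)
    # All 2x2 tables (same group sizes) at least as extreme as the observed one
    extreme = [(i, j) for i in range(n1 + 1) for j in range(n2 + 1)
               if fisher_greater(i, n1, j, n2) <= t_obs * (1 + 1e-9)]
    best = 0.0
    for g in range(1, grid):
        pi = g / grid  # common probability of being relevant under the null hypothesis
        pmf1 = [comb(n1, i) * pi**i * (1 - pi) ** (n1 - i) for i in range(n1 + 1)]
        pmf2 = [comb(n2, j) * pi**j * (1 - pi) ** (n2 - j) for j in range(n2 + 1)]
        best = max(best, sum(pmf1[i] * pmf2[j] for i, j in extreme))
    return min(best, 1.0)

# Hypothetical counts: 9 of 20 predicted variants are relevant vs. 9 of 100 random variants
p_fisher = fisher_greater(9, 20, 9, 100)
p_boschloo = boschloo_greater(9, 20, 9, 100)
print(f"Fisher p = {p_fisher:.2e}, Boschloo p = {p_boschloo:.2e}")
```

Boschloo's p-value never exceeds Fisher's, which is why it is preferred for unconditional 2×2 comparisons. For library-sized counts (thousands of variants), the direct binomial products above overflow floating point, so a log-space implementation or the SciPy function would be needed.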

Table 1

Predictive performance of selected approaches and physicochemical and evolutionary properties to predict relevant variants and positions.[a]

[a] Substitutions to specific residues are indicated by “→” plus one-letter code; in all other cases, substitutions to all residues are performed. The results for the predicted relevant variants and positions for all evaluated approaches, properties, and the combinations of both are shown along the sequence of BsLipA (see the top for a secondary structure representation): Red bars indicate relevant positions for which relevant variants were correctly predicted. Blue bars indicate relevant positions for which no relevant variant was correctly predicted. The height of red bars represents the fraction of relevant variants among all predicted variants for the given position, thus describing the precision of predicting relevant variants. The height of blue bars represents the fraction of (falsely) predicted relevant variants among all possible variants at this position, thus giving an estimate of the experimental work unnecessarily spent when investigating all predicted variants. Overall, high red bars and low blue bars indicate a favorable approach, and vice versa. For random mutagenesis (Rd), the graph along the BsLipA sequence represents the experimentally determined mutagenesis efficiency (i.e., the relevance) of each sequence position. Thus, blue bars represent positions not relevant in any aIL, whereas red bars represent positions relevant in at least one aIL. The height of red bars displays the average fraction of relevant variants at the respective relevant position.

[b] Numbering of evaluated approaches and properties. A = Approach, P = Properties, C = Combination of approaches and properties.

[c] Number of relevant variants vs. all considered variants.

[d] Number of relevant positions vs. all considered positions.

[e] Random mutagenesis.

[f] Averaged percentage of relevant variants compared to the whole BsLipA SSM library.

[g] Averaged percentage of relevant positions compared to the whole BsLipA SSM library.

[h] Not determined.

[i] See Section 3.4 in the Supplementary Information for an explanation of the abbreviations.


Assessment of commonly applied approaches to improve aIL resistance

We extracted 22 approaches to improve enzyme resistance towards aIL from the literature and evaluated their performance using the measures defined above (Table S1). These approaches can be classified into six groups (Fig. 2):

Fig. 2

Overview of evaluated structure-based approaches described in the literature for improving aIL resistance. The classification of the approaches (I-VI) is described in the text. Most approaches rely on analyzing direct protein-aIL interactions (A1-A7), whereas only a few investigate subsequent effects of the aIL interactions on the protein (A8-A11).

(I) approaches that determine relevant positions from experimental structural data for the system (A1-A5);
(II) approaches that determine relevant positions from extensive computations (A6-A11);
(III) approaches that determine relevant positions from experimental biochemical data on other “environmental” effects, such as temperature or detergents (A12-A13);
(IV) one approach in which relevant positions are determined as structural weak spots by rigidity theory without considering specific aIL effects (A14);
(V) approaches that modify surface charges (A15-A20);
(VI) two approaches that did not consider a priori information (A21-A22).

We summarize the results for the approaches of each group here (Table 1). For detailed information on each approach, see the Supplementary Information and Table S1.

Group I: Approaches A1 and A2 used binding sites identified from X-ray crystal structures of BsLipA in the presence of aIL, which were subsequently refined by molecular dynamics (MD) simulations. Approaches A3-A5 used two-dimensional 15N/1H HSQC NMR experiments to identify positions that experienced perturbations in their local chemical environment upon incubation in [BMIM/Cl]. In both cases, a similar number of relevant sites (23 to 24) was predicted. These sites overlap only slightly between the approaches (26% of A1-A2 sites are found in A3-A5, and 25% vice versa), but substantially with the BsLipA SSM library reference data (74% and 80%).
For specific changes to charged amino acids, high GiPvar values of ∼2.2 to 2.5 are associated with low GiPpos values (∼0.3 to 0.5) in either subgroup, indicating that, at the level of positions, such charge changes reach only ∼⅓ to ½ of the precision of random mutagenesis. In turn, moderate GiPvar (≤1.8) and GiPpos (≤1.2) values are obtained if substitutions to all amino acids are evaluated. For example, GiPvar = 1.79 and GiPpos = 1.2 for approach A2 indicate that ∼65 and ∼15 variants have to be screened to obtain ten relevant variants and positions, respectively, compared to ∼117 and ∼18 with random mutagenesis.

Group II: In approaches A6 and A7, we identified aIL binding sites of BsLipA from extensive MD simulations using distance-based interaction criteria and evaluated the 20 most frequently occupied positions for each solvent [43]. The identified sites showed a low to moderate overlap with the binding sites of A1-A2 (41% and 26% for A6 and A7, respectively) and A3-A5 (18% and 21% for A6 and A7, respectively), but a high overlap with the BsLipA SSM library reference data (73% and 85%). However, similar to A1 and A3-A4, specific changes to positively (negatively) charged amino acids at cation (anion) binding sites yielded moderate GiPvar of ∼1.3 but substantially lower GiPpos values (∼0.3), again indicating a low precision at the level of positions compared to random mutagenesis in the context of charge changes. In approaches A8-A11, we assessed whether predictions based on aIL-induced local structural stability changes, identified using either MD simulations (A8-A9) or the rigidity theory-based Constraint Network Analysis (CNA) (A10-A11), lead to increased GiP values. Introducing charged amino acids at solvent-exposed positions (A8 and A10) yielded high GiPvar ≈ 2 but, again, low GiPpos ≈ 0.75.
In turn, considering substitutions to all amino acids at positions irrespective of solvent exposure (A9 and A11) yields moderately increased GiPvar (∼1.35) and GiPpos (∼1.15) values. Notably, these results are comparable to predictions from A1-A5, indicating that computational approaches predict relevant variants and positions with a precision similar to that of experiment-based ones, without the need for cost- and time-intensive experiments.

Group III: Approaches A12 and A13 probe to what extent knowledge of relevant positions gained from optimizing BsLipA against temperature or detergents can be transferred to increasing aIL resistance. For the latter (A13), only moderate GiPvar (1.56) and GiPpos (1.20) are obtained. For the former (A12), however, the highest GiPvar (3.03) and GiPpos (1.61) of all tested approaches are obtained.

Group IV: Approach A14 assesses whether structural weak spots of BsLipA identified with the rigidity theory-based Constraint Network Analysis method are relevant positions. In contrast to A11, the weak spots were identified from structural ensembles of BsLipA generated in water only and determined from phase transitions upon thermal unfolding. This approach yields the highest GiPpos (1.41) among all evaluated computational approaches and the fifth-highest GiPvar (2.39) among all evaluated experimental and computational approaches. Note that for groups III and IV, the number of predicted relevant positions is low (6 to 11), which facilitates identifying beneficial substitution combinations later. However, the number of variants is still high because substitutions to all amino acids are evaluated. Hence, further rules are needed to limit the substitution possibilities (see below).

Group V: Approaches A15-A20 comprise surface charge modifications without prior identification of aIL interaction sites or of aIL-induced changes in structural stability.
The underlying principle is to repel like-charged solvent ions by introducing either positively (K/R) or negatively (D/E) charged residues on the protein surface. Introducing charged residues (D, E, R, or K) at all surface positions (A15), introducing only E there following Ref. [17] (A16), or substituting to all residues other than D, E, R, or K (A20) led to GiPvar = 1.10 to 1.76 but GiPpos < 1.0, while requiring ≥ 120 residues to be considered. Focusing only on surface lysine residues and substituting them with E (A17) yielded GiPvar = 2.26 but GiPpos = 0.33, again indicating that, at the level of positions, such charge changes reach only ∼⅓ of the precision of random mutagenesis. Finally, positive-to-negative substitutions at surface residues (A18) and, conversely, negative-to-positive substitutions (A19) yielded almost identical results, with GiPvar ≈ 2.75 and GiPpos ≈ 0.56. This indicates that the direction of single charge changes does not matter, but that, at the level of positions, such changes reach only ∼½ of the precision of random mutagenesis.

Group VI: Approaches A21 and A22 were suggested based on previous observations for the BsLipA SSM library [16] and involve the somewhat unexpected substitutions to chemically different amino acids at all sites, or substitutions in helices and loops. However, in both cases, GiPvar and GiPpos are close to 1 or below, indicating that these approaches yield precisions similar to those of random mutagenesis. Previously, not accounting for this a priori information led to overrating these approaches [16].

To conclude, of the 22 approaches presented, only two stand out with substantially improved GiPvar and GiPpos values.
These are A12, which exploits experimental information on the thermostability of BsLipA (GiPvar = 3.03, GiPpos = 1.61), and A14, which exploits structural weak spots of BsLipA predicted by rigidity theory (GiPvar = 2.39, GiPpos = 1.41). Furthermore, they require substitutions at only 6 to 10 positions. In contrast, approaches employing the concept of surface charge modification, which aim to repel aIL ions by introducing charged residues at the surface (A1, A3-A4, A15-A19), yield high GiPvar ≥ 1.6 (except for A3) only at the expense of low GiPpos ≤ 0.6.
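To make the Group V charge-reversal designs (A18 and A19) concrete, the following minimal Python sketch enumerates the corresponding single-substitution libraries. It is illustrative only: the sequence fragment and relative SA values are invented. Treating positions with a relative SA of at least 5% as "surface" borrows the accessible/buried cutoff used for P2 and P3 in the next section, and the surface definition used in the original approaches may differ. `charge_reversal_variants` is a hypothetical helper.

```python
POSITIVE, NEGATIVE = ("K", "R"), ("D", "E")

def charge_reversal_variants(sequence, rel_sa, sa_cutoff=0.05, positive_to_negative=True):
    """Enumerate single charge-reversal substitutions at surface positions (1-based numbering)."""
    source, target = (POSITIVE, NEGATIVE) if positive_to_negative else (NEGATIVE, POSITIVE)
    variants = []
    for pos, (aa, sa) in enumerate(zip(sequence, rel_sa), start=1):
        # Only charged residues of the source type that are solvent exposed are substituted
        if aa in source and sa >= sa_cutoff:
            variants.extend(f"{aa}{pos}{new}" for new in target)
    return variants

# Hypothetical 8-residue fragment with made-up relative solvent accessibilities
seq = "AKDLERKG"
sa = [0.30, 0.60, 0.45, 0.02, 0.70, 0.03, 0.55, 0.80]
print(charge_reversal_variants(seq, sa))                              # ['K2D', 'K2E', 'K7D', 'K7E']
print(charge_reversal_variants(seq, sa, positive_to_negative=False))  # ['D3K', 'D3R', 'E5K', 'E5R']
```

Note that the buried R6 is skipped. Only a handful of sites, those carrying surface charges, enter the library, and this restriction of the position space is what GiPpos evaluates.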

Evaluating physicochemical and evolutionary properties for predicting improved aIL resistance

Motivated by recent findings that simple descriptors can explain protein stability changes upon substitution [76], we examined whether five physicochemical and evolutionary properties of protein residues can predict relevant variants and substitution sites for improving aIL resistance. These properties are solvent accessibility (P1-P3), relative volume (P4), hydropathy (P5), unfolding free energy (P6), and residue conservation (P7-P8) (summarized in Table 1; see the Supplementary Information and Table S1 for detailed information on each property). We also combined these properties (P9) and evaluated the robustness of the combination towards deviations from the optimal ranges by relaxing and tightening the ranges by 25% and 50% (P10-P12). Here, we considered substitutions to all other amino acids at the sites predicted to be relevant according to properties P1-P12.

P1-P3: Solvent accessibility. Recent studies investigating the thermostability of Streptococcus sp. protein G [52] and BsLipA [33] reported increased prediction accuracy when substituting at more solvent-exposed positions compared to buried positions. For increasing aIL resistance, selecting residues with low to moderate solvent accessibility (SA, Eq. S5; 10% < SA ≤ 40%, P1) had a beneficial effect, yielding GiPvar = 1.46 and GiPpos = 1.13. Substituting at solvent-accessible positions (SA ≥ 5%) was, in general, more favorable (GiPvar = 1.20 and GiPpos = 1.07, P2) than substituting at buried sites (SA < 5%; GiPvar = 0.58 and GiPpos = 0.85, P3).

P4: Relative volume. The relative volume (rV, Eq. S7) reflects that different positions can accommodate volume changes differently; e.g., substitutions of small, buried amino acids to larger ones are usually unfavorable [52]. Consequently, the precision in predicting improved BsLipA thermostability increased when small-to-large substitutions were excluded [34].
Here, similar to the exclusion of small-to-large mutations, excluding substitutions that markedly increase the occupied volume (rV > 1.3) led to GiPvar = 1.09 but GiPpos = 0.88. P5: Hydropathy. The change in hydropathy (ΔHy, Eq. S9) of BsLipA variants is related to the concept of surface charge modification as both aim at modifying polarity, which is widely used to increase aIL resistance [18], [19], [20]. Here, the highest GiP were found for a moderate reduction in hydropathy (ΔHy ≤ −4) (GiPvar = 1.21 and GiPpos = 0.61). P6: Unfolding free energy. The unfolding free energy (ΔΔGunf, Eq. S6) is an important factor when considering substitutions, as beneficial effects towards aIL resistance must compensate potentially destabilizing effects (higher ΔΔGunf) due to substitutions. This concept was previously used to evaluate the cooperativity of BsLipA variants to increase aIL resistance in [BMIM/Cl], where the exclusion of strongly destabilizing variants (ΔΔGunf ≥ 7.52 kcal mol−1) led to a higher chance of determining cooperative variants [63]. Here, excluding substitutions that moderately destabilized the enzyme (ΔΔGunf > 4 kcal mol−1) led to the highest GiP (GiPvar = 1.13 and GiPpos = 0.93). P7-P8: Residue conservation. Residue conservation (CS) is often analyzed prior to rational mutagenesis approaches to determine residues important for the structure or function of enzymes, such as in the catalytic or ligand binding sites [60], [77], [78]. Reducing the degree of residue conservation below which substitutions are allowed led to an almost linear increase of both GiPvar and GiPpos, resulting in GiPvar = 1.22 and GiPpos = 1.20 at CS = 0 (see P8). However, as relevant sites and substitutions can coincide with semiconserved positions [33], we used CS ≤ 4 as the limit, which yields GiPvar = 1.13 and GiPpos = 1.08 (P7). P9-P12: Combined properties. We then evaluated the performance when combining the properties P1, P4, P5, P6, and P7. 
This yielded GiPvar = 3.35 and GiPpos = 1.29 (P9), which are substantially increased GiP values compared to the individual properties. Notably, the result is robust to deviations from the optimal property ranges and still yielded GiPvar ≈ 3 and GiPpos ≈ 1 when tightening or relaxing the optimal ranges by 25%, respectively (P10 and P11). Finally, relaxing the optimal ranges by 50% (P12) yielded GiPvar = 1.85 and GiPpos = 0.83, a performance comparable to that of some experimental approaches (A1, A2, and A4). Tightening the optimal ranges by 50% led to zero predicted relevant variants and positions. Note that relaxing (tightening) refers to modifying the optimal ranges as to include more (less) variants. I.e., for ΔHy, relaxing (tightening) by 25% means modifying the range from [−∞, −4] to [−∞, −3] (or [−∞, −5]), while the same changes modify the SA-ranges from [0.1, 0.4] to [0.075, 0.5] (or [0.125, 0.3]). To conclude, the combination of five physicochemical and evolutionary properties (P9), which can be computed within a few hours from a static protein structure or sequence information, yielded the, so far, highest GiPvar value and the third-highest GiPpos value. At 13 positions predicted to be relevant, substitutions would need to be performed, up to about twice as many as predicted by A12 and A14. The five properties had been optimized individually against the BsLipA SSM library, which may explain the excellent performance of P9. Still, if the properties were modified by −25% to +50% (P10-P12), GiPvar ≥ 1.85 result, although the GiPpos decreased to ∼1 or below.
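A minimal Python sketch of the combined filter (P9) and of relaxing or tightening its ranges (P10-P12) is given below. It is an illustration under stated assumptions, not the authors' code: the `Variant` record, the dictionary keys, and the rule that each finite bound moves by the given fraction of its own magnitude are our choices, picked because that rule reproduces the ΔHy and SA examples in the text. SA is written as a fraction, the lower SA bound is treated as exclusive (10% < SA), and the variant values are invented.

```python
import math
from dataclasses import dataclass

INF = math.inf

# Optimal (lower, upper) ranges of the five combined properties, as given in the text.
OPTIMAL_RANGES = {
    "sa": (0.1, 0.4),     # P1: 10% < SA <= 40% (lower bound exclusive)
    "rv": (-INF, 1.3),    # P4: exclude rV > 1.3
    "dhy": (-INF, -4.0),  # P5: keep dHy <= -4
    "ddg": (-INF, 4.0),   # P6: exclude ddG_unf > 4 kcal/mol
    "cs": (-INF, 4.0),    # P7: keep CS <= 4
}


def scale_range(bounds, fraction):
    """Relax (fraction > 0) or tighten (fraction < 0) a (lower, upper) range.

    Assumed rule: each finite bound moves by fraction * |bound|, upper bounds
    upward and lower bounds downward when relaxing; infinite bounds are kept.
    """
    lower, upper = bounds
    if math.isfinite(lower):
        lower -= fraction * abs(lower)
    if math.isfinite(upper):
        upper += fraction * abs(upper)
    return lower, upper


def scaled_ranges(fraction):
    """Apply scale_range to every property range."""
    return {key: scale_range(b, fraction) for key, b in OPTIMAL_RANGES.items()}


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative."""
    position: int
    target_aa: str
    sa: float   # relative solvent accessibility of the position (fraction)
    rv: float   # relative volume
    dhy: float  # change in hydropathy
    ddg: float  # ddG_unf in kcal/mol
    cs: float   # conservation score of the position


def passes_filter(v, ranges):
    """True if the variant satisfies all five criteria (logical AND, as in P9)."""
    sa_lo, sa_hi = ranges["sa"]
    if not (sa_lo < v.sa <= sa_hi):
        return False
    return all(
        ranges[key][0] <= getattr(v, key) <= ranges[key][1]
        for key in ("rv", "dhy", "ddg", "cs")
    )


# Invented example values, for illustration only.
variants = [
    Variant(42, "D", sa=0.25, rv=0.9, dhy=-6.5, ddg=1.2, cs=2),
    Variant(87, "W", sa=0.30, rv=1.8, dhy=-5.0, ddg=0.5, cs=1),
]
kept = [v for v in variants if passes_filter(v, OPTIMAL_RANGES)]
relaxed_50 = [v for v in variants if passes_filter(v, scaled_ranges(0.5))]
```

Passing `scaled_ranges(-0.25)` or `scaled_ranges(0.25)` to `passes_filter` gives filters analogous to P10 and P11; the invented second variant is rejected at the optimal ranges only because of its large relative volume and is accepted once the ranges are relaxed by 50%.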

Computational approaches can be further enhanced by combination with physicochemical and evolutionary properties

C1-C9: “Combinations”. Finally, we probed whether the predictive power of the most promising computational structure- and mechanism-based approaches (A9, A11, A14) can be further improved by combining them with the physicochemical and evolutionary properties (P9), which also notably reduces the number of predicted relevant variants and positions; this yields C1-C3 (Table 1; Table S1). Furthermore, we assessed the predictive power when the applied property ranges deviate by −25% or +25% from the optimal values, yielding C4-C6 and C7-C9, respectively. In most cases, GiPvar increases, while GiPpos is sustained (∼1) or increased (∼1.7). These results indicate that the properties can be used as filters to improve the predictive power for relevant variants and positions. In particular, the results for C3, C6, and C9 indicate that first predicting relevant positions as structural weak spots with CNA (A14) and subsequently filtering the variants and positions using the physicochemical and evolutionary properties (P9, P10, and P11) is a powerful and efficient way to design smarter variant libraries at very few positions for improving aIL resistance in protein engineering.
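The two-stage scheme behind C3, C6, and C9 (predict positions first, then filter the variants at those positions) amounts to saturating the predicted positions and keeping only the variants accepted by a property filter. The Python sketch below illustrates this under assumptions: CNA is not reimplemented, and the weak-spot positions, wild-type residues, and the set of variants passing the filter are hypothetical placeholders.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def saturate(positions, wild_type):
    """All single substitutions (position, new residue) at the given positions."""
    return [
        (pos, aa)
        for pos in sorted(positions)
        for aa in AMINO_ACIDS
        if aa != wild_type[pos]
    ]


def combine(predicted_positions, wild_type, keep):
    """Stage 1: saturate the predicted positions; stage 2: keep only the
    variants accepted by the property filter `keep`.

    Returns the retained variants and the positions that still carry one.
    """
    retained = [v for v in saturate(predicted_positions, wild_type) if keep(v)]
    return retained, {pos for pos, _ in retained}


# Hypothetical inputs for illustration only.
wild_type = {10: "G", 55: "L", 120: "N"}
weak_spots = {10, 55}                         # stand-in for A14 output
passing = {(10, "D"), (10, "E"), (120, "K")}  # stand-in for P9 results
variants, positions = combine(weak_spots, wild_type, lambda v: v in passing)
print(variants, positions)
```

Positions at which no variant survives the filter drop out entirely, which is how such a combination reduces both the number of predicted variants and the number of predicted positions.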

Discussion

In this study, we systematically and rigorously evaluated the performance of 22 previously described structure-based approaches to increase aIL resistance. We based our assessment on an experimental BsLipA SSM library that is, to our knowledge, unparalleled in the number and completeness of its variants and in their screening against four aIL. We show that, surprisingly, most of the approaches yield low GiP values, in particular for predicting relevant positions: 14 approaches perform worse than random mutagenesis (GiPpos < 1). Encouragingly, however, exploiting experimental information on the thermostability of BsLipA (A12) or structural weak spots of BsLipA predicted by rigidity theory (A14) yields GiPvar values of 3.03 and 2.39, respectively, as well as GiPpos ≈ 1.5. Furthermore, we demonstrated that the combination of five simple-to-compute physicochemical and evolutionary properties (P9-P12) substantially increases the precision of predicting relevant variants and positions of BsLipA for increasing aIL resistance. Finally, we showed that combining these properties with predictions from structural stability analyses of MD trajectories (C1/C4) or structural weak spots identified by CNA (C2, C3, C5, C6, and C9) additionally improves GiPvar up to 4-fold to ∼10 (C3, C6, C9) and sustains or increases GiPpos (≈0.96–1.77). Moreover, at most ten relevant positions are predicted, similar to the number obtained using different random mutagenesis approaches [30], [79], [80], [81], which enables the investigation of substitution combinations for additive or cooperative effects.

Our results are based on the BsLipA SSM library that covers all 181 positions and contains all 3620 variants, each with a single amino acid substitution as confirmed by DNA sequencing [16].
This dataset represents a unique opportunity to evaluate the predictions of approaches to improve aIL resistance because, in contrast to other biotechnologically relevant enzyme properties such as thermostability and resistance towards detergents and organic solvents, for which databases such as ProTherm [82], [83], ProtaBank [84], and FireProtDB [85] exist, no such large-scale data are available for aIL resistance. The dataset is also unique in its comprehensiveness and lack of bias. In comparison, the ProTherm database [82], [83] contains on average ∼12 single, ∼12 double, and ∼1 multiple substitutions for each of the ∼1000 proteins stored [86] and is strongly biased towards substitutions to alanine [87]. Hence, outliers in such data may distort attempts to extract generally applicable rules for improving enzyme properties. Finally, the uniform screening conditions applied to the BsLipA SSM library avoid the ambiguous results that can arise from different experimental methods, as observed for thermostability data of the same variant [88]. Note, though, that the enzyme activity determined for the BsLipA SSM library may be influenced by differences in thermodynamic or kinetic protein stability [34], [55] and in protein expression [56]. Although a recent study circumvented these shortcomings by reporting comprehensive, domain-wide thermostability data for purified variants of protein G (Gβ1, 56 residues) [52], no comparable large-scale data exist for aIL resistance. To evaluate our results, we used rigorous binary classifiers that are not affected by data prevalence and data imbalances, and we report the results relative to random mutagenesis, which accounts for a priori probabilities [57], [58]. Subsequently, we determined whether the changes in the observed relevant and non-relevant populations were significant using Boschloo’s exact test [75].
Five approaches (A1, A2, A12, A14, A15) significantly improved GiPvar compared to random mutagenesis, but only A12 and A14 markedly improved GiPpos, although not significantly (Fig. 3). The lack of significance for GiPpos is likely due to the small sample sizes evaluated for predicted relevant positions (sometimes a cell of the contingency table even contains a zero) [89], although A12 and A14 consistently improve GiPpos for all four aIL screened (A12: 1.66, 1.46, 1.59, 1.74; A14: 1.39, 1.31, 1.52, 1.39). This finding indicates that the BsLipA SSM library may still be too small to allow for a rigorous statistical assessment of approaches that predict only a small proportion of residues as relevant positions. These limitations will likely become more pronounced when smaller datasets, such as those extracted from the ProTherm database [82], [83] or the Gβ1 dataset [52], are considered.
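As a check on the per-aIL values quoted above, their mean and standard error of the mean (the quantities plotted in Fig. 3) can be recomputed with Python's standard library. We assume the SEM is the sample standard deviation divided by √n with n = 4 aIL; the authors' exact computation may differ. With these rounded inputs, the A14 mean comes out at 1.40 rather than the 1.41 reported earlier, a last-digit difference presumably due to rounding of the per-aIL values.

```python
import math
import statistics

# Per-aIL GiP_pos values quoted in the text (four aIL screens).
gip_pos = {
    "A12": [1.66, 1.46, 1.59, 1.74],
    "A14": [1.39, 1.31, 1.52, 1.39],
}


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


for name, values in gip_pos.items():
    m, sem = mean_sem(values)
    # Report mean ± SEM and whether every single aIL screen exceeds random (GiP > 1).
    print(f"{name}: GiP_pos = {m:.2f} ± {sem:.2f}; all > 1: {all(v > 1 for v in values)}")
```

That every individual value exceeds 1 while the test is not significant illustrates the point made above: consistency across the four screens does not compensate for the small number of predicted positions per screen.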

Fig. 3

Only five approaches (A1, A2, A12, A14, A15) yield a significantly improved prediction precision for relevant variants compared to random mutagenesis; only two approaches (A12, A14) yield a markedly improved prediction precision for relevant positions compared to random mutagenesis. Approaches are colored according to their classification; see Fig. 2 for the color code. GiPvar and GiPpos are shown as mean ± standard error of the mean over the four BsLipA SSM libraries. Significant differences compared to random mutagenesis (Rd) are indicated with an asterisk if p < 0.05 for each of the four BsLipA SSM libraries.

The prediction precision of approaches that determine relevant positions from experimental structural data (group I) or extensive computations (group II), or that perform general surface charge modifications (group V), was unexpectedly low: GiPvar > 2 was obtained by none of the group II approaches and by at most 50% of the approaches in groups I and V, and GiPpos > 1 by none of the group V approaches and by at most 40% of the approaches in groups I and II. This low performance must be weighed against the extensive experimental (group I) or computational (group II) work required to predict relevant positions, or against the wide use of the approaches (group V) [41], [19], [20]. Hence, our assessment demonstrates that approaches should be evaluated on large, unbiased, and complete datasets that allow a thorough analysis of a priori information; by contrast, many of the approaches in the three groups have been demonstrated on only small numbers of variants or positions (e.g., 24 positions in the case of A1 [90], 23 positions in the case of A4 [17], or 20–28 positions in the case of A6-A11 [43]). Notably, predictions of relevant positions in terms of interaction sites or perturbed residues for the BsLipA SSM library based on experimental (A2, A5) or computational (A9, A11) work perform almost equally, and only moderately better than performing substitutions at all solvent-exposed positions (P2).
Furthermore, most approaches in these three groups predict many (≥20) relevant positions, which leads to large numbers of substitutions to be evaluated. The cases with GiPvar > 2 but GiPpos ≈ 0.3 to 0.5 (A1, A4, A17-A19) indicate that charge modifications may be effective but that their effect is strongly position-dependent. This corroborates previous findings that aIL interact specifically with a few surface residues of BsLipA, but also suggests that identifying such interaction sites without evaluating the effect of the interaction on protein stability is insufficient [43]. Indeed, when changes in local structural stability originating from such interactions were additionally considered, higher GiPpos values, albeit still < 1, were obtained (A8, A10). Finally, the almost identical prediction precisions for R/K substitutions at cationic binding sites (A6) versus D/E substitutions at anionic binding sites (A7), and for K/R → D/E (A18) versus D/E → K/R (A19) substitutions at solvent-exposed sites, indicate that effects on protein stability due to aIL cations or anions can be counteracted equally well. These results furthermore suggest that cooperative countermeasures may be possible when the respective charge modifications are introduced together [42]. Previously, knowledge gained on a system while improving one property was subsequently used to improve another property [33]. This applies particularly to thermostability, improvements of which have been described as fostering protein evolvability [91], [92] and as being related to improved resistance to organic solvents [66], [93], [94], [95], [96] and detergents [33], [97]. In that respect, knowledge gained for improving resistance to detergents also yields moderate GiP compared to random mutagenesis for predicting relevant positions and variants for resistance to aIL (A13).
More remarkably, the largest GiP across all 22 approaches are found if prior knowledge of relevant positions for thermostability is transferred to improving aIL resistance (A12), corroborating the relationship between stability against temperature and against other influences [33], [66], [94], [95], [96], [97]. Rather than generating and screening an entire SSM library to perform approach A12, knowledge gained during enzyme engineering towards improved thermostability should also be valuable [98], [99], [100], [101], [102] if thermostability screening is more efficient than screening for aIL resistance. Finally, many more prediction algorithms have been devised for improving thermostability than for improving aIL resistance, which may also be exploited in this context [103], [104], [105], [106], [107], [108]. One such example is CNA, which has previously been applied in retrospective [33], [55], [109], [110], [111] and prospective [34], [112], [113] studies to improve protein thermostability and has been used to predict structural weak spots of BsLipA [33], [34], [55]. Applying these weak spot predictions (A14) yields the highest GiPpos among all evaluated computational approaches and the fifth-highest GiPvar among all evaluated experimental and computational approaches, without the need to tailor the method to the specific system and with only moderate computational costs [33]. We contrasted the performance of the established approaches with that of five physicochemical and evolutionary properties. Such descriptors have been widely analyzed for improving thermostability [33], [52], [114], [115], but less so for aIL resistance [16], [49], [63]. Many of the approaches derived from the literature share features with these properties. For example, substituting to chemically different amino acids [16] (A21) closely resembles introducing moderate changes in ΔHy, as such changes often correspond to switching between amino acid types, e.g., aliphatic-to-polar or polar-to-charged [116].
However, our hydropathy-based criterion allows excluding substitutions that increase hydropathy, thereby limiting changes to increases in polarity, which were suggested to be beneficial for aIL resistance, particularly when introduced at the enzyme surface [16], [49]. Surprisingly, the most noticeable improvements originate from properties that disregard specific knowledge of aIL and instead derive from general data- or structure-based computations, such as solvent accessibility, residue conservation, and unfolding free energy. The success of these properties is likely due to “excluding unbeneficial variants” rather than “predicting beneficial variants”, corroborating previous observations for excluding or including specific variants [33], [34], [52], [63], [66]. As all properties that filter at the variant level (P4-P6) led to increased GiPvar at the expense of decreased GiPpos when applied alone, it is advisable to combine variant-wise descriptors (P4-P6) with at least one position-wise descriptor (P1-P3, P7-P8) to circumvent this drawback. Accordingly, combining such properties (P9) not only reduced the numbers of predicted relevant variants and positions to a level realizable by current high-throughput methods [67], [68], [69], [70], [71], [72], [73] but also substantially increased the precision of predicting relevant variants and positions of BsLipA for increasing aIL resistance. To probe for the bias introduced in P9 by optimizing the individual properties against the BsLipA SSM libraries, we assessed the performance of the combination when the properties deviated from the optimal values by −25% (P10), +25% (P11), and +50% (P12). Although GiPpos dropped to ∼1, GiPvar remained ≥ ∼2, which is still higher than that of most other approaches and indicates that the computed ranges are robust against deviations from their optimal values.
Finally, the improved predictive performance of C1-C9 indicates that structure- and mechanism-based computational predictions can still be markedly improved by applying filters based on physicochemical and evolutionary properties. Another favorable result is the small number of predicted relevant variants and positions, which allows subsequent experimental efforts to be focused. This is important because protein engineering approaches are limited to a few positions if the whole sequence space is to be explored by substitutions, as the library size increases exponentially with the number of positions [74]. Identified variants can subsequently be employed in additive mutagenesis approaches, such as Computer-Assisted Recombination (CompassR), to create further improved recombinant variants [46], [63]. For instance, substitutions with ΔΔGunf < 7.52 kcal mol−1 were found to be more effectively combinable, indicating that excluding destabilizing substitutions with ΔΔGunf > 4.0 kcal mol−1 likely yields combinable substitutions with synergistic and further improved enzyme resistance to aIL [46], [63]. In contrast to other methods that limit the range of potential substitutions investigated, our approach evaluates substitutions to all amino acids and filters on relative property differences rather than on fixed residue characteristics. For instance, surface charge modification approaches that include only substitutions to charged residues on the enzyme surface consider only ∼22% of the beneficial substitutions in the BsLipA SSM library and discard many relevant variants [16]. Furthermore, our approach allows exploiting site-specific measures that potentially yield many relevant variants, such as the introduction of hydrophobic or polar residues, which has rarely been investigated thoroughly [49] compared to surface charge modifications [45], [18], [19], [20].
However, previous findings that many of the highest increases in aIL resistance were achieved by introducing hydrophobic or polar residues [49] indicate substantial potential for improving enzyme resistance to aIL with such substitutions. Thus, our results indicate that a time- and cost-efficient workflow to improve aIL resistance (C3) consists of first predicting relevant positions as structural weak spots with CNA (A14) and subsequently reducing the number of predicted relevant variants at these positions according to the physicochemical and evolutionary properties (P9). Moreover, this combination is robust against variations of the properties by ±25% (C6, C9). In summary, we show for a complete SSM library of BsLipA that the majority of 22 commonly used approaches to increase aIL resistance perform surprisingly poorly compared to random mutagenesis. These findings stress the need to consider a priori information and to evaluate approaches for improving aIL resistance on sufficiently large and diverse datasets in the future. Notably, however, approaches exploiting experimental information on the thermostability of BsLipA or structural weak spots of BsLipA predicted by rigidity theory stand out favorably, with GiPvar of 3.03 and 2.39 as well as GiPpos ≈ 1.5. The combination of five physicochemical and evolutionary properties provides an even more compute-efficient approach with a still fair GiPvar. Finally, combining structural weak spot prediction by rigidity theory (CNA) with the physicochemical and evolutionary properties yields particularly good values of GiPvar = 7.18–9.76 and GiPpos = 1.77. Hence, compared to an unbiased random mutagenesis study, the experimental effort to identify 10 relevant variants would be reduced from screening ∼117 randomly selected variants to only ∼12 rationally selected variants using approach C6.
Although these results were obtained for BsLipA, CNA was not adapted to this specific system, and the robustness of the physicochemical and evolutionary properties to pronounced deviations from their cutoff values was demonstrated. These findings suggest that this combination should be applicable to other enzyme systems for guiding protein engineering towards improved aIL resistance for use in green industrial processes.
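The screening estimate for C6 follows from treating GiP as the ratio of an approach's precision to the a priori fraction of relevant variants, i.e., the expected precision of random mutagenesis. The Python sketch below makes that arithmetic explicit under two assumptions: the prevalence is taken as 10/117 (about 8.5%), as implied by the ∼117 random variants per 10 relevant ones, and C6 is assumed to sit at the upper end of the reported range, GiPvar ≈ 9.76. The function names are illustrative.

```python
def gain_in_precision(true_pos, false_pos, n_relevant, n_total):
    """GiP: precision of an approach divided by the a priori fraction of
    relevant items (the expected precision of random selection)."""
    precision = true_pos / (true_pos + false_pos)
    prevalence = n_relevant / n_total
    return precision / prevalence


def expected_screens(target_hits, prevalence, gip=1.0):
    """Expected number of variants to screen to find `target_hits` relevant
    ones when the approach's precision equals gip * prevalence."""
    return target_hits / (gip * prevalence)


prevalence = 10 / 117  # ~8.5% relevant variants, as implied by the text
print(round(expected_screens(10, prevalence)))            # random mutagenesis: 117
print(round(expected_screens(10, prevalence, gip=9.76)))  # C6 (assumed GiP_var 9.76): 12
```

With the same definition, a hypothetical approach that finds 9 relevant variants among 10 predictions in a library with 9% relevant variants has `gain_in_precision(9, 1, 9, 100)` ≈ 10, i.e., ∼90% precision versus ∼9% for random selection.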

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
