
Can constraint network analysis guide the identification phase of KnowVolution? A case study on improved thermostability of an endo-β-glucanase.

Francisca Contreras1, Christina Nutschel2, Laura Beust1, Mehdi D Davari1, Holger Gohlke2,3, Ulrich Schwaneberg1,4.   

Abstract

Cellulases are industrially important enzymes, e.g., in the production of bioethanol and in the pulp and paper, feedstock, and textile industries. Thermostability is often a prerequisite for high process stability, and improving thermostability without affecting specific activities at lower temperatures is challenging and often time-consuming. Protein engineering strategies that combine experimental and computational methods are emerging in order to reduce experimental screening efforts and speed up enzyme engineering campaigns. Constraint Network Analysis (CNA) is a promising computational method that identifies beneficial positions in enzymes to improve thermostability. In this study, we compare CNA and directed evolution in the identification of beneficial positions in order to evaluate the potential of CNA in protein engineering campaigns (e.g., in the identification phase of KnowVolution). We engineered the industrially relevant endoglucanase EGLII from Penicillium verruculosum towards increased thermostability. From the CNA approach, six variants were obtained with an up to 2-fold improvement in thermostability. The overall experimental burden was reduced to 40% utilizing the CNA method in comparison to directed evolution. On a variant level, the success rate was similar for both strategies, with 0.27% and 0.18% improved variants in the ep-PCR and CNA-guided libraries, respectively. In essence, CNA is an effective method for the identification of positions that improve thermostability.
© 2021 The Authors.


Keywords:  AU, absorbance units; CMC, carboxymethyl cellulose; CNA, Constraint Network Analysis; Cellulase; Constraint network analysis; EGLII, endoglucanase II; GH5 endoglucanase; HTS, high-throughput screening; KnowVolution; MD, molecular dynamics; MTP, 96-well microtiter plates; PCR, polymerase chain reaction; Protein engineering; SSM, site-saturation mutagenesis; Thermostability

Year:  2020        PMID: 33552446      PMCID: PMC7822948          DOI: 10.1016/j.csbj.2020.12.034

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   7.271


Introduction

Thermal stability is often a prerequisite for process stability of enzymes in industrial chemical production [1]. Cellulases are employed in processes such as biomass depolymerization and animal feed production, in which high temperatures are needed. In biomass depolymerization, the enzyme dosage can be decreased if a biocatalyst that can withstand high temperatures is employed, and, therefore, costs can be reduced [2]. Cellulases used for animal feed pellets are required to withstand high temperatures for a short period of time, e.g., a few minutes; in that period, the cellulases should not be inactivated because their catalytic activity is needed at a later stage [3]. The current problem is that high activity and high thermal resistance are often contradictory properties, and thermostable enzymes that can be found in nature, e.g., in hot springs and hydrothermal vents, have evolved to be active at high temperatures (>90 °C). For this reason, their activity at low temperatures (<40 °C) is, in most cases, nonexistent or very low [4], [5]. In order to keep enzymes as a competitive option for catalysis, they should possess high thermal stability and maintain high activity over a broad range of temperatures. In the last decades, protein engineering has emerged as an important tool to improve the thermostability of enzymes, and directed evolution has become a widely utilized technique in protein engineering [6]. KnowVolution is a strategy that combines directed evolution and computational analyses; this combination allows us to decrease the experimental efforts and maximize improvements [7].
A KnowVolution campaign encompasses four phases: Phase I, beneficial positions are identified by random mutagenesis; Phase II, full diversity is generated at the identified beneficial positions, and the best substitution is determined; Phase III, the interaction between substitutions is determined by computationally assisted structural analyses; and Phase IV, the selected positions are recombined. To further decrease screening efforts, Phase IV can be guided by the CompassR rule [8], which discards recombinations that are potentially unstable. At the end of a KnowVolution campaign, a molecular understanding of each substitution's role in the improvement is generated. A common strategy employed to decrease the screening burden is to combine screening systems, as in two-step screening systems [9]. The first step consists of a high-throughput screening (HTS) in which viable clones are selected, which can be carried out, e.g., on agar plates. Later, in a second step, the enriched library can be further screened to determine improved clones expressing the desired property; this screening can be performed, e.g., using a liquid assay in a multi-well plate [10]. Another option to decrease the screening burden is to complement the identification phase of KnowVolution (Phase I) with computational techniques, which can reduce the time and effort spent in the laboratory. Several in silico methods are available to improve the thermostability of enzymes, including B-FITTER [11], PoPMuSiC [12], FoldX [13], [14], [15], FireProt [16], FRESCO [17], [18], PROSS [19], iSAR [20], and Constraint Network Analysis (CNA) [21], [22], [23], [24]. CNA functions as front- and back-end to the graph theory-based software Floppy Inclusions and Rigid Substructure Topography (FIRST) [25]. Applying CNA to biomolecules aims at identifying their composition of rigid clusters and flexible regions, which can aid in the understanding of biomolecular structure, stability, and function [23], [24].
In CNA, biomolecules are modeled as constraint networks, in which covalent and non-covalent interactions compose the edges, and atoms represent the nodes, as described in detail by Hespenheide et al. [26]. A fast combinatorial algorithm, the pebble game, counts the bond rotational degrees of freedom and floppy modes (internal, independent degrees of freedom) in the constraint network [27]. In order to monitor the hierarchy of rigidity and flexibility of biomolecules, CNA performs thermal unfolding simulations by consecutively removing non-covalent constraints from a network in increasing order of their strength [28], [29], [30]. CNA has previously been applied retrospectively [21], [30], [31], [32], [33] and prospectively [34], [35] in the context of improving protein thermostability, but its performance has never been compared directly to that of random mutagenesis. Endoglucanases are enzymes commonly used in a wide variety of industries, such as food and feed, biofuels, detergents, as well as pulp and paper [36]. Nowadays, cellulases from Penicillium strains appear as promising biocatalysts for cellulose degradation due to their high activity. Endoglucanase II (PvCel5A) is an endo-β-1,4-glucanase from the fungus Penicillium verruculosum, which is highly active and the major endoglucanase of this organism [37]. It belongs to the glycosyl hydrolase family 5 (GH5). Cellulases from GH5 possess a (β/α)8 structure consisting of eight β-strands in the core of the protein and eight α-helices on the exterior. In this work, we evaluate the potential of CNA to advance Phase I of KnowVolution by comparing the identification of beneficial positions through random mutagenesis (directed evolution) and CNA-based variant predictions (rational design) using the example of Cel5A to improve its thermostability. The performance of both strategies in identifying beneficial positions was evaluated by comparing experimental effort and success in improving thermostability.
Consequently, the CNA method is evaluated as an entry point in KnowVolution campaigns towards improved thermostability.

Materials and methods

Plasmids and strains

The strains Escherichia coli DH5α (Agilent Technologies, Santa Clara, CA, USA) and Pichia pastoris BSYBG11 (Bisy e.U., Hofstaetten/Raab, Austria) were used as a cloning and expression host, respectively. EGLII mutant libraries were cloned into the shuttle vector pBSYA1S1Z (Bisy e.U., Hofstaetten/Raab, Austria) and generated in P. pastoris BSYBG11. Endo-β-glucanase gene eglII from Penicillium verruculosum (EGLII; UniProtKB/Swiss-Prot: A0A1U7Q1U3) was purchased as codon-optimized synthetic gene fragment from ThermoFisher (Germany) and cloned into pBSYA1S1Z as described previously [38].

Library generation

Random mutations were introduced into the endoglucanase eglII gene by error-prone PCR (ep-PCR), as described by Contreras et al. [38]. Briefly, test libraries (consisting of 180 clones) were generated by ep-PCR with varying concentrations of MnCl2 ranging from 0.1 to 0.4 mM. Test libraries were cloned by MEGAWHOP [39] and screened for improved thermostability as described in section “Screening of thermostable EGLII cellulase variants.” The library generated from an ep-PCR supplemented with 0.3 mM MnCl2 was selected for screening. Libraries at positions 76, 77, 92, 93, 114, 129, 130, 134, 189, 190, 222, 240, 244, 255, 256, 273, 299, 308, and 312 were generated by the site-saturation mutagenesis (SSM) method [40]. SSM libraries were produced as described previously [38], and the NNK primers used are detailed in Table S1 in SI. The resulting PCR product was digested using DpnI (37 °C, 18 h), purified by using the NucleoSpin® Gel and PCR Clean-up kit (Macherey-Nagel), and transformed into P. pastoris BSYBG11 for expression.
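The NNK degeneracy of the SSM primers (N = any base, K = G or T, following IUPAC nucleotide codes) can be illustrated with a minimal sketch that enumerates the resulting codons against the standard genetic code. The variable names are illustrative and are not taken from the original work.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: first and second base are any nucleotide, third base is G or T.
nnk_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]

# Amino acids reachable with NNK, excluding the stop signal "*".
encoded_amino_acids = {CODON_TABLE[c] for c in nnk_codons} - {"*"}

# Stop codons that remain in the NNK set.
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

print(len(nnk_codons), len(encoded_amino_acids), stop_codons)
```

The enumeration shows why NNK is a common choice for saturation: it reduces the codon set from 64 to 32 while still encoding all 20 amino acids and retaining only a single stop codon.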

Cell culture and expression

EGLII was expressed in the P. pastoris BSYBG11 strain (Bisy e.U., Austria) cultured in 96-well microtiter plates (MTPs, Greiner, Frickenhausen, Germany). For expression, Yeast Extract–Peptone–Dextrose (YPD) medium (1% (w/v) yeast extract, 2% (w/v) peptone and 2% (w/v) D-glucose) supplemented with 100 µg mL−1 Zeocin was transferred to an MTP. A volume of 5 µL pre-culture (160 μL, 900 rpm, 30 °C, 48 h, and 70% humidity) was used to inoculate the main culture (160 μL, 900 rpm, 25 °C, 96 h, and 70% humidity) in an MTP supplemented with appropriate antibiotics. The supernatant containing EGLII was separated from the cells by centrifugation (Eppendorf 5810R; 4 °C, 3220 ×g, 15 min), and the cell-free supernatant was transferred to a new MTP for further analysis. For flask expression, a pre-culture of P. pastoris BSYBG11 was cultured in YPD-Zeocin medium (3 mL, 200 rpm, 30 °C, 24 h) and used to inoculate the main culture to an initial OD600nm of 0.25 for EGLII expression (50 mL, 200 rpm, 25 °C, 72 h). Cells were centrifuged (4 °C, 10,000 ×g, 20 min; Sorvall, Thermo Fisher Scientific, Darmstadt, Germany), and the EGLII-containing supernatant was used for further analysis.

Purification by ion-exchange chromatography

Purification of the endoglucanase EGLII was performed by anion exchange chromatography as described previously [38]. After flask expression, 50 mL of EGLII-containing supernatant was concentrated by centrifugal ultrafiltration (10 kDa MWCO PES; VivaSpin turbo 15, Sartorius) to 2 mL, and the buffer was exchanged to Bis-Tris buffer (pH 6.5, 20 mM; buffer A). The endoglucanase EGLII was purified by FPLC (ÄKTAprime plus chromatography system, GE Healthcare, Solingen, Germany). The concentrated supernatant was loaded onto an anion exchange chromatography column (GE Healthcare HiTrap Capto Q ImpRes, 5 mL) equilibrated with buffer A. EGLII was eluted in a step-wise program; first, impurities were eluted with 26% buffer B (Bis-Tris buffer, pH 6.2, 20 mM, NaCl 1 M), and later, EGLII was eluted with 33% buffer B. Fractions were analyzed by SDS-PAGE (Fig. S4 in SI) [37]. Endoglucanase EGLII protein concentration was measured by A280nm (NanoDrop™ 1000 spectrophotometer by Thermo Scientific™, Bremen, Germany). The amino acid composition was used to determine the theoretical extinction coefficients with ProtParam on the ExPASy server [41].

Screening of thermostable EGLII cellulase variants

Hydrolytic activity assays

A two-step screening system was employed, as described by Contreras et al. [38]. Briefly, an agar-based pre-screening step was performed with YPD agar plates supplemented with Azo-carboxymethyl cellulose (Azo-CMC; Megazyme, Bray, Ireland); colonies presenting clear halos were selected as hydrolytically active. In the second step, the hydrolytic activity was screened by using solubilized Azo-CMC as substrate. After cultivation in MTPs, EGLII-containing supernatant was diluted with sodium acetate buffer (0.1 M, pH 4.5) and incubated without the substrate at 78 °C for 60 min. The diluted supernatant (40 µL) was transferred into an MTP for activity measurement. The enzyme reaction was initiated by the addition of 40 μL of Azo-CMC in sodium acetate buffer (2.0%, 0.1 M, pH 4.5). The reaction mixture was incubated at 50 °C with shaking (ELMI Ltd., SkyLine DTS-4 Digital Thermo Shaker, 900 rpm) for exactly 10 min. The reaction was stopped by precipitating high-molecular-weight dyed Azo-CMC fragments with an ethanol-based precipitating solution (80% (v/v) technical grade ethanol, 40 g L−1 sodium acetate, 4 g L−1 ZnCl2, pH 5.0). The precipitated reaction mix was centrifuged at 1000 ×g for 10 min. Afterward, 100 μL of the clear supernatant was transferred into an MTP, and the absorbance was measured at 590 nm (Tecan Sunrise, Crailsheim, Germany).

Thermostability of EGLII

For the quantification of the endoglucanase EGLII thermostability, the hydrolytic activity was measured under two conditions: without incubation of the EGLII-containing supernatant (Activity_t0), and after incubation without the substrate at 78 °C for 60 min (Activity_t60). The residual activity of the EGLII WT and variants was determined as the ratio of Activity_t60 to Activity_t0. The improvement of a variant was determined as the ratio of the residual activity of the variant to that of the wild type. As described previously [38], EGLII variants that maintained >80% of the EGLII wild type initial activity and presented increased thermostability were selected.
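The residual activity, the improvement factor, and the selection criterion described above can be sketched as follows. The function names and the example absorbance values are hypothetical and only illustrate the arithmetic; improvement is taken as variant over wild type, in line with the definition given in the legends of Fig. 2 and Table 2.

```python
def residual_activity(activity_t0, activity_t60):
    """Fraction of the initial activity that remains after 60 min at 78 °C."""
    if activity_t0 <= 0:
        raise ValueError("initial activity must be positive")
    return activity_t60 / activity_t0


def improvement(variant_residual, wildtype_residual):
    """Residual activity of the variant relative to that of the wild type."""
    return variant_residual / wildtype_residual


def is_selected(variant_t0, wildtype_t0, variant_residual, wildtype_residual,
                min_initial_fraction=0.8):
    """Keep variants with >80% of wild-type initial activity and higher residual activity."""
    keeps_activity = variant_t0 > min_initial_fraction * wildtype_t0
    more_stable = variant_residual > wildtype_residual
    return keeps_activity and more_stable


# Hypothetical absorbance readings (AU), for illustration only.
wt_t0, wt_t60 = 2.0, 0.5
var_t0, var_t60 = 1.75, 0.875

wt_res = residual_activity(wt_t0, wt_t60)
var_res = residual_activity(var_t0, var_t60)
print(wt_res, var_res, improvement(var_res, wt_res))
print(is_selected(var_t0, wt_t0, var_res, wt_res))
```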

Specific activity determination

The hydrolytic activity of the purified endoglucanase EGLII was determined by the dinitrosalicylic acid (DNS) assay, which quantifies the amount of reducing sugars released in the reaction [42], as described in [38]. Briefly, 20 µL of EGLII solution was mixed with 80 µL carboxymethyl cellulose (CMC) dissolved in sodium acetate buffer (50 mM, pH 4.5) to a final CMC concentration of 1% (w/v) and incubated in a PCR cycler (96-well PCR plate). The reaction was stopped with 200 µL of the DNS solution after exactly 10 min and incubated for 15 min at 95 °C to allow color development, followed by incubation for 10 min at 10 °C. The resulting color change was measured at 540 nm in a microtiter plate reader (Tecan Sunrise, Germany). The released sugar was calculated with cellobiose as standard. One unit of activity was defined as the amount of enzyme releasing 1 µmol of cellobiose equivalents from the substrate per minute.
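As a small illustration of the unit definition above, the following sketch converts an amount of released cellobiose equivalents into activity units (U) and specific activity (U mg−1). The function names and the input values are hypothetical; only the definition of one unit (1 µmol cellobiose equivalents per minute) comes from the text.

```python
def activity_units(umol_cellobiose_equivalents, reaction_minutes):
    """Activity in U: µmol of cellobiose equivalents released per minute."""
    return umol_cellobiose_equivalents / reaction_minutes


def specific_activity(umol_cellobiose_equivalents, reaction_minutes, enzyme_mg):
    """Specific activity in U per mg of enzyme."""
    return activity_units(umol_cellobiose_equivalents, reaction_minutes) / enzyme_mg


# Hypothetical example: 0.5 µmol released in the 10 min reaction by 0.0002 mg enzyme.
print(specific_activity(0.5, 10.0, 0.0002))  # about 250 U per mg
```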

Generation of structural ensembles

As done previously [43], structural ensembles of wildtype EGLII (PDB ID 5L9C) were generated by all-atom MD simulations with a total simulation time of 5 μs. For details on starting structure preparation, parametrization, equilibration, and production runs, see SI Method M1. All minimization, equilibration, and production simulations were performed with the pmemd.cuda module [44] of Amber19 [45]. During production simulations, we set the time step for the integration of Newton's equations of motion to 4 fs following the hydrogen mass repartitioning strategy [46]. Coordinates were stored into a trajectory file every 200 ps. This resulted in 5000 configurations for each production run that were considered for subsequent analyses.
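The simulation bookkeeping implied by these numbers can be checked with a short calculation. Note the assumption: the number of production runs is not stated in this section, so the value derived below holds only if all runs have equal length.

```python
TIME_STEP_FS = 4              # integration time step (from the text)
SAVE_INTERVAL_PS = 200        # coordinates stored every 200 ps (from the text)
FRAMES_PER_RUN = 5000         # configurations per production run (from the text)
TOTAL_SIMULATION_US = 5       # total simulation time (from the text)

# Integration steps between two stored frames (1 ps = 1000 fs).
steps_per_frame = SAVE_INTERVAL_PS * 1000 // TIME_STEP_FS

# Length of one production run in microseconds (1 µs = 1 000 000 ps).
run_length_us = SAVE_INTERVAL_PS * FRAMES_PER_RUN / 1_000_000

# Assumption: equal-length runs; the run count is inferred, not reported here.
implied_runs = TOTAL_SIMULATION_US / run_length_us

print(steps_per_frame, run_length_us, implied_runs)
```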

Constraint network analysis

As done previously [43], the thermal unfolding simulations of wildtype EGLII were performed with the Constraint Network Analysis (CNA) software package (version 3.0) [21], [22], [23], [24]. For details on thermal unfolding simulations, see SI Method M2. To improve the robustness and investigate the statistical uncertainty, we carried out CNA on ensembles of network topologies (ENTMD) generated from structural ensembles (see section Generation of structural ensembles) [47]. During a thermal unfolding simulation, the stability map rc_ij indicates for all residue pairs the E_cut value at which a rigid contact rc between the two residues i and j (represented by their Cα atoms) is lost; rc exists as long as i and j belong to the same rigid cluster c of the set of rigid clusters [22]. Thus, rc_ij contains information about the rigid cluster decomposition cumulated over all network states σ during the thermal unfolding simulation. The sum over all entries in rc_ij yields the chemical potential energy due to non-covalent bonding, based on the coarse-grained, residue-wise network representation of the underlying protein structure [31]. In the present study, we applied the neighbor stability map rc_ij,neighbor to investigate short-range rigid contacts. For this, as done previously [31], [33], [43], rc_ij was filtered such that only rigid contacts between two residues that are at most 5 Å apart from each other were considered.
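The distance filter that turns a stability map into a neighbor stability map can be sketched as below. This is a minimal illustration, not the CNA implementation: the function names are hypothetical, and the residue-residue distance matrix is passed in precomputed because this section does not specify how the 5 Å distance between two residues is measured.

```python
def neighbor_stability_map(stability_map, distances, cutoff=5.0):
    """Keep E_cut entries only for residue pairs at most `cutoff` Å apart.

    stability_map[i][j]: E_cut value at which the rigid contact i-j is lost.
    distances[i][j]:     precomputed residue-residue distance in Å (assumed input).
    Entries outside the cutoff (and the diagonal) are set to None.
    """
    n = len(stability_map)
    filtered = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and distances[i][j] <= cutoff:
                filtered[i][j] = stability_map[i][j]
    return filtered


def sum_upper_triangle(matrix):
    """Sum all retained entries with i < j, so each residue pair counts once."""
    n = len(matrix)
    return sum(
        matrix[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] is not None
    )


# Toy three-residue example with made-up values (E_cut in kcal/mol, distances in Å).
rc = [[0.0, -1.0, -2.0], [-1.0, 0.0, -3.0], [-2.0, -3.0, 0.0]]
dist = [[0.0, 4.0, 8.0], [4.0, 0.0, 5.0], [8.0, 5.0, 0.0]]
rc_neighbor = neighbor_stability_map(rc, dist)
print(rc_neighbor[0][1], rc_neighbor[0][2], sum_upper_triangle(rc_neighbor))
```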

Evolutionary conservation analysis

The degree of conservation of each amino acid position was determined with the ConSurf server [48]. The amino acid sequence of the endoglucanase EGLII from P. verruculosum (PDB ID 5L9C) was used as template for the alignment. The conservation score was obtained after the alignment of 150 sequences with a similarity between 35% and 90%. Results are represented on a scale from 1 (not conserved) to 9 (highly conserved) (Fig. S1 in SI).

Results and discussion

A main limitation that prevents a broader use of directed evolution in industrial applications is the time requirement of the campaigns. Through combined strategies of directed evolution and (semi-rational) design, like KnowVolution, experimental screening efforts can be minimized. The identification phase (Phase I, Fig. 1) of a KnowVolution campaign can be performed with the CNA method, which could further reduce the time requirement and screening burden in a protein engineering campaign towards improved thermostability. The potential of CNA in the identification of beneficial positions for protein engineering campaigns is evaluated by the analysis of two types of libraries generated for improving EGLII thermostability (see workflow in Fig. 1). First, we describe the directed evolution library generated by random mutagenesis; second, we explain the CNA approach for the prediction of positions (“structural weak spots”) towards increased thermostability; and, finally, we describe the generation and screening of a semi-rationally designed library guided by CNA. We propose the CNA approach as the first step in a protein engineering campaign. The computational screening is intended to yield a reduced library for thermostability improvement.
Fig. 1

Protein engineering strategies for thermal stabilization of EGLII. Center: KnowVolution strategy with its four Phases (I, II, III, and IV); Left: A directed evolution campaign was performed in the complete endoglucanase EGLII gene by ep–PCR. Right: A semi-rational library design was performed, starting with a computational screening by the CNA approach, followed by an evolutionary conservation analysis, and finally, an SSM library of the 18 predicted “structural weak spots”.


Random mutagenesis of EGLII towards increased thermostability

Randomly mutagenized libraries are commonly the first step in protein engineering campaigns for the identification of beneficial positions. Although it is a great advantage that no information about the protein structure is needed, the knowledge generated from random mutagenesis is not extensive. Frequently, improved variants carry several substitutions; therefore, it is difficult to determine the role of each position in the improvement of the desired property (Fig. 1). In this work, a randomly mutagenized EGLII library was generated by ep-PCR. The ep-PCR library was optimized as described in the “Library generation” section, and EGLII variants were screened towards increased thermostability as described previously [38]. The generated library was pre-screened in an agar plate-based assay, and later the library was enriched by transferring the active clones to a liquid culture and screened for improved thermostability utilizing an optimized Azo-CMC assay. In the agar plate-based assay, a library of ~8 000 clones was pre-screened and presented a 0.23 active/inactive ratio. An enriched library was produced with 1 890 active clones and was screened for improved thermostability. From the enriched library, 22 clones presented up to a 3.1-fold increase in thermostability compared to EGLII wild type, and after sequencing, 15 variants and 20 different substitutions were identified (Fig. 2). The identified variants carried single, double, and triple substitutions. In total, 18 different positions were identified, but as they partly occur in double or triple variants, it is difficult to distinguish with certainty between the substitutions that are neutral, improving, or reducing EGLII thermostability. It is also uncertain whether the substitutions found represent the best amino acids to improve the thermostability.
These 18 positions represent 5.7% of the total EGLII amino acids (18 positions out of 314 amino acids), and it cannot be determined if all these positions influence the thermostability of the enzyme.
Fig. 2

EGLII variants obtained from ep-PCR library. (A) Fifteen variants of EGLII with a significant thermostability improvement compared to the EGLII wild type. The improvement is defined as the ratio between the residual activity of the EGLII variants and the EGLII wild type, in AU. Given is the mean over experiments performed in biological replicates (n = 3). Error bars denote the standard error of the mean. (B) Representation of substituted positions (yellow sticks) in EGLII wildtype obtained from the ep–PCR library. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

At the variant level, the yield of clones with increased thermostability from the ep-PCR library was 0.27% (22 out of ~8 000 clones). The yield increased to 1.16% (22 out of 1 890 clones) when an enriched library of only active clones was selected for screening. The latter is in agreement with previous directed evolution campaigns towards improved thermostability, in which the improved clones represented <1% of the total and enriched library [49], [50], [51], [52].

CNA approach towards increased thermostability

Prediction of structural weak spots

For identifying structural weak spots on EGLII, thermal unfolding simulations were carried out by CNA (section Constraint Network Analysis) on structural ensembles of wildtype EGLII generated by MD simulations (section Generation of structural ensembles), as done previously [21], [31], [34], [43]. By visual inspection of the unfolding trajectory, four major phase transitions, T1 – T4, were identified (Fig. 3A). During unfolding, first, helix αA, second, helices αD and αG-I, then αC and αF, and, finally, αE segregated from the largest rigid cluster at 326, 338, 342, and 344 K, respectively. The hierarchy of rigid and flexible regions of EGLII showed that most helices segregated from the largest rigid cluster at T2, followed by T3. As most helices that are located at the C-terminus segregate from the largest rigid cluster at T2 and T3, this region is particularly promising for increasing thermostability, considering that substitutions there can improve the interaction strength with the largest rigid cluster and, hence, delay the disintegration of that cluster with increasing temperature.
Fig. 3

Prediction of the thermal unfolding pathway, local rigidity, and weak spots of wildtype EGLII. (A) Thermal unfolding pathway of wildtype EGLII (PDB ID: 5L9C [57]) showing four major phase transitions, T1-T4. The largest rigid cluster at each phase transition is represented as uniformly colored blue body. Helices that segregate from the largest rigid cluster at a phase transition are labeled. (B) Stability map rc for wildtype EGLII including Ecut values at which a rigid contact between two residues (i, j) is lost during the thermal unfolding simulation (upper triangle); the neighbor stability map rc for wildtype EGLII considers only the rigid contacts between two residues that are at most 5 Å apart from each other, with values for all other residue pairs colored gray (lower triangle). A red (blue) color indicates that contacts between residue pairs are more (less) rigid. α-helices and β-strands are depicted at the top. (C) Localization of predicted weak spots of wildtype EGLII (yellow spheres). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Next, weak spots were identified as the fringe residues of such helices. In doing so, we followed the hypothesis that the more structurally stable the fringes of the helices are, the more structurally stable those regions will become. Therefore, if the fringe residues are targeted by substitutions, the likelihood of stabilizing the protein should be high. Additionally, rc_ij and rc_ij,neighbor were calculated to monitor the local rigidity of EGLII wild type at a per-residue level (Fig. 3B). Both indices supported the above-mentioned observations. In total, 25 weak spots were predicted, i.e., 8% of all 314 EGLII residues (Fig. 3C, Table 1).
Table 1

Phase transitions during the thermal unfolding simulation of wildtype EGLII, predicted weak spots, and their evolutionary conservation scores.

Phase transition   Weak spot   Secondary structure element[a]   Conservation score[b]
T3                 Asp76       Loop                             6
T3                 Thr77       αC                               1
T3                 Gly92       αC                               1
T3                 Lys93       Turn                             3
T2                 Ile112      Loop                             8*
T2                 Ser114      Loop                             6
T2                 Phe129      Turn                             7
T2                 Lys130      Turn                             2
T2                 Leu134      Turn                             2
T3                 Gly181      Beta Bridge                      9*
T3                 Ala182      Bend                             9*
T2                 Asn189      Loop                             5
T2                 Thr190      3/10 helix                       1
T2                 Cys221      Loop                             9*
T2                 Val222      Bend                             7
T2                 Ser240      αG                               1
T2                 Leu244      Loop                             4
T2                 Asn255      Bend                             8*
T2                 Ser256      αH                               1
T2                 Ser273      Turn                             3
T2                 Asp274      Turn                             8*
T2                 Asn299      Turn                             3
T2                 Gly300      Loop                             8*
T2                 Ser308      αI                               4
T2                 Thr312      Turn                             1

[a] Localization of the respective weak spot.

[b] Values ≥ 8 are marked (*) and led to the exclusion of the weak spot from experimental analysis.

CNA has previously been applied retrospectively [21], [30], [31], [32], [33] and prospectively [34], [35] in the context of improving protein thermostability on pairs [21], [30] and series [31], [32], [35] of proteins from psychro-, meso-, thermo-, and hyperthermophilic organisms and to proteins of different folds. Furthermore, the CNA approach was benchmarked against a complete site saturation mutagenesis library of Bacillus subtilis lipase A [53] for systematically scrutinizing the impact of substitution sites on thermostability and detergent tolerance [33]. Additionally, CNA was used to understand the impact of dimer interface stability on thermostability and the role of active site flexibility for turnover numbers in aldolases [35]. The breadth of these applications is rooted in the identification of structural weak spots [21] based on rigorous analysis of structural rigidity [25] applied to structural ensembles to improve robustness [47]. Hence, the use of CNA is not limited to a specific class or fold of proteins, such that the strategy applied herein to identify structural weak spots can be transferred to non-TIM barrel proteins. This also includes structural water molecules, metal ions, and cofactors, which can be considered in the analysis. Finally, CNA has been used to understand signal propagation [54] and allosteric effects [55], [56] within biomolecules, primarily within membrane proteins, expanding the application scope of CNA. After the CNA identification, the evolutionary conservation of the selected weak spots was analyzed to further reduce experimental efforts [48]. A high conservation score is an indicator of the functional or structural importance of a position in the protein.
Thus, weak spots with a conservation score ≥8 (Table 1, Fig. S1 in SI) were excluded from the experimental analysis. This resulted in the selection of 18 weak spots (D76, T77, G92, K93, S114, F129, K130, L134, N189, T190, V222, S240, L244, S256, S273, N299, S308, T312), which were targeted for site-saturation mutagenesis (SSM), allowing us to focus substitution efforts on only ~6% of the protein residues. A similar percentage was found for weak spot predictions on BsLipA [33]. The selected weak spots are mainly located in the outer α-helices and loops (including turns, bends, and beta bridges) of the (β/α)8 barrel structure (Table 1, Fig. S1 in SI).
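The conservation filter can be reproduced directly from the entries of Table 1. The sketch below is only an illustration of that selection step; the data structure and variable names are not part of the original workflow.

```python
# (phase transition, weak spot, conservation score), transcribed from Table 1.
WEAK_SPOTS = [
    ("T3", "Asp76", 6), ("T3", "Thr77", 1), ("T3", "Gly92", 1), ("T3", "Lys93", 3),
    ("T2", "Ile112", 8), ("T2", "Ser114", 6), ("T2", "Phe129", 7), ("T2", "Lys130", 2),
    ("T2", "Leu134", 2), ("T3", "Gly181", 9), ("T3", "Ala182", 9), ("T2", "Asn189", 5),
    ("T2", "Thr190", 1), ("T2", "Cys221", 9), ("T2", "Val222", 7), ("T2", "Ser240", 1),
    ("T2", "Leu244", 4), ("T2", "Asn255", 8), ("T2", "Ser256", 1), ("T2", "Ser273", 3),
    ("T2", "Asp274", 8), ("T2", "Asn299", 3), ("T2", "Gly300", 8), ("T2", "Ser308", 4),
    ("T2", "Thr312", 1),
]
EXCLUSION_THRESHOLD = 8   # scores >= 8 are excluded (from the text)
PROTEIN_LENGTH = 314      # number of EGLII residues (from the text)

selected = [name for _, name, score in WEAK_SPOTS if score < EXCLUSION_THRESHOLD]
excluded = [name for _, name, score in WEAK_SPOTS if score >= EXCLUSION_THRESHOLD]
selected_fraction = len(selected) / PROTEIN_LENGTH

print(len(WEAK_SPOTS), len(selected), excluded)
print(f"{selected_fraction:.1%} of residues targeted")
```

With the Table 1 scores, seven of the 25 predicted weak spots are removed, leaving the 18 positions listed above, i.e., about 5.7% of the 314 residues.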

Effects of substitutions at weak spots on EGLII thermostability

The 18 selected weak spots were saturated by SSM to systematically probe the effects of the weak spots on the thermostability of EGLII. For each selected weak spot, primers were designed (Table S1 in SI), and a site-saturation reaction was performed and individually transformed into P. pastoris. Each SSM library consisted of 180 clones, and 3 240 clones were screened in total (18 positions with 180 clones each), covering ~98% [58] of possible substitutions in each position. This results in a reduction of experimental efforts to ~40% compared with the pre-screening of ~8 000 clones of the ep-PCR library. In our opinion, experimental efforts derived from library production (PCR reaction, cloning, transformation, and library optimization) are comparable in ep-PCR and SSM approaches. On the basis of the SSM libraries, the selected weak spots showed different influences on the activity of EGLII, with an inactivation percentage ranging from 1 to 87% in positions 308 and 76, respectively (Table S2 in SI). At five weak spots (T77, V222, L244, S308, and T312), at least one substitution improved EGLII thermostability. The variant T312R showed the highest improvement (1.99-fold) compared to EGLII wildtype (Table 2). Of note, when T312 was substituted by S in the random mutagenesis approach (Fig. 2), a ~2.4-fold improvement was found, confirming that position 312 is a weak spot. Likewise, for S308P, an improvement between ~1.4- and ~1.7-fold was found in the random mutagenesis approach (Fig. 2), whereas the improvement is 1.16-fold when screening the SSM libraries (Table 2). The difference in improvement may result from different expression levels of the variants due to silent mutations present in the genes. These five positions represent 1.6% of the total EGLII amino acids (5 positions out of 314 amino acids). In total, six substitutions at predicted weak spots (T312R, T77V, T77E, L244R, V222P, and S308P) were found to yield increased thermostability.
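The screening-effort and success-rate figures quoted for the two strategies follow from simple ratios of the reported clone counts, as the short calculation below shows. All input numbers are taken from the text; the variable names are illustrative.

```python
EP_PCR_PRESCREENED = 8000      # ~8 000 clones pre-screened in the ep-PCR library
EP_PCR_IMPROVED = 22           # improved clones found by ep-PCR
SSM_POSITIONS = 18             # weak spots saturated
CLONES_PER_SSM_LIBRARY = 180   # clones screened per position
SSM_IMPROVED = 6               # improved variants from the CNA-guided libraries

ssm_screened = SSM_POSITIONS * CLONES_PER_SSM_LIBRARY

# Screening effort of the CNA-guided approach relative to ep-PCR pre-screening.
effort_ratio = ssm_screened / EP_PCR_PRESCREENED

# Percentage of screened clones or variants that showed improved thermostability.
ep_pcr_success_pct = 100 * EP_PCR_IMPROVED / EP_PCR_PRESCREENED
ssm_success_pct = 100 * SSM_IMPROVED / ssm_screened

print(ssm_screened)
print(f"effort: {effort_ratio:.1%}")
print(f"ep-PCR: {ep_pcr_success_pct:.3f}%  CNA-guided: {ssm_success_pct:.3f}%")
```

The ratios come out at about 40% screening effort and success rates of roughly 0.27% and 0.18%, which matches the values reported for the ep-PCR and CNA-guided libraries.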
Table 2

Variants identified in SSM libraries at predicted weak spots with increased thermostability.

Phase transition | Secondary structure element | Weak spot | Substitution | Improvement [a]
T2 | Turn | 312 | T312R | 1.99 ± 0.30
T3 | αC | 77 | T77V | 1.25 ± 0.04
T3 | αC | 77 | T77E | 1.24 ± 0.07
T2 | Loop | 244 | L244R | 1.23 ± 0.09
T2 | Bend | 222 | V222P | 1.16 ± 0.09
T2 | αI | 308 | S308P | 1.16 ± 0.16

[a] Improvement is defined as the ratio between the residual activity of the EGLII variants and the EGLII wild type in AU. Given is the mean ± SEM over n = 3 experiments performed in biological replicates.
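As a minimal sketch of how an improvement value of this kind could be computed, the following assumes that the ratio of variant to wild-type residual activity is formed per replicate and then averaged; the text does not state whether ratios or means are averaged first, so this is an assumption. The function name and the input values are hypothetical placeholders, not measured data.

```python
from math import sqrt
from statistics import mean, stdev

def improvement(variant_au, wild_type_au):
    """Return (mean, SEM) of the per-replicate ratio variant / wild type.

    Assumption: ratios are formed replicate by replicate, then averaged.
    """
    ratios = [v / w for v, w in zip(variant_au, wild_type_au)]
    sem = stdev(ratios) / sqrt(len(ratios))  # standard error of the mean
    return mean(ratios), sem

# Hypothetical residual activities in AU for n = 3 replicates (not real data).
example_mean, example_sem = improvement([1.2, 1.0, 1.1], [1.0, 1.0, 1.0])
print(f"{example_mean:.2f} ± {example_sem:.2f}")
```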

Saturation of the 18 selected weak spots enables an accurate assessment of each position's influence on EGLII thermostability. For example, in random mutagenesis, position T77 was found in a triple variant (Fig. 2). Therefore, it is unclear whether the thermostability improvement is driven by position T77, N299, or S308, or whether the substitutions have an additive effect. Substitutions must be tested individually, either by SDM or SSM, to determine their influence on EGLII thermostability. Considering all variants identified by random mutagenesis (section “Random mutagenesis of EGLII towards increased thermostability”), an additional 18 SSM libraries (or 18 SDM variants), one for each ep-PCR-identified position, would have to be produced. As a result, the screening effort would increase.

High thermostability and high activity at lower temperatures are properties not found together in enzymes in nature [59], [60]. Protein engineering has advanced as an important toolbox for improving more than one feature within a protein. Through a screening system that selects enzymes with increased thermostability and retained activity at lower temperatures, variants that meet both characteristics can be identified. The specific activity of the identified variants with improved thermostability (T312R, T77V, T77E, L244R, V222P, and S308P) was determined at 75 °C, the optimum temperature of EGLII wild type [38], and at 30 °C. EGLII wild type has a specific activity of 249 ± 38 U mg−1 at 75 °C and 58 ± 4 U mg−1 at 30 °C (Table S3 in SI). It is noteworthy that all six variants obtained from the CNA strategy retained >90% of the EGLII wild type specific activity at 75 °C, and 100% at 30 °C.
The CNA strategy thus enables the identification of variants with improved thermostability and retained activity at lower temperatures.

As to the structural basis of increased thermostability in the EGLII variants, in both T312R and L244R a salt bridge with residues on neighboring secondary structure elements can form, which is absent in EGLII wild type. Salt bridges are considered to enhance thermostability in the majority of cases [61] (Fig. S2 in SI). Concerning the other substitutions, favorable enthalpic contributions appear less likely to be determinants of thermostabilization. Position 77 is exposed to the solvent, such that direct interactions with neighboring residues are not possible (Fig. S2 in SI). In thermophilic proteins, glutamate residues on the protein surface can stabilize exposed structures by increasing the polar surface area, which could explain the stabilizing effect of substitution T77E [62], [63], [64]. Substitution T77V may stabilize the N-terminal part of the helix because of the more favorable helix propensity of valine compared to threonine [65]. At positions 222 and 308, the stabilizing substitutions are to proline, which has a unique role in determining local conformation [66], [67] and may lead to an entropy-driven thermostabilization [68].

At the variant level, the 3 240 screened clones yielded a success rate of 0.18% (6 out of 3 240) of variants with increased thermostability. Depending on the influence of each selected weak spot on the activity of EGLII, the screening effort could be further reduced by pre-screening >180 clones per position on agar plates and enriching only as many active clones as needed to reach >98% coverage [58]. If, analogously to ep-PCR, an enriched SSM library of only 1 660 active clones is produced (Table S2 in SI), the success rate can rise to 0.36% (6 out of 1 660) and further reduce the screening in MTP by ~40%.
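The variant-level success rates quoted above follow directly from the clone counts. A minimal sketch, with a hypothetical helper name and the counts taken from the text:

```python
def success_rate(improved_variants, screened_clones):
    """Fraction of screened clones that showed increased thermostability."""
    return improved_variants / screened_clones

full_library = success_rate(6, 3240)      # all screened SSM clones
enriched_library = success_rate(6, 1660)  # enriched library of active clones only

print(f"{full_library:.3%}")      # 0.185%, reported as 0.18% in the text
print(f"{enriched_library:.3%}")  # 0.361%, reported as 0.36% in the text
```

Three decimals are printed on purpose: rounding 6 / 3 240 to two decimals would give 0.19%, whereas the text reports the truncated value 0.18%.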
Within a KnowVolution campaign, the major screening load comes from Phases I and II and comprises libraries of thousands of clones. Therefore, the CNA approach could be used beneficially in Phase I (Identification) of a KnowVolution campaign for improved thermostability [38] and, consequently, reduce the screening effort further.

Due to the lack of a complete site-saturation library for EGLII, as available, e.g., for BsLipA [33] and the domain protein G (Gβ1) [69], the true number of weak spots in EGLII at which at least one substitution leads to increased thermostability remains unknown. Hence, the precision of random classification cannot be calculated rigorously and, thus, neither can the gain in precision over random classification (gip) due to our CNA and conservation score analyses. Nevertheless, a lower estimate of gip is possible when one assumes that the 18 positions identified in the random mutagenesis approach constitute the true weak spots of the protein. Then, gip = (number of confirmed predicted weak spots / number of predicted weak spots) / (number of true weak spots / number of amino acids) = (5 / 18) / (18 / 314) = 4.8 [33], demonstrating a ~5-fold higher likelihood of identifying weak spots by CNA and conservation score analyses than by random identification.

Note that, by the design of the CNA approach, the identification of structural weak spots aims at improving thermodynamic thermostability [3], [4], [31], whereas kinetic parameters leading to irreversible denaturation elude the CNA analysis. In this context, CNA has not yet been applied to scrutinize the effect of insertions or deletions on thermostability. Likely, the most direct application would be to identify structurally less stable loop regions that may give rise to local unfolding events, from which irreversible inactivation may occur, and to suggest those for deletion. Alternatively, the impact of insertions or deletions suggested from sequence analyses on structural stability could be analyzed with CNA because such effects may not be restricted locally but may lead to changes across the protein, presumably through packing changes [70].

Finally, several advantages arise from the SSM libraries generated from the CNA analysis compared with the random mutagenesis library. First, positions that improve EGLII thermostability can be identified with certainty. Second, it can be established which amino acid represents the best substitution for a position. Third, the influence of each position can be quantitatively determined for each improved variant.
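The lower estimate of the gain in precision (gip) given above can be reproduced with a few lines. This is a sketch of the stated formula only; the function and argument names are illustrative, and the assumption that the 18 positions from random mutagenesis are the true weak spots is the one made in the text.

```python
def gain_in_precision(confirmed, predicted, true_weak_spots, n_residues):
    """gip = precision of the prediction / precision of random classification."""
    precision_predicted = confirmed / predicted       # 5 / 18
    precision_random = true_weak_spots / n_residues   # 18 / 314
    return precision_predicted / precision_random

# Values from the text: 5 confirmed of 18 predicted weak spots,
# 18 assumed true weak spots among 314 amino acids.
gip = gain_in_precision(confirmed=5, predicted=18, true_weak_spots=18, n_residues=314)
print(round(gip, 1))  # 4.8
```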

Conclusion

Constraint Network Analysis (CNA) is a promising method for the identification of beneficial positions in Phase I of a KnowVolution campaign for thermostability improvement of the endo-β-glucanase Cel5A and can likely be applied to other enzymes. Screening efforts can be reduced to ~40% compared to a randomly mutagenized library, based on an estimated ~5-fold higher likelihood of identifying weak spots by CNA and conservation score analyses than by random identification. Focusing on CNA-predicted weak spots yields a success rate in identifying variants with increased thermostability (0.18%) similar to that of random mutagenesis (0.27%). These results indicate that CNA can reduce the time requirements of directed evolution campaigns. The CNA-based identification of beneficial positions becomes particularly interesting if high-throughput screening systems are not available.

Author contributions

M.D.D. and H.G. conceived the study. F.C. conceived, planned, and performed the experiments, analyzed the results, and wrote the manuscript. C.N. conceived and planned the computer experiments, performed the computational analyses, analyzed the results, and wrote the manuscript. L.B. planned and performed the experiments, analyzed the results, and revised the manuscript. U.S., M.D.D., and H.G. discussed the results and revised the manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.