Literature DB >> 32258884

Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset.

Peishan Huang1, Simon K S Chu1, Henrique N Frizzo2, Morgan P Connolly3, Ryan W Caster2, Justin B Siegel2,4,5.   

Abstract

Engineering proteins to enhance thermal stability is a widely utilized approach for creating industrially relevant biocatalysts. The development of new experimental datasets and computational tools to guide these engineering efforts remains an active area of research. Thus, to complement the previously reported measures of T 50 and kinetic constants, we are reporting an expansion of our previously published dataset of mutants for β-glucosidase to include both measures of T M and ΔΔG. For a set of 51 mutants, we found that T 50 and T M are moderately correlated, with a Pearson correlation coefficient and Spearman's rank coefficient of 0.58 and 0.47, respectively, indicating that the two methods capture different physical features. The performance of predicted stability using nine computational tools was also evaluated on the dataset of 51 mutants, none of which are found to be strong predictors of the observed changes in T 50, T M, or ΔΔG. Furthermore, the ability of the nine algorithms to predict the production of isolatable soluble protein was examined, which revealed that Rosetta ΔΔG, FoldX, DeepDDG, PoPMuSiC, and SDM were capable of predicting if a mutant could be produced and isolated as a soluble protein. These results further highlight the need for new algorithms for predicting modest, yet important, changes in thermal stability as well as a new utility for current algorithms for prescreening designs for the production of mutants that maintain fold and soluble production properties.
Copyright © 2020 American Chemical Society.

Entities:  

Year:  2020        PMID: 32258884      PMCID: PMC7114132          DOI: 10.1021/acsomega.9b04105

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

A common goal of enzyme engineering is the enhancement of thermal stability.[1] For industrial applications, improving a protein’s robustness to thermal challenges or half-life at elevated temperatures can often be the deciding factor for the commercialization of a biocatalyst.[2−5] Currently, the most common approach for improving thermal stability is through directed evolution methodologies,[6,7] which can be time-consuming, costly, and limited in the ability to search sequence space. Computational design tools to predictably identify single and combinatorial mutations that enhance thermal stability are rapidly developing and growing in popularity.[8−14] However, accurate predictions using computational tools to guide protein stability design remain an active area of research and is not always successful. The use of large datasets on the mutational effect on protein stability, such as ProTherm[15] now maintained by ProtaBank,[16] is often used to train computational methods for predicting thermal stability. The datasets utilized generally consist of the equilibrium constant of unfolding (Ku) or the melting temperature of an enzyme (TM).[17] In our previous study, we determined the thermal stability of 79 β-glucosidase B (BglB) variants by finding T50, a type of kinetic stability that is determined by the temperature at which a mutant’s residual activity is reduced by 50% after a heat challenge over a defined time.[4,17,18] When analyzing this set of mutants using two established computational programs (Rosetta ΔΔG and FoldX PSSM) for predicting thermal stability, we found that there was no significant correlation between the predictions and the observed T50.[19] One hypothesis explaining the poor predictive performance of the algorithms with the BglB dataset is that the algorithms are evaluated on TM, a direct measure of structural thermal stability. However, the algorithms were being used to predict T50, which is an indirect measure of the protein’s thermal stability.[17] Alternatively, the poor performance could have come from the narrow T50 range (extreme variants are +6.06 and −5.02 °C from the wild type (WT)) as the algorithms are generally benchmarked on larger changes in thermal stability and ±3 °C may be within the error of the currently developed algorithms. In this study, we evaluated both hypotheses. To assess if there was a significant difference in TM and T50, we developed a dataset of 51 BglB mutants (Figure ) in which both thermal stability measurements, T50 and TM, were measured. Interestingly, for the set of 51 measurements, there was only a modest correlation between T50 and TM, with a Pearson coefficient correlation (PCC) and Spearman’s rank correlation (SRC) of 0.58 and 0.47, respectively. This highlights the difference in the physical properties being measured using these two techniques, TM being the thermal stability of the protein’s structural elements and T50 reporting on the thermal stability to irreversible denaturation. However, similar to the previous study,[19] the relationship between the predicted stability with the experimental TM only results in a weak correlation not only with the previous algorithms evaluated (Rosetta ΔΔG and FoldX PSSM) but also with five other commonly used methods: ELASPIC, DeepDDG, PoPMuSiC, SDM, and AUTO-MUTE. This result suggests that while the two measurements are reporting on different physical properties, this is not the key factor that led to the low predictive accuracy of established algorithms on this dataset.
Figure 1

Structure of BglB (PDB ID: 2JIE) from the bacterium Paenibacillus polymyxa. PyMOL rendering[20] of BglB showing the 28 sequence-positions (teal spheres) of the 51 mutants chosen out of the original 92 previously expressed proteins for the TM analysis.[19] The reaction scheme of the hydrolysis of 4-nitrophenyl β-d-glucopyranoside by BglB used in the T50 study.[19]

Structure of BglB (PDB ID: 2JIE) from the bacterium Paenibacillus polymyxa. PyMOL rendering[20] of BglB showing the 28 sequence-positions (teal spheres) of the 51 mutants chosen out of the original 92 previously expressed proteins for the TM analysis.[19] The reaction scheme of the hydrolysis of 4-nitrophenyl β-d-glucopyranoside by BglB used in the T50 study.[19] To evaluate the second hypothesis, that the changes in thermal stability of the BglB dataset are too small for current algorithms, we investigated the ability of the algorithms to predict if a mutation reduced thermal stability to the point that the protein could no longer be produced and isolated in a soluble form. Analysis of the computational algorithms to predict destabilization to the point where no soluble protein could be isolated showed a significant enrichment based on the calculated energetics of the mutants for several algorithms, the most significant of which is for Rosetta ΔΔG. This supports the hypothesis that the lack of performance on the BglB dataset is due to the narrow range in changes observed for thermal stability. These slight molecular changes, especially interactions that are less than 1 kcal/mol, are challenging to accurately model. This highlights the need for new algorithms for predicting modest, yet important, changes in thermal stability as well as a new utility for current algorithms for prescreening designs for the production of mutants likely to maintain protein structure and be produced as a soluble protein.

Methods

Mutant Selection, Protein Expression, and Purification

Out of 79 mutants of BglB that were previously characterized with T50 data,[19] 51 variants with plasmid readily available were transformed into chemically competent Escherichia coli BLR (DE3) cells. The variants were produced and purified, as previously described.[14] Expression was carried out by growing a 5 mL overnight culture in a 50 mL falcon tube with a breathable seal in Terrific broth (TB) medium with kanamycin while shaking at 250 rpm at 37 °C. After the initial overnight culture, cells were spun down and resuspended in fresh TB with kanamycin with 1 mM isopropyl β-d-1-thiogalactopyranoside in a 50 mL falcon tube with a breathable seal and incubated while shaking at 250 rpm at 18 °C for 24 h. Then, the cells were spun down, lysed, and purified using immobilized metal ion affinity chromatography, as previously described.[19] The purity of the protein samples was analyzed using 12–14% SDS-PAGE (Figure SI 1-1), and the yield was assessed based on the A280 for proteins that appeared >75% pure in the SDS-PAGE analysis. Protein samples were considered expressed if they were detectable in the SDS-PAGE analysis and greater than 0.10 mg/mL using A280, as previously described.[19]

Melting Temperature Assay

The melting temperature (TM) of BglB was determined using the Protein Thermal Shift (PTS) kit (Applied Biosystems, from Thermo Fisher Scientific). Standard protocols provided by the manufacturer were used. Protein concentrations ranged from 0.1–0.5 mg/mL, and fluorescence reading was monitored with a QuantaStudio 3 system from 20 to 90 °C. The temperature melting curve was first smoothed with a 20 step sliding window average (Script SI 2). TM was determined from the average of three to four replicates at which the derivative was largest, and all melting curves can be found in Figure SI 3.

ΔG Calculations from TM

Calculations were conducted, as previously described.[21] First, we assumed that the protein follows the two-state folding mechanism, a binary conversion of native state to full denaturation. Second, to derive ΔG°unfolding, the fluorescence intensity was first translated into fractions of folded (Pf) and unfolded (Pu) proteins of the linear portion of the graph at different temperatures starting from the minimum fluorescence (Fmin) to the maximum fluorescence (Fmax) shown in eq . By taking a two-state folding–unfolding model, the equilibrium constant of unfolding (Ku) at different temperatures is then given by We plotted ln Ku against 1/T using the van’t Hoff method shown in eq (Script SI 2), the is defined by the slope, is the y-intercept, T is temperature, and R is the ideal gas constant. The Gibbs free energy of protein-unfolding can then be determined using eq , where ΔG°unfolding is the unfolding energy at a TRT of 298 K. All calculations can be found in Script SI 2.

Molecular Modeling

Seven popular, readily accessible, and recently developed molecular modeling methods, many of them force-field and machine-learning-based, were evaluated for their ability to recapitulate the experimental data: Rosetta ΔΔG,[22] FoldX,[23] ELASPIC,[24] DeepDDG,[25] PoPMuSiC,[26] SDM,[27] and AUTO-MUTE (DDG).[28] The crystal structure of BglB (PDB ID: 2JIE) was used across seven different algorithms. First, using a previously described method,[19] the 2JIE structure was used as input to the Rosetta ΔΔG application and run, as previously described (Script SI 5). Briefly, 50 poses of the WT and the mutant were generated for which 15 energy terms were reported from the score function used.[22] The three lowest system energy scores out of 50 from WT and the mutant were averaged to give the final Rosetta ΔΔG score. Second, for the FoldX position-specific scoring metric (PSSM) protocol, the 2JIE structure was first minimized for any potential inaccurate rotamer assignment using the RepairPDB application.[23] The repaired PDB structure was mutated with single-point mutants and then modeled using FoldX PSSM. The model was scored based on 17 terms within the FoldX force-field.[23] Third, the ELASPIC protocol first constructed a homology model of the WT using the crystal structure, sequence, molecular, and energetics information. Using the standard procedure described, the FoldX algorithm was used to construct the mutant model. The final mutational change is predicted using Stochastic Gradient Boosted Decision Trees based on the energetic, chemical, and structural features from FoldX.[24,29] Fourth, using a curated dataset derived from the Protherm database,[15] DeepDDG used their previously described shared residue pair neural network structure to make a prediction of stability.[25] The DeepDDG output indicated that >0 kcal/mol could be considered stable, whereas <0 kcal/mol could be considered unstable. Fifth, PoPMuSiC estimated the stability of the WT structure and mutants using 13 statistically potential terms, and an additional two terms that account for the volume differences of the residues between WT and the mutant.[26] Sixth, the SDM method evaluated mutational changes using a statistical potential energy function based on environment-specific substitution tables. These tables consisted of data such as structural information, solvent accessibility of the sidechain, and hydrogen bonding.[27] Lastly, similar to SDM, the seventh method, AUTO-MUTE, which predicts for ΔTM and ΔΔG, utilized a statistical potential to calculate the environmental changes of the residue compared to the WT.[28] The protocol was performed using tree regression at 23 °C and pH 7.5. Apart from predicting ΔΔG, two additional methods were used to evaluate the algorithms’ ability to predict ΔTM changes. As mentioned above, the AUTO-MUTE prediction using the Stability Changes (ΔTM) protocol was performed with tree regression. Also, HotMuSiC was used to evaluate the mutational effect with the temperature-dependent potential and other statistical potential terms such as solvent accessibility, structural, and sequence-based information.[30] Pearson correlation coefficient (PCC) and Spearman’s rank correlation (SRC) analyses were performed between their respective ΔΔG (ΔΔG = ΔGmutants – ΔGWT) or the change in total system energy (ΔTSE) of the nine computational methods. Additionally, the available individual features within the Rosetta ΔΔG and FoldX PSSM force field were further evaluated against the TM dataset for correlation. Finally, ΔTSE was evaluated against mutants that could be isolated as a soluble protein and those that lost structural integrity and either precipitated or were degraded and therefore could no longer be isolated as a soluble protein (nonisolated). A Student’s t-test was used to obtain p-values for the nine computational methods. The two categories between isolated and nonisolated protein were treated as an independent sample using an unequal variance.

Results

Evaluating the Relationship between TM and T50

To the best of our knowledge, there has not been a large dataset (>50 data points) directly comparing the TM and T50 relationship for a single set of protein mutants uniformly produced and characterized. It is important to distinguish both TM and T50 methods since the measurements are quantifying and reporting different structural and functional properties. TM is the temperature at which half the enzyme is found in the unfolded state over the folded state.[17,31] This is often evaluated through denaturation assays, where the thermodynamic measurements (ΔGunfolding) can be obtained.[31] This method is generally a lower throughput method as purified protein is required to obtain an accurate measurement for the structural properties for the mutant being evaluated. T50 measures the temperature of half-inactivation that leads to irreversible unfolding,[11,32] and it is determined by the reduction of half of the enzymatic activity due heat challenges.[17] This is a very common assay for protein engineering due to its compatibility with high-throughput assays and the ability to use cell lysates to evaluate function. To complement our previously measured dataset of T50, 51 of the 92 expressed proteins with available plasmids[19] were selected and evaluated for TM using the Protein Thermal Shift assay to compare T50 and TM. The WT BglB TM was determined to be 45.97 ± 1.03 °C, while the previously determined T50 was 39.9 ± 0.1 °C.[19] When evaluating the entire dataset, the TM ranged between 37.1 and 54.3 °C, slightly larger than what was observed for T50, which was between 34.9 and 46.0 °C (Figure SI 1-2). The variant that had the highest TM in this dataset was E167A, with a ΔTM of 54.3 °C (+8.33 °C), which was also observed to have a similar increase in T50 compared with the WT (+6.06 °C).[19] The variant that had the lowest TM in this dataset was found to be E225A, with a ΔTM of −8.9 °C, which had a corresponding T50 of −3.1 °C. The relationship between T50 and TM is plotted in Figure A. The PCC and SRC of 0.58 and 0.47, respectively, indicate that the two methods are moderately positively correlated. Correlation between methods increased in cases where mutations resulted in extremely stable and unstable products, for example, E167A and E225A, respectively. This is an expected result for small changes (<3 °C) in thermal stability; the differences in measurement methods would be expected to play a more significant role than for larger changes (>5 °C). The evaluation of ΔTM and ΔT50 with experimentally derived ΔΔG is also plotted in Figure B,C, respectively. The PCC and SRC show that the TM method and experimentally derived ΔΔG are strongly correlated (PCC and SRC of −0.76), compared to those between ΔΔG and T50 (PCC of −0.35 and SRC of −0.24).
Figure 2

Comparison of two different experimental thermal stability datasets and experimentally derived ΔΔG. (A) Relationship for each mutant between T50 and TM. The PCC of 0.58 illustrates that the two methods are modestly positively correlated with mutations that are in the extreme ends of the temperature range (±5 °C). (B) Evaluation of ΔTM with experimentally derived ΔΔG shows the two qualities are highly correlated (PCC = −0.76), unlike (C) where the relationship between ΔT50 and experimentally derived ΔΔG has a PCC of −0.35.

Comparison of two different experimental thermal stability datasets and experimentally derived ΔΔG. (A) Relationship for each mutant between T50 and TM. The PCC of 0.58 illustrates that the two methods are modestly positively correlated with mutations that are in the extreme ends of the temperature range (±5 °C). (B) Evaluation of ΔTM with experimentally derived ΔΔG shows the two qualities are highly correlated (PCC = −0.76), unlike (C) where the relationship between ΔT50 and experimentally derived ΔΔG has a PCC of −0.35.

Evaluating Computational Stability Tools Using the BglB TM Dataset

The computational evaluation of protein stability of the current experimental TM dataset was analyzed in the same manner as our previous study on T50.[19] An energetically evaluated model for each mutant was generated using established computational methods and subsequently plotted as a function of TM to evaluate the calculated energies related to the observed TM. The PCC and SRC for the most commonly assessed term, the ΔTSE, was found to be highest for FoldX PSSM (PCC of −0.34 and SRC of −0.35) with ΔTM (Figure ). Similarly, the FoldX PSSM correlations with experimentally derived ΔT50 data were found to be −0.21 and −0.16 for PCC and SRC, respectively. The overall relationship between the ΔTSE and the ΔTM thermal stability dataset slightly improved for FoldX, DeepDDG, PoPMuSiC, and AUTO-MUTE (Figure SI 1-3), while Rosetta ΔΔG and ELASPIC remained relatively unchanged with no significant correlation. Interestingly, SDM was the only method where the correlation with ΔT50 is stronger than that of ΔTM (Figure SI 1-3).
Figure 3

Evaluation of the algorithms ΔTSE versus the experimentally derived ΔΔG and the TM and T50 datasets. The Pearson correlation coefficient and Spearman’s rank correlation for each performance against three types of experimental data were determined. Five representative comparisons are illustrated above, with four additional algorithms, SDM, AUTO-MUTE (DDG), AUTO-MUTE (ΔTM), and HoTMuSiC provided in Figure SI 1-3. No algorithm resulted in a significant correlation between the calculated energies and the observed TM, T50, or ΔΔG for this dataset.

Evaluation of the algorithms ΔTSE versus the experimentally derived ΔΔG and the TM and T50 datasets. The Pearson correlation coefficient and Spearman’s rank correlation for each performance against three types of experimental data were determined. Five representative comparisons are illustrated above, with four additional algorithms, SDM, AUTO-MUTE (DDG), AUTO-MUTE (ΔTM), and HoTMuSiC provided in Figure SI 1-3. No algorithm resulted in a significant correlation between the calculated energies and the observed TM, T50, or ΔΔG for this dataset. An analysis of individual energetic term from Rosetta ΔΔG and FoldX PSSM did not uncover any specific feature in either method’s energetic evaluation that was strongly correlated with the TM dataset, as was previously observed for the T50 dataset[19] (Figure SI 4). The strongest PCC for TM against any of the available energetic terms was 0.39 for the Δbackbone clash term from FoldX PSSM and −0.31 for the Omega energy term from Rosetta ΔΔG. To be consistent with the previous performance assessment, we also evaluated the algorithms on experimentally derived ΔΔG in this dataset (Figure ). The PCC and SRC of 0.39 and 0.36, respectively, between experimental ΔΔG and ΔTSE for FoldX PSSM outperformed six other algorithms that were compared. The correlation between experimental ΔΔG with ΔTSE was not unexpected as TM showed a correlation with ΔΔG with a PCC and SRC of −0.76 (Figure B). Analysis of AUTO-MUTE and HotMuSiC to predict for ΔTM revealed no significant correlation with the experimental ΔTM (Figure SI 1-3). Based on this analysis, it is apparent that the general performance of all given methods at best only weakly correlates with the experimentally determined effects of the mutations. This data fails to support the hypothesis that the lack of a previously observed correlation of these established computational tools with observed changes in thermal stability in the BglB dataset is due to the difference in the physical property being measured.

Prediction of Mutant Soluble Expression

The current dataset consists primarily of modest changes in thermal stability of <5 °C, calculated to be ±4 kcal/mol of the WT, and therefore may be challenging for current computational methods to predict. However, this change has only been analyzed in a fraction of the 129 mutants tested in the overall BglB dataset. Of the 129 mutants, only 92 were found to be produced and isolated in a soluble form. All purification procedures are conducted at ∼20 °C. Since the WT has a T50 of 39.9 °C, any reduction in T50 of >18 °C would result in a loss of structural stability from which insoluble aggregates or proteolytic degradation would readily occur during production and purification. In this case, the proteins would no longer be able to be isolated in a soluble form similar to the WT protein. Therefore, it seemed pertinent to evaluate if any of the nine algorithms could differentiate variants in this dataset that could be isolated as a soluble protein versus those that were not able to be separated as a soluble protein. For this evaluation, all of the previously reported 129 mutants were assessed using the nine algorithms following the same methods used for T50 and TM. A mutant was generally considered soluble if it was observed on an SDS-PAGE analysis and had an A280 >0.1 mg/mL. The WT protein produced using the methods described generally resulted in an average A280 of 1.5 mg/mL, which would provide a >10-fold change in yield for mutants having an A280 less than 0.1 mg/mL. While factors other than thermal stability can affect production and isolation of soluble protein, in this case, it is assumed that the primary factor that decreases soluble protein yield is from denaturation of the mutant protein either during expression or purification. The results of this analysis are presented in Figure .
Figure 4

Computational prediction for the effect on mutant soluble protein production using nine different algorithms. From left to right: Rosetta ΔΔG, FoldX PSSM, ELASPIC, DeepDDG, PoPMuSiC, SDM, AUTO-MUTE (DDG), AUTO-MUTE (ΔTM), and HoTMuSiC of soluble (green) and nonisolated protein (blue). In this case, mutants that resulted in a significant (>10-fold) decrease in yield of purified soluble protein are considered nonisolatable. Significance in population differences was determined using a Student’s t-test. The units (ΔTSE and ΔTM) of all algorithms are individually normalized between 1 to −1. For visualization purposes, outliers were omitted after normalization. Each graph without normalization and with outliers can be found in Figure SI 1-4 and all raw values in Figure SI 4.

Computational prediction for the effect on mutant soluble protein production using nine different algorithms. From left to right: Rosetta ΔΔG, FoldX PSSM, ELASPIC, DeepDDG, PoPMuSiC, SDM, AUTO-MUTE (DDG), AUTO-MUTE (ΔTM), and HoTMuSiC of soluble (green) and nonisolated protein (blue). In this case, mutants that resulted in a significant (>10-fold) decrease in yield of purified soluble protein are considered nonisolatable. Significance in population differences was determined using a Student’s t-test. The units (ΔTSE and ΔTM) of all algorithms are individually normalized between 1 to −1. For visualization purposes, outliers were omitted after normalization. Each graph without normalization and with outliers can be found in Figure SI 1-4 and all raw values in Figure SI 4. Of the nine algorithms evaluated, Rosetta ΔΔG, FoldX, DeepDDG, PoPMuSiC, and SDM can capture the enrichment of mutants isolated as a soluble protein. The differences were evaluated for statistical significance using the Student’s t-test, and the highest among the top five methods was shown for Rosetta ΔΔG with a p-value of 1.0 × 10–5. In contrast, enrichment was lower for ELASPIC, AUTO-MUTE (DDG), AUTO-MUTE (ΔTM), and HotMuSiC with p-values of 0.06, 0.38, 0.65, and 0.07, respectively. A few outliers were observed in all methods, except for ELASPIC (Figure SI 1-4). For example, the mutant G15N for both Rosetta ΔΔG and FoldX PSSM was identified as severely energetically unfavorable, which is consistent with the observation that this variant was not able to be isolated as a soluble protein.

Discussion

Both TM and T50 are methods commonly used to quantify different physical aspects of protein thermal stability; however, to date, there has been relatively little experimental data collected to empirically evaluate the relationship of these two measurements. Using a dataset of 51 protein mutants, we observed that there is a moderate positive correlation (PCC of 0.58 and SRC of 0.47) between these two properties. The theory comparing two methods has been extensively described in the work of Hei and Clark.[33] Briefly, T50 can only be used to assess the temperature at which half of the protein is irreversibly unfolded. Meanwhile, TM provides information on the folded state of the protein regardless of whether or not the unfolding events are irreversible. Therefore, it is not surprising that there is only a moderate correlation between the relationship of T50 and TM. Mutants with extreme stability changes, such as E164A (>6 °C), usually exhibit a similar magnitude of change in TM and T50 results. However, the majority of the mutants show a change of ∼3 °C or less in this TM and T50 dataset being analyzed, a range in which the relationship between TM and T50 appears to be weaker. Therefore, analysis with larger datasets with more extreme stability changes may reveal an even stronger correlation between these two properties. The relationship between ΔTM and the experimentally derived ΔΔG of this dataset (PCC and SRC of −0.76) is not expected to reach a perfect correlation since it is dependent on the temperature at which ΔΔG was evaluated, as described by Pucci et al.[34] For example, the ΔΔG evaluated at TM of the WT will yield a correlation closer to −1 and ΔΔG(25°C) will lead to a lower correlation (−0.68).[34] Consistent with our previous analysis, we found a lack of performance using established computational tools when predicting TM and T50 from the WT for this dataset. According to Jia et al., stability prediction using the experimentally derived free energy change of unfolding ΔΔG (kcal/mol) outperforms the prediction using ΔTM (°C).[35] However, in this case, we saw no significant change in the predictive performance for all seven computational tools compared to the experimentally derived free energy change. In addition, we found TM and ΔΔG to be strongly correlated with this dataset, which may suggest that the improved performance is only relevant for more diverse datasets composed of different proteins as opposed to mutants of a single protein. While none of the computational methods demonstrated a strong predictive power for the mutants in this study, Rosetta ΔΔG, FoldX PSSM, ELASPIC, and PoPMuSiC all have previously been shown to have high correlations with experimental data (PCC between 0.69 to 0.83).[22,26,29,36] This dataset with an experimental ΔΔG range of ±∼4 kcal/mol is within the majority of the mutants observed in the algorithms that were typically evaluated on (+8 to −5 kcal/mol).[17,25,26,29] One potential reason for the lack of performance could be that the structure used in this dataset has a ligand bound structure, and often, the structures used in the development of the methods were apoprotein structures. However, using the PDBFlex database,[37] a clustering of five available PDBs of BglB from the bacterium P. polymyxa showed an average RMSD of 0.234 and a maximum RMSD of 0.274, thus making BglB a rigid structure. As there are no significant structural changes between the apo-form and holo-form of the protein, it is unlikely that the exact structure used for this study resulted in the low level of performance by the algorithms. Another possibility is that the protein evaluated here (BglB) is an outlier when compared to the proteins used to develop the algorithms in terms of its structure–function relationship. However, a related study to our analysis has been conducted for human superoxide dismutase 1 in which a low correlation is observed between experimental and predicted stability.[38] This further validates that current algorithms have limited utility for proteins outside of those they were benchmarked on. This limitation hindered by an over-representation of protein families such as lysozyme, tryptophan synthase, and ribonuclease in curated datasets is often utilized in benchmarking.[39] Thus, this highlights the importance of generating high-quality and diverse datasets of more proteins for evaluating and training new computational tools. This study underlines the need for new computational tools that can more accurately predict modest changes, rather than major changes, in thermal stability. This becomes particularly important because single-point mutants often increase thermal stability by a few degrees at a time, while major changes are more often produced from the synergistic effect of combining multiple mutations.[11,40−42] Furthermore, as larger datasets of protein mutants with explicitly measured biophysical properties are generated, opportunities to explore combinations of molecular modeling and machine learning methods will become practical. These algorithms and datasets will enable the development of robust predictors of thermal stability.
  38 in total

1.  ProTherm, version 2.0: thermodynamic database for proteins and mutants.

Authors:  M M Gromiha; J An; H Kono; M Oobatake; H Uedaira; P Prabakaran; A Sarai
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Solvent and mutation effects on the nucleation of amyloid beta-protein folding.

Authors:  Luis Cruz; Brigita Urbanc; Jose M Borreguero; Noel D Lazo; David B Teplow; H Eugene Stanley
Journal:  Proc Natl Acad Sci U S A       Date:  2005-12-09       Impact factor: 11.205

3.  Protein stability promotes evolvability.

Authors:  Jesse D Bloom; Sy T Labthavikul; Christopher R Otey; Frances H Arnold
Journal:  Proc Natl Acad Sci U S A       Date:  2006-03-31       Impact factor: 11.205

4.  Estimation of melting curves from enzymatic activity-temperature profiles.

Authors:  D J Hei; D S Clark
Journal:  Biotechnol Bioeng       Date:  1993-11-20       Impact factor: 4.530

5.  DeepDDG: Predicting the Stability Change of Protein Point Mutations Using Neural Networks.

Authors:  Huali Cao; Jingxue Wang; Liping He; Yifei Qi; John Z Zhang
Journal:  J Chem Inf Model       Date:  2019-02-25       Impact factor: 4.956

Review 6.  Principles of Protein Stability and Their Application in Computational Design.

Authors:  Adi Goldenzweig; Sarel J Fleishman
Journal:  Annu Rev Biochem       Date:  2018-01-26       Impact factor: 23.643

7.  Role of conformational sampling in computing mutation-induced changes in protein structure and stability.

Authors:  Elizabeth H Kellogg; Andrew Leaver-Fay; David Baker
Journal:  Proteins       Date:  2010-12-03

8.  PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality.

Authors:  Yves Dehouck; Jean Marc Kwasigroch; Dimitri Gilis; Marianne Rooman
Journal:  BMC Bioinformatics       Date:  2011-05-13       Impact factor: 3.307

9.  Role of simple descriptors and applicability domain in predicting change in protein thermostability.

Authors:  Kenneth N McGuinness; Weilan Pan; Robert P Sheridan; Grant Murphy; Alejandro Crespo
Journal:  PLoS One       Date:  2018-09-07       Impact factor: 3.240

10.  Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation.

Authors:  Niklas Berliner; Joan Teyra; Recep Colak; Sebastian Garcia Lopez; Philip M Kim
Journal:  PLoS One       Date:  2014-09-22       Impact factor: 3.240

View more
  8 in total

1.  Turning Failures into Applications: The Problem of Protein ΔΔG Prediction.

Authors:  Rita Casadio; Castrense Savojardo; Piero Fariselli; Emidio Capriotti; Pier Luigi Martelli
Journal:  Methods Mol Biol       Date:  2022

Review 2.  Protein Function Analysis through Machine Learning.

Authors:  Chris Avery; John Patterson; Tyler Grear; Theodore Frater; Donald J Jacobs
Journal:  Biomolecules       Date:  2022-09-06

3.  Site-wise Diversification of Combinatorial Libraries Using Insights from Structure-guided Stability Calculations.

Authors:  Benedikt Dolgikh; Daniel Woldring
Journal:  Methods Mol Biol       Date:  2022

4.  Mutational Analysis of Ocriplasmin to Reduce Proteolytic and Autolytic Activity in Pichia pastoris.

Authors:  Roghayyeh Baghban; Safar Farajnia; Younes Ghasemi; Reyhaneh Hoseinpoor; Azam Safary; Mojtaba Mortazavi; Nosratollah Zarghami
Journal:  Biol Proced Online       Date:  2020-12-13       Impact factor: 3.244

5.  Large-scale application of free energy perturbation calculations for antibody design.

Authors:  Fangqiang Zhu; Feliza A Bourguet; William F D Bennett; Edmond Y Lau; Kathryn T Arrildt; Brent W Segelke; Adam T Zemla; Thomas A Desautels; Daniel M Faissol
Journal:  Sci Rep       Date:  2022-07-21       Impact factor: 4.996

6.  Oxidative Implications of Substituting a Conserved Cysteine Residue in Sugar Beet Phytoglobin BvPgb 1.2.

Authors:  Simon Christensen; Leonard Groth; Nélida Leiva-Eriksson; Maria Nyblom; Leif Bülow
Journal:  Antioxidants (Basel)       Date:  2022-08-20

7.  Machine Learning Enables Selection of Epistatic Enzyme Mutants for Stability Against Unfolding and Detrimental Aggregation.

Authors:  Guangyue Li; Youcai Qin; Nicolas T Fontaine; Matthieu Ng Fuk Chong; Miguel A Maria-Solano; Ferran Feixas; Xavier F Cadet; Rudy Pandjaitan; Marc Garcia-Borràs; Frederic Cadet; Manfred T Reetz
Journal:  Chembiochem       Date:  2020-11-17       Impact factor: 3.164

Review 8.  Reviewing Challenges of Predicting Protein Melting Temperature Change Upon Mutation Through the Full Analysis of a Highly Detailed Dataset with High-Resolution Structures.

Authors:  Benjamin B V Louis; Luciano A Abriata
Journal:  Mol Biotechnol       Date:  2021-06-08       Impact factor: 2.695

  8 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.