FireProt: web server for automated design of thermostable proteins.

Milos Musil^1,2,3, Jan Stourac^1,3, Jaroslav Bendl^1,2,3, Jan Brezovsky^1,3, Zbynek Prokop^1,3, Jaroslav Zendulka^2,4, Tomas Martinek^1,2,4, David Bednar^1,3, Jiri Damborsky^1,3.

Abstract

There is a continuous interest in increasing protein stability to enhance the usability of proteins in numerous biomedical and biotechnological applications. A number of in silico tools for predicting the effect of mutations on protein stability have been developed recently. However, existing tools typically predict only single-point mutations with a small effect on protein stability, and these predictions have to be followed by laborious protein expression, purification and characterization. Here, we present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calculation core. FireProt utilizes sixteen tools and three protein engineering strategies to make reliable protein designs. The server is complemented with an interactive, easy-to-use interface that allows users to directly analyze and optionally modify the designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.

Year: 2017 PMID: 28449074 PMCID: PMC5570187 DOI: 10.1093/nar/gkx285

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Proteins are widely used in numerous biomedical and biotechnological applications. However, naturally occurring proteins usually cannot withstand harsh industrial environments, since they have mostly evolved to function under mild conditions (1). Protein engineering has revolutionized the utilization of naturally available proteins in industrial applications by improving protein features such as stability, activity or enantioselectivity beyond their natural limits. Protein stability is generally strongly correlated with expression yield (2), half-life (3), serum survival time (4) and performance in the presence of denaturing agents (5). Stability is therefore one of the key determinants of a protein's applicability in biotechnological processes. Ideally, saturation mutagenesis would be applied to evaluate every possible mutation at every position of the engineered protein (6). However, such a search space is enormous, and its experimental evaluation can delay the design of a truly thermostable protein by months or even years. There is therefore a demand for effective and precise computational prediction of protein stability, and a number of in silico tools have recently been developed to meet it. Some of these tools, such as EASE-MM (7), I-Mutant (8) and mCSM (9), are based on machine learning. Others use so-called energy functions and can be further divided into two groups. The first group uses a physical effective energy function to simulate the fundamental forces between atoms and is represented by programs such as Rosetta (10) and Eris (11). The second group, e.g. PopMuSiC (12) and FoldX (13), is based on statistical potentials, whose energies are derived from the frequencies of residue or atom contacts reported in datasets of experimentally characterized protein mutants.
However, because mutations can have antagonistic effects, usually only single-point mutations are predicted in silico, and these have to be followed by laborious and costly protein expression, purification and characterization. Single-point mutations typically increase the melting temperature of target proteins by only a few degrees (3,14). A much higher degree of stabilization can be achieved by constructing multiple-point mutants (15). We have recently developed FireProt (16), a protocol that combines energy- and evolution-based approaches for the reliable design of stable multiple-point mutants. The protocol includes several preceding filters that accelerate the calculation by omitting potentially deleterious mutations. However, the original FireProt protocol is available only in a stand-alone format and requires extensive experience in bioinformatics to carry out all the necessary steps of the workflow. Currently, we are aware of only one server for the design of stable multiple-point mutants: PROSS (17), which utilizes Rosetta modeling and phylogenetic sequence information in its computational core. Here, we present a web version of FireProt for the automated design of thermostable proteins. FireProt integrates sixteen computational tools and utilizes both sequence and structural information. The FireProt web server provides users with thermostable proteins constructed by three distinct strategies: (i) an evolution-based approach utilizing back-to-consensus analysis; (ii) an energy-based approach evaluating the change in free energy upon mutation; and (iii) a combination of the evolution- and energy-based approaches. In our view, this integrated approach is very important, since phylogenetic analysis enables the identification of mutations stabilized by entropy, which cannot be predicted by force-field calculations (Beerens et al., under review). The server allows users to include preferred mutations in the thermostable protein and to generate the corresponding structures and sequences for gene synthesis.
Compared to the previously published FireProt protocol (16), minimal effort and no bioinformatics knowledge are required from users to calculate and analyze the results. Furthermore, all input parameters and computational protocols were optimized to shorten the otherwise highly time-demanding procedure. The server is complemented by a graphical interface that allows users to directly analyze the protein of interest and design multiple-point mutants.

MATERIALS AND METHODS

The basic workflow of the FireProt strategy is outlined in Figure 1. To design a highly reliable thermostable multiple-point mutant, the protein defined by the user is first annotated using several prediction tools and databases (Phase 1). With this knowledge in hand, energy- and evolution-based approaches are applied to assemble a list of potentially stabilizing single-point mutations (Phase 2). Finally, three multiple-point mutants are generated in an additive manner, while potentially antagonistic combinations of mutations are avoided (Phase 3).

Figure 1.

Workflow of the FireProt strategy.

Phase 1: Annotation of the protein

Initially, the user is requested to specify the protein structure, either by providing its PDB ID or by uploading a user-defined PDB file. The biological assembly of the target protein is then generated automatically by the MakeMultimer tool (http://watcut.uwaterloo.ca/tools/makemultimer/). Sequence homologs are obtained by a BLAST search (18) against the UniRef90 database (19), using the target protein sequence as the query. The identified homologs are then aligned with the query protein using USEARCH (20), and sequences whose identity with the query is below or above the user-defined thresholds (default: 30% and 90%, respectively) are excluded from the list of homologs. The remaining sequences are clustered using UCLUST (20) with a 90% identity threshold to remove close homologs. The cluster representatives are sorted by BLAST query coverage and, by default, the first 200 of them are used to create a multiple sequence alignment with Clustal Omega (21). The multiple sequence alignment is used to: (i) estimate the conservation coefficient of each residue position in the protein based on the Jensen–Shannon entropy (22); (ii) identify correlated positions by a consensus decision of OMES (23), MI (24), aMIc (25), DCA (26), SCA (27), ELSC (28) and McBASC (29); and (iii) analyze amino acid frequencies at individual positions within the protein.
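
The identity window, redundancy reduction and top-N selection described above can be sketched in Python as follows. This is an illustrative sketch, not FireProt's implementation: the `Hit` record and the `select_homologs` and `naive_identity` helpers are hypothetical, identity and coverage values are assumed to be precomputed (by USEARCH and BLAST in the real pipeline), the window bounds are assumed to be inclusive, and a simple greedy clustering stands in for UCLUST.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    """Hypothetical BLAST hit with statistics assumed to be precomputed."""
    seq_id: str
    sequence: str
    identity_to_query: float  # fraction in [0, 1]
    query_coverage: float     # fraction of the query covered, in [0, 1]


def naive_identity(a: str, b: str) -> float:
    """Position-by-position identity without alignment (illustration only)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def select_homologs(
    hits: List[Hit],
    pairwise_identity: Callable[[str, str], float] = naive_identity,
    min_identity: float = 0.30,
    max_identity: float = 0.90,
    cluster_identity: float = 0.90,
    max_sequences: int = 200,
) -> List[Hit]:
    # 1. Identity window: drop hits too distant from, or too close to, the query.
    window = [h for h in hits if min_identity <= h.identity_to_query <= max_identity]
    # 2. Greedy clustering (stand-in for UCLUST): visit hits by decreasing coverage and
    #    keep a hit only if it is below cluster_identity to every kept representative.
    window.sort(key=lambda h: h.query_coverage, reverse=True)
    representatives: List[Hit] = []
    for hit in window:
        if all(pairwise_identity(hit.sequence, rep.sequence) < cluster_identity
               for rep in representatives):
            representatives.append(hit)
    # 3. Representatives are already ordered by coverage; keep the first N for the MSA.
    return representatives[:max_sequences]
```

Visiting hits in order of decreasing coverage makes the highest-coverage member of each cluster its representative; this is a convenient choice for the sketch, and UCLUST's own centroid selection may differ.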

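The per-position conservation score can be illustrated as a Jensen–Shannon divergence between each alignment column and a background amino acid distribution, in the spirit of the score cited above (22). This is a simplified sketch under stated assumptions: it uses a uniform background (widely used implementations of this score commonly default to BLOSUM62-derived background frequencies), omits sequence weighting and window smoothing, and scales the score by the non-gap fraction of the column. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def js_conservation(column: str,
                    background: Optional[Dict[str, float]] = None,
                    pseudocount: float = 1e-6) -> float:
    """Jensen-Shannon divergence (bits) between a column and a background distribution,
    scaled by the fraction of non-gap characters in the column."""
    if background is None:
        # Assumption: uniform background instead of BLOSUM62-derived frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    p = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    m = {aa: 0.5 * (p[aa] + background[aa]) for aa in AMINO_ACIDS}

    def kl(a: Dict[str, float], b: Dict[str, float]) -> float:
        # Kullback-Leibler divergence in bits; terms with a[x] == 0 contribute nothing.
        return sum(a[x] * math.log2(a[x] / b[x]) for x in AMINO_ACIDS if a[x] > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(background, m)
    non_gap_fraction = len(residues) / len(column)
    return jsd * non_gap_fraction


def column_scores(msa: List[str]) -> List[float]:
    """Score every column of an alignment given as equal-length strings."""
    return [js_conservation("".join(seq[i] for seq in msa)) for i in range(len(msa[0]))]
```

With base-2 logarithms the score lies between 0 and 1: a fully conserved column scores highest, and a column whose residue distribution matches the background scores 0.
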
Phase 2: Prediction of single-point mutations

In accordance with the original FireProt protocol, potentially stabilizing single-point mutations are identified via two separate branches: one relies on estimating the change in free energy upon mutation, and the other uses a back-to-consensus approach. The first, energy-based branch employs FoldX and Rosetta, the tools that performed best on our test dataset. Preceding filters accelerate the calculation by omitting potentially deleterious mutations. Before the single-point mutations are identified, the target protein structure is repaired and minimized: the FoldX protocol is used to fill in missing atoms in residues, and the patched structure is subsequently minimized with the Rosetta minimization module. Conserved and correlated positions are immediately excluded from further analysis. Functional and structural constraints in proteins have been observed to generally lead to the conservation of amino acid residues (30–33). Similarly, correlated residues ordinarily help to maintain protein function, folding or stability (34–36). Mutations at these positions are therefore considered unsafe by the current FireProt strategy, although there is certainly room for a more sophisticated treatment of correlated positions, which will be developed in future versions of the FireProt server. The remaining positions are subjected to saturation mutagenesis with FoldX. Mutations with a predicted ΔΔG above a given threshold (default: −1 kcal/mol) are discarded, and the rest are forwarded to Rosetta calculations. Finally, the mutations predicted by Rosetta to be strongly stabilizing (default cut-off: −1 kcal/mol) are tagged as potential candidates for the design of multiple-point mutants. The high time demands of Rosetta analysis were one of the most pressing issues of the original FireProt protocol: even after filtering, more than 100 mutations were usually left for the precise but slow Rosetta calculations.
For this reason, we evaluated several force fields and Rosetta protocols on a newly assembled dataset containing 1573 mutations from the ProTherm database (37) and the HotMuSiC dataset (38). Based on the results of these evaluations, the protocol offering the best trade-off between time requirements and precision was selected. With Rosetta protocol 3, we achieved a more than tenfold increase in calculation speed while preserving high prediction accuracy. Details on dataset construction and protocol evaluation can be found in Supplement 1 (Supplementary Tables S1–S5). The second, evolution-based branch relies on information obtained from the multiple sequence alignment. The most common amino acid at each position of a protein sequence often has a non-negligible effect on protein stability (39–42). FireProt therefore implements the majority and frequency ratio approaches to identify mutations at positions where the wild-type amino acid differs from the most prevalent one. By default, mutations are selected at positions where the consensus residue is present in at least 50% of all analyzed sequences (majority method), or where the consensus residue has a frequency of at least 40% and is at least five times more frequent than the wild-type amino acid (frequency ratio method). These thresholds were chosen in accordance with the previously published HotSpot Wizard method (43). The selected mutations are evaluated by FoldX, and the stabilizing ones are listed as candidate mutations for engineering the multiple-point mutant.
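
The two back-to-consensus rules can be expressed as a small per-column check. In this sketch, which is not FireProt's code, `back_to_consensus` is a hypothetical helper; frequencies are computed relative to all analyzed sequences (gaps count as non-consensus), ties between equally frequent residues are resolved by first occurrence, and a wild-type residue absent from the column is treated as satisfying the five-fold ratio.

```python
from collections import Counter
from typing import Optional, Tuple


def back_to_consensus(
    wild_type: str,
    column: str,
    majority: float = 0.50,
    min_consensus_freq: float = 0.40,
    min_ratio: float = 5.0,
) -> Optional[Tuple[str, str]]:
    """Return (consensus residue, method) if the column suggests a back-to-consensus
    mutation, otherwise None. `column` has one character per analyzed sequence; '-' is a gap."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return None
    counts = Counter(residues)
    consensus, n_consensus = counts.most_common(1)[0]  # ties: first-seen residue wins
    if consensus == wild_type:
        return None  # the wild type already matches the consensus
    total = len(column)  # frequencies relative to all analyzed sequences
    f_consensus = n_consensus / total
    f_wild_type = counts[wild_type] / total
    if f_consensus >= majority:
        return consensus, "majority"
    if f_consensus >= min_consensus_freq and (
        f_wild_type == 0 or f_consensus / f_wild_type >= min_ratio
    ):
        return consensus, "frequency ratio"
    return None
```

For example, a column of 20 sequences with 8 leucines, 1 alanine and 11 other residues fails the majority rule (40%) but passes the frequency ratio rule (8:1) when the wild type is alanine.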

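The energy-based branch of Phase 2 is essentially a filter cascade: exclude conserved and correlated positions, run a cheap FoldX screen over all remaining substitutions, and spend Rosetta time only on the survivors. The sketch below assumes the ΔΔG predictors are passed in as plain functions (`foldx_ddg` and `rosetta_ddg` are hypothetical stand-ins, not wrappers of the real programs) and models the consensus decision on correlated positions as a simple vote count with a caller-supplied `min_votes`, since the default decision threshold is not given here.

```python
from typing import Callable, Dict, List, Set, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Mutation = Tuple[int, str, str]  # (1-based position, wild-type residue, mutant residue)


def consensus_correlated(votes: Dict[int, int], min_votes: int) -> Set[int]:
    """Positions flagged as correlated by at least `min_votes` of the methods."""
    return {pos for pos, n in votes.items() if n >= min_votes}


def energy_based_candidates(
    sequence: str,
    conserved: Set[int],
    correlated: Set[int],
    foldx_ddg: Callable[[Mutation], float],
    rosetta_ddg: Callable[[Mutation], float],
    foldx_cutoff: float = -1.0,
    rosetta_cutoff: float = -1.0,
) -> List[Tuple[Mutation, float]]:
    """Two-stage ΔΔG funnel (kcal/mol; negative values are stabilizing)."""
    excluded = conserved | correlated
    # Saturation mutagenesis over the remaining positions.
    mutations: List[Mutation] = [
        (pos, wt, aa)
        for pos, wt in enumerate(sequence, start=1)
        if pos not in excluded
        for aa in AMINO_ACIDS
        if aa != wt
    ]
    # Stage 1: fast FoldX screen; mutations above the cut-off are discarded.
    survivors = [m for m in mutations if foldx_ddg(m) <= foldx_cutoff]
    # Stage 2: slower Rosetta evaluation of the survivors only.
    scored = [(m, rosetta_ddg(m)) for m in survivors]
    hits = [(m, ddg) for m, ddg in scored if ddg <= rosetta_cutoff]
    return sorted(hits, key=lambda item: item[1])  # most stabilizing first
```

The ordering of the stages is the point of the design: the expensive predictor is called only for mutations that the cheap one already considers stabilizing.
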
Phase 3: Design of thermostable protein

In total, the FireProt strategy provides three protein designs. The first design includes only the mutations from the energy-based approach, the second contains the mutations suggested by the evolution-based approach, and the third combines both. Because individual mutations can have antagonistic effects, they cannot be combined blindly. To avoid such clashes, the FireProt strategy minimizes antagonistic effects using Rosetta. In the first step, all pairs of single-point mutations within 10 Å of each other are evaluated, separately for the energy- and evolution-based approaches. Once the change in free energy has been obtained for all residue pairs, FireProt introduces the mutations into the multiple-point mutant in the order of their predicted stability, excluding mutations that collide with those already included. The algorithm stops once no mutations are left or the stabilizing effect of the analyzed pair drops below a defined threshold. Upon completion of this step, the procedure is repeated, this time considering only the pairs between the mutations chosen for the energy- and evolution-based mutants. Finally, the structures of all three mutants are modeled using Rosetta protocol 16.
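
The first step of Phase 3, enumerating mutation pairs that are close enough to interact, might look like this. Which atoms define the 10 Å distance is not specified in the text; the hypothetical `pairs_within` helper assumes Cα–Cα distances taken from the structure.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def pairs_within(
    positions: List[int],
    ca_coords: Dict[int, Coord],
    cutoff: float = 10.0,
) -> List[Tuple[int, int]]:
    """Return all pairs of mutated positions whose Cα atoms lie within `cutoff` ångströms.
    Assumption: the Cα atom represents each residue."""
    pairs = []
    for a, b in combinations(sorted(set(positions)), 2):
        if math.dist(ca_coords[a], ca_coords[b]) <= cutoff:  # Euclidean distance (Python 3.8+)
            pairs.append((a, b))
    return pairs
```

Only the pairs returned here would then need a (costly) double-mutant ΔΔG calculation; all other pairs are far enough apart to be treated as independent.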

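The additive assembly with an antagonism check can be sketched as a greedy loop. This is one possible reading of the procedure above, not FireProt's code: `combine_mutations` is hypothetical, candidates are ordered by their single-mutant ΔΔG, a pair is treated as antagonistic when its computed ΔΔG is less stabilizing than the sum of the two single-mutant values, unevaluated pairs are assumed to be additive, and the loop stops at the first single mutation whose ΔΔG is above the threshold.

```python
from typing import Dict, FrozenSet, List

PairKey = FrozenSet[str]


def combine_mutations(
    singles: Dict[str, float],
    pair_ddg: Dict[PairKey, float],
    stop_threshold: float = -1.0,
    tolerance: float = 0.0,
) -> List[str]:
    """Greedy additive design; ΔΔG in kcal/mol, negative values are stabilizing.

    A candidate is rejected when its pair ΔΔG with an already accepted mutation is less
    stabilizing than the sum of the two single-mutant values (assumed antagonism).
    Pairs without a computed value (e.g. beyond the distance cut-off) count as additive."""
    chosen: List[str] = []
    for mutation, ddg in sorted(singles.items(), key=lambda kv: kv[1]):
        if ddg > stop_threshold:
            break  # all remaining candidates are even less stabilizing
        antagonistic = False
        for other in chosen:
            additive = ddg + singles[other]
            observed = pair_ddg.get(frozenset((mutation, other)), additive)
            if observed > additive + tolerance:
                antagonistic = True
                break
        if not antagonistic:
            chosen.append(mutation)
    return chosen
```

Under the same assumptions, the combined design could be sketched by calling the function again on the union of the mutations selected for the energy- and evolution-based designs.
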
DESCRIPTION OF THE WEB SERVER

Input

The only required input to the web server is the tertiary structure of the protein of interest, provided either as a PDB ID or as a user-defined PDB file. The user can then choose a predefined biological unit generated by the MakeMultimer tool or manually select the chains for which the calculation should be performed. The calculation can be configured in either basic or advanced mode. In basic mode, the user can change the settings of the BLAST search and alignment construction. Advanced mode extends the list of modifiable parameters with those connected to: (i) the identification of consensus residues by the majority and frequency ratio approaches; (ii) the thresholds used by the FoldX and Rosetta prediction tools; and (iii) the decision threshold employed in the consensus analysis of correlated positions. Advanced mode allows expert users to fine-tune the calculation parameters for the studied system. However, the default values are optimized to provide reliable results for most systems, and we therefore advise against changing them in general scenarios.
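
The defaults quoted throughout this paper can be gathered into a single settings object, which makes an advanced-mode change explicit. The `FireProtSettings` class and its field names are hypothetical (they are not the server's parameter names); the values are those stated in the text, and the correlated-position decision threshold is omitted because its default is not given.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FireProtSettings:
    """Defaults quoted in the text; field names are hypothetical."""
    # Phase 1: homolog retrieval and alignment
    min_identity: float = 0.30
    max_identity: float = 0.90
    cluster_identity: float = 0.90
    max_msa_sequences: int = 200
    # Phase 2: back-to-consensus (majority and frequency ratio methods)
    majority_freq: float = 0.50
    ratio_min_consensus_freq: float = 0.40
    ratio_min_fold: float = 5.0
    # Phase 2: energy cut-offs in kcal/mol (negative = stabilizing)
    foldx_cutoff: float = -1.0
    rosetta_cutoff: float = -1.0
    # Phase 3: maximum distance between evaluated mutation pairs (Å)
    pair_distance: float = 10.0
    # The correlated-position decision threshold is not stated, so it is left out.

    def validate(self) -> None:
        if not 0.0 <= self.min_identity < self.max_identity <= 1.0:
            raise ValueError("identity window must satisfy 0 <= min < max <= 1")
        if self.max_msa_sequences < 1:
            raise ValueError("at least one sequence is needed for the alignment")
        if self.ratio_min_fold <= 1.0:
            raise ValueError("frequency ratio must exceed 1")


# Example of an expert tweak: relax the FoldX pre-filter, keep everything else.
relaxed = replace(FireProtSettings(), foldx_cutoff=-0.5)
relaxed.validate()
```

Making the object immutable and deriving variants with `dataclasses.replace` keeps the recommended defaults untouched while recording exactly which values were changed.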

Output

Upon submission, a unique identifier is assigned to each job to track the calculation, and the ‘Results browser’ informs the user about the status of the individual steps of the FireProt workflow (Figure 2B). Once the job is finished, users can either download the results directly as a .zip archive or navigate to the ‘Results page’ for further analysis. The ‘Results page’ is intuitively organized into several panels, as described below.

Figure 2.

FireProt's graphical user interface showing the results obtained for the haloalkane dehalogenase DhaA (PDB ID: 4E46). (A) The ‘Mutant overview’ panel provides a list of mutations introduced into the protein structure. (B) The ‘Report’ panel shows the status of the calculation in the individual steps of the computational pipeline. (C) The ‘Protocol design’ panel provides general information about FireProt designs. (D) The JSmol ‘Viewer’ allows interactive visualization of the protein. (E) The ‘Mutant designer’ panel enables manual adjustment of a new combined mutant.

Protein visualization

The wild-type and mutant structures are visualized interactively in the web browser (Figure 2D) using the JSmol applet (http://wiki.jmol.org/index.php/JSmol). Users can switch between different protein visualization styles and highlight selected amino acids in the protein structure. Residues included in the energy-based mutant are colored orange, evolution-based mutations are colored blue, and all other residues are gray. User-selected residues that were not part of any mutant are underlined in red.
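
The color scheme can be reproduced outside the server with a short Jmol script. The sketch below generates one; `jmol_color_script` is a hypothetical helper, the server's actual JSmol commands are not documented here, and residues are addressed with the standard Jmol `resno:chain` atom expression.

```python
from typing import Iterable


def _atom_expression(chain: str, positions: Iterable[int]) -> str:
    # Jmol atom expression "resno:chain", alternatives joined with "or".
    return " or ".join(f"{pos}:{chain}" for pos in sorted(set(positions)))


def jmol_color_script(chain: str,
                      energy_positions: Iterable[int],
                      evolution_positions: Iterable[int]) -> str:
    """Build a Jmol/JSmol script for the scheme described above: gray by default,
    orange for energy-based and blue for evolution-based positions."""
    commands = ["select all", "color gray"]
    for positions, color in ((energy_positions, "orange"), (evolution_positions, "blue")):
        expression = _atom_expression(chain, positions)
        if expression:  # skip empty sets so no empty selection is emitted
            commands += [f"select {expression}", f"color {color}"]
    return "; ".join(commands) + ";"
```

For example, `jmol_color_script("A", [172, 176], [135])` yields `select all; color gray; select 172:A or 176:A; color orange; select 135:A; color blue;`. If a position appears in both sets, the later `color blue` command wins.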

Mutant overview

The ‘Mutant overview’ panel is organized into four tabs (Figure 2A). The first three tabs provide information about the mutations included in the combined, energy-based and evolution-based mutants. Each row contains all data relevant to the given computational approach, together with a checkbox that lets users visualize the chosen residue in the JSmol applet. The last tab lists all residues of the wild-type structure. While the ‘wild-type’ tab is active, the wild-type structure is shown in the JSmol applet instead of the mutated one, and the user can introduce user-defined mutations into the multiple-point mutant via the ‘plus’ icon in the last column.

General information

The ‘FireProt protocol design’ panel provides users with general information about the target protein and the designs constructed by the FireProt strategy, such as the number of mutations and the estimated change in free energy (Figure 2C).

Mutant designer

The ‘Mutant designer’ panel allows users to design their own multiple-point mutant by managing mutations divided into energy- and evolution-based subsets. If all mutations in a subset have predicted energy values assigned, the total change in Gibbs free energy is estimated immediately, assuming simple additivity. Users can also generate the amino acid sequence of the designed multiple-point mutant, combining the mutations included in the energy- and evolution-based subsets. All prepared designs can be downloaded in a single .zip archive (Figure 2E).
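
The designer's two calculations, the additive ΔΔG estimate and the generation of the mutant sequence, can be sketched as follows. The helpers are hypothetical and assume mutations written as wild-type residue, residue number and mutant residue (e.g. `K2R`), and a structure whose residue numbering maps onto the sequence through a constant offset.

```python
import re
from typing import Dict, List, Optional

# Assumed notation: "<wild type><residue number><mutant>", e.g. "K2R".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def total_ddg(mutations: List[str], ddg: Dict[str, Optional[float]]) -> Optional[float]:
    """Sum single-mutant ΔΔG values (simple additivity); None if any value is missing."""
    values = [ddg.get(m) for m in mutations]
    if any(v is None for v in values):
        return None
    return sum(values)


def apply_mutations(sequence: str, mutations: List[str], first_residue_number: int = 1) -> str:
    """Return the mutant sequence; `first_residue_number` is the number of sequence[0]
    (structure numbering need not start at 1)."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wild_type, number, mutant = match.group(1), int(match.group(2)), match.group(3)
        index = number - first_residue_number
        # Guard against numbering mismatches before editing the sequence.
        if not 0 <= index < len(residues) or residues[index] != wild_type:
            raise ValueError(f"{mutation} does not match the input sequence")
        residues[index] = mutant
    return "".join(residues)
```

For instance, `apply_mutations("MKLV", ["K2R", "V4I"])` returns `"MRLI"`. Checking the wild-type residue before substituting catches numbering offsets early, which matters because PDB residue numbers often do not start at 1.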

EXPERIMENTAL VALIDATION

The original FireProt strategy was experimentally verified with three proteins (haloalkane dehalogenase DhaA, PDB ID 4E46; γ-hexachlorocyclohexane dehydrochlorinase LinA, PDB ID 3A76; and fibroblast growth factor 2, PDB ID 4OEE), increasing their melting temperatures by 25, 21 and 15 °C, respectively (Table 1). The original protocol was modified to enable fully automated calculation in reasonable time while maintaining high prediction accuracy (Supplementary Table S6). Eight multiple-point mutants predicted with this modified protocol were validated against FRESCO data (44), and the identified mutations were compared with those of another online protein stabilization tool, PROSS (17). FireProt and PROSS showed similar predictive power, correctly identifying 29 and 20 potentially stabilizing positions, respectively (Supplementary Table S7).

Table 1.

Experimental validation of the FireProt strategy

Protein (PDB ID)	Energy-based mutations	Evolution-based mutations	ΔTm [°C]
4E46	8	3	+25
3A76	4	3	+21
4OEE	4	2	+15

CONCLUSIONS AND OUTLOOK

FireProt is a web server that provides users with a one-stop-shop solution for the design of thermostable multiple-point mutant proteins. In comparison with the stand-alone FireProt strategy (16), all default parameters and computational protocols were optimized to increase calculation speed while maintaining prediction accuracy. The designs produced by the FireProt workflow were experimentally verified, so users can obtain highly reliable thermostable proteins with minimal experimental effort. The server is complemented by an easy-to-use graphical interface that allows users to interactively analyze individual mutations selected by the energy- or evolution-based approach and to design their own multiple-point mutants on top of our robust strategy. Automating the whole procedure makes the design of thermostable proteins accessible to users without any prior expertise in bioinformatics, since it eliminates the need to select, install and evaluate tools, optimize their parameters and interpret intermediate results. However, the energy-based approach of the FireProt strategy depends on the quality of the provided protein structure, so prediction accuracy might be compromised for low-resolution structures or homology models. In the future, we plan to implement new strategies, such as a design based on the analysis of correlated positions that would contribute to the construction of the final combined mutant, the elimination of highly flexible regions and the introduction of disulfide bridges. We also plan to equip FireProt with several new filters, e.g. the exclusion of amino acids located in the close neighborhood of active sites or participating in oligomerization.

References (42 in total; 10 listed)

Raphael Guerois, Jens Erik Nielsen, Luis Serrano. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol, 2002-07-05.

Karen M Polizzi, Andreas S Bommarius, James M Broering, Javier F Chaparro-Riggers. Stability of biocatalysts. Curr Opin Chem Biol, 2007-02-20. Review.

Hein J Wijma, Robert J Floor, Dick B Janssen. Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr Opin Struct Biol, 2013-05-15. Review.

Robert J Floor, Hein J Wijma, Dana I Colpa, Aline Ramos-Silva, Peter A Jekel, Wiktor Szymański, Ben L Feringa, Siewert J Marrink, Dick B Janssen. Computational library design for increasing haloalkane dehalogenase stability. Chembiochem, 2014-06-27.

Daquan Gao, Diwahar L Narasimhan, Joanne Macdonald, Remy Brim, Mei-Chuan Ko, Donald W Landry, James H Woods, Roger K Sunahara, Chang-Guo Zhan. Thermostable variants of cocaine esterase for long-time protection against cocaine toxicity. Mol Pharmacol, 2008-11-05.

B S Cooperman, A A Baykov, R Lahti. Evolutionary conservation of the active site of soluble inorganic pyrophosphatase. Trends Biochem Sci, 1992-07. Review.

Elizabeth H Kellogg, Andrew Leaver-Fay, David Baker. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins, 2010-12-03.

Fabian Sievers, Andreas Wilm, David Dineen, Toby J Gibson, Kevin Karplus, Weizhong Li, Rodrigo Lopez, Hamish McWilliam, Michael Remmert, Johannes Söding, Julie D Thompson, Desmond G Higgins. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol, 2011-10-11.

Alfonso Valencia. Multiple sequence alignments as tools for protein structure and function prediction. Comp Funct Genomics, 2003.

Adi Goldenzweig, Moshe Goldsmith, Shannon E Hill, Or Gertman, Paola Laurino, Yacov Ashani, Orly Dym, Tamar Unger, Shira Albeck, Jaime Prilusky, Raquel L Lieberman, Amir Aharoni, Israel Silman, Joel L Sussman, Dan S Tawfik, Sarel J Fleishman. Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability. Mol Cell, 2016-07-14.
