
UNRES server for physics-based coarse-grained simulations and prediction of protein structure, dynamics and thermodynamics.

Cezary Czaplewski¹, Agnieszka Karczynska¹, Adam K Sieradzan¹, Adam Liwo¹.

Abstract

A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, named the UNRES server, is presented. In contrast to most coarse-grained protein models, the UNRES force field, owing to its physics-based origin, can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, and replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter two are suitable for protein-structure prediction. The user-supplied input includes the protein sequence and, optionally, restraints from secondary-structure prediction or small-angle X-ray scattering data, as well as the simulation type and parameters, which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links, can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectories/ensembles), and all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.


Year: 2018 PMID: 29718313 PMCID: PMC6031057 DOI: 10.1093/nar/gky328

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The biological functions of proteins depend on their structure and dynamics; therefore, research on these subjects is central to molecular biology and medicinal chemistry, including drug design. However, experimental techniques [x-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and, more recently, cryo-electron microscopy] provide structures for only a small fraction of the protein sequences discovered yearly (1), and they yield only fragmentary information about protein dynamics. Computer modeling, aided by experimental and database information, is therefore routinely used for the prediction of unknown protein structures and the simulation of protein dynamics. For protein-structure prediction, comparative modeling has proved to be the most successful method to date (1); however, it fails when a protein represents a completely new fold. Protein dynamics is usually simulated by using all-atom molecular dynamics (MD). However, despite the continuous development of computer technology, including the construction of machines dedicated to running MD simulations (2), the gap between the simulation and biological time scales restricts the applicability of this technique to concrete biological problems. Coarse-grained models of proteins, in which an amino-acid residue is represented by a few extended atoms and, consequently, the accessible time and size scales are extended by orders of magnitude, are therefore a subject of intense development (3). Generally, these models can be divided into those that use knowledge-based potentials (derived from database statistics) and those that use physics-based potentials (derived by translating the all-atom energy function into one corresponding to a reduced model). The coarse-grained UNRES model developed in our laboratory (4,5) is a highly reduced physics-based model of proteins, in which only two interaction sites per residue (united side chains and united peptide groups) are kept.
The α-carbon (Cα) atoms are present in the model to assist in the definition of the chain geometry. The effective energy function has been defined as the potential of mean force of polypeptide chains in water, which has subsequently been expanded into Kubo cluster-cumulant functions that are identified with the respective energy terms (6). MD (7,8) and its replica-exchange (REMD) (9) and multiplexed replica-exchange (MREMD) (10) extensions have been implemented in UNRES (11). In contrast to most coarse-grained protein models, the UNRES force field, owing to its physics-based origin, can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases (12,13); however, the implementation includes the possibility of using restraints (14). UNRES has been used with success in protein-structure prediction (12–14), in studies of protein-folding kinetics and free-energy landscapes, and in solving biological problems (4,5,15).

MATERIALS AND METHODS

Data processing

Oligomeric proteins (16), proteins that include D-amino-acid residues (17) and those with disulfide links (18) can be treated with UNRES. For general information on UNRES the reader is referred to the pertinent book chapter (4) and review article (5), while the theory behind the model is described in detail in our recent work (6). Although the UNRES package has been available for download (at http://www.unres.pl) for several years, a user needs to install it on his or her own system and run it in batch mode. This requirement leaves out a large number of potential users who prefer to submit jobs using a web-based interface. Moreover, a parallel compute server is required for most of the functions of the package to run. Therefore, we have recently created a server based on the UNRES package, named the ‘UNRES server’, which enables a user to submit jobs using a web-based interface. The present article is devoted to the description of this server. A scheme of the UNRES server, outlining its function and data pipelines, is shown in Figure 1. As shown, three types of calculations are available: (i) single local energy minimization (the MIN button), (ii) single-trajectory canonical MD (the MD button) and (iii) multiplexed replica-exchange molecular dynamics (the MREMD button). The mode of calculations is selected by the user. For the MREMD type of calculation, the production simulation is followed by two analysis steps. First, the weighted histogram analysis method (WHAM) (19) is run, which enables the computation of the probabilities of conformations, thermodynamic quantities and ensemble averages at any temperature (20). Second, cluster analysis of the ensemble is carried out at the desired temperature (usually 10–20 K below that of the heat-capacity peak) to construct the final models. The number of clusters is fixed at five and the clustering is carried out by using Ward’s minimum-variance method (21). The final models are selected as the conformations closest to the average structures of the respective clusters (20).
This procedure follows our physics-based protein-structure prediction protocol, which has been tested in the CASP exercises (13). The final models are converted into the all-atom representation by using the PULCHRA (22) and SCWRL (23) methods and are then subjected to a short energy minimization with the AMBER force field, with the Generalized-Born method used to treat hydration (24).
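The model-selection step described above (one representative per cluster, chosen as the member closest to the cluster average, with the cluster probability given by the summed conformation weights) can be sketched as follows. This is a minimal illustration and not the UNRES code: the cluster labels are assumed to come from a prior Ward clustering, the weights stand in for WHAM probabilities, each conformation is reduced to a hypothetical feature vector, and Euclidean distance to a weighted mean replaces the structural comparison used by the package.

```python
import math
from collections import defaultdict


def select_models(features, weights, labels):
    """Return (cluster, probability, representative index), most probable first.

    features: one vector per conformation (hypothetical structural descriptors)
    weights:  statistical weight of each conformation (stand-in for WHAM output)
    labels:   cluster label of each conformation (assumed to be precomputed)
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    total = sum(weights)
    models = []
    for label, idxs in members.items():
        w_cluster = sum(weights[i] for i in idxs)
        dim = len(features[idxs[0]])
        # Weighted average "structure" of the cluster (an assumption of this sketch).
        centroid = [
            sum(weights[i] * features[i][d] for i in idxs) / w_cluster
            for d in range(dim)
        ]
        # Representative: the member closest to the cluster average.
        rep = min(idxs, key=lambda i: math.dist(features[i], centroid))
        models.append((label, w_cluster / total, rep))

    # Rank the models by decreasing cluster probability.
    models.sort(key=lambda m: m[1], reverse=True)
    return models


# Toy ensemble: three conformations in cluster 0 and two in cluster 1.
toy_features = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.3], [5.0, 5.0], [5.4, 5.0]]
toy_weights = [0.1, 0.3, 0.2, 0.3, 0.1]
toy_labels = [0, 0, 0, 1, 1]
for cluster, prob, rep in select_models(toy_features, toy_weights, toy_labels):
    print(f"cluster {cluster}: probability {prob:.2f}, representative {rep}")
```

In the server the number of clusters is fixed at five; the toy data use two clusters only to keep the example short.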

Figure 1.

UNRES server operation flow.

Each calculation can be run in the basic or the advanced mode. In the basic mode, default parameters that have been shown to work in most situations are applied, while in the advanced mode the user can select the parameters of the calculations. In particular, one of two variants of the UNRES force field can be selected. The first, hereafter referred to as FF2, is a ‘canonical’ variant obtained by an extensive search of the parameter space (25) and subsequently supplemented with additional torsional potentials that account for the coupling between backbone-local and side-chain-local conformational states (26), which improves the quality of local structure. The second, hereafter referred to as OPT-WTFSA-2 (27), was recently calibrated with seven proteins of different structural classes by using the maximum-likelihood approach developed in our laboratory. Both variants of UNRES have been validated in our earlier work (12,13,25–27).

Input data

The amino-acid sequence is required for all types of simulations; it can be entered by the user in one-letter code or read from a PDB file (only the first model is taken), which can be uploaded by the user or downloaded from the PDB database given the PDB code. The reference/initial structure, if the user so chooses, is also read from a PDB file. In the current implementation, a structure read from a PDB file must not contain missing residues; an error message is displayed if it does. For oligomeric proteins, different chains are automatically recognized while reading a PDB file; it is also possible to request calculations for a selected chain only. For example, by supplying the string 5G3Q the user specifies that a calculation should be run for all chains present in the PDB file, while 5G3Q:B means that chain B has been selected. A user-supplied sequence of an oligomeric protein must contain the ‘XX’ chain separator; for example, the string AAGGAAXXAAGGAA means that the protein is a dimer consisting of two AAGGAA chains. D-residues are recognized in PDB files; in a user-supplied sequence they are marked by lowercase letters. For example, the sequence AAaAA means that the chain contains D-alanine in the third position. The positions of the disulfide links are read from the SSBOND records of the PDB file. For the energy-minimization type, the sequence is always read from a PDB file, along with the starting structure. Energy minimization is carried out with the quasi-Newton Secant Unconstrained Minimization Solver (SUMSL) algorithm (28). In the advanced mode, the user can select the number of minimization steps and the maximum number of function evaluations, and decide whether to use an initial Monte Carlo search of the local geometry of side-chain centroids to remove overlaps. For the MD type of calculations, a single canonical MD trajectory can be run.
The basic mode includes selection of the starting structure (extended chain, randomly generated structure, or the structure read from a PDB file), the number of MD steps, the temperature and the seed used to initialize the random-number generator. Initial energy minimization is turned on if the starting structure is read from a PDB file or randomly generated. Calculations are run with the Langevin thermostat, by using an adaptive multiple-time-step (A-MTS) quasi-symplectic algorithm developed in our laboratory (29), which is similar to the RESPA algorithm (30). In the advanced mode, the thermostat (Langevin or Berendsen) can be selected and secondary-structure restraints can be input [in the PSIPRED (31) format]. The ‘molecular time unit’ (mtu) used in UNRES MD amounts to 48.9 fs (7). However, it should be noted that, because of averaging over the secondary degrees of freedom, the time scale of UNRES MD is extended by a factor of 1000–10 000 compared to the all-atom time scale (8). Simulation parameters can be changed in the advanced mode. The MREMD type of calculations implies running multiple trajectories in a parallel run. In the basic input mode, an REMD calculation (no multiplexing) is carried out with eight replicas run at temperatures from 270 K to 345 K. In the advanced mode, the user can choose the number and values of the replica temperatures, as well as the multiplexing of each replica and the replica-exchange frequency. In addition, the distance distribution from small-angle X-ray scattering (SAXS) measurements can be input to run SAXS-restrained simulations (32). MD parameters can be selected in the advanced mode, as for MD-type runs. Finally, the user can also select the temperature at which to cluster the final conformational ensemble.
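The input conventions described in this section (a PDB code with an optional chain selector, the ‘XX’ chain separator and lowercase letters for D-residues) can be illustrated with a short parser. This is only a sketch of the conventions as stated in the text; the function names, the validation rules and the restriction to the 20 standard residue letters are assumptions and do not reproduce the server's own input checks.

```python
import re

# One-letter codes of the 20 standard amino acids (assumed allowed alphabet).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_pdb_spec(spec):
    """Split '5G3Q' or '5G3Q:B' into (PDB code, chain); chain None means all chains."""
    match = re.fullmatch(r"([0-9][A-Za-z0-9]{3})(?::([A-Za-z0-9]))?", spec.strip())
    if match is None:
        raise ValueError(f"not a PDB code with optional chain: {spec!r}")
    code, chain = match.groups()
    return code.upper(), chain


def parse_sequence(text):
    """Parse a one-letter sequence into chains of (residue, is_d_residue) pairs.

    Chains are separated by 'XX'; a lowercase letter marks a D-residue.
    """
    chains = []
    for raw_chain in text.strip().split("XX"):
        if not raw_chain:
            raise ValueError("empty chain in sequence")
        residues = []
        for letter in raw_chain:
            if letter.upper() not in AMINO_ACIDS:
                raise ValueError(f"unknown residue letter: {letter!r}")
            residues.append((letter.upper(), letter.islower()))
        chains.append(residues)
    return chains


def d_residue_positions(chain):
    """Return the 1-based positions of D-residues in a parsed chain."""
    return [pos for pos, (_, is_d) in enumerate(chain, start=1) if is_d]


print(parse_pdb_spec("5G3Q:B"))                      # code and selected chain
print(len(parse_sequence("AAGGAAXXAAGGAA")))         # number of chains in the dimer
print(d_residue_positions(parse_sequence("AAaAA")[0]))  # position of D-alanine
```

Representing each residue as a (letter, flag) pair keeps the chirality information attached to the residue instead of encoding it in letter case further down the pipeline.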

Output

For each simulation type, the output is displayed graphically and can be accessed for up to 2 weeks after job completion, provided that the user has saved the web address of the job. Moreover, the UNRES input file(s) and all output files can be downloaded from the server; see http://www.unres.pl/docs for the description of the input/output files. For the respective types of calculations, the output is the following:
Minimization: UNRES representation of the minimized structure, the minimized structure superposed on the starting (reference) structure, Cα root-mean-square deviation (Cα-rmsd) from the starting structure, percentage of native contacts.
Canonical MD: Temperature distribution, plots of potential energy, radius of gyration, Cα-rmsd, fraction of native contacts, fluctuations versus time, movie of the trajectory.
MREMD: Plots of heat capacity and ensemble-average Cα-rmsd versus temperature, plots of the walk in temperature space, Cα-rmsd from the reference structure (if present) versus energy, Cα-rmsd versus time, representative conformations of the five families obtained after clustering (cf. section Data processing). These structures, in all-atom form, obtained after the conversion of the UNRES structures, can be downloaded by clicking on the respective button. For each model, its probability (fraction in the ensemble) and the average cluster Cα-rmsd, as well as the Cα-rmsd, TM-score and GDT_TS of the respective model, are displayed.
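One of the reported quantities, the fraction of native contacts, can be sketched as follows. The contact definition used here (Cα atoms within 8 Å, at least three residues apart along the chain) is an assumption made for illustration; the text does not state the definition used by the server, and the function names are hypothetical.

```python
import math


def contact_set(ca_coords, cutoff=8.0, min_separation=3):
    """Return the set of residue pairs (i, j) whose C-alpha atoms are in contact.

    cutoff (in angstroms) and min_separation are illustrative assumptions.
    """
    contacts = set()
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def fraction_native_contacts(model, reference, cutoff=8.0, min_separation=3):
    """Fraction of the reference (native) contacts that are also present in the model."""
    native = contact_set(reference, cutoff, min_separation)
    if not native:
        return 0.0
    formed = contact_set(model, cutoff, min_separation)
    return len(native & formed) / len(native)


# Toy reference: a six-residue hairpin-like arrangement of C-alpha positions.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
# Toy model: the same six residues as a fully extended chain.
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]

print(fraction_native_contacts(hairpin, hairpin))
print(fraction_native_contacts(extended, hairpin))
```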

RESULTS AND DISCUSSION

Examples included in server tutorial

The tutorial includes examples of running server jobs for all calculation types in the basic and in the advanced mode. Each calculation corresponding to a basic-mode example can be run within minutes, while those corresponding to the advanced mode require up to a couple of hours. By pressing the ‘Load example data’ button, all input data are loaded and the parameters are set. The results of pre-run computations are displayed by selecting the ‘Tutorial’ item from the top navigation bar. The basic-mode examples include: (i) minimization of the energy of the experimental structure of the N-terminal portion of the B-domain of staphylococcal protein A (PDB code: 1BDD); (ii) canonical MD simulation of the IgG-binding domain of streptococcal protein G (PDB code: 1IGD), starting from the experimental structure; a plot of fluctuations versus residue index is displayed and compared with the B-factor profile; (iii) REMD simulation of the tryptophan-cage mini-protein (PDB code: 1L2Y), starting from the extended chain; this is a full-blown structure-prediction run. The advanced-mode examples include: (i) minimization of the P8MTCP1 disulfide-bonded α-helical hairpin mini-protein (PDB code: 1EI0); (ii) canonical MD run of the tryptophan-cage mini-protein (PDB code: 1L2Y), starting from the extended structure; (iii) MREMD run of CASP12 target T0882; (iv) REMD run of the bacteriocin CbnXY mini-protein (PDB code: 5UJG), starting from the extended chain, with simulated SAXS data in the form of the Gaussian-smoothed distance distribution calculated from the experimental structure; this example shows that very good agreement is achieved between the distance distribution calculated from the models and the input distribution; (v) a simulation of the central portion of Factor H (modules 10–15), starting from the NMR structure (PDB code: 2KMS), with the experimental distance distribution from SAXS included [the SASDA25 entry of the SASBDB database (33)].
Examples (iii) and (v) are discussed in more detail in section ‘Selected test cases’.

Selected test cases

CASP12 target T0882

The calculation was run with secondary-structure restraints from PSIPRED (34), obtained at the time of CASP12, when the experimental 5G3Q structure was not yet available in the PDB database. The FF2 variant of the UNRES force field (25,26) was used. Figure 2 displays the overlap of UNRES server model 1 and the experimental 5G3Q:B structure. The Cα-rmsd of that model from the experimental structure is 3.6 Å and the GDT_TS is 58.8%.

Figure 2.

Overlap of the experimental 5G3Q:B structure (gray) of CASP12 target T0882 with UNRES server Model 1, rainbow-colored from the N-terminus (blue) to the C-terminus (red).

Example of including experimental SAXS restraints in simulations

This example pertains to the central portion of Factor H, the structure of which has been solved by NMR (35) (PDB code: 2KMS). This is a relatively small two-domain β-sheet protein. The distance distribution calculated from the experimental structure does not fully overlap with that from SAXS; in particular, the plot calculated from the 2KMS structure decays to zero more quickly than that from SAXS. By running REMD with UNRES and with the SAXS distance-distribution and secondary-structure restraints, much better agreement, in particular in the long-distance part, is obtained (Figure 3). The reason for this better agreement is that, in the UNRES-calculated structure, the domains are at a larger angle than those in the experimental structure (Figure 3).
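The comparison discussed above rests on the distribution of pairwise distances within a structure. A minimal sketch of how such a distribution can be computed from a set of coordinates is given below. It is a simplification made for illustration: every site is weighted equally, optional Gaussian smoothing is applied on a fixed grid, and the function name, bin width and smoothing width are assumptions, not the procedure used by UNRES or by SAXS analysis software.

```python
import math


def distance_distribution(coords, r_max, bin_width=1.0, sigma=0.0):
    """Return (bin centers, normalized pair-distance distribution).

    coords:    list of (x, y, z) positions, e.g. C-alpha atoms
    r_max:     largest distance covered by the grid (angstroms)
    bin_width: grid spacing (illustrative default)
    sigma:     if positive, each pair contributes a Gaussian of this width
               instead of a single histogram count
    """
    n_bins = int(math.ceil(r_max / bin_width))
    centers = [(k + 0.5) * bin_width for k in range(n_bins)]
    hist = [0.0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if sigma > 0.0:
                # Gaussian smoothing: spread the pair over all grid points.
                for k, r in enumerate(centers):
                    hist[k] += math.exp(-0.5 * ((r - d) / sigma) ** 2)
            else:
                k = int(d / bin_width)
                if k < n_bins:
                    hist[k] += 1.0
    total = sum(hist)
    if total > 0.0:
        hist = [h / total for h in hist]
    return centers, hist


# Three sites on a line, 3.8 angstroms apart: pair distances 3.8, 3.8 and 7.6.
sites = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
centers, p_r = distance_distribution(sites, r_max=10.0)
for r, p in zip(centers, p_r):
    if p > 0.0:
        print(f"r = {r:.1f} A, P(r) = {p:.3f}")
```

A structure whose domains are further apart places more weight in the long-distance bins, which is the part of the curve where the 2KMS structure and the SAXS data disagree.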

Figure 3.

Comparison of the distance distribution from SAXS measurements of the central portion of Factor H, modules 10–15, with those calculated from the experimental 2KMS structure and from the five UNRES server models. The 2KMS experimental structure (top, gray) and UNRES model 1 [bottom, rainbow-colored from the N- (blue) to the C-terminus (red)] are superposed on the graph.

SERVER ARCHITECTURE

The UNRES server is equipped with a web interface written using the Django Python web framework. At the time of submission, the server checks the correctness of the data provided by the user and reports possible errors. For accepted jobs, the server prepares the UNRES input files and submits the jobs to the queue on our local cluster. A total of 42 nodes of the cluster are assigned to run server jobs. Upon pressing the ‘Refresh’ button, or automatically every 30 s, the server checks whether the job is finished and reports the percentage of completion. Finally, the results are post-processed (which includes making a movie of an MD trajectory, constructing the plots, etc.) and displayed graphically. The third-party software employed in the server comprises (i) PyMOL (36), (ii) convpdb.pl from the MMTSB Tool Set (37), (iii) AmberTools (24), (iv) PULCHRA (22), (v) SCWRL (23) and (vi) NGL Viewer (38), which is employed as an interactive molecular viewer on the web page.
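The periodic status check described above can be sketched as a simple polling loop. The sketch is generic and does not reproduce the server's Django code: the function name, the `get_progress` callback and the timeout handling are hypothetical, and only the 30 s interval is taken from the text.

```python
import time


def wait_for_job(get_progress, interval_s=30.0, timeout_s=None,
                 sleep=time.sleep, report=print):
    """Poll a job until it reports 100% completion.

    get_progress: hypothetical callable returning the completion percentage
    interval_s:   delay between checks (30 s, as stated for the server)
    timeout_s:    optional limit on the total waiting time
    sleep/report: injectable so the loop can be exercised without real delays
    Returns True if the job finished, False if the timeout was reached.
    """
    waited = 0.0
    while True:
        percent = get_progress()
        report(f"job progress: {percent:.0f}%")
        if percent >= 100:
            return True
        if timeout_s is not None and waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Simulated job that is 0%, 50% and then 100% complete on successive checks.
fake_progress = iter([0, 50, 100])
finished = wait_for_job(lambda: next(fake_progress), sleep=lambda s: None)
print("finished:", finished)
```

Passing the sleep and reporting functions as arguments is a design choice of this sketch that makes the loop testable without waiting for real time to elapse.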

DATA AVAILABILITY

The UNRES server is available at http://unres-server.chem.ug.edu.pl. This website is free and open to all users and there is no login requirement. However, registration is available as an option; registered users can access their old jobs from their homepage on the server website without having to copy the link corresponding to a particular job. The source code of the server is available from the group's Git repository at mmka.chem.univ.gda.pl/repo/django_unres.

29 in total

1. Protein secondary structure prediction based on position-specific scoring matrices.

Authors: D T Jones
Journal: J Mol Biol Date: 1999-09-17 Impact factor: 5.469

2. The PSIPRED protein structure prediction server.

Authors: L J McGuffin; K Bryson; D T Jones
Journal: Bioinformatics Date: 2000-04 Impact factor: 6.937

3. Multiplexed-replica exchange molecular dynamics method for protein folding simulation.

Authors: Young Min Rhee; Vijay S Pande
Journal: Biophys J Date: 2003-02 Impact factor: 4.033

4. SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling.

Authors: Qiang Wang; Adrian A Canutescu; Roland L Dunbrack
Journal: Nat Protoc Date: 2008 Impact factor: 13.491

5. Prediction of protein structure with the coarse-grained UNRES force field assisted by small X-ray scattering data and knowledge-based information.

Authors: Agnieszka S Karczyńska; Magdalena A Mozolewska; Paweł Krupa; Artur Giełdoń; Adam Liwo; Cezary Czaplewski
Journal: Proteins Date: 2017-11-29

6. Application of Multiplexed Replica Exchange Molecular Dynamics to the UNRES Force Field: Tests with alpha and alpha+beta Proteins.

Authors: Cezary Czaplewski; Sebastian Kalinowski; Adam Liwo; Harold A Scheraga
Journal: J Chem Theory Comput Date: 2009-03-10 Impact factor: 6.006

7. Dynamic Formation and Breaking of Disulfide Bonds in Molecular Dynamics Simulations with the UNRES Force Field.

Authors: M Chinchio; C Czaplewski; A Liwo; S Ołdziej; H A Scheraga
Journal: J Chem Theory Comput Date: 2007-07 Impact factor: 6.006

8. Exploring the parameter space of the coarse-grained UNRES force field by random search: selecting a transferable medium-resolution force field.

Authors: Yi He; Yi Xiao; Adam Liwo; Harold A Scheraga
Journal: J Comput Chem Date: 2009-10 Impact factor: 3.376

9. A unified coarse-grained model of biological macromolecules based on mean-field multipole-multipole interactions.

Authors: Adam Liwo; Maciej Baranowski; Cezary Czaplewski; Ewa Gołaś; Yi He; Dawid Jagieła; Paweł Krupa; Maciej Maciejczyk; Mariusz Makowski; Magdalena A Mozolewska; Andrei Niadzvedtski; Stanisław Ołdziej; Harold A Scheraga; Adam K Sieradzan; Rafał Slusarz; Tomasz Wirecki; Yanping Yin; Bartłomiej Zaborowski
Journal: J Mol Model Date: 2014-07-15 Impact factor: 1.810

10. The central portion of factor H (modules 10-15) is compact and contains a structurally deviant CCP module.

Authors: Christoph Q Schmidt; Andrew P Herbert; Haydyn D T Mertens; Mara Guariento; Dinesh C Soares; Dusan Uhrin; Arthur J Rowe; Dmitri I Svergun; Paul N Barlow
Journal: J Mol Biol Date: 2009-10-14 Impact factor: 5.469
