
GalaxyWEB server for protein structure prediction and refinement.

Junsu Ko, Hahnbeom Park, Lim Heo, Chaok Seok.

Abstract

Three-dimensional protein structures provide invaluable information for understanding and regulating the biological functions of proteins. The GalaxyWEB server predicts protein structure from sequence by template-based modeling and refines loop or terminus regions by ab initio modeling. This web server is based on the method tested as 'Seok-server' in CASP9 (9th Critical Assessment of techniques for protein Structure Prediction), which was assessed to be among the top-performing template-based modeling servers. The method generates reliable core structures from multiple templates and rebuilds unreliable loops or termini using an optimization-based refinement method. In addition to structure prediction, a user can also submit a refinement-only job by providing a starting model structure and the locations of the loops or termini to refine. The web server can be freely accessed at http://galaxy.seoklab.org/.

Year:  2012        PMID: 22649060      PMCID: PMC3394311          DOI: 10.1093/nar/gks493

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Three-dimensional protein structures provide essential information for an atomic-level understanding of the molecular functions shaped by nature, and also for the human design of new ligands that regulate protein function. Computational methods for protein structure prediction have become complementary to experimental methods when close homologs with experimentally determined structures are available. With the ever-increasing sizes of both sequence and structure databases, the role of structure prediction methods based on the known structures of homologs (called template-based modeling, homology modeling or comparative modeling) is also increasing (1,2). Traditionally, much emphasis has been placed on homolog detection and sequence alignment as the essential elements of template-based modeling. More recently, building models that go beyond the best available templates, or improving on the best available model structures, has been discussed as necessary for further advancement in the field (3–5). However, such improvement has proven very difficult, as demonstrated, for example, in the refinement category of recent CASP experiments. In the most recent CASP (CASP9), only three groups, including ours, achieved an improvement in backbone structure quality, and the best improvement was only 0.37% (our own result) (5). In this article, we introduce a new web server that provides two functions: protein structure prediction from sequence and refinement of a user-provided model. The method is based on the ‘Seok-server’ tested in CASP9, which was evaluated to be among the top six servers (6). To provide a more efficient service, the server employs a lighter version of the original method with comparable performance: less extensive sampling is carried out in both the model-building and the refinement steps to reduce computation time.
The template-based modeling method makes extensive use of information from multiple templates to construct reliable core regions, and then refines up to three loops or termini detected as unreliable. Two existing methods, HHsearch (7) and PROMALS3D (8), are used for template selection and sequence alignment, respectively. They are applied so that reliable core structures are built by selecting templates with similar core structures and aligning the core sequences. The remaining less conserved, unreliable regions are treated in the subsequent refinement stage. Better prediction of these less conserved regions by an ab initio refinement method such as the one introduced here would be invaluable for further functional or design studies, because such regions often contribute to the specific functions of related proteins (9–11).
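To make the "up to three unreliable loops or termini" rule concrete, here is a minimal Python sketch of how such regions could be picked once a reliability estimate exists. It assumes a hypothetical per-residue reliability score in [0, 1] (lower means less reliable), an illustrative threshold of 0.5 and a minimum segment length of 3 residues; the actual ULR detection method (16) is not reproduced here. Contiguous low-scoring runs become candidate segments, and the three least reliable ones are kept.

```python
def unreliable_segments(scores, threshold):
    """Return (start, end) index pairs (0-based, inclusive) of contiguous
    residues whose hypothetical reliability score is below threshold."""
    segs, start = [], None
    for i, s in enumerate(scores):
        if s < threshold and start is None:
            start = i                      # a low-reliability run begins
        elif s >= threshold and start is not None:
            segs.append((start, i - 1))    # the run ended at the previous residue
            start = None
    if start is not None:                  # run extends to the C-terminus
        segs.append((start, len(scores) - 1))
    return segs


def select_ulrs(scores, threshold=0.5, max_regions=3, min_len=3):
    """Keep at most max_regions segments, least reliable (lowest mean) first.

    ASSUMPTION: threshold, min_len and the mean-score ranking are illustrative
    choices, not the published GalaxyWEB criteria.
    """
    segs = [(a, b) for a, b in unreliable_segments(scores, threshold)
            if b - a + 1 >= min_len]
    segs.sort(key=lambda seg: sum(scores[seg[0]:seg[1] + 1]) / (seg[1] - seg[0] + 1))
    return sorted(segs[:max_regions])      # report in sequence order
```

Ranking by mean score is only one plausible way to enforce the three-region cap; any ranking that prefers the least reliable segments would fit the description above.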

GALAXYWEB METHOD

A flowchart of the GalaxyWEB structure prediction (GalaxyTBM) and refinement (GalaxyREFINE) procedure is shown in Figure 1. First, template candidates are selected by re-scoring HHsearch (7) results, placing more weight on the secondary structure score for more difficult targets. The re-ranking score is a weighted sum of the Z-score of the HHsearch sequence score, Zseq, and that of the HHsearch secondary structure score, Zss, where the weight w depends on the target difficulty, estimated from the probability P of the HHsearch top-ranked hit, as
Figure 1.

Flowchart of the GalaxyWEB protein structure prediction pipeline which consists of protein structure prediction by GalaxyTBM and refinement by GalaxyREFINE.

Among the top 20 re-ranked homologs, multiple templates are selected by removing structural outliers based on mutual TM-scores (12) over the aligned core regions. The average number of selected templates is 4.55 for the 68 single-domain CASP9 targets used as a test set. Multiple sequence alignment of the core regions is then performed with PROMALS3D (8) after deleting unaligned termini; the terminus sequence alignments are attached afterwards.
Initial model structures are then built from the templates and the sequence alignment by CSA (conformational space annealing) global optimization (13) of restraints derived from the templates by an in-house method (L. Heo, H. Park and C. Seok, unpublished data). The restraints are a sum of approximately single-well potentials, similar to those developed by Thompson et al. (14). Restraints are applied to Cα pairs up to 15 Å apart, a wider range than that of Thompson et al. and similar to that of MODELLER (15). (In CASP9, more complex MODELLER restraints requiring more extensive sampling were used.)
Unreliable local regions (ULRs) are then detected (16) in the initial model, and a maximum of three ULRs are reconstructed ‘simultaneously’ by CSA optimization of a hybrid energy consisting of physics-based and knowledge-based terms (16,17). (In CASP9, ‘all’ ULRs were remodeled individually, which required more computation time than running a single optimization job.) During CSA optimization, the triaxial loop closure algorithm (18) is used extensively to generate geometrically proper backbone structures for loops (19). More details on the method, and on how the strategy taken at each stage affects overall performance, will be presented in a separate article (submitted).
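The template-selection and restraint steps described above can be sketched in Python as follows. This is an illustrative sketch, not GalaxyTBM code. Several pieces are assumptions: the combination Zseq + w·Zss with w = 1 − P stands in for the published weighting (P taken in [0, 1]), the greedy outlier removal and its 0.5 TM-score threshold are hypothetical, and the Gaussian single well with σ = 1 Å only approximates the unpublished restraint potential. The 20-homolog cut-off and the 15 Å Cα range come from the text.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores; identical inputs all map to 0."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def rerank(hits, top_prob, n_keep=20):
    """hits: list of (template_id, seq_score, ss_score) taken from HHsearch.

    ASSUMPTION: score = Zseq + w * Zss with w = 1 - P, so a low top-hit
    probability (a harder target) puts more weight on the SS score.
    """
    zseq = z_scores([h[1] for h in hits])
    zss = z_scores([h[2] for h in hits])
    w = 1.0 - top_prob
    order = sorted(range(len(hits)), key=lambda k: zseq[k] + w * zss[k], reverse=True)
    return [hits[k][0] for k in order[:n_keep]]


def remove_outliers(ids, tm, threshold=0.5):
    """Greedily drop the template least similar to the others until every
    remaining template has mean mutual TM-score >= threshold.

    tm[i][j]: symmetric mutual TM-scores over aligned core regions.
    ASSUMPTION: threshold and greedy order are illustrative.
    """
    keep = list(range(len(ids)))
    while len(keep) > 1:
        avg = {i: sum(tm[i][j] for j in keep if j != i) / (len(keep) - 1) for i in keep}
        worst = min(keep, key=avg.get)
        if avg[worst] >= threshold:
            break
        keep.remove(worst)
    return [ids[i] for i in keep]


def ca_restraints(template_ca, cutoff=15.0):
    """(i, j, d0) for every Calpha pair no more than cutoff apart in the template."""
    out = []
    for i in range(len(template_ca)):
        for j in range(i + 1, len(template_ca)):
            d0 = math.dist(template_ca[i], template_ca[j])
            if d0 <= cutoff:
                out.append((i, j, d0))
    return out


def restraint_energy(model_ca, restraints, sigma=1.0):
    """Sum of Gaussian single wells, each with minimum -1 at d == d0.

    ASSUMPTION: Gaussian shape and sigma are placeholders for the in-house form.
    """
    energy = 0.0
    for i, j, d0 in restraints:
        d = math.dist(model_ca[i], model_ca[j])
        energy -= math.exp(-((d - d0) ** 2) / (2 * sigma ** 2))
    return energy
```

In a real pipeline the restraint energy would be one term minimized by CSA over many candidate structures; here it only scores a given set of coordinates.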
The modifications from the original Seok-server method were made to provide the web service more efficiently, as the original method requires 2–3 times more computational power.

Performance of the method

Because the current web server employs a lighter method than the original Seok-server tested in CASP9, in both the initial model-building and refinement stages, its performance was re-tested on the 68 single-domain CASP9 targets. The average backbone structure quality measured by GDT-TS (20,21) is 68.5 for Seok-server and 67.6 for GalaxyWEB. The small decrease in the performance of GalaxyWEB relative to the original Seok-server comes from the lighter optimization during model building and refinement. However, the result is still comparable to those of the top six server methods in CASP9. When local structure quality is measured by RMSD, the initial model structures are improved in 65% of the cases in which refinement was performed. The performance of the refinement method is discussed more fully in another article (17).
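GDT-TS averages the fractions of Cα atoms lying within 1, 2, 4 and 8 Å of their reference positions. The Python sketch below computes it for a single, already fixed superposition, which is a simplification: the standard GDT-TS (e.g. as computed by LGA) searches over many superpositions and keeps the best fraction at each cutoff, so a single superposition generally gives a lower value. The sketch also shows a plain coordinate RMSD and the fraction of refined cases whose RMSD decreased, the kind of statistic behind the 65% figure; all function names are illustrative.

```python
import math


def gdt_ts(model, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0-100) for ONE fixed superposition of equal-length
    Calpha coordinate lists; no superposition search is performed."""
    dists = [math.dist(m, r) for m, r in zip(model, ref)]
    fractions = [sum(d <= c for d in dists) / len(ref) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def rmsd(model, ref):
    """Coordinate RMSD without further superposition, e.g. for a loop after
    the structures have been superposed on their cores."""
    return math.sqrt(sum(math.dist(m, r) ** 2 for m, r in zip(model, ref)) / len(ref))


def fraction_improved(initial_rmsds, refined_rmsds):
    """Fraction of cases whose RMSD decreased after refinement."""
    pairs = list(zip(initial_rmsds, refined_rmsds))
    return sum(after < before for before, after in pairs) / len(pairs)
```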

GALAXYWEB SERVER

Hardware and software

The GalaxyWEB server runs on a cluster of four Linux servers with eight-core 2.33 GHz Intel Xeon processors. The web application uses Python and a MySQL database. The structure prediction and refinement pipeline is implemented in Python by combining two programs developed by other groups, HHsearch (7) and PROMALS3D (8), with our own molecular modeling package, GALAXY (16,17,19), which is written in Fortran 90. Jmol (http://www.jmol.org) is used to visualize the predicted structures.

Input and output

For structure prediction, a protein sequence must be provided in FASTA format. For a refinement-only run, the user must provide the model structure to be refined in PDB format and specify the residue number range of each region to refine. The expected run time is 7 h for a structure prediction job on a 500-residue protein and 2 h for a refinement job on a 26-residue loop or terminus. The five best models can be viewed and downloaded on the website, as shown in Figure 2. The full set of models generated by the server can also be downloaded as a tar file.
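Before submitting, the two kinds of input described above can be checked locally. The following Python sketch reads a single-sequence FASTA string and parses a list of refinement regions. The "start-end, start-end" range syntax, the restriction to the 20 standard one-letter amino-acid codes, positive residue numbers without insertion codes, and the function names are all assumptions for illustration, not the server's documented input rules.

```python
# ASSUMPTION: only the 20 standard one-letter codes are accepted here.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(text):
    """Return (header, sequence) from a FASTA string holding one sequence."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one sequence")
    seq = "".join(lines[1:]).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"non-standard residue codes: {bad}")
    return lines[0][1:].strip(), seq


def parse_regions(spec):
    """Parse a hypothetical 'start-end, start-end' string into (start, end) tuples."""
    regions = []
    for part in spec.split(","):
        start_text, sep, end_text = part.strip().partition("-")
        if not sep:
            raise ValueError(f"expected 'start-end', got {part!r}")
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError(f"start > end in {part!r}")
        regions.append((start, end))
    return regions
```

For example, `parse_regions("10-25, 40-52")` gives `[(10, 25), (40, 52)]`; the residue numbers would then be matched against those in the uploaded PDB model.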
Figure 2.

GalaxyWEB output page (A). Five top-ranking models are shown in static images (B). They can also be viewed using the Jmol structure viewer. The residue ranges of the refined ULRs are summarized in the table (C) and also indicated in the secondary structure figure (D) in which secondary structure of the first model is compared with the prediction obtained from sequence using PSIPRED.

CONCLUSIONS

GalaxyWEB is a web server for protein structure prediction and refinement. A feature that distinguishes it from other protein structure servers is that unreliable regions, for which template information is unavailable or inconsistent, are detected and refined by an ab initio method. Model structures obtained by other methods may also be refined by specifying the regions to refine. The ab initio loop and terminus modeling method is one of the few refinement methods that can actually improve on the starting models, as demonstrated in CASP9.

FUNDING

National Research Foundation of Korea funded by the Ministry of Education, Science and Technology [2011-0012456]; Center for Marine Natural Products and Drug Discovery (CMDD), one of the MarineBio21 programs funded by the Ministry of Land, Transport and Maritime Affairs of Korea. Funding for open access charge: Seoul National University. Conflict of interest statement. None declared.
REFERENCES (10 of 20 listed)

Evangelos A Coutsias, Chaok Seok, Matthew P Jacobson, Ken A Dill. A kinematic view of loop closure. J Comput Chem, 2004-03.
Valerio Mariani, Florian Kiefer, Tobias Schmidt, Juergen Haas, Torsten Schwede. Assessment of template based protein structure predictions in CASP9. Proteins, 2011-10-15.
Julian Lee, Dongseon Lee, Hahnbeom Park, Evangelos A Coutsias, Chaok Seok. Protein loop modeling by using fragment assembly and analytical loop closure. Proteins, 2010-09-24.
Jürgen Kopp, Lorenza Bordoli, James N D Battey, Florian Kiefer, Torsten Schwede. Assessment of CASP7 predictions for template-based modeling targets. Proteins, 2007.
Keehyoung Joo, Jinwoo Lee, Joo-Hyun Seo, Kyoungrim Lee, Byung-Gee Kim, Jooyoung Lee. All-atom chain-building by optimizing MODELLER energy function using conformational space annealing. Proteins, 2009-06.
Daniel A Keedy, Christopher J Williams, Jeffrey J Headd, W Bryan Arendall, Vincent B Chen, Gary J Kapral, Robert A Gillespie, Jeremy N Block, Adam Zemla, David C Richardson, Jane S Richardson. The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models. Proteins, 2009.
Hahnbeom Park, Junsu Ko, Keehyoung Joo, Julian Lee, Chaok Seok, Jooyoung Lee. Refinement of protein termini in template-based modeling using conformational space annealing. Proteins, 2011-07-13.
James Thompson, David Baker. Incorporation of evolutionary information into Rosetta comparative modeling. Proteins, 2011-06-02.
Ricardo Aparicio, Sérgio T Ferreira, Igor Polikarpov. Closed conformation of the active site loop of rabbit muscle triosephosphate isomerase in the absence of substrate: evidence of conformational heterogeneity. J Mol Biol, 2003-12-12.
Yang Zhang, Jeffrey Skolnick. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res, 2005-04-22.