GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure.

Minkyung Baek¹, Taeyong Park¹, Lim Heo¹, Chiwook Park², Chaok Seok¹.

Abstract

Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion.

Substances: Proteins

Year: 2017 PMID: 28387820 PMCID: PMC5570155 DOI: 10.1093/nar/gkx246

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

A large fraction of cellular proteins self-assemble to form symmetric homo-oligomers with distinct biochemical and biophysical properties (1–3). For example, ligand-binding sites or catalytic sites are located at oligomer interfaces in many proteins (4–6), and oligomerization is often necessary for effective signal transduction through membrane receptor proteins (7,8) and selective gating of channel proteins (9). Therefore, knowledge of the homo-oligomer structure is essential for understanding the physiological functions of proteins at the molecular level and for designing molecules that regulate those functions.

Methods for predicting the protein homo-oligomer structure can be divided into two categories: those that use templates selected from the protein structure database and those that dock monomer structures ab initio, without using template information. Usually, template-based methods require a sequence as input, whereas docking methods require a monomer structure as input. The latter requirement can be more restrictive for the user if the monomer structure has to be predicted by another method, but it may be preferred if an experimentally resolved monomer structure is available. Template-based methods are generally expected to produce more accurate predictions when similar proteins that form oligomers exist in the structure database. Docking methods may be more useful when proper oligomer templates are not available but the monomer structure is reliable.

Several protein–protein docking methods have been reported to date (10–18), and some of these are available as public web servers for predicting homo-oligomer structures. M-ZDOCK (13) and GRAMM-X (15), which use ab initio docking based on fast Fourier transformation (FFT), are two such examples. The oligomeric state must be provided as input to these servers. However, relatively few web servers that use template-based methods have been reported. ROBETTA (19,20) and SWISS-MODEL (21) are two web servers that predict the homo-oligomer structure from an amino acid sequence. GalaxyGemini (22) predicts the homo-oligomer structure from a monomer structure. These servers predict the oligomeric state automatically. Depending on the availability of information on the oligomeric state, the user may or may not prefer to specify the oligomeric state.

Here, we introduce a new web server called GalaxyHomomer that predicts the homo-oligomer structure from either the amino acid sequence or the monomer structure. The oligomeric state may or may not be specified by the user. The server can perform both template-based oligomer modeling and ab initio docking. It returns five model structures and automatically decides how many models are generated by which method depending on the existence of proper oligomer templates.

Oligomer structures predicted by template-based methods may have errors due to sequence differences between the target and template proteins. Those predicted by docking methods may be inaccurate if the structural change of the monomer induced by oligomerization is not considered. In the previous CASP experiment conducted in 2014 in collaboration with CAPRI, we showed that such errors in predicted oligomer structures could be reduced by re-modeling inaccurately predicted loops or termini and by relaxing the overall structure (23). GalaxyHomomer incorporates such state-of-the-art model refinement methods to improve the accuracy of homo-oligomer models generated by both template-based modeling and ab initio docking. According to the assessment of the recent blind prediction experiment CASP12 conducted in 2016, GalaxyHomomer, which participated as ‘Seok-assembly’, ranked second among the servers that participated in the assembly category. When we tested GalaxyHomomer on 136 targets from the PISA benchmark set, 47 targets from a membrane protein set, 20 targets from the CASP11 experiment and 89 targets from the CAMEO protein structure prediction category, it showed a performance better than or comparable to that of other available homo-oligomer structure prediction methods.

THE GALAXYHOMOMER METHOD

Overall procedure

The overall pipeline of GalaxyHomomer is presented in Figure 1. Either a sequence or a structure (experimental or predicted) of the monomer can be provided as input. If the oligomeric state is not specified by the user, possible oligomeric states are predicted first. Five homo-oligomer structures with the given oligomeric states are then generated by template-based modeling and ab initio docking. Oligomer templates required by template-based modeling are detected either from the sequence alone or with the help of additional structure information. The models are further refined by loop/terminus modeling using GalaxyLoop (24–26) and by overall relaxation using GalaxyRefineComplex (27).

Figure 1.

Flowchart of the GalaxyHomomer algorithm. The homo-oligomer structure prediction methods based on sequence similarity, structure similarity and ab initio docking are attempted in the order in which they are numbered until five homo-oligomer models are generated. When the monomer structure is given as input, only shaded procedures are executed.
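
The fallback order described in the caption can be sketched as follows. This is only a minimal illustration under stated assumptions, not the server's implementation: the function name `collect_models` and the three toy generator callables are hypothetical stand-ins for the sequence-based, structure-based and ab initio docking steps.

```python
from typing import Callable, List, Sequence


def collect_models(
    methods: Sequence[Callable[[int], List[str]]],
    n_models: int = 5,
) -> List[str]:
    """Try each method in order until n_models models are collected.

    Each method receives the number of models still needed and returns
    at most that many models (hypothetical interface).
    """
    models: List[str] = []
    for method in methods:
        needed = n_models - len(models)
        if needed <= 0:
            break
        models.extend(method(needed)[:needed])
    return models


# Toy stand-ins: two sequence-based templates, one structure-based
# template, and docking that can always supply whatever is missing.
def seq_based(k: int) -> List[str]:
    return ["seq_tmpl_1", "seq_tmpl_2"][:k]


def str_based(k: int) -> List[str]:
    return ["str_tmpl_1"][:k]


def docking(k: int) -> List[str]:
    return [f"dock_{i + 1}" for i in range(k)]


models = collect_models([seq_based, str_based, docking])
print(models)
```

In this toy run, the two template-based steps supply three models and docking fills the remaining two slots.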

Prediction of the oligomeric state

Possible oligomeric states are predicted from the input sequence by a similarity-based method as follows. First, HHsearch (28) is run in the local alignment mode to detect proteins that are similar to the target in the protein structure database ‘pdb70’, with a maximum mutual sequence identity of 70%. The oligomeric states of the database proteins are assigned according to the biological units described in ‘REMARK 350’. Second, the proteins are re-ranked by a score S, which combines the HHsearch sequence score and HHsearch secondary structure score (29). Next, for the top 100 proteins, the S scores of proteins in the same oligomeric state are summed, and the ratios of different oligomeric states are determined in proportion to the S sums. Finally, oligomeric states for five models are assigned according to the oligomeric state ratios.
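
The last two steps can be illustrated with a short sketch. The text does not say how the ratios are rounded to five whole models, so the largest-remainder rule used here is an assumption, as are the function name `assign_oligomeric_states` and the toy scores.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_oligomeric_states(
    hits: List[Tuple[int, float]],
    n_models: int = 5,
    top_n: int = 100,
) -> List[int]:
    """hits: (oligomeric_state, S) pairs. Returns one state per model."""
    # Keep the top_n hits by S and sum S per oligomeric state.
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
    sums: Dict[int, float] = defaultdict(float)
    for state, s in top:
        sums[state] += s
    total = sum(sums.values())
    if total <= 0:
        raise ValueError("S sums must be positive")

    # Assumption: largest-remainder rounding of the proportional quotas.
    quotas = {st: n_models * v / total for st, v in sums.items()}
    counts = {st: int(q) for st, q in quotas.items()}
    leftover = n_models - sum(counts.values())
    by_remainder = sorted(
        quotas, key=lambda st: (quotas[st] - counts[st], sums[st]), reverse=True
    )
    for st in by_remainder[:leftover]:
        counts[st] += 1

    # List states with the largest S sum first.
    ordered = sorted(counts, key=lambda st: sums[st], reverse=True)
    return [st for st in ordered for _ in range(counts[st])]


# Toy hits: S sums are 2.2 (dimer), 0.7 (tetramer) and 0.3 (monomer).
hits = [(2, 0.9), (2, 0.8), (4, 0.7), (2, 0.5), (1, 0.3)]
print(assign_oligomeric_states(hits))  # [2, 2, 2, 4, 1]
```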

Template-based oligomer modeling

The same top 100 proteins described above are considered as candidates for oligomer templates. If a sequence is provided as input, up to five proteins are selected as templates based on the ranking of S among those with S > 0.2 times the highest S overall and those > 0.7 times the highest S for the given oligomeric state. If the number of templates detected using this sequence-based method is fewer than five, additional templates are selected using the monomer structure predicted by the template-based modeling program GalaxyTBM (29). Structure-based templates are selected according to the ranking of S among those with monomer structures similar to the given monomer structure (TM-score calculated using TM-align (30) > 0.5) and in the given oligomeric state. If a structure is provided as input, only the structure-based template detection is used with the monomer structure provided by the user. For each oligomer template detected by the sequence-based method, an oligomer structure is built using the in-house model-building program GalaxyCassiopeia, a component of the most recent version of GalaxyTBM (29). GalaxyCassiopeia builds models from the sequence alignment and template structure by the VTFM optimization used in MODELLER (31) but with FACTS solvation free energy (32), knowledge-based hydrogen bond energy (33) and dipolar-DFIRE (34) in addition to molecular mechanics bonded and non-bonded energy terms and template-derived restraints. For each template detected by the structure-based method, an oligomer structure is built by superimposing the monomer structure onto the oligomer template.
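
The template selection rules can be sketched as below. This is a hedged reading of the text rather than the server's code: both S cutoffs are applied together, structure-based selection skips templates already chosen by the sequence-based step, and the names `Hit` and `select_templates` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    name: str
    state: int                        # oligomeric state of the database protein
    s: float                          # re-ranking score S
    tm_score: Optional[float] = None  # TM-score to the monomer structure, if known


def select_templates(
    hits: List[Hit], state: int, n_max: int = 5
) -> List[Tuple[str, str]]:
    """Return (template name, detection method) pairs, at most n_max."""
    in_state = [h for h in hits if h.state == state]
    if not in_state:
        return []
    best_overall = max(h.s for h in hits)
    best_in_state = max(h.s for h in in_state)
    ranked = sorted(in_state, key=lambda h: h.s, reverse=True)

    # Sequence-based: S above both relative cutoffs (assumed to be combined).
    seq_based = [
        h for h in ranked
        if h.s > 0.2 * best_overall and h.s > 0.7 * best_in_state
    ][:n_max]

    # Structure-based: fill remaining slots with TM-score > 0.5 hits.
    chosen = {h.name for h in seq_based}
    str_based = [
        h for h in ranked
        if h.name not in chosen and h.tm_score is not None and h.tm_score > 0.5
    ][: n_max - len(seq_based)]

    return [(h.name, "sequence") for h in seq_based] + [
        (h.name, "structure") for h in str_based
    ]


hits = [
    Hit("A", 2, 1.0, 0.9),
    Hit("B", 2, 0.8, 0.8),
    Hit("C", 2, 0.5, 0.7),
    Hit("D", 2, 0.4, 0.4),
    Hit("E", 4, 0.9, 0.9),
]
print(select_templates(hits, state=2))
# [('A', 'sequence'), ('B', 'sequence'), ('C', 'structure')]
```

In the toy example only three templates qualify, so two of the five models would be left for ab initio docking.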

Ab initio docking

If fewer than five oligomer templates are detected by the two template detection methods described above, the remaining homo-oligomer models with the given oligomeric states are generated using the in-house ab initio docking program GalaxyOligoTongDock. This docking program predicts homo-oligomer structures from the monomer structure using the grid-based FFT docking method of M-ZDOCK (13) implemented in-house. Only C-symmetry is considered, and D-symmetry is not supported. The top 200 homo-oligomer structures generated by FFT are clustered using NMRCLUST (35), and the clusters are ranked according to the cluster size. From each of the highest-ranking clusters, the highest-scoring structure is selected.
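
The selection of docking models from clusters can be sketched as follows. The clustering itself (NMRCLUST) is not reproduced; cluster labels are taken as given. The sketch assumes that a higher docking score is better and breaks ties between equally sized clusters by their best score, neither of which is stated in the text. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def pick_cluster_representatives(
    scores: Sequence[float], labels: Sequence[int], n_models: int
) -> List[int]:
    """Return indices of the best-scoring pose from the largest clusters."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Group pose indices by cluster label.
    members: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    # Rank clusters by size; ties broken by best score (assumption).
    ranked = sorted(
        members.values(),
        key=lambda m: (len(m), max(scores[i] for i in m)),
        reverse=True,
    )
    # From each top-ranked cluster, keep the highest-scoring pose.
    return [max(m, key=lambda i: scores[i]) for m in ranked[:n_models]]


scores = [9.0, 7.5, 8.2, 6.1, 9.5, 5.0]
labels = [0, 0, 0, 1, 1, 2]
print(pick_cluster_representatives(scores, labels, n_models=2))  # [0, 4]
```

Note that pose 4 has the best score overall but is listed second, because its cluster is smaller than cluster 0.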

Structure refinement

Less reliable loop or terminal regions are re-modeled using GalaxyLoop (24–26), taking the symmetry of the homo-oligomer structure into account. If a sequence is provided as input, re-modeling is applied only to the first model, for regions predicted to be unreliable. If a structure is provided as input, it is applied to all five models, for user-specified regions. GalaxyRefineComplex (27) is subsequently run to further relax the overall structure. The user can run additional refinement jobs by clicking the ‘Submit’ button in the results table on the output page.

Performance of the method

The GalaxyHomomer server was tested on 25 targets in CASP12 in a blind fashion, and this server, named ‘Seok-assembly’, ranked second among the servers that participated in the assembly category (http://www.predictioncenter.org/casp12/). In CASPs, the oligomeric state is provided by the organizers. The server was also tested on three benchmark sets for which the oligomeric state is given as input (136 homo-oligomer proteins from the PISA benchmark set (36), 47 homo-oligomer membrane proteins compiled from the PDB (Supplementary Data) and 20 homo-oligomer proteins among the targets of CASP11 held in 2014 in collaboration with CAPRI (18)) and on a set for which the oligomeric state is not provided as input (89 homo-oligomer proteins among CAMEO (37) targets released from 13 August 2016 to 11 November 2016). In these tests, the performance of GalaxyHomomer was better than or comparable to that of other methods for which performance data are available for the sets in terms of the CAPRI accuracy criterion, as summarized in Table 1. Note that some methods take only the structure as input. The CAPRI criterion reflects the biological relevance of the model structures, and model qualities are classified as high (***), medium (**), acceptable (*) and incorrect considering the ligand root mean-square deviation (L-RMSD) and interface RMSD (I-RMSD) from the experimental structure and the fraction of predicted native contacts (Fnat) (38). See Supplementary Data for details on the benchmark tests.
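
The classification and the per-set tallies can be sketched as follows. The numerical cutoffs are not given in this text; the values below are the thresholds commonly quoted for CAPRI assessment and should be checked against reference (38). The function names `capri_class` and `tally` are hypothetical.

```python
from typing import List, Tuple

# Ordering of quality classes, from worst to best.
RANK = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def capri_class(fnat: float, l_rmsd: float, i_rmsd: float) -> str:
    """Classify a model; RMSD values in angstroms, fnat as a fraction.

    Cutoffs are the commonly quoted CAPRI values (an assumption here).
    """
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "acceptable"
    return "incorrect"


def tally(targets: List[List[str]]) -> Tuple[int, int, int]:
    """Count targets whose best model is acceptable or better, high, medium."""
    best = [max((RANK[m] for m in models), default=0) for models in targets]
    n_ok = sum(b >= RANK["acceptable"] for b in best)
    n_high = sum(b == RANK["high"] for b in best)
    n_medium = sum(b == RANK["medium"] for b in best)
    return n_ok, n_high, n_medium


print(capri_class(0.6, 0.8, 0.9))  # high
print(capri_class(0.6, 3.0, 1.5))  # medium
print(tally([["incorrect", "medium"], ["high"], ["incorrect"]]))  # (2, 1, 1)
```

Passing only the first model of each target to `tally` gives the kind of count reported for the top 1 model, while passing up to five models per target gives the best-of-five count.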

Table 1.

Performance comparison of homo-oligomer structure prediction methods in terms of the CAPRI accuracy criterion

Benchmark set	Prediction methods	Input	Up to 5 models^a	Top 1 model^a
PISA (136 targets)^b	GalaxyHomomer	Sequence	62/5***/38**	57/3***/39**
	HH+MODELLER^c	Sequence	61/3***/38**	45/1***/26**
Membrane proteins (47 targets)^b	GalaxyHomomer	Sequence	19/1***/14**	19/1***/9**
	HH+MODELLER^c	Sequence	18/0***/6**	14/0***/4**
CASP11 (20 targets)^b	GalaxyHomomer	Sequence	12/0***/8**	12/0***/5**
	HADDOCK	Structure	14/0***/10**	13/0***/9**
	ClusPro	Structure	14/0***/7**	10/0***/5**
	BAKER-ROSETTASERVER	Sequence	9/0***/8**	9/0***/7**
	SwarmDock	Structure	9/0***/3**	8/0***/3**
	GalaxyGemini^d	Structure	Not available	7/0***/5**
	GRAMM-X	Structure	5/0***/1**	3/0***/1**
CAMEO (89 targets)	GalaxyHomomer	Sequence	44/6***/25**	35/3***/25**
	Robetta	Sequence	28/4***/17**	26/4***/15**
	SWISS-MODEL^d	Sequence	Not available	23/3***/16**

^a Each entry gives the number of targets with models of acceptable or higher accuracy / high accuracy (***) / medium accuracy (**). ‘Up to 5 models’ counts the best of up to five predicted models for each target; ‘Top 1 model’ counts model 1 only.

^b The oligomeric state of the target protein is given as input.

^c Up to five homo-oligomer models were generated by MODELLER based on the templates detected by HHsearch.

^d Data for up to five models are not available for GalaxyGemini and SWISS-MODEL because they generated only single models.

Note that GalaxyHomomer does not explicitly consider the lipid bilayer environment of membrane proteins, in terms of either energy or geometry, during energy-based optimization and docking. However, the results on membrane proteins in Table 1 are quite promising, implying that the membrane environment was effectively taken into account in an implicit manner by using the database structures of membrane proteins as templates.

GalaxyHomomer showed better performance than GalaxyGemini (22), a previous homo-oligomer structure prediction server developed by us, on the CASP11 benchmark set, as summarized in Table 1. The difference in performance is mainly due to cases in which the predicted monomer structures are not accurate enough. In such cases, oligomer structures built directly from the sequence using sequence-based templates (method 1 in Figure 1) tended to be more accurate than those obtained by superimposing the predicted monomer structures on the structure-based templates (method 2 in Figure 1). GalaxyGemini builds oligomer models using only method 2. Additional model refinement performed by GalaxyHomomer also improved the model accuracy.

THE GALAXYHOMOMER SERVER

Hardware and software

The GalaxyHomomer server runs on a cluster of 12 Linux servers with 2.33-GHz Intel Xeon 8-core processors. The web application uses the Python programming language and the MySQL database. The whole GalaxyHomomer pipeline is implemented in Python. The model building, ab initio docking and refinement methods are implemented as part of the GALAXY program package (39) written in Fortran 90. The JavaScript Protein Viewer (http://biasmv.github.io/pv/) is used for visualization of the predicted models.

Input and output

The required input is a protein monomer sequence in FASTA format or a protein monomer structure in PDB format. The number of residues in the input file is limited to 1000 for computational efficiency. Users can provide additional information such as the oligomeric state and locations of unreliable/flexible loops or termini of the input structure. Usual run time is 6–12 h, but it depends heavily on the homo-oligomer size and the input type, as shown in Supplementary Figure S2. Five homo-oligomer model structures are visualized and available for download in PDB format. Detailed prediction results, including the number of subunits, interface area, information on templates and ab initio docking score, are reported in the tables (Figure 2).
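
Before submitting a sequence, the input constraints above can be checked locally. The following is a minimal sketch of such a check, not part of the server; the function name `read_monomer_fasta` is hypothetical, and the requirement of exactly one FASTA record is an assumption based on the input being a monomer.

```python
MAX_RESIDUES = 1000  # residue limit stated for the server input


def read_monomer_fasta(text: str) -> str:
    """Return the sequence from a single-record FASTA string.

    Raises ValueError if the record is malformed or too long.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one sequence record")
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the sequence is empty")
    if len(sequence) > MAX_RESIDUES:
        raise ValueError(
            f"{len(sequence)} residues exceeds the limit of {MAX_RESIDUES}"
        )
    return sequence


print(read_monomer_fasta(">example\nMKVLAA\nGHTE\n"))  # MKVLAAGHTE
```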

Figure 2.

An example output page of GalaxyHomomer. Five generated models are visualized using the JavaScript Protein Viewer. The models can be downloaded in PDB format. Additional information such as the number of subunits, interface area, information on templates and ab initio docking score is provided in the tables.

CONCLUSION

The GalaxyHomomer server predicts the homo-oligomer structure of a target protein from a sequence or monomer structure. It performs both template-based modeling and ab initio docking, and adopts additional model refinement that can consistently improve model quality. The server provides different options that can be chosen by the user depending on the availability of information on monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. By combining additional refinement based on loop modeling and overall structure refinement, GalaxyHomomer may generate more precise homo-oligomer models that can be useful for further applications such as for drug design targeting protein homo-oligomer interfaces.

38 in total

1. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations.

Authors: Jeffrey J Gray; Stewart Moughon; Chu Wang; Ora Schueler-Furman; Brian Kuhlman; Carol A Rohl; David Baker
Journal: J Mol Biol Date: 2003-08-01 Impact factor: 5.469

2. Protein loop modeling by using fragment assembly and analytical loop closure.

Authors: Julian Lee; Dongseon Lee; Hahnbeom Park; Evangelos A Coutsias; Chaok Seok
Journal: Proteins Date: 2010-09-24

Review 3. Receptor signaling: dimerization and beyond.

Authors: J Stock
Journal: Curr Biol Date: 1996-07-01 Impact factor: 10.834

4. HexServer: an FFT-based protein docking server powered by graphics processors.

Authors: Gary Macindoe; Lazaros Mavridis; Vishwesh Venkatraman; Marie-Dominique Devignes; David W Ritchie
Journal: Nucleic Acids Res Date: 2010-05-05 Impact factor: 16.971

5. Molecular Basis for Drug Resistance in HIV-1 Protease.

Authors: Akbar Ali; Rajintha M Bandaranayake; Yufeng Cai; Nancy M King; Madhavi Kolli; Seema Mittal; Jennifer F Murzycki; Madhavi N L Nalam; Ellen A Nalivaika; Ayşegül Özen; Moses M Prabu-Jeyabalan; Kelly Thayer; Celia A Schiffer
Journal: Viruses Date: 2010-11-12 Impact factor: 5.818

6. The Protein Model Portal--a comprehensive resource for protein structure and model information.

Authors: Juergen Haas; Steven Roth; Konstantin Arnold; Florian Kiefer; Tobias Schmidt; Lorenza Bordoli; Torsten Schwede
Journal: Database (Oxford) Date: 2013-04-26 Impact factor: 3.451

7. Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments.

Authors: Hahnbeom Park; Gyu Rie Lee; Lim Heo; Chaok Seok
Journal: PLoS One Date: 2014-11-24 Impact factor: 3.240

8. GalaxyRefineComplex: Refinement of protein-protein complex model structures driven by interface repacking.

Authors: Lim Heo; Hasup Lee; Chaok Seok
Journal: Sci Rep Date: 2016-08-18 Impact factor: 4.379

9. A direct interaction between NQO1 and a chemotherapeutic dimeric naphthoquinone.

Authors: Lakshmi Swarna Mukhi Pidugu; J C Emmanuel Mbimba; Muqeet Ahmad; Edwin Pozharski; Edward A Sausville; Ashkan Emadi; Eric A Toth
Journal: BMC Struct Biol Date: 2016-01-28

10. Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling: A CASP-CAPRI experiment.

Authors: Marc F Lensink; Sameer Velankar; Andriy Kryshtafovych; Shen-You Huang; Dina Schneidman-Duhovny; Andrej Sali; Joan Segura; Narcis Fernandez-Fuentes; Shruthi Viswanath; Ron Elber; Sergei Grudinin; Petr Popov; Emilie Neveu; Hasup Lee; Minkyung Baek; Sangwoo Park; Lim Heo; Gyu Rie Lee; Chaok Seok; Sanbo Qin; Huan-Xiang Zhou; David W Ritchie; Bernard Maigret; Marie-Dominique Devignes; Anisah Ghoorah; Mieczyslaw Torchala; Raphaël A G Chaleil; Paul A Bates; Efrat Ben-Zeev; Miriam Eisenstein; Surendra S Negi; Zhiping Weng; Thom Vreven; Brian G Pierce; Tyler M Borrman; Jinchao Yu; Françoise Ochsenbein; Raphaël Guerois; Anna Vangone; João P G L M Rodrigues; Gydo van Zundert; Mehdi Nellen; Li Xue; Ezgi Karaca; Adrien S J Melquiond; Koen Visscher; Panagiotis L Kastritis; Alexandre M J J Bonvin; Xianjin Xu; Liming Qiu; Chengfei Yan; Jilong Li; Zhiwei Ma; Jianlin Cheng; Xiaoqin Zou; Yang Shen; Lenna X Peterson; Hyung-Rae Kim; Amit Roy; Xusi Han; Juan Esquivel-Rodriguez; Daisuke Kihara; Xiaofeng Yu; Neil J Bruce; Jonathan C Fuller; Rebecca C Wade; Ivan Anishchenko; Petras J Kundrotas; Ilya A Vakser; Kenichiro Imai; Kazunori Yamada; Toshiyuki Oda; Tsukasa Nakamura; Kentaro Tomii; Chiara Pallara; Miguel Romero-Durana; Brian Jiménez-García; Iain H Moal; Juan Férnandez-Recio; Jong Young Joung; Jong Yun Kim; Keehyoung Joo; Jooyoung Lee; Dima Kozakov; Sandor Vajda; Scott Mottarella; David R Hall; Dmitri Beglov; Artem Mamonov; Bing Xia; Tanggis Bohnuud; Carlos A Del Carpio; Eichiro Ichiishi; Nicholas Marze; Daisuke Kuroda; Shourya S Roy Burman; Jeffrey J Gray; Edrisse Chermak; Luigi Cavallo; Romina Oliva; Andrey Tovchigrechko; Shoshana J Wodak
Journal: Proteins Date: 2016-06-01
