SMAP-WS: a parallel web service for structural proteome-wide ligand-binding site comparison.

Jingyuan Ren, Lei Xie, Wilfred W Li, Philip E Bourne.

Abstract

The proteome-wide characterization and analysis of protein ligand-binding sites and their interactions with ligands can provide pivotal information in understanding the structure, function and evolution of proteins and for designing safe and efficient therapeutics. The SMAP web service (SMAP-WS) meets this need through parallel computations designed for 3D ligand-binding site comparison and similarity searching on a structural proteome scale. SMAP-WS implements a shape descriptor (the Geometric Potential) that characterizes both local and global topological properties of the protein structure and can be used to predict the likely ligand-binding pocket [Xie, L. and Bourne, P.E. (2007) A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand-binding sites. BMC Bioinformatics, 8 (Suppl. 4), S9]. Subsequently, a sequence order independent profile–profile alignment (SOIPPA) algorithm is used to detect and align similar pockets, thereby finding protein functional and evolutionary relationships across fold space [Xie, L. and Bourne, P.E. (2008) Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc. Natl Acad. Sci. USA, 105, 5441-5446]. An extreme value distribution model estimates the statistical significance of the match [Xie, L., Xie, L. and Bourne, P.E. (2009) A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics, 25, i305-i312]. These algorithms have been extensively benchmarked and shown to outperform most existing algorithms. Moreover, several predictions resulting from SMAP-WS have been validated experimentally. Thus far, SMAP-WS has been applied to predict drug side effects and to repurpose existing drugs for new indications. SMAP-WS provides both a user-friendly web interface and a programming API for scientists to address a wide range of compute-intensive questions in biology and drug discovery. SMAP-WS is available from the URL http://smap.nbcr.net.

Substances: Ligands; Proteome

Year: 2010 PMID: 20484373 PMCID: PMC2896174 DOI: 10.1093/nar/gkq400

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The 3D structure of a protein is an essential component in elucidating biological function(s) at the molecular level. Ligand-binding sites and their interactions with binding partners provide a strong correlation between structure and function and are thus critical for addressing a wide range of fundamental and practical problems: predicting functions for structural genomics targets, bridging knowledge derived from small molecules and proteins, correlating molecular functions to physiological processes, studying protein evolution and diversity, and designing safe and efficient therapeutics. The SMAP web service (SMAP-WS) is distinct from the downloadable software and is designed for web-accessible 3D ligand-binding site comparison and similarity searching on a structural proteome scale.

The underlying algorithms of SMAP-WS and the standalone software, SMAP, are distinct from those of the existing web servers SiteEngine (1), SitesBase (2,3), CavBase (4–6), SuMo (7), PdbSiteScan (8), eF-Site (9,10), pvSOAR (11), ProFunc (12), PevoSoar (13) and fPOP (14). First, SMAP represents protein structures using C-α atoms only and hence is tolerant to structural variation, meaning it can be applied to homology models and low-resolution structures. Second, amino acid residues are characterized by surface orientation and a geometric potential (15), which provides a geometrical constraint to reduce the search space when undertaking ligand-binding site comparison. Third, two structures are compared using a sequence order independent profile–profile alignment (SOIPPA) algorithm (16). SOIPPA aligns two structures in the spirit of local sequence alignment, but independently of the sequence order. As a result, the location and boundary of the ligand-binding site do not need to be pre-defined. This property is important for real-world applications, since information on the ligand-binding site may be unknown. Fourth, SMAP can compare two biological units that may include multiple chains. This is important since binding sites may be located at a homo- or hetero-dimer interface. For example, the binding site of the antibiotic myxopyronin on the bacterial RNA polymerase is located in the ‘switch region’ between the β and β′ chains. Finally, SMAP determines the similarity between two binding sites through a combination of geometrical fit, residue conservation and physicochemical similarity. The statistical significance of the similarity is estimated using an extreme value distribution model (17). Putting these features together within a parallel computing environment means that SMAP-WS is capable of an all-by-all comparison of binding sites for a complete structural proteome.

In benchmark studies, SOIPPA outperforms most existing ligand-binding site comparison algorithms (16). Around 30% of evolutionary and functional relationships across superfamilies are identified by SOIPPA at a false-positive ratio of 5%. Moreover, SOIPPA outperforms global structural alignment algorithms in detecting remote homologs that belong to the same superfamily: at a false-positive ratio of 5%, SOIPPA detects 15% more true positives than global structural alignment. More importantly, several predictions from SMAP have been experimentally validated (16,18–20). Given its reliability, SMAP has been applied to constructing drug–target interaction networks on a structural proteome scale (17), predicting molecular mechanisms of drug side effects (21,22), repurposing old drugs for new medical uses (19), designing polypharmacology (dirty) drugs (18) and establishing evolutionary relationships across protein fold space (16). Thus, SMAP is useful for studying fundamental questions in protein structure, function and evolution, as well as for computer-aided drug design based on polypharmacology.

As standalone software, SMAP can be installed locally and executed from the command line. SMAP-WS has several improvements that make it more user-friendly and computationally efficient: a web-based interface for the input of PDB structures and the set of required parameters, a Jmol visualization plugin to analyze results, pre-computed databases to search against, and a parallel implementation of SMAP running on a large compute cluster to improve database search speed. Thus, SMAP-WS facilitates the application of comparative ligand-binding site analysis to practical problems in biology and drug discovery.
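
As an illustration of the extreme value distribution model mentioned above, the sketch below converts a raw alignment score into a P-value, assuming the Gumbel (type I extreme value) form. The function name and the location and scale values are placeholders chosen for this example; the parameters actually fitted for SMAP are described in (17) and are not reproduced here.

```python
import math


def gumbel_p_value(score: float, mu: float, sigma: float) -> float:
    """Return P(S >= score) for a Gumbel distribution with location mu and scale sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (score - mu) / sigma
    if z < -700.0:
        # exp(-z) would overflow; the P-value is 1 to machine precision.
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny P-values.
    return -math.expm1(-math.exp(-z))


# Placeholder parameters for illustration only, NOT the values fitted for SMAP.
MU, SIGMA = 10.0, 2.0
for raw_score in (10.0, 20.0, 40.0):
    print(raw_score, gumbel_p_value(raw_score, MU, SIGMA))
```

Higher raw scores give smaller P-values; for scores far above the location parameter, the P-value decays approximately exponentially with the score.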

METHODS

Opal powered SMAP web services

SMAP-WS is powered by Opal (23), a toolkit that enables scientists to easily wrap applications as web services with user-friendly web forms by configuring simple XML files. Two SMAP-WS interfaces are implemented: (i) pair-wise comparison of two potential ligand-binding sites; and (ii) search using a query structure against a non-redundant structure database from the RCSB Protein Data Bank (PDB) (24). In the first application, structures and their components can be chosen from the PDB or uploaded by the user. In the second application, the user may either enter a PDB structure id and its chain id(s) or upload a structure file in PDB format. The user can then search with this structure against several databases, including human homologous proteins and non-redundant PDB structures based on sequence identities of 30 and 90%, respectively. The user has the option to modify the appropriate SMAP-WS parameters for both applications. To improve search speed, the protein structures used in the database search are pre-cached. Each cached structure is characterized by geometric, evolutionary and physicochemical properties computed with default parameters. The pair-wise comparison interface allows the user to modify more parameters for comparing two protein structures based on the similarity of their potential ligand-binding sites. In addition to the web input forms (http://smap.nbcr.net), SMAP-WS can be accessed through a programming API. Details of how to write a client program can be found on the web site.

Output of SMAP-WS

The hits from a database search are sorted by the similarity score of the match and listed with their P-values, PDB structure ids, chain ids and biological descriptions. The PDB id is linked to the structure summary page of the RCSB PDB (http://www.rcsb.org/pdb). For each hit, detailed information on the ligand-binding site similarity is presented (P-value, raw alignment score, RMSD and Tanimoto coefficient of overlap). The amino-acid residue alignment between the two ligand-binding sites and the transformation matrix to superpose them are also displayed. It is important to evaluate whether the predicted residue cluster is a potential binding region. SMAP-WS relies on the geometric potential (15), a shape descriptor that characterizes both local and global topological properties of each residue, to determine whether a residue is located in a pocket on the protein structure. However, in a real application where the binding region is unknown, additional information such as ligand-binding affinity may be required to determine whether the predicted region is suited for ligand binding. Thus, a visualization tool that allows the user to inspect the protein–ligand complex structure was implemented. A Jmol plugin (Jmol: an open-source Java viewer for chemical structures in 3D) that displays the superposition of two protein structures with predicted and aligned ligand-binding site residues is provided. An example of such a superposition is shown in Figure 1. The estrogen receptor ligand-binding domain (PDB id: 1QKT) is compared with steroid delta-isomerase (PDB id: 1OHP) without specifying the co-crystallized ligand-binding site (in the web interface, the option ‘search only co-crystal ligand sites’ was set to false for both structures). A statistically significant similarity between the two estradiol ligand-binding sites is detected (P = 4.09e-6), although the two structures share neither global structural nor sequence similarity [FATCAT (25) P-value 1.19e-1, and sequence identity 9.62%], and the estradiol-binding sites are not pre-defined in either structure.
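
The quantities reported for each hit can be made concrete with a small sketch. Assuming the transformation is supplied as a 3×3 rotation matrix plus a translation vector (the exact layout used by SMAP-WS is not specified here), the code below applies it to a coordinate, computes the RMSD over aligned C-α pairs, and computes a Tanimoto coefficient as intersection size over union size of two residue sets. That set-based definition is a common one and is an assumption; SMAP's own definition of overlap may differ. All function names are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]


def transform(point: Vec3, rotation: Sequence[Sequence[float]], translation: Vec3) -> Vec3:
    """Apply x' = R x + t to a single 3D point."""
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


def rmsd(a: Sequence[Vec3], b: Sequence[Vec3]) -> float:
    """Root-mean-square deviation between two equally long lists of paired coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(squared / len(a))


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Set-based Tanimoto coefficient: |A ∩ B| / |A ∪ B| (assumed definition)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

In use, the aligned C-α coordinates of one site would be passed through `transform` and then compared with the partner site's coordinates using `rmsd`.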

Figure 1.

Two superposed structures of steroid delta-isomerase (PDB id: 1OHP, blue backbone) and estrogen receptor ligand-binding domain (PDB id: 1QKT, red backbone) from an SMAP alignment. The co-crystallized estradiol in 1QKT is shown as a light blue stick model. The aligned residues between the two structures are highlighted.

One of the major applications of SMAP is to predict off-targets given a known protein–ligand complex. Ligand-binding site similarity between two proteins is a prerequisite for, but not sufficient to determine, their cross-reactivity for a specific ligand. The chemical nature of the ligand also contributes to binding promiscuity. For example, staurosporine can bind to a large panel of kinases with Kd < 100 nM, whereas compound VX-745 is a highly specific ATP-competitive inhibitor of p38 MAP kinase (26). A more recent case is the chemical phylogenetic study of histone deacetylases (HDACs), where two chemical analogs have different binding profiles across multiple members of the HDAC superfamily (27). To determine potential off-target binding computationally, it is necessary to calculate the binding free energy of the protein–ligand complex using techniques such as protein–ligand docking and molecular dynamics (MD) simulation. SMAP-WS narrows down the potential off-targets to a small subset of the whole structural proteome and provides an initial binding pose, in each off-target, for the ligand found in the query structure. The accuracy of binding poses predicted by SMAP-WS has been evaluated in a previous study (16). In a rigorous benchmark test, 6.5 and 25.9% of predicted binding poses fall within RMSD values of <2.0 and <5.0 Å, respectively, when compared with co-crystallized ligands that bind to proteins with different folds. Hence, the predicted protein–ligand complex from SMAP-WS can be used as a starting point for more computationally intensive studies. This pipeline has been successfully applied to determine the polypharmacological targets of Trypanosoma brucei RNA-ligase inhibitors (18). To facilitate such applications, SMAP-WS allows users to download the structure of each potential off-target with the superposed ligand. These complexes can then be subjected to protein–ligand docking and MD simulation.
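
The triage step described above, reducing a database search to a short list of candidate off-targets for docking or MD, might look like the following sketch. The `Hit` record, its field names, the P-value cutoff and the list size are all hypothetical and are not part of the SMAP-WS API or output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One database-search hit (hypothetical record layout)."""
    pdb_id: str
    chain: str
    p_value: float
    score: float


def shortlist(hits: Iterable[Hit], p_cutoff: float = 1e-3, max_hits: int = 10) -> List[Hit]:
    """Keep statistically significant hits and rank them by similarity score."""
    significant = [h for h in hits if h.p_value <= p_cutoff]
    # Highest similarity score first, mirroring how the hit list is sorted.
    significant.sort(key=lambda h: h.score, reverse=True)
    return significant[:max_hits]
```

The appropriate cutoff depends on the application; a stricter threshold yields fewer candidates for the expensive free-energy calculations that follow.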

Parallel implementation of SMAP-WS

The SMAP-WS database search is scheduled by the Sun Grid Engine (SGE), which allows SMAP pair-wise comparisons to be executed concurrently on all available compute nodes. As a result, SMAP-WS significantly improves the speed of ligand-binding site database searching. Using SMAP on a single processor, sequential comparison of a query structure against a database of about 40 000 non-redundant structures from the PDB takes more than 20 days (17). To speed up this process, a wrapper program submits an SGE array job for the set of SMAP comparisons so that they run in parallel on the compute nodes of a cluster. The SMAP-WS server cluster has up to 99 compute nodes with two processors on each node. Thus, when all compute nodes are available, 198 SMAP-WS jobs can run in parallel, and a scan of the non-redundant PDB completes within one day.
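
The arithmetic behind this speed-up can be sketched as follows. The code splits a database of N comparisons into contiguous blocks, one per array task, and estimates the ideal wall-clock time under perfect scaling. The function names are hypothetical, the wrapper program used by SMAP-WS is not shown in this paper, and the estimate ignores scheduling overhead, uneven job lengths and busy nodes. In an SGE array job, each task would typically read its 1-based index from the `SGE_TASK_ID` environment variable.

```python
def task_slice(task_id: int, n_tasks: int, n_items: int) -> range:
    """Return the item indices handled by a 1-based array task id."""
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and n_tasks")
    base, extra = divmod(n_items, n_tasks)
    # The first `extra` tasks take one additional item each.
    start = (task_id - 1) * base + min(task_id - 1, extra)
    size = base + (1 if task_id <= extra else 0)
    return range(start, start + size)


def ideal_wall_hours(serial_days: float, n_slots: int) -> float:
    """Wall-clock hours if the serial workload scaled perfectly over n_slots."""
    return serial_days * 24.0 / n_slots


N_STRUCTURES = 40_000   # approximate database size quoted in the text
N_SLOTS = 99 * 2        # 99 nodes with two processors each

sizes = [len(task_slice(t, N_SLOTS, N_STRUCTURES)) for t in range(1, N_SLOTS + 1)]
print(sum(sizes), min(sizes), max(sizes))
print(round(ideal_wall_hours(20, N_SLOTS), 2))
```

Under these idealized assumptions, each of the 198 tasks handles 202 or 203 structures, and a scan that takes 20 days serially would need roughly 2.4 hours. The stated figure of "within one day" leaves room for overhead and partially occupied nodes.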

CONCLUSION

We have developed SMAP-WS, a high-performance computing environment for protein ligand-binding site comparison and database searching. SMAP-WS provides both user-friendly interfaces and a programming API so that a wide spectrum of scientists can access the service. It is expected that the integration of SMAP-WS with other bioinformatics, molecular modeling and systems biology tools will facilitate the study of protein–ligand interactions on a structural proteome scale and drug design based on polypharmacology.

FUNDING

National Institutes of Health grant GM078596; and National Center for Research Resources, National Institutes of Health grant P41RR08605 for the National Biomedical Computation Resource (NBCR). Funding for open access charge: National Institutes of Health.

Conflict of interest statement. None declared.

26 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. A new method to detect related function among proteins independent of sequence and fold homology.

Authors: Stefan Schmitt; Daniel Kuhn; Gerhard Klebe
Journal: J Mol Biol Date: 2002-10-18 Impact factor: 5.469

3. FATCAT: a web server for flexible structure comparison and structure similarity searching.

Authors: Yuzhen Ye; Adam Godzik
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

4. Efficient similarity search in protein structure databases by k-clique hashing.

Authors: Nils Weskamp; Daniel Kuhn; Eyke Hüllermeier; Gerhard Klebe
Journal: Bioinformatics Date: 2004-07-10 Impact factor: 6.937

5. A searchable database for comparing protein-ligand binding sites for the analysis of structure-function relationships.

Authors: Nicola D Gold; Richard M Jackson
Journal: J Chem Inf Model Date: 2006 Mar-Apr Impact factor: 4.956

6. A class of selective antibacterials derived from a protein kinase inhibitor pharmacophore.

Authors: J Richard Miller; Steve Dunham; Igor Mochalkin; Craig Banotai; Matthew Bowman; Susan Buist; Bill Dunkle; Debra Hanna; H James Harwood; Michael D Huband; Alla Karnovsky; Michael Kuhn; Chris Limberakis; Jia Y Liu; Shawn Mehrens; W Thomas Mueller; Lakshmi Narasimhan; Adam Ogden; Jeff Ohren; J V N Vara Prasad; John A Shelly; Laura Skerlos; Mark Sulavik; V Hayden Thomas; Steve VanderRoest; LiAnn Wang; Zhigang Wang; Amy Whitton; Tong Zhu; C Kendall Stover
Journal: Proc Natl Acad Sci U S A Date: 2009-01-22 Impact factor: 11.205

7. A multidimensional strategy to detect polypharmacological targets in the absence of structural and sequence homology.

Authors: Jacob D Durrant; Rommie E Amaro; Lei Xie; Michael D Urbaniak; Michael A J Ferguson; Antti Haapalainen; Zhijun Chen; Anne Marie Di Guilmi; Frank Wunder; Philip E Bourne; J Andrew McCammon
Journal: PLoS Comput Biol Date: 2010-01-22 Impact factor: 4.475

8. fPOP: footprinting functional pockets of proteins by comparative spatial patterns.

Authors: Yan Yuan Tseng; Z Jeffrey Chen; Wen-Hsiung Li
Journal: Nucleic Acids Res Date: 2009-10-30 Impact factor: 16.971

9. Drug discovery using chemical systems biology: repositioning the safe medicine Comtan to treat multi-drug and extensively drug resistant tuberculosis.

Authors: Sarah L Kinnings; Nina Liu; Nancy Buchmeier; Peter J Tonge; Lei Xie; Philip E Bourne
Journal: PLoS Comput Biol Date: 2009-07-03 Impact factor: 4.475

10. A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites.

Authors: Lei Xie; Philip E Bourne
Journal: BMC Bioinformatics Date: 2007-05-22 Impact factor: 3.169
