SuperLooper--a prediction server for the modeling of loops in globular and membrane proteins.

Peter W Hildebrand¹, Andrean Goede, Raphael A Bauer, Bjoern Gruening, Jochen Ismer, Elke Michalsky, Robert Preissner.

Abstract

SuperLooper provides the first online interface for the automatic, quick and interactive search and placement of loops in proteins (LIP). A database containing half a billion segments of water-soluble proteins with lengths up to 35 residues can be screened for candidate loops. Alternatively, a dedicated database containing 180,000 loops in membrane proteins (LIMP) can be searched. Loop candidates are scored based on sequence criteria and the root mean square deviation (RMSD) of the stem atoms. When LIP is searched, the average global RMSD between the top-ranked loops and the original loops is benchmarked at <2 Å for loops of up to six residues and <3 Å for loops shorter than 10 residues. Other suitable conformations may be selected from a top-50 list and directly visualized on the web server. For user guidance, the sequence homology between the template and the original sequence, proline or glycine exchanges, and close contacts between a loop candidate and the remainder of the protein are indicated. For membrane proteins, the extent of the lipid bilayer is automatically modeled using the TMDET algorithm. This allows the user to select the optimal membrane protein loop with respect to its orientation relative to the lipid bilayer. The server has been online since October 2007 and can be freely accessed at URL: http://bioinformatics.charite.de/superlooper/.

Substances: Membrane Proteins

Year: 2009 PMID: 19429894 PMCID: PMC2703960 DOI: 10.1093/nar/gkp338

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Loop prediction is generally one of the most challenging tasks in protein structure determination and modeling (1–17). The preferred conformation of loops often remains unclear even when the rest of the protein is resolved at high resolution. This is due to the high flexibility of loops, which is often related to their function (18). Loops are regularly involved in the recognition and binding of modulators or associated proteins. Medically highly relevant interactions, such as the coupling of receptors to G proteins, are mediated by membrane protein loops (19). Therefore, knowledge of the conformation or the conformational space of a loop is essential for understanding the mechanisms that activate or deactivate membrane receptors and transporters or, more broadly, for modeling protein–protein or protein–ligand interactions. For loop modeling, two different approaches are applied: ab initio methods (1,3,5,8,15–17) and comparative modeling (6,9,14). Ab initio methods calculate possible loop conformations with the help of various energy functions and minimizations. These methods do not depend on large template libraries, but are generally time consuming and are therefore less appropriate for interactive searches. Comparative modeling approaches allow quick searches, but the quality of prediction largely depends on the availability of a suitable template loop structure. Thus, the potential of comparative modeling methods grows as the diversity of available templates increases (14). It is estimated that, at present, the conformation of any loop of up to 14 residues is already represented very well by protein fragments in the RCSB Protein Data Bank (PDB) (12,20). Therefore, the ability of knowledge-based methods to find the native loop conformation depends particularly on the size of the loop databank and on the scoring function. We have developed a scoring function for knowledge-based loop predictions that performs very well compared with other methods (14).
Based on this scoring function, we have now set up SuperLooper, a web application that provides a very simple, quick, user-friendly and reliable way to fill in a missing loop. No extra software has to be installed and no databank has to be downloaded to get the program started. For user guidance, the candidate loops can be visualized with a Jmol (http://www.jmol.org/) plug-in. Moreover, the web server provides information on sequence identities or proline and glycine exchanges between the template and the target, as well as close distances between a selected loop and the remainder of the protein. Finally, the membrane planes are automatically detected and visualized using the TMDET algorithm (21). Thus, the specific features of membrane protein loops that arise from their position at the membrane–water interface can also be taken into account (22).

METHODS

To allow the searches to be performed in real time, we have improved the scoring procedure, which is the most time-consuming step of our method (14). The search for the appropriate loop is now performed in a three-step process, described below. This hierarchical principle ensures that the most CPU-intensive calculations are performed on relatively small datasets. First, up to 100 000 candidates with the required loop length are preselected from the two databases LIP (loops in proteins, ∼500 000 000 protein segments) and LIMP (loops in membrane proteins, ∼180 000 loops). The stem atoms (two main chain atoms preceding and following the loop, respectively) of candidate loops must fit the stem atoms of the target structure with a maximum deviation of 0.75 Å for each atom pair. Second, the best 500 candidates are chosen by a specific ‘goodness value’ that allows a quick estimation of the steric fit of loop candidates to a target protein, described in detail in our previous analysis (14). Finally, the loop candidates are ranked by a score that includes the sequence similarity between loop candidate and target sequence, as well as the root mean square deviation (RMSD) of the stem atoms. To ensure that the 50 top-listed loops cover as much of the plausible conformational space as possible, candidates with identical sequences and similar backbone conformations (RMSD < 1.0 Å) are further excluded from the list. For the benchmarks described in the following, only the top-ranked loop was considered in each case.
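The hierarchical search described above can be illustrated with a minimal Python sketch. It is not the SuperLooper implementation: the `Candidate` class, the `select_loops` function, and the precomputed `goodness` field are hypothetical names. The final score (sequence identity minus stem RMSD) is only a placeholder, since the actual scoring function and goodness value are defined in reference (14). The sketch also assumes that candidate coordinates are already superposed onto the target frame.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Candidate:
    sequence: str            # loop sequence of the database segment
    stem_atoms: List[Coord]  # stem atoms, assumed already superposed on the target
    backbone: List[Coord]    # loop main chain atoms in the target frame
    goodness: float          # hypothetical precomputed steric-fit value (higher is better)


def rmsd(xs: Sequence[Coord], ys: Sequence[Coord]) -> float:
    """Root mean square deviation between two equally long coordinate lists."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equally long sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_loops(candidates, target_seq, target_stems,
                 max_dev=0.75, n_pre=100_000, n_good=500,
                 n_top=50, dedup_rmsd=1.0):
    # Step 1: keep candidates of the required length whose stem atoms each
    # deviate by at most max_dev (in Å) from the corresponding target stem atom.
    pre = [c for c in candidates
           if len(c.sequence) == len(target_seq)
           and all(math.dist(a, b) <= max_dev
                   for a, b in zip(c.stem_atoms, target_stems))][:n_pre]

    # Step 2: keep the n_good candidates with the best goodness value.
    best = sorted(pre, key=lambda c: c.goodness, reverse=True)[:n_good]

    # Step 3: rank by a score combining sequence similarity and stem RMSD.
    # Placeholder formula (an assumption, not the published score).
    def score(c):
        return identity(c.sequence, target_seq) - rmsd(c.stem_atoms, target_stems)

    ranked = sorted(best, key=score, reverse=True)

    # Drop candidates that repeat an already listed sequence with a
    # similar backbone conformation, then stop at n_top entries.
    kept = []
    for c in ranked:
        redundant = any(c.sequence == k.sequence
                        and rmsd(c.backbone, k.backbone) < dedup_rmsd
                        for k in kept)
        if not redundant:
            kept.append(c)
        if len(kept) == n_top:
            break
    return kept
```

The cheap per-atom distance test runs over the full candidate list, while the sorting and pairwise redundancy checks only ever see a few hundred entries, which mirrors the motivation given for the three-step design.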

RESULTS

Performance

Using the test dataset of the Sali lab (15), we have shown previously that the method underlying SuperLooper is more accurate than other methods, in particular for longer loops (14). The performance of SuperLooper was now benchmarked on a new test dataset that was recently published to benchmark four commercially available programs for loop sampling: Prime (Schrödinger, LLC), Modeler (Accelrys Software, Inc.), ICM (Molsoft, LLC) and Sybyl (Tripos, Inc.) (7). The outcome of that study is that Prime, an ab initio method, performs best, especially with increasing loop length. To compare our results with this study, protein structures with the same PDB entry as in the test datasets were first excluded from LIP. In the next step, loop candidates coming from proteins with very similar sequences were also excluded from LIP. Similarity here means ‘different versions of the same protein or slightly mutated variants’. This criterion is assessed by a sliding window technique as described previously (14). As a result, top-ranked loops show a global RMSD (main chain atoms) to the original loops of <1.3 Å for loops up to six residues or <3.0 Å for loops shorter than 10 residues. The best results are obtained when loops with nearly identical sequences or close homologs are available. This, however, is presently not always the case for longer loops. To compare the performance of SuperLooper with that of the above-mentioned tools, the analysis was repeated for loops of 11 and 12 residues using a sequence identity limit of 90%. As a result, the average performance of SuperLooper at loop lengths 11 and 12 (RMSD = 2.6 and 4.0 Å, respectively) is comparable with that of Prime (RMSD = 3.7 and 3.5 Å, respectively). At loop length 11, homologous templates with sequence identities ranging from 32% to 82% are detected by SuperLooper for 9 of 14 tested loops. For these, the average global RMSD of the modeled to the native loops is 0.7 Å.
For the remaining five tested loops (with no homologous template available), the RMSD is 5.9 Å. At loop length 12, homologous templates with sequence identities ranging from 58% to 95% are found for 4 of 10 tested loops. For these, the average global RMSD of the modeled to the native loops is 0.6 Å. For the remaining six tested loops, the RMSD is 6.3 Å. Thus, SuperLooper clearly outperforms Prime at these critical loop lengths if a homologous template is available. If no homolog is found, the ab initio method Prime usually performs better. In conclusion, the performance of knowledge-based methods such as SuperLooper clearly depends on the size and currency of the database in use. SuperLooper is therefore regularly updated. More detailed data on current benchmarks of SuperLooper are available from http://bioinformatics.charite.de/superlooper/. Better results can always be obtained when not only the top-ranked loop is considered. Thus, the user is encouraged to inspect the loops visually to determine which is most reasonable. SuperLooper was therefore implemented with a user-friendly interface to visualize and select the proper loop structure from a list of proposed conformations.
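As a consistency check, the overall averages quoted for loop lengths 11 and 12 can be recovered from the two subgroups (loops with and without a homologous template). The short Python sketch below assumes that each subgroup value is a mean RMSD and that the overall figure is the size-weighted mean of the subgroups. The function name `pooled_mean` is illustrative only.

```python
def pooled_mean(groups):
    """Size-weighted mean of (count, mean) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


# Loop length 11: 9 loops with a homologous template, 5 without.
len11 = pooled_mean([(9, 0.7), (5, 5.9)])
# Loop length 12: 4 loops with a homologous template, 6 without.
len12 = pooled_mean([(4, 0.6), (6, 6.3)])

print(f"{len11:.2f} {len12:.2f}")  # prints "2.56 4.02"
```

Rounded to one decimal place, these values agree with the reported averages of 2.6 and 4.0 Å. The overall average is therefore dominated by the minority of loops for which no homologous template exists.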

Server implementation

SuperLooper is implemented as an easy-to-use web application combining an interactive query of the loop database with a 3D visualization of the results. On the query page, the stem amino acids of the uploaded PDB file have to be provided together with the desired amino acid sequence of the loop. The result page provides all information necessary for the user to select the appropriate loop from a list of ranked candidates from the LIMP and LIP databases (Figure 1). Loop candidates can be selected from either database. Owing to the much larger size of LIP, predictions taken from the LIP database are generally of higher quality than predictions from the LIMP database. Nevertheless, considering the specific amino acid composition of transmembrane helix caps and loops (22), candidates taken from the LIMP database should always be checked first when a membrane loop is to be modeled.

Figure 1.

Alternative conformations (red) for loop 2 of the human β2-adrenergic receptor (2rh1.pdb) can be selected from the list calculated by SuperLooper considering the predicted membrane planes (yellow).

If no appropriate loop is found, the search may be expanded easily in the N- or C-terminal direction up to a final loop length of 35 amino acids. To help the user avoid unfavorable loop conformations and steric hindrance, the positions of proline and glycine exchanges in the selected loop are highlighted, as are distances <2.4 Å to the rest of the protein. The percentage sequence identity of a template loop is always given to inform the user about the probability that the native loop conformation is actually matched. A membrane protein loop should be selected with respect to its orientation relative to the lipid bilayer, as indicated by the protein viewer. The extent of the lipid bilayer is predicted by applying the TMDET algorithm (21,23).
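The three kinds of user guidance mentioned here (percentage sequence identity, proline and glycine exchanges, and close contacts) could be computed roughly as in the following Python sketch. The function names are hypothetical and the server's actual implementation is not described in the text. Only the 2.4 Å cutoff is taken from the paragraph above.

```python
import math


def sequence_identity(template: str, target: str) -> float:
    """Percentage of identical residues between two equally long loop sequences."""
    if len(template) != len(target):
        raise ValueError("loop sequences must have the same length")
    matches = sum(a == b for a, b in zip(template, target))
    return 100.0 * matches / len(target)


def pro_gly_exchanges(template: str, target: str):
    """1-based positions where a proline or glycine is exchanged for another residue."""
    return [i for i, (a, b) in enumerate(zip(template, target), start=1)
            if a != b and (a in "PG" or b in "PG")]


def close_contacts(loop_atoms, protein_atoms, cutoff=2.4):
    """Index pairs (loop atom, protein atom) that are closer than cutoff (in Å)."""
    return [(i, j)
            for i, la in enumerate(loop_atoms)
            for j, pa in enumerate(protein_atoms)
            if math.dist(la, pa) < cutoff]
```

For example, `pro_gly_exchanges("GASP", "AASG")` flags positions 1 and 4, where a glycine or proline in one sequence faces a different residue in the other. Such positions matter because both residues strongly constrain or relax the backbone conformation.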

Technical notes

The web application uses PHP and AJAX. Membrane planes are calculated on a remote server (TMDET) connected via a web service (21). The web site uses Jmol (http://jmol.sf.net) for visualization and therefore needs a Java JRE, freely available from http://java.net. The web application uses the PDB file format as the default input and output format, and is designed to be used with Internet Explorer 7 and Firefox 2.0–3.0. The web application is also compatible with IE 6, but tends to be unstable on some computers with certain combinations of JRE and IE 6.
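Because the PDB file format serves as input and output, a client-side script preparing a query needs little more than fixed-column parsing of ATOM records. The Python sketch below extracts main chain atoms for selected residues, for instance the stem residues flanking a loop. The function name `main_chain_atoms` and the choice of N, CA, C and O as main chain atoms are assumptions made for illustration; the column positions follow the standard PDB ATOM record layout.

```python
MAIN_CHAIN = {"N", "CA", "C", "O"}


def main_chain_atoms(pdb_lines, chain, residue_numbers):
    """Return (residue number, atom name, (x, y, z)) for main chain atoms
    of the requested residues in the given chain."""
    wanted = set(residue_numbers)
    atoms = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK and other records
        name = line[12:16].strip()   # columns 13-16: atom name
        chain_id = line[21]          # column 22: chain identifier
        res_seq = int(line[22:26])   # columns 23-26: residue sequence number
        if chain_id == chain and res_seq in wanted and name in MAIN_CHAIN:
            xyz = (float(line[30:38]),   # columns 31-38: x
                   float(line[38:46]),   # columns 39-46: y
                   float(line[46:54]))   # columns 47-54: z
            atoms.append((res_seq, name, xyz))
    return atoms
```

A call such as `main_chain_atoms(open("2rh1.pdb"), "A", [172, 173])` would return the main chain atoms of two residues in chain A. The residue numbers in this call are arbitrary illustrations, not the stem residues used for Figure 1.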

FUNDING

European Union (ProFIT); Deutsche Forschungsgemeinschaft (SFB449, SFB740, DFG GRK1360). Funding for open access charge: SFB449. Conflict of interest statement. None declared.

23 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. ModLoop: automated modeling of loops in protein structures.

Authors: András Fiser; Andrej Sali
Journal: Bioinformatics Date: 2003-12-12 Impact factor: 6.937

3. A hierarchical approach to all-atom protein loop prediction.

Authors: Matthew P Jacobson; David L Pincus; Chaya S Rapp; Tyler J F Day; Barry Honig; David E Shaw; Richard A Friesner
Journal: Proteins Date: 2004-05-01

4. Loops In Proteins (LIP)--a comprehensive loop database for homology modelling.

Authors: E Michalsky; A Goede; R Preissner
Journal: Protein Eng Date: 2003-12

5. Structural features of transmembrane helices.

Authors: Peter Werner Hildebrand; Robert Preissner; Cornelius Frömmel
Journal: FEBS Lett Date: 2004-02-13 Impact factor: 4.124

6. TMDET: web server for detecting transmembrane regions of proteins by using their 3D coordinates.

Authors: Gábor E Tusnády; Zsuzsanna Dosztányi; István Simon
Journal: Bioinformatics Date: 2004-11-11 Impact factor: 6.937

7. The third extracellular loop of G-protein-coupled receptors: more than just a linker between two important transmembrane helices.

Authors: Z Lawson; M Wheatley
Journal: Biochem Soc Trans Date: 2004-12 Impact factor: 5.407

8. Prediction of the conformation and geometry of loops in globular proteins: testing ArchDB, a structural classification of loops.

Authors: Narcis Fernandez-Fuentes; Enrique Querol; Francesc X Aviles; Michael J E Sternberg; Baldomero Oliva
Journal: Proteins Date: 2005-09-01

9. Crystal structure of opsin in its G-protein-interacting conformation.

Authors: Patrick Scheerer; Jung Hee Park; Peter W Hildebrand; Yong Ju Kim; Norbert Krauss; Hui-Woog Choe; Klaus Peter Hofmann; Oliver P Ernst
Journal: Nature Date: 2008-09-25 Impact factor: 49.962

10. PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.

Authors: Gábor E Tusnády; Zsuzsanna Dosztányi; István Simon
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971
