SL2: an interactive webtool for modeling of missing segments in proteins.

Jochen Ismer¹, Alexander S Rose¹, Johanna K S Tiemann¹, Andrean Goede², Robert Preissner², Peter W Hildebrand³.

Abstract

SuperLooper2 (SL2) (http://proteinformatics.charite.de/sl2) is the updated version of our previous web-server SuperLooper, a fragment based tool for the prediction and interactive placement of loop structures into globular and helical membrane proteins. In comparison to our previous version, SL2 benefits from both a considerably enlarged database of fragments derived from high-resolution 3D protein structures of globular and helical membrane proteins, and the integration of a new protein viewer. The database, now with double the content, significantly improved the coverage of fragment conformations and prediction quality. The employment of the NGL viewer for visualization of the protein under investigation and interactive selection of appropriate loops makes SL2 independent of third-party plug-ins and additional installations.

Year: 2016 PMID: 27105847 PMCID: PMC4987885 DOI: 10.1093/nar/gkw297

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Structural biology is an established but still emerging research field of life sciences, as reflected by the exponential rise of atomic models deposited in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (1). However, segments are missing in more than half of all entries deposited in the RCSB PDB (2). These missing segments are often located in flexible and functionally important regions of proteins such as loops or turns, which are not resolved by X-ray crystallography or single particle cryo-electron microscopy. These regions have to be modeled to obtain a more complete structural model for further analysis of the structure, e.g. for molecular dynamics simulations (3). Loop regions are among the most demanding regions in homology modeling workflows. A prominent example is the family of G protein-coupled receptors (GPCRs), which constitutes the largest protein family in the human genome. The number of available templates for modeling of GPCRs has increased dramatically in the last decade, facilitating the generation of homology models for structure-based drug design. The common topology of the transmembrane-spanning regions, even of distantly related GPCRs, allows homology modeling of these regions and docking of small rigid orthosteric ligands with close to experimental accuracy. However, the prediction of long or flexible loops remains an unsolved problem, as evaluated recently by the community-wide GPCR Dock assessment (4). As the sequence similarity within loop regions is generally much lower than within other parts of proteins, specialized methods are required for modeling. Loop modeling approaches can be divided into ab initio (5–8), fragment-based (9–12) or a mixture of both methods (13,14). Ab initio methods utilize molecular mechanics force fields to determine possible loop conformations. These methods are generally CPU-intensive but capable of predicting currently unknown loop conformations.
Fragment-based methods, on the other hand, are less CPU-intensive and thus faster, but depend on known structures and precalculated fragment databases to find loop conformations. It remains unclear which method provides the better predictions. Some studies find that both methods perform on a similar level (9,12), while others describe advantages of either ab initio (15) or fragment-based (16) methods. As fragment-based methods generally provide results much faster, they are well suited for web-based tools such as SuperLooper (17), allowing instant visualization and control of the results. The quality of fragment-based loop predictions depends on the completeness of the fragment database. Independent studies have shown that the conformational space for short loops up to 12–14 residues is covered by structural fragments derived from the RCSB PDB (18,19). Enlargement of fragment databases may thus particularly enhance prediction of longer loops. Depending on the method used, the prediction of shorter loops might also benefit from a larger pool of available templates, e.g. when the exact fit of the stem atoms of the template loop to the gap is an evaluation criterion. The database of globular and membrane proteins has more than doubled since our previous publication (17). In order to benefit from this enlargement of available structures, we updated our fragment database. Fragment-based tools such as SuperLooper depend on databases too large to distribute with stand-alone programs (∼80 GB in the case of SL2). Because a large number of possible loop conformations can be delivered rapidly, such methods are a natural fit for web-based tools. The database remains on a server and the user is able to choose a suitable loop from the listed results using a web-based molecule viewer. Here, we use NGL (20) for protein and fragment visualization, which builds on capabilities of modern web browsers, such as WebGL for molecular graphics.
NGL allows interactive display of even large molecular complexes and is unaffected by the retirement of third-party plug-ins such as Flash or Java applets. This viewer offers comprehensive molecular visualization through a graphical user interface so that life scientists can easily access and utilize available structural data without any further installations (20). Thus, SL2 benefits from the significantly enlarged database of fragments and a new, fast molecule viewer. Due to the improved coverage of the conformational loop space, the quality of prediction, measured by the backbone root mean square deviation (RMSD), has improved by 20% on average compared to our previous version (17). SL2, the new version of our fragment-based web application for loop modeling, thus offers improved loop prediction performance as well as up-to-date visualization.

UPDATE OF THE LIP AND LIMP DATABASE

The loop database (LIP) is composed of all possible fragments of 3–35 amino acids in length extracted from the RCSB PDB entries in December 2015. Here, not only loops are considered but also fragments derived from secondary structure elements like helices and β-sheets. For each fragment, the amino acid sequence, PDB identifier, chain identifier, residue numbers of the stem atoms and a geometrical fingerprint are stored. Geometrical fingerprint matching is used as a criterion to estimate the steric fit of the stem atoms at the N- and C-termini of each database fragment to the C- and N-terminal stem atoms of a gap in a protein structure. The geometrical fingerprints of both the stem atoms of each database fragment and the stem atoms of the gap are composed of the distance between the N- and C-terminal stem atoms and three angles defining their relative orientation (Figure 1). Compared to our previous version, we slightly altered the geometric fingerprint. Previously, we used a combination of two distances and two angles for scoring, resulting in a higher weighting of the fit of the residue where the angle was measured. In SL2, we solved this problem by employing one distance and three angles.
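The fingerprint of one distance and three angles can be illustrated with a short sketch. This is a minimal Python example, not the SL2 implementation: it assumes the four stem atoms are given as (x, y, z) tuples, follows the angle definitions stated in the Figure 1 caption, and treats γ as the unsigned angle between the normals of the two planes. The function and helper names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _angle_deg(u, v):
    # Angle between two vectors in degrees; clamp to guard against rounding.
    cos_uv = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_uv))))

def fingerprint(ca_n, c_n, n_c, ca_c):
    """Return (d, alpha, beta, gamma) for one pair of stems.

    ca_n, c_n: C-alpha and C atoms of the N-terminal stem residue.
    n_c, ca_c: N and C-alpha atoms of the C-terminal stem residue.
    """
    d_vec = _sub(n_c, c_n)                       # vector along d, from C(N) to N(C)
    d = math.sqrt(_dot(d_vec, d_vec))
    alpha = _angle_deg(_sub(ca_n, c_n), d_vec)   # angle at C(N)
    beta = _angle_deg(_sub(ca_c, n_c), _sub(c_n, n_c))  # angle at N(C)
    # Assumed reading of gamma: angle between the normals of plane A
    # (Ca(N), C(N), N(C)) and plane B (Ca(C), C(N), N(C)).
    normal_a = _cross(_sub(ca_n, c_n), d_vec)
    normal_b = _cross(_sub(ca_c, n_c), d_vec)
    gamma = _angle_deg(normal_a, normal_b)
    return d, alpha, beta, gamma
```

Computing the same four values for the stems of a gap and for the stems of every stored fragment gives two small vectors that can be compared directly, without superposing any coordinates.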

Figure 1.

Schematic representation of the geometrical fingerprint: The geometrical fingerprint is characterized by the distance d between the N-terminal C atom and the C-terminal N atom and the following three angles: α is defined by the line between Cα(N), C(N) and d, β is spanned by the line between N(C), Cα(C) and d, and γ is the angle between the two planes A (defined by Cα(N), C(N) and N(C)) and B (Cα(C), C(N) and N(C)).

Since the first release of SuperLooper in 2008, the number of entries deposited in the RCSB PDB has more than doubled from 54 543 structures to 114 693 in 2015. A total of 901 609 231 fragments with a length of 3 to 35 residues was extracted from this enlarged pool of template structures (Figure 2A). Because more short than long overlapping fragments are extracted from a given template structure, the number of fragments decreases linearly with length. For loops with three amino acids, more than 30 million fragments are stored in the database; for 35 amino acids, 24 million fragments are available. To benefit from the continuous growth of the RCSB PDB, an update protocol was implemented that automatically adds novel fragments to the LIP or LIMP database every three months.
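The linear decrease follows from simple window counting: a continuous chain of N residues contains N − L + 1 overlapping fragments of length L. The sketch below illustrates this in Python under the assumption of a single gap-free chain given as a one-letter sequence; the function names are hypothetical and the real extraction works on 3D structures, not on sequences alone.

```python
def extract_fragments(sequence, min_len=3, max_len=35):
    """Yield (start_index, fragment) for every overlapping window."""
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            yield start, sequence[start:start + length]

def count_fragments(chain_length, min_len=3, max_len=35):
    """Number of windows per fragment length for one continuous chain."""
    return {length: max(chain_length - length + 1, 0)
            for length in range(min_len, max_len + 1)}

counts = count_fragments(100)
# A 100-residue chain yields 98 fragments of length 3 and 66 of length 35.
print(counts[3], counts[35])
```

Each additional residue of fragment length removes exactly one window per chain, which is why the totals fall in a straight line from the shortest to the longest fragments.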

Figure 2.

Length dependency of the number of fragments stored in our previous (black) and present (gray) fragment database; (A) loops in proteins (LIP), and (B) loops in membrane proteins (LIMP).

Due to (partial) embedding into the lipid bilayer, loops of membrane proteins have a more hydrophobic amino acid composition compared to loops of globular proteins (21). Tools developed for the prediction of loops connecting transmembrane helices were indeed found to enhance prediction of GPCR loops (22). In SL2, such loops can be selected from LIMP, which is a collection of fragments extracted from loops of all helical transmembrane proteins. Loops were defined as parts without regular fold, thus also containing kinks, bulges or re-entrant loops (23). To allow selection of membrane protein loops taking the lipid bilayer into account, the extension of the lipid bilayer is indicated by two parallel planes (as described below). The number of membrane protein structures deposited in the RCSB PDB rose from 805 (in 2008) to 2298 (in 2015) according to the Protein Data Bank of Transmembrane Proteins (24). As a result, the number of loops stored in LIMP more than doubled from 179 580 to 378 839. Because LIMP is composed mainly of loop structures, its length distribution differs from that of LIP, where the fragments also include helical fragments and fragments derived from β-sheets. In LIMP (Figure 2B), few loop templates are available for short loops of 3–5 amino acids in length. The number of loops stored in LIMP increases markedly with length, reaching a maximum of 20 000 at a length of 20 residues before it decreases again.

SEARCH PROCEDURE

To start the search, the stem residues flanking the N- and C-terminus of a missing (or existing) loop in a protein model and the amino acid sequence have to be provided. As in our previous version, the search procedure is based on a stepwise approach which minimizes the calculation time. First, fragments are selected that have the appropriate sequence length and whose geometrical fingerprint matches that of the gap to within 0.75 Å RMSD. This RMSD value is subsequently used to determine the top 1000 loop candidates. These loop candidates are then rescored by the parameters ‘sequence similarity between missing segment and template loop’ and ‘fingerprint matching of the template loop to the gap in the model.’ Only one representative of fragments with identical primary structure and high tertiary structure similarity (with backbone RMSD < 0.5 Å) is kept in the results list to maximize the conformational space of fragments used for further calculations. The top 100 loop candidates are finally displayed in the results list. Suitable candidates can be selected from that list by visual inspection.
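The stepwise search can be sketched as a small filtering pipeline. The Python example below is only an illustration under several assumptions that the text does not specify: each candidate carries a precomputed fingerprint deviation (`fp_dev`), sequence similarity is approximated by the fraction of identical positions, the two rescoring terms are combined with a hypothetical equal weighting, and redundancy is removed by sequence alone (SL2 additionally requires a backbone RMSD below 0.5 Å). All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    pdb_id: str
    sequence: str
    fp_dev: float  # assumed: fingerprint deviation between fragment and gap stems

def sequence_identity(a, b):
    """Fraction of identical positions in two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def search(candidates, target_seq, max_dev=0.75, pre_top=1000,
           final_top=100, w_seq=0.5):
    # Step 1: keep fragments of matching length within the fingerprint cutoff.
    hits = [c for c in candidates
            if len(c.sequence) == len(target_seq) and c.fp_dev <= max_dev]
    # Step 2: preselect the best candidates by fingerprint deviation.
    hits = sorted(hits, key=lambda c: c.fp_dev)[:pre_top]

    # Step 3: rescore by sequence and fingerprint (hypothetical weighting,
    # lower is better).
    def score(c):
        seq_term = 1.0 - sequence_identity(c.sequence, target_seq)
        fp_term = c.fp_dev / max_dev
        return w_seq * seq_term + (1.0 - w_seq) * fp_term

    hits.sort(key=score)
    # Step 4: keep one representative per sequence (simplified redundancy filter).
    seen, unique = set(), []
    for c in hits:
        if c.sequence not in seen:
            seen.add(c.sequence)
            unique.append(c)
    # Step 5: return the entries for the results list.
    return unique[:final_top]
```

Applying the cheap length and fingerprint filters before any rescoring keeps the expensive steps restricted to at most 1000 candidates, which is the point of the stepwise design described above.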

VISUALIZATION AND USER INTERFACE

For visual inspection of results, we employed the NGL viewer, which works without installation of additional plug-ins (20). As a common graphical user interface for the NGL viewer (Figure 3), the search mask and the results list were implemented in JavaScript. A protein structure uploaded via the file selection dialog is instantly loaded into the NGL viewer. The stem residues of the gap in the protein model must either be typed into the corresponding search field or be selected by clicking them in the NGL viewer. The sequence of the missing segment must be typed or copied into the search mask. If the membrane protein-specific LIMP database (Membrane DB) is not checked, the LIP database will be searched. The search starts once the submission button is pressed. Depending on the loop length, results are expected to appear in the results list after a few seconds or up to half a minute.

Figure 3.

Screenshot of the SL2 results page (NGL viewer). Structure of the human cytomegalovirus GPCR US28 (PDB-ID: 4xt1) in a gray cartoon representation with top ranked loop (green) and calculated membrane planes.

The list of loop candidates filling the gap 94 to 103 in the GPCR structure is displayed as a table on the right-hand side, just below the search mask. The top hit is automatically loaded into the gap of the protein model depicted in the viewer window. Alternative loop conformations can be selected from the results table containing the 100 best loop candidates. For each candidate, the score ranging from 0 to 0.455, the RCSB PDB entry code and sequence of the template protein, the number of clashes, and the sequence identity between target and template are listed. If no appropriate loop is found, the user can select ‘Decrease N-terminal stem’ or ‘Increase C-terminal stem’ to add a residue to the loop and shift the stem atoms of the gap accordingly. As an additional visual control for helical membrane proteins, the position of the lipid bilayer can be calculated (‘Calculate membrane planes’), employing the web service TMDET (25). There is an option to display the complete list of loop candidates at the same time, visualizing the conformational space of the loop. Loop candidates can be colored according to score, sequence identity or clashes by selecting the corresponding color scheme from the dropdown menu. The completed structure (initial model plus selected loop) can be downloaded by clicking the download button. Alternatively, the complete list of loops can be downloaded for further analysis.

TECHNICAL ASPECTS

Visualization is carried out by the NGL viewer (20). To use the full feature set of the NGL viewer, an up-to-date web browser (tested on recent versions of Firefox, Google Chrome, Safari, IE and Edge) is recommended. The specialized graphical user interface is written in JavaScript. For job handling, a simple Python job server based on the Flask framework (http://flask.pocoo.org/) is used.

PERFORMANCE, LIMITATIONS AND OUTLOOK

The updated version of our fragment-based web application for loop modeling, SL2, benefits from an enlarged fragment database and a new user interface including an updated protein viewer. As a result of the enlarged fragment database, the prediction quality has been further improved. Using the same dataset (15) and validation procedure as in our previous publication (17), an average gain in prediction quality of 20% is observed for loops of 3–16 residues in length (Figure 4). A drop in the backbone RMSD between experimentally determined and modeled loops (only the top hit was considered) starts to become evident for loops of eight residues in length. This implies that the coverage of possible loop conformations has improved most for loops of this length and longer.
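The quality measure used here, the backbone RMSD, and the relative gain between two benchmark runs can be written down in a few lines. The Python sketch below assumes that the modeled and experimental loops are already in the same coordinate frame (for example after superposing the stem residues, which the text does not specify) and that backbone atoms are supplied in the same order; the function names are hypothetical.

```python
import math

def backbone_rmsd(model, reference):
    """RMSD in the input units between two equally ordered coordinate lists."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
                  for (xm, ym, zm), (xr, yr, zr) in zip(model, reference))
    return math.sqrt(squared / len(model))

def relative_gain(previous_rmsd, updated_rmsd):
    """Fractional reduction of the RMSD relative to the previous version."""
    return (previous_rmsd - updated_rmsd) / previous_rmsd
```

With these definitions, a drop of the mean RMSD from 2.5 Å to 2.0 Å would correspond to a gain of 20%; these two values are illustrative numbers, not results from the benchmark.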

Figure 4.

Comparison of benchmarks of our previous (17) (black rhombus) and updated version SL2 (gray star) using a standard loop dataset (15).

Despite the gain in prediction quality, the top hit results obtained by SL2 sometimes deviate from the experimentally determined structure even for short loops. There are several possible reasons for this. First, many loops are highly flexible or are even located in structurally disordered regions of proteins (26,27). The conformations suggested by SL2 may thus indicate alternative loop conformations not observed by protein X-ray crystallography (e.g. Figure S6 in (28)). Second, as scoring of the loops mainly depends on the stem residues, experimentally caused distortions of these stem atoms may prevent selection of a specific conformation (29). Prediction quality drops with loop length, mainly due to the increased conformational space. A promising strategy to enhance prediction quality of longer loops would be the inclusion of additional experimental constraints such as mass spectrometry (30,31) or electron density maps from single particle cryo-electron microscopy (32).

32 in total

1. Have we seen all structures corresponding to short protein fragments in the Protein Data Bank? An update.

Authors: Peicheng Du; Michael Andrec; Ronald M Levy
Journal: Protein Eng Date: 2003-06

2. TMDET: web server for detecting transmembrane regions of proteins by using their 3D coordinates.

Authors: Gábor E Tusnády; Zsuzsanna Dosztányi; István Simon
Journal: Bioinformatics Date: 2004-11-11 Impact factor: 6.937

3. Modeling protein loops with knowledge-based prediction of sequence-structure alignment.

Authors: Hung-Pin Peng; An-Suei Yang
Journal: Bioinformatics Date: 2007-09-07 Impact factor: 6.937

4. Coils in the membrane core are conserved and functionally important.

Authors: Anni Kauko; Kristoffer Illergård; Arne Elofsson
Journal: J Mol Biol Date: 2008-05-01 Impact factor: 5.469

5. CODA: a combined algorithm for predicting the structurally variable regions of protein models.

Authors: C M Deane; T L Blundell
Journal: Protein Sci Date: 2001-03 Impact factor: 6.725

Review 6. Integrative structural modeling with small angle X-ray scattering profiles.

Authors: Dina Schneidman-Duhovny; Seung Joong Kim; Andrej Sali
Journal: BMC Struct Biol Date: 2012-07-16

7. De novo protein structure determination from near-atomic-resolution cryo-EM maps.

Authors: Ray Yu-Ruei Wang; Mikhail Kudryashev; Xueming Li; Edward H Egelman; Marek Basler; Yifan Cheng; David Baker; Frank DiMaio
Journal: Nat Methods Date: 2015-02-23 Impact factor: 28.547

8. Fast protein loop sampling and structure prediction using distance-guided sequential chain-growth Monte Carlo method.

Authors: Ke Tang; Jinfeng Zhang; Jie Liang
Journal: PLoS Comput Biol Date: 2014-04-24 Impact factor: 4.475

9. The beginning of a beautiful friendship: cross-linking/mass spectrometry and modelling of proteins and multi-protein complexes.

Authors: Juri Rappsilber
Journal: J Struct Biol Date: 2010-10-26 Impact factor: 2.867

10. MP:PD--a data base of internal packing densities, internal packing defects and internal waters of helical membrane proteins.

Authors: Alexander Rose; Dominic Theune; Andrean Goede; Peter W Hildebrand
Journal: Nucleic Acids Res Date: 2013-11-04 Impact factor: 16.971
