
PIGSPro: prediction of immunoGlobulin structures v2.

Rosalba Lepore^1,2, Pier P Olimpieri^1, Mario A Messih^1, Anna Tramontano^1,2.

Abstract

PIGSPro is a significant upgrade of the popular PIGS server for the prediction of the structure of immunoglobulins. The software has been completely rewritten in Python following a pipeline similar to that of the original method but including, at various steps, relevant modifications found to improve its prediction accuracy, as demonstrated here. The steps of the pipeline include the selection of the appropriate framework for predicting the conserved regions of the molecule by homology; the target-template alignment for this portion of the molecule; the selection of the main chain conformation of the hypervariable loops according to the canonical structure model; the prediction of the third loop of the heavy chain (H3), for which complete canonical structures are not available; and the packing of the light and heavy chains if they are derived from different templates. Each of these steps has been improved by incorporating updated methods developed over the years. Last but not least, the user interface has been completely redesigned and an automatic monthly update of the underlying database has been implemented. The method is available as a web server at http://biocomputing.it/pigspro.

Year: 2017 PMID: 28472367 PMCID: PMC5570210 DOI: 10.1093/nar/gkx334

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Immunoglobulins are multimeric glycoproteins secreted by B-cells. They are formed by two identical light chains and two identical heavy chains, each made of structurally similar domains: two for the light chain and four or more for the heavy chain. These domains are very well conserved and form what is called the framework of the protein, while a few regions at the tip of the protein are very variable and constitute the antigen-binding site (1) (Figure 1A). These regions structurally correspond to loops, named L1 to L3 and H1 to H3 according to the order in which they appear in the light (L) and heavy (H) chain.

Figure 1.

(A) Variable region of an antibody molecule. Heavy and light chain framework regions are coloured in grey and white, respectively. Loops composing the antigen-binding site are coloured in pale cyan for the heavy chain and light violet for the light chain. (B) Chothia numbering scheme for VH, VK and VL. The numbers above the sequences represent the numbering of specific residues. The remaining residues are numbered consecutively. Letters correspond to insertions. Framework regions are depicted in grey for VH and in white for VK and VL. Complementarity determining regions are coloured in pale cyan for VH and in light violet for VK and VL. Arrows indicate Chothia and Lesk definition of hypervariable loops. Conserved residues are reported in dark red.

The main chain conformation of the light chain hypervariable loops and of the first two loops of the heavy chain has been shown to follow the canonical structure model, according to which a few key residues, inside or outside the loops, determine their structure (2–14). The third loop of the heavy chain (H3) behaves differently (15–19). The region of the loop closer to the framework (named the ‘torso’ (16)) follows a canonical structure model: its first four and last six residues form a bulged or non-bulged beta sheet conformation according to the identity of the residues at positions 94 and 101 of the heavy chain (Figure 1B) (the numbering scheme of Chothia (8) is used throughout the paper). The remaining part has proven very difficult to predict, given its very high variability in both length and shape.
Given the above, the classical strategy for predicting the structure of an immunoglobulin from its amino acid sequence, as implemented by us and others (20,21), consists of the following steps: (i) the main chains of the light and heavy chain frameworks are modelled by homology using, as template, the corresponding domain from a similar immunoglobulin; (ii) the L1-L3 and H1-H2 loops are modelled by inheriting their conformation from an immunoglobulin with the same canonical structure (i.e. bearing in the key positions the same amino acids as the target protein); (iii) the H3 loop is modelled using the available loops with the highest sequence similarity to the target, or by other methods (22); (iv) the two chains are packed together; (v) the side chains of the molecule are predicted using classical methods for side chain conformation prediction such as SCWRL4 (23).
The above protocol was implemented in the original PIGS server, which has been positively evaluated in two blind assessments of the prediction accuracy of antibody modelling (24,25). The server also included the option of selecting the two frameworks from the same immunoglobulin even though a template with a higher similarity was available for one of the two. This choice is beneficial when the difference in similarity between the two best templates for the heavy and light chain is not too high (26): selecting a less than optimal template for one chain, but coming from the same immunoglobulin used as template for the other chain, was found to minimize the error introduced by the packing of the two chains. Other protocols for antibody modelling have also been developed, using strategies different from the one described here (20,27–31). Over the past years, we have continued to analyze immunoglobulin structures, taking advantage of the increased number of experimentally solved structures, and have now improved the pipeline by incorporating our new findings.
More specifically, we integrated new strategies for: (i) selecting and aligning the framework and the loop templates based on immunoglobulin specific profiles (26); (ii) modelling the lambda light chains using an increased repertoire of canonical structures (12); (iii) packing the heavy and the light chain domains, which have been found to cluster around two dominant orientations (32); (iv) predicting the structure of the hypervariable loop H3, so far a rather elusive and complex problem, with significantly higher accuracy than that achieved by PIGS and by other methods (22). Last but not least, our database of templates is now composed of 1691 immunoglobulins of known structure, compared with 312 in the original one. We tested the performance of the new pipeline and compared it with the results obtained with PIGS. For this purpose we used the same strategy previously adopted in (26) and summarized below, as well as the same benchmark dataset. In both the old and the new version of the server, an option is provided to ‘blacklist’ an immunoglobulin of the database of known immunoglobulin structures, i.e. not to use any information about the specific blacklisted structure. This option was used for testing the old version of the server and is used here for testing the new version. In practice, we use a leave-one-out procedure in which, one by one, the immunoglobulins of known structure are blacklisted and predicted without using their coordinates or the coordinates of any other immunoglobulin sharing a sequence identity equal to or higher than 98% with the target (26). The resulting models are then compared with the corresponding experimental structures. In summary, we were able to improve the prediction of both the overall antibody structure and the antigen-binding site.
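The leave-one-out procedure with blacklisting can be illustrated with a minimal Python sketch. It is not the server's code: the function names, the toy identifiers and sequences, and the assumption that sequences are already aligned to equal length are all hypothetical, and only the 98% identity cut-off comes from the text.

```python
def sequence_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: columns where both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


def allowed_templates(target_id, database, threshold=98.0):
    """Templates usable for one target: the target itself is blacklisted,
    together with any entry whose identity to it is >= threshold."""
    target_seq = database[target_id]
    return [
        tid
        for tid, seq in database.items()
        if tid != target_id and sequence_identity(target_seq, seq) < threshold
    ]


def leave_one_out(database, threshold=98.0):
    """Yield (target, usable templates) for every entry in the database."""
    for target_id in database:
        yield target_id, allowed_templates(target_id, database, threshold)


# Toy database with hypothetical identifiers and sequences.
toy_db = {
    "ab1": "QVQLVESGGG",
    "ab2": "QVQLVESGGG",  # identical to ab1, so excluded when ab1 is the target
    "ab3": "EVQLVESGGA",
}
for target, templates in leave_one_out(toy_db):
    print(target, templates)
```

In a real benchmark each target would then be modelled from the remaining templates and compared with its experimental structure.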

MATERIALS AND METHODS

There are two modes of using the server, single sequence and multiple sequence. In the first case, the user needs to input the light and heavy chain sequences of the target immunoglobulin. The user can choose to blacklist one of the known immunoglobulin structures, an option useful for testing the method. A project title and an email address can be provided, but are not required. Upon selecting the Submit button, the user is presented with a new page where a number of options can be selected (Figure 2), namely:

Figure 2.

Template selection page. Different options are provided for modelling the framework region, the loops and the side chains. Two lists of templates are displayed, for the heavy and light chain frameworks, in two separate tables. The best available templates are highlighted and automatically selected according to the ‘Framework modelling method’ and the ‘Number of shown results’. The tables report for each template, the PDB ID, the canonical structures of the loops and the target-template sequence identity. A button to visualize the target-template alignment is also provided.

Framework options

The four available criteria are:
Same Antibody (default): selects the known structure that can provide a template for both the heavy and light chain, even if a different template with a higher sequence identity exists for one of the chains.
Same Canonical Structures: selects the template having loops with the same canonical structure as the target, even if a different template with a higher sequence identity exists for one or both chains.
Same Antibody and Canonical Structures: selects an antibody structure that can be used as a template for both VL and VH and where the canonical structures of the loops are the same as those of the target, even if a different template with a slightly better sequence identity exists for one or both chains.
Best H and L chains: selects the two chains with the highest sequence identity with the corresponding chains of the target and, if needed, packs the two chains together and takes the loops from a different structure.

Number of shown results (default = 10): number of results to be displayed. Templates (both in single sequence and multiple sequence modes) are chosen among these results.

The page also shows the list of available templates together with the canonical structure of their loops (the definition of which can be seen by clicking an information icon) and the sequence identity between the target and each of the templates. A button allows the corresponding alignment to be displayed interactively.

The user can also select the Loop modelling method. The four options are:
Keep loops with similar Canonical Structures from template (default): if one or more of the target and template loops have the same canonical structure, keep the main chain structure of the template loops.
Keep loops with similar Canonical Structures, H3LooPred for H3: if one or more of the target and template loops have the same canonical structure, keep the main chain structure for all of them excluding H3. Build the latter with the H3LooPred method (22).
Select Canonical Structures from most similar loops: take the main chain of each loop from the antibody with the same canonical structure and the highest sequence similarity.
Select Canonical Structures from most similar loops, H3LooPred for H3: take the main chain of each loop from the antibody with the same canonical structure and the highest sequence similarity. In any case build H3 with the H3LooPred method (22).

Side chain modelling method

Criteria for the side chain modelling can be chosen among:
Transfer Conserved + SCWRL (default): the conformation of the side chains of residues conserved between the target and the template is maintained. Non-conserved side chains are modelled using SCWRL4.
Transfer Conserved: the conformation of the side chains of residues conserved between the target and the template is maintained. Only backbone atoms are included in the final model for non-conserved residues.
All SCWRL: all side chains are modelled using SCWRL4.
Backbone only: this will generate a ‘backbone-only’ model.

In the multiple sequence mode, the user can input the sequences as a multifasta file. The file should contain both heavy and light chain sequences for each antibody. The FASTA header line should contain a name and/or a unique identifier for each antibody. Chains from the same antibody should have the same name/identifier. In this case the user selects the options at the input stage and they will be used for all the target immunoglobulins.
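The difference between the ‘Same Antibody’ and ‘Best H and L chains’ framework criteria described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's implementation: the function names and template identifiers are hypothetical, and ranking shared templates by the sum of the two chain identities is an assumption, since the text does not state how candidate antibodies are ranked.

```python
def best_h_and_l(heavy_hits, light_hits):
    """'Best H and L chains': pick each chain's template independently.

    heavy_hits / light_hits map a template identifier to the percent
    sequence identity with the target chain.
    """
    best_h = max(heavy_hits, key=heavy_hits.get)
    best_l = max(light_hits, key=light_hits.get)
    return best_h, best_l


def same_antibody(heavy_hits, light_hits):
    """'Same Antibody': one structure must provide both chain templates.

    Assumption: shared templates are ranked by the sum of the heavy and
    light chain identities. Returns None if no structure covers both chains.
    """
    shared = sorted(set(heavy_hits) & set(light_hits))
    if not shared:
        return None
    best = max(shared, key=lambda tid: heavy_hits[tid] + light_hits[tid])
    return best, best


# Hypothetical template identifiers and identities.
heavy = {"tplA": 92.0, "tplB": 90.0}
light = {"tplA": 80.0, "tplB": 93.0, "tplC": 95.0}

h, l = best_h_and_l(heavy, light)
print("Best H and L chains:", h, l, "needs packing:", h != l)
print("Same Antibody:", same_antibody(heavy, light))
```

In this toy case the independent choice takes the heavy chain from one structure and the light chain from another, so the two chains would have to be packed together, whereas the ‘Same Antibody’ choice accepts slightly lower identities in exchange for an experimentally observed packing.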

Output

The output page contains a summary of the user choices, the alignment used for building the model and the structure of the model visualized in a JSmol window (Figure 3). The final model can be downloaded as a PDB file. The REMARK records of the PDB file contain a summary of the options used (template and canonical structure for each loop).
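As an illustration, the option summary stored in a downloaded model can be recovered by collecting the REMARK records of the PDB file. The sketch below relies only on the standard PDB convention that such lines start with the record name REMARK. The exact layout of the PIGSPro remarks is not described here, so the example text is entirely hypothetical.

```python
def read_remarks(pdb_text):
    """Return the text of every REMARK record in a PDB-format string."""
    remarks = []
    for line in pdb_text.splitlines():
        if line.startswith("REMARK"):
            # Drop the 6-character record name, keep the rest of the line.
            remarks.append(line[6:].strip())
    return remarks


# Hypothetical file content; real PIGSPro remarks may look different.
example = """REMARK   1 HYPOTHETICAL OPTION SUMMARY LINE
REMARK   1 HYPOTHETICAL TEMPLATE LINE
ATOM      1  N   GLN H   1      11.104  13.207   2.100  1.00 20.00           N
END
"""
for remark in read_remarks(example):
    print(remark)
```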

Figure 3.

The output page. The output includes two main tables with information about the templates used to build the three-dimensional model of the target antibody. The final model can be either visualized in the JSmol window (http://www.jmol.org/) or downloaded using the ‘Download PDB’ button. The final target-template alignments for both the heavy and light chains are also shown.

RESULTS AND DISCUSSION

What is new

Prediction of immunoglobulin lambda chains

In mammals there are two types of immunoglobulin light chain, called lambda (λ) and kappa (κ). The two chains are expressed at different levels: in mouse, the ratio is about 20:1 in favour of the kappa chains (33). This imbalance in the most common model animal has led to a much higher proportion of kappa chains among the immunoglobulins of known structure and, consequently, less attention has been devoted to the analysis of the conformation of lambda chains. However, in human the ratio is much more balanced (34). It was found that lambda chains also have recurring conformations, and that these are not the same as those of the kappa chains. Work in our group identified several lambda-restricted canonical structures, namely eight for L1, two for L2 and five for L3, together with the key residues determining each of them (12). These definitions are now included in PIGSPro, which is able to predict the conformation of the lambda light chains with satisfactory accuracy (Table 1). It should be mentioned that the number of available lambda structures that can be used for a comparison between the old and the new server is very limited. We could only compute the data for 15 structures and, even though these are predicted with an average Cα RMSD of 0.83 in PIGSPro compared to 0.89 for PIGS, the difference is not statistically significant.

Table 1.

Cα RMSD values of the models produced by the old server (PIGS) and its updated version (PIGSPro)

	All residues	Loop residues: local	Loop residues: global	H3 residues: local	H3 residues: global	Framework residues	Lambda light chains
PIGS	1.36 ± 0.64	2.21 ± 1.48	2.26 ± 1.52	3.59 ± 2.93	3.67 ± 2.92	0.78 ± 0.28	0.89 ± 0.27
PIGSPro	1.16 ± 0.47	1.75 ± 0.95	1.79 ± 1.03	2.41 ± 2.2	2.45 ± 2.18	0.75 ± 0.24	0.83 ± 0.64
Number of models	252	252	252	252	252	252	15

The RMSD values for the loop residues and for H3 are computed both after superimposing their stems, i.e. the two residues before and after the loop (local), and after superposition of the framework (global). Underlined values indicate a statistically significant difference (95% confidence level) with respect to the PIGS method based on an unpaired t-test.
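The significance statements attached to Table 1 can be checked approximately from the summary values alone. The Python sketch below computes an unpaired t statistic from means, spreads and group sizes. It assumes that the ± values in the table are standard deviations, which the table does not state, and it uses the unequal-variance (Welch) form of the statistic; with equal group sizes, as here, this gives the same value as the pooled-variance form. Which variant the authors used is not specified.

```python
import math


def unpaired_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent samples given summary statistics.

    Assumption: sd1 and sd2 are sample standard deviations.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Values copied from Table 1 (PIGS first, PIGSPro second).
t_all = unpaired_t(1.36, 0.64, 252, 1.16, 0.47, 252)    # all residues
t_lambda = unpaired_t(0.89, 0.27, 15, 0.83, 0.64, 15)   # lambda light chains

print(f"all residues: t = {t_all:.2f}")
print(f"lambda light chains: t = {t_lambda:.2f}")
```

Under these assumptions the statistic is about 4.0 for all residues and about 0.33 for the lambda light chains. The two-sided 95% critical value is close to 1.96 for large samples and somewhat higher for the 15-model lambda set, so the first difference is significant and the second is not, in line with the text.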

The framework selection

In the old version of PIGS, the alignment between the target immunoglobulin sequence and that of the template(s) was based on sequence specific rules, mainly related to the position of very conserved residues (two cysteines and one tryptophan in each chain) and on the observation that insertions and deletions among immunoglobulins occur, with rare exceptions, in the loop regions (Figure 1B). In this new version, we use a different approach that is also able to take into account unusual cases of insertions and deletions in positions other than the hypervariable loops. In particular, we built ad hoc Hidden Markov Models (HMMs) for the light (kappa and lambda) and heavy chains and use them to select the template and the alignment. As shown in previous discrimination tests, HMMs can be successfully used to distinguish members of a protein family from non-members with a high degree of accuracy (35). In our specific case, stringent criteria are used in order to (i) ensure high specificity (i.e. to distinguish immunoglobulins from other Ig-like sequences) and (ii) obtain more accurate target-template alignments, i.e. more reliable models.

The packing of the light and heavy chain framework

In (32), our group demonstrated that the immunoglobulins of known structure can be clustered according to the relative orientation of their light and heavy chains and discovered that the large majority of them can be assigned to one of two clusters. A set of residues was found to discriminate between the two orientations with a classification error lower than 10%. In particular, the identity of the residue at position L44, located at the interface between the chains, makes it possible to discriminate between the two packing modes: the specific packing of the two chains differs according to whether the residue at position L44 is or is not a proline. In PIGSPro, we inherit the relative orientation of the two chains by using as template the VL–VH complex with the highest sequence similarity and superimposing the conserved residues at the interface. However, unlike in the old method, the orientation of the two chains is inherited only from complexes where the light chain of the immunoglobulin has the same residue at position L44 as the target. According to our previous findings (32), this is expected to lead to a better prediction of the relative orientation of the chains in the model.
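The L44 restriction on packing templates can be sketched in Python as a filter followed by a ranking. The record type, field names and identifiers below are hypothetical, and the similarity values are assumed to be precomputed; only the rule itself (same residue at L44 as the target, then highest sequence similarity) comes from the text.

```python
from typing import NamedTuple, Optional, Sequence


class VlVhComplex(NamedTuple):
    """Hypothetical record for a VL-VH complex of known structure."""
    template_id: str
    l44: str            # residue at Chothia position L44 of the light chain
    similarity: float   # sequence similarity to the target (precomputed)


def pick_packing_template(
    target_l44: str, complexes: Sequence[VlVhComplex]
) -> Optional[VlVhComplex]:
    """Choose the complex from which the VL-VH orientation is inherited.

    Only complexes with the same L44 residue as the target are eligible;
    among those, the one with the highest similarity is returned.
    """
    compatible = [c for c in complexes if c.l44 == target_l44]
    if not compatible:
        return None
    return max(compatible, key=lambda c: c.similarity)


candidates = [
    VlVhComplex("tplA", "P", 93.0),
    VlVhComplex("tplB", "V", 96.0),  # most similar overall, but L44 is not P
    VlVhComplex("tplC", "P", 90.0),
]
print(pick_packing_template("P", candidates))
```

For a target with proline at L44, the most similar complex overall is skipped because its L44 residue differs, and the best proline-bearing complex is used instead.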

H3

As mentioned above, only a partial canonical structure model exists for H3. Accordingly, the prediction accuracy for H3 loops has so far not been as satisfactory as for the other loops, an important drawback because the H3 loop is central in the binding site and therefore essential in determining the antibody–antigen interactions. We approached the problem of obtaining a better prediction for the main chain of this loop by training a Random Forest machine learning algorithm to select the closest loop among a dataset of H3 loops present in immunoglobulins of known structure. The selected putative templates are subsequently ranked according to their intramolecular interactions by comparing the predicted interactions of the modelled H3 residues with those observed in immunoglobulins of known structure (22). This strategy has proven to be sufficiently robust in identifying reliable structural templates and also provided the first evidence that the H3 environment information can be used to successfully rank large sets of conformations of the same loop (22). In summary, in terms of prediction accuracy we achieve significant improvements compared to PIGS (Table 1) and to other methods (22).

Performance improvement

We compared the performance of the new and old server on a data set of 252 structures using the leave-one-out procedure described above and adopted in (26). Models were only considered when the sequence identity with the corresponding template was lower than 98%. Table 1 shows the results obtained on this dataset compared with the corresponding results for the PIGS server. As can be seen, there is an improvement in the prediction of both the overall structure and the antigen-binding site.

The web interface

The PIGSPro web interface has been redesigned to improve user friendliness. Responsive layouts are implemented using the Bootstrap front-end web framework, JavaScript and jQuery.

The database

PIGSPro relies heavily on the underlying database of known immunoglobulin structures. This has increased from 312 to 1691 structures and, most importantly, is now automatically updated every month. The 1691 antibodies of the database include only structures obtained from X-ray experiments with a resolution better than 3.0 Å and without any missing residues or atoms.
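The inclusion criteria for the database amount to a simple filter, sketched below in Python. The dictionary keys, the method label and the example entries are hypothetical; only the three criteria (X-ray structure, resolution better than 3.0 Å, no missing residues or atoms) come from the text.

```python
def passes_quality_filter(entry, max_resolution=3.0):
    """Return True if a structure meets the stated inclusion criteria.

    Assumption: 'resolution better than 3.0 A' is read as a strictly
    smaller numerical value.
    """
    return (
        entry["method"] == "X-RAY DIFFRACTION"
        and entry["resolution"] < max_resolution
        and not entry["has_missing_residues_or_atoms"]
    )


# Hypothetical candidate entries for a monthly update.
candidates = [
    {"id": "tplA", "method": "X-RAY DIFFRACTION", "resolution": 2.1,
     "has_missing_residues_or_atoms": False},
    {"id": "tplB", "method": "X-RAY DIFFRACTION", "resolution": 3.4,
     "has_missing_residues_or_atoms": False},
    {"id": "tplC", "method": "SOLUTION NMR", "resolution": 0.0,
     "has_missing_residues_or_atoms": False},
    {"id": "tplD", "method": "X-RAY DIFFRACTION", "resolution": 1.9,
     "has_missing_residues_or_atoms": True},
]
accepted = [e["id"] for e in candidates if passes_quality_filter(e)]
print(accepted)
```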

35 in total

1. Structural repertoire of the human VH segments.

Authors: C Chothia; A M Lesk; E Gherardi; I M Tomlinson; G Walter; J D Marks; M B Llewelyn; G Winter
Journal: J Mol Biol Date: 1992-10-05 Impact factor: 5.469

2. Light-chain ratios of immunoglobulins G, A, and M determined by enzyme immunoassay.

Authors: S H Chui; C W Lam; K N Lai
Journal: Clin Chem Date: 1990-03 Impact factor: 8.327

3. PIGS: automatic prediction of antibody structures.

Authors: Paolo Marcatili; Alessandra Rosi; Anna Tramontano
Journal: Bioinformatics Date: 2008-07-19 Impact factor: 6.937

4. SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling.

Authors: Qiang Wang; Adrian A Canutescu; Roland L Dunbrack
Journal: Nat Protoc Date: 2008 Impact factor: 13.491

5. Conformations of the third hypervariable region in the VH domain of immunoglobulins.

Authors: V Morea; A Tramontano; M Rustici; C Chothia; A M Lesk
Journal: J Mol Biol Date: 1998-01-16 Impact factor: 5.469

6. Structural determinants in the sequences of immunoglobulin variable domain.

Authors: C Chothia; I Gelfand; A Kister
Journal: J Mol Biol Date: 1998-05-01 Impact factor: 5.469

7. Second antibody modeling assessment (AMA-II).

Authors: Juan C Almagro; Alexey Teplyakov; Jinquan Luo; Raymond W Sweet; Sreekumar Kodangattil; Francisco Hernandez-Guzman; Gary L Gilliland
Journal: Proteins Date: 2014-04-26

8. Conformational sampling of CDR-H3 in antibodies by multicanonical molecular dynamics simulation.

Authors: H Shirai; N Nakajima; J Higo; A Kidera; H Nakamura
Journal: J Mol Biol Date: 1998-05-01 Impact factor: 5.469

9. Hidden Markov models in computational biology. Applications to protein modeling.

Authors: A Krogh; M Brown; I S Mian; K Sjölander; D Haussler
Journal: J Mol Biol Date: 1994-02-04 Impact factor: 5.469

10. RosettaAntibody: antibody variable region homology modeling server.

Authors: Aroop Sircar; Eric T Kim; Jeffrey J Gray
Journal: Nucleic Acids Res Date: 2009-05-20 Impact factor: 16.971
