Epitome: database of structure-inferred antigenic epitopes.

Avner Schlessinger, Yanay Ofran, Guy Yachdav, Burkhard Rost.

Abstract

Immunoglobulin molecules specifically recognize particular areas on the surface of proteins. These areas are commonly dubbed B-cell epitopes. The identification of epitopes in proteins is important for the design of both experiments and vaccines. Additionally, the interactions between epitopes and antibodies have often served as a model for protein-protein interactions. One of the main obstacles in creating a database of antigen-antibody interactions is the difficulty in distinguishing between antigenic and non-antigenic interactions. Antigenic interactions involve specific recognition sites on the antibody's surface, while non-antigenic interactions are between a protein and any other site on the antibody. To solve this problem, we performed a comparative analysis of all protein-antibody complexes for which structures have been experimentally determined. Additionally, we developed a semi-automated tool that identified the antigenic interactions within the known antigen-antibody complex structures. We compiled those interactions into Epitome, a database of structure-inferred antigenic residues in proteins. Epitome consists of all known antigen/antibody complex structures, a detailed description of the residues that are involved in the interactions, and their sequence/structure environments. Interactions can be visualized using an interface to Jmol. The database is available at http://www.rostlab.org/services/epitome/.

Year: 2006 PMID: 16381978 PMCID: PMC1347416 DOI: 10.1093/nar/gkj053

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

BACKGROUND

Protein–antigen structures

Antigen–antibody complexes have long been used as a model for understanding the general phenomenon of molecular recognition (1–5). The number of experimental high-resolution 3D structures of antibody–antigen complexes in the PDB (6) has increased significantly in recent years. Several groups have used these data to analyze and characterize antigenic interactions, i.e. interactions between the protein (the antigen) and the Complementarity Determining Regions (CDRs) of the antibody (7,8). An important first step in studying antigenic interactions is the characterization of CDRs. MacCallum et al. (8) observed that the hypervariable loops of CDRs adopt only a limited number of backbone conformations that are determined by a few key residues. Two recent studies have suggested that the amino acid composition and the length of CDRs determine the type of antigen that can be bound (9,10). Several studies have attempted to differentiate the residues on the antigen surface that are involved in the antigenic interaction from all others (5,7,11). The results of these studies were rather inconsistent. Differences in the data sets chosen (some of which were very small) and in the methodologies may explain some of those inconsistencies. Most importantly, however, the definitions of the CDRs often differed greatly: even two studies that investigate the same PDB complex with the same methodology might disagree on which of the interactions are antigenic (7). An important ramification of this problem was unveiled by Blythe and Flower (12), who showed that most existing B-cell epitope prediction methods do not work adequately. One explanation for this observation could be that most methods rely on inaccurate identifications of epitopes.

Definition of the CDRs

Antibodies are composed of a skeleton of beta-sheets. Most of the amazing variety of antibodies is realized by differences in six hypervariable loops of the CDRs. Therefore, the CDRs have previously been defined through these six loops. The first definition of CDRs was as regions in the Kabat sequence variability plot (13,14). The residues in these regions are identified through an alignment between the query sequence and a consensus motif for antibodies. Although widely used, the Kabat CDR definitions can be problematic because CDRs that are in structural loops often have very unusual sequences that are not captured by regular sequence motifs (15). In fact, any method based only on sequence information is prone to misaligning and therefore mis-assigning loopy CDRs. Chothia and co-workers (16) therefore based their CDR identification on structural information. Initially, hypervariable loops were defined according to a few structures. Later, the numbering of the residues that was used to locate the CDRs was changed to account for structures that became available subsequently (17). Studies also differ in their definition of secondary structures, thereby increasing the inconsistency in defining hypervariable loops. Additional disadvantages of both the Kabat and the Chothia et al. methods are described elsewhere (). Here, we address these problems through a comprehensive study of all known antigen–antibody complexes in the PDB. Analyzing the structures, we identified the consensus residues on the antibodies and thereby identified the CDRs on all known protein–antibody complexes (details below). This initial set of CDRs facilitated the automatic generation of a database with all known antigenic residues in the PDB; we also included the sequence environment and a detailed description of the CDR with which they interact.

Several databases of antibody–antigen complex structures are available (15,18,19). Some of these databases focus on the structural aspects of the interaction (19,20). There are also databases that compile B-cell epitopes without their corresponding antibodies (12,21). However, none of these databases explicitly locates the CDRs or identifies the antigenic residues semi-automatically. In this sense, our resource is more comprehensive and easily adjustable to growing data, as more 3D structures of antigen–antibody complexes become available. Thus, the databases mentioned above, particularly the ones that are not structure based, are complementary to Epitome.

DATABASE

Extraction of 3D structures and identification of CDRs

To identify all structures in the PDB that contain at least one antibody–antigen complex, we searched the PDB with BLAST (22) using a consensus antibody sequence as the query. The rationale for using BLAST rather than PSI-BLAST was to avoid capturing molecules such as T-cell receptors which, despite their similarity to antibodies, participate in cell-mediated immune response, and therefore represent a different type of antigenic interaction. We then added PDB structures that contain an immunoglobulin fold from the Structural Classification of Proteins database (SCOP) (23) and PDB entries that are identified as antibody–antigen complexes through keywords (e.g. ‘antibody’ and ‘antigen’). We discarded all complexes with T-cell receptors or MHC molecules, since these are formed during cell-mediated immune response. We labeled residues as interacting if any of their respective atoms were within a sphere of ≤6 Å (24). This resulted in our final list of interactions between antibodies and antigens. Thus, we define antibody–antigen interaction as spatial proximity between a residue within the CDRs and a residue on the surface of the antigenic protein. We located the CDRs in the known protein–antibody complexes through the following knowledge-based approach. We began by creating multiple structure alignments of antibody structures using SKA (25,26). Since the light and heavy chains have different CDRs, two different multiple structure alignments were performed, one for each type of antibody chain. Additionally, because our database included several redundant sequences, we ran the structural alignment program on a sequence-unique subset of all protein–antibody complexes. As antibody sequences are highly similar to each other, the criterion for the redundancy of the complex set was determined by the antigen sequences; sequence redundancy was reduced at HSSP-values of 0 (corresponding to <33% pairwise sequence identity for long alignments) (27–30).

Then, we identified structurally aligned positions that interact with a protein in more than 10% of the complexes of the alignment. We defined the borders of the CDRs through those highly populated positions. Given the CDRs in the aligned antibodies, we transferred their location to the antibody chains of the corresponding sequence–structure family that they represent by structural pairwise alignments using Combinatorial Extension (CE) (31) (Figure 1). Finally, we defined all the residues on the protein surface that are in contact with the residues on the antibody CDRs as antigenic residues.
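The contact criterion and the 10% occupancy rule described above can be illustrated with a minimal sketch. This is not the Epitome pipeline: the data layout (each residue as a list of atom coordinates, antibody residues keyed by a hypothetical aligned-position index), the function names and the toy coordinates are all assumptions made for illustration, and the cutoff is read here as an atom-to-atom distance of at most 6 Å. Only the 6 Å value and the 10% threshold are taken from the text.

```python
from math import dist

CUTOFF = 6.0          # Å; read here as the maximum atom-atom distance (assumption)
MIN_FRACTION = 0.10   # positions must contact antigen in more than 10% of complexes


def in_contact(atoms_a, atoms_b, cutoff=CUTOFF):
    """True if any atom pair from the two residues lies within the cutoff."""
    return any(dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)


def contacting_positions(antibody, antigen, cutoff=CUTOFF):
    """Aligned antibody positions that touch at least one antigen residue.

    antibody: dict mapping aligned position -> list of (x, y, z) atoms
    antigen:  list of residues, each a list of (x, y, z) atoms
    """
    return {
        pos
        for pos, atoms in antibody.items()
        if any(in_contact(atoms, residue, cutoff) for residue in antigen)
    }


def cdr_positions(complexes, min_fraction=MIN_FRACTION):
    """Positions contacting the antigen in more than min_fraction of complexes."""
    counts = {}
    for antibody, antigen in complexes:
        for pos in contacting_positions(antibody, antigen):
            counts[pos] = counts.get(pos, 0) + 1
    total = len(complexes)
    return sorted(p for p, c in counts.items() if c / total > min_fraction)


# Invented toy coordinates: two complexes sharing aligned positions 30 and 31.
TOY_COMPLEXES = [
    ({30: [(0.0, 0.0, 0.0)], 31: [(20.0, 0.0, 0.0)]}, [[(3.0, 0.0, 0.0)]]),
    ({30: [(0.0, 0.0, 0.0)], 31: [(5.0, 0.0, 0.0)]}, [[(9.0, 0.0, 0.0)]]),
]

print(cdr_positions(TOY_COMPLEXES))
```

In this sketch the CDR borders would then be taken from the first and last retained position of each contiguous stretch; how Epitome groups positions into the six CDRs is not specified in the text and is left out here.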

Figure 1

Antigenic residues according to Epitome. Complex structure of quail lysozyme (in blue) and the light chain of an antibody (in green), as taken from PDB ID 1bql (33). The residues that are defined to be in CDR 1 of the light chain according to the Kabat definition (13) are colored in black. Residues in red are all the residues that are involved in the interaction according to Epitome. Note that not all of the residues on the antibody surface that are located in the ‘Kabat’ CDR are involved in the antigenic interaction. Additionally, although the 1bql antibody chains did not participate in the multiple structure alignment, i.e. the information about the location of the CDR was transferred from a homologous structure, the interaction was correctly identified.

Content statistics

Epitome currently contains 142 antigens from protein–antibody complex structures with a current total of 10 180 antigenic interactions. A total of 63 of the complexes consist of antigens that are sequence-unique, i.e. no two of these 63 antigens are similar enough in sequence to enable coarse-grained homology modeling.

Input and fields

Epitome users can search for epitopes either by querying the database or by entering a sequence and ‘BLASTing’ for similar sequences that are stored in the database. The fields that can be queried include one or more of the following: PDB identifier (four-letter code used by the PDB, e.g. 1pdb); antigen chain ID (PDB identifier for the chain of the antigen, e.g. 1pdb_C); antigen residue type (one-letter code for amino acids, e.g. Y corresponds to Tyrosine); antigen residue secondary structure state as defined by DSSP (32) (one-letter code; GHI corresponds to helical structures, EB to strands and TSL to other); antigen residue solvent accessibility (the input is the accessible surface in Å² as defined by DSSP (32) and the search is on all residues with accessibility values that are greater than or equal to the input value); antigen residue position (the residue number as annotated in the PDB file); heavy/light chain (the interaction involves residues that are located on the light chain, the heavy chain or both chains of the antibody); antibody chain identifier (similar to the antigen chain identifier); antibody residue type (one-letter code for amino acids, e.g. C corresponds to Cysteine); antibody residue position in the PDB (the position of the antibody residue that is involved in the interaction as annotated by the PDB); and CDR number (possible values: 1, 2, 3).
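To make the semantics of these query fields concrete, the following is a minimal sketch of how such a filter could behave. It is an illustration only and not the Epitome interface or schema: the `Interaction` record, the `query` function, the field names and the three toy rows are hypothetical. What is taken from the text is the three-state reduction of DSSP codes (GHI, EB, TSL) and the rule that the accessibility search returns residues with values greater than or equal to the input.

```python
from dataclasses import dataclass

# Three-state reduction of DSSP codes, as listed in the text.
DSSP_STATE = {
    **dict.fromkeys("GHI", "helix"),
    **dict.fromkeys("EB", "strand"),
    **dict.fromkeys("TSL", "other"),
}


@dataclass(frozen=True)
class Interaction:
    """One antigenic interaction row (hypothetical field names)."""
    pdb_id: str
    antigen_chain: str
    antigen_residue: str      # one-letter amino acid code
    antigen_position: int     # residue number as annotated in the PDB file
    dssp: str                 # one-letter DSSP code
    accessibility: float      # accessible surface in Å²
    antibody_chain: str       # "heavy" or "light"
    cdr: int                  # 1, 2 or 3


def query(records, min_accessibility=None, state=None, **exact):
    """Return records matching every given criterion.

    min_accessibility keeps residues with accessibility >= the given value;
    state is one of "helix", "strand", "other"; remaining keyword arguments
    must equal the record attribute of the same name.
    """
    hits = []
    for rec in records:
        if min_accessibility is not None and rec.accessibility < min_accessibility:
            continue
        if state is not None and DSSP_STATE.get(rec.dssp) != state:
            continue
        if any(getattr(rec, key) != value for key, value in exact.items()):
            continue
        hits.append(rec)
    return hits


# Invented toy rows; they do not correspond to real database entries.
RECORDS = [
    Interaction("1pdb", "C", "Y", 53, "E", 42.0, "heavy", 2),
    Interaction("1pdb", "C", "K", 97, "T", 110.0, "light", 1),
    Interaction("1pdb", "C", "D", 101, "H", 15.0, "heavy", 3),
]

hits = query(RECORDS, min_accessibility=40.0, antibody_chain="heavy")
print([(h.antigen_residue, h.antigen_position) for h in hits])
```

Combining criteria with a logical AND is a design choice of this sketch; the text does not state how the web form combines multiple fields.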

Output

Results for database queries are presented as a table that lists all features of the result sets (Figure 2). The antigen results include the residues in the environment of the antigen (highlighted in red). If a user performs a BLAST sequence search against the Epitome database to find PDB structures containing antigens with similar sequences, the output will be all complex structures consisting of proteins with a high degree of similarity to the input sequence, together with the corresponding E-value and BLAST score of the pairwise sequence alignments. Additionally, each PSI-BLAST hit contains a link that can trigger another database query.

Figure 2

Screenshot of a database entry. Each line of the table represents a different antigenic interaction, i.e. an interaction of a protein surface residue with an antibody surface residue that is located on one of the antibody's 6 CDRs. Note that the search could be performed using any of the table fields and that there is an additional link to visualize the interaction using Jmol ().

Updates

Since most Epitome entries were identified using the SCOP database, Epitome updates will follow updates of SCOP, i.e. Epitome will be updated twice a year as soon as SCOP updates its parseable files. Additionally, all the other programs used to create the database are installed locally and can be run automatically.

32 in total

1. SACS--self-maintaining database of antibody crystal structure information.

Authors: Lee C Allcorn; Andrew C R Martin
Journal: Bioinformatics Date: 2002-01 Impact factor: 6.937

2. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.

Authors: I N Shindyalov; P E Bourne
Journal: Protein Eng Date: 1998-09

3. Refined structures of bobwhite quail lysozyme uncomplexed and complexed with the HyHEL-5 Fab fragment.

Authors: S Chacko; E W Silverton; S J Smith-Gill; D R Davies; K A Shick; K A Xavier; R C Willson; P D Jeffrey; C Y Chang; L C Sieker; S Sheriff
Journal: Proteins Date: 1996-09

4. Antibody-antigen interactions: contact analysis and binding site topography.

Authors: R M MacCallum; A C Martin; J M Thornton
Journal: J Mol Biol Date: 1996-10-11 Impact factor: 5.469

5. The HSSP database of protein structure-sequence alignments.

Authors: R Schneider; A de Daruvar; C Sander
Journal: Nucleic Acids Res Date: 1997-01-01 Impact factor: 16.971

Review 6. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

7. Analysis of protein-protein interaction sites using surface patches.

Authors: S Jones; J M Thornton
Journal: J Mol Biol Date: 1997-09-12 Impact factor: 5.469

8. Prediction of protein-protein interaction sites using patch analysis.

Authors: S Jones; J M Thornton
Journal: J Mol Biol Date: 1997-09-12 Impact factor: 5.469

9. Standard conformations for the canonical structures of immunoglobulins.

Authors: B Al-Lazikani; A M Lesk; C Chothia
Journal: J Mol Biol Date: 1997-11-07 Impact factor: 5.469

Review 10. Interactions of protein antigens with antibodies.

Authors: D R Davies; G H Cohen
Journal: Proc Natl Acad Sci U S A Date: 1996-01-09 Impact factor: 11.205
