An ontology for major histocompatibility restriction.

Randi Vita1, James A Overton1, Emily Seymour1, John Sidney1, Jim Kaufman2, Rebecca L Tallmadge3, Shirley Ellis4, John Hammond4, Geoff W Butcher5, Alessandro Sette1, Bjoern Peters1.   

Abstract

BACKGROUND: MHC molecules are a highly diverse family of proteins that play a key role in cellular immune recognition. Over time, different techniques and terminologies have been developed to identify the specific type(s) of MHC molecule involved in a specific immune recognition context. No consistent nomenclature exists across different vertebrate species.
PURPOSE: To correctly represent MHC-related data in the Immune Epitope Database (IEDB), we built upon a previously established MHC ontology and created an ontology to represent MHC molecules as they relate to immunological experiments.
DESCRIPTION: This ontology models MHC protein chains from 16 species; deals with different approaches used to identify MHC, such as direct sequencing versus serotyping; relates engineered MHC molecules to naturally occurring ones; connects genetic loci, alleles, protein chains and multi-chain proteins; and establishes evidence codes for MHC restriction. Where available, this work is based on existing ontologies from the OBO Foundry.
CONCLUSIONS: Overall, representing MHC molecules provides a challenging and practically important test case for ontology building, and could serve as an example of how to integrate other ontology building efforts into web resources.

Keywords:  Immune epitope; MHC; Major histocompatibility complex; Ontology

Year:  2016        PMID: 26759709      PMCID: PMC4709943          DOI: 10.1186/s13326-016-0045-5

Source DB:  PubMed          Journal:  J Biomed Semantics


Background

Major histocompatibility complex (MHC) proteins play a central role in the adaptive immune system. First discovered due to their role in transplant rejection, MHC molecules are encoded by a large family of genes with wide variation within each species. MHC molecules typically bind peptide fragments of proteins and display them on the cell surface, where they are scanned by T cells of the immune system. If a peptide fragment is displayed by MHC, it can trigger a T cell immune response. Peptides triggering a response are referred to as ‘epitopes’. Thus, binding of epitopes to MHC molecules is an integral step for immune recognition. The specific MHC molecule that presents an epitope to a T cell is known as its “MHC restriction”, also called its MHC restriction (or restricting) element. Accurately representing this MHC restriction, which can be determined in different ways, is the goal of the work presented here.
Most MHC molecules consist of two protein chains, at least one of which is encoded by a gene within the MHC locus. In humans this locus is known as the human leukocyte antigen (HLA) locus and is depicted in Fig. 1a. There are thousands of different allelic variants of these genes, coding for different proteins that result in the diverse MHC binding specificities found in the human population. The most precise way of specifying MHC restriction is to identify the exact protein chains that make up the MHC molecule. However, until recently such exact molecular typing was not possible, and patterns of antibody binding were used to group MHC molecules into serotypes that share a common serological (antibody-based) recognition pattern, as shown in Fig. 1b. Tying such traditional serotype information together with current sequence-based MHC typing techniques is one of the goals of our study.
In yet other cases, such as inbred mouse strains, MHC restriction is narrowed down based on the haplotype of the animal, that is, the set of alleles present on a single chromosome and thus expressed consistently together in select subspecies or strains. Another way MHC restriction is sometimes inferred is based on the T cells recognizing the epitope.
MHC molecules are divided into three classes: MHC class I, MHC class II, and non-classical MHC. MHC class I molecules present epitopes to CD8+ T cells and are made up of one alpha chain and one β2 microglobulin chain, which is invariant and encoded outside the MHC locus. MHC class II molecules present epitopes to CD4+ T cells and are composed of one alpha and one beta chain, as shown in Fig. 1c. Thus, knowing whether the responding T cell expresses CD4 or CD8 can be used to narrow down the possible MHC restriction to a class. At the same time, current research has shown that some T cell populations do not follow this pattern exactly (e.g. some T cells recognizing MHC-II restricted epitopes express CD8). It is therefore important to capture not only the inferred restriction information, but also the evidence upon which it was based.
Fig. 1

MHC presentation and restriction. a. HLA locus of human chromosome 6 encodes specific MHC protein chains. b. The MHC on APC presenting epitopes can be bound by antibodies to establish the serotype. c. If responding effector cells are known to be CD4 cells, the MHC presenting the epitope can be presumed to be class II restricted

Methods

The Immune Epitope Database (www.iedb.org) presents thousands of published experiments describing the recognition of immune epitopes by antibodies, T cells, or MHC molecules [1]. The data contained in the IEDB are primarily derived through manual curation of published literature, but also include some directly submitted data, primarily from NIAID-funded epitope discovery contracts [2]. The goal of the current work was to represent MHC data as they are used by immunologists, to meet the needs of IEDB users. We collected user input at workshops, at conferences and through the IEDB help system regarding how users wanted to retrieve data from the IEDB regarding MHC restriction. These requests were used to identify goals for this ontology project, and the final ontology was evaluated on whether it could answer them. As shown in Additional file 1: Table S1, an example of such a request was to be able to query for epitopes restricted by MHC molecules with serotype ‘A2’ and retrieve not only serotyped results but also those where the restriction is more finely mapped, for example to the MHC molecule A*02:01, which has serotype A2. We set out to logically represent the relationships between the genes encoding MHC, the haplotypes linking together groups of genes in specific species, and the individual proteins comprising MHC complexes, in order to present immunological data in an exact way and to improve the functionality of our website.
Our work builds on MaHCO [3], an ontology for MHC developed for the StemNet project, using the well-established MHC nomenclature resources of the international ImMunoGeneTics information system (IMGT, http://www.imgt.org) for human data and The Immuno Polymorphism Database (IPD, http://www.ebi.ac.uk/ipd) for non-human species. MaHCO contains 118 terms for MHC across human, mouse, and dog. We were encouraged by the success of MaHCO in expressing official nomenclature using logical definitions.
However, we needed to extend it for the purpose of the IEDB to include data from a growing list of 16 species, as well as data about MHC protein complexes (not just MHC alleles), haplotypes and serotypes. Thus, our current work goes beyond MaHCO, and we have utilized this opportunity to also enhance the integration with other ontological frameworks. We used the template feature of the open source ROBOT ontology tool [4] to specify the content of our ontology in a number of tables. Most of the tables correspond to a single “branch” of the ontology hierarchy, in which the classes have a consistent logical structure, e.g. gene loci, protein chains, mutant MHC molecules, haplotypes, etc. The OWL representation of our ontology is generated directly from the tables using ROBOT. This method enforces the ontology design patterns we have chosen for each branch, and makes certain editing tasks easier than with tools such as Protégé.
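The table-driven approach can be illustrated with a minimal Python sketch. It is not ROBOT and does not reproduce ROBOT's template syntax; it only shows how one pattern per branch turns each table row into a class definition with the same logical structure. The column names, the parent class labels and the choice of an equivalence axiom are assumptions made for illustration, while the relations ‘gene product of’ and ‘in taxon’ are the ones named in this paper.

```python
import csv
import io

# Hypothetical table for the "protein chain" branch. Column names and parent
# class labels are illustrative assumptions, not the actual MRO tables.
CHAIN_TABLE = """label,parent,locus,taxon
HLA-DPA1*02:01 chain,MHC class II alpha chain,DPA1,Homo sapiens
HLA-DPB1*01:01 chain,MHC class II beta chain,DPB1,Homo sapiens
"""


def chain_definition(row):
    """Render one row with the single pattern shared by the whole branch."""
    return (
        f"Class: '{row['label']}'\n"
        f"  EquivalentTo: '{row['parent']}'\n"
        f"    and ('gene product of' some '{row['locus']} locus')\n"
        f"    and ('in taxon' some '{row['taxon']}')"
    )


def build_branch(table_text):
    """Turn every row of a branch table into a class definition string."""
    reader = csv.DictReader(io.StringIO(table_text))
    return [chain_definition(row) for row in reader]


definitions = build_branch(CHAIN_TABLE)
print("\n\n".join(definitions))
```

Because every row passes through the same function, a class in this branch cannot deviate from the chosen design pattern, which is the property the template method is meant to enforce.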

Results and discussion

Our MHC Restriction Ontology (MRO) is available in a preliminary state at https://github.com/IEDB/MRO. It is based on existing ontology terms, including: ‘material entity’ from the Basic Formal Ontology (BFO) [5], ‘protein complex’ from The Gene Ontology (GO) [6], ‘protein’ from The Protein Ontology (PRO) [7], ‘organism’ from The Ontology for Biomedical Investigations (OBI) [8], ‘genetic locus’ from The Reagent Ontology (REO) [9], and ‘has part’, ‘in taxon’, and ‘gene product of’ from The Relation Ontology (RO) [10]. The NCBI Taxonomy was used to refer to each species [11]. Although MRO is not yet complete, we strive to conform to Open Biological and Biomedical Ontologies (OBO) [12] standards. MRO currently contains 1750 classes and nearly 9000 axioms, including more than 2100 logical axioms. Its DL expressivity is “ALEI”, and the HermiT reasoner [13] completes reasoning in less than 10 seconds on a recent laptop.
Synonyms were also included, as immunologists often use synonyms that are either abbreviations or based on previous states of the nomenclature. The current MHC nomenclatures for various species have been revised through several iterations. In order to ensure accuracy and remain up to date with the latest nomenclature, we referred to the well-established MHC nomenclature resources of the IMGT and IPD. For specific species where the literature was most formidable, such as chicken, cattle, and horse, we collaborated with experts in these fields. These experts reviewed the encoded hierarchy by determining whether the inferred parentage hierarchy in their area of expertise reflected their input.
Each MHC molecule for which the IEDB has data is modeled as a protein complex consisting of two chains. Each chain is a gene product of a specific MHC genetic locus. For certain species, sub-loci are also defined, when useful. For example, as shown in Fig. 2, HLA-DPA1*02:01/DPB1*01:01 consists of one HLA-DPA1*02:01 chain, encoded by the DPA1 sub-locus of DP, and one HLA-DPB1*01:01 chain, encoded by the DPB1 sub-locus of DP. Together these two chains make up one DPA1*02:01/DPB1*01:01 MHC molecule.
Fig. 2

Ontologic relationships between MRO terms

When the identity of only a single chain of the complex is known, a “generic” second chain is used to make up the MHC complex. Thus, MHC restriction of HLA-DPB1*04:02 is modeled as one HLA-DPB1*04:02 chain in complex with an HLA-DPA chain that is not further specified, as shown within the context of the hierarchy in Fig. 3.
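The two-chain model, including the generic second chain, can be sketched as a small data structure. This is an illustrative Python analogue rather than the OWL representation used in MRO; the class and attribute names (`Chain`, `MHCMolecule`, `fully_typed`) are hypothetical, and only the example molecules are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chain:
    """One MHC protein chain; allele=None means 'not further specified'."""
    locus: str                    # e.g. "HLA-DPA1"
    allele: Optional[str] = None  # e.g. "02:01"

    @property
    def label(self) -> str:
        if self.allele is None:
            return f"{self.locus} (generic)"
        return f"{self.locus}*{self.allele}"


@dataclass(frozen=True)
class MHCMolecule:
    """A protein complex that has exactly two chains as parts."""
    alpha: Chain
    beta: Chain

    @property
    def fully_typed(self) -> bool:
        return self.alpha.allele is not None and self.beta.allele is not None

    @property
    def label(self) -> str:
        return f"{self.alpha.label} / {self.beta.label}"


# Both chains known, as in the Fig. 2 example.
full = MHCMolecule(Chain("HLA-DPA1", "02:01"), Chain("HLA-DPB1", "01:01"))

# Only the beta chain known: a generic HLA-DPA chain completes the complex.
partial = MHCMolecule(Chain("HLA-DPA"), Chain("HLA-DPB1", "04:02"))

print(full.label, full.fully_typed)
print(partial.label, partial.fully_typed)
```

Keeping the generic chain as an explicit object, instead of leaving the slot empty, means that every restriction is still a two-chain complex and can sit in the same hierarchy as fully typed molecules.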
Fig. 3

Ontological model showing human MHC class II molecules

The data in the ontology drives the Allele Finder on the IEDB website, available at http://goo.gl/r8Tgrz, an interactive application that allows users to browse MHC restriction data in a hierarchical format. We evaluated the ability of MRO to meet the needs of IEDB users, as shown in Additional file 1: Table S1, and found it to meet our initial goals. Currently the use of the ontology is behind the scenes, but we have requested a namespace and permanent identifiers from The Open Biomedical Ontologies (OBO). As soon as these identifiers are in place, they will be used and displayed on the IEDB website to allow users to link out to the ontology.
In MHC binding and elution assays, the exact MHC molecule studied is typically known; however, this is often not the case for T cell assays. When a T cell responds to an epitope, the identity of the MHC molecule presenting the epitope may not be known at all, it may be narrowed down to a subset of all possible molecules, or it may be exactly identified. In the context of T cell assays, the MHC restriction can be determined by the genetic background of the host, the conditions of the experiment, or the biological process being measured; therefore we represent MHC molecules at a variety of levels and specify the rationale behind the determined restriction using evidence codes. As shown in Fig. 4a, IEDB evidence codes include “author statement” for cases where authors report previously defined restriction and “MHC ligand assay” for restriction established via an experiment that demonstrated the ability of the epitope to bind strongly to the MHC molecule or to have been eluted from that molecule. Figure 4b shows the metadata associated with this evidence code. “MHC binding prediction” is used when computer algorithms are used to predict the likelihood of an epitope binding to a specific MHC molecule.
In cases where authors analyze the MHC phenotype of a study population and conclude a likely restriction based upon epitope recognition patterns among the subjects, “statistical association” is used as the evidence code. We use a set of evidence codes to communicate restriction shown by the response of T cells to the epitope:MHC complex. These include “Single MHC available” for cases where T cells respond to the epitope when only a single MHC molecule is available, and “reactivity of same T cells with different MHC” for cases where different APC expressing different MHC are used to narrow the potential restriction. The use of antibodies to block or purify subsets of MHC molecules typically determines restriction only to an imprecise level, such as HLA-DR, and is conveyed by “set of MHC available.” When the T cells being studied are known to be CD8 or CD4 cells, the restriction can be deduced to be class I or class II, respectively, due to the known binding pattern of the molecules, as depicted in Fig. 1c. This case is communicated by the evidence code “type of effector T cell.” Lastly, certain T cell responses can indicate an effector cell phenotype of CD8 or CD4, based upon known functions of the subsets; class I or II restriction can thus be inferred, and is noted by the evidence code “biological process measured.” Figure 4c shows the modeling of these evidence codes in terms of the specific experiments, the data transformations performed (using OBI terms), and the type of conclusion drawn. This work is being conducted in parallel with the general alignment of the Evidence Ontology (ECO) [14], which provides succinct codes for such types of evidence, with OBI, which can break down how such a code translates to specific experiments performed.
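As a rough illustration of how a restriction can be stored together with its rationale, the evidence codes named above can be written as an enumeration and attached to each asserted restriction. This is a sketch in Python, not the ontology's own modeling; the names `EvidenceCode`, `RestrictionAssertion` and `infer_class_from_effector` are hypothetical, and only the code labels themselves come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceCode(Enum):
    """Evidence code labels as listed in the text."""
    AUTHOR_STATEMENT = "author statement"
    MHC_LIGAND_ASSAY = "MHC ligand assay"
    MHC_BINDING_PREDICTION = "MHC binding prediction"
    STATISTICAL_ASSOCIATION = "statistical association"
    SINGLE_MHC_AVAILABLE = "Single MHC available"
    REACTIVITY_WITH_DIFFERENT_MHC = "reactivity of same T cells with different MHC"
    SET_OF_MHC_AVAILABLE = "set of MHC available"
    TYPE_OF_EFFECTOR_T_CELL = "type of effector T cell"
    BIOLOGICAL_PROCESS_MEASURED = "biological process measured"


@dataclass(frozen=True)
class RestrictionAssertion:
    """A restriction at any level of precision, plus why it was asserted."""
    restriction: str
    evidence: EvidenceCode


def infer_class_from_effector(coreceptor: str) -> RestrictionAssertion:
    """Deduce class-level restriction from the effector T cell type."""
    classes = {"CD8": "MHC class I", "CD4": "MHC class II"}
    if coreceptor not in classes:
        raise ValueError(f"unsupported co-receptor: {coreceptor!r}")
    return RestrictionAssertion(
        classes[coreceptor], EvidenceCode.TYPE_OF_EFFECTOR_T_CELL
    )


assertion = infer_class_from_effector("CD4")
print(assertion.restriction, "|", assertion.evidence.value)
```

Storing the code next to the restriction keeps an inferred class-level assignment distinguishable from one measured directly, which matters because some T cell populations do not follow the CD4/CD8 pattern.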
Fig. 4

Evidence codes in MRO

The IEDB MHC Allele Finder application, shown in Fig. 5, now allows users to browse data in different views. MHC molecules are first categorized into ‘class I, class II or non-classical’, and then further subdivided by species. Within each species, MHC molecules are organized by genetic locus. For select species, such as human, a large number of MHC molecules are known and studied per genetic locus, so sub-loci are also used in order to present the data in a more user-friendly format. Each MHC molecule is presented under its locus, its haplotype, and/or its serotype, when available, all representing newly added functionalities. The haplotype the host species expresses is represented because immunologists often rely on the known haplotypes of research animals to narrow the potential MHC restriction. For example, when BALB/c (H2d) mice demonstrate a response to an epitope and the responding T cells are CD4+, the restricting MHC can be assumed to be one of the two MHC class II molecules of that haplotype, namely H2 IAd or IEd.
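The BALB/c example amounts to a lookup that is narrowed first by haplotype and then by T cell type. A minimal Python sketch follows; the dictionary layout and function name are hypothetical, the two H2d class II molecules are the ones named in the text, and the H2d class I molecules are added as an assumption to make the example complete.

```python
# Molecules per haplotype and class. The class II entries come from the text;
# the class I entries (the standard H2d molecules) are an addition for
# illustration and are not listed in this paper.
HAPLOTYPE_MOLECULES = {
    "H2d": {
        "class I": ["H2 Kd", "H2 Dd", "H2 Ld"],
        "class II": ["H2 IAd", "H2 IEd"],
    },
}

# Usual pairing of T cell co-receptor with MHC class (see Fig. 1c).
CORECEPTOR_TO_CLASS = {"CD8": "class I", "CD4": "class II"}


def candidate_restrictions(haplotype, coreceptor=None):
    """Return the MHC molecules that could be the restricting element.

    With no co-receptor given, every molecule of the haplotype is a
    candidate; with one, only the matching class is returned.
    """
    by_class = HAPLOTYPE_MOLECULES[haplotype]
    if coreceptor is None:
        return [m for molecules in by_class.values() for m in molecules]
    return list(by_class[CORECEPTOR_TO_CLASS[coreceptor]])


print(candidate_restrictions("H2d", "CD4"))  # ['H2 IAd', 'H2 IEd']
```

The result is a set of candidates rather than a single molecule, so a restriction narrowed this way is still an inference and would carry the corresponding evidence code.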
Fig. 5

IEDB’s MHC Allele Finder, demonstrating chicken haplotypes

The serotype of an MHC molecule, defined by antibody staining patterns, is relevant in immunology as this was the method of choice to identify MHC molecules until quite recently. In contrast to molecular definitions of MHC molecules based on their specific nucleotide or amino acid sequence, serotyping classifies MHC molecules based entirely on antibody binding patterns to the MHC molecule. These patterns are linked to the panel of antibodies used. Changing the antibody panel changes the serotype of a molecule. This can result in “serotype splits”, where MHC molecules that were previously considered identical by one antibody panel are later found to actually be two different molecules by a different antibody panel. To reflect this extrinsic nature of serotyping, we refer to serotypes as information entities rather than physical entities. Alternatively, the concept of serotype could also be modeled as collections of binding dispositions, but we chose what we thought was the simpler approach.
MHC for all 16 species currently having MHC data in the IEDB are modeled to give users the ability to browse the tree in multiple ways and search IEDB data broadly, by entire MHC class, for example, or narrowly by a specific MHC protein chain. As new MHC molecules are encountered, they can be easily incorporated into this ontology.
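Treating a serotype as information about a molecule makes the user request described in the Methods straightforward to express: a query for serotype A2 should return records curated at the serotype level as well as records mapped to a molecule that has that serotype. The following Python sketch assumes a flat list of records and a molecule-to-serotype annotation table; the record identifiers and function name are hypothetical, and only the A*02:01 and A2 pairing is taken from the text.

```python
# Serotype annotation attached to molecules. Only this pair comes from the text.
SEROTYPE_OF = {"HLA-A*02:01": "A2"}

# Hypothetical curated records: (epitope identifier, restriction as curated).
RECORDS = [
    ("epitope-1", "A2"),           # restriction known only by serotype
    ("epitope-2", "HLA-A*02:01"),  # restriction mapped to the exact molecule
    ("epitope-3", "HLA-DR"),       # unrelated, imprecise restriction
]


def restricted_by_serotype(serotype, records=RECORDS):
    """Return epitopes whose restriction is the serotype or a molecule with it."""
    hits = []
    for epitope_id, restriction in records:
        if restriction == serotype or SEROTYPE_OF.get(restriction) == serotype:
            hits.append(epitope_id)
    return hits


print(restricted_by_serotype("A2"))  # ['epitope-1', 'epitope-2']
```

Because the serotype lives in an annotation table and not in the molecule's identity, a change of antibody panel or a serotype split only requires updating that table.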

Conclusions

In conclusion, we formally represented MHC data building on established ontologies in order to represent MHC restrictions as required by immunologists. Accordingly, we modeled MHC molecules as a protein complex of two chains and established the relationships between the genes encoding these proteins, the haplotypes expressed by specific species, and the MHC classes. Traditional serotype information was also related to specific MHC molecules. Precise MHC restriction was conveyed, as well as inferred MHC restriction and also the experimental evidence upon which the restriction was established. We will continue to formalize this work and will release a completed interoperable ontology later this year. Thus, MHC data in the IEDB is now presented to its users in a hierarchical format which simplifies searching the data and additionally instructs users on the inherent relationships between MHC genes and MHC restriction.
References

1.  A roadmap for the immunomics of category A-C pathogens.

Authors:  Alessandro Sette; Ward Fleri; Bjoern Peters; Muthuraman Sathiamurthy; Huynh-Hoa Bui; Stephen Wilson
Journal:  Immunity       Date:  2005-02       Impact factor: 31.745

2.  Formal ontology for natural language processing and the integration of biomedical databases.

Authors:  Jonathan Simon; Mariana Dos Santos; James Fielding; Barry Smith
Journal:  Int J Med Inform       Date:  2005-09-08       Impact factor: 4.046

3.  The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.

Authors:  Barry Smith; Michael Ashburner; Cornelius Rosse; Jonathan Bard; William Bug; Werner Ceusters; Louis J Goldberg; Karen Eilbeck; Amelia Ireland; Christopher J Mungall; Neocles Leontis; Philippe Rocca-Serra; Alan Ruttenberg; Susanna-Assunta Sansone; Richard H Scheuermann; Nigam Shah; Patricia L Whetzel; Suzanna Lewis
Journal:  Nat Biotechnol       Date:  2007-11       Impact factor: 54.908

4.  MaHCO: an ontology of the major histocompatibility complex for immunoinformatic applications and text mining.

Authors:  David S DeLuca; Elena Beisswanger; Joachim Wermter; Peter A Horn; Udo Hahn; Rainer Blasczyk
Journal:  Bioinformatics       Date:  2009-05-07       Impact factor: 6.937

5.  Modeling biomedical experimental processes with OBI.

Authors:  Ryan R Brinkman; Mélanie Courtot; Dirk Derom; Jennifer M Fostel; Yongqun He; Phillip Lord; James Malone; Helen Parkinson; Bjoern Peters; Philippe Rocca-Serra; Alan Ruttenberg; Susanna-Assunta Sansone; Larisa N Soldatova; Christian J Stoeckert; Jessica A Turner; Jie Zheng
Journal:  J Biomed Semantics       Date:  2010-06-22

6.  The Protein Ontology: a structured representation of protein forms and complexes.

Authors:  Darren A Natale; Cecilia N Arighi; Winona C Barker; Judith A Blake; Carol J Bult; Michael Caudy; Harold J Drabkin; Peter D'Eustachio; Alexei V Evsikov; Hongzhan Huang; Jules Nchoutmboube; Natalia V Roberts; Barry Smith; Jian Zhang; Cathy H Wu
Journal:  Nucleic Acids Res       Date:  2010-10-08       Impact factor: 16.971

7.  Relations in biomedical ontologies.

Authors:  Barry Smith; Werner Ceusters; Bert Klagges; Jacob Köhler; Anand Kumar; Jane Lomax; Chris Mungall; Fabian Neuhaus; Alan L Rector; Cornelius Rosse
Journal:  Genome Biol       Date:  2005-04-28       Impact factor: 13.583

8.  Database resources of the National Center for Biotechnology Information.

Authors:  Eric W Sayers; Tanya Barrett; Dennis A Benson; Stephen H Bryant; Kathi Canese; Vyacheslav Chetvernin; Deanna M Church; Michael DiCuccio; Ron Edgar; Scott Federhen; Michael Feolo; Lewis Y Geer; Wolfgang Helmberg; Yuri Kapustin; David Landsman; David J Lipman; Thomas L Madden; Donna R Maglott; Vadim Miller; Ilene Mizrachi; James Ostell; Kim D Pruitt; Gregory D Schuler; Edwin Sequeira; Stephen T Sherry; Martin Shumway; Karl Sirotkin; Alexandre Souvorov; Grigory Starchenko; Tatiana A Tatusova; Lukas Wagner; Eugene Yaschenko; Jian Ye
Journal:  Nucleic Acids Res       Date:  2008-10-21       Impact factor: 16.971

9.  Standardized description of scientific evidence using the Evidence Ontology (ECO).

Authors:  Marcus C Chibucos; Christopher J Mungall; Rama Balakrishnan; Karen R Christie; Rachael P Huntley; Owen White; Judith A Blake; Suzanna E Lewis; Michelle Giglio
Journal:  Database (Oxford)       Date:  2014-07-22       Impact factor: 3.451

10.  The immune epitope database (IEDB) 3.0.

Authors:  Randi Vita; James A Overton; Jason A Greenbaum; Julia Ponomarenko; Jason D Clark; Jason R Cantrell; Daniel K Wheeler; Joseph L Gabbard; Deborah Hix; Alessandro Sette; Bjoern Peters
Journal:  Nucleic Acids Res       Date:  2014-10-09       Impact factor: 16.971
