The IMGT/HLA database.

James Robinson, Jason A Halliwell, Hamish McWilliam, Rodrigo Lopez, Peter Parham, Steven G E Marsh.

Abstract

It is 14 years since the IMGT/HLA database was first released, providing the HLA community with a searchable repository of highly curated HLA sequences. The HLA complex is located within the 6p21.3 region of human chromosome 6 and contains more than 220 genes of diverse function. Of these, 21 genes encode proteins of the immune system that are highly polymorphic. The naming of these HLA genes and alleles and their quality control is the responsibility of the World Health Organization Nomenclature Committee for Factors of the HLA System. Through the work of the HLA Informatics Group and in collaboration with the European Bioinformatics Institute, we are able to provide public access to these data through the website http://www.ebi.ac.uk/imgt/hla/. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the HLA community and the wider research and clinical communities. This article describes the latest updates and additional tools added to the IMGT/HLA project.

Substances: HLA Antigens

Year: 2012 PMID: 23080122 PMCID: PMC3531221 DOI: 10.1093/nar/gks949

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The IMGT/HLA database was established to provide a locus-specific database (LSDB) for the allelic sequences of the genes in the HLA system, also known as the human major histocompatibility complex (MHC). The MHC is one of the most complex and polymorphic regions of the human genome, with more than 220 genes (1). The core genes of interest in the HLA system are 21 highly polymorphic HLA genes, found within the 6p21.3 region of the short arm of human chromosome 6, whose protein products mediate human responses to infectious disease and influence the outcome of cell and organ transplants. Three distinct regions have been identified within the MHC. The class I region is located at the telomeric end of the MHC and contains the genes for the HLA class I molecules, HLA-A, -B and -C. These are co-dominantly expressed on the cell surface and are responsible for presenting intracellularly derived peptides to CD8-positive T cells. The class II region lies at the centromeric end of the MHC and contains the HLA class II genes HLA-DRA, -DRB1, -DRB3, -DRB4, -DRB5, -DQA1, -DQB1, -DPA1 and -DPB1. HLA class II expression is limited to cells involved in immune responses, where these molecules present extracellularly derived peptides to CD4-positive T cells. Between the class I and class II regions lies the class III region, where a number of non-HLA genes with immune function are located. With a nomenclature covering more than 50 genes and 8000 alleles, there is an obvious need for a curated LSDB to manage these highly polymorphic variants. The first public release of the IMGT/HLA database was made on 16 December 1998 (2). Since then the database has been updated every 3 months, in a total of 55 releases, to include all the publicly available sequences officially named by the World Health Organization (WHO) Nomenclature Committee at the time of release.
The naming of new HLA genes and allele sequences and their quality control is the responsibility of the WHO Nomenclature Committee for Factors of the HLA System, which first met in 1968. This committee meets regularly to discuss the issues of nomenclature and has published 19 major reports (3–21), initially documenting the serologically defined HLA antigens and more recently the genes and alleles defined by nucleotide sequences. The IMGT/HLA database provides the nomenclature committee with the online tools necessary for its task. The dissemination of new allele names and sequences is of paramount importance in the clinical transplant setting, because the variation that distinguishes HLA alleles can have a critical impact on the outcome of a haematopoietic stem cell transplant (22,23). The identification, verification and publication of the sequences of these variants through a centralized resource are necessary for accurate identification of HLA alleles in a clinical setting. Sequencing of HLA alleles began in the late 1970s, predominantly using protein-based techniques to determine the sequences of HLA class I allotypes. The first complete HLA class I allotype sequence, B7.2, now known as B*07:02:01, was published in 1979 (24). The first HLA class II allele, DRA*01:01, was defined first by protein sequencing and later, in 1982, by DNA sequencing (25–27). The first HLA DNA sequences or alleles were named by the WHO Nomenclature Committee for Factors of the HLA System (10) in 1987. At that time, 12 class I alleles and 9 class II alleles were named; by comparison, in the first 8 months of 2012 alone, the WHO Nomenclature Committee was able to assign names to 1163 alleles (Figure 1).

Figure 1.

The number of HLA alleles named each year and included in the IMGT/HLA Database. The recent surge in the number of submissions received by the database is clearly shown.

IMGT/HLA DATA SOURCES

The IMGT/HLA database receives submissions from laboratories across the world. These submissions are curated and analysed, and if they meet the strict requirements, an official allele designation is assigned. The IMGT/HLA database is the official repository for the WHO Nomenclature Committee for Factors of the HLA System, and submission to it is the only way of receiving an official allele designation for a sequence. The sequence is then incorporated into the next 3-monthly release of the database. Since its release in December 1998, the database has received over 14 000 submissions. These submissions come from a variety of sources; the majority are from laboratories involved in clinical HLA typing for hospitals or donor registries, or from commercial organizations performing contract HLA typing for large haematopoietic stem cell donor registries. Further data have been submitted following large-scale genome sequencing projects (1,28). All submissions must meet strict acceptance criteria before the sequence receives an official designation. These minimum standards cover the methodologies used to define the sequence, the length of sequence submitted and the source of the sequence; the full list of the minimum criteria can be seen at http://www.ebi.ac.uk/imgt/hla/subs/submit.html. Around 3% of the submissions received fail to meet these criteria and are rejected. In addition, all the submissions received by the IMGT/HLA database are also available from the International Nucleotide Sequence Database Collaboration (INSDC) (29). The INSDC consists of the DNA DataBank of Japan (Japan), GenBank (USA) and the EMBL-European Nucleotide Archive (ENA) (UK) (30–32). The ENA entries also contain database cross-references to the IMGT/HLA entries. Cross-references to the IMGT/HLA database are also included in ENSEMBL (33) and vertebrate genome annotation (VEGA) entries (34).

TOOLS AVAILABLE AT IMGT/HLA

The IMGT/HLA database provides a variety of tools for the analysis of HLA sequences. Some of these tools were custom written for the IMGT/HLA database, and others were incorporated from the existing set of tools provided on the European Bioinformatics Institute’s (EBI) website (35,36). The website (Figure 2) includes tools for producing user-defined sequence alignments at the protein, cDNA and gDNA levels. The user is also able to perform queries for particular HLA alleles; the output provides access to detailed information on any HLA allele, including information on the ethnic origin of the source, database cross-references and seminal publications. This information is also available through integration with the Sequence Retrieval System (SRS) service at EBI (37).

Figure 2.

The IMGT/HLA homepage, which acts as a portal to the different tools provided on the website.

Tools have also been developed to support the laboratories that sequence HLA. The use of sequence-based typing (SBT) as a method for defining the HLA type is well documented (38,39); most SBT typing strategies currently employed use the exon 2 and exon 3 sequences for HLA class I analysis and exon 2 alone for HLA class II analysis. Because SBT analyses both copies of a gene together as a heterozygous sequence, many different pairs of alleles can give the same result, leaving the typing ambiguous; currently, there are over 60 000 recognized ambiguous combinations. The IMGT/HLA database maintains and regularly updates a listing of these ambiguous allele combinations. The document also includes a list of all alleles that are identical over exons 2 + 3 for HLA class I and exon 2 for HLA class II. Where possible, sequence data, both nucleotide and protein, from the IMGT/HLA database are incorporated into the EBI’s suite of search tools, including FASTA (40) and BLAST (41), and are downloadable from the EBI’s File Transfer Protocol (FTP) directory in a variety of commonly used formats such as FASTA, MSF and PIR.
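How two different allele pairs can produce the same heterozygous SBT result can be illustrated with a minimal Python sketch. It is not the method used to compile the IMGT/HLA ambiguity listing: the four `TOY*` alleles and their four-base sequences are invented for illustration, and the function names are hypothetical. Each pair of alleles is merged into a consensus using the standard IUPAC two-base ambiguity codes, and pairs that share a consensus are grouped together.

```python
from collections import defaultdict
from itertools import combinations

# IUPAC codes for the one or two bases observed at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

# Invented toy sequences, NOT real HLA data.
TOY_ALLELES = {
    "TOY*01": "GATC",
    "TOY*02": "AATT",
    "TOY*03": "GATT",
    "TOY*04": "AATC",
}


def heterozygous_consensus(seq1, seq2):
    """Merge two equal-length sequences into one IUPAC-coded consensus."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return "".join(IUPAC[frozenset((a, b))] for a, b in zip(seq1, seq2))


def ambiguous_pairs(alleles):
    """Return consensus -> allele pairs, for consensuses shared by 2+ pairs."""
    by_consensus = defaultdict(list)
    for name1, name2 in combinations(sorted(alleles), 2):
        consensus = heterozygous_consensus(alleles[name1], alleles[name2])
        by_consensus[consensus].append((name1, name2))
    return {c: pairs for c, pairs in by_consensus.items() if len(pairs) > 1}


if __name__ == "__main__":
    for consensus, pairs in ambiguous_pairs(TOY_ALLELES).items():
        print(consensus, pairs)
```

In this toy data the pairs TOY\*01 + TOY\*02 and TOY\*03 + TOY\*04 differ only in which variants sit on the same chromosome, so both give the consensus `RATY` and cannot be told apart without phase information.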

LATEST DEVELOPMENTS

In 2012, the IMGT/HLA database added an Extensible Markup Language (XML) export to the data formats available. XML is a simple but flexible language that defines a set of rules for encoding documents in a format that is both human and machine readable. Designed to meet the challenges of large-scale electronic publishing, XML is playing an increasingly important role in the exchange of scientific data. The data format has been developed in a collaborative project between the HLA Informatics Group of the Anthony Nolan Research Institute and the Bioinformatics Department of the National Marrow Donor Program (NMDP). The NMDP Bioinformatics group has had previous success in developing an XML format for electronically communicating HLA typing data, the Histoimmunogenetics Markup Language file format (42). This experience facilitated the collaboration to develop a similar format for publishing the data contained within each release of the IMGT/HLA database. The new format combines the data present in the multiple files of each quarterly IMGT/HLA release into a single file. The IMGT/HLA database provides an FTP site for the retrieval of sequences in a number of pre-formatted files. The sequences are provided in FASTA, PIR and MSF formats, together with an archive of the sequence alignments and a copy of the database formatted in the style of an ENA flat file. The NMDP Bioinformatics Department has also developed a suite of tools for importing data into different database schemas, both open source and proprietary, allowing incorporation into different laboratory systems (Figure 3). Additional XML exports are being developed for other sections of the IMGT/HLA database. Further developmental work on a suite of tools for integrating the XML into laboratory systems used by HLA-typing laboratories is underway.
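As a rough illustration of how a laboratory system might consume a single-file XML export, the following Python sketch parses a small inline document with the standard library. The element and attribute names (`alleles`, `allele`, `name`, `locus`, `sequence`) are assumptions made for this example and are not the published IMGT/HLA XML schema; the sequences are placeholders, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical layout: NOT the real IMGT/HLA export schema.
# The sequences are placeholders, not real allele sequences.
SAMPLE_XML = """\
<alleles release="example">
  <allele name="B*07:02:01" locus="B">
    <sequence>ACGTACGT</sequence>
  </allele>
  <allele name="DRA*01:01" locus="DRA">
    <sequence>TTGACCAA</sequence>
  </allele>
</alleles>
"""


def alleles_by_locus(xml_text):
    """Group (allele name, sequence) tuples by locus."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for allele in root.iter("allele"):
        sequence = allele.findtext("sequence", default="").strip()
        grouped[allele.get("locus")].append((allele.get("name"), sequence))
    return dict(grouped)


if __name__ == "__main__":
    for locus, entries in alleles_by_locus(SAMPLE_XML).items():
        print(locus, entries)
```

For a full release file, which is much larger than this sample, a streaming parser such as `xml.etree.ElementTree.iterparse` would avoid loading the whole document into memory.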

Figure 3.

The IMGT/HLA export combines a number of existing file formats and data sources into a single format. The data are available from the IMGT/HLA database. The tool set is available from the Bioinformatics Group of the National Marrow Donor Program. Together these allow the user to import the HLA data directly into their local database structure.

HLA matching is a critical factor when considering potential donors for patients receiving allogeneic transplants for haematological disorders (22,23). The most recent development on the IMGT/HLA website is an online tool that implements the T-cell epitope matching algorithm described by Zino et al. (43–45) and updated by Fleischhauer and Shaw (46). This algorithm classifies the HLA-DPB1 alleles into a number of groups based on functional studies and protein motifs. Predictive analysis of the HLA-DPB1 mismatches between patient and donor based on T-cell epitope (TCE) groups has the potential to distinguish mismatches that are tolerated (permissive) from those that increase the risk of poor clinical outcome (non-permissive). This tool allows the user to enter the HLA-DPB1 typing of a prospective patient and donor pair and view the predicted TCE groups and the resulting prediction of the effect of mismatching when selecting appropriate donors for haematopoietic stem cell transplant (HSCT) recipients. Any allele that does not have a TCE group ‘protein’ is analysed for a motif match to particular protein motifs of those alleles with known TCE group. If the tool needs to predict the TCE group for an allele, then a warning is issued within the output to the user, to ensure that the lack of functional studies is acknowledged. The implementation of an easy-to-use online tool makes it simple for all staff involved in selecting donors for transplantation to factor DPB1 mismatches into their own search algorithms and procedures.
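The general shape of a group-based comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not the IMGT/HLA tool or the published algorithm. It assumes a three-group model in which group 1 is the most immunogenic, and a rule that a mismatch is permissive when the most immunogenic group carried by the patient equals that carried by the donor. The group numbers in the table are placeholders chosen for the example and must not be used for donor selection; the function names are hypothetical.

```python
# Placeholder group table for illustration only: 1 = most immunogenic.
# These assignments are assumptions, NOT curated TCE groups.
ILLUSTRATIVE_TCE_GROUPS = {
    "DPB1*09:01": 1,
    "DPB1*03:01": 2,
    "DPB1*02:01": 3,
    "DPB1*04:01": 3,
    "DPB1*04:02": 3,
}


def strongest_group(alleles, groups):
    """Return the lowest (most immunogenic) group number among the alleles.

    Raises KeyError for an allele missing from the table; this sketch does
    not attempt the motif-based prediction described in the text.
    """
    return min(groups[allele] for allele in alleles)


def classify_pair(patient, donor, groups=ILLUSTRATIVE_TCE_GROUPS):
    """Classify a patient/donor DPB1 pair under the assumed rule."""
    if sorted(patient) == sorted(donor):
        return "matched"
    if strongest_group(patient, groups) == strongest_group(donor, groups):
        return "permissive mismatch"
    return "non-permissive mismatch"


if __name__ == "__main__":
    patient = ("DPB1*04:01", "DPB1*02:01")
    for donor in [("DPB1*04:01", "DPB1*04:02"), ("DPB1*04:01", "DPB1*09:01")]:
        print(donor, "->", classify_pair(patient, donor))
```

Raising an error for an allele without a group keeps the gap visible in this sketch, in the same spirit as the warning the online tool issues when it has to predict a group.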

FUTURE DEVELOPMENTS

A major challenge for the database is to keep up with the increasing number of allele sequences that are being submitted. In recent years, the number of sequences in the database increased on average by 29% each year. The database must develop new tools for the visualization of sequences while maintaining the high standards set in the presentation and quality of the HLA sequences and nomenclature to the research community. The database aims to continually develop new tools and refine existing tools to meet this challenge.
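To give a sense of what sustained growth at this rate implies, the short Python sketch below applies simple compound-growth arithmetic. It assumes, purely for illustration, that the 29% average annual increase continues unchanged; the function names are hypothetical and the projection is not a forecast made by the database.

```python
import math


def doubling_time_years(annual_rate):
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def projected_count(current_count, annual_rate, years):
    """Project a count forward assuming constant compound growth."""
    return current_count * (1 + annual_rate) ** years


if __name__ == "__main__":
    rate = 0.29  # average annual increase quoted in the text
    print(f"Doubling time: {doubling_time_years(rate):.1f} years")
    # 8000 is the approximate allele count quoted in the Introduction.
    print(f"After 3 years: {projected_count(8000, rate, 3):.0f} sequences")
```

At a constant 29% per year, the number of sequences would double roughly every 2.7 years.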

CONCLUSIONS

The IMGT/HLA database provides a centralized resource for everybody interested, clinically or scientifically, in the HLA system. The database and accompanying tools allow the study of HLA alleles from a single site on the World Wide Web. It aids in the management and development of HLA nomenclature, providing a continuing and updated resource for the WHO Nomenclature Committee. The challenges for the database are to keep up with the increase in submitted sequences, to keep pace with the increasing difficulty of performing analyses on the larger datasets and to develop new tools for the visualization of the sequences, while maintaining the high standards set in the presentation and quality of the HLA sequences and nomenclature provided to the research community.

LICENSING

The IMGT/HLA database is covered by the Creative Commons Attribution-NoDerivs Licence, which applies to all copyrightable parts of the database, including the sequence alignments. This means that users are free to copy, distribute, display and make commercial use of the database in all jurisdictions, provided they give the appropriate credit (47,48). Users who intend to distribute a modified version of the data in any form must first ask us for permission; contact hla@alleles.org for further details of how modified data can be reproduced.

FUNDING

Histogenetics; One Lambda Inc.; Conexio; Abbott Molecular Laboratories Inc.; European Federation for Immunogenetics; Gen-Probe; LabCorp; Life Technologies; Olerup SSP; 454 Sequencing; American Society for Histocompatibility and Immunogenetics; Anthony Nolan; Asia-Pacific Histocompatibility and Immunogenetics Association; BAG Healthcare; Be the Match Foundation; DKMS; Inno-train Diagnostik GMBH; National Marrow Donor Program; Rose and Zentrum Knochenmarkspender-Register Deutschland; Imperial Cancer Research Fund (now Cancer Research UK) and an EU Biotech grant [BIO4CT960037]. Funding for open access charge: The publication costs will be met by the Anthony Nolan Research Institute. Conflict of interest statement. None declared.

45 in total

1. Nomenclature for factors of the HLA system.

Journal: Bull World Health Organ Date: 1975 Impact factor: 9.408

2. Public web-based services from the European Bioinformatics Institute.

Authors: Nicola Harte; Ville Silventoinen; Emmanuel Quevillon; Stephen Robinson; Kimmo Kallio; Xavier Fustero; Pravin Patel; Petteri Jokinen; Rodrigo Lopez
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

Review 3. Nomenclature for factors of the HLA system, 2010.

Authors: S G E Marsh; E D Albert; W F Bodmer; R E Bontrop; B Dupont; H A Erlich; M Fernández-Viña; D E Geraghty; R Holdsworth; C K Hurley; M Lau; K W Lee; B Mach; M Maiers; W R Mayr; C R Müller; P Parham; E W Petersdorf; T Sasazuki; J L Strominger; A Svejgaard; P I Terasaki; J M Tiercy; J Trowsdale
Journal: Tissue Antigens Date: 2010-04

4. Nomenclature for factors of the HL-A system.

Journal: Bull World Health Organ Date: 1972 Impact factor: 9.408

5. Complete amino acid sequence of a papain-solubilized human histocompatibility antigen, HLA-B7. 2. Sequence determination and search for homologies.

Authors: H T Orr; J A López de Castro; D Lancet; J L Strominger
Journal: Biochemistry Date: 1979-12-11 Impact factor: 3.162

6. Nomenclature for factors of the HLA system--1977.

Journal: Tissue Antigens Date: 1978-02

7. Nonpermissive HLA-DPB1 disparity is a significant independent risk factor for mortality after unrelated hematopoietic stem cell transplantation.

Authors: Roberto Crocchiolo; Elisabetta Zino; Luca Vago; Rosi Oneto; Barbara Bruno; Simona Pollichieni; Nicoletta Sacchi; Maria Pia Sormani; Jessica Marcon; Teresa Lamparelli; Renato Fanin; Lucia Garbarino; Valeria Miotti; Giuseppe Bandini; Alberto Bosi; Fabio Ciceri; Andrea Bacigalupo; Katharina Fleischhauer
Journal: Blood Date: 2009-06-10 Impact factor: 22.113

8. A new bioinformatics analysis tools framework at EMBL-EBI.

Authors: Mickael Goujon; Hamish McWilliam; Weizhong Li; Franck Valentin; Silvano Squizzato; Juri Paern; Rodrigo Lopez
Journal: Nucleic Acids Res Date: 2010-05-03 Impact factor: 16.971

9. Diverging effects of HLA-DPB1 matching status on outcome following unrelated donor transplantation depending on disease stage and the degree of matching for other HLA alleles.

Authors: B E Shaw; N P Mayor; N H Russell; J F Apperley; R E Clark; J Cornish; P Darbyshire; M E Ethell; J M Goldman; A-M Little; S Mackinnon; D I Marks; A Pagliuca; K Thomson; S G E Marsh; J A Madrigal
Journal: Leukemia Date: 2009-11-19 Impact factor: 11.528

10. Nomenclature for factors of the HL-A system.

Journal: Bull World Health Organ Date: 1968 Impact factor: 9.408
