
IMGT® databases, related tools and web resources through three main axes of research and development.

Taciana Manso1, Géraldine Folch1, Véronique Giudicelli1, Joumana Jabado-Michaloud1, Anjana Kushwaha1, Viviane Nguefack Ngoune1, Maria Georga1, Ariadni Papadaki1, Chahrazed Debbagh1, Perrine Pégorier1, Morgane Bertignac1, Saida Hadi-Saljoqi1, Imène Chentli1, Karima Cherouali1, Safa Aouinti1, Amar El Hamwi1, Alexandre Albani1, Merouane Elazami Elhassani1, Benjamin Viart1, Agathe Goret1, Anna Tran1, Gaoussou Sanou1, Maël Rollin1, Patrice Duroux1, Sofia Kossida1.   

Abstract

IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org/, is at the forefront of the immunogenetics and immunoinformatics fields with more than 30 years of experience. IMGT® makes available databases and tools to the scientific community pertaining to the adaptive immune response, based on the IMGT-ONTOLOGY. We focus on the recent features of the IMGT® databases, tools, reference directories and web resources, within the three main axes of IMGT® research and development. Axis I consists in understanding the adaptive immune response, by deciphering the identification and characterization of the immunoglobulin (IG) and T cell receptor (TR) genes in jawed vertebrates. It is the starting point of the two other axes, namely the analysis and exploration of the expressed IG and TR repertoires based on comparison with IMGT reference directories in normal and pathological situations (Axis II) and the analysis of amino acid changes and functions of 2D and 3D structures of antibody and TR engineering (Axis III).
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2022        PMID: 34875068      PMCID: PMC8728119          DOI: 10.1093/nar/gkab1136

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The adaptive immune response appeared with the jawed vertebrates (or Gnathostomata), 450 million years ago. It is characterized by a remarkable immune specificity and memory, which are properties of the B and T cells owing to the extreme diversity of their antigen receptors, the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (1). In humans and other mammals, an IG consists of two identical light chains (Kappa (IGK) or Lambda (IGL)) and two identical heavy chains (IGH) (2), while a TR consists of two chains, either Alpha (TRA) and Beta (TRB), or Gamma (TRG) and Delta (TRD) (3). Each IG and TR chain comprises a variable domain (V-DOMAIN), which determines the specificity for the antigen, and a constant region (C-REGION). The V-DOMAIN results from the genomic DNA rearrangement of variable (V), diversity (D) and joining (J) genes for IGH, TRB and TRD chains (V-D-J-REGION) and of V and J genes for IGK, IGL, TRA and TRG chains (V-J-REGION) (Supplementary Figure S1). Additional mechanisms occurring during the rearrangements (N diversity, somatic hypermutations for the IG) contribute to the extreme diversity of the IG and TR (theoretically 10^12 different IG and TR per individual, a number limited only by the number of B and T cells that an organism is genetically programmed to produce). IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org) (4), was created in 1989 in order to characterize the genes and alleles involved in the IG and TR synthesis of vertebrates. IMGT® is an integrated knowledge system for sequences, genes and structures of the IG or antibodies, TR and major histocompatibility proteins (MH) of the adaptive immune responses, as well as of other proteins of the IG superfamily (IgSF) and MH superfamily (MhSF) of vertebrates and invertebrates. IMGT® comprises 7 databases, 17 online tools (Figure 1A) and >20 000 pages of Web resources.
Figure 1.

IMGT resources. (A) Overview of IMGT databases and tools for genes, sequences and structures. (B) Main databases and datasets in the three axes of IMGT information system.

The accuracy and the consistency of the IMGT® data are based on IMGT-ONTOLOGY (5,6), the first ontology for immunogenetics and immunoinformatics, and on the IMGT Scientific chart rules. IMGT-ONTOLOGY includes the IMGT structured terminology and the annotation rules and is composed of seven axioms. The IDENTIFICATION axiom provides the standardized keywords for the identification of nucleotide and protein sequences and of 3D structures. The DESCRIPTION axiom comprises the IMGT standardized labels for the description and the delimitation of constitutive motifs within sequences and structures. The CLASSIFICATION axiom defines the criteria for classifying IG and TR genes and alleles and for setting the standardized nomenclature. The NUMEROTATION axiom includes the IMGT unique numbering and its graphical 2D representation, the IMGT Collier de Perles. The LOCALIZATION axiom characterizes the localization of IG and TR genes. The ORIENTATION axiom defines the orientation of genomic instances (chromosome, locus and gene) of DNA strands. The OBTENTION axiom specifies the biological and methodological origins of the IMGT data (5,6). IMGT® comprises in particular databases which are specialized in nucleotide sequences (IMGT/LIGM-DB) (7), genes and alleles (IMGT/GENE-DB) (8), amino acid sequences and 2D structures (IMGT/2Dstructure-DB), 3D structures (IMGT/3Dstructure-DB) (9), and therapeutic monoclonal antibodies (IG, mAb) and other proteins for clinical applications (IMGT/mAb-DB) (4).
These IMGT databases, the related tools and Web resources are described in this manuscript through the three main axes of IMGT research and development: the identification and characterization of IG and TR genes and knowledge of their genomic organization (Axis I), the analysis and exploration of the expressed IG and TR repertoires in normal and pathological situations (Axis II) and the analysis of adaptive immune proteins from antigen receptor to amino acid changes (Axis III) (Figure 1B).

AXIS I Understanding the adaptive immune response: gene characterization and knowledge of their genomic organization

IG and TR chains are encoded by polymorphic multigene families located on different chromosomes. In humans and other mammals, there are seven main loci for IG and TR: three for IG (IGH, IGK and IGL) (2,10) and four for TR (TRA, TRB, TRD and TRG) (3). The V, D, J and constant (C) IMGT gene names were assigned according to the concepts of the CLASSIFICATION axiom (5,6), were approved by the Human Genome Organization (HUGO) Nomenclature Committee (HGNC) for human (11) in 1999 and were endorsed by the WHO IUIS Nomenclature Subcommittee for IG and TR (12). The characterization of the genes and alleles for the seven loci of human (Homo sapiens) and mouse (Mus musculus) was published in 2001 and 2005. The organization of the genes within these loci was deduced and built from the complete annotation of the genomic nucleotide sequences and contigs integrated in the IMGT nucleotide sequence database IMGT/LIGM-DB (7) from the European Nucleotide Archive (ENA) (13) and GenBank (14). IMGT genes and alleles are managed in the IMGT gene database IMGT/GENE-DB (8) and displayed in IMGT Repertoire (IMGT Web resources) and IMGT tools (http://www.imgt.org/IMGTposters/Poster-10th-Biocuration-Conference2017.pdf). With the introduction of genome assemblies, which have become available in NCBI Assembly (15) and Ensembl (16), IMGT® developed a new approach and new concepts in order to decipher complete IG and TR loci. First, IMGT® defines conserved genes that flank the IG and TR loci, designated as ‘IMGT bornes’. IMGT bornes are genes coding for proteins other than IG or TR, which are conserved among species. They are located either upstream of the first IG or TR gene (IMGT_locus_5prime_borne) or downstream of the last IG or TR gene (IMGT_locus_3prime_borne) of the IMGT locus. If the IMGT bornes are identified and are at most 10 kb away from the closest IG or TR genes, they are included in the locus genomic nucleotide sequences available through IMGT/LIGM-DB.
These IMGT bornes provide a standardized delimitation of the locus whatever the species, and they are helpful for comparative genomics. However, such conserved non-IG, non-TR genes have not yet been defined for every locus (n.d.), for example for the IGH locus. In the absence of an IMGT borne, the limit of the locus is set arbitrarily at 10 kb upstream (5′) of the first IG or TR gene and 10 kb downstream (3′) of the last IG or TR gene. TRB is an example of a locus delimited by IMGT bornes; it can be accessed at http://www.imgt.org/IMGTrepertoire/LocusGenes/bornes/bornesTRB.html.
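The delimitation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IMGT's implementation: the `Gene` class, the `locus_limits` function and all coordinates are hypothetical, the locus is assumed to lie on the forward strand, and a borne located more than 10 kb away is treated like a missing borne, a case the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FLANK_BP = 10_000  # the 10 kb limit quoted in the text


@dataclass
class Gene:
    """Hypothetical record: a gene with 1-based inclusive forward-strand coordinates."""
    name: str
    start: int
    end: int


def locus_limits(first_gene: Gene, last_gene: Gene,
                 borne_5prime: Optional[Gene] = None,
                 borne_3prime: Optional[Gene] = None) -> Tuple[int, int]:
    """Return (start, end) of the genomic sequence to extract for a locus."""
    # 5' side: include the borne if it is known and at most 10 kb from the first gene.
    if borne_5prime is not None and first_gene.start - borne_5prime.end <= FLANK_BP:
        start = borne_5prime.start
    else:
        # Assumption: a missing (or too distant) borne falls back to a 10 kb flank.
        start = max(1, first_gene.start - FLANK_BP)
    # 3' side: same rule, mirrored.
    if borne_3prime is not None and borne_3prime.start - last_gene.end <= FLANK_BP:
        end = borne_3prime.end
    else:
        end = last_gene.end + FLANK_BP
    return start, end


# Illustrative coordinates only (not taken from any assembly).
first = Gene("V-first", 50_000, 50_500)
last = Gene("C-last", 90_000, 91_000)
near_borne = Gene("borne-5prime", 44_000, 45_000)
print(locus_limits(first, last, borne_5prime=near_borne))
```

With a 5′ borne 5 kb upstream and no 3′ borne, the extracted span starts at the borne and ends 10 kb past the last gene.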

IMGT/LIGM-DB

IMGT/LIGM-DB provides standardized and detailed immunogenetics annotations for IG, TR and MH nucleotide sequences from human and other vertebrate species (7). IMGT/LIGM-DB includes sequences from different steps of IG and TR synthesis and therefore integrates: (i) large germline (non-rearranged) genomic DNA (gDNA) sequences, which may cover a complete locus from several hundred kilobases to one (or more) megabase(s); (ii) rearranged gDNA sequences resulting from the recombination of V and J genes or of V, D and J genes; and (iii) rearranged V-J-C and V-D-J-C complementary DNA (cDNA) sequences. Most of the IMGT/LIGM-DB nucleotide sequences come from ENA and from GenBank and keep the same accession numbers to facilitate interoperability with the generalist nucleotide databases. More recently, with the extraction of IG and TR loci nucleotide sequences from NCBI genome assemblies, IMGT® created new IMGT/LIGM-DB accession numbers starting with ‘IMGT’ followed by 6 digits. IMGT/LIGM-DB sequences are annotated according to IMGT-ONTOLOGY concepts of the DESCRIPTION axiom (5,6), with IMGT labels (http://www.imgt.org/ligmdb/label) and IMGT qualifiers (http://www.imgt.org/ligmdb/qualifier.action). In order to delimit and annotate a complete IG or TR locus extracted from genome assemblies, a specific IMGT label and a set of IMGT qualifiers have been created for its description (Table 1).
Table 1.

New IMGT concepts and their definitions

Concept type | New IMGT concept | Definition
IMGT label | IMGT-LOCUS-UNIT | gDNA of an immunoglobulin (IG) or T cell receptor (TR) IMGT locus unit from chromosome genomic assembly, that starts at the 5 prime (5′) end of the most 5′ IG or TR GENE-UNIT in the locus and ends at the 3 prime (3′) end of the most 3′ IG or TR GENE-UNIT in the locus
IMGT qualifier | IMGT_locus_3prime_borne | Name of the gene identified as the 3′ borne of an IMGT-LOCUS-UNIT
IMGT qualifier | IMGT_locus_3prime_gene | IMGT gene name of the most 3′ IG or TR GENE-UNIT of an IMGT-LOCUS-UNIT
IMGT qualifier | IMGT_locus_5prime_borne | Name of the gene identified as the 5′ borne of an IMGT-LOCUS-UNIT
IMGT qualifier | IMGT_locus_5prime_gene | IMGT gene name of the most 5′ IG or TR GENE-UNIT of an IMGT-LOCUS-UNIT
IMGT qualifier | IMGT_locus_length | Length of an IMGT-LOCUS-UNIT in kb or in bp
IMGT qualifier | IMGT_locus_name | Name of an IMGT-LOCUS-UNIT, that includes the Latin genus and species name and the IMGT locus type
IMGT qualifier | IMGT_locus_orientation | Orientation of an IMGT-LOCUS-UNIT on a chromosome, is either forward (FWD) or reverse (REV)
IMGT qualifier | IMGT_locus_positions | Positions of an IMGT-LOCUS-UNIT on a chromosome
IMGT qualifier | IMGT_locus_type | IMGT locus type (in higher vertebrates: IGH, IGK, IGL, TRA, TRB, TRG, TRD) of an IMGT-LOCUS-UNIT
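To make the Table 1 definitions concrete, the following Python sketch derives some of the qualifiers from a list of gene units and checks the ‘IMGT’ plus six digits accession pattern mentioned earlier. It is a hedged illustration: `GeneUnit`, `describe_locus_unit`, `is_imgt_locus_accession`, the gene names and the coordinates are hypothetical, the string format chosen for IMGT_locus_name is an assumption, and the two borne qualifiers are left out.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# 'IMGT' followed by 6 digits, as described for the new IMGT/LIGM-DB accession numbers.
IMGT_LOCUS_ACCESSION = re.compile(r"IMGT\d{6}")


def is_imgt_locus_accession(accession: str) -> bool:
    return IMGT_LOCUS_ACCESSION.fullmatch(accession) is not None


@dataclass(frozen=True)
class GeneUnit:
    """Hypothetical GENE-UNIT with 1-based inclusive chromosome coordinates."""
    name: str
    start: int
    end: int


def describe_locus_unit(genus_species: str, locus_type: str,
                        orientation: str, gene_units: List[GeneUnit]) -> Dict[str, object]:
    """Build a dictionary of Table 1 qualifiers for one IMGT-LOCUS-UNIT."""
    if orientation not in ("FWD", "REV"):
        raise ValueError("orientation must be 'FWD' or 'REV'")
    if not gene_units:
        raise ValueError("at least one gene unit is required")
    ordered = sorted(gene_units, key=lambda g: g.start)
    start = ordered[0].start
    end = max(g.end for g in gene_units)
    # On a reverse-oriented locus the most 5' gene has the highest chromosome coordinate.
    five, three = (ordered[0], ordered[-1]) if orientation == "FWD" else (ordered[-1], ordered[0])
    return {
        "IMGT_locus_name": f"{genus_species} {locus_type}",  # assumed format
        "IMGT_locus_type": locus_type,
        "IMGT_locus_orientation": orientation,
        "IMGT_locus_positions": (start, end),
        "IMGT_locus_length": end - start + 1,  # in bp
        "IMGT_locus_5prime_gene": five.name,
        "IMGT_locus_3prime_gene": three.name,
    }


units = [GeneUnit("V1", 1000, 1500), GeneUnit("J1", 5000, 5050), GeneUnit("C1", 7000, 8000)]
print(describe_locus_unit("Bos taurus", "IGK", "REV", units))
print(is_imgt_locus_accession("IMGT000047"))
```

The locus unit spans gene unit to gene unit, so it is shorter than the extracted genomic sequence, which also carries the bornes or the 10 kb flanks.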

IMGT/LIGM-DB interface

The IMGT/LIGM-DB data are accessible via a user-friendly interface described previously (7). IMGT/LIGM-DB can be queried by accession number, by IMGT-ONTOLOGY concepts (IDENTIFICATION or keywords, CLASSIFICATION, DESCRIPTION or labels, OBTENTION), or by bibliographical references. For each nucleotide sequence, IMGT/LIGM-DB provides ‘View details’, which displays an IMGT/LIGM-DB entry according to nine topics: annotations, IMGT flat file, coding regions with protein translation, catalogue and external references, sequence in IMGT/LIGM-DB dump format, sequence in FASTA format, sequence with three reading frames, EMBL flat file, and a direct link to IMGT/V-QUEST (17). As of September 2021, IMGT/LIGM-DB contains 196,516 entries from 358 species, of which 48,682 IG and TR nucleotide sequences are fully annotated. Weekly releases of the IMGT/LIGM-DB flat files can be downloaded directly from the IMGT web site (http://www.imgt.org/download/LIGM-DB/) and from ENA (http://ftp.ebi.ac.uk/pub/databases/imgt/LIGM-DB/).

IMGT/GENE-DB

The curated IG and TR genes are entered and managed in IMGT/GENE-DB (8) with all IMGT identified alleles, which highlight the potential high polymorphism of these genes. Each allele is characterized by its IMGT reference allele sequence defined for the coding label V-REGION (with gaps according to the IMGT numbering (18)), D-REGION, J-REGION and C-REGION (or C exons) (with gaps for C-DOMAIN according to the IMGT numbering (19)) of the V, D, J and C genes respectively. An IMGT allele reference sequence is identified by IMGT/LIGM-DB accession number, IMGT gene and allele names, species, allele functionality and IMGT label. IMGT allele reference sequences compose the IMGT reference directories that are used by IMGT sequence analysis tools and by IMGT databases and IMGT Web resources for sequence comparison.
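The identifying fields of an allele reference sequence listed above can be read from a FASTA header with a short parser. The sketch below assumes a pipe-delimited header whose first five fields follow that order (accession number, gene and allele name, species, functionality, label) and an allele name written as the gene name, an asterisk and an allele number; both points should be checked against the downloaded files. The header shown and the names `AlleleRef` and `parse_header` are hypothetical.

```python
from typing import NamedTuple


class AlleleRef(NamedTuple):
    accession: str
    gene: str
    allele: str
    species: str
    functionality: str
    label: str


def parse_header(header: str) -> AlleleRef:
    """Parse the first five pipe-delimited fields of a reference sequence header."""
    fields = [field.strip() for field in header.lstrip(">").split("|")]
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    accession, name, species, functionality, label = fields[:5]
    # Assumed naming: '<gene>*<allele number>'.
    gene, sep, allele = name.partition("*")
    if not sep:
        raise ValueError(f"no allele number in {name!r}")
    return AlleleRef(accession, gene, allele, species, functionality, label)


# Made-up header for illustration; not a real IMGT entry.
ref = parse_header(">X00000|IGHV9-99*01|Homo sapiens|F|V-REGION|1..296|296 nt")
print(ref.gene, ref.allele, ref.label)
```

Keeping the gene and the allele number separate makes it straightforward to group all alleles of one gene when building a local copy of a reference directory.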

IMGT/GENE-DB interface

From the IMGT/GENE-DB Query page, searches can be performed by IMGT-ONTOLOGY concepts (IDENTIFICATION or keywords, LOCALIZATION, and CLASSIFICATION), by LOCALIZATION IN GENOME ASSEMBLIES or through IMGT/GENE-DB direct links. IMGT/GENE-DB provides full access to characterized genes and alleles and displays an IMGT/GENE-DB entry according to six topics: IMGT gene name and definition, Chromosomal localization, IMGT reference alleles, Annotated IMGT/LIGM-DB cDNA and rearranged genomic DNA sequences, Annotated IMGT/3Dstructure-DB structures, and External links. The section ‘LOCALIZATION IN GENOME ASSEMBLIES’, created in 2015, provides the localizations of the genes and alleles, and the IMGT labels, in the reference genome assemblies available at NCBI. For each gene, its orientation in the locus is mentioned, and the allele identified in the sequence of the assembly is indicated with its characteristics. The ‘IMGT/GENE-DB direct links’ make it possible to query the database dynamically by IMGT gene name or IMGT group, and to extract labels from the reference sequences of a given gene or gene group. The format for IMGT/GENE-DB direct links is described in http://www.imgt.org/genedb/directlinks. As of September 2021, IMGT/GENE-DB contains 8,498 genes and 11,349 alleles from human, mouse and other vertebrates. The reference sequences of the IG and TR genes in FASTA format are accessible by group and species from http://www.imgt.org/vquest/refseqh.html#refdir2. IMGT/GENE-DB data are also available in different formats from a dedicated section of the ‘IMGT downloads’ pages of the IMGT® portal (http://www.imgt.org/download/GENE-DB/), updated weekly. With the development of new high throughput sequencing technologies for the analysis of IG and TR repertoires, new potential alleles are highlighted by inference from expressed repertoires, particularly in human. Inferred alleles are not systematically integrated within the IMGT databases, because the sequences are not mapped.
However, IMGT® can accept inferred alleles if and only if they are validated by the Inferred Allele Review Committee (IARC), a Working Group (WG) of the Adaptive Immune Receptor Repertoire (AIRR) community. IARC ensures that IMGT data quality requirements are met. The reference sequence of an inferred allele is nevertheless replaced by the corresponding germline DNA sequence once that sequence has been characterized (20).

IMGT Repertoire

An overview of IMGT® annotated data is compiled, and knowledge pages are made available, in the IMGT Web resources ‘IMGT Repertoire’ (http://imgt.org/IMGTrepertoire/), the global ImMunoGeneTics Web Resource for IG, TR and MH of human and other vertebrate species. IMGT Repertoire includes seven organized sections: Locus and genes, Proteins and alleles, 2D and 3D structures, Probes and RFLP, Taxonomy, Gene regulation and expression, Genes and clinical entities. Novel IMGT Repertoire (IG and TR) pages were created in the Locus and genes section, focusing on the ‘Locus descriptions’, including Locus bornes, Locus in genome assembly and Locus gene order. As of September 2021, IMGT Repertoire covers 80 species. For each gene analyzed, there are >200 different information fields available in IMGT databases and web pages. IMGT Repertoire therefore bridges the gap between the curated data resulting from Axis I and the IMGT databases and tools (Table 2).
Table 2.

54 fully annotated IG and TR loci are available in IMGT databases and tools. Among these 54 loci, 50 have an IMGT locus accession number and 4 (with * in this table) have accession numbers from ENA, NCBI and Ensembl contigs built before the creation of IMGT locus accession numbers. Note that the IMGT® biocuration of the first two fully annotated species, human (Homo sapiens) and mouse (Mus musculus), is not shown in this table. More information is available at http://www.imgt.org/IMGTrepertoire/LocusGenes/

Taxon | Species | NCBI Assembly | Locus | Chromosomal localization | NCBI Chromosome Accession numbers | IMGT locus Accession numbers
MAMMALIA EUTHERIA (placentals) | Bos taurus (bovine) Breed: Hereford | ARS-UCD1.2 | IGK | 11 | CM008178.2 | IMGT000047
 | | | IGL | 17 | CM008184.2 | IMGT000046
 | | | TRA | 10 | CM008177.2 | IMGT000049
 | | | TRD | 10 | CM008177.2 | IMGT000049
 | Bos taurus (bovine) Breed: Holstein | Unknown | IGH | 21q24 | Unknown | *
 | Bos taurus (bovine) | Unknown | TRG | 4 | Unknown | *
 | Camelus dromedarius (Arabian camel) | CamDro3 | IGK | 28 | CM016654.2 | IMGT000061
 | Canis lupus familiaris (dog) Breed: Boxer | CanFam3.1 | IGH | 8 | CM000008.3 | IMGT000001
 | | | IGK | 17 | CM000017.3 | IMGT000002
 | | | IGL | 26 | CM000026.3 | IMGT000003
 | | | TRA | 8 | CM000008.3 | IMGT000004
 | | | TRB | 16 | CM000016.3 | IMGT000005
 | | | TRD | 8 | CM000008.3 | IMGT000004
 | | | TRG | 18 | CM000018.3 | IMGT000006
 | Canis lupus familiaris (dog) Breed: Basenji | Basenji_breed-1.1 | IGK | 17 | CM016447.1 | IMGT000067
 | Capra hircus (goat) Breed: San Clemente | ARS1 | IGK | 11 | CM004572.1 | IMGT000009
 | | | IGL | 17 | CM004578.1 | IMGT000033
 | Equus caballus (horse) Breed: Thoroughbred | EquCab3.0 | IGH | 24 | CM009171.1 | IMGT000040
 | | | IGK | 15 | CM009162.1 | IMGT000053
 | Equus caballus (horse) Breed: Thoroughbred | EquCab2.0 | IGK | 15 | CM000391.2 | IMGT000060
 | Felis catus (domestic cat) Breed: Abyssinian | Felis_catus_9.0 | IGK | A3 | CM001380.3 | IMGT000050
 | | | IGL | D3 | CM001389.3 | IMGT000038
 | | | TRA | B3 | CM001383.3 | IMGT000045
 | | | TRB | A2 | CM001379.3 | IMGT000037
 | | | TRD | B3 | CM001383.3 | IMGT000045
 | | | TRG | A2 | CM001379.3 | IMGT000036
 | Macaca fascicularis (crab-eating macaque) | Macaca_fascicularis_5.0 | TRB | 3 | CM001921.1 | IMGT000075
 | Macaca mulatta (Rhesus monkey) Isolate: AG07107 | Mmul_10 | IGH | 7 | CM014342.1 | IMGT000064
 | | | IGK | 13 | CM014348.1 | IMGT000063
 | | | IGL | 10 | CM014345.1 | IMGT000062
 | | | TRB | 3 | CM014338.1 | IMGT000073
 | | | TRG | 3 | CM014338.1 | IMGT000059
 | Macaca mulatta (Rhesus monkey) Isolate: 17573 | Mmul_8.0.1 | TRA | 7 | CM002991.3 | IMGT000013
 | | | TRB | 3 | CM002984.2 | IMGT000012
 | | | TRD | 7 | CM002991.3 | IMGT000013
 | Mustela putorius furo (Domestic ferret) Breed: Sable | MusPutFur1.0 | TRB | Unknown | Unplaced genomic scaffold | IMGT000023
 | Oryctolagus cuniculus (rabbit) Breed: Thorbecke inbred | OryCun2.0 | TRA | 17 | CM000806.1 | IMGT000031
 | | | TRB | Unknown | Unplaced genomic scaffold | IMGT000032
 | | | TRD | 17 | CM000806.1 | IMGT000031
 | | | TRG | 10 | CM000799.1 | IMGT000030
 | Ovis aries (sheep) Breed: Texel | Oar_v4.0 | IGK | 3 | CM001584.2 | IMGT000010
 | | | IGL | 17 | CM001598.2 | IMGT000034
 | Ovis aries (sheep) Breed: Rambouillet | Oar_rambouillet_v1.0 | IGL | 17 | CM008488.1 | IMGT000041
 | | | TRA | 7 | CM008478.1 | IMGT000048
 | | | TRB | 4 | CM008475.1 | IMGT000042
 | | | TRD | 7 | CM008478.1 | IMGT000048
 | Rattus norvegicus (Norway rat) Strain: BN; Sprague-Dawley | Rn_Celera Alternate Assembly AC_000074.1 | IGH | 6q32,33 | CM000236.2 | *
 | Sus scrofa (pig) Breed: Duroc | Sscrofa11.1 | TRB | 18 | CM000829.5 | IMGT000039
 | Tursiops truncatus (bottlenose dolphin) | turTru1 (Ensembl assembly) | TRA | Unknown | Ensembl genomic scaffold | IMGT000016, IMGT000017, IMGT000018, IMGT000020
 | | | TRD | Unknown | | IMGT000016, IMGT000017, IMGT000018
 | Tursiops truncatus (bottlenose dolphin) Isolate: MMESES2002162SC | NIST Tur_tru v1 | TRG | Unknown | Unplaced genomic scaffold | IMGT000015
Aves | Gallus gallus (chicken) Breed: Red Jungle fowl | GRCg6 | IGH | 31 | CM003638.2 | IMGT000014
 | | Gallus_gallus-5.0 | | Unknown | Unplaced genomic scaffold | IMGT000007
Teleostei | Danio rerio (zebrafish) Isolate: Tuebingen | GRCz11 | IGH | 3 | CM002887.2 | *
 | Oncorhynchus mykiss (Rainbow trout) Isolate: Swanson | Omyk_1.0 | IGH | 13 | CM007947.1 | IMGT000043
 | | | | 12 | CM007946.1 | IMGT000044
 | Salmo salar (Atlantic salmon) Breed: double haploid | ICSASG_v2 | IGH | 6 | CM003284.1 | IMGT000028
 | | | | 3 | CM003281.1 | IMGT000029
IMGT® has recently performed the biocuration of the IG and TR loci of several veterinary species, which are useful for biotechnological applications that can also be applied to human medicine (21–27). IMGT biocuration characterizes the genes and the genomic organization of the IG and TR loci, which provides a better understanding of the adaptive immune response.

AXIS II Exploring the expressed IG and TR repertoires

The analysis of the expressed IG and TR repertoires has become an essential step for the study and the understanding of the adaptive response in normal (infectious diseases, vaccination) and pathological situations (autoimmune diseases, cancers), especially since the advent of high throughput sequencing (HTS) over a decade ago. This analysis relies on the comparison of the expressed V-DOMAIN with the reference sequences of IG and TR genes and alleles. The dedicated and widely used IMGT tools for IG and TR V-DOMAIN nucleotide sequence analysis are IMGT/V-QUEST (17) and its high throughput version IMGT/HighV-QUEST (28,29). The IMGT/V-QUEST reference directories used by both tools for sequence comparison are built from the IG and TR gene and allele data of the species managed in IMGT/GENE-DB and in the IMGT Web resources. They comprise one sequence per V-REGION, D-REGION and J-REGION of the functional, ORF and in-frame pseudogene V, D and J genes and alleles, respectively. V-REGION sequences are gapped according to the IMGT unique numbering (18). Table 3 summarizes the IMGT/V-QUEST reference directories per species and locus available for V-DOMAIN analysis.
Table 3.

IMGT/V-QUEST reference directories for the analysis of rearranged V-DOMAIN (release 202135–4 on 2 September 2021).

IMGT/V-QUEST reference directories
Taxon | Species | IG | TR
MAMMALIA EUTHERIA (placentals) | Homo sapiens (human) | IGH, IGK, IGL | TRA, TRB, TRG, TRD
 | Mus musculus (mouse) | IGH, IGK, IGL | TRA, TRB, TRG, TRD
 | Aotus nancymaae (Ma's night monkey) | | TRA, TRG
 | Bos taurus (bovine) | IGH, IGK, IGL | TRA, TRG, TRD
 | Camelus dromedarius (Arabian camel) | IGK | TRB, TRG
 | Canis lupus familiaris (dog) | IGH, IGK, IGL | TRA, TRB, TRG, TRD
 | Capra hircus (goat) | IGK, IGL |
 | Equus caballus (horse) | IGH, IGK |
 | Felis catus (domestic cat) | IGK, IGL | TRA, TRB, TRG, TRD
 | Macaca fascicularis (crab-eating macaque) | IGH | TRB
 | Macaca mulatta (Rhesus monkey) | IGH, IGK, IGL | TRA, TRB, TRG, TRD
 | Mustela putorius furo (ferret) | | TRB
 | Oryctolagus cuniculus (rabbit) | IGH, IGK, IGL | TRA, TRB, TRG, TRD
 | Ovis aries (sheep) | IGH, IGK, IGL | TRA, TRB, TRD
 | Rattus norvegicus (Norway rat) | IGH, IGK, IGL |
 | Sus scrofa (pig) | IGH, IGK, IGL | TRB
 | Tursiops truncatus (bottlenose dolphin) | | TRA, TRG, TRD
 | Vicugna pacos (alpaca) | IGH |
MAMMALIA PROTHERIA (monotremes) | Ornithorhynchus anatinus (platypus) | IGH |
Aves | Gallus gallus (chicken) | IGH, IGL |
Teleostei | Danio rerio (zebrafish) | IGH, IGI | TRA, TRD
 | Oncorhynchus mykiss (Rainbow trout) | IGH | TRB
 | Salmo salar (Atlantic salmon) | IGH |
The classical functionalities of the IMGT/V-QUEST and IMGT/HighV-QUEST tools have been described previously (17,28–30), and the main results that the tools deduce from alignments with the IMGT reference directories are listed in Table 4.
Table 4.

IMGT/V-QUEST reference directory based alignment results for nucleotide V-DOMAIN analysis

IMGT/V-QUEST reference directory sets | IMGT tools | Results for IG and TR V-DOMAIN
V, D, J reference sequences per species and per locus | IMGT/V-QUEST, IMGT/HighV-QUEST | 1. Introduction of IMGT gaps according to the IMGT unique numbering (18)
 | | 2. Identification of the closest germline V, D and J genes and alleles
 | | 3. Delimitation of the FR-IMGT and CDR-IMGT
Closest germline V gene and allele | | 5. Identification of indels and their corrections (optional) (17)
 | | 6. Evaluation of the percentage of identity for the V-REGION
 | | 7. Description of mutations and amino acid (AA) changes (transitions, transversions, codon change, qualification of AA change according to the eleven IMGT AA classes (31), localisation of mutation hotspot motifs)
Closest V, D, J genes and alleles | Performed by the integrated IMGT/JunctionAnalysis (32) | 8. Analysis of the Junction
 | IMGT/V-QUEST, IMGT/HighV-QUEST | 9. Evaluation of the V-DOMAIN functionality
 | Performed by the integrated IMGT/Automat (33) | 10. Complete V-DOMAIN annotation (33)
 | IMGT/V-QUEST | 11. Advanced functionality for Clinical application: search for CLL subsets #2 and #8 (optional) (34,35)
The V-DOMAIN analysis based on the IMGT/V-QUEST directories has been extended with two new advanced functionalities. The first relates to antibody engineering and covers the analysis and annotation of scFv (sequences comprising two IG or TR V-DOMAIN covalently joined by a linker) (30). The second relates to clinical applications and identifies sequences that could be assigned to stereotyped subsets 2 and 8 of Chronic Lymphocytic Leukemia (CLL), which are associated with an unfavourable prognosis (34,35). The characterization of the IMGT clonotypes (AA) and the evaluation of profiles for clonal diversity and expression (36), performed by the statistics module of IMGT/HighV-QUEST, and the subsequent statistical analysis (37) also rely on the results deduced from the alignment with the IMGT/V-QUEST reference directory sets. IMGT reference directory sets are also used by external tools dedicated to IG and TR analysis based on sequence comparison, such as IgBLAST (38) and MiXCR (39). The IMGT/V-QUEST reference directory sets are regularly enriched with the results of Axis I, whether through the integration of a new species or the upgrade of existing repertoires. Each update gives rise to a new IMGT/V-QUEST reference directory release (see http://www.imgt.org/IMGT_vquest/data_releases). Links to the IMGT/V-QUEST reference directory sets per species, locus and gene type are available in the IMGT reference directory in FASTA format (IG and TR) from http://www.imgt.org/vquest/refseqh.html#VQUEST and from the IMGT/V-QUEST Welcome page.
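Two of the results in Table 4, the percentage of identity for the V-REGION and the split of mutations into transitions and transversions, can be illustrated with a simple position-by-position comparison. The Python sketch below is not the IMGT/V-QUEST algorithm: it assumes that the query and the closest germline V-REGION are already aligned and gapped to the same length, that gaps are written as dots, and that positions carrying a gap or an ambiguous base are skipped. The function name and the two short sequences are hypothetical.

```python
from typing import Dict

PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def compare_v_region(query: str, germline: str) -> Dict[str, float]:
    """Compare two pre-aligned, equally gapped nucleotide sequences."""
    if len(query) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = transitions = transversions = 0
    for q, g in zip(query.lower(), germline.lower()):
        if q not in "acgt" or g not in "acgt":
            continue  # skip gaps ('.') and ambiguous bases
        compared += 1
        if q == g:
            identical += 1
        elif {q, g} <= PURINES or {q, g} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    return {
        "compared": compared,
        "identity_percent": 100.0 * identical / compared,
        "transitions": transitions,
        "transversions": transversions,
    }


# Toy sequences: one g>a transition and one g>t transversion over 9 comparable positions.
result = compare_v_region("cag...atgcat", "cag...gtgcag")
print(result)
```

Codon changes, AA class changes and hotspot motifs, which the table also lists, would need the reading frame and are outside this sketch.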

AXIS III IMGT 2D and 3D structure databases and tools for analysis of the adaptive immune proteins

Considering the great complexity of the immune proteins, their interactions with the antigens and their high number of published sequences, the classification and the detailed annotation are very difficult tasks, especially at the structural level. Therefore, a specialized 3D immune protein database was established to identify the genes and alleles encoding these proteins through alignment against the amino acid IMGT reference directory, provided by Axis I. Since 2001, IMGT/3Dstructure-DB (9) has provided IMGT annotations and contact analysis for immune proteins structural data. From 2008 onwards, AA sequences of mAb and fusion proteins for immune applications from World Health Organization (WHO) - International Nonproprietary Names (INN) programme (40,41) are being incorporated in IMGT/2Dstructure-DB, a section of IMGT/3Dstructure-DB. To bring together information about therapeutic proteins and to facilitate their access, IMGT/mAb-DB was made available online in 2010. IMGT/mAb-DB extends 2D and 3D annotations with a unique resource on mAbs and relevant therapeutic metadata. Figure 2 provides a schematic representation of the whole procedure.
Figure 2.

Axis III workflow overview. The left panel displays all analysis processes: the data input panel shows the sources of the data present in the Axis III databases. The data analysis procedure is done by IMGT experts and it includes the analysis of the amino acid sequences and the 3D structures. Data are stored in IMGT/2Dstructure-DB and IMGT/3Dstructure-DB and linked to IMGT/mAb-DB. The right panel shows the user interface that provides flexible ways to query the data sets. The result page panel shows the online tools, such as IMGT/DomainGapAlign (Chain Details) and IMGT/Collier-de-Perles included in IMGT/2Dstructure-DB and Paratope/Epitope description and 3D structure incorporated in IMGT/3Dstructure-DB. IMGT/mAb-DB centralizes the information present in the other databases and adds several metadata for therapeutic proteins.


IMGT/3Dstructure-DB functionalities

The IMGT/3Dstructure-DB structural data are extracted from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) (42) and annotated according to the IMGT Scientific chart rules based on the IMGT-ONTOLOGY concepts (5,6,43). IMGT/3Dstructure-DB integrates the IMGT/DomainGapAlign tool (44), which aligns the AA sequences per domain, creates gaps according to the IMGT unique numbering and highlights differences with the closest reference genes and alleles found in the IMGT reference directory. 3D structure analysis includes chain annotation, paratope/epitope description of IG/antigen and TR/pMH complexes and contact analysis.

IMGT/2Dstructure-DB functionalities

The IMGT/2Dstructure-DB data include AA sequences of immune proteins, which are retrieved from the WHO-INN programme (41) and from the Kabat database (45). The AA sequences are analysed with the IMGT® criteria of the standardized IDENTIFICATION, DESCRIPTION, CLASSIFICATION and NUMEROTATION axioms (5,6), and the V, C and G domain sequences are numbered according to the IMGT unique numbering (18,19,44). Amino acid sequences from the WHO-INN programme have been provided since 2008 (IMGT entry type INN). This programme publishes, in biannual lists, names for pharmaceutical substances that are recognized worldwide. The IMGT INN data include mAb, fusion proteins for immune application (FPIA), composite proteins for clinical applications (CPCA) and related proteins of the immune system (RPI). The INN name, INN number, common name, commercial name, and Proposed and Recommended lists are available for each entry, along with the IMGT receptor description, the target and the molecule species. Recently, AA sequences of CAR-T (chimeric antigen receptor T cell) and TR, also from WHO-INN, were made available in IMGT/2Dstructure-DB, after IMGT experts translated the nucleotide sequences and analysed them according to standardized IMGT information on chains and domains. IMGT/2Dstructure-DB and IMGT/3Dstructure-DB share the same interface, through which amino acid sequences and 3D structures of immunological proteins can be queried and analysed. Their algorithms have recently been revised to be more robust and efficient. Around 100 new structures are automatically retrieved from PDB per month. As of September 2021, IMGT/3Dstructure-DB and IMGT/2Dstructure-DB contain 7,657 entries (6,533 PDB, 788 INN and 336 KAB).

IMGT/mAb-DB for therapeutic proteins

IMGT/mAb-DB provides a unique resource on mAbs and other therapeutic proteins. This database facilitates access to the therapeutic proteins present in IMGT/2Dstructure-DB and IMGT/3Dstructure-DB. The database is updated twice per year, in line with the WHO-INN lists. In addition, metadata are constantly enriched with information from regulatory agencies such as the FDA and EMA. As of September 2021, IMGT/mAb-DB contains 1,189 entries (1,033 IG, 53 RPI, 62 CPCA, 36 FPIA and 5 TR). IMGT/mAb-DB provides a wide range of therapeutic metadata. The ‘Specificity target name’ field allows users to select mAbs that bind to a particular antigen, for instance SARS-CoV-2. Results are returned in table format; for example, nine entries (eight mAbs and one CPCA) are shown for the ‘Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)’ specificity target query. The common name, the INN name and number, and the Proprietary name (when available) are listed in the first columns. Following AA sequence analysis by IMGT® experts, molecule information such as receptor type, IG species, IG class and subclass is shown in the table. A standardized graphical representation of the molecule, based on the antibody INN definition, is available in the database to facilitate its visualization. Links to AA sequences (IMGT/2Dstructure-DB) and 3D structures (IMGT/3Dstructure-DB) are shown. The gene name of the target is linked to HGNC or VGNC pages, which assign standardized names and unique symbols to genes for human or vertebrate loci, respectively (11). Other therapeutic metadata such as ‘Company’, ‘Clinical trials’ and ‘Authority decisions’ are also accessible in the result table. Therapeutic monoclonal antibody engineering holds real promise for medicine (46–48). The rich, precise and standardized information available via IMGT/mAb-DB provides a unique and useful resource to the scientific community.
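
The ‘Specificity target name’ selection described above amounts to filtering a table of entries on one metadata field. The following Python sketch illustrates that idea only; it is not the IMGT/mAb-DB interface or API. The class `TherapeuticEntry`, the function `select_by_target` and all records are hypothetical placeholders, not actual database content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TherapeuticEntry:
    common_name: str
    receptor_type: str  # e.g. "IG", "CPCA", "FPIA", "RPI" or "TR"
    specificity_target: str


def select_by_target(entries, target):
    """Return entries whose specificity target contains `target` (case-insensitive)."""
    needle = target.casefold()
    return [e for e in entries if needle in e.specificity_target.casefold()]


# Placeholder records for illustration; names do not refer to real entries.
SARS_COV_2 = "Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)"
entries = [
    TherapeuticEntry("example-mab-1", "IG", SARS_COV_2),
    TherapeuticEntry("example-mab-2", "IG", "Placeholder target A"),
    TherapeuticEntry("example-cpca-1", "CPCA", SARS_COV_2),
]

for entry in select_by_target(entries, "sars-cov-2"):
    print(entry.common_name, entry.receptor_type)
```

A substring match is used here so that either the full target name or its abbreviation retrieves the same rows; the matching rule used by the actual database is not stated in the text.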

CONCLUSION

IMGT® provides the scientific community with a huge amount of knowledge and curated data in the field of immunogenetics, from genome to proteome, through IMGT databases, IMGT tools and IMGT Web resources, which represent >20 000 html pages. To our knowledge, the richness of the website is still unmatched in 2021. IMGT metadata in the IMGT databases, tools and Web resources are based on IMGT-ONTOLOGY, the first ontology in immunogenetics and immunoinformatics. IMGT research and development rely on three main axes, which correspond to the deciphering of the IG and TR loci, genes and alleles in the genomes of jawed vertebrates (Axis I), the exploration of the expressed IG and TR repertoires (Axis II), and the analysis of the 2D and 3D structures of the adaptive immune proteins (Axis III). We focussed on the most recent data integrated in IMGT/LIGM-DB and IMGT/GENE-DB, on the extraction of the complete IG and TR loci from genome assemblies and on the creation of terminology and new concepts for their annotation. A new section in IMGT/GENE-DB was created to provide links between genes and alleles of the IG and TR loci and their localization in genome assemblies (for interoperability with genome sites). IMGT tools and IMGT reference directories for the analysis of expressed IG and TR repertoires are regularly updated. Given the importance of chemical interactions for antibody specificity, affinity and half-life, IMGT/2Dstructure-DB, IMGT/3Dstructure-DB and IMGT/mAb-DB provide an integrated and standardized approach for the description of new engineered antibody formats. This approach can be used for the construction and expression of engineered antibodies towards targeted and customized therapy in the context of personalized medicine. The three IMGT axes are heavily interconnected and there is a constant flow of information among them.
IMGT® is continuing its standardization efforts and improving its application of the FAIR principles (49) in order to enhance the quality, findability, accessibility, interoperability and reusability of IMGT data and metadata. To be Findable, IMGT databases use unique and persistent identifiers (IMGT/LIGM-DB, IMGT/2Dstructure-DB, IMGT/3Dstructure-DB and IMGT/mAb-DB) and are described with rich metadata based on IMGT-ONTOLOGY and IMGT Scientific chart rules. To be Accessible, IMGT data and metadata are freely available for academics. In addition, IMGT/GENE-DB can be dynamically queried through HTML direct links. To be Interoperable and Reusable, IMGT data and metadata have links to their sources and related databases; all IMGT sequence data are available in FASTA format, which is widely accepted by bioinformatics programs, and are described with their relevant attributes. Furthermore, the IMGT download sections for the IMGT reference directories ensure the follow-up of new releases and facilitate the extraction and reuse of the data by external tools.
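
Because the sequence data are distributed in FASTA format, an external tool can read them with very little code. The Python sketch below is a minimal illustration of such reuse. It assumes that header attributes are separated by the `|` character, and the accession-like identifiers, gene names and sequences in the example are hypothetical placeholders rather than IMGT data.

```python
def parse_fasta(text: str):
    """Yield (header_fields, sequence) for each FASTA record in `text`.

    Assumption: header attributes are '|'-separated. Sequence lines of a
    record are concatenated; blank lines are ignored.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header.split("|"), "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header.split("|"), "".join(chunks)


# Hypothetical two-record input, for illustration only.
example = """>ACC0001|GENE1*01|Species one
acgt
acgt
>ACC0002|GENE2*01|Species two
ttga
"""

for fields, sequence in parse_fasta(example):
    print(fields[1], len(sequence))
```

Keeping the header attributes as a list leaves their interpretation to the caller, since the exact set and order of attributes depends on the downloaded file.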

DATA AVAILABILITY

IMGT® is freely available online for academic and non-profit use at http://www.imgt.org/. All the databases and tools referred to in this article are accessible from the IMGT® webpage.