IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes.

Véronique Giudicelli, Denys Chaume, Marie-Paule Lefranc.

Abstract

IMGT/GENE-DB is the comprehensive IMGT genome database for immunoglobulin (IG) and T cell receptor (TR) genes from human and mouse, and, in development, from other vertebrates. IMGT/GENE-DB is the international reference for the IG and TR gene nomenclature and works in close collaboration with the HUGO Nomenclature Committee, Mouse Genome Database and genome committees for other species. IMGT/GENE-DB allows a search of IG and TR genes by locus, group and subgroup, which are CLASSIFICATION concepts of IMGT-ONTOLOGY. Short cuts allow the retrieval of gene information by gene name or clone name. Direct links with configurable URLs give access to information usable by humans or programs. An IMGT/GENE-DB entry displays accurate gene data related to genome (gene localization), allelic polymorphisms (number of alleles, IMGT reference sequences, functionality, etc.), gene expression (known cDNAs), proteins and structures (Protein displays, IMGT Colliers de Perles). It provides internal links to the IMGT sequence databases and to the IMGT Repertoire Web resources, and external links to genome and generalist sequence databases. IMGT/GENE-DB manages the IMGT reference directory used by the IMGT tools for IG and TR gene and allele comparison and assignment, and by the IMGT databases for gene data annotation. IMGT/GENE-DB is freely available at http://imgt.cines.fr.

Year:  2005        PMID: 15608191      PMCID: PMC539964          DOI: 10.1093/nar/gki010

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

IMGT/GENE-DB, part of IMGT, the international ImMunoGeneTics information system®, http://imgt.cines.fr (1–4), is the comprehensive IMGT genome database, which has been developed to classify the immunoglobulin (IG) and the T cell receptor (TR) genes from vertebrate species, and to standardize and manage the complex IG and TR gene data knowledge (5) (http://www.bioinfo.de/isb/2003/04/0004/). The molecular genetics of the IG and TR genes is so complex and unique in the genome of vertebrates (6,7) that a specific gene database was required to manage all their characteristics. Indeed, the synthesis of IG and TR chains involves multigene families from four different gene types: variable (V), diversity (D), joining (J) and constant (C), each one with unique characteristics. These genes are organized in hundreds of cassettes, as in fish, or in large clusters from several hundred kilobases to one (or more) megabase(s), as in mouse and human (6,7). IG and TR genes that belong to the same subgroup may be highly similar in their coding sequence but, at the same time, highly polymorphic (e.g. 13 allelic forms have been sequenced for the human IGHV2-70 gene) (6), with alleles displaying different functionalities. The presence of many pseudogenes in the loci, and the frequency of the polymorphisms by gene insertion and deletion in these multigene families, add a further level of complexity (6,7). Although most human IG and TR genes were sequenced and characterized independently from and before the completion of the Human Genome Project, the classification and the characterization of the IG and TR genes remain a big challenge in the analysis of the genome. Indeed, the annotations of the IG and TR loci, which in human, for instance, represent ∼6 Mb on chromosomes 2, 7, 14 and 22, are not available through classical genome software, owing to the unique IG and TR gene structure (6,7). At the level of gene expression analysis (e.g. cDNAs), data are even more difficult to interpret as the mechanisms involved in the IG and TR synthesis include DNA rearrangements with large DNA deletion of several hundred kilobases, and recombinations, nucleotide deletions and insertions at the rearranged junctions and, for IG, somatic hypermutations. Such somatic mechanisms create an extraordinary diversity of 10¹² different IG and TR per individual (6,7). Thus, most IG and TR expressed sequences, available in IMGT/LIGM-DB (8) (http://www3.oup.co.uk/nar/database/summary/504), the IMGT sequence database, and in IMGT/3Dstructure-DB, the IMGT 3D structure database (9), show significant nucleotide and amino acid differences, respectively, by comparison with the germline (not rearranged) sequences.

IMGT/GENE-DB has been implemented to provide an easy and common access to standardized and expertly annotated IG and TR gene and allele data and knowledge. The first task of IMGT was to define a reference sequence for each individual gene and allele (6,7), based on the IMGT ‘gene’ and ‘allele’ concepts. IMGT/GENE-DB has been developed using Java and CGI programs and has been available on the Web since January 2003. IMGT/GENE-DB, which currently contains human and mouse IG and TR genes, is the international reference for the IG and TR gene nomenclature.

IMGT ‘GENE’ AND ‘ALLELE’ CONCEPTS

The IMGT ‘gene’ and ‘allele’ concepts represent the cornerstone of the IMGT-ONTOLOGY ‘CLASSIFICATION’ concept (10) and of the IMGT/GENE-DB implementation. A gene is a DNA sequence that can be potentially transcribed and/or translated (this definition includes the regulatory elements in 5′ and 3′, and the introns, if present). Instances of the ‘gene’ concept are gene names (10). By extension, orphons and pseudogenes are also instances of the ‘gene’ concept (6,7). The IMGT gene names integrate the main CLASSIFICATION concepts of IMGT-ONTOLOGY: the group, the subgroup, the locus and the chromosomal orphon set (10). All IMGT gene names for human IG and TR genes were approved by the Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) (11) in 1999, and entered in the Genome DataBase GDB (Canada) (12), LocusLink and Entrez Gene at NCBI (USA) (13). An allele is a polymorphic variant of a gene, which is characterized by the mutations of its sequence compared to the gene reference sequence designated as allele *01. An IMGT gene or allele name is systematically associated with a species. Each allele is characterized by its functionality and by an IMGT reference sequence (10). The allele functionality, part of the IDENTIFICATION concept of IMGT-ONTOLOGY, has three instances: functional (F), open reading frame (ORF) and pseudogene (P) (10). These instances refer to the V, D and J alleles in their ‘germline’ (non-rearranged) configuration (6,7), and to the C alleles (the configuration of the C genes that do not rearrange is ‘undefined’) (10). An IMGT/GENE-DB allele reference sequence is identified by the IMGT/LIGM-DB accession number, the IMGT gene and allele name, the species, the allele functionality, and the gene core (V-REGION, D-REGION, J-REGION and C-REGION) (10). The sequences of the gene core are extracted from the IMGT/LIGM-DB reference sequences. The IMGT/GENE-DB allele reference sequences are provided in FASTA format with a complete header. For a C-REGION encoded by several exons, each exon is provided separately with, in addition, the complete artificially spliced C-REGION.
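
The way a gene or allele name carries the locus, the group, the subgroup and the allele number can be illustrated with a minimal Python sketch. The regular expression below is an assumption of this sketch: it covers only simple V gene names such as IGHV2-70*01, not orphons, constant genes or other special cases, and it is not the official nomenclature grammar. The names `V_NAME`, `VAlleleName` and `parse_v_name` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed, simplified pattern for V gene names only (hypothetical, not the
# official nomenclature grammar): locus, "V", subgroup digits, optional
# remainder of the gene name, optional "*NN" allele suffix.
V_NAME = re.compile(
    r"^(?P<locus>IG[HKL]|TR[ABGD])V(?P<subgroup>\d+)(?P<rest>[^*\s]*)"
    r"(?:\*(?P<allele>\d+))?$"
)


@dataclass(frozen=True)
class VAlleleName:
    gene: str               # e.g. "IGHV2-70"
    locus: str              # e.g. "IGH"
    group: str              # e.g. "IGHV"
    subgroup: str           # e.g. "2"
    allele: Optional[str]   # e.g. "01", or None for a bare gene name

    @property
    def is_reference_allele(self) -> bool:
        # The article designates the gene reference sequence as allele *01.
        return self.allele == "01"


def parse_v_name(name: str) -> VAlleleName:
    """Split a simple V gene or allele name into its classification parts."""
    m = V_NAME.match(name)
    if m is None:
        raise ValueError(f"not a simple V gene or allele name: {name!r}")
    locus = m.group("locus")
    group = locus + "V"
    gene = group + m.group("subgroup") + m.group("rest")
    return VAlleleName(gene, locus, group, m.group("subgroup"), m.group("allele"))


if __name__ == "__main__":
    print(parse_v_name("IGHV2-70*01"))
```

Names outside the assumed pattern raise `ValueError` rather than being guessed at, which keeps the sketch honest about its limited coverage.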

IMGT/GENE-DB CONTENT

As of July 2004, IMGT/GENE-DB contained 1375 genes and 2204 alleles from human and mouse: 673 IG and TR genes and 1208 alleles from Homo sapiens, and 702 IG and TR genes and 996 alleles from mouse (most entries from Mus musculus, a few entries from Mus cookii, Mus minutoides, Mus pahari, Mus saxicola and Mus spretus) (Tables 1 and 2). This represents the complete set of human IG and TR genes, for all the seven loci (the three IG loci: IGH, IGK and IGL; and the four TR loci: TRA, TRB, TRG and TRD) and for the chromosomal orphon sets (6,7). The mouse entries are complete, except for the mouse IGHV group, which still has a provisional IMGT nomenclature but is near completion.

Table 1. IMGT/GENE-DB statistics: number of human and mouse IG genes, and within parentheses, number of alleles

Locus  Group  Human      Mouse(a)   Total
IGH    IGHV   164 (387)  216 (239)  380 (626)
IGH    IGHD   37 (44)    17 (19)    54 (63)
IGH    IGHJ   9 (16)     4 (9)      13 (25)
IGH    IGHC   12 (45)    9 (26)     21 (71)
IGK    IGKV   108 (131)  176 (203)  284 (334)
IGK    IGKJ   5 (9)      5 (10)     10 (19)
IGK    IGKC   1 (5)      6 (8)      7 (13)
IGL    IGLV   79 (129)   12 (19)    91 (148)
IGL    IGLJ   7 (10)     7 (7)      14 (17)
IGL    IGLC   9 (22)     7 (10)     16 (32)
Total         431 (798)  459 (550)  890 (1348)

(a) Mus cookii, Mus minutoides, Mus musculus, Mus pahari, Mus saxicola and Mus spretus.

Table 2. IMGT/GENE-DB statistics: number of human and mouse TR genes, and within parentheses, number of alleles

Locus  Group  Human      Mouse(a)   Total
TRA    TRAV   54 (112)   98 (233)   152 (345)
TRA    TRAJ   61 (63)    60 (67)    121 (130)
TRA    TRAC   1 (1)      1 (2)      2 (3)
TRB    TRBV   76 (162)   35 (61)    111 (223)
TRB    TRBD   2 (3)      2 (2)      4 (5)
TRB    TRBJ   14 (16)    14 (19)    28 (35)
TRB    TRBC   2 (4)      7 (9)      9 (13)
TRG    TRGV   14 (22)    7 (28)     21 (50)
TRG    TRGJ   5 (6)      4 (4)      9 (10)
TRG    TRGC   2 (7)      4 (5)      6 (12)
TRD    TRDV   3 (6)      6 (10)     9 (16)
TRD    TRDD   3 (3)      2 (2)      5 (5)
TRD    TRDJ   4 (4)      2 (3)      6 (7)
TRD    TRDC   1 (1)      1 (1)      2 (2)
Total         242 (410)  243 (446)  485 (856)

(a) Mus minutoides, Mus musculus, Mus pahari and Mus spretus.
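
The per-species totals in Tables 1 and 2 can be checked against the overall figures quoted in the text (1375 genes and 2204 alleles) with a few lines of Python. This is only an arithmetic sketch; the variable and function names are illustrative.

```python
# (genes, alleles) totals per species, copied from the Total entries
# of Table 1 (IG) and Table 2 (TR).
IG_TOTALS = {"human": (431, 798), "mouse": (459, 550)}
TR_TOTALS = {"human": (242, 410), "mouse": (243, 446)}


def add_counts(a, b):
    """Add two (genes, alleles) pairs component-wise."""
    return (a[0] + b[0], a[1] + b[1])


# IG + TR for each species, then human + mouse for the grand total.
per_species = {sp: add_counts(IG_TOTALS[sp], TR_TOTALS[sp]) for sp in IG_TOTALS}
grand_total = add_counts(per_species["human"], per_species["mouse"])

if __name__ == "__main__":
    print(per_species)   # {'human': (673, 1208), 'mouse': (702, 996)}
    print(grand_total)   # (1375, 2204)
```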

IMGT/GENE-DB QUERY PAGE

The IMGT/GENE-DB Query page comprises three types of search (Figure 1): (i) ‘GENERAL CRITERIA’ allows a search of IG and TR genes, for a given species, by locus or chromosomal orphon set, by gene type, group or subgroup, or by functionality. The user can select genes that have been found rearranged, transcribed or translated. (ii) ‘SHORT CUT’ allows a selection, for a given species, on gene name or clone name. (iii) ‘IMGT/GENE-DB direct links’ gives access to a set of links that retrieve the information related either to one given gene or to the genes of a group, using configurable URLs that can be used by humans or programs.

Figure 1. The IMGT/GENE-DB Query page.
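
The logic of a ‘GENERAL CRITERIA’ search (a mandatory species, then optional filters on locus, group and functionality) can be sketched as a filter over in-memory records. This is a hypothetical data model for illustration only, not the IMGT/GENE-DB implementation: the `GeneRecord` fields, the `search` function and the functionality flags of the three sample records are assumptions, not values taken from the database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class GeneRecord:
    species: str
    name: str
    locus: str
    group: str
    functionality: str  # "F", "ORF" or "P"


def search(
    records: Iterable[GeneRecord],
    species: str,
    locus: Optional[str] = None,
    group: Optional[str] = None,
    functionality: Optional[str] = None,
) -> List[GeneRecord]:
    """Keep records of the given species that match every criterion supplied."""
    hits = []
    for rec in records:
        if rec.species != species:
            continue
        if locus is not None and rec.locus != locus:
            continue
        if group is not None and rec.group != group:
            continue
        if functionality is not None and rec.functionality != functionality:
            continue
        hits.append(rec)
    return hits


# Illustrative records; the functionality flags are placeholders.
SAMPLE = [
    GeneRecord("Homo sapiens", "IGHV2-70", "IGH", "IGHV", "F"),
    GeneRecord("Homo sapiens", "TRBC1", "TRB", "TRBC", "F"),
    GeneRecord("Mus musculus", "IGKC", "IGK", "IGKC", "F"),
]

if __name__ == "__main__":
    for rec in search(SAMPLE, "Homo sapiens", locus="IGH"):
        print(rec.name)
```

Leaving a criterion as `None` means "no restriction", which mirrors the optional nature of the locus, group and functionality selections on the query page.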

IMGT/GENE-DB RESULT PAGE

Following a ‘GENERAL CRITERIA’ or a ‘SHORT CUT’ selection, the IMGT/GENE-DB result page (Figure 2) shows, at the top, the user selection, the number of resulting genes and the number of resulting alleles, then the list of resulting genes with, for each gene, the species, IMGT gene name, gene functionality, IMGT gene definition, number of alleles, chromosomal localization and IMGT/LIGM-DB reference sequence(s) for the allele *01. In the ‘Choose your display’ section, the user can select between three types of display: (i) the complete individual IMGT/GENE-DB entries for the genes selected in the list of resulting genes (an IMGT/GENE-DB entry is described in the next section); (ii) the IMGT/GENE-DB allele reference sequences in FASTA format: nucleotide or amino acid sequences, either with gaps according to the IMGT unique numbering (14–16), or without gaps; (iii) the IMGT label sequences in FASTA format, extracted from expertly annotated IMGT/LIGM-DB reference sequences. This third display allows the retrieval of any label sequence (V-EXON, V-HEPTAMER, etc.), of the core regions of out-of-frame pseudogenes, which are not available in the IMGT/GENE-DB allele reference sequences, and of the artificially spliced L-PART1+L-PART2 and L-PART1+V-EXON. For nucleotide sequences, the user can extend the limits in 5′ or 3′ by typing the desired number of nucleotides.

Figure 2. The IMGT/GENE-DB result page and the three types of choice in ‘Choose your display’.
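
Two of the display options described for the result page, sequences with or without gaps and label sequences extended in 5′ or 3′, amount to simple string operations. The Python sketch below assumes that gaps are written as dots and that label positions are 1-based and inclusive; both conventions, and the function names, are assumptions of this sketch rather than statements from the article.

```python
def remove_gaps(seq: str, gap: str = ".") -> str:
    """Return the sequence with every gap character removed."""
    return seq.replace(gap, "")


def extract_with_flanks(
    source: str, start: int, end: int, extend_5: int = 0, extend_3: int = 0
) -> str:
    """Extract positions start..end (1-based, inclusive) from source,
    extended by extend_5 nucleotides upstream and extend_3 downstream.
    Extensions are clipped at the ends of the source sequence."""
    if not (1 <= start <= end <= len(source)):
        raise ValueError("label positions fall outside the source sequence")
    if extend_5 < 0 or extend_3 < 0:
        raise ValueError("extensions must be non-negative")
    lo = max(1, start - extend_5)
    hi = min(len(source), end + extend_3)
    return source[lo - 1:hi]


if __name__ == "__main__":
    print(remove_gaps("CAG...GTG..C"))                    # CAGGTGC
    print(extract_with_flanks("AACCGGTT", 3, 6))          # CCGG
    print(extract_with_flanks("AACCGGTT", 3, 6, 1, 5))    # ACCGGTT
```

Clipping the extension at the sequence boundaries, instead of raising an error, is a design choice of the sketch: a request for more flanking nucleotides than the source contains returns what is available.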

IMGT/GENE-DB ENTRY

An individual IMGT/GENE-DB entry provides a full characterization of a gene and of its alleles: IMGT name and definition, chromosomal localization, number of alleles, IMGT reference alleles and other sequences from the literature (as defined in IMGT Gene tables), and for each sequence, allele functionality, clone name, accession number and molecule type. The IMGT/GENE-DB entry also gives access (i) to the IMGT/GENE-DB allele reference sequences in FASTA format [nucleotide and amino acid sequences with gaps according to the IMGT unique numbering (14–16), or without gaps], (ii) to the IMGT Repertoire standardized resources (Chromosomal localization, Locus representation, Tables of alleles, Alignments of alleles, IMGT Protein displays, IMGT Colliers de Perles, etc.) via internal links (‘Locus and genes’, ‘Proteins and alleles’, ‘2D and 3D structures’, ‘Probes and RFLP’, ‘Gene regulation and expression’, ‘Genes and clinical entities’ sections), (iii) to the known IMGT/LIGM-DB cDNA sequences of the gene with a direct IMGT/LIGM-DB query, which then allows the choice of the nine different IMGT/LIGM-DB displays including IMGT/V-QUEST results (17,18), (iv) to the IMGT tools for genome analysis (IMGT/GeneSearch, IMGT/GeneView, IMGT/LocusView, IMGT/GeneInfo) (3,5,19), and (v) to external links: the genome databases LocusLink and Entrez Gene at NCBI, GDB, GeneCards (20), OMIM and MGD (21), the sequence databases EMBL (22)/GenBank (23)/DDBJ (24), and the nomenclature database HGNC Genenew (11).

CONCLUSION AND PERSPECTIVES

The central management of gene-related data in IMGT/GENE-DB improves the dynamic generation of knowledge resources from data, which are extracted from the IMGT sequence database IMGT/LIGM-DB, from HTML pages in IMGT Repertoire and from the IMGT tools for genome analysis. Reciprocally, the IMGT/GENE-DB data are used by other IMGT databases (IMGT/PRIMER-DB, IMGT/3Dstructure-DB) and tools (IMGT/V-QUEST, IMGT/JunctionAnalysis, etc.). The dynamic interactions are currently implemented through IMGT-Choreography (29) based on IMGT-ONTOLOGY and using IMGT-ML Web services. All the mouse IG and TR genes from IMGT/GENE-DB with IMGT reference sequences were provided by IMGT to HGNC and MGD in July 2002. IG and TR genes from genomes of other species (chimpanzee, rat, etc.), as well as members of the immunoglobulin superfamily (IgSF) and of the major histocompatibility complex superfamily (MhcSF) (currently described in the IMGT Repertoire ‘RPI’ section, for the related proteins of the immune system), will be added to IMGT/GENE-DB following the exhaustive analysis of the corresponding genes in IMGT.

CITATION

Users of IMGT/GENE-DB are requested to cite this article in their publications and to quote the IMGT® home page URL, http://imgt.cines.fr.

REFERENCES (10 of 24 listed)

V Giudicelli; M P Lefranc. Ontology for immunogenetics: the IMGT-ONTOLOGY. Bioinformatics, 1999-12.

Hester M Wain; Elspeth A Bruford; Ruth C Lovering; Michael J Lush; Mathew W Wright; Sue Povey. Guidelines for human gene nomenclature. Genomics, 2002-04.

Marie-Paule Lefranc; Christelle Pommié; Manuel Ruiz; Véronique Giudicelli; Elodie Foulquier; Lisa Truong; Valérie Thouvenin-Contet; Gérard Lefranc. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev Comp Immunol, 2003-01.

Marie-Paule Lefranc. IMGT, the international ImMunoGeneTics information system, http://imgt.cines.fr. Novartis Found Symp, 2003.

Thierry-Pascal Baum; Nicolas Pasqual; Florence Thuderoz; Vivien Hierle; Denys Chaume; Marie-Paule Lefranc; Evelyne Jouvin-Marche; Patrice-Noël Marche; Jacques Demongeot. IMGT/GeneInfo: enhancing V(D)J recombination database accessibility. Nucleic Acids Res, 2004-01-01.

Quentin Kaas; Manuel Ruiz; Marie-Paule Lefranc. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data. Nucleic Acids Res, 2004-01-01.

Marie-Paule Lefranc. IMGT, the international ImMunoGeneTics database. Nucleic Acids Res, 2003-01-01.

M-P Lefranc. IMGT databases, web resources and tools for immunoglobulin and T cell receptor sequence analysis, http://imgt.cines.fr. Leukemia, 2003-01.

Marilyn Safran; Vered Chalifa-Caspi; Orit Shmueli; Tsviya Olender; Michal Lapidot; Naomi Rosen; Michael Shmoish; Yakov Peter; Gustavo Glusman; Ester Feldmesser; Avital Adato; Inga Peter; Miriam Khen; Tal Atarot; Yoram Groner; Doron Lancet. Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res, 2003-01-01.

Judith A Blake; Joel E Richardson; Carol J Bult; Jim A Kadin; Janan T Eppig. MGD: the Mouse Genome Database. Nucleic Acids Res, 2003-01-01.
