Christian E Busse, Katherine J L Jackson, Corey T Watson, Andrew M Collins.
Abstract
Mammalian immunoglobulin (IG) genes are found in complex loci that contain hundreds of highly similar pseudogenes, functional genes and repetitive elements, which has made their investigation particularly challenging. High-throughput sequencing has provided new avenues for the investigation of these loci, and has recently been applied to study the IG genes of important inbred mouse strains, revealing unexpected differences between their IG loci. This demonstrated that the structural differences are of such magnitude that they call into question the merits of the current mouse IG gene nomenclatures. Three nomenclatures for the mouse IG heavy chain locus (Igh) are presently in use, and they are all positional nomenclatures using the C57BL/6 genome reference sequence as their template. The continued use of these nomenclatures requires that genes of other inbred strains be confidently identified as allelic variants of C57BL/6 genes, but this is clearly impossible. The unusual breeding histories of inbred mouse strains mean that, regardless of the genetics of wild mice, no single ancestral origin for the IG loci exists for laboratory mice. Here we present a general discussion of the challenges this presents for any IG nomenclature. Furthermore, we describe principles that could be followed in the formulation of a solution to these challenges. Finally, we propose a non-positional nomenclature that accords with the guidelines of the International Mouse Nomenclature Committee, and outline strategies that can be adopted to meet the nomenclature challenges if three systems are to give way to a new one.
Keywords: B cell; IGH; IGK; IGL; V genes; immunoglobulin; nomenclature
Year: 2019 PMID: 31921202 PMCID: PMC6930147 DOI: 10.3389/fimmu.2019.02961
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Visualized scheme of three nomenclature strategies, using a hypothetical locus encompassing seven V genes (labeled V1–V7) belonging to three V gene families (indicated as red, blue, green). The year of the first report is indicated above the genes. The (D)JC region is shown as a yellow box and provides orientation for the positional strategies. The designations beneath the individual V genes follow the
Figure 2. Ternary plot depicting the constraints for gene symbols. The individual properties and their boundaries are located on the three corners. The minimal information content is based on the requirement to be able to encode at least 4 loci, 3 different types of gene segments (V, D, J), 32 gene families and 1024 members. The 10-character limit (usability) is based on current IMNC guidelines. The limits for human readability are a compromise between standard English language entropy (≈ 1 bit per character) and pure numerical representation (3.3 bits per character). Examples that optimize two properties (shown in green) at the expense of the third one (red) are shown on the median of the respective edges.
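The arithmetic behind the Figure 2 constraints can be sketched in a few lines. The following is a minimal illustration, not the authors' calculation: it assumes that the four requirements are independent, so that their information content adds as a sum of base-2 logarithms, and all variable and function names are hypothetical.

```python
import math

# Requirements quoted in the Figure 2 caption.
REQUIRED_STATES = {
    "loci": 4,
    "segment_types": 3,  # V, D, J
    "families": 32,
    "members": 1024,
}
MAX_SYMBOL_LENGTH = 10  # character limit quoted in the caption


def minimal_information_bits(states):
    """Bits needed to distinguish every combination (assumes independence)."""
    return sum(math.log2(count) for count in states.values())


# Total information a gene symbol must carry under the stated requirements.
total_bits = minimal_information_bits(REQUIRED_STATES)

# Average density needed if the symbol uses all 10 permitted characters.
required_density = total_bits / MAX_SYMBOL_LENGTH

# Density of a purely numerical symbol: one decimal digit carries log2(10) bits.
digit_density = math.log2(10)

# Shortest possible symbol at each of the two readability limits.
min_length_digits = math.ceil(total_bits / digit_density)
min_length_english = math.ceil(total_bits / 1.0)  # about 1 bit per character

print(f"total information: {total_bits:.2f} bits")
print(f"required density at {MAX_SYMBOL_LENGTH} characters: {required_density:.2f} bits/char")
print(f"digit density: {digit_density:.2f} bits/char")
print(f"minimum length, digits only: {min_length_digits}")
print(f"minimum length, English-like text: {min_length_english}")
```

Under these assumptions the requirements add up to roughly 18.6 bits, so a 10-character symbol needs about 1.9 bits per character on average. That lies between the two readability limits named in the caption: text at about 1 bit per character would need 19 characters and exceed the length limit, while a purely numerical symbol could fit in 6 characters at the cost of readability.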