Maryam B Yassai, Yuri N Naumov, Elena N Naumova, Jack Gorski.
Abstract
T cell receptor (TCR) nucleotide sequences are often generated during analyses of T cell responses to pathogens or autoantigens. The most important region of the TCR is the third complementarity-determining region (CDR3), whose nucleotide sequence is unique to each T cell clone. The CDR3 interacts with the peptide and is therefore important for recognizing pathogen or autoantigen epitopes. While conventions exist for identifying the various TCR chains, there is no concise nomenclature that identifies both the amino acid translation and the nucleotide sequence of the CDR3. This deficiency makes it difficult to compare published TCR genetic and proteomic information. To enhance information sharing among different databases and to facilitate computational assessment of clonotypic T cell repertoires, we propose a clonotype nomenclature. The rules for generating a clonotype identifier are simple and easy to follow, and have a built-in error-checking system. The identifier includes the V and J regions, the CDR3 length, and the human or mouse origin. The framework of this naming system could also be expanded to the B cell receptor.
Year: 2009 PMID: 19568742 PMCID: PMC2706371 DOI: 10.1007/s00251-009-0383-x
Source DB: PubMed Journal: Immunogenetics ISSN: 0093-7711 Impact factor: 2.846
Genetic codes and their assigned ID numbers
| First base | Codon (2nd base T) | AA | ID | Codon (2nd base C) | AA | ID | Codon (2nd base A) | AA | ID | Codon (2nd base G) | AA | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T | TTT | F | 1 | TCT | S | 1 | TAT | Y | 1 | TGT | C | 1 |
| | TTC | F | 2 | TCC | S | 2 | TAC | Y | 2 | TGC | C | 2 |
| | TTA | L | 1 | TCA | S | 3 | TAA | Ochre=O | 1 | TGA | Opal=O | 3 |
| | TTG | L | 2 | TCG | S | 4 | TAG | Amber=O | 2 | TGG | W | 1 |
| C | CTT | L | 3 | CCT | P | 1 | CAT | H | 1 | CGT | R | 1 |
| | CTC | L | 4 | CCC | P | 2 | CAC | H | 2 | CGC | R | 2 |
| | CTA | L | 5 | CCA | P | 3 | CAA | Q | 1 | CGA | R | 3 |
| | CTG | L | 6 | CCG | P | 4 | CAG | Q | 2 | CGG | R | 4 |
| A | ATT | I | 1 | ACT | T | 1 | AAT | N | 1 | AGT | S | 5 |
| | ATC | I | 2 | ACC | T | 2 | AAC | N | 2 | AGC | S | 6 |
| | ATA | I | 3 | ACA | T | 3 | AAA | K | 1 | AGA | R | 5 |
| | ATG | M | 1 | ACG | T | 4 | AAG | K | 2 | AGG | R | 6 |
| G | GTT | V | 1 | GCT | A | 1 | GAT | D | 1 | GGT | G | 1 |
| | GTC | V | 2 | GCC | A | 2 | GAC | D | 2 | GGC | G | 2 |
| | GTA | V | 3 | GCA | A | 3 | GAA | E | 1 | GGA | G | 3 |
| | GTG | V | 4 | GCG | A | 4 | GAG | E | 2 | GGG | G | 4 |
ID numbers are assigned to the codons for each amino acid by numbering sequentially through the codon table, from top to bottom and then across: “1” for the first codon that encodes a given amino acid, “2” for the second codon that encodes the same amino acid, and so on. The termination codons are assigned the letter “O”, because “O” is not used as the one-letter code for any amino acid in the table. The first termination codon, TAA (Ochre), is assigned O1, TAG (Amber) is assigned O2, and TGA (Opal) is assigned O3.
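The numbering rule can be reproduced programmatically. The following is a minimal Python sketch, not the authors' implementation: it walks the standard genetic code in T, C, A, G order at each codon position, which yields the same ID numbers as the table. The names `BASES`, `AMINO_ACIDS`, and `build_codon_ids` are illustrative assumptions.

```python
from itertools import product

# Base order used for the first, second, and third codon positions.
BASES = "TCAG"

# Standard genetic code listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
# Stop codons are written as "O", following the table's convention.
AMINO_ACIDS = (
    "FFLLSSSSYYOOCCOW"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def build_codon_ids():
    """Map each codon to (amino acid letter, ID number within that amino acid)."""
    counts = {}  # how many codons have been seen so far for each amino acid
    table = {}
    for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
        counts[aa] = counts.get(aa, 0) + 1
        table["".join(bases)] = (aa, counts[aa])
    return table


CODON_IDS = build_codon_ids()
print(CODON_IDS["AGC"])  # ('S', 6)
print(CODON_IDS["TGA"])  # ('O', 3)
```

Deriving the IDs from the traversal order, rather than typing all 64 entries by hand, makes it easy to spot-check the result against individual cells of the table.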
Breakdown of assigned characters for human and mouse V genes in clonotype identifier
| TCR | TCR chain assigned ID | Human V gene characters: (No.) letter (No.) | Human V gene overall characters | Mouse V gene characters: (No.) “−” or letter (No.) | Mouse V gene overall characters |
|---|---|---|---|---|---|
| TCR α | A | (2 digit) S (1 digit) | 5 | (2 digit) “−” or “D” (1 digit) | 5 |
| TCR β | B | (2 digit) S (1 digit) | 5 | (2 digit) “−” (1 digit) | 5 |
| TCR γ | G | (2 digit) | 3 | (1 digit) | 2 |
| TCR δ | D | (1 digit) | 2 | (1 digit) “−” (1 digit) | 4 |
Column one lists the TCR chain and column two its assigned single-letter ID. Column three gives the pattern of the human V gene characters; for example, V04S1 is displayed as 04S1, which has two digits (04), the letter S, and one digit (1). Column four gives the overall number of human V gene characters once the chain ID is included, for example 5 for B04S1. Columns five and six give the same for mouse V genes in the clonotype identifier.
Human and mouse α/δ gene assignment
| Human alpha/delta v genes (by Rowen et al.) | Human assigned V name (proposed nomenclature) | Mouse alpha/delta V genes (by IMGT) | Mouse assigned V name (proposed nomenclature) |
|---|---|---|---|
| hADV14S1 | A14S1 | AV4-4/DV10 | A4-4 |
| hADV23S1 | A23S1 | AV6-7/DV9 | A6-7 |
| hADV29S1 | A29S1 | AV13-4/DV7 | A13-4 |
| hADV36S1 | A36S1 | AV14D-3/DV8 | A14D3 |
| hADV38S2 | A38S2 | AV15-1/DV6-1 | A15-1 |
| | | AV16D-1/DV11 | A16D1 |
| | | AV21-1/DV12 | A21-1 |
Column one lists the human α/δ V gene names according to Rowen et al. (1996). Column two lists the human α/δ V gene names assigned by the present nomenclature. Column three lists the mouse α/δ V gene names according to the IMGT nomenclature. Column four lists the mouse α/δ V gene names assigned by the present nomenclature.
Breakdown of assigned characters for human and mouse J genes in clonotype identifier
| TCR | TCR Chain assigned ID | Human J Gene Characters (No.) | Human J Gene Overall Characters | Mouse J Gene Characters (No.) | Mouse J Gene Overall Characters |
|---|---|---|---|---|---|
| TCR α | A | (2 digit) | 3 | (2 digit) | 3 |
| TCR β | B | (2 digit) | 3 | (2 digit) | 3 |
| TCR γ | G | (1 digit) | 2 | (1 digit) | 2 |
| TCR δ | D | (1 digit) | 2 | (1 digit) | 2 |
Column one lists the TCR chain and column two its assigned single-letter ID. Column three gives the number of digits used for the human J gene; for example, J2S1 is displayed as 21 (two digits). Column four gives the overall number of human J gene characters once the chain ID is included, for example 3 for B21. Columns five and six give the same for mouse J genes in the clonotype identifier.
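The character breakdowns for human V and J genes can be read as simple patterns, which gives a quick format check for the gene portion of an identifier. The sketch below is an illustration under stated assumptions, not part of the proposed nomenclature: it covers only the human columns of the two breakdown tables, and the names `HUMAN_V`, `HUMAN_J`, and `is_valid_human_token` are hypothetical.

```python
import re

# Patterns transcribed from the human columns of the V and J breakdown tables.
# Keys are the single-letter chain IDs (A = alpha, B = beta, G = gamma, D = delta).
HUMAN_V = {"A": r"A\d{2}S\d", "B": r"B\d{2}S\d", "G": r"G\d{2}", "D": r"D\d"}
HUMAN_J = {"A": r"A\d{2}", "B": r"B\d{2}", "G": r"G\d", "D": r"D\d"}


def is_valid_human_token(token, kind):
    """Return True if token has the expected shape for a human V or J gene."""
    if kind not in ("V", "J"):
        raise ValueError("kind must be 'V' or 'J'")
    patterns = HUMAN_V if kind == "V" else HUMAN_J
    pattern = patterns.get(token[:1])  # the first character selects the chain
    return pattern is not None and re.fullmatch(pattern, token) is not None


print(is_valid_human_token("B04S1", "V"))  # True
print(is_valid_human_token("B21", "J"))    # True
print(is_valid_human_token("B4S1", "V"))   # False
```

The check confirms only the shape of a token; it does not confirm that the numbered gene actually exists.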
Fig. 1 An example of a TCR β-chain clonotype identifier. The BV and BJ regions are fully identified. The single-letter-code amino acid translation is shown below the nucleotide sequence. The bold uppercase letters represent the conserved amino acids from the V region (C) and from the J region (FG). The amino acids that are not completely encoded by the germline, which are predominantly encoded by the NDN, are also in uppercase (IRSS). Below each NDN-encoded amino acid is its codon ID as assigned from Table 1. The bold underlined lowercase letters represent the last amino acid that is completely encoded by the V gene (s) and the first amino acid that is completely encoded by the J region (y). The clonotype identifier takes the uppercase NDN amino acids and flanks them with the lowercase V- and J-encoded amino acids. This is followed by the codon IDs for the uppercase NDN sequence. The V and J chains are identified next. Finally, the length of the CDR3 is determined by counting the number of amino acids between the uppercase C and the uppercase FG. This count is shown in the top line
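The assembly order described in this legend can be sketched in Python. This is a hedged illustration rather than the authors' code: the function name `compose_identifier` and its parameters are assumptions, and the codons in the usage example are simply the ones that IDs 1, 4, and 6 map to in the codon table for I, R, and S.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "O" marks stop codons as in Table 1.
AMINO_ACIDS = "FFLLSSSSYYOOCCOWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_ids():
    """Map each codon to (amino acid letter, ID number within that amino acid)."""
    counts, table = {}, {}
    for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
        counts[aa] = counts.get(aa, 0) + 1
        table["".join(bases)] = (aa, counts[aa])
    return table


def compose_identifier(v_last, ndn_codons, j_first, v_gene, j_gene, cdr3_length):
    """Assemble an identifier: flanked NDN residues, codon IDs, V, J, length."""
    table = codon_ids()
    pairs = [table[codon.upper()] for codon in ndn_codons]
    ndn = "".join(aa for aa, _ in pairs)          # uppercase NDN amino acids
    ids = "".join(str(num) for _, num in pairs)   # one codon ID per NDN residue
    return f"{v_last.lower()}{ndn}{j_first.lower()}.{ids}{v_gene}{j_gene}L{cdr3_length}"


print(compose_identifier("s", ["ATT", "CGG", "AGC"], "s", "B19S1", "B27", 11))
# sIRSs.146B19S1B27L11
```

Because every codon ID is a single digit (the largest is 6), the ID string always has exactly one digit per uppercase NDN residue.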
Examples of human TCR α, β, γ, and δ clonotype identifiers
The top row of each block is the nucleotide sequence of the TCR chain, middle row is the amino acid translation, and bottom row is the assigned number for the amino acids that are not completely encoded by the germline gene. The uppercase underlined letters (shown in bold) represent the conserved amino acids from the V region (C) and the J region (F), and the amino acids that are not completely encoded from the germline gene. The lowercase underlined letters (shown in bold) represent the last amino acid that is completely encoded by V gene and the first amino acid that is completely encoded by J gene
Fig. 2 Deriving the nucleotide sequence of the CDR3 by decoding the clonotype TCR β-chain identifier. The genomic sequence of the TCRV gene (AV38S2) is obtained, and the positions of its amino acids are lined up with a length ruler that starts at the position immediately after the conserved cysteine. The TCRJ gene (AJ53) is then placed so that the last amino acid before the conserved FG lines up with the end of the length ruler. The two lowercase letters “r” and “s” in the name identify the last V position and the first J position encoded by the germline. This leaves one position to be filled by the N nucleotides, and this is the threonine represented as “T” in the clonotype name. The codon table shows that codon 4 for T is ACG, and the only way that the T can be encoded is that the initial nucleotide, A, is from the V germline sequence and the rest of the codon is N derived
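The first part of this decoding, recovering the NDN codons from the identifier, can be sketched as follows. This is a partial illustration under stated assumptions: it handles only human α- and β-chain identifiers with a single flanking residue on each side, it does not align germline V and J sequences, and the names `PATTERN` and `decode_identifier` are hypothetical. The mismatch checks illustrate the kind of error checking the naming scheme allows.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "O" marks stop codons as in the codon table.
AMINO_ACIDS = "FFLLSSSSYYOOCCOWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Assumed layout: v-flank, NDN residues, j-flank, ".", codon IDs, V gene, J gene, "L", length.
PATTERN = re.compile(r"^([a-z])([A-Z]+)([a-z])\.(\d+)([AB]\d{2}S\d)([AB]\d{2})L(\d+)$")


def id_to_codon():
    """Map (amino acid letter, ID number) back to its codon."""
    counts, lookup = {}, {}
    for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
        counts[aa] = counts.get(aa, 0) + 1
        lookup[(aa, counts[aa])] = "".join(bases)
    return lookup


def decode_identifier(identifier):
    """Split an identifier into its parts and recover the NDN codons."""
    match = PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized identifier: {identifier}")
    v_last, ndn, j_first, ids, v_gene, j_gene, length = match.groups()
    if len(ndn) != len(ids):
        raise ValueError("number of codon IDs does not match number of NDN residues")
    lookup = id_to_codon()
    try:
        codons = [lookup[(aa, int(num))] for aa, num in zip(ndn, ids)]
    except KeyError as err:
        raise ValueError(f"no such codon ID for amino acid: {err}") from None
    return {"v_last": v_last, "ndn": ndn, "j_first": j_first, "ndn_codons": codons,
            "v_gene": v_gene, "j_gene": j_gene, "cdr3_length": int(length)}


print(decode_identifier("sIRSs.146B19S1B27L11")["ndn_codons"])
# ['ATT', 'CGG', 'AGC']
```

Completing the full CDR3 nucleotide sequence would additionally require the germline V and J sequences, which this sketch deliberately leaves out.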
Examples of different human clonotype sequences coding identical amino acids in the CDR3β
The top row of each block is the nucleotide sequence encoding CDR3, middle row is the amino acid translation, and bottom row is the assigned number for the amino acids that are not completely encoded by the germline gene. The uppercase underlined letters (shown in bold) represent the conserved amino acids from the V region (C) to the J region (F), and the amino acids that are not completely encoded from the germline gene. The underlined lowercase letters (shown in bold) represent the last amino acid that is completely encoded by the V gene, and the first amino acid that is completely encoded by the J gene
Identifiers of the HLA-A2.1:M1 58–66-specific clones/clonotypes found in multiple studies
| Clonotype identifier | Moss et al. | Lehner et al. | Naumov et al. | Naumov et al. |
|---|---|---|---|---|
| sIRSs.146B19S1B27L11 | DDD8 | 132 | 19.27 | |
| iRSs.66B19S1B27L11 | 1a8 | 71 | 16.27 | |
| sIRSs.226B19S1B27L11 | B1b | MODG5 | ||
| iRSt.62B19S1B22L11 | B1c | |||
| sIRs.24B19S1B23L11 | B1d | |||
| sMRSs.166B19S1B27L11 | JNJ1 | 43.27 | ||
| iRSSy.626B19S1B27L11 | NMH8 | |||
| sIRSAy.2662B19S1B27L11 | KEF9 | 94 | 28.27 | |
| sTRs.23B19S1B23L11 | HLE19 | 13.23 | ||
| sMRSs.163B19S1B27L11 | MODG4 | |||
| sMRs.16B19S1B23L11 | JN5K2 |
Different HLA-A2.1 individuals were recruited in the studies reported as Naumov et al. (1998) and Naumov et al. (2006). The numbers of individuals sharing identical influenza M1 58–66-specific clonotypes are shown in the first column. The clonotype identifiers are shown in the second column. The identifiers of the M1 58–66-specific CD8 T cell clones reported by Moss et al. (1991) and Lehner et al. (1995) and of the clonotypes reported by Naumov et al. (1998) and Naumov et al. (2006) are shown in the corresponding columns, respectively