Wenzhao Meng1, Sahana Jayaraman1, Bochao Zhang2, Gregory W Schwartz2, Robert D Daber3, Uri Hershberg4, Alfred L Garfall5, Christopher S Carlson6, Eline T Luning Prak1.
Abstract
VH replacement (VHR) is a type of antibody gene rearrangement in which an upstream heavy chain variable gene segment (VH) invades a pre-existing rearrangement (VDJ). In this Hypothesis and Theory article, we begin by reviewing the mechanism of VHR, its developmental timing and its potential biological consequences. Then we explore the hypothesis that specific sequence motifs called footprints reflect VHR versus other processes. We provide a compilation of footprint sequences from different regions of the antibody heavy chain, and include data from the literature and from a high throughput sequencing experiment to evaluate the significance of footprint sequences. We conclude by discussing the difficulties of attributing footprints to VHR.
Keywords: V(D)J recombination; VH replacement; VH, DH, and JH gene segments; receptor editing
Year: 2014 PMID: 24523721 PMCID: PMC3906580 DOI: 10.3389/fimmu.2014.00010
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. (A) VH replacement: an upstream VH gene invades by rearranging into a pre-existing rearrangement. RAG cleaves the conventional recombination signal sequence (black triangle) of the invading VH (light blue VH gene) and cleaves at a cryptic heptamer sequence (cRSS, dashed white triangle) of the invaded VH gene (yellow VH gene). The resulting rearrangement is shown on the second line of the diagram and includes the DH and JH genes of the previous rearrangement and the new VH gene. Also included in the VH replacement product is a remnant or “footprint” of the preceding VH gene (denoted by a yellow box). The products of VH replacement often exhibit CDR3 elongation due to the retention of the footprint sequence. The CDR3 is indicated by the bar under the sequence, and the added length of the new CDR3 sequence, including the footprint (in red), is indicated by the black bar below the sequence. (B) Serial VH replacement. The same conventions are used as in (A), and a longer CDR3 is generated via the accumulation of footprint sequences. In both panels, boxes denote exons, lines denote introns, triangles denote RSS, and the rearrangement is indicated by dashed black lines. This diagram is not drawn to scale. (C) Long CDR3 sequence with possible VH replacement(s). Shown is the nucleotide sequence of an expanded B cell clone, recovered from peripheral blood DNA of a patient with systemic lupus erythematosus (SLE), that reveals a 91-nucleotide CDR3. Kowal et al. described an anti-dsDNA H chain sequence composed of a VH3-N-DH2-2-JH6, which has similar features to this junction, although it was shorter (21). Sequences in black font match the corresponding germline gene segments. Red font denotes possible N-additions and yellow shading highlights potential footprint sequences. Dashes indicate regions where sequences do not overlap. FR3, framework region 3; CDR3, third complementarity determining region; FR4, framework region 4.
Figure 2. (A) Longer CDR3 sequences have more footprints. Plotted are the average numbers of footprint 5-mers per sequence. At each CDR3 length, the footprint count is averaged over the number of sequences that have CDR3s of that length. Blue dots are in-frame (IF) rearrangements and red dots are out-of-frame (OF) rearrangements. We find a positive linear correlation between the length of the CDR3 and the number of footprints (r² = 0.89 for IF, r² = 0.7 for OF). The line describing this relationship has a slope of 0.06 ± 0.008 for OF (red line) and 0.05 ± 0.008 for IF rearrangements (blue line). (B,C) Two examples of the entire distribution of footprint numbers at two positions: position 56 (red circle, OF) and position 69 (blue circle, IF). Black stems indicate the numbers of footprints and red lines represent the fit of a Poisson distribution (λ = 1.57 and 2.74, respectively).
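The averaging and line fit described in the Figure 2 legend can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the footprint set is a small subset chosen for the example, the function names are hypothetical, and counting every (possibly overlapping) 5-mer window is an assumption. For the Poisson fits in panels (B,C), the maximum-likelihood estimate of λ is simply the mean footprint count at that CDR3 length, which is what `mean_count_by_length` returns.

```python
from collections import defaultdict

# Illustrative subset of footprint 5-mers (assumption; the full set is in Table 1).
FOOTPRINTS = {"CGAGA", "GAGAG", "AGAGA"}

def count_footprints(cdr3, footprints=FOOTPRINTS, k=5):
    """Count k-mer windows of the CDR3 that match a footprint (overlaps allowed)."""
    return sum(cdr3[i:i + k] in footprints for i in range(len(cdr3) - k + 1))

def mean_count_by_length(cdr3s):
    """Average footprint count per sequence, grouped by CDR3 length."""
    counts = defaultdict(list)
    for seq in cdr3s:
        counts[len(seq)].append(count_footprints(seq))
    return {length: sum(c) / len(c) for length, c in sorted(counts.items())}

def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, r squared).

    Requires at least two distinct x values and non-constant y values.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)
```

Passing the keys and values of `mean_count_by_length(...)` to `fit_line` gives a slope and r² analogous to those quoted in the legend, computed separately for in-frame and out-of-frame sequences.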
Table 1. Footprint sequences in the 3′ end of human germline VH genes and alleles.
| Footprint (5-mer variants) | Footprint | VH gene allele(s) |
|---|---|---|
| CGAGAGA (CGAGA, GAGAG, AGAGA) | CGAGAGA | VH1-18, VH1-2*1, VH1-2*2, VH1-2*3, VH1-2*5, VH1-3, VH1-46*1, VH1-46*2, VH1-69*1, VH1-69*4, VH1-69*6, VH1-69*8, VH1-69*9, VH1-69*10, VH1-69*11, VH1-69*12, VH1-69*13, VH1/OR15-1*2, VH1/OR15-1*3, VH1/OR15-1*4, VH3-11*1, VH3-11*4, VH3-11*5, VH3-21, VH3-30*1, VH3-30*3, VH3-30*4, VH3-30*5, VH3-30*6, VH3-30*7, VH3-30*9, VH3-30*10, VH3-30*11, VH3-30*12, VH3-30*13, VH3-30*14, VH3-30*15, VH3-30*16, VH3-30*17, VH3-30*18, VH3-30*19, VH3-33*1, VH3-33*2, VH3-33*4, VH3-33*5, VH3-48, VH3-53*1, VH3-53*4, VH3-64*1, VH3-64*2, VH3-64*4, VH3-66*1, VH3-66*3, VH3-7*1, VH3-7*3, VH4-28*3, VH4-30-2*4, VH4-31*1, VH4-31*2, VH4-31*3, VH4-31*10, VH4-34*9, VH4-39*2, VH4-39*6, VH4-39*7, VH4-4*2, VH4-4*6, VH4-4*7, VH4-59*1, VH4-59*2, VH4-61*1, VH4-61*2, VH4-61*3, VH4-61*8, VH4/OR15-8, VH7-4-1*2, VH7-4-1*4, VH7-4-1*5 |
| | CGAGA | VH1-2*4, VH1-69*2, VH1-69*5, VH1/OR15-1*1, VH3-11*3, VH3-30*8, VH3-30-3*1, VH3-53*2, VH3-66*2, VH3-7*2, VH4-28*4, VH4-34*12, VH4-59*7, VH4-61*5, VH4-b, VH5-51*3, VH5-51*4, VH5-a, VH7-4-1*1 |
| CGAGAGG (CGAGA, AGAGG) | | VH1-8, VH4-34*1, VH4-34*2, VH4-34*4, VH4-34*5, VH4-34*13, VH4-59*9 |
| CGAGACA (CGAGA, GAGAC, AGACA) | | VH3-66*4, VH4-30-2*3, VH4-39*1, VH4-59*8, VH4-61*7, VH5-51*1, VH5-51*2 |
| CGAGATA (CGAGA, GAGAT, AGATA) | | VH4-34*10, VH4-59*10, VH7-81 |
| CGAGAAA (CGAGA, GAGAA, AGAAA) | | VH4-28*1, VH4-28*2, VH4-28*5, VH4-28*6 |
| CAAGA | | |
| | CAAGATA | VH1-45*2 |
| | CAAGAGA | VH3-13*1, VH3-13*2, VH3-13*4, VH3-74*1, VH3-74*3, VH3/OR16-10*3, VH6-1 |
| | CAAGA | VH1-45*3, VH3-13*3, VH3-74*2, VH3/OR16-10*1, VH3/OR16-10*2, VH3/OR16-12 |
| CAACAGA | CAACAGA | VH1-24 |
| | CAACA | VH1-f*1 |
| CTAGAGA (CTAGA, TAGAG, AGAGA) | CTAGAGA | VH1-46*3, VH3-72*1, VH3/OR15-7*5 |
| | CTAGA | VH3/OR15-7*1, VH3/OR15-7*2, VH3/OR15-7*3 |
| CTAGGGA (CTAGG, TAGGG, | | VH3-53*3 |
| CGAAAGA (CGAAA, GAAAG, AAAGA) | CGAAAGA | VH3-23*1, VH3-23*2, VH3-23*4, VH3-30*2, VH3-30-3*2, VH3-33*3, VH3-33*6, VH3-NL1 |
| | CGAAA | VH3-23*3, VH3-23*5 |
| CCAGATATA (CCAGA, CAGAT, AGATA, | | VH3-38 |
| CCAGAGA (CCAGA, CAGAG, AGAGA) | | VH4-30-2*1, VH4-30-2*5, VH4-30-4*1, VH4-30-4*2, VH4-30-4*5, VH4-30-4*6, VH4-61*6 |
| TGAAACA (TGAAA, GAAAC, AAACA) | TGAAA | VH3/OR16-8*1, VH3/OR16-9 |
| | TGAAACA | VH3/OR16-8*2 |
| TGAGA | | |
| TGAGAGA (TGAGA, GAGAG, AGAGA) | TGAGA | VH1/OR15-5 |
| | TGAGAGA | VH1/OR15-9, VH1/OR21-1 |
| TGAGAAA (TGAGA, GAGAA, AGAAA) | | VH3-16, VH3-35 |
| TGAAAGA (TGAAA, GAAAG, AAAGA) | | VH3-64*3, VH3-64*5 |
| CGGCAGA (CGGCA, GGCAG, GCAGA) | | VH1-58 |
| CACGGATAC (CACGG, ACGGA, CGGAT, | | VH2-26, VH2-70*1, VH2-70*10, VH2-70*11 |
| CATGGAGAG ( | | VH2/OR16-5 |
| | TACGG | VH2-5*4, VH2-70*9 |
| | | VH2-5*7 |
| | CACGG | VH2-5*10 |
| CACACAGACC (CACAC, ACACA, CACAG, ACAGA, CAGAC, AGACC) | CACACAGACC | VH2-5*1 |
| | CACACAGAC | VH2-5*5, VH2-5*8, VH2-5*9, VH2-70*12 |
| | CACACAGA | VH2-5*6 |
| CAAAAGATA (CAAAA, AAAAG, AAAGA, AAGAT, AGATA) | | VH3-43, VH3-9 |
Two hundred and seventy-three functional VH genes, including alleles and sequences designated as open reading frames, were downloaded from the IMGT database.
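The 5-mer variants listed in parentheses in the table are the consecutive 5-nucleotide windows of each longer footprint. A minimal sketch of that enumeration is shown below; the function name `five_mers` is hypothetical and not taken from the article.

```python
def five_mers(footprint, k=5):
    """Return every k-nucleotide window of a footprint, from 5' to 3'."""
    return [footprint[i:i + k] for i in range(len(footprint) - k + 1)]

print(five_mers("CGAGAGA"))  # ['CGAGA', 'GAGAG', 'AGAGA']
```

A footprint of length n yields n − 4 windows, so the 10-mer CACACAGACC gives the six variants shown in its row.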
Figure 3. Positions of footprint 5-mers. (A) Frequency distribution of all footprint 5-mers out of total unique sequences (n = 42,221) plotted against the normalized CDR3 position. The CDR3 is herein defined to begin at the conserved CAR amino acid sequence (TGT GCG AGA nucleotide sequence) within the 3′ end of VH and to end at the conserved W (TGG nucleotide sequence) that is immediately upstream of the first conserved glycine (GGC nucleotide sequence) within the JH. “TGGAG” is excluded as it is found in many alleles of all DH genes. The position of the footprint is defined by where the footprint starts within the CDR3. For example, if a footprint occupies nucleotides 12–16 of the CDR3, it will be plotted at position 12. CDR3 lengths were normalized to a scale of 1–100 using p1 = (p/L) × 100, where p is the position of the 5-mer in the real CDR3 sequence, L is the length of the CDR3 sequence, and p1 is the normalized position. Normalized positions are rounded to the nearest integer. (B) Frequency distribution of all footprint 5-mers plotted against the normalized CDR3 position, corrected for footprint 5-mers in the germline JH6 gene. “ATGGA,” “TACGG,” and “CATGG” were excluded when found in the 3′ end of the CDR3 within the JH6 gene, as these 5-mers are found in the germline JH6 sequence. Footprint 5-mers found in other JHs (see Table 2) are not counted in either (A) or (B) because they are all located outside the CDR3 region of JH.
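The position rule in the Figure 3 legend (a footprint is plotted at its 1-based start position, then scaled by CDR3 length) can be sketched as below. The function names and the toy CDR3 are hypothetical. Note that Python's `round` resolves exact .5 ties to the nearest even integer, which may differ from the rounding convention the authors used.

```python
def footprint_positions(cdr3, footprints):
    """Return (footprint, 1-based start position) for every match, overlaps included."""
    hits = []
    for fp in footprints:
        start = cdr3.find(fp)
        while start != -1:
            hits.append((fp, start + 1))  # convert 0-based index to 1-based position
            start = cdr3.find(fp, start + 1)
    return sorted(hits, key=lambda hit: hit[1])

def normalize(p, length):
    """Scale position p in a CDR3 of the given length to the 1-100 axis."""
    return round(p / length * 100)

# Toy 48-nt CDR3 with a footprint occupying nucleotides 12-16 (illustrative only).
cdr3 = "T" * 11 + "CGAGA" + "T" * 32
for fp, p in footprint_positions(cdr3, {"CGAGA"}):
    print(fp, p, normalize(p, len(cdr3)))  # CGAGA 12 25
```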
Table 2. Footprint sequences in DH and JH alleles.
| DH or JH allele | Sequence (footprint(s) in red font) |
|---|---|
| D1-1*01 | GGTACAACTGGAACGAC |
| D1-14*01 | GGTATAACCGGAACCAC |
| D1-20*01 | GGTATAACTGGAACGAC |
| D1-26*01 | GGTATAGTGGGAGCTACTAC |
| D1-7*01 | GGTATAACTGGAACTAC |
| D2-15*01 | A |
| D2-2*01 | A |
| D2-2*02 | A |
| D2-2*03 | T |
| D2-21*01 | AGCATATTGTGGTGGTGATTGCTATTCC |
| D2-21*02 | AGCATATTGTGGTGGTGACTGCTATTCC |
| D2-8*01 | A |
| D2-8*02 | A |
| D3-10*01 | GTATTACTATGGTTCGGGGAGTTATTATAAC |
| D3-10*02 | GTATTACTATGTTCGGGGAGTTATTATAAC |
| D3-16*01 | GTATTATGATTACGTTTGGGGGAGTTATGCTTATACC |
| D3-16*02 | GTATTATGATTACGTTTGGGGGAGTTATCGTTATACC |
| D3-22*01 | GTATTACTATGATAGTAGTGGTTATTACTAC |
| D3-3*01 | GTATTACGATTTT |
| D3-3*02 | GTATTAGCATTTT |
| D3-9*01 | GTATTAC |
| D4-11*01 | TGACTACAGTAACTAC |
| D4-17*01 | TGAC |
| D4-23*01 | TGAC |
| D4-4*01 | TGACTACAGTAACTAC |
| D5-12*01 | GT |
| D5-18*01 | GT |
| D5-24*01 | G |
| D5-5*01 | GT |
| D6-13*01 | GGGTATAGCAGCAGCTGGTAC |
| D6-19*01 | GGGTATAGCAGTGGCTGGTAC |
| D6-25*01 | GGGTATAGCAGCGGCTAC |
| D6-6*01 | GAGTATAGCAGCTCGTCC |
| D7-27*01 | CTAACTGGGGA |
| J1*01 | GCTGAATACTTCCAGCACTGGGGCCAGGGCACCCTGGTCACCGTCTCCTCAG |
| J2*01 | CTACTGGTACTTCGATCTCTGGGGCCGTGGCACCCTGGTCACTGTCTCCTCAG |
| J3*01 | TGATGCTTTTGATGTCTGGGGCCA |
| J3*02 | TGATGCTTTTGATATCTGGGGCCA |
| J4*01 | ACTACTTTGACTACTGGGGCCAAGGAACCCTGGTCACCGTCTCCTCAG |
| J4*02 | ACTACTTTGACTACTGGGGCC |
| J4*03 | GCTACTTTGACTACTGGGGCCA |
| J5*01 | ACAACTGGTTCGACTCCTGGGGCCAAGGAACCCTGGTCACCGTCTCCTCAG |
| J5*02 | ACAACTGGTTCGACCCCTGGGGCC |
| J6*01 | ATTACTACTACTAC |
| J6*02 | ATTACTACTACTAC |
| J6*03 | ATTACTACTACTACTACTA |
| J6*04 | ATTACTACTACTAC |
Thirty-four germline functional human DH alleles and 13 JH alleles were downloaded from the IMGT database.
Table 3. The number of footprints found in various regions of human VH genes.
| Footprint | Sequences | FR | CDR | FR1 | FR2 | FR3 | CDR1 | CDR2 | CDR3 |
|---|---|---|---|---|---|---|---|---|---|
| TGAGA | 131 | 212 | 12 | 94 | 118 | 3 | 3 | 6 | |
| CTAGAGA | 8 | 8 | 8 | ||||||
| CGAAAGA | 9 | 9 | 9 | ||||||
| CTAGA | 28 | 17 | 16 | 17 | 1 | 1 | 14 | ||
| TACGG | 9 | 5 | 4 | 1 | 4 | 1 | 1 | 2 | |
| CAAAA | 45 | 41 | 13 | 41 | 9 | 4 | |||
| CTAGGGA | 2 | 1 | 1 | 1 | 1 | ||||
| CAACA | 55 | 33 | 27 | 1 | 5 | 27 | 2 | 23 | 2 |
| CAAGA | 150 | 149 | 18 | 4 | 145 | 3 | 15 | ||
| CACACAGA | 20 | 20 | 6 | 20 | 6 | ||||
| CGAGACA | 7 | 7 | 7 | ||||||
| CGAGAAA | 3 | 3 | 3 | ||||||
| TGAAAGA | 2 | 2 | 2 | ||||||
| TGAAACA | 3 | 3 | 2 | 1 | |||||
| CGAGAGG | 8 | 2 | 6 | 2 | 6 | ||||
| CGAGAGA | 86 | 86 | 86 | ||||||
| CACGG | 179 | 178 | 6 | 178 | 6 | ||||
| CGAGATA | 3 | 3 | 3 | ||||||
| CGAAA | 11 | 11 | 1 | 10 | |||||
| TGAAA | 61 | 79 | 7 | 33 | 46 | 1 | 3 | 3 | |
| CACACAGAC | 20 | 20 | 5 | 20 | 5 | ||||
| CGAGA | 130 | 3 | 127 | 1 | 2 | 127 | |||
| CACACAGACC | 20 | 20 | 1 | 20 | 1 | ||||
| TGAGAAA | 6 | 6 | 3 | 3 | |||||
| CAAAAGATA | 3 | 3 | 3 | ||||||
| CGGCAGA | 2 | 2 | 2 | ||||||
| TGAGAGA | 3 | 1 | 2 | 1 | 2 | ||||
| CCAGATATA | 2 | 2 | 2 | ||||||
| CAAGATA | 1 | 1 | 1 | ||||||
| CCAGAGA | 84 | 80 | 4 | 80 | 4 | ||||
| CACGGATAC | 5 | 5 | 5 | ||||||
| CAAGAGA | 27 | 19 | 8 | 19 | 8 |
Two hundred and thirty-four functional germline human VH alleles were downloaded from the IMGT database.
Figure 4. (A) Percentage of rearrangements with footprints for the 10 most frequent VH rearrangements and VH6-1. The percentages of rearrangements that contain at least one footprint 5-mer in either (or both) the 5′ end or the 3′ end of the rearrangement are shown for unique rearrangements for each of the 10 most common VH genes and for VH6-1. CDR3 sequences are defined as described in the legend to Figure 3 and normalized to an arbitrary length of 100. The 5′ end (which almost always contains N1) is defined as the first 20% of the sequence and the 3′ end (which almost always contains N2) is defined as the last 20% of the sequence. VH6-1 is the most 3′ VH gene and cannot contain footprints that are due to VH replacement. Black bars denote unique rearrangements that are in-frame (IF VH), white bars denote out-of-frame rearrangements (OF VH), and gray bars indicate total unique rearrangements (total VH). (B) Footprint frequency of in-frame VH rearrangements in N1 using IgAT software. The same unique in-frame (IF) VH rearrangements shown in (A) were analyzed for the presence of one or more footprints using IgAT software (2). Plotted are the frequencies of unique IF VH rearrangements that have one or more footprints in the N1 region.
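The 5′/3′ classification in panel (A) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a footprint is assigned to an end by its normalized start position (as in the Figure 3 legend), the boundaries are taken as start ≤ 20% and start > 80% of the CDR3 length, and the function name `footprint_ends` is hypothetical.

```python
def footprint_ends(cdr3, footprints, k=5, fraction=0.2):
    """Report whether any footprint k-mer starts in the 5' or 3' end of a CDR3.

    Returns a (in_5prime_end, in_3prime_end) pair of booleans.
    """
    n = len(cdr3)
    in_five, in_three = False, False
    for i in range(n - k + 1):
        if cdr3[i:i + k] in footprints:
            pos = (i + 1) / n  # normalized 1-based start position, in (0, 1]
            if pos <= fraction:
                in_five = True
            if pos > 1 - fraction:
                in_three = True
    return in_five, in_three

# Toy 50-nt CDR3 with a footprint at the very 5' end (illustrative only).
print(footprint_ends("CGAGA" + "T" * 45, {"CGAGA"}))  # (True, False)
```

The percentage plotted for each VH gene would then be the share of its unique rearrangements for which either value is true.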
Figure 5. VH usage is similar in all unique rearrangements and in rearrangements with DH5-12. VH usage is shown for total unique sequences (closed bars, n = 42,221) and for those with DH5-12 (open bars, n = 1,029). Sequences containing the DH5-12 gene segment in the CDR3 region were identified using IMGT/HighV-QUEST analysis, and further analysis of VH usage was performed in-house (see text). The red arrow points to VH2-26, which contains some of the same footprint 5-mers as DH5-12 but does not appear to be used more frequently in rearrangements that include DH5-12.