Jenny von Salomé, Jyrki P Kukkonen.
Abstract
BACKGROUND: HLA/MHC class II molecules show a high degree of polymorphism in the human population. The individual polymorphic motifs have been suggested to be propagated and mixed by transfer of genetic material (recombination, gene conversion) between alleles, but no clear molecular basis for this has yet been identified. A large number of MHC class II allele sequences are publicly available and could be used to analyze the sequence features behind the recombination, revealing a possible basis for such recombination processes both in HLA class II genes and in other genes upon which recombination acts.
Year: 2008 PMID: 18489735 PMCID: PMC2408603 DOI: 10.1186/1471-2164-9-228
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The HLA-DRB1 exon diversity. A, DRB1 exon 2 diversity compared to the rest of the coding region (fused exons 1, 3, 4, 5 and 6) in the dataset including the entire DRB1 coding region (49 sequences). Mean ± sem is shown. B, synonymous and non-synonymous diversity in the DRB1 coding region in the dataset including the entire DRB1 coding region. In the short exon 5 (24 bp), half of the alleles have G instead of C at nucleotide position 22, resulting in high apparent diversity for the whole exon. Mean ± sem is shown. C, sliding window analysis of non-synonymous, synonymous and complex substitutions in DRB1-e2 in the dataset including the complete DRB1-e2. Complex stands for complex combinations of non-synonymous and synonymous substitutions in the same codon. The graph illustrates the contribution of these different components in d, which is not equal to dN and dS (d, as calculated here, does not take into consideration the capability of the codon to mutate in a synonymous or non-synonymous manner).
Figure 2. Transitions and transversions in HLA-DRB1, based on the dataset including the entire DRB1 coding region (49 sequences). Mean ± sem is shown.
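As an illustration of the quantity plotted in Figure 2, the following is a minimal sketch of how transitions and transversions could be counted between two aligned sequences. The function names (`classify`, `count_ts_tv`) are hypothetical and this is not the authors' software; it only assumes that a transition is a purine-to-purine or pyrimidine-to-pyrimidine change and that positions carrying anything other than A, C, G or T are skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(a, b):
    """Return 'transition', 'transversion', or None (identical or non-ACGT)."""
    if a == b or a not in BASES or b not in BASES:
        return None
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        kind = classify(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


# Toy example (not real DRB1 data): A>G is a transition,
# G>T and T>A are transversions.
print(count_ts_tv("ACGT", "GCTA"))  # (1, 2)
```

Applying the same function to every pair of alleles and averaging would give a mean per exon, which is one plausible reading of the "Mean ± sem" in the caption.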
Figure 3. G+C content in HLA-DRB1 exons 1–6, based on the dataset including the entire DRB1 coding region (49 sequences). The dotted line indicates the overall average G+C. Mean ± sem is shown.
Figure 4. CpG-dinucleotide content in HLA-DRB1 exons 1–6, based on the dataset including the entire DRB1 coding region (49 sequences). A, the observed CpG-dinucleotide content. B, the observed CpG level (as in A) divided by the mathematically estimated CpG content (based on the total G+C level). Mean ± sem is shown. The ratios were separately calculated for each allele and then averaged.
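The observed/expected ratio in panel B can be sketched as follows. The caption does not give the exact formula for the "mathematically estimated" CpG content, so this example makes an explicit assumption: the expected CpG dinucleotide frequency is taken as (G+C fraction / 2)², i.e. C and G are treated as equally frequent and independent. The function names are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for one sequence.

    ASSUMPTION: expected CpG frequency = (GC fraction / 2) ** 2.
    The source only says the estimate is based on the total G+C level.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    # Observed: fraction of the n-1 dinucleotide positions that are CpG.
    cpg_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    observed = cpg_count / (n - 1)
    gc_fraction = (seq.count("G") + seq.count("C")) / n
    expected = (gc_fraction / 2) ** 2
    if expected == 0:
        raise ValueError("sequence contains no G or C")
    return observed / expected


def mean_cpg_obs_exp(seqs):
    """Compute the ratio per allele first, then average (as in the caption)."""
    ratios = [cpg_obs_exp(s) for s in seqs]
    return sum(ratios) / len(ratios)


# Toy sequences, not real alleles.
print(mean_cpg_obs_exp(["CGCG", "ACGT"]))
```

Averaging per-allele ratios, rather than dividing the averaged observed value by the averaged expected value, follows the caption's statement that the ratios were calculated separately for each allele.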
Figure 5. CpG distribution in DRB1-e2, based on the dataset including the 374 complete DRB1-e2 sequences. A, the individual sequences aligned beneath one another in the consensus numbering order, starting from DRB1*010101. Black boxes indicate CpG dinucleotides and gray boxes other dinucleotides. B, CpG frequency for each nucleotide position. The dotted line indicates 100%.
Motifs used in the screening of DRB1-e2.
| Motif | Sequence | References |
|---|---|---|
| Polypurine/-pyrimidine tract | 5'-RRRRR-3'/5'-YYYYY-3' | [47, 19, 48, 49] |
| Alternating purine-pyrimidine tract | 5'-RYRYR-3'/5'-YRYRY-3' | [19, 50] |
| Immunoglobulin heavy chain class switch repeats | 5'-GAGCT-3'/5'-AGCTC-3' | [51, 49] |
| | 5'-GGGCT-3'/5'-AGCCC-3' | |
| | 5'-GGGGT-3'/5'-ACCCC-3' | |
| | 5'-TGGGG-3'/5'-CCCCA-3' | |
| | 5'-TGAGC-3'/5'-GCTCA-3' | |
| DNA polymerase arrest site | 5'-WGGAG-3'/5'-CTCCW-3' | [49] |
| Deletion hotspot consensus | 5'-TGRRKM-3'/5'-KMYYCA-3' | [28, 49] |
| Heptamer recombination signal | 5'-CACAGTG-3'/5'-CACTGTG-3' | [22] |
| Nonamer recombination signal | 5'-ACAAAAACC-3'/5'-GGTTTTTGT-3' | [22] |
| Chi-like sequence | 5'-GCTGGGG-3'/5'-CCCCAGC-3' | [40, 52] |
| Chi-like sequence | 5'-CCAG-3'/5'-CTGG-3' | [53, 54] |
| Chi-like sequence | 5'-GCWGGWGG-3'/5'-CCWCCWGC-3' | [55] |
| Topoisomerase I consensus cleavage sites | 5'-CAT-3'/5'-ATG-3' | [56] |
| | 5'-CTY-3'/5'-RAG-3' | |
| | 5'-GTY-3'/5'-RAC-3' | |
| DNA polymerase α pause site core sequence | 5'-GAG-3'/5'-CTC-3' | [57] |
| | 5'-ACG-3'/5'-CGT-3' | |
| DNA polymerase α/β frameshift hotspots | 5'-TGGNGT-3'/5'-ACNCCA-3' | [58, 59] |
| Vertebrate topoisomerase II consensus cleavage site | 5'-RNYNNCNNGYNGKTNYNY-3'/5'-RNRNAMCNRCNNGNNTNY-3' | [60, 61] |
| Human hypervariable minisatellite core sequence | 5'-GGGCAGGANG-3'/5'-CNTCCTGCCC-3' | [62] |
| DNA polymerase α frameshift hotspots | 5'-TCCCCC-3'/5'-GGGGGA-3' | [59, 63] |
| DNA polymerase β frameshift hotspots | 5'-TTTT-3'/5'-AAAA-3' | [58] |
| Indel hotspot | 5'-GTAAGT-3'/5'-ACTTAC-3' | [64] |
| Hotspot motif | 5'-CCTCCCT-3'/5'-AGGGAGG-3' | [63] |
| Repeat element motif | 5'-CCCCACCCC-3'/5'-GGGGTGGGG-3' | [63] |
| Double strand break-generating motif | 5'-TGGGGG-3'/5'-CCCCCA-3' | [63] |
The sequences of the complementary strands are separated by "/". The ambiguity code symbols are: R = A/G, Y = C/T, K = G/T, M = A/C, S = G/C, W = A/T, N = A/C/G/T.
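A screen for motifs written with these ambiguity codes can be sketched by translating each code into a regular-expression character class and scanning both strands. This is only an illustrative sketch, not the procedure used in the study; the names `IUPAC`, `reverse_complement`, `motif_regex` and `find_motif` are hypothetical, and positions are reported 1-based on the given strand.

```python
import re

# Ambiguity codes as listed in the table note.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[GC]", "W": "[AT]", "N": "[ACGT]",
}
# Complement of each code (R<->Y, K<->M; S, W and N are self-complementary).
COMPLEMENT = str.maketrans("ACGTRYKMSWN", "TGCAYRMKSWN")


def reverse_complement(motif):
    """Reverse complement of a motif that may contain ambiguity codes."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif):
    """Return sorted (start, end, strand) hits, 1-based and inclusive.

    '+' means the motif itself occurs in seq; '-' means its reverse
    complement occurs in seq, i.e. the motif lies on the other strand.
    """
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_regex(pattern).finditer(seq.upper()):
            start = match.start() + 1
            hits.add((start, start + len(motif) - 1, strand))
    return sorted(hits)


# Toy sequence, not a real allele.
print(reverse_complement("TGRRKM"))          # KMYYCA
print(find_motif("AATGAAGACC", "TGRRKM"))    # [(3, 8, '+')]
```

The first printed line reproduces the complementary-strand form of the deletion hotspot consensus given in the table, which is a quick consistency check on the complement mapping.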
Motifs previously identified in other genes found in the DRB1-e2
| Motif | Sequence | Position |
|---|---|---|
| Polypurine tract | 5'-RRRRR(RRR)-3' | 88–95 |
| | | 121–125ᵃ |
| | | 178–184 |
| | | 246–250 |
| Polypyrimidine tract | 5'-YYYYY-3' | 5–10 |
| | | 36–41 |
| Immunoglobulin heavy chain class switch repeat | 5'-GAGCT-3' | 141–145 |
| | | 248–252 |
| | 5'-TGGGG-3' | 145–149 |
| Deletion hotspot consensus | 5'-TGAAGA-3' | 37–42ᵇ |
| | 5'-TGRRKM-3' | 145–150 |
| | | 250–255ᵇ |
| Chi-like sequence | 5'-GCTGGGG-3' | 143–149 |
| | 5'-CTGG-3' | 144–147ᵇ |
| | | 167–170ᵇ |
| | | 176–179 |
| Topoisomerase I consensus cleavage site | 5'-CTY-3' | 38–40 |
| | | 251–253 |
| | 5'-GTY-3' | 4–6 |
| | | 31–33 |
| | | 47–49ᵇ |
| | | 108–110ᵇ |
| | | 114–116ᵇ |
| | | 171–173ᵇ |
| | | 183–185ᵇ |
| | | 213–215ᵇ |
| | | 231–233ᵇ |
| DNA polymerase α pause site core sequence | 5'-GAG-3' | 51–53 |
| | | 121–123 |
| | | 246–251 (2×) |
| | | 268–270 |
| | 5'-ACG-3' | 2–4 |
| | | 115–117 |
| | | 116–118ᵇ |
| Deletion hotspot | 5'-YYYTG-3' | 7–11 |
| | | 177–181ᵇ |
| | | 187–191 |
Only the motifs present in at least an (almost) entire allelic family are presented; some less common motifs are presented in the text. The ambiguity code symbols are: R = A/G, Y = C/T, K = G/T, M = A/C, S = G/C, W = A/T, N = A/C/G/T.
ᵃ Motif ± 1 bp.
ᵇ Motif in the non-coding strand, corresponding to these bases in the coding strand.
Fully conserved stretches of a minimum of 3 bp in all DRB1-e2 sequences
| Position | Sequence | Known motif |
|---|---|---|
| 4–8 | 5'- | Polypyrimidine tract |
| 30–32 | 5'-TGT-3' | |
| 36–43 | 5'-T | Deletion hotspot consensus sequence (5'-TGAAGA-3') in non-coding strand |
| 47–49 | 5'-GAC-3' | |
| 51–54 | 5'-GAGC-3' | |
| 56–60 | 5'-GGTGC-3' | |
| 107–119 | 5'-CGACAGCGACGTG-3' | |
| 121–124 | 5'- | Polypurine tract |
| 142–145 | 5'- | Part of the immunoglobulin heavy chain class switch repeat (5'-G |
| | 5'-A | Part of the chi-like sequence (5'- |
| 147–149 | 5'- | Part of deletion hotspot consensus sequence (5'-TG |
| | 5'- | Part of the chi-like sequence (5'-GCTG |
| 167–174 | 5'-CTGGAACA-3' | |
| 179–182 | 5'-GAAG-3' | |
| 210–215 | 5'-GTGGAC-3' | |
| 222–227 | 5'-TGCAGA-3' | |
| 229–235 | 5'-ACAACTA-3' | |
| 246–250 | 5'- | Polypurine tract |
| 248–250 | 5'- | Part of the immunoglobulin heavy chain class switch repeat (5'- |
| 252–256 | 5'- | Deletion hotspot consensus sequence |
| 261–263 | 5'-CAG-3' | |
| 267–269 | 5'-CGA-3' | |
ᵃ The full motif as in Table 2.
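Fully conserved stretches of this kind can be located by checking each alignment column for a single shared base and then collecting runs of at least 3 bp. The sketch below is a hypothetical illustration under the assumption that the input sequences are already aligned to equal length; `conserved_stretches` is not a function from the study.

```python
def conserved_stretches(aligned, min_len=3):
    """Return (start, end, sequence) for runs of fully conserved columns.

    Positions are 1-based and inclusive. ASSUMPTION: all sequences are
    pre-aligned and of equal length; a gap character counts as a base.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    seqs = [s.upper() for s in aligned]
    # True where every sequence carries the same character.
    conserved = [len({s[i] for s in seqs}) == 1 for i in range(length)]

    runs = []
    start = None
    # A trailing False closes a run that reaches the end of the alignment.
    for i, flag in enumerate(conserved + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i, seqs[0][start:i]))
            start = None
    return runs


# Toy alignment, not real DRB1-e2 data: only column 4 varies.
toy = ["ACGTTGCA", "ACGATGCA", "ACGCTGCA"]
print(conserved_stretches(toy))  # [(1, 3, 'ACG'), (5, 8, 'TGCA')]
```

Raising `min_len` filters out the shorter runs, so the 3 bp threshold used in the table is just the default here.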
Figure 6. Sliding window analysis of nucleotide diversity in HLA-DRB1 exon 2, displaying stretches of totally conserved bases in the 374 DRB1-e2 sequences (length ≥ 3 bp; thick grey lines below the abscissa). Also indicated are the previously identified ARS-coding codons (thick black lines above the diversity graph).