Roger Horton, Richard Gibson, Penny Coggill, Marcos Miretti, Richard J Allcock, Jeff Almeida, Simon Forbes, James G R Gilbert, Karen Halls, Jennifer L Harrow, Elizabeth Hart, Kevin Howe, David K Jackson, Sophie Palmer, Anne N Roberts, Sarah Sims, C Andrew Stewart, James A Traherne, Steve Trevanion, Laurens Wilming, Jane Rogers, Pieter J de Jong, John F Elliott, Stephen Sawcer, John A Todd, John Trowsdale, Stephan Beck.
Abstract
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype designated as the new MHC reference sequence (A3-B7-DR15; PGF cell line) has been incorporated into the human genome assembly (NCBI35 and subsequent builds) and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.
Year: 2008 PMID: 18193213 PMCID: PMC2206249 DOI: 10.1007/s00251-007-0262-2
Source DB: PubMed Journal: Immunogenetics ISSN: 0093-7711 Impact factor: 2.846
Haplotype sequence contig length, number of gaps and HLA allele types
| Haplotype | Length (bp) | Gaps | HLA-A | HLA-B | HLA-C | HLA-DQA1 | HLA-DQB1 | HLA-DRB1 |
|---|---|---|---|---|---|---|---|---|
| PGF | 4,754,829 | 0 | A*03010101 | B*070201 | Cw*07020103 | DQA1*010201 | DQB1*0602 | DRB1*150101 |
| COX | 4,731,878 | 0 | A*01010101 | B*080101 | Cw*070101 | DQA1*050101 | DQB1*020101 | DRB1*030101 |
| QBL | 4,249,272 | 5 | A*260101 | B*180101 | Cw*050101 | DQA1*050101 | DQB1*020101 | DRB1*030101 |
| APD | 4,160,965 | 16 | A*01010101 | – | – | – | – | – |
| DBB | 2,330,101 | 28 | A*02010101 | – | Cw*06020101 | DQA1*0201 | DQB1*030302 | DRB1*070101 |
| MANN | 4,191,014 | 10 | A*290201 | B*440301 | Cw*160101 | DQA1*0201 | DQB1*0202 | DRB1*070101 |
| MCF | 4,087,413 | 15 | [A*020101] | B*15010101 | Cw*030401 | DQA1*0303 | DQB1*030101 | – |
| SSTO | 3,704,249 | 22 | A*320101 | B*44020101 | Cw*050101 | DQA1*030101 | DQB1*030501 | DRB1*040301 |
Sequence length (bp) and number of gaps in each haplotype sequence, together with the HLA gene types obtained by BLAST against the IMGT/HLA database. Dashes or data in square brackets indicate the absence or the partial presence, respectively, of a gene owing to a sequence gap.
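The alleles above are written at up to eight-digit resolution, whereas the haplotype clusters in Fig. 3 are labelled with four-digit alleles. A minimal sketch of the truncation, assuming the pre-2010 HLA nomenclature used here, in which the digits after `*` come in pairs (allele group, protein, synonymous, non-coding). The helper name `four_digit` is hypothetical.

```python
import re
from typing import Optional

# Assumption: pre-2010 HLA nomenclature, where the digits after '*' come in pairs
# (allele group, protein, synonymous, non-coding). Four-digit resolution keeps the
# first two pairs only. Square brackets (partially present gene) are tolerated.
ALLELE = re.compile(r"^\[?(?P<gene>[A-Za-z0-9]+)\*(?P<digits>\d{4,8})\]?$")


def four_digit(allele: str) -> Optional[str]:
    """Truncate e.g. 'DRB1*150101' to 'DRB1*1501'; None for dashes or other non-allele cells."""
    match = ALLELE.match(allele.strip())
    if match is None:
        return None
    return f"{match['gene']}*{match['digits'][:4]}"


pgf_row = ["A*03010101", "B*070201", "Cw*07020103", "DQA1*010201", "DQB1*0602", "DRB1*150101"]
print([four_digit(a) for a in pgf_row])
# ['A*0301', 'B*0702', 'Cw*0702', 'DQA1*0102', 'DQB1*0602', 'DRB1*1501']
```

Truncated this way, the PGF class II alleles (DRB1*1501, DQB1*0602) match the label of the PGF cluster in Fig. 3.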
Distribution of substitutions and indels amongst haplotypes
| Haplotype | Substitutions | Indels | ALL |
|---|---|---|---|
| COX | 15,967 | 2,393 | 18,360 |
| QBL | 15,282 | 2,360 | 17,642 |
| SSTO | 14,982 | 2,300 | 17,282 |
| APD | 4,230 | 683 | 4,913 |
| DBB | 14,255 | 1,975 | 16,230 |
| MANN | 12,102 | 1,654 | 13,756 |
| MCF | 10,790 | 1,545 | 12,335 |
| Overall | 37,451 | 7,093 | 44,544 |
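Number of variations found by comparing the PGF haplotype sequence with each of the other haplotype sequences in turn. The Overall row is not the column sum: it counts each distinct variation once, although many variations are shared by several haplotypes.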
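The Substitutions column sums to 87,608 across the seven pairwise comparisons, whereas the Overall row gives 37,451. This is consistent with a variation carried by several haplotypes appearing in each comparison but only once overall. A minimal sketch of that bookkeeping, using hypothetical toy calls keyed as (position, ref, alt) on the PGF reference, with `-` marking the empty side of an indel. None of the values are from the study, and the helper names are illustrative.

```python
# Hypothetical toy calls relative to the PGF reference; not data from the study.
pairwise = {
    "COX": {(101, "A", "G"), (250, "C", "T"), (400, "G", "-")},
    "QBL": {(101, "A", "G"), (250, "C", "T"), (512, "T", "C")},
    "MCF": {(101, "A", "G"), (777, "-", "AT")},
}


def is_indel(variant) -> bool:
    _, ref, alt = variant
    return "-" in (ref, alt)


def counts(variants) -> tuple:
    """(substitutions, indels, all) for a collection of variations."""
    indels = sum(is_indel(v) for v in variants)
    return len(variants) - indels, indels, len(variants)


def summarise(calls: dict) -> dict:
    rows = {hap: counts(v) for hap, v in calls.items()}
    # Overall: each distinct variation counted once, however many haplotypes carry it.
    rows["Overall"] = counts(set().union(*calls.values()))
    return rows


for hap, (subs, indels, total) in summarise(pairwise).items():
    print(f"{hap:8}{subs:>4}{indels:>4}{total:>4}")
```

Here the per-haplotype totals add up to 8, but only 5 distinct variations exist, because position 101 is shared by all three haplotypes and position 250 by two.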
Distribution of substitutions and indels within different sequence regions amongst haplotypes
| Sequence region | Base pairs | COX S | COX ID | QBL S | QBL ID | SSTO S | SSTO ID | APD S | APD ID | DBB S | DBB ID | MANN S | MANN ID | MCF S | MCF ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Coding | 247,505 | 353 | 8 | 503 | 19 | 380 | 2 | 74 | 0 | 351 | 6 | 401 | 9 | 348 | 2 |
| UTR | 155,960 | 382 | 34 | 438 | 59 | 331 | 35 | 38 | 9 | 326 | 39 | 303 | 35 | 309 | 31 |
| Intronic | 1,283,472 | 3,141 | 571 | 3,135 | 590 | 2,658 | 505 | 602 | 147 | 2,897 | 509 | 2,185 | 393 | 2,126 | 404 |
| Total intragenic | 1,686,937 | 3,876 | 613 | 4,076 | 668 | 3,369 | 542 | 714 | 156 | 3,574 | 554 | 2,889 | 437 | 2,783 | 437 |
| Pseudogenic | 57,223 | 235 | 15 | 226 | 21 | 227 | 19 | 101 | 8 | 191 | 10 | 109 | 6 | 113 | 10 |
| Pseudogenic intron | 63,108 | 507 | 54 | 220 | 27 | 215 | 18 | 158 | 20 | 258 | 22 | 98 | 13 | 179 | 13 |
| Transcript exon | 78,092 | 190 | 30 | 207 | 33 | 119 | 22 | 71 | 8 | 136 | 17 | 88 | 16 | 70 | 15 |
| Transcript intron | 332,705 | 1,243 | 197 | 1,186 | 216 | 1,053 | 155 | 85 | 29 | 1,245 | 192 | 1,081 | 161 | 268 | 53 |
| Repeats | | | | | | | | | | | | | | | |
| LINEs | 608,429 | 2,110 | 221 | 2,015 | 240 | 2,388 | 255 | 755 | 93 | 2,097 | 217 | 2,084 | 193 | 1,530 | 164 |
| SINEs | 428,567 | 1,381 | 428 | 1,316 | 401 | 1,311 | 385 | 346 | 134 | 1,229 | 318 | 928 | 241 | 936 | 271 |
| Other repeats | 487,863 | 2,605 | 207 | 2,518 | 229 | 2,514 | 207 | 925 | 56 | 2,748 | 199 | 2,198 | 177 | 2,170 | 169 |
| Total in repeats | 1,524,859 | 6,096 | 856 | 5,849 | 870 | 6,213 | 847 | 2,026 | 283 | 6,074 | 734 | 5,210 | 611 | 4,636 | 604 |
| Microsatellite | 15,185 | 186 | 168 | 95 | 85 | 222 | 198 | 14 | 29 | 60 | 76 | 61 | 71 | 90 | 68 |
| All above | 3,758,109 | 12,333 | 1,933 | 11,859 | 1,920 | 11,418 | 1,801 | 3,169 | 533 | 11,538 | 1,605 | 9,536 | 1,315 | 8,139 | 1,200 |
| Other intergenic | 996,720 | 3,634 | 460 | 3,423 | 440 | 3,564 | 499 | 1,061 | 150 | 2,717 | 370 | 2,566 | 339 | 2,651 | 345 |
| Total | 4,754,829 | 15,967 | 2,393 | 15,282 | 2,360 | 14,982 | 2,300 | 4,230 | 683 | 14,255 | 1,975 | 12,102 | 1,654 | 10,790 | 1,545 |
Variations from Table 2 assigned to the sequence regions identified during annotation: exonic (coding), UTR and intronic regions of coding loci; pseudogenic and transcript loci; repeat elements; microsatellites; and other intergenic regions.
S Substitution, ID indel
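Because the regions differ greatly in size, raw counts are easier to compare once they are normalised by length. A minimal sketch, using the COX substitution counts and region sizes from the table and treating the listed regions as non-overlapping, as the row totals suggest. The ratio compares each region's rate with the rate over the whole PGF sequence. This is an illustrative calculation, not an analysis reported in the paper.

```python
# COX substitution counts (S) and region sizes in bp, copied from the table above.
COX_SUBS = {
    "Coding": (247_505, 353),
    "UTR": (155_960, 382),
    "Intronic": (1_283_472, 3_141),
    "LINEs": (608_429, 2_110),
    "SINEs": (428_567, 1_381),
    "Microsatellite": (15_185, 186),
    "Other intergenic": (996_720, 3_634),
}
TOTAL_BP, TOTAL_SUBS = 4_754_829, 15_967  # "Total" row (whole PGF sequence)


def per_kb(bp: int, subs: int) -> float:
    return 1000 * subs / bp


def enrichment(bp: int, subs: int) -> float:
    """Regional rate relative to the haplotype-wide rate (1.0 = as expected from length)."""
    return per_kb(bp, subs) / per_kb(TOTAL_BP, TOTAL_SUBS)


for region, (bp, subs) in COX_SUBS.items():
    print(f"{region:17}{per_kb(bp, subs):6.2f} per kb  ratio {enrichment(bp, subs):.2f}")
```

The ratio comes out at roughly 0.4 for coding exons and above 3.5 for microsatellites, against a background of about 3.4 substitutions per kb.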
Codon variation caused by substitutions in HLA and other gene loci
| Codon change | Subtype | COX HLA | COX Other | COX Total | QBL HLA | QBL Other | QBL Total | SSTO HLA | SSTO Other | SSTO Total | APD HLA | APD Other | APD Total | DBB HLA | DBB Other | DBB Total | MANN HLA | MANN Other | MANN Total | MCF HLA | MCF Other | MCF Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Synonymous | – | 49 | 81 | 130 | 71 | 106 | 177 | 72 | 57 | 129 | 1 | 24 | 25 | 66 | 69 | 135 | 59 | 79 | 138 | 80 | 52 | 132 |
| Non-synonymous | Total | 125 | 76 | 201 | 184 | 121 | 305 | 164 | 72 | 236 | 19 | 27 | 46 | 120 | 76 | 196 | 144 | 91 | 235 | 147 | 56 | 203 |
| Non-synonymous | Conservative | 68 | 42 | 110 | 102 | 72 | 174 | 92 | 39 | 131 | 11 | 18 | 29 | 67 | 40 | 107 | 77 | 60 | 137 | 82 | 35 | 117 |
| Non-synonymous | Non-conservative | 57 | 34 | 91 | 82 | 49 | 131 | 72 | 33 | 105 | 8 | 9 | 17 | 53 | 36 | 89 | 67 | 31 | 98 | 65 | 21 | 86 |
| Total | | 174 | 157 | 331 | 255 | 227 | 482 | 236 | 129 | 365 | 20 | 51 | 71 | 186 | 145 | 331 | 203 | 170 | 373 | 227 | 108 | 335 |
Coding substitutions analysed for their effects on protein sequences, listed by haplotype for the HLA genes (HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DRA, HLA-DQA1, HLA-DQB1, HLA-DPA1 and HLA-DPB1) and for all other genes, and classified by the codon changes they induce as synonymous, non-synonymous conservative or non-synonymous non-conservative.
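Each coding substitution is placed in one of these classes by translating the reference and variant codons and, for amino-acid changes, checking the sign of the BLOSUM62 score (the criterion stated in the caption of the variation-status table below). A minimal sketch follows. It assumes the standard genetic code and treats a positive score as conservative and a zero or negative score as non-conservative; the source does not say how zero scores were handled. Only a few BLOSUM62 entries are hard-coded for the examples, and the "stop change" label is a hypothetical extra category.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Only a few BLOSUM62 scores, enough for the examples below; a real analysis
# would load the full 20 x 20 matrix.
BLOSUM62_SUBSET = {
    frozenset("IV"): 3, frozenset("KR"): 2, frozenset("DE"): 2, frozenset("FY"): 3,
    frozenset("LM"): 2, frozenset("RW"): -3, frozenset("DG"): -1, frozenset("LP"): -3,
}


def classify(ref_codon: str, alt_codon: str) -> str:
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if "*" in (ref_aa, alt_aa):
        return "stop change"  # hypothetical label; the table has no such category
    score = BLOSUM62_SUBSET[frozenset(ref_aa + alt_aa)]
    # Assumption: positive score = conservative, zero or negative = non-conservative.
    return "non-synonymous conservative" if score > 0 else "non-synonymous non-conservative"


for ref, alt in [("CTG", "CTA"), ("GTT", "ATT"), ("AAA", "AGA"), ("CGG", "TGG"), ("GAT", "GGT")]:
    print(ref, "->", alt, classify(ref, alt))
```

In these examples, CTG to CTA (Leu to Leu) is synonymous, Val to Ile and Lys to Arg are conservative, and Arg to Trp and Asp to Gly are non-conservative.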
Fig. 1 Annotation and variation data in VEGA. Examples of the VEGA ‘overview’ (a), ‘detailed view’ (b) and ‘basepair view’ (c) of variation in the OR2J1 locus, in which a stop codon is present in all haplotypes except MCF.
Splice-variant statistics for PGF annotation
| Type | No. |
|---|---|
| Total splice variants | 1,267 |
| Coding | |
| Unprocessed_pseudogene | 50 |
| Processed_pseudogene | 41 |
| Expressed_pseudogene | 7 |
| Transcript | 271 |
| Putative | 71 |
| Retained_intron | 263 |
| Nonsense_mediated_decay | 30 |
| Artefact | 11 |
| Total loci | 320 |
Splice variants annotated in the PGF haplotype.
Gene annotation statistics for eight MHC haplotypes
| Locus type | PGF | COX | QBL | SSTO | APD | DBB | MANN | MCF |
|---|---|---|---|---|---|---|---|---|
| Coding | 165 | 159 | 150 | 131 | 82 | 146 | 129 | 150 |
| Transcript | 28 | 28 | 26 | 26 | 19 | 26 | 27 | 22 |
| Putative | 18 | 18 | 15 | 15 | 6 | 16 | 12 | 14 |
| Pseudogenes total | 98 | 95 | 93 | 98 | 59 | 92 | 95 | 75 |
| Unprocessed | 50 | 48 | 48 | 53 | 36 | 52 | 53 | 42 |
| Processed | 41 | 42 | 40 | 39 | 19 | 34 | 37 | 28 |
| Expressed | 7 | 5 | 5 | 6 | 4 | 6 | 5 | 5 |
| Artefact | 11 | 11 | 10 | 11 | 0 | 0 | 0 | 0 |
| Total loci | 320 | 311 | 294 | 281 | 166 | 281 | 264 | 261 |
| Total variants | 1,267 | 1,191 | 1,155 | 1,058 | 568 | 1,138 | 960 | 1,115 |
Annotation statistics for loci in each haplotype. For definitions of locus types see “Materials and methods”.
Fig. 2 Variation and annotation map of eight MHC haplotypes. The map represents the complete reference sequence (orange bar split into three 1.6-Mb sections), labelled PGF and marked with a scale (Mb) and approximate megabase positions on the NCBI36 build of chromosome 6 (grey milestones). Below the reference sequence, arrows represent gene positions and orientations, colour-coded by variation status (invariable, black; synonymous variation only, green; non-synonymous, conservative variation, red; non-synonymous, non-conservative variation, purple; see Table 8), and gene symbols appear on a band denoting MHC class (extended class I, green; class I, yellow; class III, pale orange; class II, light blue; extended class II, pink; outside the MHC, pale grey). Above the reference sequence, coloured bands represent the sequences of the other seven haplotypes (COX, orange; QBL, mauve; APD, yellow; DBB, green; MANN, light blue; SSTO, dark blue; MCF, purple), with sequence gaps in dark grey. The RCCX hyper-variable region is shown in green (C4A block) and/or red (C4B block), or in black (block absent), and the HLA-DRB hyper-variable region in shades of blue-green. Above each haplotype bar, a dark red bar graph shows the total variation between that haplotype and the reference sequence (total variations per 10 kb). Re-examination of the PGF sequence AL645922, which contains the RCCX region, has shown that the original assembly was erroneous. Correcting these errors leads us to conclude that the C4A gene precedes the C4B gene in this clone sequence, and Fig. 2 reflects this new gene order.
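The bar graphs above each haplotype are densities of variation per 10-kb window along the reference. A minimal sketch of the binning, assuming 0-based variant positions on the PGF reference that all fall within the sequence. How the figure treated windows overlapping sequence gaps is not stated, and this sketch ignores that issue. `variation_profile` is a hypothetical helper.

```python
from collections import Counter

WINDOW_BP = 10_000  # the bar graph reports total variations per 10 kb


def variation_profile(positions, sequence_length: int, window: int = WINDOW_BP) -> list:
    """Number of variations in each consecutive window along the reference.

    Assumes 0-based positions on the PGF reference, all < sequence_length.
    """
    n_windows = -(-sequence_length // window)  # ceiling division
    per_window = Counter(pos // window for pos in positions)
    return [per_window[i] for i in range(n_windows)]


print(variation_profile([5, 9_999, 10_000, 25_123, 25_124], 30_000))  # [2, 1, 2]
```

For the full 4,754,829-bp PGF sequence, this gives 476 windows, the last of which is only partly filled.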
Other newly annotated loci
| Locus | Locus type |
|---|---|
| XXbac-BCX196D17.5 | Transcript |
| XXbac-BPG116M5.14 | Putative |
| XXbac-BPG116M5.15 | Putative |
| XXbac-BPG116M5.16 | Putative |
| XXbac-BPG118E17.9 | Putative |
| XXbac-BPG126D10.10 | Processed pseudogene |
| XXbac-BPG126D10.11 | Processed pseudogene |
| XXbac-BPG13B8.10 | Transcript |
| XXbac-BPG13B8.9 | Unprocessed pseudogene |
| XXbac-BPG154L12.4 | Putative |
| XXbac-BPG181B23.4 | Transcript |
| XXbac-BPG181M17.4 | Putative |
| XXbac-BPG246D15.8 | Transcript |
| XXbac-BPG248L24.10 | Unprocessed pseudogene |
| XXbac-BPG248L24.9 | Processed pseudogene |
| XXbac-BPG249D20.9 | Putative |
| XXbac-BPG250I8.13 | Transcript |
| XXbac-BPG254F23.5 | Putative |
| XXbac-BPG254F23.6 | Putative |
| XXbac-BPG254F23.7 | Transcript |
| XXbac-BPG254F23.7 | Putative |
| XXbac-BPG27H4.7 | Transcript |
| XXbac-BPG27H4.8 | Transcript |
| XXbac-BPG294E21.7 | Processed pseudogene |
| XXbac-BPG296P20.14 | Putative |
| XXbac-BPG296P20.15 | Putative |
| XXbac-BPG299F13.14 | Putative |
| XXbac-BPG308J9.3 | Transcript |
| XXbac-BPG308K3.5 | Putative |
| XXbac-BPG308K3.6 | Transcript |
| XXbac-BPG309N1.15 | Unprocessed pseudogene |
| XXbac-BPG32J3.18 | Putative |
| XXbac-BPG8G10.2 | Unprocessed pseudogene |
| DAQB-12N14.5 | Transcript |
| DAQB-331I12.5 | Putative |
| DAQB-335A13.8 | Transcript |
Newly annotated loci without HGNC symbols.
Haplotype variation at splice sites
| Gene | Variant | Affected exons | Donor* | Acceptor* | dbSNP cluster ID | Best evidence | PGF | QBL | COX | SSTO | DBB | APD | MANN | MCF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2 | 3/4 | ggt | t | rs28400887 | cDNA | NC | NC | NC | C | ND | NC | NC | C |
| | 5 | 2/3 | ggt | t | rs28400887 | EST | NC | NC | NC | C | ND | NC | NC | C |
| | 7 | 3/4 | ggt | cgg | – | EST | NC | ND | NC | C | NC | ND | ND | ND |
| | 7 | 3/4 | ggt | cgg | – | EST | NC | NC | ND | C | NC | ND | ND | NC |
| | 4 | 4/5 | ggt | c | rs707947 | cDNA | C | C | C | NC | NC | ND | NC | NC |
| | 5 | 4/5 | ggt | ta | rs3667 | cDNA | NC | NC | NC | C | C | ND | C | C |
| | 2 | 2/3 | g | cag | rs9271083 | EST | NC | C | C | C | C | ND | C | ND |
Gene loci and variants affected by disruptive variations at splice sites. C, canonical splice site (donor = ngt; acceptor = nag); NC, non-canonical; ND, no data (gene absent or sequence gap). Donor and acceptor variable nucleotides are in bold, with the equivalent dbSNP cluster ID given in the column to the right. For these purposes, the C4A and C4B genes are effectively duplicates of each other. The two TRIM31 variants share the same splice site but differ elsewhere in structure. The two HLA-DQA variants share the same donor but have alternative acceptors. Note the mutual exclusivity of these variants amongst the haplotypes (Hoarau et al. 2004; Hoarau et al. 2005).
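The C/NC/ND codes follow directly from the consensus patterns given in the caption. A minimal sketch, assuming each site is supplied as a three-base string written in the same orientation as the caption's `ngt` (donor) and `nag` (acceptor) patterns, and using `None` for a missing gene or sequence gap. The per-haplotype site strings in the example are hypothetical.

```python
from typing import Optional


def splice_status(donor: Optional[str], acceptor: Optional[str]) -> str:
    """Classify a splice site with the caption's codes.

    Assumes three-base site strings written as in the caption's patterns:
    donor 'ngt' and acceptor 'nag', where n is any base. None means no data.
    """
    if donor is None or acceptor is None:
        return "ND"  # gene absent or sequence gap
    donor, acceptor = donor.lower(), acceptor.lower()
    canonical_donor = len(donor) == 3 and donor[1:] == "gt"
    canonical_acceptor = len(acceptor) == 3 and acceptor[1:] == "ag"
    return "C" if canonical_donor and canonical_acceptor else "NC"


# Hypothetical per-haplotype site sequences for one transcript variant.
sites = {"PGF": ("ggt", "cgg"), "SSTO": ("ggt", "cag"), "DBB": (None, None)}
print({hap: splice_status(*site) for hap, site in sites.items()})
# {'PGF': 'NC', 'SSTO': 'C', 'DBB': 'ND'}
```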
Variation status of the main coding variant of each gene in the PGF haplotype annotation
| Invariable | Synonymous variation only | Non-synonymous variation | |
|---|---|---|---|
| Conservative variation | Non-conservative variation | ||
| BAT1a | AGER | BAT2 | |
| BAT5 | BRD2a | BAT3 | |
| C2 | BTNL2 | BAT4 | |
| CREBL1 | C6orf21 | C4A | |
| DAXX | C6orf27 | C4B | |
| DDR1a | CFB | C6orf10 | |
| GNL1 | DOM3Z | C6orf100 | |
| GPSM3 | DPCR1 | C6orf15 | |
| GTF2H4 | EGFL8 | C6orf205 | |
| HLA-DOAa | EHMT2 | C6orf25 | |
| HSPA1B | FKBPL | C6orf47 | |
| LY6G6C | GABBR1 | CCHCR1 | |
| MSH5 | HLA-DMA | CDSN | |
| PBX2 | HLA-DOB | COL11A2 | |
| POU5F1 | HLA-DQB2 | DHX16 | |
| PPP1R11 | HLA-DRA | HLA-A | |
| PRR3 | HSPA1A | HLA-B | |
| RING1 | LY6G6D | HLA-C | |
| RNF5 | MCCD1 | HLA-DMB | |
| RXRB | MOGa | HLA-DPB1 | |
| SYNGAP1 | OR11A1 | HLA-DPB2 | |
| TRIM10 | OR2H2 | HLA-DQA1 | |
| TRIM26 | OR2J1 | HLA-DQA2 | |
| TRIM27 | OR2J2 | HLA-DQB1 | |
| TRIM39a | OR2J3 | HLA-DRB1 | |
| VPS52 | PHF1 | HLA-E | |
| ZBTB12 | PSMB9 | HLA-F | |
| ZBTB9 | RPP21 | HLA-G | |
| ZNRD1 | SFTPG | HSPA1L | |
| SKIV2L | IER3 | ||
| SLC44A4 | KIAA1949 | ||
| TAP2 | LTA | ||
| TRIM15 | LY6G5B | ||
| WDR46 | MDC1 | ||
| ZBTB22 | MICA | ||
| ZNF311 | MICB | ||
| NFKBIL1 | |||
| NOTCH4 | |||
| OR10C1 | |||
| OR12D2 | |||
| OR12D3 | |||
| OR5U1 | |||
| OR5V1 | |||
| PPT2 | |||
| PSORS1C1 | |||
Gene coding sequences may be invariable (no recorded variation), have synonymous variation only (variation at the nucleotide but not the peptide level) or have non-synonymous variation (variation at both the nucleotide and peptide levels). Non-synonymous variation may, in turn, be conservative or non-conservative, according to whether the corresponding BLOSUM62 score is positive or negative. The main coding variant is the one numbered 001 in the VEGA database, except for LY6G6E and HLA-DPB2, where the main variant is not coding. C4A and C4B were excluded from the calculation of variation because the order of these genes in the PGF sequence precluded alignment with the other haplotype sequences. Nevertheless, aligning the coding sequences of each gene separately revealed non-synonymous, non-conservative variation. HLA-DRB5 is present in this study only in the PGF haplotype and therefore appears here as invariable.
a Coding genes where the main variant does not harbour non-conservative, non-synonymous variation but other variants do (BAT1, BRD2, DDR1, C6orf136, HLA-DOA, MOG, KIFC1 and TRIM39).
b Similarly, coding genes where the main variant does not harbour conservative, non-synonymous variation but other variants do (PSMB8).
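The table assigns each gene's main coding variant a single status, and the footnotes flag genes whose other variants carry more severe changes. A minimal sketch follows. It assumes that a variant takes the most severe class found among its coding substitutions, which is consistent with the ordering of the Fig. 2 colour legend but is not stated explicitly. The gene and its transcript variants in the example are hypothetical, and the function names are illustrative.

```python
# Categories in increasing order of severity, as used for the variation-status table.
SEVERITY = [
    "invariable",
    "synonymous only",
    "non-synonymous conservative",
    "non-synonymous non-conservative",
]
LEVEL = {"synonymous": 1, "non-synonymous conservative": 2, "non-synonymous non-conservative": 3}


def status(substitution_classes) -> str:
    """Assumed rule: a coding variant takes the most severe class among its substitutions."""
    return SEVERITY[max((LEVEL[c] for c in substitution_classes), default=0)]


def other_variant_more_severe(variants: dict, main: str = "001") -> bool:
    """True when a non-main transcript variant outranks the main one (cf. footnotes a and b)."""
    main_rank = SEVERITY.index(status(variants[main]))
    return any(SEVERITY.index(status(v)) > main_rank for name, v in variants.items() if name != main)


# Hypothetical gene with two transcript variants.
gene = {"001": ["synonymous"], "002": ["synonymous", "non-synonymous non-conservative"]}
print(status(gene["001"]), other_variant_more_severe(gene))
# synonymous only True
```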
Major indels in the form of retrotransposable elements
| Chr6 position | Flanking loci | PGF | COX | QBL | SSTO | APD | DBB | MANN | MCF | Details |
|---|---|---|---|---|---|---|---|---|---|---|
| 29002370 | TRIM27:C6orf100 | C | C | C | C | ? | ? | C | C | |
| 29440424 | OR5V1:OR12D3 | ✓ | ✓ | ? | ✓ | ? | ? | X | X | AluYa5 |
| 29784097 | C6orf40:HCP5P15 | ✓ | X | ✓ | ✓ | ? | X | X | X | AluYa5/8 175..304 |
| 29788451 | Within HCP5P15 | X | X | ✓ | X | ? | ✓ | ✓ | X | AluYa5/8 176..310 |
| 29794763 | HCP5P15:HLA-F | ✓ | X | X | ✓ | ? | X | X | X | SVA_E plus simple rpt.s |
| 29922942 | HLA-G:MICF | ✓ | X | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | L1ME3B 5940..6165 |
| 29954495 | MICF:HLA-H | ✓ | X | X | X | X | X | X | X | HERVK9 inserted in MER9 |
| 30008633 | HLA-K:HLA-21 | ✓ | X | X | ✓ | X | X | ✓ | ? | SVA E/F plus simple rpt. |
| 30106475 | HCG8:ETF1P1 | X | ✓ | X | X | ✓ | ✓ | X | X | AluYb8 |
| 30547387 | SUCLA2P:RANP1 | X | X | X | ✓ | ? | X | X | ? | AluJb 1..283 and parts of MLT1D/L1PBa |
| 31079582 | C6orf205:HCG22 | X | X | ✓ | X | X | X | ? | X | AluYb8 37..297 |
| 31117638 | C6orf205:HCG22 | ✓ | X | X | ✓ | ✓ | X | ✓ | ✓ | AluY (whole & part) and MER63 1017..1062 |
| 31301931 | HCG27:HLA-C | ✓ | ✓ | X | ✓ | ? | ✓ | ✓ | ✓ | HERV3 part (6489...7339) |
| 31320352 | HCG27:HLA-C | ✓ | X | X | X | ? | X | X | X | SVA_F 349..850 plus GC rich rpt. |
| 31358220 | RPL3P2:WASF5P | X | X | ✓ | X | ? | X | X | X | AluY 35..306 |
| 31400900 | WASF5P:HLA-B | ✓ | ✓ | ✓ | ✓ | ? | X | X | X | AluSp plus L1PREC2 part (3205...4617) |
| 31405648 | WASF5P:HLA-B | ✓ | X | ✓ | ✓ | ? | X | X | X | HERVIP10F (part) and AluSg (only cf CX DB) |
| 31418854 | WASF5P:HLA-B | ✓ | ✓ | ✓ | ✓ | ? | ✓ | ✓ | X | L1PA5 part (5503..5876) |
| 31530995 | MICA:HCP5 | ✓ | X | ? | ✓ | ? | ? | X | ? | SVA B/F plus simple rpt.s |
| 32421915 | within C6orf10 | ✓ | X | X | ✓ | X | X | ✓ | X | AluYb8 |
| 32486228 | BTNL2:HLA-DRA | ✓ | ✓ | ✓ | ✓ | ✓ | X | X | X | L1P1/L1HS parts |
| 32655545 | HLA-DRB1 intron 5 | ✓ | X | X | X | ? | ✓ | ✓ | ? | AluYa5 within more or less partial LTR12 |
| 32660731 | HLA-DRB1 intron 1 | X/X | ✓/X | X/X | ✓/✓ | ? | ✓/✓ | ✓/✓ | ? | Tigger4/AluSx |
| 32661119 | HLA-DRB1 intron 1 | C | C | C | C | ? | C | C | ? | |
| 32663167 | HLA-DRB1 intron 1 | X/✓ | ✓/✓ | ✓/✓ | ✓/X | ? | ✓/X | ✓/X | ? | AluSq/AluY |
| 32669534 | HLA-DRB1:HLA-DQA1 | C | C | C | C | ? | C | C | ? | |
| 32679461 | HLA-DRB1:HLA-DQA1 | ✓ | X | X | X | ? | X | X | ? | AluY |
| 32693271 | HLA-DRB1:HLA-DQA1 | ✓ | ✓ | ✓ | ✓ | ? | X | ✓ | ? | L1PA4 (parts) |
| 32697545 | HLA-DRB1:HLA-DQA1 | X | X | X | X | ? | ✓ | ✓ | ? | L1HS 7..6032 |
| 32701428 | HLA-DRB1:HLA-DQA1 | ✓ | X | ✓ | ✓ | ? | X | X | X | L1PA2 part and from CX: MER2B and AluY |
| 32728179 | HLA-DQA1: HLA-DQB1 | C | C | C | C | ? | C | C | C | |
| 32739664 | within HLA-DQB1 | X | X | ✓ | X | ? | X | ✓ | X | AluY |
| 32743646 | HLA-DQB1: MTCO3P1 | X | X | X | X | ? | ✓ | X | X | LTR13 |
| 32746780 | HLA-DQB1: MTCO3P1 | X | X | X | X | ? | ✓ | X | ✓ | L1PA4 (parts) |
| 32751442 | HLA-DQB1: MTCO3P1 | X | X | X | X | ? | X | ✓ | X | LTR5_Hs |
| 32753489 | HLA-DQB1: MTCO3P1 | ✓ | ✓ | ✓ | ✓ | ? | X | ✓ | X | L1PA10 268..4888 around L1PA4 (part) |
| 32756020 | HLA-DQB1: MTCO3P1 | X | X | X | X | ? | X | ✓ | X | LTR5_Hs |
| 32764047 | HLA-DQB1: MTCO3P1 | ✓ | ✓ | ✓ | ✓ | ? | X | ✓ | X | AluSx |
| 32765930 | HLA-DQB1: MTCO3P1 | X | X | X | X | ? | X | ✓ | X | AluYa5 |
| 32785062 | MTCO3P1:HLA-DQB3 | ✓ | ✓ | ✓ | ✓ | ? | X | X | X | Tigger4 (Zombi)/L1HS (parts) and T-rich |
| 32795150 | MTCO3P1:HLA-DQB3 | X | X | X | X | X | ✓ | X | ✓ | AluY |
| 32796573 | MTCO3P1:HLA-DQB3 | X | X | X | X | X | ✓ | X | ✓ | AluY |
| 32815974 | HLA-DQB3: HLA-DQA2 | X | ✓ | X | X | ✓ | X | X | X | AluYa5 |
| 32857369 | HLA-DQB2:HLA-DOB | ✓ | X | ✓ | ✓ | X | X | ✓ | X | AluYg6 |
| 32881426 | HLA-DQB2:HLA-DOB | X | X | ? | ✓ | ✓ | X | X | ? | AluYa5 |
| 32887265 | HLA-DQB2:HLA-DOB | ✓ | X | ? | X | X | X | ✓ | ✓ | LTR42 and parts of L1MC5 and AluSc 3..105 |
| 33201559 | within HLA-DPB2 | ✓ | X | X | X | ✓ | ? | X | ? | AluYb8 |
| 33234360 | HCG24:COL11A2 | ✓ | ✓ | ✓ | ? | ? | ✓ | ✓ | X | AluY (1..293) AluJb (26..306) |
Where there was a break in the cross_match discrepancy-list match between two clones, the inserted sequence was extracted and analysed with RepeatMasker to assess how many of the major indels resulted from retrotransposable elements. The chromosome 6 position (NCBI35/36) given for each indel is the midpoint of the inserted sequence where it is an insertion in PGF, or the position immediately before the deletion where it is a deletion in PGF. Flanking loci were retrieved from the annotation. Insertion in a haplotype is indicated by ‘✓’, deletion by ‘X’ and complex regions by ‘C’; a sequence gap in a haplotype at the position of the indel is shown by ‘?’. Four complex deletion/insertion events are listed: A, B, C and D; for details, see text.
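Each row of the table can be read as a presence/absence pattern across the eight haplotypes. A minimal sketch that applies the caption's legend to one row and counts how many haplotypes with a known state carry the insertion. Lower-case `x` is treated as `X`, paired entries such as `X/✓` in the HLA-DRB1 intron rows are not handled, and the function name is hypothetical. The two example rows are copied from the table.

```python
HAPLOTYPES = ["PGF", "COX", "QBL", "SSTO", "APD", "DBB", "MANN", "MCF"]
# Legend from the caption; lower-case 'x' is treated as 'X'.
LEGEND = {"✓": "insertion", "X": "deletion", "C": "complex", "?": "gap"}


def summarise_row(cells):
    """Return per-haplotype states plus (carriers, haplotypes with a known state).

    Paired entries such as 'X/✓' are not handled by this sketch.
    """
    states = {hap: LEGEND[cell.strip().upper()] for hap, cell in zip(HAPLOTYPES, cells)}
    known = [s for s in states.values() if s in ("insertion", "deletion")]
    return states, known.count("insertion"), len(known)


# Two rows copied from the table above.
for pos, row in [("29922942", "✓ X ✓ ✓ ✓ ✓ ✓ ✓"), ("29440424", "✓ ✓ ? ✓ ? ? X X")]:
    _, carriers, known = summarise_row(row.split())
    print(f"{pos}: insertion present in {carriers} of {known} haplotypes with data")
```

For the HLA-G:MICF element, this reports 7 of 8 haplotypes, with COX the only one lacking it. For the OR5V1:OR12D3 AluYa5, it reports 3 of 5, because three haplotypes have a sequence gap at that position.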
Fig. 3 Clusters of haplotypes in European haplotypic diversity. Phylogenetic relationships of 180 founder SNP haplotypes from CEPH trios spanning a 214-kb segment of the MHC class II region that includes the HLA-DRB1 and HLA-DQB1 genes (54 substitutions from rs2187823 to rs2856691). a The sequenced haplotypes are widely distributed in this neighbour-joining (NJ) tree and represent the vast majority of the variation in the population sampled. Four-digit alleles of the corresponding DRB1 and DQB1 genes are given in each haplotype ID label to show how the HLA haplotypes are distributed with respect to the underlying nucleotide variation. The NJ tree was constructed from pairwise genetic distances under the Kimura two-parameter model, without correction for rate variation among sites, as implemented in the MEGA2 software (Kumar et al. 2001). b Each sequenced haplotype is associated with a single haplotype cluster. This phylogenetic network (Bandelt et al. 1999) also shows that each cluster (shaded area) consists of one central haplotype and its derivatives. Circles represent individual haplotypes, and circle size is proportional to haplotype frequency. Line lengths between nodes are proportional to the distances between them; for example, distances within shaded areas (clusters) never exceed three mutation steps. Clusters of haplotypes that share HLA alleles with sequenced cell lines are named accordingly: COX and QBL, DRB1*0301 DQB1*0201; PGF, DRB1*1501 DQB1*0602; APD, DRB1*1301 DQB1*0603; MCF, DRB1*0401 DQB1*0301; DBB, DRB1*0701 DQB1*0303; SSTO, DRB1*0403 DQB1*0302; MANN, DRB1*0701 DQB1*0202. The HLA haplotypes DRB1*1103-DQB1*0301 and DRB1*0101-DQB1*0501 mark the two major haplotype clusters not represented in the MHC Haplotype Project data.