Douglas S Goodin, Pouya Khankhanian, Pierre-Antoine Gourraud, Nicolas Vince.
Abstract
OBJECTIVE: To determine the relationship between highly conserved extended-haplotypes (CEHs) in the major histocompatibility complex (MHC) and MS-susceptibility.
Year: 2018 PMID: 29438392 PMCID: PMC5810982 DOI: 10.1371/journal.pone.0190043
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Location of the 11 SNPs in the haplotype surrounding the Class II DRB1 gene on chromosome 6 (6p21.3), which had the greatest disease association of any SNP haplotype in the region (see text).
The blue rectangles span the regions from the start to the stop points of the Class II genes: HLA-DRB5, HLA-DRB1, HLA-DQA1, and HLA-DQB1. The centromere of Chromosome 6 lies to the right of this portion of 6p21.3.
Selected SNP haplotypes in the Class II region of chromosome 6.
| SNP haplotype name | SNP haplotype | HLA association | Frequency (WTCCC) | Frequency (EPIC) |
|---|---|---|---|---|
| | 10110100010 | | 0.12 | 0.11 |
| | 00000000100 | | 0.02 | 0.02 |
| | 00000010001 | | 0.19 | 0.21 |
| | 00000000001 | | 0.11 | 0.13 |
| | 10100010001 | | 0.09 | 0.08 |
| | 01011100100 | | 0.10 | 0.09 |
| | 10110100011 | | 0.00 | 0.00 |
| | 01000001010 | | 0.11 | 0.11 |
| | 00000010010 | | 0.02 | 0.03 |
| | 10111111001 | | 0.01 | 0.01 |
| | 10100100011 | | 0.00 | 0.00 |
| | 10111100010 | | 0.00 | 0.00 |
| | 10100100010 | | 0.00 | 0.00 |
| | 00000100010 | | 0.00 | 0.00 |
† The "Name" is arbitrary and indicates the order of haplotype identification in the EPIC dataset [29, 30]. The SNP haplotype represents the haplotypes identified using the set of 11 SNPs shown in Fig 1 and provided in the text. The number “0” indicates the presence of the major allele and the number “1” indicates the presence of the minor allele (in the control population) at the particular SNP location. Only 14 selected SNP-haplotypes (of the 174 present in the WTCCC) are listed. Haplotype frequencies found in two independent datasets (EPIC and WTCCC) are shown [29, 30]. Frequencies are rounded to two decimal places; those listed as 0.00 were less than 0.005. Each of the 174 haplotypes had very specific associations with specific Class II haplotypes. For example, each of the associations (shown in the Table) of specific SNP-haplotypes with specific HLA haplotypes was highly significant. Almost all had a p-value (by chi-square analysis) of p < 10^−300. The only two exceptions were HLA-DRB1*07:01~HLA-DQB1*02:02~a3 (p < 10^−151) and HLA-DRB1*15:01~HLA-DQB1*06:02~a34 (p < 10^−290). Moreover, both the EPIC and the WTCCC datasets had the same Class II HLA associations with the different SNP-haplotypes.
†† In both EPIC and the WTCCC, a3 was equally associated with four HLA haplotypes: HLA-DRB1*04:01~HLA-DQB1*03:01, HLA-DRB1*04:01~HLA-DQB1*03:02, HLA-DRB1*04:04~HLA-DQB1*03:02, and HLA-DRB1*07:01~HLA-DQB1*02:02.
§ In both EPIC and WTCCC, a27 is associated with two haplotypes: HLA-DRB1*15:01~HLA-DQB1*06:02 and HLA-DRB1*15:01~HLA-DQB1*05:02. In WTCCC, 58% (28/48) were HLA-DRB1*15:01~HLA-DQB1*06:02, whereas in EPIC none of the five a27 SNP haplotypes were associated with this particular HLA haplotype.
§§ The single example of the a34 SNP haplotype in EPIC was associated with the HLA-DRB1*15:01~HLA-DQB1*06:02 HLA haplotype. No examples of the a36 SNP haplotype were present among the EPIC participants who also had HLA information.
Fig 2. The HLA haplotype/SNP haplotype associations–both by SNP haplotype (A) and also by HLA haplotype (B)–for selected SNP haplotypes (some of which are presented in Table 1). Other haplotypes not presented also had very specific haplotype associations [32].
Fig 3. The WTCCC dataset consists of 59,884 haplotypes, of which 10,078 represent different (unique) combinations of the 5 HLA alleles and the SNP haplotypes (see text).
For the purpose of this graph, these unique haplotypes (CEHs) have been sorted according to their descending frequency of occurrence in the WTCCC dataset. The cumulative number of unique haplotypes (beginning with the highest frequency haplotype) has been plotted against the percentage of the total number of haplotypes in the population. As can be appreciated from the graph, the large majority (~80%) of the different CEHs have only a very low frequency, whereas 80% of the haplotypes in the population are accounted for by only a small number of very common CEHs (i.e., ~10 haplotypes).
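The curve described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cumulative_coverage` and the toy labels are hypothetical, and the input is assumed to be one label per observed haplotype rather than the actual WTCCC data.

```python
from collections import Counter
from itertools import accumulate


def cumulative_coverage(haplotypes):
    """Fraction of all observations covered by the k most common
    unique haplotypes, for k = 1..number of unique haplotypes."""
    # Count each unique haplotype, then sort counts from most to least common.
    counts = sorted(Counter(haplotypes).values(), reverse=True)
    total = sum(counts)
    # Running sum of the sorted counts, expressed as a fraction of the total.
    return [running / total for running in accumulate(counts)]


# Toy data with hypothetical labels (not taken from the WTCCC dataset).
observed = ["h1"] * 5 + ["h2"] * 3 + ["h3"] + ["h4"]
coverage = cumulative_coverage(observed)
print(coverage)  # [0.5, 0.8, 0.9, 1.0]
```

In this toy example, two of the four unique haplotypes already account for 80% of the observations, which is the same kind of skew the figure reports on a much larger scale.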
Fig 4. Rank order for the 10 most common extended haplotypes for the entire WTCCC dataset (labeled: c1 to c10078; in descending order of frequency).
The rank order of the haplotypes for each participating region is shown separately (see for definitions of those haplotypes, which have been colored in the figure based on the overall 10 most common haplotypes in the WTCCC). Regions are ordered (from left to right) based on the descending frequency of the c2 haplotype. Only cases are available for all regions. Nevertheless, both the complete WTCCC (Case and Control) and the EPIC (Case and Control) populations are also included for comparison.
Common a1-containing extended haplotypes in the WTCCC.
| Name | HLA haplotype | Frequency | OR (95% CI) | p-value |
|---|---|---|---|---|
| | | 2961 | 3.2 (3.0–3.5) | < E-168 |
| | | 1465 | 2.2 (2.0–2.5) | < E-38 |
| | | 728 | 2.8 (2.4–3.3) | < E-36 |
| | | 440 | 3.9 (3.1–4.8) | < E-39 |
| | | 405 | 3.4 (2.7–4.2) | < E-29 |
| | | 320 | 3.7 (2.9–4.8) | < E-27 |
| | | 289 | 2.1 (1.6–2.7) | < E-7 |
| | | 229 | 2.5 (1.9–3.4) | < E-9 |
| | | 178 | 4.5 (3.2–6.3) | < E-20 |
| | | 135 | 2.9 (2.0–4.2) | < E-9 |
| | | 124 | 3.1 (2.0–4.7) | < E-7 |
| | | 105 | 3.2 (2.1–5.0) | < E-7 |
| | | 84 | 3.7 (2.2–6.1) | < E-7 |
| | | 73 | 3.4 (2.0–5.6) | < E-6 |
| | | 71 | 2.6 (1.6–4.3) | < E-3 |
| | | 64 | 3.1 (1.8–5.4) | < E-4 |
| | | 60 | 4.3 (2.4–7.9) | < E-6 |
| | | 58 | 4.5 (2.5–8.1) | < E-7 |
| | | 57 | 1.9 (1.1–3.3) | < 0.05 |
| | | 55 | 2.9 (1.6–5.1) | < E-3 |
| | | 54 | 1.8 (1.0–3.3) | < 0.05 |
| | | 52 | 3.2 (1.6–6.3) | < E-3 |
| | | 52 | 3.3 (1.7–6.4) | < E-3 |
| | | 51 | 3.0 (1.6–5.6) | < E-3 |
| | | 43 | 5.5 (2.8–10.9) | < E-7 |
| | | 29 | 20.3 (6.1–67.3) | < E-11 |
†† a1 containing haplotypes with ≥ 50 representations in the WTCCC. Two additional haplotypes with fewer representations are also shown.
† Arbitrary name for haplotype (sorted in descending order of frequency) for the entire WTCCC population.
* Odds ratio (OR) of disease for individuals having 1 copy of the listed haplotype compared to having no other copies of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 Class II haplotype (95% CI range in parentheses). A Bonferroni correction for the number of haplotypes with 50 or more representations (146) would require a significance level of p<3*E-4.
** Significance of the association between having 1 copy of the specific allele and the disease (MS) compared to having no copies. The p-values are expressed in scientific notation as powers of 10 (E). All observations with (p<0.001) still demonstrated a statistically significant effect even after adjustment for population stratification, geographic stratification, and gender. Moreover, including each of these haplotypes in the same regression equation demonstrated that each of the listed CEHs was independently associated with having MS.
§ These two haplotypes also differed (non-significantly) in their disease-association for having two copies of each allele compared to having no copies of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 Class II haplotype. Thus, these ORs are:
For c2: OR [two copies] = 5.8 (3.4–9.9)
And, for c3: OR [two copies] = 2.7 (1.3–5.5)
§§ The Class I and Class II portions of each listed haplotype were significantly associated with each other beyond the Bonferroni-adjusted level of significance. The only exception to this rule was for the haplotype c139. In this case, the association had a p-value of: p = 4.42*E−8
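The Bonferroni threshold quoted in the footnotes can be checked with simple arithmetic. The sketch below assumes a nominal significance level of 0.05, which the footnotes do not state explicitly; the function names are hypothetical. Because the table reports p-values as upper bounds (for example "< E-7" meaning p < 10^-7), the helper takes that bound as its input.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


def survives_correction(p_upper_bound, threshold):
    """True when the reported upper bound on p is at or below the threshold."""
    return p_upper_bound <= threshold


# Assumption: alpha = 0.05; 146 is the number of haplotypes with 50 or
# more representations given in the footnote.
threshold = bonferroni_threshold(0.05, 146)
print(f"{threshold:.2e}")  # 3.42e-04

print(survives_correction(1e-7, threshold))  # True  (a row reported as "< E-7")
print(survives_correction(0.05, threshold))  # False (a row reported as "< 0.05")
```

The computed value of about 3.4*E-4 agrees with the footnote's requirement of p<3*E-4 once rounded down. A row reported as "< E-3" cannot be judged from its bound alone, since the true p-value may lie on either side of the threshold.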
Fig 5. Disease-associations for the different SNP-haplotype combinations with the Class II HLA haplotypes of: (A) DRB1*15:01~DQB1*06:02 and: (B) DRB1*03:01~DQB1*02:01 & DRB1*13:03~DQB1*03:01. The odds ratios (OR) are given comparing cases to controls with regard to carrying either one or two copies of the risk-haplotype as opposed to carrying zero copies. In these circumstances, the disease association varied markedly, depending upon which SNP-haplotype carried the HLA-haplotype. Such an observation indicates that the observed disease-associations were not due to these specific HLA alleles but, rather, to something else, which was present on these SNP-haplotypes (see text). For unclear reasons, this data set did not replicate the findings of Chao and coworkers [19] with respect to the HLA-B*08, HLA-B*13, HLA-B*27, HLA-B*32, and HLA-B*52 haplotypes (see text). In the WTCCC data, however, the vast majority (96−100%) of the haplotypes that carried these HLA-B alleles, when they included the HLA-DRB1*15:01 allele, also carried the (a1) SNP haplotype. As a result, because they also carried the (a1) SNP haplotype, each of these haplotypes was strongly associated with an increased MS-risk except for the extremely rare HLA-B*52~HLA-DRB1*15:01~a1 haplotype (where OR = 1.01).
Fig 6. Different SNP haplotypes at distances of 1 to 4 Hamming units from the a1 SNP haplotype (SNP differences highlighted in red; for SNP definitions see text).
Several of these SNP haplotypes (indicated in yellow), at times, carried the HLA-DRB1*15:01~HLA-DQB1*06:02 HLA haplotype whereas others (indicated in blue) never did. HLA haplotypes are highlighted in green. Thus, whether or not a given SNP haplotype carried this HLA haplotype seemed to be, not a function of the Hamming distance, but rather, a property of the specific SNP haplotype involved.
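For readers unfamiliar with the term, the Hamming distance between two SNP haplotypes is the number of SNP positions at which their 0/1 codes differ. A minimal sketch follows; the function name is hypothetical, and the reference string is simply the first row of the SNP-haplotype table, chosen for illustration and not asserted here to be any particular named haplotype.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length 0/1 haplotype strings differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b))


# 11-SNP strings copied from the SNP-haplotype table; the choice of
# reference row is an assumption made for this example only.
reference = "10110100010"
others = ["10110100011", "10100100011", "00000100010"]

distances = {h: hamming_distance(reference, h) for h in others}
print(distances)
# {'10110100011': 1, '10100100011': 2, '00000100010': 3}
```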
Fig 7. Plot of the proportion of carriers of the HLA-DRB1*15:01~HLA-DQB1*06:02 haplotype at different Hamming distances from the (a1) SNP haplotype.
The magenta line represents the average of all haplotypes at a given Hamming distance. Also plotted are the subgroups of haplotypes carrying HLA-DRB1*15:01~HLA-DQB1*06:02 less than 10 percent of the time (blue line) and those carrying this HLA haplotype 10 or more percent of the time (orange line). Black dots represent individual observations. Certainly, as Hamming distance increases, the percentage of haplotypes carrying HLA-DRB1*15:01~HLA-DQB1*06:02 diminishes (magenta). However, even at a Hamming distance of 4, some specific SNP haplotypes carry this HLA haplotype almost half of the time.
Common a2-, a6-, or a14-containing (or other) extended haplotypes.
| Name | HLA haplotype | Frequency | OR (95% CI) | p-value |
|---|---|---|---|---|
| | | 212 | 2.0 (1.4–2.7) | < E-4 |
| | | 128 | 2.1 (1.5–3.0) | < E-4 |
| | | 75 | 1.7 (1.0–2.9) | < 0.05 |
| | | 3782 | 1.1 (1.0–1.2) | < 0.05 |
| | | 397 | 0.9 (0.7–1.2) | ns |
| | | 181 | 1.7 (1.2–2.3) | < E-2 |
| | | 121 | 0.6 (0.4–1.0) | < 0.05 |
| | | 91 | 3.0 (1.8–4.9) | < E-5 |
| | | 71 | 1.6 (0.9–2.6) | ns |
| | | 68 | 1.1 (0.6–2.0) | ns |
| | | 63 | 1.3 (0.7–2.3) | ns |
| | | 161 | 1.9 (1.3–2.8) | < E-3 |
| | | 69 | 2.6 (1.5–4.5) | < E-3 |
| | | 64 | 1.9 (1.1–3.4) | < 0.05 |
| | | 906 | 0.5 (0.4–0.6) | < E-11 |
| | | 361 | 0.5 (0.3–0.6) | < E-5 |
| | | 293 | 0.5 (0.3–0.7) | < E-4 |
| | | 211 | 0.5 (0.3–0.7) | < E-3 |
| | | 173 | 0.6 (0.4–0.9) | < 0.05 |
| | | 166 | 0.6 (0.4–0.9) | < E-2 |
| | | 87 | 0.4 (0.2–0.8) | < E-2 |
| | | 79 | 3.1 (1.8–5.5) | < E-4 |
†† Haplotypes with ≥ 50 representations in the WTCCC. All such haplotypes carrying the a2, a6, or a14 SNP haplotype are included. For each of the listed haplotypes, the Class I and Class II portions were significantly associated with each other far beyond the Bonferroni-adjusted level of significance.
† Arbitrary name for haplotype (sorted in descending order of frequency) for the entire WTCCC population.
* Odds ratio (OR) of disease for individuals having 1 copy of the listed haplotype compared to having no copies of the particular HLA-DRB1~HLA-DQB1~SNP Class II haplotype (95% CI range in parentheses). All haplotypes carrying the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 Class II motif were excluded in this analysis. A Bonferroni correction for the number of haplotypes with 50 or more representations (146) would require a significance level of (p<3*E-4).
** Significance of the association between having 1 copy of the specific allele and the disease (MS) compared to having no copies. The p-values are expressed in scientific notation as powers of 10 (E); ns = not significant. With the exception of c23 and c46, all observations with p<0.001 still showed a statistically significant effect even after adjustment for population stratification, geographic stratification, and gender. Moreover, even c23 and c46 trended in this direction (p≈0.10).
§ Only the c1 haplotype had enough observations to explore the disease association for having two copies of an allele compared to having no copies of the HLA-DRB1*03:01~HLA-DQB1*02:01~a6 Class II haplotype. Thus, this OR was:
For c1: OR [two copies] = 2.1 (1.5–2.9); p = 2.1*E-6
This effect was still statistically significant even after adjustment for population stratification (p = 3.13*E-6).
The other Class II haplotypes containing HLA-DRB1*03:01~HLA-DQB1*02:01~a6, combined, had an OR of:
OR [two copies] = 0.8 (0.1–3.4); p = ns
§§ This group of haplotypes is composed of those that also had a significant association with this disease. Most of these haplotypes seem to be protective and this protective effect remained significant (p<0.05) even after excluding all individuals who carried the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype.
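As a rough illustration of how an odds ratio and its 95% confidence interval of the kind tabulated above can be obtained from a 2×2 table of carriers and non-carriers, the sketch below uses the standard log-odds (Woolf) approximation. This is an assumption: the footnotes do not state which estimation method produced the published intervals, and the adjusted analyses they mention would require a regression model. The function name and all counts are hypothetical and are not taken from the WTCCC or EPIC data.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf log-odds method)."""
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers
        + 1 / control_carriers + 1 / control_noncarriers
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers of one copy versus zero copies.
estimate, lower, upper = odds_ratio_ci(30, 70, 10, 90)
print(f"OR = {estimate:.1f} ({lower:.1f}-{upper:.1f})")
```

An interval that excludes 1.0 corresponds to a nominally significant association; whether it also survives a multiple-testing correction depends on the p-value, not on the interval alone.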