| Literature DB >> 35052430 |
Sean M Burnard1,2, Rodney A Lea1,2,3, Miles Benton4, David Eccles5, Daniel W Kennedy6, Jeannette Lechner-Scott2,7,8, Rodney J Scott1,9,10.
Abstract
Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever increasing sample sizes are required, while ignoring potentially valuable information that is readily available in existing datasets. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis, confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10-8) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.Entities:
Keywords: elastic net; genetic wide association study (GWAS); gene–gene interaction; human leukocyte antigen (HLA) complex; leukocyte receptor complex (LRC); multi-variate regression analysis; multiple sclerosis (MS); natural killer cells; natural killer gene complex (NKC); single nucleotide polymorphisms (SNPs)
Mesh:
Substances:
Year: 2021 PMID: 35052430 PMCID: PMC8774935 DOI: 10.3390/genes13010087
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1Flow chart outlining the major steps within the analysis pipeline. Two MS GWAS have been re-analysed. (i) First, the HLA and NK receptor loci were extracted from the pre-imputed discovery cohort and both a conventional GWAS and elastic net analysis performed and compared. Haploview [35] was used to determine if the SNPs identified by elastic net were in disequilibrium. Secondly, both the discovery and replication cohort were imputed using the Michigan server with resulting SNPs subject to a stringent imputation quality control (QC) threshold (R2 < 0.8) and extraction of SNPs from the same HLA and NK receptor genetic boundaries. (ii) Then, an overlapping elastic net analysis was performed on SNPs in common between the discovery and replication cohort that met imputation QC threshold. (iii) Thirdly, an in-depth independent analysis on the imputed discovery cohort was performed to maximise coverage of high-quality imputed SNPs. Grey boxes indicate the stage of SNP extraction across the three regions, and purple boxes indicate the three different combinations of datasets and SNPs being analysed.
Cohort summary information for the discovery (ANZgene) and the replication dataset (merged from two IMSGC GWAS and the national blood service as controls. The number of SNPs passing QC is given for each of the three loci, human leukocyte antigen complex (HLA) (green), natural killer cell complex (NKC) (orange) and leukocyte receptor complex (LRC) (blue) from chromosomes 6, 12 and 19, respectively. The number of SNPs is provided for both cohorts, pre-imputation and restricted to common SNPs post- imputation (with a high imputation quality threshold of R2 = 0.8). Independent post-imputation analysis was restricted to the discovery cohort (ANZgene) due to the increased number of SNPs available pre-imputation compared to the merged replication dataset, resulting in a much greater yield of high-quality imputed SNPs (R2 = 0.8).
| Discovery Dataset (ANZgene) | Replication Dataset (IMSGC + NBS) | ||
|---|---|---|---|
| Cases | Samples (n) | 1617 | 1313 |
| F | 1172 | 994 | |
| M | 445 | 319 | |
| F:M | 2.6 | 3.1 | |
| Controls | Samples (n) | 1988 | 1458 |
| F | 1231 | 753 | |
| M | 757 | 705 | |
| F:M | 1.6 | 1.1 | |
| Pre-imputed | HLA | 1047 | 62 |
| NKC | 137 | 33 | |
| LRC | 122 | 44 | |
| Total | 1306 | 139 | |
| Post-imputation overlapping | HLA | 2359 | |
| NKC | 2872 | ||
| LRC | 520 | ||
| Total | 5751 | ||
| Post-imputation ANZgene only | HLA | 54,541 | N/A |
| NKC | 3790 | ||
| LRC | 1576 | ||
| Total | 59,907 | ||
Figure 2Conventional GWAS case vs. control analysis of the discovery dataset reveals no hits outside of the HLA loci. Manhattan plot for the HLA, NKC and LRC regions, using the log10 transformed P value from Fisher’s Exact testing against base position for the pre-imputed ANZgene dataset. The HLA loci contains SNPs reaching the gold standard GWAS Bonferroni correction threshold of 5 × 10−8 (red line), with two distinct peaks across the region. In contrast, neither the NKC nor the LRC loci contained SNPs that met the relaxed Bonferroni correction of 4 × 10−5 (blue line) accounting for the number of SNPs analysed. The SNP highlighted in green is the tag SNP for the drb15 haplotype.
Comparison of the five SNPs with the lowest p value (2sf) from each loci, using Fisher’s exact testing on the ANZgene dataset. When using Bonferroni correction only SNPs below 5 × 10−8 or 4 × 10−5 would be considered ‘significant’, representing the gold standard for GWAS or relaxed relative to the number of SNPs analysed, respectively. SNPs, in bold and with an adjacent asterisk, denote those that met the elastic model threshold (Table 3).
| HLA | NKC | LRC | ||||
|---|---|---|---|---|---|---|
| rsID | rsID | rid | ||||
|
|
| 1.83 × 10−61 |
| 9.91 × 10−4 |
| 3.29 × 10−3 |
|
| 1.30 × 10−59 |
| 1.08 × 10−3 |
| 0.0136 | |
|
| 7.59 × 10−50 | rs11052552 | 2.08 × 10−3 |
| 0.0153 | |
| rs3132946 | 1.83 × 10−48 |
| 2.27 × 10−3 | rs1671196 | 0.0269 | |
|
| 1.59 × 10−47 |
| 3.57 × 10−3 |
| 0.0270 | |
List of SNPs above 70% iterations identified by elastic net analysis on the pre-imputed discovery cohort compared to their corresponding P value from Fisher’s exact testing. The genetic consequence is given for each SNP identified and intergenic SNP placements represented as INT(gene1_gene2). SNPs identified in bold * were also highlighted in Table 2 as one of the five SNPs with the lowest p value for that region. Only four out of the five SNPs for each region (from Table 2) made it above the 70% threshold; the SNPs that did not reach bootstrapping threshold were rs3132946, rs11052552 and rs1671196 for the HLA (green), NKC (orange) and LRC (blue) region, respectively. The shades of colour relates to the elastic net model range of iterations (70–79, 80–89, 90–100).
| Chromosome/Loci | rsID | Iterations (%) | Gene: Genetic Consequence | |
|---|---|---|---|---|
| 6/HLA |
| 100 | 7.59 × 10−50 | HLA-DRA: 500B Downstream Variant |
| rs3117098 | 100 | 2.95 × 10−35 | HCG23: Non Coding Transcript Variant; LOC101929163: Intron Variant | |
|
| 100 | 1.59 × 10−47 | C6orf10: Missense Variant; LOC101929163: Intron Variant | |
| rs6903608 | 100 | 3.14 × 10−32 | INT(HLA-DRB9_HLA-DRB5) | |
|
| 100 | 1.83 × 10−61 | INT(HLA-DRB1_HLA-DQA1) | |
|
| 100 | 1.29 × 10−59 | INT(NOTCH4_TSPBP1-AS1) | |
| rs2854050 | 99.6 | 7.36 × 10−10 | NOTCH4: Intron Variant | |
| rs9277535 | 99.6 | 5.45 × 10−4 | HLA-DPB1: 3 Prime UTR Variant | |
| rs926070 | 99.2 | 2.74 × 10−36 | TSBP1-AS1: Intron Variant | |
| rs2281389 | 98.4 | 4.73 × 10−4 | HLA-DPA2: not reported | |
| rs2394160 | 98.4 | 3.00 × 10−12 | HLA-F: Intron Variant; HLA-F-AS1: Intron Variant | |
| rs2844482 | 98.1 | 0.673 | LTA: Intron Variant; LOC100287329: Intron Variant | |
| rs2647050 | 96.5 | 4.00 × 10−19 | INT(HLA-DQB1_MTC03P1) | |
| rs2856718 | 96.5 | 4.00 × 10−19 | INT(HLA-DQB1_MTC03P1) | |
| rs2395150 | 95.2 | 3.90 × 10−29 | C6orf10: Intron Variant; LOC101929163: Intron Variant | |
| rs1362126 | 94.3 | 7.31 × 10−13 | HLA-F: 2KB Upstream Variant | |
| rs3130299 | 94.3 | 7.53 × 10−11 | INT(NOTCH4_TSPBP1-AS1) | |
| rs2301271 | 94.1 | 2.35 × 10−23 | HLA-DQB2: Intron Variant | |
| rs1611285 | 93.6 | 6.58 × 10−6 | LOC105379663: Non Coding Transcript Variant | |
| rs7453920 | 92.9 | 2.40 × 10−23 | HLA-DQB2: Intron Variant | |
| rs2050190 | 92.7 | 2.43 × 10−9 | C6orf10: Intron Variant; LOC101929163: Intron Variant | |
| rs3819721 | 92.7 | 7.70 × 10−12 | TAP2: Intron Variant | |
| rs2284178 | 92.3 | 3.02 × 10−15 | HCP5: Non Coding Transcript Variant | |
| rs2051549 | 90.2 | 3.02 × 10−23 | HLA-DQB2: Intron Variant | |
| rs1077393 | 88.9 | 6.93 × 10−23 | BAG6: Intron Variant | |
| rs2647012 | 88.2 | 1.10 × 10−34 | INT(HLA-DQB1_MTC03P1) | |
| rs9275184 | 88.1 | 0.529 | INT(HLA-DQB1_MTC03P1) | |
| rs2395174 | 87 | 2.00 × 10−8 | INT(BTNL2_HLA-DRA) | |
| rs2394412 | 86.8 | 3.35 × 10−5 | LINC00243: Non Coding Transcript Variant | |
| rs2894046 | 86.8 | 3.35 × 10−5 | LINC00243: Non Coding Transcript Variant | |
| rs2975033 | 86.4 | 3.48 × 10−11 | LOC105375010: Intron Variant | |
| rs9277554 | 84.9 | 3.11 × 10−3 | HLA-DPB1: 3 Prime UTR Variant | |
| rs2848713 | 84.8 | 1.02 × 10−3 | INT(MICA_LINC01149) | |
| rs9296057 | 84.2 | 3.17 × 10−4 | LOC100294145: Non Coding Transcript Variant | |
| rs2517912 | 84 | 1.90 × 10−13 | INT(ZDHHC20P1_HLA-F) | |
| rs620202 | 83.4 | 2.00 × 10−7 | BRD2: Intron Variant | |
| rs2395349 | 82.8 | 2.37 × 10−5 | HLA-DPB2: Intron Variant | |
| rs2260000 | 82.7 | 1.25 × 10−19 | PRRC2A: Intron Variant | |
| rs6904029 | 82.6 | 3.75 × 10−11 | HCG9: Non Coding Transcript Variant | |
| rs2071653 | 82.3 | 1.10 × 10−11 | MOG: Intron Variant | |
| rs3117230 | 82.2 | 3.37 × 10−3 | INT(COL11A2PA1_HLA-DPB2) | |
| rs719653 | 81.4 | 5.24 × 10−26 | INT(HLA-DQB2_HLA-DOB) | |
| rs4151657 | 80.8 | 9.87 × 10−19 | CFB: Intron Variant | |
| rs2064478 | 79.2 | 3.69 × 10−3 | COL11A2PA1: not reported | |
| rs1035798 | 78.4 | 1.24 × 10−12 | AGER: Intron Variant | |
| rs3129882 | 78 | 3.58 × 10−23 | HLA-DRA: Intron Variant | |
| rs2395173 | 76.9 | 2.78 × 10−32 | INT(BTNL2_HLA-DRA) | |
| rs3135338 | 76 | 2.76 × 10−32 | INT(BTNL2_HLA-DRA) | |
| rs1611185 | 75.6 | 8.21 × 10−10 | HLA-P: not reported | |
| rs2299851 | 74.8 | 2.32 × 10−3 | MSH5: Intron Variant; MSH5-SAPCD1: Intron Variant | |
| rs1737046 | 74.2 | 2.77 × 10−10 | INT(LOC353010_HLA-V) | |
| rs6941112 | 72.2 | 2.51 × 10−18 | STK19: Intron Variant | |
| rs12665700 | 72 | 0.662 | MUC22: Missense Variant | |
| rs721394 | 71 | 0.417 | INT(HCG24_COL11A2) | |
| 12/NKC |
| 93.3 | 3.57 × 10−3 | KLRD1: Non Coding Transcript Variant |
| rs6488285 | 91.1 | 0.0259 | LOC101928100: Intron Variant | |
|
| 85.2 | 1.08 × 10−03 | CLEC2D: Synonymous Variant | |
|
| 82.7 | 9.91 × 10−4 | INT(CD69_KLRF1) | |
| rs10505741 | 79.9 | 0.0179 | CLEC2A: Intron Variant | |
| rs10844780 | 74.5 | 0.0103 | INT(CD69_KLRF1) | |
|
| 74 | 2.27 × 10−3 | INT(CLECL1_CD69) | |
| 19/LRC |
| 97.3 | 3.29 × 10−3 | LOC100421130 |
| rs6509868 | 82.5 | 0.0449 | INT(LAIR1_TTYH1) | |
| rs10411879 | 82 | 0.0706 | INT(LILRA1_LILRB1) | |
|
| 78.9 | 0.0153 | INT(LILRB2_LILRA5) | |
| rs11669029 | 77.6 | 0.0706 | INT(TARM1_OSCAR) | |
|
| 71.5 | 0.027 | INT(LILRA4_LAIR1) | |
| rs272411 | 70.9 | 0.0878 | LILRA1: Intron Variant | |
|
| 70.1 | 0.0136 | INT(LAIR1_TTYH1) | |
| rs2296371 | 70.1 | 0.185 | LILRP2: Non Coding Transcript Variant |
Figure 3Haplotype structure of the HLA SNPs identified by the elastic net model (Table 2) using haploview. The rsIDs are given at the top of the display, with the coloured rings representing the following range of iterations they were selected by the elastic net model: blue (70 ≤ 79%), orange (80 ≤ 89%), red (90 ≤ 99%) and red dotted (100%). For each SNP, their corresponding r2 value are given within each diamond (shown as 0–100, which is equivalent to 0.00–1.0) and red shading indicating strength of D’ between SNPs intersecting diagonally (see Figure S1 for the individual r2 and D’ plots). As expected, there is a complex underlying LD structure across the HLA region with 11 blocks of inherited SNPs predicted by haploview, consisting of 2–5 SNPs each. Of the 12 SNPs that were identified above 98% iterations from elastic net analysis, only rs9267992 and rs9271366 (black arrows meeting at the black circle) were determined to be in strong LD and co-inherited (r2 ≥ 0.7 and D’ ≥ 0.8). These two SNPs also flank a central set of SNPs (encompassing blocks 4, 5 and 6), both with a relatively high level of LD for all SNPs located between them (see supplemental notes for additional comments).
Summary table for all the identified SNPs with a combined FDR value ≤ 0.1 and their associated genetic regions from the combined analysis. For all SNPs that corresponded to a gene, the total number of SNPs identified and a representative lowest combined FDR value (see appendix for full list) and genetic consequences (with the corresponding number of SNPs for that genetic consequence) is given. For SNPs that mapped to an intergenic region, the closest gene upstream and downstream has been indicated by INT(Gene1_Gene2) with the total number of SNPs and lowest representative combined FDR value. SNPs identified within genes for each loci are highlighted; HLA (green), NKC (orange) and LRC (blue).
| CHR/Loci | Gene | Total # of SNPs | Lowest combined FDR Value | Genetic Consequence(s) |
|---|---|---|---|---|
|
| BRD2 | 1 | 0.0767 | Intron Variant (1) |
| C2 | 4 | 3.18 × 10−5 | Intron Variant (4) | |
| CFB | 1 | 1.38 × 10−3 | Intron Variant (1) | |
| GPX5 | 2 | 6.86 × 10−3 | Intron Variant (1); Non-coding transcript variant (1) | |
| GPX6 | 3 | 6.86 × 10−3 | Intron Variant (2); missense variant (1) | |
| HLA-DOB | 3 | 0.0133 | Intron Variant (1); 2KB Upstream Variant (2) | |
| KIFC1 | 1 | 0.0693 | Intron Variant (1) | |
| LTA | 2 | 0.0129 | Downstream Variant (1); Missense Variant (1) | |
| LOC100287329 | 2 | 0.012855556 | 2KB Upstream Variant (2) | |
| PHF1 | 1 | 0.032153333 | Intron Variant (1) | |
| SYNGAP1 | 2 | 0.069256 | Intron Variant (2) | |
| TAP1 | 1 | 4.05E-03 | Intron Variant (1) | |
| TAP2 | 4 | 1.87 × 10−4 | Intron Variant (3); synonymous Variant (1) | |
| TNF | 1 | 0.012855556 | 2K Upstream Variant (1) | |
| TNXB | 10 | 1.11 × 10−4 | Intron Variant (9); Missense Variant (1) | |
| WDR46 | 1 | 0.0254 | Intron Variant (1) | |
| INT(FKBPL_PPT2) | 4 | 5.56 × 10−7 | NA/Intergenic | |
| INT(HLA-DOA_HLA-DPA1) | 2 | 0.0118 | ||
| INT(HLA-DQB2_HLA-DOB) | 4 | 4.05 × 10−3 | ||
| INT(PPP1R2P1_ LOC100294145) | 2 | 4.05 × 10−3 | ||
| INT(TAP1_PPP1R2P1) | 1 | 1.47 × 10−5 | ||
| INT(ZBTB9_BAK1) | 2 | 0.0235 | ||
| INT(ZSCAN23_GPX6) | 1 | 0.022343889 | ||
|
| CLEC12B | 1 | 0.022577778 | Intron Variant (1) |
| LOC102724020 | 1 | 0.022577778 | Intron Variant (1) | |
| LOC112268091 | Intron Variant (1) | |||
| CLEC2A | 2 | 1.74 × 10−3 | Intron Variant (2) | |
| KLRF2 | 1 | 1.74 × 10−3 | Intron Variant (1) | |
| KLRA1P | 2 | 0.054753333 | Intron Variant (1); 2KB Upstream Variant | |
| KLRB1 | 1 | 0.093155556 | Intron Variant (1) | |
| KLRC4-KLRK1 readthrough | 1 | 0.083367778 | Intron Variant (1) | |
| KLRC4 | 1 | 0.083367778 | 2KB Upstream Variant (1) | |
| LINC02390 | 1 | 0.084108444 | Non-Coding Transcript Variant (1) | |
| LOC105369658 | 1 | 0.037733333 | Intron Variant (1) | |
| LOC374443, C-type lectin domain family 2 member D pseudogene | 2 | 0.012474 | Intron Variant (2) | |
| INT(KLRB1_CLEC2D) | 1 | 0.023585333 | NA/Intergenic | |
| INT(LINC02446_KLRA1P) | 2 | 0.0377 | ||
| INT(LOC408186_KLRB1) | 1 | 0.0286 | ||
|
| RPS9 | 3 | 9.78 × 10−4 | Intron Variant (3) |
| INT(LILRA2_LILRB1) | 2 | 0.0520 | NA/Intergenic |
Top hits from the functional Enrichment analysis by ToppGENE using the identified genes in the overlapping analysis (Table 3) and passing FDR (adjusted p < 0.05). The top five biological processes identified all belong to the same lineage of pathways, with leukocyte mediated immunity being a parent pathway of natural killer cell mediated cytotoxicity (numbered 1 to 4). See ssupplementary files ‘additional file S1’ for full results.
| GO (Biological Process) | ID | Hit Count in Query List | Hit Count in Genome | Hit in Query List | ||
|---|---|---|---|---|---|---|
| natural killer cell mediated cytotoxicity | GO:0042267 | 8.59 × 10−11 | 5.45 × 10−8 | 6 | 72 | KLRC4-KLRK1, TAP1, TAP2, KLRF2, CLEC12B, CLEC2A |
| natural killer cell mediated immunity | GO:0002228 | 1.11 × 10−10 | 5.45 × 10−8 | 6 | 75 | KLRC4-KLRK1, TAP1, TAP2, KLRF2, CLEC12B, CLEC2A |
| lymphocyte mediated immunity | GO:0002449 | 1.18 × 10−10 | 5.45 × 10−8 | 9 | 407 | C2, LTA, TNF, KLRC4-KLRK1, TAP1, TAP2, KLRF2, CLEC12B, CLEC2A |
| leukocyte mediated cytotoxicity | GO:0001909 | 3.63 × 10−9 | 1.25 × 10−6 | 6 | 133 | KLRC4-KLRK1, TAP1, TAP2, KLRF2, CLEC12B, CLEC2A |
| regulation of lymphocyte mediated immunity | GO:0002706 | 2.54 × 10−8 | 7.03 × 10−6 | 6 | 184 | LTA, TNF, KLRC4-KLRK1, TAP1, TAP2, CLEC12B |
| regulation of immune effector process | GO:0002697 | 3.06 × 10−8 | 7.05 × 10−6 | 8 | 527 | C2, LTA, TNF, KLRC4-KLRK1, TAP1, TAP2, CFB, CLEC12B |
| cell killing | GO:0001906 | 6.96 × 10−8 | 1.38 × 10−5 | 6 | 218 | KLRC4-KLRK1, TAP1, TAP2, KLRF2, CLEC12B, CLEC2A |
| GO (Molecular Function) | ||||||
| tapasin binding | GO:0046980 | 1.12 × 10−6 | 1.36 × 10−4 | 2 | 2 | TAP1, TAP2 |
| ABC-type peptide transporter activity | GO:0015440 | 6.68 × 10−6 | 2.04 × 10−4 | 2 | 4 | TAP1, TAP2 |
| ABC-type peptide antigen transporter activity | GO:0015433 | 6.68 × 10−6 | 2.04 × 10−4 | 2 | 4 | TAP1, TAP2 |
| TAP2 binding | GO:0046979 | 6.68 × 10−6 | 2.04 × 10−4 | 2 | 4 | TAP1, TAP2 |
| TAP1 binding | GO:0046978 | 1.11 × 10−5 | 2.65 × 10−4 | 2 | 5 | TAP1, TAP2 |
| carbohydrate binding | GO:0030246 | 1.31 × 10−5 | 2.65 × 10−4 | 5 | 295 | KLRC4-KLRK1, KLRB1, KLRF2, CLEC12B, CLEC2A |
| TAP binding | GO:0046977 | 3.11 × 10−5 | 5.42 × 10−4 | 2 | 8 | TAP1, TAP2 |
| MHC protein binding | GO:0042287 | 4.15 × 10−5 | 6.34 × 10−4 | 3 | 63 | HLA-DOB, TAP1, TAP2 |
| MHC class Ib protein binding | GO:0023029 | 8.63 × 10−5 | 1.17 × 10−3 | 2 | 13 | TAP1, TAP2 |
| ABC-type transporter activity | GO:0140359 | 2.54 × 10−4 | 2.82 × 10−3 | 2 | 22 | TAP1, TAP2 |
| glutathione peroxidase activity | GO:0004602 | 2.54 × 10−4 | 2.82 × 10−3 | 2 | 22 | GPX5, GPX6 |
| GO (Pathway) | ||||||
| Antigen processing and presentation | 83074 (KEGG) | 9.58 × 10−8 | 2.08 × 10−5 | 5 | 77 | TNF, HLA-DOB, TAP1, TAP2, KLRC4 |
| Herpes simplex infection | 377873 (KEGG) | 7.45 × 10−6 | 8.09 × 10−4 | 5 | 185 | LTA, TNF, HLA-DOB, TAP1, TAP2 |
| Type I diabetes mellitus | 83095 (Reactome) | 3.74 × 10−5 | 2.41 × 10−3 | 3 | 43 | LTA, TNF, HLA-DOB |
| Activation of C3 and C5 | 1269248 (KEGG) | 4.73 × 10−5 | 2.41 × 10−3 | 2 | 7 | C2, CFB |
| Malaria | 152665 (KEGG) | 5.55 × 10−5 | 2.41 × 10−3 | 3 | 49 | TNF, KLRC4-KLRK1, KLRB1 |
| Staphylococcus aureus infection | 172846 (KEGG) | 8.30 × 10−5 | 3.00 × 10−3 | 3 | 56 | C2, HLA-DOB, CFB |
| Antigen Presentation: Folding, assembly and peptide loading of class I MHC | 1269194 (Reactome) | 6.64 × 10−4 | 2.06 × 10−2 | 2 | 25 | TAP1, TAP2 |
| Regulation of Complement cascade | 1269250 (Reactome) | 7.76 × 10−4 | 2.10 × 10−2 | 2 | 27 | C2, CFB |
| Asthma | 83120 (KEGG) | 1.02 × 10−3 | 2.10 × 10−2 | 2 | 31 | TNF, HLA-DOB |
| Initial triggering of complement | 1269242 (Reactome) | 1.02 × 10−3 | 2.10 × 10−2 | 2 | 31 | C2, CFB |
| Systemic lupus erythematosus | 83122 (KEGG) | 1.06 × 10−3 | 2.10 × 10−2 | 3 | 133 | C2, TNF, HLA-DOB |
| Primary immunodeficiency | 83125 (KEGG) | 1.46 × 10−3 | 2.35 × 10−2 | 2 | 37 | TAP1, TAP2 |
| Detoxification of Reactive Oxygen Species | 1270420 (Reactome) | 1.54 × 10−3 | 2.35 × 10−2 | 2 | 38 | GPX5, GPX6 |
Figure 4Circos plot for the imputed SNPs from the ANZgene dataset, extracted from the HLA (green), NKC (orange) and LRC (blue) loci on chromosomes 6, 12 and 19, respectively. Each dot represents a single SNP plotted clockwise in bp order (see Methods for region boundaries). This plot compares one-at-a-time SNP evaluation using P values against our elastic net model for the exact same set of SNPs. The innermost ring is the -log10 P values from Fisher’s Exact testing (a circularised Manhattan plot). The values range from 0 to the outer edge representing the lowest P value (65.3). SNPs in blue represent those that were never selected by elastic net with bootstrapping (0 out of 3000 iterations), while orange signifies SNPs that were selected by elastic net with stabilisation by iterative subsampling (≥1 out of 3000 iterations). The outer ring with orange dots, represents the percentage of iterations that each SNP was selected by elastic net with 3000 iterations, ranging from 0 to 100 at the outermost edge. The middle ring of blue SNPs indicates the position of SNPs that were never selected by elastic net. Each dot representing the elastic net result is aligned at the same degree as the dots representing the -log10 P value in the innermost circle. The outermost ring with blue bars represents the combined contribution of all the iterations of each SNP (from elastic net with bootstrapping) within each genomic boundary relative to the total number of iterations, given as a percentage. The scale ranges from zero to 4.17%, which is the largest value representing the intergenic region between HLA-DQB1 and HLA-DQA2.
Epistasis results on the panel of SNPs identified by the combined FDR results (Table S2) performed using a merged discovery and replication dataset. Epistatic interactions were only considered for SNPs in different loci for HLA (green), NKC (orange) and LRC (blue), and not within genomic regions.
| SNP1 | SNP2 | Epistasis Interaction | |||
|---|---|---|---|---|---|
| Chr:bp | Genomic Location | Chr:bp | Genomic Location | OR_INT | P |
| 6:32829320 | INT(TAP1_PPP1R2P1) | 12:10750157 | KLRA1P | 0.48 | 0.000255 |
| 6:32832786 | INT(TAP1_PPP1R2P1) | 12:10750157 | KLRA1P | 0.487 | 0.000685 |
| 6:32102305 | INT(FKBPL_PPT2) | 19:55116651 | INT(LILRA1_LILRB1) | 0.632 | 0.00306 |
| 6:32112626 | INT(FKBPL_PPT2) | 19:55116651 | INT(LILRA1_LILRB1) | 0.639 | 0.00392 |
| 6:32069806 | TNXB | 19:55116651 | INT(LILRA1_LILRB1) | 0.636 | 0.00418 |
| 6:31916400 | CFB | 12:9742327 | INT(LOC408186_KLRB1) | 1.76 | 0.00526 |
| 6:32109165 | INT(FKBPL_PPT2) | 19:55116651 | INT(LILRA1_LILRB1) | 0.646 | 0.00561 |
| 6:32026257 | TNXB | 19:55116651 | INT(LILRA1_LILRB1) | 0.648 | 0.00601 |
| 6:31879158 | C2 | 19:55116651 | INT(LILRA1_LILRB1) | 0.662 | 0.0105 |
| 6:31888367 | C2 | 19:55116651 | INT(LILRA1_LILRB1) | 0.667 | 0.012 |
| 6:31884823 | C2 | 19:55116651 | INT(LILRA1_LILRB1) | 0.672 | 0.0137 |
| 6:32852448 | INT(PPP1R2P1_LOC100294145) | 12:9742327 | INT(LOC408186_KLRB1) | 2.8 | 0.0162 |
| 6:31542308 | TNF; LTA; LOC100287329 | 12:9742327 | INT(LOC408186_KLRB1) | 0.561 | 0.018 |
| 6:32069806 | TNXB | 12:10750157 | KLRA1P | 0.679 | 0.022 |
| 6:32026257 | TNXB | 12:10750157 | KLRA1P | 0.682 | 0.0239 |
| 6:32112626 | INT(FKBPL_PPT2) | 12:10044542 | CLEC2A; KLRF2 | 6.42 | 0.0287 |
| 6:32109165 | INT(FKBPL_PPT2) | 12:10044542 | CLEC2A; KLRF2 | 6.23 | 0.0314 |
| 6:32938199 | BRD2 | 12:10169041 | CLEC12B; LOC102724020; LOC112268091 | 0.722 | 0.0423 |
| 6:32057972 | TNXB | 12:10700014 | LOC105369658 | 0.142 | 0.0432 |
| 6:32010272 | TNXB | 12:10700014 | LOC105369658 | 0.142 | 0.0434 |
| 6:32021838 | TNXB | 12:10700014 | LOC105369658 | 0.142 | 0.0434 |
| 6:32030284 | TNXB | 12:10700014 | LOC105369658 | 0.142 | 0.0435 |
| 6:32019746 | TNXB | 12:10700014 | LOC105369658 | 0.142 | 0.0436 |
| 6:31540556 | LTA; LOC100287329 | 12:9742327 | INT(LOC408186_KLRB1) | 0.629 | 0.0467 |
| 6:31916400 | CFB | 19:55116651 | INT(LILRA1_LILRB1) | 0.785 | 0.0472 |
Figure 5The majority of SNP ‘hits’ identified by the elastic net model, also present in the IMSGC meta-analysis, are found to be increasingly ‘significant’ by conventional methods in the largest MS dataset. (A) Around 25% of the hits identified by both the ‘combined FDR analysis’ (Table 4 and Table S2) and the pre-imputed discovery dataset were present in the IMSGC meta-analysis. (B) For the SNPs that were present in the meta-analysis, the majority were found to have P value below the conventionally adopted GWAS threshold (p < 5 × 10−8) in both analyses. (C) Comparison of the 13 SNPs found to reach GWAS level significance in the IMSGC meta-analysis with those identified by the elastic net model in the discovery dataset, confirmed all hits with increasing significance in the meta-analysis compared to the discovery p value. This also included at least three SNPs identified by the elastic net model with p values not reaching GWAS significance threshold in the discovery dataset (denoted with an Asterisk *).