Literature DB >> 20190752

Multiple common variants for celiac disease influencing immune gene expression.

Patrick C A Dubois1, Gosia Trynka, Lude Franke, Karen A Hunt, Jihane Romanos, Alessandra Curtotti, Alexandra Zhernakova, Graham A R Heap, Róza Adány, Arpo Aromaa, Maria Teresa Bardella, Leonard H van den Berg, Nicholas A Bockett, Emilio G de la Concha, Bárbara Dema, Rudolf S N Fehrmann, Miguel Fernández-Arquero, Szilvia Fiatal, Elvira Grandone, Peter M Green, Harry J M Groen, Rhian Gwilliam, Roderick H J Houwen, Sarah E Hunt, Katri Kaukinen, Dermot Kelleher, Ilma Korponay-Szabo, Kalle Kurppa, Padraic MacMathuna, Markku Mäki, Maria Cristina Mazzilli, Owen T McCann, M Luisa Mearin, Charles A Mein, Muddassar M Mirza, Vanisha Mistry, Barbara Mora, Katherine I Morley, Chris J Mulder, Joseph A Murray, Concepción Núñez, Elvira Oosterom, Roel A Ophoff, Isabel Polanco, Leena Peltonen, Mathieu Platteel, Anna Rybak, Veikko Salomaa, Joachim J Schweizer, Maria Pia Sperandeo, Greetje J Tack, Graham Turner, Jan H Veldink, Wieke H M Verbeek, Rinse K Weersma, Victorien M Wolters, Elena Urcelay, Bozena Cukrowska, Luigi Greco, Susan L Neuhausen, Ross McManus, Donatella Barisani, Panos Deloukas, Jeffrey C Barrett, Paivi Saavalainen, Cisca Wijmenga, David A van Heel.   

Abstract

We performed a second-generation genome-wide association study of 4,533 individuals with celiac disease (cases) and 10,750 control subjects. We genotyped 113 selected SNPs with P(GWAS) < 10(-4) and 18 SNPs from 14 known loci in a further 4,918 cases and 5,684 controls. Variants from 13 new regions reached genome-wide significance (P(combined) < 5 x 10(-8)); most contain genes with immune functions (BACH2, CCR4, CD80, CIITA-SOCS1-CLEC16A, ICOSLG and ZMIZ1), with ETS1, RUNX3, THEMIS and TNFRSF14 having key roles in thymic T-cell selection. There was evidence to suggest associations for a further 13 regions. In an expression quantitative trait meta-analysis of 1,469 whole blood samples, 20 of 38 (52.6%) tested loci had celiac risk variants correlated (P < 0.0028, FDR 5%) with cis gene expression.

Entities:  

Mesh:

Year:  2010        PMID: 20190752      PMCID: PMC2847618          DOI: 10.1038/ng.543

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


Celiac disease is a common heritable chronic inflammatory condition of the small intestine induced by dietary wheat, rye and barley, as well as other unidentified environmental factors, in susceptible individuals. Specific HLA-DQA1 and HLA-DQB1 risk alleles are necessary, but not sufficient, for disease development1,2. The well defined role of HLA-DQ heterodimers, encoded by these alleles, is to present cereal peptides to CD4+ T cells, activating an inflammatory immune response in the intestine. A single genome wide association study (GWAS) has been performed in celiac disease, which identified the IL2/IL21 risk locus1. Subsequent studies probing the GWAS information in greater depth have identified a further 12 risk regions. Most of these regions contain a candidate gene functional in the immune system, although only in the case of HLA-DQA1 and HLA-DQB1 have the causal variants been established3-5. Many of the known celiac loci overlap with other immune-related diseases6. In order to identify additional risk variants, particularly of smaller effect size, we performed a second-generation GWAS using over six times as many samples as the previous GWAS and a denser genome-wide SNP set. We followed up promising findings in a large collection of independent samples. The GWAS included five European celiac disease case and control sample collections including the previously reported celiac disease dataset1. We performed stringent data quality control (Online Methods), including calling genotypes using a custom algorithm on both large sample sets, and where possible cases and controls together (Online Methods). We tested 292,387 non-HLA SNPs from the Illumina Hap300 marker set for association in 4,533 celiac disease cases and 10,750 controls of European descent (Table 1). A further 231,362 additional non-HLA markers from the Illumina Hap550 marker set were tested for association in a subset of 3,796 celiac disease cases and 8,154 controls. All markers were from autosomes or the X chromosome. Genotype call rates were >99.9% in both datasets. The overdispersion factor of association test statistics, λGC=1.12, was similar to that observed in other GWAS of this sample size7,8. Findings were not substantially altered by imputation of missing genotypes for 737 celiac disease cases genotyped on the Hap300 BeadChip and corresponding controls (Table 1, collection 1). Here we present results for directly genotyped samples, as around half the additional Hap550 markers cannot be accurately imputed from Hap300 data9 (including the new ETS1 locus finding in this study). Results for the top 1000 markers are available in Supplementary Data 1, but because of concerns regarding identity detection of individuals10, results for all markers are available only on request to the corresponding author.
Table 1

Sample collections and genotyping platforms

CollectionCountryCeliac disease casesControls

Sample size(pre-QC)aSample size(post-QC)bPlatformcSample size(pre-QC)aSample size(post-QC)bPlatformc
Stage 1: Genome wide association
1efUK778737Illumina Hap300v1-12,596j2,596Illumina Hap550-2v3
2egUK1,9221,849Illumina 670-QuadCustom_v15,069j4,936Illumina 1.2M-DuoCustom_v1
3eFinland674647Illumina 670-QuadCustom_v11,839j1,829Illumina 610-Quad
4hNetherlands876803Illumina 670-QuadCustom_v1960846Illumina 670-QuadCustom_v1
5eItaly541497Illumina 670-QuadCustom_v1580543Illumina 670-QuadCustom_v1
Analysis of Hap300 markers 4,533 10,750
Analysis of additional Hap550markers 3,796 8,154
Stage 2: Follow-up
6USA987973Illumina GoldenGate615555Illumina GoldenGate
7Hungary979965Illumina GoldenGate1,1261,067Illumina GoldenGate
8iIreland653597Illumina GoldenGate1,4991,456Illumina GoldenGate
9Poland599564Illumina GoldenGate745716Illumina GoldenGate
10Spain558550Illumina GoldenGate465433Illumina GoldenGate
11eItaly1,0561,010Illumina GoldenGate864804Illumina GoldenGate
12eFinland270259Illumina GoldenGate653j653Illumina 610-Quadd
Subtotal 4,918 5,684
Analysis of Hap300 markers, andfollow-up (91 SNPs) 9,451 16,434
Analysis of additional Hap550markers, and follow-up (40 SNPs) 8,714 13,838

Sample numbers attempted for genotyping, before any quality control (QC) steps were applied.

Sample numbers after all quality control (QC) steps (used in the association analysis).

All platforms contain a common set of Hap300 markers; the Hap550, 610-Quad, 670-Quad and 1.2M contain a common set of Hap550 markers.

Finnish stage 2 controls were individuals within the Finrisk collection for whom Illumina 610-Quad genotype data became available after the completion of stage 1.

As an additional quality control step, we performed case-case and control-control comparisons for collection 1 versus 2, and collection 3 versus 12, for the 40 SNPs in Table 2 and observed no markers with P<0.01. We did observe (as expected) differences for collection 5 versus 11, from Northern and Southern Italy, respectively.

All 737 post-QC cases reported in a previous GWAS1.

690 of the post-QC cases and 1150 of the post-QC controls were included in previous GWAS follow-up studies22,32.

498 of the post-QC cases and 767 of the post-QC controls were included in previous GWAS follow-up studies22,32.

352 of the post-QC cases and 921 of the post-QC controls were included in previous GWAS follow-up studies22,32.

Some of these data were generated elsewhere, and some prior quality control steps (information not available) had been applied.

For follow-up, we first inspected genotype clouds for the 417 non-HLA SNPs meeting PGWAS<10−4, being aware that top GWAS association signals may be enriched for genotyping artefact, and excluded 22 SNPs from further analysis using a low threshold for possible bias. We selected SNPs from 113 loci for replication. Markers that passed design and genotyping quality control included: a) 18 SNPs from all 14 previously identified celiac disease risk loci (including a tag SNP for the major celiac disease associated HLA-DQ2.5cis haplotype1); b) 13 SNPs from all 7 novel regions with PGWAS<5×10−7; c) 86 SNPs from 59 of 68 novel regions with 5×10−7> PGWAS >5×10−5 in stage 1; d) 14 SNPs from 14 of 30 novel regions with 5×10−5> PGWAS >10−4 in stage 1 (for this last category, we mostly chose regions with immune system genes). Two SNPs were selected per region for: regions with stronger association; regions with possible multiple independent associations; and/or containing genes of obvious biological interest. We successfully genotyped 131 SNPs in 7 independent follow-up cohorts comprising 4,918 celiac disease cases and 5,684 controls of European descent. Genotype call rates were >99.9% in each collection. Primary association analyses of the combined GWAS and follow-up data were performed with a two-sided 2×2×12 Cochran-Mantel-Haenszel test.

RESULTS

Celiac disease risk variants

The HLA locus and all 13 other previously reported celiac disease risk loci showed evidence for association at a genome wide significance threshold (Pcombined<5×10−8) (Table 2). We note that some loci were previously reported using less stringent criteria (e.g. the P<5×10−7 recommended by the 2007 WTCCC study11), but that in the current, much larger sample set, all known loci meet recently proposed P<5×10−8 thresholds12,13.
Table 2

Genomic regions with the strongest association signals for celiac disease

ChrPosition(bp)SNPLD blockab(Mb)MinoralleleMinorallelefreqcPGWAS4533 cases,10750controlsPfollow-up4918 cases,5684controlsPcombined9451 cases,16434controlsOdds ratioc[95% CI]MultipleindependentassociationsignalsdRefRefSeqGenesin LDblockGenes of Interestand GRAIL annotatione
Previously reported risk variants
1190803436rs2816316190.73-190.81C0.1601.45 × 10−121.56 × 10−62.20 × 10−170.80 [0.76-0.84] 22 1 RGS1
261040333rs1300346460.78-61.74G0.4014.92 × 10−81.57 × 10−63.71 × 10−131.15 [1.11-1.20]yes 32 8 REL, AHSA2
2102437000rs917997102.22-102.57A0.2365.97 × 10−157.83 × 10−41.11 × 10−151.19 [1.14-1.25] 22 5 IL18RAP, IL18R1, IL1RL1, IL1RL2
2181704290rs13010713181.50-181.97G0.4482.02 × 10−83.21 × 10−44.74 × 10−111.13 [1.09-1.18] 33 1 ITGA4, UBE2E3
2204510823rs4675374204.40-204.52A0.2238.80 × 10−84.94 × 10−35.79 × 10−91.14 [1.09-1.19] 17 2 CTLA4, ICOS, CD28
346210205rs1309891145.90-46.57A0.0972.53 × 10−111.96 × 10−73.26 × 10−171.30 [1.23-1.39]yes 22 11 CCR1, CCR2, CCRL2, CCR3, CCR5, CCR9
3161147744rs17810546161.07-161.23G0.1254.56 × 10−189.57 × 10−123.98 × 10−281.36 [1.29-1.44]yes 22 1 IL12A
3189595248rs1464510189.55-189.62A0.4859.49 × 10−243.63 × 10−182.98 × 10−401.29 [1.25-1.34] 22 1 LPP
4123334952rs13151961123.19-123.78G0.1426.31 × 10−184.45 × 10−112.18 × 10−270.74 [0.70-0.78] 1 4 IL2, IL21
632713862rs2187668geneidentifiedA0.258<10−50<10−50<10−506.23 [5.95-6.52](yes)1,36 HLA-DQA1, HLA-DQB1
6138014761rs2327832137.92-138.17G0.2161.41 × 10−141.97 × 10−64.46 × 10−191.23 [1.17-1.28] 32 0 TNFAIP3
6159385965rs1738074159.24-159.45A0.4343.14 × 10−81.56 × 10−82.94 × 10−151.16 [1.12-1.21] 22 2 TAGAP
12110492139rs653178110.19-111.51G0.4956.03 × 10−141.47 × 10−87.15 × 10−211.20 [1.15-1.24] 22 13 SH2B3
1812799340rs189321712.73-12.91G0.1655.52 × 10−71.04 × 10−42.52 × 10−101.17 [1.12-1.23] 17 1 PTPN2
New loci, genome-wide significant evidence (Pcombined <5 × 10−8)
12516606rs37488162.40-2.78G0.3394.93 ×10−71.17 × 10−33.28 × 10−90.89 [0.85-0.92]4 TNFRSF14, MMEL1
125176163rs1090312225.11-25.18A0.4803.21 × 10−58.44 × 10−71.73 × 10−100.89 [0.85-0.92]1 RUNX3
1199158760rs296547199.12-199.31A0.3576.46 × 10−51.34 × 10−54.11 × 10−90.89 [0.86-0.92]2 ?
268452459rs17035378f68.39-68.54G0.2781.34 × 10−51.41 × 10−47.79 × 10−90.88 [0.84-0.92]2 PLEK
332990473rs13314993f32.90-33.06C0.4646.87 × 10−61.09 × 10−43.27 × 10−91.13 [1.08-1.17]2 CCR4
3120601486rs11712165f120.59-120.78C0.3945.40 × 10−71.72 × 10−38.03 × 10−91.13 [1.08-1.17]5 CD80, KTELC1
690983333rs1080642590.86-91.10A0.3979.46 × 10−69.25 × 10−63.89 × 10−101.13 [1.09-1.17]1 BACH2, MAP3K7
6128320491rs802734127.99-128.38G0.3111.36 × 10−61.70 × 10−92.62 × 10−141.17 [1.12-1.22]yes2 PTPRK, THEMIS
8129333771rs9792269129.21-129.37G0.2388.14 × 10−61.00 × 10−43.28 × 10−90.88 [0.84-0.91]0 ?
1080728033rs125055280.69-80.76G0.4665.80 × 10−81.81 × 10−39.09 × 10−100.89 [0.86-0.92]1 ZMIZ1
11127886184rs11221332f127.84-127.99A0.2374.74 × 10−119.98 × 10−75.28 × 10−161.21 [1.16-1.27]yes1 ETS1
1611311394rs1292882211.22-11.39A0.1611.07 × 10−57.59 × 10−43.12 × 10−80.86 [0.82-0.91]4 CIITA, SOCS1, CLEC16A
2144471849rs481938844.42-44.47A0.2803.42 × 10−51.66 × 10−52.46 × 10−90.88 [0.84-0.92]2 ICOSLG
New loci, suggestive evidence (either A. 10−6>Pcombined>5×10−8 and/or B. PGWAS<10−4 and Pfollow-up<0.01)
17969259rs127276427.84-8.13A0.1853.06 × 10−58.21 × 10−49.11 × 10−81.14 [1.09-1.20]4 PARK7, TNFRSF9
161564451rs669176861.52-61.62G0.3782.63 × 10−51.16 × 10−31.19 × 10−70.90 [0.87-0.94]1 NFIA
1165678008rs864537165.43-165.71G0.3911.01 × 10−79.25 × 10−23.80 × 10−70.91 [0.87-0.94]1 CD247
1170977623rs859637170.87-171.20A0.4868.15 × 10−55.68 × 10−31.75 × 10−61.10 [1.06-1.14]1 FASLG, TNFSF18, TNFSF4
369335589rs6806528f69.27-69.37A0.0974.84 × 10−57.66 × 10−41.46 × 10−71.19 [1.12-1.27]1 FRMD4B
3170974795rs10936599170.84-171.09A0.2522.99 × 10−76.63 × 10−24.57 × 10−71.12 [1.07-1.16]3 ?
6328546rs1033180g0.32-0.40A0.0809.14 × 10−61.48 × 10−35.58 × 10−81.21 [1.13-1.29]yes1 IRF4 g
737341035rs697449137.32-37.41A0.1701.37 × 10−52.63 × 10−31.56 × 10−71.14 [1.09-1.20]1 ELMO1
1349733716rs276205149.63-49.96A0.1843.35 × 10−55.06 × 10−36.64 × 10−71.13 [1.08-1.18]0 ?
1468347957rs489926068.24-68.39A0.2634.55 × 10−52.21 × 10−33.92 × 10−71.12 [1.07-1.16]2 ZFP36L1
1742220599rs207440441.40-42.25C0.2505.03 × 10−55.96 × 10−31.23 × 10−60.90 [0.86-0.94]10 ?
2220312892rs229842820.14-20.35A0.2012.49 × 10−74.13 × 10−21.84 × 10−71.13 [1.08-1.19]6 UBE2L3, YDJC
X12881445rs597978512.82-12.93G0.2636.32 × 10−62.18 × 10−36.36 × 10−80.88 [0.84-0.92]1 TLR7, TLR8

The most significantly associated SNP from each region is shown.

LD regions were defined by extending 0.1 cM to the left and right of the focal SNP as defined by the HapMap3 recombination map. All chromosomal positions are based on NCBI build-36 coordinates.

Minor allele in all samples in the combined dataset, odds ratios (shown for combined dataset) defined with respect to the minor allele in all controls.

Evidence from logistic regression at a genome-wide significant or suggestive level of significance after conditioning on other associated SNPs (see Supplementary Table 2). HLA region not tested, but previously known.

Selected named genes within or adjacent to the same LD block as the associated SNPs, causality is not proven. In particular, other genes and other causal mechanisms may exist. Gene names underlined are identified from GRAIL15,16 analysis (Online Methods) with Ptext<0.01.

These markers were present on the Hap550 but not Hap300 SNP sets, and are not genotyped for 737 cases and 2596 controls in the stage I GWAS, and combined dataset analyses. Only minor changes in P values were observed when these genotypes were imputed and included in analysis.

The IRF4 region (specifically rs9738805, r2=0.08 with rs1033180 in HapMap CEU) was previously identified as showing strong geographical differentiation11. Association with coeliac disease was still observed after correction for population stratification using either a structured association approach34 (corrected PGWAS=5.16×10−6 , 478×2×2 CMH test) or principal components correction (uncorrected PGWAS=7.05 ×10−6, corrected PGWAS =2.28 ×10−5, Cochran-Armitage trend tests combined using weighted Z scores) (Online Methods). However, definitive exclusion of population stratification would require family based association studies.

We identified 13 novel risk regions with genome-wide significant evidence (Pcombined <5×10−8) of association, including regions containing the BACH2, CCR4, CD80, CIITA/SOCS1/CLEC16A, ETS1, ICOSLG, RUNX3, THEMIS, TNFRSF14, and ZMIZ1 genes which are of obvious immunological function (Table 2). A further 13 regions met ‘suggestive’ criteria for association (either 10−6> Pcombined >5×10−8 and/or PGWAS<10−4 and Pfollowup<0.01). These regions also contain multiple genes of obvious immunological function, including CD247, FASLG/TNFSF18/TNFSF4, IRF4, TLR7/TLR8, TNFRSF9 and YDJC. Six of the 39 non-HLA regions show evidence for the presence of multiple independently associated variants in a conditional logistic regression analysis (Supplementary Table 2). We tested the 40 SNPs with the strongest association (Table 2) from each of the known genome-wide significant, new genome-wide significant, and new suggestive loci for evidence of heterogeneity across the 12 collections studied. Only the HLA region was significant (Breslow-Day test P<0.05 / 40 tests, rs2187668 P=4.8×10−8) which is consistent with the well described North-South gradient in HLA allele frequency in European populations, and more specifically for HLA-DQ in celiac disease14. We observed no evidence for interaction between each of the 26 genome-wide significant non-HLA loci, which is consistent with what has been reported for complex diseases so far. However, we did observe weak evidence for lower effect sizes at non-HLA loci in high risk HLA-DQ2.5cis homozygotes, similar to what has been previously observed in type 1 diabetes7. To obtain more insight into the functional relatedness of the celiac loci, we applied GRAIL, a statistical tool that utilizes text mining of PubMed abstracts to annotate candidate genes from loci associated with common disease risk15,16. In order to assess the sensitivity of this tool (using known loci as a positive control), we first performed a ‘leave-one-out’ analysis of the 27 genome-wide significant celiac disease loci (including HLA-DQ). GRAIL scores of Ptext<0.01 were obtained for 12 of the 27 loci (44% sensitivity, Table 2). Factors that limit the sensitivity of GRAIL include biological pathways being both known (a 2006 dataset is used to avoid GWAS era studies), and published in the literature. We then applied GRAIL analysis, using the 27 known regions as a seed, to all 49 regions (49 SNPs) with 10−3 >Pcombined >5×10−8 and obtained GRAIL Ptext <0.01 for 9 regions (18.4%). As a control, only 5.5% (279 of 5033) of randomly selected Hap550 SNPs reached this threshold. In addition to the five ‘suggestive’ loci shown in Table 2, GRAIL annotated four further interesting gene regions of lower significance in the combined association results: rs944141/PDCD1LG2 (Pcombined=4.4×10−6), rs976881/TNFRSF8(Pcombined =2.1×10−4), rs4682103/CD200/BTLA(Pcombined=6.8×10−6) and rs4919611/NFKB2 (Pcombined=6.1×10−5). There appeared to be further enrichment for genes of immunological interest which are not GRAIL annotated in the 10−3>Pcombined>5×10−8 significance window, including rs3828599/TNIP1 (Pcombined=1.55×10−4), rs8027604/PTPN9 (Pcombined=1.4×10−6), rs944141/CD274 (Pcombined=4.4×10−6). Some of these findings, for which neither genome-wide significant nor suggestive association is achieved, are likely to comprise part of a longer tail of disease predisposing common variants, of weaker effect sizes. Definitive assessment of these biologically plausible regions would require genotyping and association studies using much larger sample collections than the present study. We previously showed considerable overlap between celiac disease and type 1 diabetes risk loci17, as well as celiac disease and rheumatoid arthritis18, and more generally, there is now substantial evidence for shared risk loci between the common chronic immune mediated diseases6. To update these observations, we searched ‘A Catalog of Published Genome Wide Association Studies’ (18 Nov 2009)19 and the HuGE database20. We found some evidence (requiring a published association report of P<1×10−5) of shared loci with at least one other inflammatory or immune mediated disease for 18 of the current 27 genome-wide significant celiac risk regions. We defined shared regions as the broad LD block, however different SNPs are often reported in different diseases, and at only three of the 18 shared regions are associations across all diseases with the same SNP or a proxy SNP in r2>0.8 in HapMap CEU. Currently 9 regions appear celiac disease specific and may reflect distinctive disease biology including the regions containing rs296547 and rs9792269, and the regions around CCR4, CD80, ITGA4, LPP, PLEK, RUNX3 and THEMIS. In fact, locus sharing between diseases is probably greater, due to both stochastic variation in results from sample size limitations, and regions with a genuinely stronger effect size in one disease and weaker effect size in another. Genetic variation in ETS1 has recently been reported to be associated with systemic lupus erythematosus (SLE) in the Chinese population, although is not associated with SLE in European populations21. The most strongly associated celiac (European population) SNP rs11221332 and the most strongly associated SLE (Chinese population) SNP rs6590330 map 70kb apart. Inspection of the HapMap phase II data shows broadly similar linkage disequilibrium patterns between Chinese (CHB) and European (CEU) populations in this region, with the two associated SNPs in separate non-adjacent linkage disequilibrium blocks. Thus distinct common variants within the same gene are predisposing to different autoimmune diseases across different ethnic groups.

Function of celiac risk variants

Celiac risk variants in the HLA alter protein structure and function4. However we identified only four non-synonymous SNPs with evidence for celiac disease association (PGWAS<10−4) from the other 26 genome-wide significant associated regions (rs3748816/MMEL1, rs3816281/PLEK, rs196432/RUNX3, rs3184504/SH2B3). Although comprehensive regional resequencing is required to test the possibility that coding variants contribute to the observed association signals, more subtle effects of genetic variation on gene expression are the more likely major functional mechanism for complex disease genes. With this in mind, we performed a meta-analysis of new and published genome-wide expression quantitative trait loci (eQTL) datasets comprising 1,469 human whole blood (PAXgene) samples reflecting primary leucocyte gene expression. We applied a new method, Transcriptional Components, to remove a substantial proportion of inter-individual non-genetic expression variation and performed eQTL meta-analysis on the residual expression variation (Online Methods). We assessed 38 of the 39 genome-wide significant and suggestive celiac disease associated non-HLA loci (Table 2) for cis expression - genotype correlations. We tested the SNP with the strongest association from each region. However for five regions the most associated SNP was not genotyped in the eQTL samples (Hap300 data), instead for four of these we tested a proxy SNP (r2>0.5 in HapMap CEU). In addition, for six loci showing evidence of multiple independent associations in conditional regression analyses, we tested a second SNP showing independent celiac disease association for eQTL analysis. In total we assessed 44 independent non-HLA SNP associations in peripheral whole blood samples genotyped on the Illumina Hap300 BeadChip and either Illumina Ref8 or HT12 expression arrays, correlating each SNP with data from gene-probes mapping within a 1Mb window. We identified significant (Spearman P<0.0028, corresponding to 5% false discovery rate) eQTLs at 20 of 38 (52.6%) non-HLA celiac loci tested (Table 3, Supplementary Figures 2 & 3). Some loci had evidence of eQTLs with multiple probes, genes or SNPs (Table 3). We assessed whether the number of SNPs with cis-eQTL effects out of the 44 SNPs that we tested, was significantly higher than expected. We observed that eQTL SNPs on average have a substantially higher MAF than non-eQTL SNPs in the 294,767 SNPs tested. In order to correct for this we selected 44 random SNPs that had an equal MAF distribution, and determined for how many of these MAF-matched SNPs eQTLs were observed. We observed a significantly higher number of eQTL SNPs (P=9.3 × 10−5, 106 permutations) amongst the celiac associated SNPs than expected by chance (22 observed eQTL SNPs, vs. 7.8 expected eQTL SNPs). Therefore the celiac disease associated regions are greatly enriched for eQTLs. These data suggest some risk variants may influence celiac disease susceptibility through a mechanism of altered gene expression. Candidate genes with a significant eQTL, where the peak eQTL signal and peak case/control association signal are similar (Supplementary Figure 3), include MMEL1, NSF, PARK7, PLEK, TAGAP, RRP1, UBE2L3 and ZMIZ1.
Table 3

Celiac risk variants correlated with cis gene expression

SNPaChrSNP positionbProbeCentrePositionbIlluminaArrayAddressIDExpressiondatasetcGene nameeQTL P valued
Loci with genome-wide significant evidence (Pcombined <5 × 10−8)
rs3748816125166062412221650452HT-12 PLCH2 1.66 × 10−5
rs37488161251660624829556520725Ref-8v2 + HT-12 TNFRSF14 1.30 × 10−3
rs37488161251660625104296250338Ref-8v2 C1orf93 1.16 × 10−4
rs37488161251660625331152070246Ref-8v2 + HT-12 MMEL1 1.03 × 10−20
rs29654711991587601988801461300279Ref-8v2 + HT-12 DDX59 2.45 × 10−5
rs842647260972975612638101170220Ref-8v2 + HT-12 AHSA2 3.30 × 10−10
rs13003464e261040333612638101170220Ref-8v2 + HT-12 AHSA2 6.39 × 10−11
rs3816281f268461451684619574810020Ref-8v2 + HT-12 PLEK 7.97 × 10−26
rs91799721024370001024185716520180Ref-8v2 + HT-12 IL18RAP 7.35 × 10−87
rs1301071321817042901815938651780433HT-12 UBE2E3 4.93 × 10−5
rs13098911346210205459644496550333Ref-8v2 + HT-12 CXCR6 9.66 × 10−6
rs1309891134621020546255176g2190671HT-12 CCR3 5.50 × 10−10
rs1309891134621020546255176g7570670Ref-8v2 CCR3 5.69 × 10−4
rs6441961d34632738846255176h2190671HT-12 CCR3 2.87 × 10−19
rs6441961d34632738846255176h7570670Ref-8v2 CCR3 1.02 × 10−4
rs11922594f3120608512120683364i6550288Ref-8v2 + HT-12 KTELC1 5.09 × 10−17
rs11922594f3120608512120683364i3850161Ref-8v2 + HT-12 KTELC1 7.34 × 10−6
rs10806425690983333908780753520349HT-12 BACH2 1.92 × 10−3
rs173807461593859651593800685890739Ref-8v2 + HT-12 TAGAP 1.99 × 10−3
rs17380746159385965159381094 j5360364HT-12 TAGAP 3.23 × 10−4
rs17380746159385965159381094 j4860242HT-12 TAGAP 2.18 × 10−3
rs12505521080728033806225402450131Ref-8v2 + HT-12 ZMIZ1 1.80 × 10−3
rs653178121104921391103995526560301Ref-8v2 + HT-12 SH2B3 9.24 × 10−12
rs65317812110492139110710447840253Ref-8v2 + HT-12 ALDH2 1.44 × 10−4
rs65317812110492139110894406 k2070736HT-12 TMEM116 3.68 × 10−4
rs65317812110492139110894406 k3190129Ref-8v2 TMEM116 1.51 × 10−3
rs129288221611311394113356274540072Ref-8v2 + HT-12 C16orf75 1.02 × 10−8
rs48193882144471849440495677200373Ref-8v2 RRP1 2.62 × 10−3
Loci with suggestive evidence (either A. 10−6>Pcombined>5×10−8 and/or B. PGWAS<10−4 and Pfollow-up<0.01)
rs12727642179692597956138610193Ref-8v2 + HT-12 PARK7 9.76 × 10−15
rs8645371165678008165710482 l6290400Ref-8v2 + HT-12 CD247 1.77 × 10−9
rs8645371165678008165710482 l3890689HT-12 CD247 2.93 × 10−7
rs6974491737341035371577612750154Ref-8v2 + HT-12 ELMO1 5.40 × 10−6
rs20744041742220599418243453520672Ref-8v2 + HT-12 LRRC37A 1.17 × 10−4
rs2074404174222059942106695 m5260138Ref-8v2 + HT-12 NSF 1.20 × 10−5
rs2074404174222059942106695 m1410484HT-12 NSF 4.28 × 10−4
rs20744041742220599422230124070615HT-12 WNT3 2.77 × 10−3
rs20744041742220599424851544880037HT-12 LOC388397 1.78 × 10−9
rs22984282220312892203081881230242Ref-8v2 + HT-12 UBE2L3 1.96 × 10−90
rs5979785X1288144512842944 n6480360Ref-8v2 + HT-12 TLR8 3.88 × 10−13
rs5979785X1288144512842944 n3390612Ref-8v2 + HT-12 TLR8 1.07 × 10−7

See Supplementary Figures 2 & 3 for detailed results, and Supplementary Table 3 for more detail of Illumina expression probes.

We tested the SNP with the strongest association from 34 of 39 non-HLA loci (Pcombined<10−6, Table 2), Hap300 proxy SNPs for 4 further loci, and a second independently associated SNP from 6 loci, for correlation with gene expression in PAXgene blood RNA in up to 1,349 individuals. 1 locus (containing ETS1) where an adequate proxy SNP was not available was not included for the eQTL analysis. SNP-gene expression correlations were tested for probes within a 1Mb window. Results are presented for SNPs showing significant correlations with cis gene expression after controlling false discovery rate at 5% (corresponding to P<0.0028).

All chromosomal positions are based on NCBI build-36 coordinates. Probe centre position was determined by re-mapping probe sequences to the human transcriptome and calculated from the mid-point of the transcript start and transcript end positions in genomic co-ordinates.

‘HT-12’ comprise 1240 individuals with blood gene expression assayed using Illumina Human HT-12v3 arrays, ‘Ref-8v2’ comprise 229 individuals with blood gene expression assayed using Illumina Human-Ref-8v2 arrays (Online Methods).

Spearman rank correlation of genotype and residual variance in transcript expression. Meta-analysis eQTL P value shown if both datasets had identical probes.

Second, independently associated SNP from this locus.

Proxy SNP, r2=0.61 in HapMap CEU with most associated SNP rs11712165.

Different Illumina probe sequences with the same Probe Centre Position.

Different Illumina probe sequences with the same Probe Centre Position.

Different Illumina probe sequences with the same Probe Centre Position.

Different Illumina probe sequences with the same Probe Centre Position.

Different Illumina probe sequences with the same Probe Centre Position.

Different Illumina probe sequences with the same Probe Centre Position.

Different Illumina probe sequences with the same Probe Centre Position.

Different Illumina probe sequences with the same Probe Centre Position.

We also assessed co-expression of genes mapping within 500kb of SNPs showing strongest case/control association from the 40 genome-wide significant and suggestive celiac disease loci in an analysis of the 33,109 human Affymetrix Gene Expression Omnibus dataset. This analysis loses power to detect tissue specific correlations by use of numerous tissue types, but greatly gains power by the large sample size. We detected several distinct co-expression clusters (Pearson correlation coefficient between genes >0.5), including four clusters of immune-related genes which contain at least one gene from 37 of the 40 genome-wide significant and suggestive loci (Fig. 1). These data further demonstrate that genes from celiac disease risk loci map to multiple distinct immunological pathways involved in disease pathogenesis.
Figure 1

Co-expression analysis of genes mapping to 40 genome-wide significant and suggestive celiac disease regions in 33,109 heterogenous human samples from the Gene Expression Omnibus

Genes mapping within a 1Mb window of associated SNPs (Table 2) were tested for interaction with genes from other loci. Interactions with Pearson correlation >0.5 shown (P<10−100). Only the genes known to contain causal mutations (HLA-DQA1, HLA-DQB1) were analysed from the HLA region, “HLA-DQB2 / HLA-DQB1” is a single expression probeset mapping to both genes. No probe for THEMIS was present on the earlier version of the U133 array, however in a subset analysis of U133 Plus2.0 data, THEMIS is co-expressed in the major immune gene cluster

DISCUSSION

We previously reported that most celiac genetic risk variants mapped near genes that are functional in the immune system22, and this remains true for the 13 new genome-wide significant, and 13 new suggestive, risk variants from the current study. We can now refine these observations and highlight specific immunological pathways relevant to celiac disease pathogenesis:

1) T cell development in the thymus

The rs802734 LD block contains the recently identified gene THEMIS ‘THymus-Expressed Molecule Involved in Selection’. THEMIS plays a key regulatory role in both positive and negative T-cell selection during late thymocyte development23. Furthermore, the rs10903122 LD block contains RUNX3, a master regulator of CD8+ T lymphocyte development in the thymus24,25. TNFRSF14 (LIGHTR, rs3748816 LD block) has widespread peripheral leucocyte functions as well as a critical role in promoting thymocyte apoptosis26. The ETS1 transcription factor (rs11221332 LD block) is also active in peripheral leucocytes, however it is also a key player in thymic CD8+ lineage differentiation, acting in part by promoting RUNX3 expression27. The importance of the thymus in autoimmune disease pathogenesis has been previously emphasised by the established role of thymectomy in the treatment of myasthenia gravis. In type 1 diabetes, it was shown that disease associated genetic variation in the insulin gene INS causes altered thymic insulin expression and subsequent T cell tolerance for insulin as a self-protein28. However, the importance of thymic T cell regulation has not been previously recognised in the aetiology of celiac disease. It is conceivable that the associated variants may alter biological processes prior to thymic MHC-ligand interactions. Alternatively it is now clear that exogenous antigen presentation and selection occurs in the thymus via migratory dendritic cells - this has been demonstrated for skin, and has been hypothesised for food antigens29,30. These findings suggest research into immuno-/pharmaco-logical modifiers of T cell tolerance more generally in autoimmune diseases.

2) Innate immune detection of viral RNA

Although the association signal at rs5979785 (Pcombined=6.36×10−8) in the TLR7/TLR8 region is just outside our genome wide significance threshold, we observe a strong effect of rs5979785 on TLR8 expression in whole blood. Both TLRs recognise viral RNA. Taken together with the recent observation of rare loss of function mutations in the enteroviral response gene IFIH1 protective against type 1 diabetes31, these findings suggest viral infection (and the nature of the host response to infection) as a putative environmental trigger common to these autoimmune diseases.

3) T and B cell co-stimulation (or co-inhibition)

This class of molecules controls the strength and nature of the response to T or B (immunoglobulin) cell receptor activation by antigens. We observe multiple regions with genes (CTLA4/ICOS/CD28, TNFRSF14, CD80, ICOSLG, TNFRSF9, TNFSF4) from this class of ligand-receptor pairs suggesting fine control of the adaptive immune response might be altered in at-risk individuals.

4) Cytokines, chemokines and their receptors

Our previous report discussed the function of the 2q11-12 interleukin receptor cluster (IL18RAP, etc), the 3p21 chemokine receptor cluster (CCR5, etc.) and the loci containing IL2/IL21 and IL12A. We now report additional loci containing TNFSF18 and CCR4. We estimate that the current celiac disease variants, including the major celiac disease associated HLA variant, HLA-DQ2.5cis, less common celiac disease associated haplotypes in the HLA (HLA-DQ8; HLA-DQ2.5trans; HLADQ2.2), and the additional 26 definitively implicated loci explain about 20% of total celiac disease variance, which would represent 40% of genetic variance, assuming a heritability of 0.5. A long tail of low effect size common variants, along with highly penetrant rare variants (both at the established loci and elsewhere in the genome), may substantially contribute to the remaining heritability. We observed different haplotypes within the ETS1 region associated with coeliac disease in Europeans, and SLE in the Chinese population. We further note for some autoimmune diseases studied in European origin populations, that although the same LD block has been associated, the association is with a different haplotype. In some cases, the same variants are associated, but the direction of association is opposite (e.g. rs917997/IL18RAP in celiac disease versus type 1 diabetes). We believe further exploration of these signals may reveal critical differences in the nature of the immune system perturbation between these diseases. Previous investigators have observed that only a small proportion of GWAS associations are coding variants, and have suggested that these may instead influence regulation of gene expression. Here, we show that over half the celiac disease associated variants are correlated with expression changes in nearby genes. This mechanism is likely to explain the function of some risk variants for other common, complex diseases. Further research, however, is needed to definitively determine at each locus both the celiac disease causal variants and their functional mechanisms.
  46 in total

1.  Analysis of large-scale gene expression data.

Authors:  G Sherlock
Journal:  Brief Bioinform       Date:  2001-12       Impact factor: 11.622

2.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.

Authors:  Ron Edgar; Michael Domrachev; Alex E Lash
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

Review 3.  Recent advances in coeliac disease.

Authors:  D A van Heel; J West
Journal:  Gut       Date:  2006-07       Impact factor: 23.059

Review 4.  Detecting shared pathogenesis from the shared genetics of immune-related diseases.

Authors:  Alexandra Zhernakova; Cleo C van Diemen; Cisca Wijmenga
Journal:  Nat Rev Genet       Date:  2009-01       Impact factor: 53.242

5.  Evaluating the effects of imputation on the power, coverage, and cost efficiency of genome-wide SNP platforms.

Authors:  Carl A Anderson; Fredrik H Pettersson; Jeffrey C Barrett; Joanna J Zhuang; Jiannis Ragoussis; Lon R Cardon; Andrew P Morris
Journal:  Am J Hum Genet       Date:  2008-06-26       Impact factor: 11.025

6.  Insulin expression in human thymus is modulated by INS VNTR alleles at the IDDM2 locus.

Authors:  P Vafiadis; S T Bennett; J A Todd; J Nadeau; R Grabs; C G Goodyer; S Wickramasinghe; E Colle; C Polychronakos
Journal:  Nat Genet       Date:  1997-03       Impact factor: 38.330

7.  Coeliac disease-associated risk variants in TNFAIP3 and REL implicate altered NF-kappaB signalling.

Authors:  G Trynka; A Zhernakova; J Romanos; L Franke; K A Hunt; G Turner; M Bruinenberg; G A Heap; M Platteel; A W Ryan; C de Kovel; G K T Holmes; P D Howdle; J R F Walters; D S Sanders; C J J Mulder; M L Mearin; W H M Verbeek; V Trimble; F M Stevens; D Kelleher; D Barisani; M T Bardella; R McManus; D A van Heel; C Wijmenga
Journal:  Gut       Date:  2009-02-24       Impact factor: 23.059

8.  Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations.

Authors:  W Yu; M Clyne; M J Khoury; M Gwinn
Journal:  Bioinformatics       Date:  2009-10-27       Impact factor: 6.937

9.  A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21.

Authors:  David A van Heel; Lude Franke; Karen A Hunt; Rhian Gwilliam; Alexandra Zhernakova; Mike Inouye; Martin C Wapenaar; Martin C N M Barnardo; Graeme Bethel; Geoffrey K T Holmes; Con Feighery; Derek Jewell; Dermot Kelleher; Parveen Kumar; Simon Travis; Julian R F Walters; David S Sanders; Peter Howdle; Jill Swift; Raymond J Playford; William M McLaren; M Luisa Mearin; Chris J Mulder; Ross McManus; Ralph McGinnis; Lon R Cardon; Panos Deloukas; Cisca Wijmenga
Journal:  Nat Genet       Date:  2007-06-10       Impact factor: 38.330

10.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

Authors: 
Journal:  Nature       Date:  2007-06-07       Impact factor: 49.962

View more
  445 in total

Review 1.  Celiac disease: diagnostic criteria in progress.

Authors:  U Volta; V Villanacci
Journal:  Cell Mol Immunol       Date:  2011-01-31       Impact factor: 11.530

Review 2.  Regulation of CD4+ T-cell polarization by suppressor of cytokine signalling proteins.

Authors:  Camille A Knosp; James A Johnston
Journal:  Immunology       Date:  2012-02       Impact factor: 7.397

3.  Fine points in mapping autoimmunity.

Authors:  Constantin Polychronakos
Journal:  Nat Genet       Date:  2011-11-28       Impact factor: 38.330

4.  Use of genome-wide association studies for drug repositioning.

Authors:  Philippe Sanseau; Pankaj Agarwal; Michael R Barnes; Tomi Pastinen; J Brent Richards; Lon R Cardon; Vincent Mooser
Journal:  Nat Biotechnol       Date:  2012-04-10       Impact factor: 54.908

5.  Meta-analysis of genetic polymorphisms in granulomatosis with polyangiitis (Wegener's) reveals shared susceptibility loci with rheumatoid arthritis.

Authors:  Sharon A Chung; Gang Xie; Delnaz Roshandel; Richard Sherva; Jeffrey C Edberg; Megan Kravitz; Paul F Dellaripa; Gary S Hoffman; Alfred D Mahr; Philip Seo; Ulrich Specks; Robert F Spiera; E William St Clair; John H Stone; Robert M Plenge; Katherine A Siminovitch; Peter A Merkel; Paul A Monach
Journal:  Arthritis Rheum       Date:  2012-10

6.  Expression QTL analysis of top loci from GWAS meta-analysis highlights additional schizophrenia candidate genes.

Authors:  Simone de Jong; Kristel R van Eijk; Dave W L H Zeegers; Eric Strengman; Esther Janson; Jan H Veldink; Leonard H van den Berg; Wiepke Cahn; René S Kahn; Marco P M Boks; Roel A Ophoff
Journal:  Eur J Hum Genet       Date:  2012-03-21       Impact factor: 4.246

Review 7.  Targeted modification of wheat grain protein to reduce the content of celiac causing epitopes.

Authors:  C Osorio; N Wen; R Gemini; R Zemetra; D von Wettstein; S Rustgi
Journal:  Funct Integr Genomics       Date:  2012-06-26       Impact factor: 3.410

Review 8.  Genome-wide approaches to schizophrenia.

Authors:  Jubao Duan; Alan R Sanders; Pablo V Gejman
Journal:  Brain Res Bull       Date:  2010-04-28       Impact factor: 4.077

Review 9.  Using chromatin marks to interpret and localize genetic associations to complex human traits and diseases.

Authors:  Gosia Trynka; Soumya Raychaudhuri
Journal:  Curr Opin Genet Dev       Date:  2013-11-25       Impact factor: 5.578

10.  Regulatory genomic regions active in immune cell types explain a large proportion of the genetic risk of multiple sclerosis.

Authors:  Ramyiadarsini I Elangovan; Giulio Disanto; Antonio J Berlanga-Taylor; Sreeram V Ramagopalan; Lahiru Handunnetthi
Journal:  J Hum Genet       Date:  2014-02-13       Impact factor: 3.172

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.