
Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis.

Jimmy Z Liu, Mohamed A Almarri, Daniel J Gaffney, George F Mells, Luke Jostins, Heather J Cordell, Samantha J Ducker, Darren B Day, Michael A Heneghan, James M Neuberger, Peter T Donaldson, Andrew J Bathgate, Andrew Burroughs, Mervyn H Davies, David E Jones, Graeme J Alexander, Jeffrey C Barrett, Richard N Sandford, Carl A Anderson.

Abstract

We genotyped 2,861 cases of primary biliary cirrhosis (PBC) from the UK PBC Consortium and 8,514 UK population controls across 196,524 variants within 186 known autoimmune risk loci. We identified 3 loci newly associated with PBC (at P<5×10−8), increasing the number of known susceptibility loci to 25. The most associated variant at 19p12 is a low-frequency nonsynonymous SNP in TYK2, further implicating JAK-STAT and cytokine signaling in disease pathogenesis. An additional five loci contained nonsynonymous variants in high linkage disequilibrium (LD; r2>0.8) with the most associated variant at the locus. We found multiple independent common, low-frequency and rare variant association signals at five loci. Of the 26 independent non-human leukocyte antigen (HLA) signals tagged on the Immunochip, 15 have SNPs in B-lymphoblastoid open chromatin regions in high LD (r2>0.8) with the most associated variant. This study shows how data from dense fine-mapping arrays coupled with functional genomic data can be used to identify candidate causal variants for functional follow-up.

Year: 2012 PMID: 22961000 PMCID: PMC3459817 DOI: 10.1038/ng.2395

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Primary biliary cirrhosis (PBC) is characterized by the immune-mediated destruction of intra-hepatic bile ducts, resulting in chronic cholangitis, liver fibrosis and ultimately cirrhosis[1]. With a UK prevalence of 35:100,000, rising to 94:100,000 women over 40 years of age, it is the most common autoimmune (AI) liver disorder[1,2]. Family-based studies indicate a substantial genetic component to PBC susceptibility, with a sibling relative risk of ~10.5 in the UK[3]. Genome-wide association studies (GWAS) have identified 22 PBC risk loci, and highlighted the role of NFkB signaling, T-cell differentiation, Toll-like receptor and tumor necrosis factor signaling in disease pathogenesis[4-6]. Sixteen of these loci are also associated with other immune-mediated diseases such as multiple sclerosis, celiac disease and type 1 diabetes (T1D), shedding light on the involvement of common genes and pathways across these diseases[7]. Despite these advances, the specific causal variant at many of these loci remains unknown. To better define risk variants and identify additional susceptibility loci, we performed a fine-mapping and association study using a cohort of 2,861 cases from the UK PBC Consortium and 8,514 UK population controls from the 1958 British Birth Cohort and National Blood Service. All samples were genotyped on the Immunochip, an Illumina Infinium array containing 196,524 variants (718 small insertions/deletions and 195,806 SNPs) across 186 known AI risk loci. SNPs were derived from population-based sequencing projects such as the 1000 Genomes project and autoimmune disease resequencing efforts[8,9]. Compared to GWAS arrays, Immunochip’s increased marker density within known AI loci increases power to detect PBC associations within these selected key candidate genes, and provides a powerful means of fine-mapping known PBC loci as causal variants are more likely to be directly genotyped. 
Following quality control (Online Methods), 143,020 polymorphic SNPs were available across 2,861 cases and 8,514 controls (Supplementary Tables 1-2, Supplementary Figures 1-6). A further 94,559 SNPs in the Immunochip fine-mapping regions were imputed using genotypes from the 1000 Genomes Project June 2011 release (Online Methods). The inflation factor inferred from 2,258 SNPs not associated with autoimmune disease was modest (λ=1.096, Online Methods), similar to that reported in our previous GWAS for PBC[6]. Of the 22 known PBC risk loci, 16 reached genome-wide significance (P<5×10−8) (Figure 1) and four showed nominal evidence of association (5×10−86]. At 12 of the genome-wide significant loci, the most associated SNP differed from that previously reported (Supplementary Table 3). There was little difference in the effect-size estimates between the GWAS tagging SNP and the most strongly associated Immunochip SNP (Supplementary Figure 7), although this may be due to the large proportion of samples shared between the two studies (Online Methods).

Figure 1

Manhattan plot and list of genome-wide significant PBC risk loci across Immunochip

Novel risk loci are highlighted in blue. Loci with more than one independent signal are highlighted in red. The vertical red line indicates the genome-wide significance threshold of P=5×10−8. The peak on chromosome 6 is the HLA region.

Stepwise conditional regression[10] revealed multiple independent signals at five loci, with 16p13 harboring three, and 3q25 four such associations (Table 1). At the 16p13 locus, the third independent signal, rs80073729, is a rare SNP (MAF<0.5%) recently associated with celiac disease[9]. In the same study, Trynka et al.[9] also identified multiple independent signals at 3q25, though rs80014155, a rare SNP that best tags the fourth independent PBC association at this locus, was not among them. These results suggest that resequencing hundreds or thousands of cases across known GWAS loci will be a powerful means of identifying additional independent risk alleles. It is likely that these two rare SNP associations would have been missed using standard GWAS arrays due to poor tagging, unless they were directly genotyped. As these are rare SNPs, further replication in large independent cohorts will be required to confirm their associations. Haplotype association analysis at loci with multiple independent signals identified similar effect-size estimates suggesting that the causal variant is among, or is highly correlated with, genotyped SNPs (Supplementary Table 4). These additional independent association signals thus yield a more complete understanding of the genetic architecture of PBC and enable more informative genotype-based recall studies to be conducted.

Table 1

PBC risk loci at genome-wide significance

Chr	SNP (a)	RA (b)	RAF (c)	P (d)	OR (95% CI)	LD region (size in bp) (e)	Nearby gene(s) (f)	Functional annotation (g)
1p31	rs72678531	G	0.17	2.47×10⁻³⁸	1.61 (1.49-1.73)	67,560,940-67,592,782 (31,842)	IL12RB2	OC
1q31	rs2488393	A	0.21	4.29×10⁻¹²	1.28 (1.19-1.37)	195,609,003-196,047,821 (438,818)	DENND1B
2q32	rs3024921	A	0.06	2.59×10⁻¹⁸	1.62 (1.45-1.80)	191,651,517-191,651,517 (0)	STAT1,STAT4
2q32	rs7574865 (second signal)	A	0.22	1.38×10⁻¹³	1.31 (1.22-1.40)	191,651,987-191,681,279 (29,292)	STAT1,STAT4
3q13	rs2293370	G	0.8	6.84×10⁻¹⁶	1.39 (1.29-1.52)	120,598,840-120,734,898 (136,058)	TMEM39A,POGLUT1,TIMMDC1,CD80	NS
3q25	rs2366643	A	0.57	3.92×10⁻²²	1.35 (1.27-1.44)	161,202,965-161,219,770 (16,805)	IL12A	OC
3q25	rs62270414 (second signal)	G	0.15	5.74×10⁻¹⁷	1.41 (1.30-1.53)	161,122,353-161,174,976 (52,623)	IL12A	OC
3q25	rs668998 (third signal)	G	0.43	4.73×10⁻⁹	1.26 (1.17-1.36)	161,192,695-161,198,245 (5,550)	IL12A	OC
3q25	rs80014155 (fourth signal)	A	0.004	2.64×10⁻¹¹	3.44 (2.39-4.94)	161,108,087-161,176,747 (68,660)	IL12A
4q24	rs7665090	G	0.52	8.48×10⁻¹⁴	1.26 (1.19-1.34)	103,770,651-103,770,651 (0)	MANBA,NFKB1	NS,eQTL
5p13	rs6871748	A	0.72	2.26×10⁻¹³	1.3 (1.21-1.4)	35,885,906-35,921,739 (35,833)	IL7R,CAPSL,SPEF2,UGT3A1	NS
7q32	rs35188261	A	0.17	6.52×10⁻²²	1.52 (1.39-1.63)	128,372,852-128,499,110 (126,258)	IRF5,TNPO3	OC
7q32	rs3807307 (second signal)	G	0.47	4.12×10⁻⁹	1.22 (1.14-1.30)	128,361,203-128,367,916 (6,713)	IRF5,TNPO3	OC,eQTL
11q23	rs80065107	A	0.79	7.20×10⁻¹⁶	1.39 (1.28-1.5)	118,115,759-118,248,982 (133,223)	DDX6	OC
12p13	rs1800693	G	0.4	1.18×10⁻¹⁴	1.27 (1.19-1.34)	6,310,270-6,323,072 (12,802)	TNFRSF1A,LTBR,SCNN1A	OC
12p13	rs11064157 (second signal)	A	0.25	1.69×10⁻⁹	1.23 (1.15-1.32)	6,362,910-6,362,910 (0)	TNFRSF1A,LTBR,SCNN1A	OC
12q24	rs11065979	A	0.44	2.87×10⁻⁹	1.2 (1.13-1.27)	110,368,991-111,095,097 (726,106)	ATXN2,BRAP,SH2B3	NS
14q24	rs911263	A	0.71	9.95×10⁻¹¹	1.26 (1.17-1.35)	67,823,346-67,823,346 (0)	RAD51B
16p13	rs1646019	G	0.71	6.72×10⁻¹⁵	1.31 (1.23-1.41)	11,254,549-11,273,001 (18,452)	SOCS1,CLEC16A,PRM1,PRM2	OC,eQTL
16p13	rs12708715 (second signal)	C	0.68	2.19×10⁻¹³	1.29 (1.21-1.38)	10,999,820-11,117,948 (118,128)	SOCS1,CLEC16A,PRM1,PRM2	OC,eQTL
16p13	rs80073729 (third signal)	A	0.004	2.69×10⁻⁸	2.96 (2.02-4.33)	11,281,298-11,281,298 (0)	SOCS1,CLEC16A,PRM1,PRM2	OC
16q24	rs11117433	G	0.77	1.41×10⁻⁹	1.26 (1.17-1.36)	84,577,017-84,577,017 (0)	IRF8	OC
17q12	rs8067378	G	0.52	6.05×10⁻¹⁴	1.26 (1.19-1.34)	35,158,633-35,336,333 (177,700)	ORMDL3,ZPBP2,GSDMB,IKZF3	NS,OC,eQTL
17q21	rs17564829	G	0.24	2.15×10⁻⁹	1.25 (1.16-1.35)	41,047,160-42,211,804 (1,164,644)	CRHR1,MAPT	NS,OC,eQTL
19p12	rs34536443	G	0.95	1.23×10⁻¹²	1.91 (1.59-2.28)	10,324,118-10,324,118 (0)	TYK2	NS
22q13	rs2267407	A	0.23	1.29×10⁻¹³	1.29 (1.21-1.38)	38,076,996-38,086,596 (9,600)	SYNGR1,PDGFB,RPL3	OC,eQTL

Non-HLA PBC risk loci meeting genome-wide significance (P < 5 × 10−8) are shown.

(a) Most significant SNP in the locus. Novel associations are highlighted in bold.

(b) Risk allele.

(c) Risk allele frequency in controls.

(d) For primary signals, p-values were obtained from Cochran-Armitage tests for trend. For second, third and fourth association signals, p-values were obtained using logistic regression conditioning on the previous independent SNPs.

(e) Regions of high linkage disequilibrium defined by SNPs with r2 > 0.8. See Supplementary Figure 10 for regional locus plots.

(f) RefSeq UCSC hg18 track.

(g) Denotes whether there are SNPs with r2 > 0.8 with the hit SNP that lie within open chromatin peaks (OC), are non-synonymous (NS) or are expression quantitative trait loci (eQTL). The full list of SNPs is given in Supplementary Tables 6-8.

Variants at three loci not previously reported as associated with PBC reached the genome-wide significance threshold (Table 1). The most significant association on 19p12, rs34536443 (OR=1.91, P=1.24×10−12), is a low-frequency (1%≤MAF<5%) non-synonymous SNP in the tyrosine kinase 2 gene (TYK2), previously associated with multiple sclerosis[11]. The locus has also been associated with T1D[12], psoriasis[13] and Crohn’s disease[14], although rs34536443 was not genotyped as part of these studies. For T1D and psoriasis, the strongest associations were to common SNPs that reside on the same haplotype (rs2304256 (r2=0.06, D’=0.9) and rs280519 (r2=0.03, D’=1)). The most associated SNP in Crohn’s disease and the second psoriasis signal (rs12720356) is independent of rs34536443 (r2=0, D’=0.003). The 12q24 locus has been associated with celiac disease[9,15], rheumatoid arthritis[16] and T1D[17], though it was a non-synonymous SNP in SH2B3, rs3184504 (OR=1.19, P=1.11×10−8), rather than the most significant SNP in this study, rs11065979 (OR=1.2, P=2.87×10−9), that was most strongly associated. The two SNPs are in high LD (r2=0.81) and further studies are required to identify the causal variant underlying the PBC association signal at this locus. The most associated SNP in the 17q21 region, rs17564829 (OR=1.25, P=2.15×10−9), is located in MAPT, a gene that has been associated with cognitive symptoms in Parkinson’s disease. While cognitive symptoms are a major part of the symptom complex associated with PBC, it remains to be seen whether (a) the true causal variant at the locus has its functional effect through MAPT, and (b) this functional effect then results in cognitive changes in individuals with PBC. Both TYK2 and SH2B3 are involved in the production of cytokines, adding to the evidence that cytokine imbalances play a role in PBC and other autoimmune diseases[18,19].
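
The contrast between low r2 and high D’ reported for the TYK2 SNPs can be illustrated with a minimal sketch of the two standard LD measures. The function name `ld_stats` and the haplotype frequencies below are made up for illustration and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, D') for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical example: a 4% allele found only on haplotypes carrying a 30% allele.
r2, d_prime = ld_stats(p_ab=0.04, p_a=0.04, p_b=0.30)
print(f"r2={r2:.2f}, D'={d_prime:.2f}")
```

When a low-frequency allele sits entirely on the background of a common allele, D’ is at its maximum while r2 stays small, which is why a common SNP can share a haplotype with a low-frequency SNP and still tag it poorly.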
TYK2 is a member of the Janus kinase family, whose members transduce cytokine signals by phosphorylating STAT transcription factors. Couturier et al.[20] showed that heterozygotes for rs34536443 have significantly reduced TYK2 activity, which promotes the secretion of Th2 cytokines. For SH2B3, carriers of the A risk allele of rs3184504 show a moderate increase in production of cytokines and stronger activation of the NOD2 recognition pathway compared to carriers of the G allele[21], suggesting a possible role in helping prevent bacterial infection.
Candidate gene studies have implicated several HLA-DR alleles in PBC susceptibility, particularly the DRB1*08 allele[22-25]. To further characterize HLA risk variants, the classical HLA alleles (A, B, C, DQA1, DQB1 and DRB1) were imputed from genotyped SNPs in the MHC[26,27] (Online Methods). Fourteen HLA alleles reached genome-wide significance and conditional analysis clustered these associations into four independent signals (Supplementary Table 5, Supplementary Figure 8). The most significant association was the HLA-DQA1*0401 allele (OR=3.06, P=5.9×10−45), which forms a haplotype with two other HLA class II alleles (DQB1*0402 and DRB1*0801) and is an established PBC risk locus[22-25]. The second and third most significant clusters, DQB1*0602 (OR=0.64, P=2.32×10−15) and DQB1*0301 (OR=0.70, P=6.48×10−14), both have protective effects, confirming previous studies showing suggestive associations between these loci and PBC susceptibility[22,23]. The fourth most associated cluster, DRB1*0404 (OR=1.57, P=1.22×10−9), has not been previously associated with PBC. The variance in liability explained by the 26 independent SNPs and the four HLA types is 4.9% and 1.4%, respectively, which together account for 16.2% of the total PBC heritability of liability of 0.39 (Online Methods).
To identify candidate causal variants we searched for non-synonymous variants in high LD (r2>0.8) with the most associated variants at each PBC risk locus.
We found 39 such variants (of which 13 were directly genotyped) within seven risk loci (Table 1 and Supplementary Table 6), including two variants at two of the loci newly associated with PBC in this study, TYK2 and SH2B3. Functional follow-up studies are needed before these non-synonymous variants can be confirmed as the causal variants at these loci. As variation in gene expression is also likely to influence PBC risk, we evaluated the extent to which the most associated SNP at each locus tags expression quantitative trait loci (eQTLs) or regions of open chromatin. Regions of open chromatin are associated with gene regulatory elements including promoters, enhancers, silencers, insulators and locus control regions. Known eQTLs were collated from the University of Chicago eQTL Browser (see URLs) and Gaffney et al. (2012)[28]. Open chromatin regions in a range of cell lines were identified as part of the Encyclopedia of DNA Elements (ENCODE) project[29,30] using DNase I hypersensitive sites sequencing (DNase-seq) and formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq). Of the 26 independent non-HLA genome-wide significant SNPs identified in this study, 15 have an r2>0.8 with SNPs in DNase-seq or FAIRE-seq peaks in a B-lymphoblastoid cell line, and seven are also significant eQTLs in the same cell line (Table 1 and Supplementary Tables 7-8). To test whether the enrichment of open chromatin within the B-lymphoblastoid cell line was significantly greater than that for all other cell lines, we began by grouping SNPs into independent loci. We sequentially identified the most associated SNP not already assigned to a locus and assigned this SNP, and others in weak LD (r2>0.1) with it, to a new locus. We then calculated an enrichment score, E (Online Methods), using only candidate causal variants (r2>0.8 to the most associated SNP in each locus) across all currently assigned loci.
Considering only loci where the most associated SNP achieved genome-wide significance (N=21, excluding the HLA locus, SNPs outside Immunochip fine-mapping regions and SNPs with MAF<5%), Gm12878 had the highest enrichment score compared with the other cell lines, though the difference in enrichment did not reach significance (P=0.068, Online Methods) (Figure 2). Failure to correctly account for LD between associated SNPs can bias the calculated degree of enrichment (Supplementary Figure 9). Our enrichment analysis protocol can be applied to other functional annotations and other disease phenotypes and will be well powered for traits with many genome-wide significant associations.

Figure 2

Enrichment of DNase-seq peaks among PBC risk loci in Gm12878 compared to other ENCODE cell lines

The relative enrichment (E) of SNPs within DNase-seq peaks was calculated across the 35 most associated loci. There is suggestive, though non-significant, evidence that genome-wide significant loci (P<5×10−8 - vertical blue line) are more likely to lie within DNase-seq peaks in B-lymphoblastoid cell lines (solid red line) than they are to lie within the union of all other annotated cell lines (solid black line) (P=0.068). Dotted grey lines denote E for other annotated cell lines. The shaded grey area represents the 95% confidence interval of E for Gm12878 from 1000 permutations. Cell types: Gm12878: B-lymphoblastoid, H1hesc: embryonic stem cells, H9es: embryonic stem cells, Helas3: cervical carcinoma, Hepg2: liver carcinoma, Hsmm: skeletal muscle myoblasts, Huvec: umbilical vein endothelial cells, K562: leukemia, Lhsr: prostate epithelial cells, Mcf7: mammary gland adenocarcinoma, Medullo: medulloblastoma, Melano: epidermal melanocytes, Myometr: Myometrial cells, Nhbe: bronchial epithelial cells, Nhek: epidermal keratinocytes, Panislets: pancreatic islets, Progfib: fibroblasts.

In summary, we have used dense genotyping across autoimmune disease associated loci to better define the genetic architecture of known PBC risk loci. We have identified additional independent genome-wide significant associations at five loci, and have identified potentially causal protein-coding and regulatory variants within many disease associated loci. We also identified three new PBC risk loci, bringing the total number of associated loci to 25, and confirmed HLA-allele associations by imputing HLA-types. Furthermore, we have combined our SNP data with large-scale functional genomics annotations to identify the cell types in which the PBC associated variants are likely to be acting.

Online Methods

Ethical approval

This study was approved by the Research and Development Departments of all National Health Service (NHS) Trusts participating in this study and by the Oxford Research Ethics Committee C (Oxford REC C reference 07/H0606/96).

Samples

All subjects were of self-declared British or Irish ancestry. Cases were collected by the UK PBC Consortium, which consists of 142 NHS trusts including all UK liver transplant centers. All individuals were over 18 years of age with probable or certain PBC. Three criteria were applied to diagnose the condition: a) a positive test for the presence of antimitochondrial antibodies (titer 1:40 or higher), b) liver biopsy histology consistent with PBC, and c) liver biochemistry consistent with PBC (i.e. a higher level of bilirubin, aspartate transaminase, alanine transaminase, alkaline phosphatase or gamma-glutamyl transferase compared to the upper reference level). Diagnosis was documented as probable when two criteria were satisfied and certain if all three criteria were satisfied. A total of 2,981 cases were supplied by the UK PBC Consortium. A total of 8,970 control samples were ascertained from the 1958 British Birth Cohort and the National Blood Service. This study includes 1,838 cases and 2,356 controls that were also part of our recent PBC GWAS[6].

DNA extraction

DNA was extracted from blood or saliva. Blood samples from PBC patients were extracted by the East Anglian Medical Genetics Service, while saliva samples were collected using an Oragene kit and DNA extracted at Source BioScience Healthcare. DNA samples were plated, normalized and shipped to the Wellcome Trust Sanger Institute for sample quality control.

Genotyping

Samples were genotyped on an Illumina iSelect HD custom genotyping array (Immunochip). All 2,981 cases and 4,537 controls were genotyped at the Wellcome Trust Sanger Institute. A further 4,433 control samples were genotyped at the Center for Public Health Genomics at the University of Virginia. Genotyping of control samples was coordinated by the Immunochip consortium for use in several Immunochip projects. The NCBI build 36 (hg18) map was used (Illumina manifest file Immuno_BeadChip_11419691_B.bpm). Normalized probe intensities were extracted for all samples passing standard laboratory QC thresholds and genotypes were called using optiCall[31]. Genotypes with an individual posterior probability lower than 0.7 were defined as unknown. optiCall was chosen because we found it to be more accurate in calling common and low-frequency variants on Immunochip compared to other established algorithms such as Illuminus[32] and GenoSNP[33].
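
The posterior-probability cut-off described above can be sketched as follows. This is not optiCall's code; the function name `call_genotype`, the genotype labels and the input layout (one posterior per genotype class) are assumptions made for illustration.

```python
from typing import Optional, Sequence

GENOTYPES = ("AA", "AB", "BB")  # hypothetical labels for the three genotype classes


def call_genotype(posteriors: Sequence[float], threshold: float = 0.7) -> Optional[str]:
    """Return the most probable genotype, or None (unknown) below the threshold."""
    best = max(range(len(GENOTYPES)), key=lambda i: posteriors[i])
    return GENOTYPES[best] if posteriors[best] >= threshold else None


print(call_genotype([0.92, 0.07, 0.01]))  # confident call
print(call_genotype([0.55, 0.40, 0.05]))  # ambiguous, treated as unknown
```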

Quality Control

Sample quality control (QC) was performed for each sample set separately. All monomorphic SNPs were removed prior to QC. Samples with a call rate lower than 98% or heterozygosity more than three standard deviations from the mean were excluded. A set of LD-pruned SNPs with MAF>20% was used to estimate identity by descent (IBD) and ancestry. For each pair of individuals with an estimated IBD>18.75%, the sample with the lower call rate was removed. Principal component analysis was used to exclude samples of non-European ancestry[34] (Supplementary Figures 1-3). Following sample QC, 2,861 cases and 8,514 controls remained (Supplementary Table 1). SNPs with a minor allele frequency less than 0.1%, Hardy-Weinberg equilibrium P<10−6, call rate lower than 98%, or significantly different (P<10−5) call rate in cases vs. controls (or between the two control sets) were excluded. After marker QC, 143,020 polymorphic SNPs were available for analysis (Supplementary Table 2).
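
As a rough sketch of the marker filters listed above, the following applies the MAF, call-rate and Hardy-Weinberg thresholds to the genotype counts of a single SNP. The function names are hypothetical, the Hardy-Weinberg test is assumed to be a 1-df chi-square test (the text does not say which test was used), and the case/control differential call-rate check is left out.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is undefined
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_maf=0.001, min_call_rate=0.98, min_hwe_p=1e-6):
    """Apply the MAF, call-rate and HWE thresholds quoted in the text."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq, 1 - freq)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)
```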

Statistical Methods

Genomic Inflation factor

The Immunochip contains 2,258 SNPs that lie in regions associated with bipolar disease. These were used as null markers to estimate the overall inflation of the distribution of association test statistics[35].
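
A common way to compute such an inflation factor is to divide the median of the observed 1-df chi-square statistics at the null markers by the median expected under the null. The sketch below assumes that median-based definition, which the text does not spell out; the function name is hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(null_chi2_stats):
    """Median-based inflation factor (lambda) from null-marker test statistics."""
    return statistics.median(null_chi2_stats) / CHI2_1DF_MEDIAN


# Toy input only; the study used statistics from 2,258 null SNPs.
print(round(genomic_inflation([0.1, 0.5, 2.0]), 3))
```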

Imputation

Using the 90,977 SNPs from the cleaned Immunochip set that were in fine-mapped regions, additional genotypes were imputed using the 1000 Genomes Phase I (interim) June 2011 release reference panel and IMPUTE2[36]. Imputation was performed separately in three batches of 3,792, 3,792 and 3,791 individuals, with the case:control ratio constant across batches. SNPs with a posterior probability less than 0.9 and those with differential missingness (P<10−5) between the three batches were removed, as were those SNPs that failed the same exclusion thresholds used for the original Immunochip QC. After imputation, a total of 237,619 SNPs were available for analysis.

Association analysis

Case-control association tests were implemented using a standard one-degree of freedom Cochran-Armitage test for trend in PLINK v1.07[37]. Secondary associations were identified using step-wise logistic regression analysis conditioning on the allelic dosage of the primary signal in each significant locus. The process was repeated, conditioning on all independent genome-wide significant SNPs, until all genome-wide significant signals were accounted for[10]. Haplotype association was performed in PLINK using logistic regression. Cluster plots for all SNPs P<5×10−6 were manually checked using Evoker[38], and poorly called SNPs were removed from further study (Supplementary Figure 11).
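
For readers unfamiliar with the trend test, here is a minimal standalone sketch of the one-degree-of-freedom Cochran-Armitage statistic on a 2×3 genotype table with additive weights (0, 1, 2). It is an illustration only, not PLINK's implementation, and the function name and example counts are made up.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2x3 genotype table.

    case_counts / control_counts: counts of individuals carrying 0, 1 and 2
    copies of the tested allele. Returns (chi2, p_value) with 1 df.
    """
    col_totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases, n_controls = sum(case_counts), sum(control_counts)
    n = n_cases + n_controls
    # Trend statistic: weighted contrast between case and control counts.
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, case_counts, control_counts))
    sum_w_c = sum(w * c for w, c in zip(weights, col_totals))
    sum_w2_c = sum(w * w * c for w, c in zip(weights, col_totals))
    variance = (n_cases * n_controls / n) * (n * sum_w2_c - sum_w_c ** 2)
    if variance == 0:
        return 0.0, 1.0
    chi2 = t * t / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical counts, not study data.
chi2, p = armitage_trend_test((20, 50, 30), (40, 45, 15))
print(f"chi2={chi2:.2f}, p={p:.2g}")
```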

HLA Imputation

Imputation of six classic HLA alleles (class I: HLA-A, HLA-B and HLA-C, class II: HLA-DQA1, HLA-DQB1 and HLA-DRB1) was performed using the prediction algorithm proposed by Leslie et al. and implemented in the program HLA*IMP[26,27]. Case-control association was performed on HLA allele posterior probabilities generated from HLA*IMP using logistic regression to account for genotype uncertainty following imputation. Pairwise conditional logistic regression was used to identify independent association signals among the 21 HLA-alleles that reached P < 0.0001.

Heritability explained

The heritability explained by the 26 independent genome-wide significant SNPs and four HLA-alleles was estimated using a liability threshold model[39,40] assuming a disease prevalence of 40/100,000, log-additive risk and a sibling relative risk ratio of 10.5[3].
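
The proportion of heritability quoted in the main text follows from simple arithmetic on the reported values (4.9% and 1.4% of the variance in liability, against a total heritability of liability of 0.39). The sketch below only reproduces that final division, not the liability threshold model itself; the function name is hypothetical.

```python
def fraction_of_heritability_explained(variances_explained, total_heritability):
    """Share of the total heritability accounted for by the listed components."""
    return sum(variances_explained) / total_heritability


# 26 non-HLA SNPs (4.9%) plus four HLA alleles (1.4%), heritability of liability 0.39.
frac = fraction_of_heritability_explained([0.049, 0.014], 0.39)
print(f"{frac:.1%}")
```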

eQTL analysis

eQTLs within genome-wide significant loci were collated from the University of Chicago eQTL Browser (see URLs) and a study by Gaffney et al., (2012)[28]. The eQTL Browser contains significant eQTLs that were identified in recent studies across multiple cell lines and populations, while Gaffney et al., reanalysed gene expression data from 210 lymphoblastoid cell lines using a total of 13.6M SNPs from the 1000 Genomes project. For more details, see Gaffney, et al., (2012)[28] and references listed in the Chicago eQTL Browser (see URLs).

Enrichment of open chromatin regions

The ENCODE project annotated regions of open chromatin using two techniques, the direct sequencing of DNaseI cleavage sites (DNase-seq: sixteen different cell lines) and formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq: ten different cell lines)[29,30]. Both methods isolate nucleosome-depleted regions of DNA and map reads from next-generation sequencing to determine their location. The overlap of peaks between the two assays ranges from 30-40% depending on the cell type, and regions identified uniquely by DNase-seq or FAIRE-seq often represent relevant biological processes[29]. Positions of discrete DNase-seq and FAIRE-seq peaks were estimated from the base overlap signal (BOS) at each base-pair[29]. We quantified the evidence for the open chromatin peaks using a Poisson distribution where lambda equals the mean BOS across all Immunochip SNPs. Supplementary Figure 12 shows the relative position of open chromatin peaks and associated SNPs within significantly associated loci. For both DNase-seq and FAIRE-seq data, we estimated the amount of enrichment for open chromatin peaks among significant loci across the ENCODE cell lines. SNPs were first grouped into independent loci; we sequentially identified the most associated SNP not already assigned to a locus and assigned this SNP, and others in weak LD with it (r2>0.1), to a new locus. After the addition of each new locus, we calculated E = (OC_L/N_L)/(OC_I/N_I), where, for a given cell line, OC_L and N_L are the number of candidate causal SNPs (r2>0.8 to the most associated SNP) that lie within open chromatin peaks across the selected loci and the total number of SNPs within the loci, respectively. OC_I and N_I are the equivalent measures across all SNPs within Immunochip fine-mapping regions. We only included the fine-mapping regions to increase the likelihood that the causal variant was assayed, and excluded SNPs in the HLA and those with MAF<0.05 to avoid possible biases due to LD structure.
The OC values for each of the cell lines are given in Supplementary Table 9. To compare E between cell lines, the number of candidate causal SNPs in open chromatin (OC_L') and the total number of SNPs in open chromatin (OC_I') were first calculated for the union of open chromatin peaks across all cell lines other than that being evaluated. We then tested the alternative hypothesis that, for a given cell line, the proportion OC_L/OC_I > OC_L'/OC_I' using a chi-square test for the difference in proportions. To ensure that our test was well calibrated under the null hypothesis we undertook 1000 permutations, repeating the association and enrichment analyses for each permutation. Comparing the observed level of enrichment at our top 21 loci to the equivalent from the permutations, we obtained a similar, non-significant empirical P-value of 0.073, indicating that our proposed enrichment analysis is well calibrated under the null. A 95% confidence interval for E was estimated using the permutations.
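
The locus-grouping step and the enrichment score can be sketched on toy data as below. This is a simplified illustration, not the authors' code: it assumes E takes the form of a ratio of two proportions (open-chromatin candidate causal SNPs over all SNPs within the selected loci, divided by open-chromatin SNPs over all SNPs in the fine-mapping regions), and the function names, SNP labels, P-values and r2 values are all hypothetical.

```python
def group_into_loci(pvalues, r2, ld_threshold=0.1):
    """Greedy grouping: take the most associated unassigned SNP, then add
    every unassigned SNP in LD (r2 > ld_threshold) with it to the same locus."""
    unassigned = set(pvalues)
    loci = []
    while unassigned:
        lead = min(unassigned, key=lambda s: (pvalues[s], s))
        members = {s for s in unassigned if s == lead or r2(lead, s) > ld_threshold}
        loci.append((lead, members))
        unassigned -= members
    return loci


def enrichment(loci, r2, open_chromatin, all_snps, causal_r2=0.8):
    """Assumed form of E: (OC/N within the selected loci) / (OC/N over all SNPs)."""
    oc_loci = sum(1 for lead, members in loci for s in members
                  if (s == lead or r2(lead, s) > causal_r2) and s in open_chromatin)
    n_loci = sum(len(members) for _, members in loci)
    oc_all = sum(1 for s in all_snps if s in open_chromatin)
    return (oc_loci / n_loci) / (oc_all / len(all_snps))


# Toy data (hypothetical SNP labels, P-values and pairwise r2).
PVALUES = {"a": 1e-10, "b": 1e-9, "c": 0.5, "d": 1e-8, "e": 0.3, "f": 0.9}
PAIR_R2 = {frozenset("ab"): 0.9, frozenset("ac"): 0.2, frozenset("de"): 0.05}


def toy_r2(x, y):
    return PAIR_R2.get(frozenset((x, y)), 0.0)


loci = group_into_loci(PVALUES, toy_r2)
top_loci = loci[:2]  # e.g. the loci whose lead SNP is genome-wide significant
print(enrichment(top_loci, toy_r2, open_chromatin={"a", "b"}, all_snps=set(PVALUES)))
```

Grouping on a weak LD threshold before counting keeps correlated SNPs from being treated as independent observations, which is the bias the text warns about.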

39 in total

1. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci.

Authors: Eli A Stahl; Soumya Raychaudhuri; Elaine F Remmers; Gang Xie; Stephen Eyre; Brian P Thomson; Yonghong Li; Fina A S Kurreeman; Alexandra Zhernakova; Anne Hinks; Candace Guiducci; Robert Chen; Lars Alfredsson; Christopher I Amos; Kristin G Ardlie; Anne Barton; John Bowes; Elisabeth Brouwer; Noel P Burtt; Joseph J Catanese; Jonathan Coblyn; Marieke J H Coenen; Karen H Costenbader; Lindsey A Criswell; J Bart A Crusius; Jing Cui; Paul I W de Bakker; Philip L De Jager; Bo Ding; Paul Emery; Edward Flynn; Pille Harrison; Lynne J Hocking; Tom W J Huizinga; Daniel L Kastner; Xiayi Ke; Annette T Lee; Xiangdong Liu; Paul Martin; Ann W Morgan; Leonid Padyukov; Marcel D Posthumus; Timothy R D J Radstake; David M Reid; Mark Seielstad; Michael F Seldin; Nancy A Shadick; Sophia Steer; Paul P Tak; Wendy Thomson; Annette H M van der Helm-van Mil; Irene E van der Horst-Bruinsma; C Ellen van der Schoot; Piet L C M van Riel; Michael E Weinblatt; Anthony G Wilson; Gert Jan Wolbink; B Paul Wordsworth; Cisca Wijmenga; Elizabeth W Karlson; Rene E M Toes; Niek de Vries; Ann B Begovich; Jane Worthington; Katherine A Siminovitch; Peter K Gregersen; Lars Klareskog; Robert M Plenge
Journal: Nat Genet Date: 2010-05-09 Impact factor: 38.330

2. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

Review 3. Detecting shared pathogenesis from the shared genetics of immune-related diseases.

Authors: Alexandra Zhernakova; Cleo C van Diemen; Cisca Wijmenga
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

4. GenoSNP: a variational Bayes within-sample SNP genotyping algorithm that does not require a reference population.

Authors: Eleni Giannoulatou; Christopher Yau; Stefano Colella; Jiannis Ragoussis; Christopher C Holmes
Journal: Bioinformatics Date: 2008-07-24 Impact factor: 6.937

5. HLA class II markers and clinical heterogeneity in Swedish patients with primary biliary cirrhosis.

Authors: R Wassmuth; F Depner; A Danielsson; R Hultcrantz; L Lööf; R Olson; H Prytz; H Sandberg-Gertzen; S Wallerstedt; S Lindgren
Journal: Tissue Antigens Date: 2002-05

6. Newly identified genetic risk variants for celiac disease related to the immune response.

Authors: Karen A Hunt; Alexandra Zhernakova; Graham Turner; Graham A R Heap; Lude Franke; Marcel Bruinenberg; Jihane Romanos; Lotte C Dinesen; Anthony W Ryan; Davinder Panesar; Rhian Gwilliam; Fumihiko Takeuchi; William M McLaren; Geoffrey K T Holmes; Peter D Howdle; Julian R F Walters; David S Sanders; Raymond J Playford; Gosia Trynka; Chris J J Mulder; M Luisa Mearin; Wieke H M Verbeek; Valerie Trimble; Fiona M Stevens; Colm O'Morain; Nicholas P Kennedy; Dermot Kelleher; Daniel J Pennington; David P Strachan; Wendy L McArdle; Charles A Mein; Martin C Wapenaar; Panos Deloukas; Ralph McGinnis; Ross McManus; Cisca Wijmenga; David A van Heel
Journal: Nat Genet Date: 2008-03-02 Impact factor: 38.330

7. Human leukocyte antigen polymorphisms in Italian primary biliary cirrhosis: a multicenter study of 664 patients and 1992 healthy controls.

Authors: Pietro Invernizzi; Carlo Selmi; Francesca Poli; Sara Frison; Annarosa Floreani; Domenico Alvaro; Piero Almasio; Floriano Rosina; Marco Marzioni; Luca Fabris; Luigi Muratori; Lihong Qi; Michael F Seldin; M Eric Gershwin; Mauro Podda
Journal: Hepatology Date: 2008-12 Impact factor: 17.425

8. optiCall: a robust genotype-calling algorithm for rare, low-frequency and common variants.

Authors: T S Shah; J Z Liu; J A B Floyd; J A Morris; N Wirth; J C Barrett; C A Anderson
Journal: Bioinformatics Date: 2012-04-12 Impact factor: 6.937

9. The imprinted DLK1-MEG3 gene region on chromosome 14q32.2 alters susceptibility to type 1 diabetes.

Authors: Chris Wallace; Deborah J Smyth; Meeta Maisuria-Armer; Neil M Walker; John A Todd; David G Clayton
Journal: Nat Genet Date: 2009-12-06 Impact factor: 38.330

10. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.

Authors: Jeffrey C Barrett; David G Clayton; Patrick Concannon; Beena Akolkar; Jason D Cooper; Henry A Erlich; Cécile Julier; Grant Morahan; Jørn Nerup; Concepcion Nierras; Vincent Plagnol; Flemming Pociot; Helen Schuilenburg; Deborah J Smyth; Helen Stevens; John A Todd; Neil M Walker; Stephen S Rich
Journal: Nat Genet Date: 2009-05-10 Impact factor: 38.330
