Genome-wide association study of primary sclerosing cholangitis identifies new risk loci and quantifies the genetic relationship with inflammatory bowel disease.

Sun-Gou Ji¹, Brian D Juran², Sören Mucha³, Trine Folseraas^4,5,6, Luke Jostins^7,8, Espen Melum^4,5, Natsuhiko Kumasaka¹, Elizabeth J Atkinson⁹, Erik M Schlicht², Jimmy Z Liu¹, Tejas Shah¹, Javier Gutierrez-Achury¹, Kirsten M Boberg^4,6,10, Annika Bergquist¹¹, Severine Vermeire^12,13, Bertus Eksteen¹⁴, Peter R Durie¹⁵, Martti Farkkila¹⁶, Tobias Müller¹⁷, Christoph Schramm¹⁸, Martina Sterneck¹⁹, Tobias J Weismüller^20,21,22, Daniel N Gotthardt²³, David Ellinghaus³, Felix Braun²⁴, Andreas Teufel²⁵, Mattias Laudes²⁶, Wolfgang Lieb²⁷, Gunnar Jacobs²⁷, Ulrich Beuers²⁸, Rinse K Weersma²⁹, Cisca Wijmenga³⁰, Hanns-Ulrich Marschall³¹, Piotr Milkiewicz³², Albert Pares³³, Kimmo Kontula³⁴, Olivier Chazouillères³⁵, Pietro Invernizzi³⁶, Elizabeth Goode³⁷, Kelly Spiess³⁷, Carmel Moore^38,39, Jennifer Sambrook^39,40, Willem H Ouwehand^1,38,40,41, David J Roberts^38,42,43, John Danesh^1,38,39, Annarosa Floreani⁴⁴, Aliya F Gulamhusein², John E Eaton², Stefan Schreiber^3,45, Catalina Coltescu⁴⁶, Christopher L Bowlus⁴⁷, Velimir A Luketic⁴⁸, Joseph A Odin⁴⁹, Kapil B Chopra⁵⁰, Kris V Kowdley⁵¹, Naga Chalasani⁵², Michael P Manns^20,21, Brijesh Srivastava³⁷, George Mells^37,53, Richard N Sandford³⁷, Graeme Alexander⁵⁴, Daniel J Gaffney¹, Roger W Chapman⁵⁵, Gideon M Hirschfield^56,57, Mariza de Andrade⁹, Simon M Rushbrook³⁷, Andre Franke³, Tom H Karlsen^4,5,6,10, Konstantinos N Lazaridis², Carl A Anderson¹.

Abstract

Primary sclerosing cholangitis (PSC) is a rare progressive disorder leading to bile duct destruction; ∼75% of patients have comorbid inflammatory bowel disease (IBD). We undertook the largest genome-wide association study of PSC (4,796 cases and 19,955 population controls) and identified four new genome-wide significant loci. The most associated SNP at one locus affects splicing and expression of UBASH3A, with the protective allele (C) predicted to cause nonstop-mediated mRNA decay and lower expression of UBASH3A. Further analyses based on common variants suggested that the genome-wide genetic correlation (rG) between PSC and ulcerative colitis (UC) (rG = 0.29) was significantly greater than that between PSC and Crohn's disease (CD) (rG = 0.04) (P = 2.55 × 10⁻¹⁵). UC and CD were genetically more similar to each other (rG = 0.56) than either was to PSC (P < 1.0 × 10⁻¹⁵). Our study represents a substantial advance in understanding of the genetics of PSC.

Year: 2016 PMID: 27992413 PMCID: PMC5540332 DOI: 10.1038/ng.3745

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

INTRODUCTORY PARAGRAPH

Primary sclerosing cholangitis (PSC) is a rare progressive disorder leading to bile duct destruction. We undertook the largest genome-wide association study of PSC (4,796 cases and 19,955 population controls) and identified four novel genome-wide significant loci. The most associated SNP at one locus affects splicing and expression of UBASH3A, with the protective allele (C) predicted to cause non-stop mediated mRNA decay and lower expression of UBASH3A. Although 75% of PSC patients have comorbid inflammatory bowel disease (IBD), our data suggest that the genome-wide genetic correlation (rG) between PSC and ulcerative colitis (UC) (rG = 0.29) is significantly greater than that between PSC and Crohn’s disease (CD) (rG = 0.04) (P = 2.55 × 10⁻¹⁵). Importantly, UC and CD are genetically more similar to each other (rG = 0.56) than either is to PSC (P < 1.0 × 10⁻¹⁵). Our study represents a significant advance in our understanding of the genetics of PSC.
Primary sclerosing cholangitis affects around 1 in 10,000 individuals of European ancestry and is characterised by chronic inflammation and stricturing fibrosis of the biliary tree[1]. There remains no effective medical therapy and the majority of patients require orthotopic liver transplantation owing to the progressive nature of the disease[2]. PSC is highly comorbid with IBD, which is ultimately diagnosed in around 75% of patients. The clinical presentation of IBD in PSC is most often consistent with UC (~80%), but CD (~15%) and indeterminate forms of IBD (~5%) do occur in some patients. Time of disease onset and expression of the IBD phenotype in PSC are variable, with an overall trend toward IBD preceding PSC and milder but more extensive intestinal inflammation (pancolitis) compared to classical UC or CD[3,4]. This tendency, along with other clinical and epidemiological differences, has led to the proposal that IBD in the context of PSC (PSC-IBD) should be considered a disease entity separate from both UC and CD.
Elevated risk of PSC and UC in first-degree relatives of PSC patients indicates a strong genetic component to PSC susceptibility and suggests the presence of shared genetic risk factors between PSC and UC[5,6]. However, the genetic relationship between PSC and UC/CD/IBD remains poorly defined because the low prevalence of PSC has precluded familial studies. Large-scale association studies have identified sixteen loci, including the HLA locus, underlying PSC risk[7-12]. Here, we undertake the largest genome-wide association study of PSC to date to identify novel PSC risk loci and enable us, for the first time, to estimate the genome-wide genetic correlation between PSC and the common forms of IBD.
Following quality control (Supplementary Tables 1 and 2, Supplementary Figs. 1–3) and imputation using reference haplotypes from the 1000 Genomes (Phase III) and UK10K projects[13,14], we tested 7,891,602 SNPs for association in a sample of 2,871 PSC cases and 12,019 population controls using a linear mixed model to account for population stratification (Online Methods, Supplementary Tables 1 and 2). Genome-wide summary statistics are available from the International PSC Study Group website (see URLs). Forty SNPs were tested for association in an independent cohort of 1,925 PSC cases and 7,936 population controls (Online Methods, Supplementary Table 3), including 24 SNPs with P < 5 × 10⁻⁶ in the GWAS that are located outside of known PSC loci. We used an inverse-variance weighted fixed effects meta-analysis, implemented in METAL[15], to test the evidence of association across the GWAS and replication cohorts combined and identified four new genome-wide significant loci with P < 5.26 × 10⁻³ in the replication study and P < 5 × 10⁻⁸ in the combined meta-analysis (Table 1, Supplementary Table 4, Supplementary Fig. 4). One of the newly associated loci, tagged by rs80060485 (3:g.71153890T>C) in FOXP1, is associated with immune-mediated disease for the first time.
The three other newly associated PSC loci (implicating CCDC88B, CLEC16A and UBASH3A) are in high linkage disequilibrium (LD; r² > 0.8) with variants significantly associated with other immune-mediated diseases (Supplementary Table 5). We found consistent evidence of association at fifteen of the sixteen previously established PSC loci and now consider 19 regions of the genome to be associated with PSC risk (Supplementary Table 4, Supplementary Fig. 4).

Table 1

Association summary statistics across four newly associated PSC risk loci

Base-pair coordinates from build 37, RAF: risk allele frequency in replication controls, OR: odds ratio in the GWAS and replication meta-analysis (Combined), 95%CI: 95% confidence interval of OR estimates. Detailed association results, including those for the 15 loci previously associated with PSC, are given in Supplementary Table 4.

SNP	Chr:Position (bp)	Risk Allele	RAF	OR	95% CI	P-value (GWAS)	P-value (Replication)	P-value (Combined)	Candidate causal gene
rs80060485	3:71153890	C	0.07	1.44	1.32–1.58	8.54 × 10⁻⁰⁹	4.67 × 10⁻⁰⁸	2.62 × 10⁻¹⁵	FOXP1
rs663743	11:64107735	G	0.66	1.20	1.14–1.26	8.42 × 10⁻⁰⁸	4.44 × 10⁻⁰⁷	2.24 × 10⁻¹³	CCDC88B
rs725613	16:11169683	T	0.65	1.20	1.14–1.26	5.50 × 10⁻¹⁰	9.52 × 10⁻⁰⁵	3.59 × 10⁻¹³	CLEC16A
rs1893592	21:43855067	A	0.73	1.22	1.15–1.29	1.90 × 10⁻⁰⁷	2.42 × 10⁻⁰⁶	2.19 × 10⁻¹²	UBASH3A
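
As a reading aid for Table 1, the combined odds ratio and its 95% confidence interval can be converted back into an approximate effect size, standard error, z-score and two-sided P-value. The sketch below is not part of the original analysis: the function name is hypothetical, it assumes a symmetric interval on the log-odds scale, and because the published OR and CI are rounded the recovered P-value can only be expected to be of roughly the same order of magnitude as the tabulated one.

```python
import math


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate log-odds effect, SE, z-score and two-sided P from an OR and 95% CI.

    Assumes the CI is symmetric on the log-odds scale (Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided normal tail probability via the complementary error function.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_two_sided


# Combined OR and CI for rs80060485 (FOXP1) as printed in Table 1.
beta, se, z, p = z_and_p_from_or_ci(1.44, 1.32, 1.58)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
```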

All SNPs in high LD (r² > 0.8) with the most associated SNP at each PSC locus were evaluated for potential function using SIFT[16] and PolyPhen 2[17], the Genome Wide Annotation of Variants (GWAVA) online tool[18], and a number of eQTL databases (Online Methods, Supplementary Tables 6–8). One of the new PSC risk variants (rs1893592, 21:g.43855067A>C) is the most strongly associated eQTL of UBASH3A, a gene involved in regulation of T-cell signalling, in two whole blood-based analyses[19,20] and a B-cell only study[21]. The SNP is located three bases downstream of the 10th exon of UBASH3A, within the splice consensus sequence, and was reported as a splice-QTL in a recent RNA sequencing study[19]. The C allele, which is associated with reduced risk of PSC and has a frequency of 27.8% in our controls, disrupts the conserved 5′ splice donor sequence at this position in vertebrate introns, which is typically A (71% of sites) or G (24% of sites)[22]. The predicted consequence of this change is partial retention of the downstream intron, possibly leading to non-stop mediated decay. Reanalysis of the gEUVADIS RNA-seq data[23] revealed that this SNP was the most strongly associated with increased intron expression (P = 2 × 10⁻¹⁶, Supplementary Figure 5), with the PSC protective allele causing intron 10 to be retained in the UBASH3A mRNA. Further work is required to determine whether carrying the C allele at this SNP decreases UBASH3A protein levels and if this is the causal mechanism behind the reduced risk of PSC, celiac disease and rheumatoid arthritis (Supplementary Table 5). In addition, another variant within the UBASH3A gene (rs11203203, 21:g.43836186G>A) that is in low LD (r² = 0.12) with rs1893592 has been associated with vitiligo[24] and type 1 diabetes[25], further supporting the role of UBASH3A in immune-mediated disorders. We were unable to identify any current drugs targeting UBASH3A (Supplementary Note).
To enable us to address the genetic relationship between PSC and IBD we obtained association summary statistics from the International IBD Genetics Consortium for 20,550 CD cases, 17,647 UC cases and 48,485 controls of European ancestry[26]. Across each of the eighteen non-HLA PSC risk loci we used a Bayesian test of colocalisation[27] to identify loci with strong evidence (posterior probability > 0.8) of either shared or independent causal variants between pairs of traits (Online Methods, Supplementary Table 9). Four of the eighteen PSC risk loci have not been associated at genome-wide significance with IBD (BCL2L11, FOXP1, SIK2 and UBASH3A) although the lead SNPs at two of these loci (rs72837826 – BCL2L11 and rs1893592 – UBASH3A) did demonstrate strong evidence for colocalisation (posterior probability > 0.8) and suggestive evidence of association (P < 10⁻⁴) in the UC cohort (Supplementary Tables 9 and 10). Of the fourteen PSC loci that had been previously associated with IBD (UC, CD or both), four demonstrated strong evidence that the causal variant is independent from that in UC and CD (IL2RA, CCDC88B, CLEC16A and PRKD2), a finding supported by the low linkage disequilibrium (r² < 0.2) between the lead SNPs in PSC and UC/CD at these loci (Supplementary Tables 9 and 10). Thus, even for highly comorbid diseases, significant association to the same region of the genome will not always be driven by a shared causal variant. This supports similar observations for other related phenotypes such as psoriasis versus psoriatic arthritis[28,29]. Six of the fourteen loci associated with PSC and IBD displayed strong evidence of a shared causal variant with UC, CD or both (MST1, IL21, HDAC7, SH2B3, CD226 and PSMG1) (Figure 1, Supplementary Tables 9 and 10). We further tested these six SNPs for evidence of heterogeneity of effect using Cochran’s Q test (Online Methods).
Four showed significantly increased effect size in PSC relative to both UC and CD (MST1, IL21, SH2B3 and CD226) (P < 2.78 × 10⁻³) with an additional locus (PSMG1) showing significantly increased effect size relative to CD only (Figure 1). Simulation studies showed that the observed heterogeneity of effect is unlikely to be driven by the large difference in sample size between the PSC and UC cohorts (empirical P < 3.00 × 10⁻⁴ at all four SNPs) (Supplementary Note). We did not detect evidence of heterogeneity of effect between PSC patients expressing different IBD phenotypes (PSC-UC, PSC-CD or PSC-NoIBD) (Supplementary Fig. 6). However, our power to detect significant heterogeneity of effect between these PSC subphenotypes was limited by sample size (Supplementary Table 11).

Figure 1

Odds ratios (and their 95% confidence intervals) for PSC, UC and CD across the 6 PSC associated SNPs demonstrating strong evidence for a shared causal variant (maximum posterior probability > 0.8)

PSC ORs were taken from the GWAS and replication meta-analysis. UC and CD ORs were obtained from the latest association studies conducted by the International IBD Genetics Consortium[26]. Heterogeneity of odds tests were carried out using Cochran’s Q test. A failure to detect significant heterogeneity of odds does not necessarily indicate that effect sizes are equivalent because power to detect heterogeneity varies across SNPs.

While the much larger size of the UC and CD cohorts gives us power to investigate the effects of PSC risk SNPs in IBD, the PSC cohort is underpowered to do the reverse. Thus, to clarify the pairwise genetic correlation between PSC, UC and CD we obtained genome-wide individual level genotype data from the International IBD Genetics Consortium for 6,247 CD cases, 6,686 UC cases and 34,393 population controls of European descent[26] and used GCTA to estimate genome-wide genetic correlations (rG) using a bivariate linear mixed model[30,31] (Online Methods, Supplementary Note). This analysis quantified the SNP heritability (h²) of PSC as 0.148 (95% CI: 0.135–0.161), and showed that in the context of common genetic variation, PSC is significantly more related to UC (rG = 0.29) than CD (rG = 0.04) (P = 2.55 × 10⁻¹⁵) (Figure 2), consistent with the clinical phenotype most often observed in PSC-IBD patients. Moreover, the genetic correlation between UC and CD (rG = 0.56) is significantly greater than that between PSC and either UC or CD (P < 1.0 × 10⁻¹⁵). Due to a lack of data regarding the PSC status of individuals in the UC and CD cohorts we could not remove the approximately 5% of patients we would expect to have comorbid PSC. This suggests that, while our estimates of the genome-wide genetic correlation between PSC and both UC and CD may seem surprisingly low, these are likely slight overestimates of the true genetic correlation between the diseases. We validated the GCTA co-heritability estimates using a summary statistics-based genetic correlation analysis (LD score regression[32]), and found support for the reported genetic relationships (i.e. rG(CD vs. UC) = 0.68 > rG(PSC vs. UC) = 0.39 > rG(PSC vs. CD) = 0.09) (Supplementary Figure 7).
The low genome-wide genetic correlation between PSC and the IBDs is also supported by known differences in HLA risk alleles[11,33] and our discovery that PSC has both independent causal variants and shared causal variants of heterogeneous effect size compared to both UC and CD. The analyses presented in this study, based on common genetic variants (MAF > 1%), suggest functional studies in both the biliary tree and intestinal tract are required if we are to understand the biological consequences of PSC associated genetic variants, whether or not they are shared with IBD.

Figure 2

Genome-wide genetic correlation between PSC (and its subphenotypes), CD and UC

Genetic correlations (and their 95% confidence intervals) were calculated using a bivariate extension of the linear mixed model[30] implemented in GCTA (Online Methods). PSC has a lower genetic correlation with both CD and UC than the two inflammatory bowel diseases have to each other. PSC is genetically more correlated to UC than it is to CD and this is consistent across the PSC subphenotypes.

While it is clear that a substantial component of the genetic architecture of PSC is not shared with either CD or UC, our data also show that shared genetic risk factors do certainly exist and likely play some role in disease comorbidity. However, under a purely additive genetic liability threshold model, the genetic covariance between the two diseases would need to be greater than 0.76 to fully explain the fact that 60% of PSC cases have comorbid UC (Supplementary Figure 8). In contrast, the observed genetic correlation (rG = 0.29) would generate a PSC-UC comorbidity rate of only 1.6% under this model. This demonstrates that the observed extent of comorbidity between PSC and UC is not fully explained by shared additive genetic effects of common variants and that other factors must play a role, such as shared environmental effects or shared rare variants not captured by our GWAS and imputation data. In summary, we have performed the largest genome-wide association study of PSC to date and identified four new PSC risk loci. We now consider 23 regions of the genome to be associated with disease risk, including four loci only recently associated with PSC in a cross-disease meta-analysis[34]. One of our new associations suggests that decreased UBASH3A is associated with a lower risk of PSC through a common NMD variant. We have also shown that, even for highly comorbid phenotypes such as PSC and IBD, significant association to the same region of the genome will not always be driven by a common causal variant. Furthermore, by conducting genome-wide comparisons with CD and UC we have, for the first time, shown that the comorbid gastrointestinal inflammation seen in the majority of PSC patients cannot be fully explained by shared genetic risk. Thus, the biliary and intestinal inflammation seen specifically in PSC should be studied to advance our understanding of the disease and improve clinical outcome for patients with this devastating disorder.

Online Methods

Ethical Approval

The ethics committees or institutional review boards of all participating centers approved the studies and the recruitment of participants. Written informed consent was obtained from all participants.

GWAS cohort

Cohorts and genotyping

731 PSC cases and 3,202 population controls from Scandinavia and Germany were ascertained and genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA) at three different centers[7]. A cohort of 1,227 UK PSC cases was recruited from across more than 150 UK National Health Service Trusts or Health Boards, including all transplant centers in the UK, by the UK-PSC consortium. A cohort of 904 US PSC patients were enrolled in the PSC Resource of Genetic Risk, Environment and Synergy Studies (PROGRESS), a multicenter collaboration between eight academic research institutions across the US and Canada. PROGRESS ascertained additional DNA samples from established PSC cohorts from Canada (N=259) and Poland (N=43). The UK and US GWAS cohorts were genotyped using the Illumina HumanOmni2.5-8 BeadChip (Illumina, San Diego, CA, USA) and called using the GenCall algorithm implemented in GenomeStudio. UK samples were genotyped at the Wellcome Trust Sanger Institute (Hinxton, UK) and the US samples at the Mayo Clinic Medical Genome Facility (Rochester, MN, USA). A diagnosis of PSC was based on standard clinical, biochemical, cholangiographic and histological criteria[35], with exclusion of secondary causes of sclerosing cholangitis. Commonly accepted clinical, radiological, endoscopic and histological criteria were also used for diagnosis and classification of IBD[36]. Genetic data from 12,595 individuals genotyped on the Illumina HumanOmni2.5-4v1 array (Omni2.5-4) as part of The University of Michigan Health Retirement Study were downloaded from the Database of Genotypes and Phenotypes (dbGaP[37]). Genotyping was performed at the Center for Inherited Disease Research (CIDR) and genotypes called using GenomeStudio version 2011.e, (see the HRS website for more details).

Quality control

All SNPs were aligned to NCBI build 37 (hg19). Genotype data were quality controlled independently across 6 batches defined by genotyping centre (AffySF: N = 2,205; AffyHZ: N = 1,256; AffyAB: N = 472; IlluminaWTSI: N = 1,227; IlluminaMAYO: N = 1,206; IlluminaCIDR: N = 12,595). Initially, SNPs out of Hardy-Weinberg equilibrium (HWE: P < 1 × 10⁻⁶) in controls (excluding those in the HLA region) or with a call rate less than 80% were removed. SNPs failing in at least one batch were removed from all cohorts genotyped using the same chip. For sample QC, individuals whose sex determined using the X chromosome homozygosity rate (F) and Y chromosome call rate differed from that in our patient database (or could not be genetically determined, F or Y-chromosome call rate between 0.3–0.7) were removed. Next, Aberrant[38] was used to identify samples with outlying heterozygosity or genotype call rate. Samples with a call rate less than 90% for an individual chromosome were also removed. A set of 82,085 independent SNPs (pairwise r² < 0.2) genotyped on all arrays was identified for the purpose of estimating sample relatedness and ancestry, excluding SNPs that a) were within regions of high linkage disequilibrium, b) had a MAF < 10% or c) were A/T or C/G SNPs. Pairwise identity by descent was estimated for all individuals in the study using PLINK, and the sample with the lowest genotype call rate was removed for all pairs with identity by descent > 0.9. Both samples were excluded if case/control status was discordant between duplicates. To maximize power to detect association, related samples (0.1875 < identity by descent < 0.9) were retained and a mixed model used for association testing. Sample ancestry was inferred via principal components analysis implemented in EIGENSTRAT[39]. Population principal components were calculated using genotype data from the CEU, YRI and CHB/JPT samples from the 1000 Genomes Project.
Factor loadings from these principal components were then used to project these principal components for our cases and controls. Samples of non-European ancestry were identified using Aberrant[38]. The number of samples failing each QC step is shown in Supplementary Table 1. In total, 2,871 cases and 12,019 controls passed sample QC. Next, a more thorough marker QC was conducted within batches by excluding, genotyping platform-wide, SNPs with a) different probe sequences on the Omni2.5-4 and Omni2.5-8 array, b) a call rate < 98%, c) MAF < 1%, d) significant evidence of deviation from HWE (P < 1 × 10⁻⁵) in controls and e) a significant difference in call rate between cases and controls (P < 1 × 10⁻⁵), in at least one of the genotyping batches. Outside of the HLA region, markers only present on one of the two Illumina arrays were also removed. After SNP QC, 1,207,121 Omni2.5-4 SNPs, 1,215,097 Omni2.5-8 SNPs and 528,496 Affymetrix 6 SNPs were available.
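
The per-batch marker filters b) to d) above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names are hypothetical, the inputs are assumed to be genotype counts in controls for a single batch, and a 1-degree-of-freedom chi-square goodness-of-fit test stands in for whichever HWE test was actually applied (exact tests are a common alternative). Filters a) and e) are omitted.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for deviation from HWE (illustrative stand-in)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_marker_qc(n_aa, n_ab, n_bb, n_missing,
                     min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-5):
    """Apply the call-rate, MAF and HWE thresholds quoted in the text to one marker."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    hwe_p = hwe_chi2_pvalue(n_aa, n_ab, n_bb)
    return call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p


print(passes_marker_qc(250, 500, 250, 5))   # well-behaved marker
print(passes_marker_qc(500, 0, 500, 0))     # no heterozygotes: strong HWE deviation
```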

Genotype Imputation

Only 322,807 SNPs feature on both the Affy6 and Omni2.5 arrays so the samples genotyped on these arrays were phased and imputed separately. For computational efficiency, the genome was split into 3 Mb batches and those spanning the centromere were split and joined to the last complete batch either side of the centromere. Batches of less than 200 SNPs were merged with an adjacent batch. Pre-phasing was performed using the SHAPEIT2 algorithm[40] and imputation using IMPUTE2[41]. We used a combined reference panel of the 1000 Genomes Phase 1 integrated version 3 and the UK10K cohort, consisting of 4,873 individuals and 42,359,694 SNPs (k_hap=2,000, Ne=20,000). Post-imputation, SNPs with a posterior probability less than 0.9 or info score less than 0.5 were removed. The QC steps outlined above for directly genotyped SNPs were applied to the imputed genotype data. SNPs with r² < 0.8 between directly genotyped and imputed genotypes were removed and phasing and imputation repeated. Following QC (as outlined above), a total of 7,891,602 SNPs were available for association testing across 2,871 PSC cases and 12,019 population controls (Supplementary Table 2).

Association Analysis

A linear mixed model implemented in the MMM software[42] was used to test association between genetic variants and case/control status. To reduce compute time the relationship matrix was constructed using the 82,085 quasi-independent SNPs previously used in the PCA. To prevent the association analyses being biased by informed missingness across our genotyping batches, linear mixed model association tests were conducted across three different batches of directly-genotyped and imputed SNPs, defined on their availability for only the Omni2.5 genotyped samples (N = 2,015,514), only the Affy6 genotyped samples (N = 114,935), or across all genotyped samples (N = 5,761,153). Stepwise conditional regression analysis (excluding the extended MHC region) was undertaken in MMM to identify independent association signals (P < 5.0 × 10⁻⁶) within PSC associated loci. The previously reported lead SNP within each of the 15 known PSC loci was selected for replication, though we also took forward the most associated SNP in our study if it was a poor tag (r² < 0.8) of the previously reported SNP. In addition, 24 SNPs outside of established PSC risk loci with P < 5 × 10⁻⁶ were also included in the replication experiment. All cluster plots were manually inspected prior to SNP selection.

Validation and replication cohorts

An independent replication cohort of 2,011 PSC cases from Europe and North America was ascertained following the diagnostic criteria outlined above. A total of 8,784 population controls of European descent were ascertained, including 515 from the Mayo Clinic Biobank[43] and 1,000 from the INTERVAL study[44]. British and Canadian samples were genotyped at the Wellcome Trust Sanger Institute in Cambridge, UK (N = 2,366) and all other samples at the Institute of Clinical Molecular Biology in Kiel, Germany (N = 11,152) using the same Agena Biosciences iPLEX design. To reduce the risk of false-positive associations being driven by imputation errors we undertook a substantial validation experiment, genotyping the 40 SNPs in our replication experiment across 2,723 cases in the GWAS study. Two SNPs yielded poor genotype clusters and were removed from further study. Four SNPs with a call rate less than 95% or Hardy-Weinberg equilibrium P < 1.25 × 10⁻³ (Bonferroni correction for 40 SNPs) within controls were excluded (Supplementary Table 12). Samples with a call rate less than 92%, or where the genetically determined sex differed from that in our patient database, were removed. For duplicate pairs (IBS > 0.9), the sample with the lower call rate was removed (Supplementary Table 3). Post-QC, one SNP had an r² less than 0.90 between the discovery and validation genotyping and, following manual inspection of cluster plots, was removed from the replication study.

Replication and Combined association analyses

For the replication analysis, logistic regression tests of association were performed separately for samples from six geographic regions (Supplementary Table 3) using SNPTEST v2 (Marchini et al., 2007). Inverse-variance weighted fixed effects meta-analyses implemented in METAL[16] were then used to a) test for association across all replication samples and b) test the evidence of association across the GWAS and replication cohorts combined. To classify a region as newly associated with PSC we required both significant evidence of association in the replication cohort (P < 5.26 × 10⁻³, Bonferroni correction for 19 one-tailed tests) and genome-wide significance (P < 5 × 10⁻⁸) in the combined meta-analysis.
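
The inverse-variance weighted fixed effects scheme described above can be sketched in a few lines. This is a generic textbook implementation rather than METAL itself; the function name and the example effect sizes are hypothetical. The last line shows one reading of the replication threshold, namely a 5% level divided by 19 tests and doubled to express a one-tailed test as a two-sided P-value, which is an interpretation rather than something stated explicitly in the text.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed effects meta-analysis of per-cohort log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_two_sided


# Hypothetical per-cohort estimates (log odds ratio, standard error).
meta_beta, meta_se, meta_z, meta_p = fixed_effects_meta([0.20, 0.16], [0.035, 0.040])
print(f"beta={meta_beta:.3f} se={meta_se:.3f} z={meta_z:.2f} p={meta_p:.2e}")

# Assumed derivation of the replication threshold quoted in the text.
replication_threshold = 2 * 0.05 / 19
print(f"{replication_threshold:.2e}")
```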

Candidate gene prioritization

Functional annotation

All SNPs in high LD (r2 > 0.8) with lead SNPs at PSC associated loci were annotated for potential function using the Genome Wide Annotation of Variants (GWAVA) online tool[19]. In addition, all coding SNPs from this set were also annotated using SIFT[16] and PolyPhen2[18].
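
The r² > 0.8 criterion used throughout can be illustrated with a minimal sketch that computes r² as the squared Pearson correlation between two vectors of genotype dosages (0, 1 or 2 copies of an allele). This is an approximation that assumes unphased genotypes; it is not the tool used in the study, and the function name and example genotypes are hypothetical.

```python
def ld_r2(genotypes_a, genotypes_b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Samples where either genotype is missing (None) are skipped.
    """
    pairs = [(a, b) for a, b in zip(genotypes_a, genotypes_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples with both genotypes called")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_a * var_b)


lead_snp = [0, 1, 2, 1, 0, 2, 1, None]
proxy_snp = [0, 1, 2, 1, 0, 2, 0, 1]
print(ld_r2(lead_snp, proxy_snp) > 0.8)
```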

Pathway analysis

To quantify the functional relationship between genes within PSC risk loci, we conducted a GRAIL pathway analysis. GRAIL evaluates the degree of functional connectivity between genes based on the extent they co-feature in published abstracts (we used all PubMed abstracts prior to 2006 to avoid biasing our analysis due to results from large-scale GWASs). All PSC associated loci were included in the analysis and only genes with GRAIL P < 0.05 and edges with a score of > 0.5 were included in the connectivity map.

Expression quantitative trait loci (eQTL)

eQTL analysis focused on published cis-eQTLs due to the lower reproducibility caused by smaller effect sizes and context-specificity of trans-eQTLs[45]. Eight eQTL datasets were included in the analysis: eQTL data from 12 studies collated in the Chicago eQTL browser, eQTL results from 1,421 samples of 13 different tissue types by the genotype-tissue expression (GTEx) project[46], 462 lymphoblastoid cell lines[24], 922 whole blood samples[20], 8,086 whole blood samples[21], purified B cells and monocytes from 283 individuals[22], activated monocytes from 432 individuals[47], and activated monocyte-derived dendritic cells from diverse populations[48]. The most significant variant-gene associations were extracted from each eQTL dataset and were reported as overlapping if that variant was in high LD (r² > 0.8) with any of the lead SNPs in the PSC GWAS meta-analysis.

Modelling PSC and IBD genetic risk

Association summary statistics from the European arm of the latest International IBD Genetics Consortium study[27] were downloaded. Where available we used results from their combined GWAS plus Immunochip follow-up study and otherwise used those from the GWAS analysis. Definition of the 231 significantly associated loci as CD, UC or both (IBD) was taken from Liu and van Sommeren et al[27]. Due to the limited availability of relevant subphenotype data within the IIBDGC data, we were unable to identify the 3–5% of IBD cases that we expect to have PSC. Including these individuals as IBD cases in our comparisons lowers our power to detect differences between the two diseases.

Causal variant co-localisation analysis

To identify causal variants within disease associated loci that are shared between diseases we used a summary statistic based Bayesian test of colocalisation (COLOC), implemented in R[28]. Briefly, COLOC generates posterior probabilities for five different hypotheses: 1) no association to either disease, 2) association to disease 1 but not disease 2, 3) association to disease 2 but not disease 1, 4) association to both disease 1 and 2 but independent causal variants and 5) association to both disease 1 and 2 with a common causal variant. Only SNPs present in all the cohorts (PSC, CD, UC and IBD) were included in the analysis and associated regions were defined as 1 Mb regions with the most associated SNP at the centre. Within each region we calculated the r² between the PSC lead SNP and the SNP most associated with each of the other three diseases. Default priors were used for the probability of a SNP being a) associated to an individual disease (1 × 10⁻⁴) and b) causally associated to both diseases (1 × 10⁻⁵). This prior probability of colocalisation is more conservative in declaring distinct causal variants compared to a recent colocalisation analysis across six immune-mediated disorders[49].
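
The way the five posterior probabilities arise from per-SNP evidence can be sketched as follows. This is a simplified re-implementation for illustration only, not the R package used in the study: it assumes at most one causal variant per trait in the region, takes per-SNP log approximate Bayes factors as input, and uses a Wakefield-style approximation with a prior effect standard deviation of 0.2, which is an assumption here. All function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def wakefield_log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor for one SNP (prior_sd = 0.2 is an assumed default)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses 1) to 5) as numbered in the text."""
    if len(labf1) != len(labf2):
        raise ValueError("both traits must be scored at the same SNPs")
    lsum1 = log_sum_exp(labf1)
    lsum2 = log_sum_exp(labf2)
    lsum12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                             # 1) no association
        math.log(p1) + lsum1,                                            # 2) disease 1 only
        math.log(p2) + lsum2,                                            # 3) disease 2 only
        math.log(p1) + math.log(p2) + log_diff_exp(lsum1 + lsum2, lsum12),  # 4) independent
        math.log(p12) + lsum12,                                          # 5) shared variant
    ]
    denom = log_sum_exp(log_h)
    return [math.exp(h - denom) for h in log_h]


# Toy region of 10 SNPs: both traits peak at the same SNP, then at different SNPs.
peak_at_2 = [0.0] * 10
peak_at_2[2] = 20.0
peak_at_6 = [0.0] * 10
peak_at_6[6] = 20.0
shared = coloc_posteriors(peak_at_2, peak_at_2)
distinct = coloc_posteriors(peak_at_2, peak_at_6)
print(shared[4] > 0.8, distinct[3] > 0.8)
```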

Heterogeneity of effects analysis

A formal heterogeneity of odds test was performed between PSC and IBD using Cochran’s Q test implemented in METAL[16] for all 18 PSC risk loci. The odds ratios and standard errors were obtained from our current PSC GWAS and the IIBDGC analysis[27]. A locus was declared to have significant heterogeneity of effects based on a threshold of P = 2.78 × 10⁻³ to account for multiple testing (Bonferroni correction applied to 5% significance threshold, N = 18 tests). In order to test whether the significant heterogeneity of effects is due to an overestimation of effect sizes in the smaller PSC cohort, we undertook a simulation study which demonstrated that the observed degree of heterogeneity is unlikely to occur by chance (Supplementary Note).
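
For a pairwise comparison of two effect estimates, Cochran’s Q reduces to a chi-square statistic with one degree of freedom. The sketch below shows that special case together with the Bonferroni threshold quoted above. It is a generic implementation, not METAL; the function name and the example effect sizes are hypothetical.

```python
import math


def cochran_q_two_studies(beta1, se1, beta2, se2):
    """Cochran's Q and its P-value for two log odds ratios (1 degree of freedom)."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    pooled = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    q = w1 * (beta1 - pooled) ** 2 + w2 * (beta2 - pooled) ** 2
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value


heterogeneity_threshold = 0.05 / 18  # Bonferroni correction for 18 loci

# Hypothetical log odds ratios for one SNP in two diseases.
q_stat, q_p = cochran_q_two_studies(0.30, 0.05, 0.10, 0.05)
print(f"Q={q_stat:.2f} p={q_p:.2e} significant={q_p < heterogeneity_threshold}")
```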

Genetic correlation analysis

Genome-wide SNP data from 12,933 IBD cases and 34,393 population controls of European descent were made available to us by the International IBD Genetics Consortium (IIBDGC). The quality control and imputation of these data using 1000 Genomes haplotypes have been previously described[27]. See Supplementary Note for details of the SNP and sample quality control (Supplementary Table 13) undertaken across the IIBDGC and PSC data to ensure compatibility and remove duplicated individuals. Individual-level genotype data for PSC, CD, UC and IBD were used to estimate the proportion of variance in liability explained by SNPs genome-wide under a multiplicative model, using the linear-mixed-model-based restricted maximum likelihood (REML) method implemented in the GCTA software[32,50,51]. Ancestry principal components were calculated using genotype data from the 1000 Genomes project and were projected for all our cases and controls. The first twenty principal components were included as covariates in the linear mixed model. We assumed a prevalence of 0.0001 for PSC, 0.005 for CD and 0.0025 for UC. A bivariate extension of the linear mixed model[31], again implemented in GCTA[32], was used to estimate the additive covariance component and the genetic correlation (rG) between PSC and either CD, UC, or IBD. In addition, we undertook an alternative genetic correlation analysis that uses summary statistics and LD score regression[33]. Of the 7,458,430 SNPs that were shared between PSC and both IBDs, 1,102,210 HapMap3 SNPs were selected for the analysis, as recommended. Pre-computed LD scores from the 1000 Genomes European data were then used to run LD score regression to estimate genetic correlation.
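Two quantities in this paragraph lend themselves to a short numerical illustration: the role of the assumed prevalence when expressing heritability on the liability scale, and the definition of rG from variance and covariance components. The sketch below uses the widely used observed-to-liability transformation for ascertained case-control data. It is not GCTA's code, and the function names, the observed-scale heritability of 0.10 and the variance components are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale SNP heritability to the liability scale.

    prevalence    -- assumed population prevalence K
    case_fraction -- proportion of cases P in the analysed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)  # normal density at the liability threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z * z * p * (1.0 - p))

def genetic_correlation(cov_g, var_g1, var_g2):
    """rG from the additive genetic covariance and the two genetic variances."""
    return cov_g / math.sqrt(var_g1 * var_g2)

# Sample case fraction from the PSC GWAS sizes given in the abstract.
psc_case_fraction = 4796 / (4796 + 19955)

# Hypothetical observed-scale estimate, PSC prevalence as assumed in the text.
print(liability_scale_h2(0.10, 0.0001, psc_case_fraction))

# Hypothetical variance components from a bivariate model.
print(genetic_correlation(0.02, 0.04, 0.04))
```

Because the transformation depends strongly on the assumed prevalence, the prevalences stated in the text are part of the model specification and not incidental detail.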

Calculating comorbidity under a purely pleiotropic genetic model

Under a bivariate liability threshold model, where all disease risk is explained by additive genetics, the probability that an individual has disease 1, given that they have disease 2, is determined by the following quantities: Kᵢ, the prevalence of disease i; Tᵢ = Φ⁻¹(1 − Kᵢ), the liability threshold of disease i; hᵢ², the heritability of disease i; r, the genetic correlation; and F(·), the multivariate cumulative distribution function of the normal distribution.
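A minimal sketch of how such a conditional probability could be evaluated follows. It assumes the two liabilities are standard bivariate normal with correlation ρ = r·h₁·h₂, so that P(D1 | D2) = P(L1 > T1, L2 > T2) / K2. This form is an assumption based on the definitions above and is not necessarily the exact expression the authors used. The function names and the heritability values are hypothetical. The prevalences are those assumed for PSC and UC in the genetic correlation analysis, and rG = 0.29 is the PSC-UC estimate reported in the abstract.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def joint_upper_tail(t1, t2, rho, steps=2000):
    """P(L1 > t1, L2 > t2) for standard bivariate normal liabilities.

    Integrates phi(x) * P(L1 > t1 | L2 = x) over x > t2 with Simpson's rule.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    cond_sd = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        # L1 | L2 = x is normal with mean rho * x and sd cond_sd.
        return _STD.pdf(x) * _STD.cdf((rho * x - t1) / cond_sd)

    upper = max(t2, 0.0) + 10.0  # density beyond this point is negligible
    h = (upper - t2) / steps     # steps must be even for Simpson's rule
    total = integrand(t2) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(t2 + i * h)
    return total * h / 3.0

def comorbidity_probability(k1, k2, h2_1, h2_2, r_g):
    """P(disease 1 | disease 2) under the assumed purely pleiotropic model."""
    t1 = _STD.inv_cdf(1.0 - k1)
    t2 = _STD.inv_cdf(1.0 - k2)
    rho = r_g * math.sqrt(h2_1 * h2_2)  # assumed liability correlation
    return joint_upper_tail(t1, t2, rho) / k2

# Prevalences from the text; heritabilities of 0.3 are hypothetical.
p = comorbidity_probability(k1=0.0001, k2=0.0025, h2_1=0.3, h2_2=0.3, r_g=0.29)
print(f"P(PSC | UC) under the sketch: {p:.6f}")
```

When the genetic correlation is zero, the sketch returns the population prevalence of disease 1, which is a convenient sanity check.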

50 in total

1. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

2. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors: Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

Review 3. Update on primary sclerosing cholangitis.

Authors: Tom H Karlsen; Erik Schrumpf; Kirsten M Boberg
Journal: Dig Liver Dis Date: 2010-02-20 Impact factor: 4.088

4. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

5. Genome-wide association analysis in primary sclerosing cholangitis identifies two non-HLA susceptibility loci.

Authors: Espen Melum; Andre Franke; Christoph Schramm; Tobias J Weismüller; Daniel Nils Gotthardt; Felix A Offner; Brian D Juran; Jon K Laerdahl; Verena Labi; Einar Björnsson; Rinse K Weersma; Liesbet Henckaerts; Andreas Teufel; Christian Rust; Eva Ellinghaus; Tobias Balschun; Kirsten Muri Boberg; David Ellinghaus; Annika Bergquist; Peter Sauer; Euijung Ryu; Johannes Roksund Hov; Jochen Wedemeyer; Björn Lindkvist; Michael Wittig; Robert J Porte; Kristian Holm; Christian Gieger; H-Erich Wichmann; Pieter Stokkers; Cyriel Y Ponsioen; Heiko Runz; Adolf Stiehl; Cisca Wijmenga; Martina Sterneck; Severine Vermeire; Ulrich Beuers; Andreas Villunger; Erik Schrumpf; Konstantinos N Lazaridis; Michael P Manns; Stefan Schreiber; Tom H Karlsen
Journal: Nat Genet Date: 2010-12-12 Impact factor: 38.330

6. Dense genotyping of immune-related susceptibility loci reveals new insights into the genetics of psoriatic arthritis.

Authors: John Bowes; Ashley Budu-Aggrey; Ulrike Huffmeier; Steffen Uebe; Kathryn Steel; Harry L Hebert; Chris Wallace; Jonathan Massey; Ian N Bruce; James Bluett; Marie Feletar; Ann W Morgan; Helena Marzo-Ortega; Gary Donohoe; Derek W Morris; Philip Helliwell; Anthony W Ryan; David Kane; Richard B Warren; Eleanor Korendowych; Gerd-Marie Alenius; Emiliano Giardina; Jonathan Packham; Ross McManus; Oliver FitzGerald; Neil McHugh; Matthew A Brown; Pauline Ho; Frank Behrens; Harald Burkhardt; Andre Reis; Anne Barton
Journal: Nat Commun Date: 2015-02-05 Impact factor: 14.919

7. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals.

Authors: Alexis Battle; Sara Mostafavi; Xiaowei Zhu; James B Potash; Myrna M Weissman; Courtney McCormick; Christian D Haudenschild; Kenneth B Beckman; Jianxin Shi; Rui Mei; Alexander E Urban; Stephen B Montgomery; Douglas F Levinson; Daphne Koller
Journal: Genome Res Date: 2013-10-03 Impact factor: 9.043

8. The INTERVAL trial to determine whether intervals between blood donations can be safely and acceptably decreased to optimise blood supply: study protocol for a randomised controlled trial.

Authors: Carmel Moore; Jennifer Sambrook; Matthew Walker; Zoe Tolkien; Stephen Kaptoge; David Allen; Susan Mehenny; Jonathan Mant; Emanuele Di Angelantonio; Simon G Thompson; Willem Ouwehand; David J Roberts; John Danesh
Journal: Trials Date: 2014-09-17 Impact factor: 2.279

9. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

10. The UK10K project identifies rare variants in health and disease.

Authors: Klaudia Walter; Josine L Min; Jie Huang; Lucy Crooks; Yasin Memari; Shane McCarthy; John R B Perry; ChangJiang Xu; Marta Futema; Daniel Lawson; Valentina Iotchkova; Stephan Schiffels; Audrey E Hendricks; Petr Danecek; Rui Li; James Floyd; Louise V Wain; Inês Barroso; Steve E Humphries; Matthew E Hurles; Eleftheria Zeggini; Jeffrey C Barrett; Vincent Plagnol; J Brent Richards; Celia M T Greenwood; Nicholas J Timpson; Richard Durbin; Nicole Soranzo
Journal: Nature Date: 2015-09-14 Impact factor: 49.962
