Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci.

David Ellinghaus¹, Luke Jostins², Sarah L Spain², Adrian Cortes^3,4, Jörn Bethune¹, Buhm Han⁵, Yu Rang Park⁶, Soumya Raychaudhuri^7,8,9,10, Jennie G Pouget^11,12, Matthias Hübenthal¹, Trine Folseraas^13,14,15,16, Yunpeng Wang¹⁷, Tonu Esko^18,19,20, Andres Metspalu¹⁸, Harm-Jan Westra^7,8,9,10, Lude Franke²¹, Tune H Pers^7,20,22,23, Rinse K Weersma²⁴, Valerie Collij²⁴, Mauro D'Amato^25,26, Jonas Halfvarson²⁷, Anders Boeck Jensen²⁸, Wolfgang Lieb^29,30, Franziska Degenhardt^31,32, Andreas J Forstner^31,32, Andrea Hofmann^31,32, Stefan Schreiber^1,33, Ulrich Mrowietz³⁴, Brian D Juran³⁵, Konstantinos N Lazaridis³⁵, Søren Brunak²⁸, Anders M Dale^17,36, Richard C Trembath³⁷, Stephan Weidinger³⁴, Michael Weichenthal³⁴, Eva Ellinghaus¹, James T Elder^38,39, Jonathan N W N Barker⁴⁰, Ole A Andreassen^41,42, Dermot P McGovern^43,44, Tom H Karlsen^13,14,15,16, Jeffrey C Barrett², Miles Parkes⁴⁵, Matthew A Brown^46,47, Andre Franke¹.

Abstract

We simultaneously investigated the genetic landscape of ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis and ulcerative colitis to investigate pleiotropy and the relationship between these clinically related diseases. Using high-density genotype data from more than 86,000 individuals of European ancestry, we identified 244 independent multidisease signals, including 27 new genome-wide significant susceptibility loci and 3 unreported shared risk loci. Complex pleiotropy was supported when contrasting multidisease signals with expression data sets from human, rat and mouse together with epigenetic and expressed enhancer profiles. The comorbidities among the five immune diseases were best explained by biological pleiotropy rather than heterogeneity (a subgroup of cases genetically identical to those with another disease, possibly owing to diagnostic misclassification, molecular subtypes or excessive comorbidity). In particular, the strong comorbidity between primary sclerosing cholangitis and inflammatory bowel disease is likely the result of a unique disease, which is genetically distinct from classical inflammatory bowel disease phenotypes.

Year: 2016 PMID: 26974007 PMCID: PMC4848113 DOI: 10.1038/ng.3528

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 41.307

Introduction

Genome-wide association studies have revealed overlap in the genetic susceptibility to human diseases that affect a range of tissues. This overlap is most notable in immune-mediated diseases[1,2] including the clinically related conditions ankylosing spondylitis (AS), Crohn's disease (CD), psoriasis (PS), primary sclerosing cholangitis (PSC) and ulcerative colitis (UC). Co-morbidity of these conditions in the same individual and increased risk of any of these conditions in family members have been extensively documented[3,4]. Recently a large-scale discovery-driven analysis of temporal disease progression patterns using data from an electronic health registry covering the whole population of Denmark revealed substantial population-wide co-morbidity[5]. This raises the possibility of a hidden molecular taxonomy that differs from the traditional classification of disease by organ or system. Cross-disease genetic studies provide an opportunity to resolve overlapping associations into discrete pathways and explore details of apparently shared etiologies. In this study, we combined Immunochip genotype data for 52,262 cases and 34,213 controls of European ancestry, the currently largest available genetic data sets in five clinically related seronegative immune-driven phenotypes (AS, CD, PS, PSC and UC) to explore the extent of sharing of genetic susceptibility loci. The aims of this cross-phenotype study were to: 1) identify subsets of the 5 phenotypes with shared genetic risk loci using a cross-phenotype meta-analysis approach, 2) to identify additional susceptibility loci, 3) to investigate co-morbidity and pleiotropy amongst these phenotypes and 4) to improve the understanding of shared pathways and biological mechanisms common to subsets of the phenotypes studied.

Results

Cross-phenotype association analysis

We analyzed Immunochip genotype data of 52,262 cases from AS (8,726), CD (19,085), PS (6,530), PSC (3,408) and UC (14,413) and 34,213 healthy controls, using variants with a minor allele frequency >0.1%, to examine the shared and distinct genetic etiology of these diseases (see Methods). By using Immunochip-only data, we were able to perform uniform, central quality control of all batches, thus reducing potential batch effects (see Methods). Next, we used a recently published subset-based meta-analysis approach (SBM)[6] to exhaustively explore all subsets of disease combinations for the presence of association signals. The method identifies the best subset of non-null studies while accounting for multiple testing and a fixed control group (see Methods). In the primary SBM analyses, we identified 166 genome-wide significant (PSBM<5×10⁻⁸) loci outside the major histocompatibility complex (MHC, chromosome 6 region at 25–34 Mb). Three of these 166 loci (rs2042011 at MIR1208; rs2812378 at CCL21/FAM205A; rs1893592 at UBASH3A) have not been reported previously for any of the five diseases under study and thus are novel shared risk loci. SNP associations at UBASH3A (chr21q22.3) and CCL21 (9p13.3) have been reported previously for other autoimmune disorders[7,8]. These three novel loci would have been missed using single disease analyses alone. To avoid loss of power where variants are associated with only a single phenotype, we looked up single disease vs. control subsearches for any SNPs that achieved PSBM<5×10⁻⁷ in the primary analysis. Using this SBM-directed approach, we identified 27 novel genome-wide significant disease associations (Pdisease<5×10⁻⁸), including 17 novel genome-wide significant loci for AS, 6 loci for CD, and 4 loci for PSC (Figure 1, Supplementary Table 2, Supplementary Fig. 2). 
Of these 27 associations, 24 were also genome-wide significant in the primary SBM analyses (PSBM<5×10⁻⁸), thus leading to a total of 169 non-MHC risk loci. To identify additional independent association signals within the 169 non-MHC risk loci, we performed a stepwise conditional SBM analysis following a recently published fine-mapping approach[9] (see Methods). In total, we identified 244 independent association signals for the five diseases under study, 187 of which were shared by at least two diseases (Supplementary Table 3; Supplementary Fig. 3, 4 and 5). We estimated the heritability explained by these 244 variants for each disease and for all pair-wise disease comparisons. The ten pair-wise comparisons of disease-associated alleles show diverse patterns of sharing with respect to size and direction of allelic effects and the number of unique associations.

Figure 1

27 novel genome-wide significant disease associations (Pdisease<5×10⁻⁸) for ankylosing spondylitis (AS), Crohn's disease (CD), psoriasis (PS), primary sclerosing cholangitis (PSC) and ulcerative colitis (UC)

Single disease analyses were performed only on SNPs that achieved PSBM<5×10⁻⁷ in the primary (unconditioned) cross-disease subset-based association meta-analysis (SBM) approach (see Main Text). We identified 17 novel genome-wide significant susceptibility loci for AS, 6 loci for CD and 4 loci for PSC. Corresponding P-values and ORs for each novel association are shown separately for each disease. With this, the number of known AS, IBD, and PSC risk loci increased to 48, 206, and 20, respectively. For 22 out of 27 genome-wide significant associations, lead SNPs from the SBM approach (PSBM<5×10⁻⁸) and the single disease lookups (PSBM<5×10⁻⁷ and Pdisease<5×10⁻⁸) are identical; in five instances the lead SNPs differ between the SBM and the single disease analyses.

−log10 P-value: −log10 P-values (Pdisease) from Immunochip analysis with regard to the physical location of markers; direction of triangle denotes direction of disease-individual effect; OR: odds ratio from the five single disease vs. control subsearches (OR(disease) in the Supplementary Tables). Large circles denote nominally significant disease-individual P-values (Pdisease<0.05); CAF cases/controls: case/control minor allele frequency; if available, the nearest gene within 10 kb of the variant is depicted.

Functional annotation of associated variants

We functionally annotated the 244 risk alleles from the 169 distinct loci (see Methods). For 210 association signals (86.1%) the lead variant was within 10 kb of a known gene, and 34 signals were classified as intergenic (>10 kb from any gene). The analysis identified 16 coding variants (14 missense, 1 frameshift and 1 splice donor) in genes that were previously implicated in immune-mediated diseases. Eight of these variants (located in PTPN22, GPR35, MST1, CD6, two in NOD2, TYK2 and CARD9) have been associated before with one of the traits included in this study, and six (GCKR, two in IFIH1, SH2B3, SMAD3, TYK2) with another phenotype, either listed in the GWAS catalog[10] or in Immunobase. Two of the genes carrying a coding variant (TLR4, PRKCQ) have previously been suggested as candidate loci, but robust association signals had so far been lacking. We further checked for variants in high LD (r²>0.8) with the identified variants using 1000 Genomes haplotypes and found that in total 46 of the identified signals were highly correlated with 57 coding variants (48 missense, 2 stop-gain, 3 splice region variants, 1 frameshift, 3 regulatory variants). We found that 40 of the 57 coding variants, from 30 loci, had been described in previous GWAS or Immunochip studies involving one of the traits included in this study or another phenotype. A further 9 variants have been mentioned as candidate variants in autoimmune disease publications. Eight coding variants (7 missense and 1 stop-gain, all in high LD with our lead variants), located in EFNA1, FCGR2A, HSPA6, C7orf72 and FAM118A, have not been described before in relation to any immune-mediated phenotype.

eQTL analysis in peripheral blood

Analyses of cis-eQTL microarray data from whole peripheral blood samples of an independent control cohort comprising 2,360 unrelated individuals[11,12] (see Methods) identified cis-effects (PFDR<0.05) for 132 of the 244 disease-associated SNPs from Supplementary Table 3a. Five of these represent the best eQTL SNP, and another five represent the best secondary eQTL SNPs, independent of the best eQTL SNPs at a given locus.

Pathway, cell type, and annotation enrichment analyses

We tested for enrichment between SNPs in associated loci and various types of genomic annotations using GoShifter[13]. We used 620 different annotations from the NIH Roadmap Epigenomics[14] and Fantom5[15] projects to look for enrichment of histone modifications and expressed enhancers, respectively. Results from the SBM association analysis were separated into groups comprising all 244 identified variants, variants shared amongst 3 or more phenotypes, and variants associated with a given phenotype. For the Roadmap enrichment analysis, using a threshold of P<10⁻³, the inflammatory bowel disease (IBD) and PS phenotype subsets showed enrichment for H3K27ac modifications in CD3 primary cells and for H3K27ac in adipose tissue, respectively. The ‘all variant’ (n=244) analysis showed enrichment for H3K4me3 (for which the largest number of cell types were analyzed by the Roadmap consortium and which highlights transcribed promoters and TSS[16]) in the HUES64 cell line (undifferentiated cells), CD34+ cells (bone marrow cells) and natural killer cells (CD56). The Fantom5 data analysis showed enrichment for enhancers expressed in T cells (CD and ‘all_variants’ group) and also in natural killer cells for CD. However, only the enrichment in T cells for the ‘all_variants’ group met the Bonferroni-corrected significance threshold of 0.05/620=8.06×10⁻⁵. To test which candidate genes from the associated loci are highly expressed in which tissues and to define disease relationships at the expression level, we conducted pathway and tissue/cell type enrichment analyses using DEPICT[17], with 77,840 microarray expression profiles from human, rat and mouse and 209 tissue/cell type annotations[18] (see Methods). 
Even when correcting for the biased Immunochip gene content, our DEPICT results confirmed that the genes from the 169 non-MHC genome-wide significant susceptibility loci reported here show greatest relevance for the regulation of immune response pathways and the hematopoietic system. We further generated a protein-protein interaction (PPI) network (111 gene nodes and 65 edges) based on five prioritized gene sets for the AS, CD, PS, PSC and UC SNP sets from the DEPICT analyses and reference PPI data from ConsensusPathDB[19] (see Methods). We observed that 36 gene nodes from this PPI network were connected in one single large component. We then evaluated the potential “druggability” of these genes by linking genes within this core network to drugs using DrugBank (see Methods). Because the nature and effect of the interaction between a drug and the encoded protein is mostly unknown (for example, some of the drugs we identified have effects opposite to those desired), we performed a manual literature search using PubMed (last search July 1st 2015) and ClinicalTrials.gov to assess which of the identified drugs show evidence of benefit, or could potentially be promising, for any of the diseases under study. All drugs were selected based on evidence from phase I/II/III randomized clinical trials (RCTs) or published animal studies. Nine drug target genes overlap with the 36 genes from the core network. Although further investigations are necessary, we propose that target genes/drugs selected by this approach could represent promising candidates for novel drug discovery for treatment of AS, CD, PS, PSC and UC. For example, novel CCR2 antagonists such as MLN-1202 and the CCR5 antagonists INCB9471 and AMD-070 are potential new drugs for treatment of AS, CD, PS, PSC and UC.
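
The Bonferroni threshold quoted for the annotation enrichment analysis follows directly from the number of annotations tested. A minimal sketch in Python; the function name is an illustrative choice and is not part of GoShifter or any other tool named in the text.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 620 Roadmap Epigenomics and Fantom5 annotations tested at alpha = 0.05,
# as stated in the text.
threshold = bonferroni_threshold(0.05, 620)
print(f"{threshold:.2e}")  # prints 8.06e-05
```

An enrichment P-value is then declared significant only if it falls below this per-test threshold.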

Bayesian multinomial regression for model selection

To compare different disease models for each of the 244 risk variants while accounting for the different sample sizes per disease, we used Bayesian multinomial regression. The aim is to estimate the posterior probability (Probmodel) of each disease model conditional on the observed genotype and phenotype data (see Methods). A disease model is the set of diseases with which a given locus is associated (i.e. has a non-zero log odds ratio); for example, “associated with CD and UC, but not with AS, PS or PSC” is one disease model. There are a total of 2⁵=32 possible disease models for the 5 phenotypes, including the null model (“not associated with any disease”). The Bayesian setting naturally handles the different uncertainties on the effect sizes for each disease due to their different sample sizes and power. We found 66 signals (59 non-MHC loci) with a best Probmodel≥60%, including 14 loci (with closest genes SH2B3, UBE2L3, TNP2, IL2RA, DNMT3B, CXCR2, CDKAL1, CARD9, MST1, ZMIZ1, ETS1) with Probmodel≥0.8, when assuming that each sharing model has the same prior probability (uniform prior across all models, see Methods). However, because previous studies suggested that the structure of sharing of susceptibility is non-uniform[2], we calculated posteriors for each model for each risk variant under six different priors and took a vote of the highest posterior models under each prior (see Methods). We then counted how many priors voted for that model, and calculated the minimum, maximum and mean posterior (MeanProbmodel) for each risk variant. Based on this consensus-finding process of merging results from six different priors, we identified 34 signals (31 non-MHC loci) with a best MeanProbmodel≥60%, including 12 loci (with closest genes SH2B3, IL2RA, IFIH1, NFKB1, TYK2) with Prob≥0.8, suggesting that we correctly identified the disease model. 
Of the 34 associations with MeanProbmodel≥0.6, 25 signals involve all 5 diseases, 6 signals involve four diseases and 3 signals are unique to a single disease. Some of these disease sets show different directions of effect (risk versus protective), heterogeneity of odds ratios (P<0.01), or both, across the diseases involved.
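
The count of 32 disease models is simply the number of subsets of the five phenotypes. The sketch below enumerates them and attaches the uniform prior mentioned above; the function and variable names are illustrative assumptions, and the posterior computation itself (the Bayesian multinomial regression) is not reproduced here.

```python
from itertools import combinations

DISEASES = ("AS", "CD", "PS", "PSC", "UC")


def enumerate_disease_models(diseases):
    """Return every subset of diseases, from the null model to all diseases."""
    models = []
    for size in range(len(diseases) + 1):
        for subset in combinations(diseases, size):
            models.append(frozenset(subset))
    return models


models = enumerate_disease_models(DISEASES)

# Uniform prior: every sharing model receives the same prior probability.
uniform_prior = {model: 1.0 / len(models) for model in models}

print(len(models))  # prints 32
```

The empty set corresponds to the null model, and `frozenset({"CD", "UC"})` corresponds to the example model "associated with CD and UC, but not with AS, PS or PSC".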

Distinguishing pleiotropy from heterogeneity

Statistically significant temporal co-morbidity (disease A followed by disease B within a 5-year time frame of disease A, or vice versa) amongst the five diseases studied was confirmed for 8 out of 10 possible pairs of diseases (P<0.05/823606=1.21×10⁻⁹) after screening 823,606 directed pairs of diagnoses from an electronic health registry covering the whole population of Denmark[5] (see Methods). Consistent with previous reports, we further observed high comorbidity rates among our patients, i.e. patients had more than one disease at the time of last diagnosis. This may occur due to pleiotropy (sharing of risk alleles between disease A and disease B) or heterogeneity (a subgroup of disease A cases has a higher loading of risk alleles for disease B). Heterogeneity can result from many different scenarios, including diagnostic misclassification, molecular subtypes and excessive comorbidity. We evaluated whether pleiotropy or heterogeneity best explained the high comorbidity rates amongst the five diseases studied using BUHMBOX[20] (see Methods). BUHMBOX detects heterogeneity by calculating the cross-locus correlation of disease B-associated loci among disease A cases; a non-zero correlation is indicative of heterogeneity[20]. We calculated the statistical power of BUHMBOX to detect various proportions of sample heterogeneity for all disease pairs (Online Methods). For 18 out of 20 pairs of diseases, we had >50% power to detect 20% sample heterogeneity. Since BUHMBOX has high power for these pairs, non-significant BUHMBOX results strongly suggest that the genetic risk score (GRS) association is due to pleiotropy rather than heterogeneity. First, to quantify genetic sharing for each of the 20 possible directed pairs of the five diseases, we used a GRS approach (see Methods). 
We calculated GRSs for disease B (using known risk alleles, weighted by effect size) for all individuals in the disease A sample, and tested the association of the GRSs with disease A status using logistic regression. The GRSs test for enrichment of disease B alleles in disease A cases, and are expected to be significant in the presence of either pleiotropy or heterogeneity. As expected, we observed highly significant associations between disease B GRSs and disease A status for almost every possible pair, which demonstrated strong sharing of risk alleles between the different immune-mediated diseases. We then tested whether this observed genetic sharing was due to true pleiotropy or heterogeneity using BUHMBOX[20]. In the setting of pleiotropy, pleiotropic disease B risk alleles are shared across all disease A cases, whereas in heterogeneity, only a subset of disease A cases share disease B risk alleles. This leads to cross-locus correlations between disease B-associated loci being positive in the presence of heterogeneity, but not in the case of pleiotropy. BUHMBOX calculates the cross-locus correlation between disease B-associated loci in disease A cases, and determines whether it is significantly non-zero. We calculated cross-locus correlations for all 20 disease pairs (see Methods). We did not observe significant inter-locus correlations, despite high statistical power for many pairs. Our findings suggest that the overall GRS association between the five immune diseases investigated is likely due to pleiotropy.
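
The intuition behind the two steps above can be illustrated with a small sketch. It assumes risk-allele dosages coded 0, 1 or 2 and log odds ratios as weights; all function names and the toy data are hypothetical. The averaged pairwise Pearson correlation used here is a deliberate simplification for illustration and is not the published BUHMBOX test statistic.

```python
import math
from itertools import combinations
from statistics import mean


def genetic_risk_score(dosages, log_odds_ratios):
    """Weighted sum of risk-allele dosages for one individual."""
    if len(dosages) != len(log_odds_ratios):
        raise ValueError("one weight is required per locus")
    return sum(d * w for d, w in zip(dosages, log_odds_ratios))


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mean_cross_locus_correlation(case_dosages):
    """Average pairwise correlation between loci across disease A cases.

    case_dosages: one list of disease B risk-allele dosages per case.
    """
    loci = list(zip(*case_dosages))  # transpose to one tuple per locus
    return mean(pearson(a, b) for a, b in combinations(loci, 2))


# Toy heterogeneity: half of the cases carry disease B alleles at every locus.
heterogeneous_cases = [[2, 2, 2], [2, 2, 2], [0, 0, 0], [0, 0, 0]]
# Toy pleiotropy: dosages at the two loci vary independently across cases.
pleiotropic_cases = [[0, 0], [0, 2], [2, 0], [2, 2]]

print(mean_cross_locus_correlation(heterogeneous_cases))
print(mean_cross_locus_correlation(pleiotropic_cases))
```

In the heterogeneous toy sample the loci are positively correlated because risk alleles cluster in a subgroup of cases, whereas in the pleiotropic toy sample the correlation vanishes, which mirrors the reasoning in the text.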

Immunochip-wide co-heritability analysis

In order to estimate Immunochip-wide pleiotropy (the genetic variation and covariation in liability between pairs of diseases that is tagged by SNPs represented on the Immunochip), we applied univariate and bivariate linear mixed model heritability methods[21,22] (see Methods). The relationships between disorders are expressed as SNP-based coheritabilities. When excluding SNPs from the MHC region, genetic correlation was highest between CD and UC (rG=0.78 ± 0.015 s.e., in concordance with previous estimates[23]) and between PSC and UC (rG=0.64 ± 0.027 s.e.); moderate (rG<0.5) between AS and CD (rG=0.49 ± 0.023 s.e.), AS and UC (rG=0.47 ± 0.026 s.e.), CD and PSC (rG=0.35 ± 0.030 s.e.), AS and PSC (rG=0.33 ± 0.035 s.e.), AS and PS (rG=0.28 ± 0.035 s.e.), and CD and PS (rG=0.27 ± 0.029 s.e.); and low (rG<0.25) between PS and PSC (rG=0.18 ± 0.042 s.e.) and PS and UC (rG=0.16 ± 0.034 s.e.) (see Supplementary Fig. 11,12 and Supplementary Table 15). Correlation values including MHC variants are reported in the supplementary material. As a negative control, we conducted coheritability analyses between each immune-mediated disease under study and Immunochip studies of longevity, bipolar disorder, major depressive disorder and schizophrenia. No coheritability was observed with the non-immune-mediated diseases studied here.
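
For orientation, the genetic correlation between two diseases A and B is conventionally defined from the variance components of a bivariate mixed model as shown below. The symbols are standard notation chosen here as an assumption; they are not taken from the Methods of this study.

```latex
r_G = \frac{\sigma_{g,AB}}{\sqrt{\sigma^2_{g,A}\,\sigma^2_{g,B}}}
```

Here \sigma^2_{g,A} and \sigma^2_{g,B} denote the SNP-tagged genetic variances in liability for the two diseases and \sigma_{g,AB} their SNP-tagged genetic covariance, so that r_G lies between −1 and 1.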

Discussion

By combined assessment of Immunochip genotyping datasets from 52,262 patients with five closely associated conditions (AS, CD, PS, PSC and UC; all seronegative inflammatory diseases as per clinical definition) and 34,213 healthy controls, we were able to delineate the genetic overlap between the conditions. A key outcome of the overlap analysis is that, despite the profound pleiotropy, clear demarcations of the genetic risk for the individual conditions exist. Implicit in this, and in conflict with an existing paradigm in which a causal relationship exists between IBD and the involved extra-intestinal conditions[24,25], our modeling instead supports (a) the presence of shared pathophysiological pathways as the basis for the clinical co-occurrence and (b) the hypothesis that patients with concomitant syndromes are genetically distinct from patients without concomitant syndromes. Our cross-disease association framework also enabled the identification of novel coding variants and known eQTLs. One newly identified missense variant for CD, rs4986790, is located in exon three of toll-like receptor 4 (TLR4), which is an important mediator of innate immunity. This SNP has been shown to modulate TLR4 effector functions either by interfering with the binding capacity of TLR4 with its ligands or by controlling the extracellular deposition of functional TLR4[26,27]. Another newly identified missense SNP for CD, rs2236379, which has not been previously associated with other disease traits, is located in exon nine of PRKCQ, encoding protein kinase C-theta (PKC-θ). PKC-θ is essential in the signaling cascades that lead to NF-κB, AP-1 and NFAT activation[28] and is also critical for stabilizing the Th17 cell phenotype by selective suppression of the STAT4/IFN-γ/T-bet axis at the onset of differentiation[29]. 
Furthermore, PKC-θ inhibition enhances Treg function and protects Treg from inactivation by TNF-α, restores activity of defective Treg from rheumatoid arthritis patients, and enhances protection of mice from inflammatory colitis[30]. We also found that one of the AS/UC secondary signals, rs61802846, is in perfect LD (r²=1.0) with a stop-gain SNP, rs9427397, which results in a premature stop codon in FCGR2A. This appears to be distinct (r²=0.12) from the known IBD-associated missense variant in FCGR2A, rs1801274[1,31]. Among the 10 strongest eQTL SNPs (PFDR<0.05) are the intronic SNP rs3766606 (at PARK7, shared by PS (risk) and CD, UC (protective)), the intronic variant rs2910686 (at ERAP2, shared by AS, CD, UC (risk only)), the intronic SNP rs1893592 (at UBASH3A, shared by PSC, UC (protective)), the missense SNP rs12720356 (at TYK2, shared by AS, CD, UC (risk) and PS (protective)), and the intronic variant rs679574 (at FUT2, shared by AS, CD, PS and PSC (risk only)). Most “shared” loci exhibit complex patterns of multi-disease associations, suggesting multiple types of pleiotropy[32]. Through subsequent Bayesian multinomial regression modeling, we identified 31 loci with 34 independent associations for which we determined a specific disease model constellation with high certainty (MeanProbmodel≥60%). For example, at 12q24.12 (Locus 119; SH2B3) the single lead SNP rs3184504 (Prob=0.98) is associated with decreased risk of AS (ORAS=0.92) but increased risk of the other diseases (ORCD=1.06; ORPS=1.06; ORPSC=1.19; ORUC=1.05), and has been associated before with >10 other phenotypes in the GWAS catalog[10], suggesting that 12q24.12 is a common risk locus with heterogeneous effect sizes for multiple complex diseases. In addition to contrasting the genetic landscape of AS, CD, PS, PSC and UC, we investigated comorbidity and pleiotropy amongst these phenotypes. 
GRS and cross-locus correlation analyses[20] suggest that the increased comorbidity rates among our patients are due to biological pleiotropy rather than heterogeneity. In other words, an individual with a pleiotropic risk variant is more likely to acquire both diseases. Among all disease pairs with non-zero comorbidity rates, the pair of PSC and IBD is particularly notable for its high frequency of comorbidity. PSC patients suffer from a highly increased frequency (62-83%) of IBD[33] (called PSC with concomitant IBD, or PSC-IBD, although the IBD is most often classified as UC). Interestingly, despite the high prevalence of IBD in PSC, the loci encoding IL23R and IL10 (both of which are strongly associated with CD and UC) did not show any evidence of association with PSC. However, we found that many PSC risk variants are shared with UC and have similar effects in terms of both magnitude and direction. It is unlikely that pleiotropy with UC on its own accounts for the comorbid IBD seen in PSC, given the far higher prevalence of IBD in PSC patients compared to the population prevalence of UC. We therefore asked whether PSC-IBD is a unique disease distinct from UC, or whether PSC-IBD is the result of UC that is prevalent among PSC patients due to a causal relationship between the two diseases (i.e. UC causes subsequent development of PSC, or vice versa). If the PSC-IBD phenotype is the result of a causal relationship between UC and PSC, there would be a subgroup of PSC cases with a higher loading of UC risk alleles (or vice versa). We tested UC loci in PSC cases with BUHMBOX, and found no evidence of a UC-driven subgroup despite high power (99.9% power given the hypothesis that 62% of PSC cases are affected by UC). We also tested PSC loci in UC cases with BUHMBOX; while the result was negative (P=0.48), the test was underpowered to detect subtle heterogeneity proportions. 
Although we cannot completely rule out a causal relationship between PSC and UC at this time, we expect that these findings will become clearer as additional PSC-associated loci are identified in future studies, improving power to detect heterogeneity. At present, our findings are most consistent with the hypothesis that PSC-IBD is a unique disease that shares some genetic factors with UC, but is distinct from classical IBD phenotypes[4,34]. This hypothesis is further supported by the observation that PSC-IBD shows significant clinical differences from classical IBD, and requires specialized management; compared to IBD, PSC-IBD has a higher rate of pancolitis with ileitis and rectal sparing, as well as a higher incidence of colorectal cancer[34]. Our results from testing of enrichment between multi-disease signals and large-scale expression data sets, epigenetic and expressed enhancer profiles further reflect this excessive pleiotropy and mainly highlight perturbations in immune response pathways and blood cell tissues. However, we could not pinpoint which genomic features and which cells a variant influences. We hypothesize that larger gene expression data sets for the disease-relevant tissues and cell types from affected individuals should be generated to allow for high-resolution and more eQTL studies since eQTLs are often cell-specific[35]. Further, the discovery of multiple further genetic associations increases the power of such analyses to define pathways and cell types involved in specific diseases. In summary, we performed the largest systematic cross-disease genetic study for chronic immune-mediated diseases to-date. Using novel cross-phenotype analytic methodologies we identified 17 novel genome-wide significant susceptibility loci for AS, 6 loci for CD and 4 loci for PSC, and 3 novel yet unreported risk loci for the diseases under study. With this, the number of known AS, IBD, and PSC risk loci increased to 48, 206, and 20, respectively. 
Due to lower coverage at unselected regions on Immunochip, imputed GWAS data would further increase statistical power to identify novel shared associations outside established risk loci in future studies. Future cross-disease studies of a wider range of phenotypes, in combination with more sophisticated fine-mapping studies on individual diseases and specific layers of multi-omics data sets are needed to provide another layer of information for a potential new disease classification based on molecular genetic profiles. While most cross-disease studies employ patient panels that were manually curated for single phenotypes and often rely on questionnaire data, future studies could employ even larger collections of hospitalized patients, for which exhaustive electronic medical patient records and array data exists. Moreover, longitudinal data from electronic health charts could pinpoint further comorbidities that should be included in a more systematic next-generation cross-disease approach.

Online Methods

Study subjects

All DNA samples included in the study were genotyped using the Illumina Immunochip custom genotyping array[40], a targeted high-density genotyping array with comprehensive coverage of 1000 Genomes Project SNPs[41] within 186 autoimmune disease-associated loci. CD/UC case and control cohorts were collected from 15 countries across Europe, North America and Australia and have previously been described[1]. Initially, 19,761 Crohn's disease cases, 14,833 ulcerative colitis cases and 28,999 controls of European ancestry from the International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) were included in the study. Genotyping of the IIBDGC cohorts was performed in 31 different batches (34 batches before quality control) across 11 different genotyping centers. The initial AS case-control collection (2 main batches) consisted of 10,417 cases and 12,338 controls of European ancestry and was described previously[36]. All AS case genotyping was performed at one centre (University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Australia). 6,577 psoriasis case and 15,085 control samples (2 main batches) were collected from 13 countries across Europe and North America[37]. Recruitment of 3,789 PSC patients and 25,079 controls (2 main batches) was performed in 14 countries in Europe and North America[38]. Since most control samples were shared between different disease consortia, we identified the set of non-overlapping (unique) control samples. 2,019 schizophrenia cases, 1,140 bipolar disorder cases and 589 major depressive disorder cases were collected from different centers in Germany in the context of the MooDs consortium. All samples were genotyped at the Life&Brain center in Bonn. Written, informed consent was obtained from all study participants and the institutional ethical review committees of the participating centers approved all protocols.

Immunochip genotype calling and quality control

Initial genotype calling was performed with the Illumina GenomeStudio GenTrain 2.0 software and the custom-generated cluster file of Trynka et al. (based on an initial clustering of 2,000 UK samples and subsequent manual readjustment of cluster positions)[40]. Based on normalized intensity information, we removed samples detected as intensity outliers (>4 s.d.). Based on initial genotype data, we further removed samples with <90% call rate using PLINK[42]. To identify ethnicity outliers (i.e. subjects of non-European ancestry), we performed principal component analysis (PCA) with Eigenstrat[43] and a set of 210 HapMap founder samples[44] and projected Immunochip samples onto the principal component axes on the basis of a set of 14,484 independent (minor allele frequency (MAF)>0.05) SNPs, excluding X- and Y-chromosomes, SNPs in LD (leaving no pairs with r²>0.2), and 11 high-LD regions as described by Price et al.[45]. OptiCall genotype recalling was performed with a Hardy-Weinberg equilibrium P-value threshold of 10⁻¹⁵ for each batch, Hardy-Weinberg equilibrium blanking disabled and a genotype call threshold of 0.7. Hardy-Weinberg equilibrium was calculated with conditioning on predicted (European) ancestry, and related individuals were removed from this calculation. After genotype calling, a unified quality control procedure was conducted across 40 genotyping batches. We tested for significantly different allele frequencies of variants across the batches from a particular disease or the control group (with at most one batch being removed) with a false discovery rate (FDR) threshold of 0.01. 
Variants were excluded if they had >2% missing data, had a minor allele frequency <0.1% in any of the disease sets or in controls, had different missing genotype rates in affected and unaffected individuals (P<10⁻⁵) or deviated from Hardy-Weinberg equilibrium in controls at a false discovery rate (FDR) threshold of 10⁻⁵, either (a) across the entire collection with at most one batch being removed or (b) falling below this threshold in two single batches. Samples that had >2% missing data and overall increased/decreased heterozygosity rates were removed. For robust duplicate/relatedness testing (IBS/IBD estimation) and population structure analysis, we used the pruned subset of 14,484 independent SNPs described above. Pairwise IBD proportions were computed using PLINK. By definition, Z0 = P(IBD=0), Z1 = P(IBD=1), Z2 = P(IBD=2), Z0+Z1+Z2 = 1, and PI_HAT = P(IBD=2) + 0.5 × P(IBD=1) (proportion IBD). One individual (the one showing greater missingness) from each pair with PI_HAT>0.1875 was removed. To resolve within-Europe relationships and to test for population stratification, the remaining QCed 52,262 cases and 34,213 unique controls were tested using the PCA method, as implemented in FlashPCA[46]. PCA revealed no non-European ancestry outliers. We computed Tracy-Widom statistics to evaluate the statistical significance of each principal component identified by PCA and found the top seven axes of variation to be significant at P<0.05. In total, 130,052 QCed polymorphic variants with MAF>0.1%, 52,262 cases and 34,213 unique controls were available for analysis.

We conducted the primary association analysis using the association analysis based on subsets (ASSET) methodology[6]. Even after adjusting for the large number of comparisons, this subset-based meta-analysis (SBM) method maintains type-I error rates similar to those of standard meta-analysis. 
This method offers a substantial power increase (sometimes between 100% and 500%[6]) compared to standard univariate meta-analysis approaches when the (heterogeneous) effect of a specific SNP is not restricted to a single disease. Under the assumption that association signals from shared risk loci based on positional overlap tag the same causal variant for different diseases, the (unconditioned) subset-based meta-analysis (SBM) approach improves power compared to standard fixed-effects meta-analysis methodology. For the situation in which distinct variants within a shared susceptibility region confer independent effects for individual diseases, the conditional SBM approach is well suited to reveal these independent (often multi-disease) association signals (see stepwise subset-based conditional logistic regression). The subset-based meta-analysis is a generalized fixed-effects meta-analysis and explores all possible subsets of diseases (or a restricted disease set if specified) for the presence of true association signals, while adjusting for the multiple testing required and for a fixed control group shared by all diseases. To control for potential population stratification, we adjusted association test statistics by means of principal component analysis (PCA) using the top seven axes of variation. Adjusted two-tailed PSBM values (risk versus protective) were obtained using the discrete local maxima (DLM) method, which estimates tail probabilities of the Z score test statistic that is maximized over a grid of neighboring subsets[6,47]. The maximum (in absolute value) of the subset-specific Z statistics is a conservative variable selector in the sense that, for large samples, it will select only non-null studies, but it is not guaranteed to select all of the non-null studies[6]. 
The genomic inflation factor (λ) calculated using 1,820 “null” SNPs (outside the MHC region) associated with reading and writing ability, psychosis and schizophrenia was 1.082 (λ1000 for an equivalent study of 1,000 cases and 1,000 controls = 1.002), indicating minimal evidence of residual population stratification in the overall data set of 52,262 cases and 34,213 controls. Where a particular SNP is only associated with a single disease, the standard meta-analysis methodology has slightly higher power than the subset-based approach. To avoid loss of statistical power in such settings, we looked up every SNP with PSBM<5×10⁻⁷ within and outside the 166 non-MHC susceptibility loci to see whether genome-wide significance (gws; Pdisease<5×10⁻⁸) was achieved in any of the five single-disease versus control subsearches. Univariate association statistics (restricted to a single-disease data set versus the fixed control group) were obtained using the same DLM method. The increased statistical power of the single association test (Pdisease) in comparison to the original individual disease Immunochip analyses[1,36-38,48] is likely due to the fact that the larger Immunochip data set here (except for AS) was used as a screening tool rather than as a replication data set after screening smaller GWAS discovery data sets. The large number of novel AS loci can largely be attributed to an approximately 2.5-fold larger control cohort compared to the original AS Immunochip study (13,578 controls in the original study vs. 34,213 controls in the current study). After “subtracting” the novel trans-ancestry CD/UC loci from the inflammatory bowel disease (IBD) trans-ancestry study[49], 27 of 35 new gws non-MHC risk loci remain for the five diseases under study. Using an alternative method, we identified more pleiotropic loci shared between UC and CD (see Conjunctional False Discovery Rate analysis below).
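
Two of the quantities quoted in the quality-control description are simple enough to check by hand. The sketch below computes PI_HAT from the three IBD probabilities exactly as defined in the text, and rescales a genomic inflation factor to 1,000 cases and 1,000 controls. The rescaling formula is the commonly used one and is an assumption here, since the text does not state which formula was applied; the function names are illustrative only.

```python
def pi_hat(z0: float, z1: float, z2: float) -> float:
    """Proportion IBD as defined in the text: P(IBD=2) + 0.5 * P(IBD=1)."""
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("Z0 + Z1 + Z2 must equal 1")
    return z2 + 0.5 * z1


def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale an inflation factor to 1,000 cases and 1,000 controls.

    Assumed formula (not stated in the text):
    1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


PI_HAT_CUTOFF = 0.1875  # removal threshold quoted in the text

# Illustrative pair with the IBD probabilities expected for full siblings.
sibling_pair = pi_hat(0.25, 0.5, 0.25)
print(sibling_pair, sibling_pair > PI_HAT_CUTOFF)

# Values reported in the text: lambda = 1.082, 52,262 cases, 34,213 controls.
print(round(lambda_1000(1.082, 52262, 34213), 3))
```

With the reported sample sizes, this rescaling gives 1.002, in agreement with the λ1000 value stated above.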

Stepwise subset-based conditional logistic regression

Single and multiple disease-associated (independent) lead SNPs were selected through stepwise regression, conditioning away lead SNPs one at a time until no associations remained, following a recently published stepwise conditional SBM fine-mapping approach[9]. This is an effective method for separating independent signals and assumes that LD between the independent causal variants is low. Significance was defined by Bonferroni correction for the number of LD-independent markers on the Immunochip (0.05/37,377 = 1.34×10⁻⁶).
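
The control flow of such a stepwise procedure can be sketched as a plain forward-selection loop. This is a minimal illustration and not the published implementation: `p_value` is a hypothetical callback standing in for the conditional SBM association test, and the toy P-values are made up.

```python
from typing import Callable, List, Sequence

# Bonferroni threshold quoted in the text: 0.05 / 37,377 LD-independent markers.
THRESHOLD = 0.05 / 37377


def stepwise_select(
    snps: Sequence[str],
    p_value: Callable[[str, List[str]], float],
    threshold: float = THRESHOLD,
) -> List[str]:
    """Forward stepwise selection of independent lead SNPs.

    `p_value(snp, conditioned_on)` is a hypothetical callback returning the
    association P-value of `snp` after conditioning on the SNPs selected so far.
    """
    selected: List[str] = []
    remaining = list(snps)
    while remaining:
        # Re-test every remaining SNP conditional on the current lead SNPs.
        scores = {snp: p_value(snp, selected) for snp in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= threshold:
            break  # no association remains at the chosen significance level
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_p_value(snp: str, conditioned_on: List[str]) -> float:
    # Made-up data: rsA and rsB are independent signals, rsC is only a proxy for rsA.
    base = {"rsA": 1e-12, "rsB": 1e-8, "rsC": 1e-10}
    if snp == "rsC" and "rsA" in conditioned_on:
        return 0.4
    return base[snp]


lead_snps = stepwise_select(["rsA", "rsB", "rsC"], toy_p_value)
print(lead_snps)
```

In the toy data the proxy SNP loses its signal once the first lead SNP is conditioned on, so only the two independent signals are retained.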

Cluster plot inspection

Immunochip intensity cluster plots of all genome-wide significant SNP markers (PSBM<5×10⁻⁷ and Pdisease<5×10⁻⁸; PSBM<5×10⁻⁸) were manually inspected by three different persons using Evoker[50] to ensure that they were well clustered.

To compare different disease models at each locus we used Bayesian multinomial regression. A disease model is a list of diseases that a given locus is associated with (i.e. has a non-zero log odds ratio for), e.g. “associated with CD and UC, but not with PS, AS or PSC” is one disease model, as is “associated with all diseases”. There are a total of 32 possible disease models for the 5 phenotypes, including the null model (“not associated with any disease”). Our aim is to infer the posterior probability for each of these disease models, conditional on the genotype and phenotype data we have seen. We do this in a Bayesian setting, as it naturally handles the different uncertainties in the effect sizes for each disease due to their different sample sizes and powers. The methods we describe below are implemented in the open source Trinculo software package. The Bayesian multinomial logistic regression software calculates a marginal likelihood for each model M, integrating out uncertainty in the effect size, as

$$\Pr(D \mid M) = \int \Pr(D \mid \beta)\,\Pr(\beta \mid M)\,d\beta,$$

where $\beta$ is a vector of log-odds ratios for each disease and D denotes the data. The likelihood $\Pr(D \mid \beta)$ is given by the multinomial logistic likelihood. The prior distribution on the effect sizes is given by $\beta \mid M \sim \mathrm{MVN}(0, \Sigma)$, where $\Sigma$ is the prior covariance matrix for model M. To enforce that phenotypes excluded from the model are not associated with the locus, we set $\Sigma_{ij} = 0$ if either phenotype i or j is not associated with the locus. We use Newton's method to calculate the maximum a posteriori (MAP) estimate of the parameters, and calculate the marginal likelihood using a Laplace approximation around the MAP. We calculate the posterior probability for each model as

$$\Pr(M \mid D) = \frac{\Pr(D \mid M)\,\Pr(M)}{\sum_{M'} \Pr(D \mid M')\,\Pr(M')}.$$

The method thus requires two priors: the model covariance matrices $\Sigma$ and the per-model priors Pr(M). 
We analyze each variant using six different sets of priors: two different forms of the covariance matrix prior combined with three different forms of the per-model prior. For the covariance prior we use (a) a simple independent covariance matrix where $\Sigma_{ii} = 0.2$ if phenotype i is included in the model and all other $\Sigma_{ij} = 0$, and (b) an empirical covariance prior where a single covariance matrix is learned by maximum likelihood assuming all loci are associated with all diseases (i.e. maximizing the product of the marginal likelihoods across all loci under the “associated with all phenotypes” model). For the per-model priors, we use (a) a uniform prior across all models, (b) a uniform prior across the number of phenotypes associated with the locus (so the priors of all models with the same number of associations sum to 1/6), with equal weight given to each model with the same number of phenotypes, and (c) an empirical prior distribution on the number of phenotypes associated with the locus, inferred by maximum likelihood. We calculated posteriors for each model for each risk variant under the six different priors. For each risk variant we then took a vote of the highest-posterior models under each prior, selecting whichever model was considered best by the largest number of priors. We also recorded how many priors voted for that model, and how much posterior probability each prior gives to the winning model. If SNPs represented secondary independent association signals according to the results of the stepwise conditional SBM analysis, they were tested conditional on all other identified genome-wide significant independent signals within the same locus. Within the Bayesian logistic regression, we included the genotype at the lead SNP (and further preceding independent signals) as a covariate in the model.
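
The bookkeeping described above (enumerating the 32 disease models, building the two non-empirical per-model priors, normalizing posteriors and voting across priors) can be sketched as follows. This is a hedged illustration only and not the Trinculo implementation: the log marginal likelihoods are made-up numbers, the empirical priors are omitted because they require fitting to data, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb, exp, log

DISEASES = ("AS", "CD", "PS", "PSC", "UC")

# Every subset of the five diseases is one disease model; the empty set is the null model.
MODELS = [frozenset(c) for k in range(len(DISEASES) + 1)
          for c in combinations(DISEASES, k)]


def uniform_model_prior():
    """Per-model prior (a): equal weight for each of the 32 models."""
    return {m: 1.0 / len(MODELS) for m in MODELS}


def uniform_size_prior():
    """Per-model prior (b): 1/6 per number of associated diseases, split equally within a size."""
    n = len(DISEASES)
    return {m: 1.0 / (n + 1) / comb(n, len(m)) for m in MODELS}


def posterior(log_marginal, prior):
    """Pr(M | D) from log marginal likelihoods log Pr(D | M) and priors Pr(M)."""
    log_post = {m: log_marginal[m] + log(prior[m]) for m in MODELS}
    top = max(log_post.values())  # subtract the maximum for numerical stability
    weights = {m: exp(v - top) for m, v in log_post.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def vote(posteriors):
    """Return the model chosen by most priors, its vote count and its mean posterior."""
    winners = [max(p, key=p.get) for p in posteriors]
    best, count = Counter(winners).most_common(1)[0]
    mean_prob = sum(p[best] for p in posteriors) / len(posteriors)
    return best, count, mean_prob


# Made-up log marginal likelihoods, for illustration only: they favour the CD+UC model.
toy = {m: -10.0 for m in MODELS}
toy[frozenset({"CD", "UC"})] = 0.0
posts = [posterior(toy, uniform_model_prior()), posterior(toy, uniform_size_prior())]
best, count, mean_prob = vote(posts)
print(sorted(best), count)
```

Working in log space and subtracting the maximum before exponentiating avoids underflow when marginal likelihoods differ by many orders of magnitude.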

Disease correlation measure and temporal comorbidity

To determine significant temporal co-occurrences (disease pairs) for the five inflammatory diseases under study, we screened an independent data set covering ICD-10 diagnosis codes from 6,631,920 people of the entire Danish population in the period from 1996 to 2014[5]. We used relative risk (RR) to measure the strength of the correlation between a pair of diagnoses (diagnosis A followed by diagnosis B). RR estimates and associated P-values were calculated using a sampling approach as described in the original study[5]. In brief, given a pair (diagnosis A followed by diagnosis B), the RR of a temporal association was calculated as the ratio of the observed number of patients who had A and then B within 5 years to the number of randomly matched control patients who would get B within 5 years from a matched discharge. Each matched control has the same age (birth decade) and gender as the case and has a discharge of the same type (inpatient, outpatient or emergency room) within the same calendar week as the case's A diagnosis (from which the 5 years to develop B are counted). The significance threshold of P=0.05/823606=1.21×10⁻⁹ was applied using Bonferroni correction for testing 823,606 directed pairs in the original study[5].
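
As a rough illustration of the ratio described above, the sketch below divides the observed number of A-then-B patients by the number of matched controls who develop B. Averaging the control count over several random matched samples is an assumption made here for illustration; the actual sampling scheme is the one described in the original study[5]. The function name and the counts are hypothetical.

```python
from typing import Sequence


def relative_risk(observed_a_then_b: int, matched_control_b_counts: Sequence[int]) -> float:
    """Observed A->B count divided by the mean B count across matched-control samples.

    Each entry of `matched_control_b_counts` is the number of matched controls
    (same birth decade, gender, discharge type and calendar week) who received
    diagnosis B within 5 years in one random sample (assumed sampling scheme).
    """
    if not matched_control_b_counts:
        raise ValueError("at least one matched-control sample is required")
    expected = sum(matched_control_b_counts) / len(matched_control_b_counts)
    if expected == 0:
        raise ZeroDivisionError("no matched controls developed diagnosis B")
    return observed_a_then_b / expected


# Made-up counts: 120 patients had A then B; three control samples yielded 40, 38 and 42.
rr = relative_risk(120, [40, 38, 42])
print(rr)
```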

Distinguishing pleiotropy and heterogeneity

We used BUHMBOX v0.38 (Breaking Up Heterogeneous Mixture Based On Cross-locus correlations)[20] to evaluate whether the sharing of risk alleles observed across pairs of diseases (disease A and disease B) was driven by true pleiotropy where there is pervasive sharing of risk alleles between two diseases, or by heterogeneity where a subgroup of disease A cases has a higher loading of risk alleles for disease B. The BUHMBOX approach has been described in detail elsewhere[20]. Briefly, a genetic risk score (GRS) approach is used to detect significant sharing of risk loci between disease A and disease B. If such genetic sharing is detected using GRSs, the BUHMBOX test statistic – which identifies heterogeneity by calculating the cross-locus correlation of disease B-associated loci among disease A cases – is applied to verify whether these associations are due to heterogeneity (e.g. sample misdiagnosis, excessive comorbidity) as opposed to biological pleiotropy. In the setting of pleiotropy, pleiotropic disease B risk alleles are shared across all disease A cases, whereas in heterogeneity, only a subset of disease A cases share disease B risk alleles. This leads to cross-locus correlations between disease B-associated loci being positive in the presence of heterogeneity, but not in the case of pleiotropy. To strictly control for false positives, BUHMBOX uses LD-pruning, the top seven principal components from PCA, and delta-correlations between cases and controls. First, to quantify pleiotropy for each of the 20 possible pairs of five diseases, we calculated GRSs using known independent risk loci for disease B for each case and control in the disease A sample (based on disease B risk alleles, weighted by effect size) and tested the association of these GRSs with disease A status using logistic regression adjusted for the top seven principal components from PCA. 
The GRS P-values therefore test for enrichment of disease B risk alleles in disease A, and are expected to be significant both in the presence of pleiotropy and heterogeneity. We obtained the list of known associated loci from the previous literature[1,36-38] for AS, CD, PS, PSC and UC. Next, we evaluated the presence of heterogeneity by applying BUHMBOX[20] to each of the 20 pairs of diseases. We estimated the statistical power of BUHMBOX to detect a certain proportion of sample heterogeneity by simulation, using the effect sizes and allele frequencies of the disease B loci and randomly simulating the number of cases and controls in the disease A sample.

The variants identified in this study were annotated using the Ensembl Variant Effect Predictor (VEP)[51] (release 77) to determine genomic position annotations, including the closest gene, and functional consequences (using the most severe consequence according to SIFT[52] and PolyPhen[53]). The --assembly flag was set to GRCh37, and the --pick flag was added to retrieve the most severe consequence for each variant. The UpDownDistance plugin was used to retrieve the nearest gene ID within 10 kb of the variant. TSS distance was retrieved using the TSSDistance plugin. We also included the --regulatory flag to annotate where a variant overlaps a regulatory feature. The DNase I hypersensitive sites (DHS) and promoter annotations were taken from 1KGP annotations[54]. To determine whether any of the lead variants were in high LD (r²>0.8) with a functional variant, the 1000 Genomes Project v3 EUR haplotypes were used (1000 Genomes Phase III 20130502 release). Pairwise LD was calculated between the lead SNPs and all other SNPs within this data set using PLINK v1.09[55]. Only variants that occurred in the 1000 Genomes data set were included in this analysis. The GWAS Catalog[10] was used to identify whether any lead variants or variants in high LD (r²>0.8) with the lead variants had been previously reported in other GWAS. 
ImmunoBase and Europe PubMed Central were also used to determine whether variants had been previously associated with an autoimmune phenotype.
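
Returning to the genetic risk score (GRS) step of the pleiotropy analysis described earlier in this section: a GRS built from disease B risk alleles weighted by effect size can be sketched as a weighted sum of allele dosages. The sketch assumes that the weights are log odds ratios and that every odds ratio is oriented to the risk allele; the SNP names and values are hypothetical. The subsequent logistic regression on disease A status (adjusted for seven principal components) and the BUHMBOX statistic itself are not shown.

```python
from math import log
from typing import Dict


def genetic_risk_score(dosages: Dict[str, float], odds_ratios: Dict[str, float]) -> float:
    """Sum of disease B risk-allele dosages (0 to 2), each weighted by log(odds ratio)."""
    return sum(dosages[snp] * log(odds_ratios[snp]) for snp in odds_ratios)


# Hypothetical disease B loci with odds ratios oriented to the risk allele.
disease_b_odds_ratios = {"snp1": 1.2, "snp2": 1.5}

# Hypothetical disease A sample: risk-allele dosages per individual.
individual_1 = {"snp1": 2.0, "snp2": 0.0}
individual_2 = {"snp1": 0.0, "snp2": 0.0}

score_1 = genetic_risk_score(individual_1, disease_b_odds_ratios)
score_2 = genetic_risk_score(individual_2, disease_b_odds_ratios)
print(score_1 > score_2)
```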

Table 1

Bayesian logistic regression analysis identified 31 loci with 34 independent associations for which we determined a specific disease model constellation with high certainty (MeanProbmodel≥0.6). A disease model is a list of diseases that a given locus is associated with (i.e. has a non-zero log odds ratio for). Mean posterior probabilities (MeanProb) were calculated by a consensus-finding process that merges results from six different priors (Supplementary Table 3c, see Methods). Loci with very high certainty (MeanProbmodel≥0.8) for the best disease model are shown in bold type. Of the 34 associations with MeanProbmodel≥0.6, 25 signals involve all five diseases, 6 signals involve four diseases and 3 signals are unique to a single disease.

Locus	Signal	Chr	Locus_pos_L	Locus_pos_R	SNP	Nearby gene	OR(AS)	OR(CD)	OR(PS)	OR(PSC)	OR(UC)	Best model (VoteWinner)	VoteCount	MeanProb
119	1	12	111702182	113030487	rs3184504	SH2B3	0.915	1.062	1.055	1.189	1.047	PS_AS_CD_UC_PSC	6	0.978
130	1	16	11018622	11496579	rs367569	TNP2	0.947	0.906	0.893	0.920	0.929	PS_AS_CD_UC_PSC	6	0.963
166	1	22	21811991	22076405	rs2266961	UBE2L3	1.104	1.136	1.089	1.070	1.082	PS_AS_CD_UC_PSC	6	0.961
155	1	20	31201111	31588992	rs6058869	DNMT3B	1.091	1.060	1.071	1.062	1.060	PS_AS_CD_UC_PSC	6	0.953
91	1	10	6028491	6197536	rs61839660	IL2RA	1.079	1.198	1.152	0.719	1.022	PS_AS_CD_UC_PSC	6	0.943
8	1	1	67301096	67942593	rs80174646	IL23R	0.602	0.449	0.717	0.872	0.620	PS_AS_CD_UC_PSC	6	0.915
133	1	16	28289243	29025978	rs26528	IL27	1.126	1.152	1.049	1.076	1.082	PS_AS_CD_UC_PSC	6	0.91
39	1	2	241553993	241664801	rs3749171	GPR35	1.179	1.081	1.044	1.202	1.164	PS_AS_CD_UC_PSC	6	0.878
151	1	19	10364404	10625796	rs74956615	RAVER1	0.809	0.778	0.627	0.831	0.887	PS_AS_CD_UC_PSC	6	0.841
95	2	10	64284517	64759410	rs10761648	ZNF365	1.087	1.115	1.042	1.106	1.161	PS_AS_CD_UC_PSC	6	0.837
32	3	2	162960873	163358537	rs35667974	IFIH1	1.140	1.175	0.710	1.323	1.377	PS_AS_CD_UC_PSC	6	0.83
22	1	2	24684352	25594432	rs13407913	ADCY3	1.060	1.127	1.063	1.078	1.072	PS_AS_CD_UC_PSC	6	0.811
153	1	19	49092430	49278082	rs679574	FUT2	1.065	1.114	1.078	1.100	1.027	PS_AS_CD_UC_PSC	6	0.798
126	1	14	75698304	75749875	rs1569328	FOS	0.936	0.902	0.913	0.912	0.950	PS_AS_CD_UC_PSC	6	0.791
151	4	19	10364404	10625796	rs35074907	KEAP1	1.115	1.294	1.133	1.318	1.147	PS_AS_CD_UC_PSC	6	0.788
103	2	11	57887309	58457495	rs10750899	OR5B21	1.232	1.208	1.067	1.319	1.357	PS_AS_CD_UC_PSC	6	0.767
143	1	17	57487538	58119648	rs1292035	RPS6KB1	1.101	1.109	1.100	1.044	1.071	PS_AS_CD_UC_PSC	5	0.711
147	1	18	12516768	12926278	rs12968719	PTPN2	1.116	1.241	1.056	1.120	1.144	PS_AS_CD_UC_PSC	5	0.699
32	4	2	162960873	163358537	rs72871627	IFIH1	1.277	1.153	0.646	1.139	1.473	PS_AS_CD_UC_PSC	5	0.693
62	4	5	158496825	158948962	rs6556411	AC008697.1	0.913	0.913	1.092	0.951	0.907	PS_AS_CD_UC_PSC	5	0.681
52	1	5	38800374	39031577	rs395157	OSMR	0.961	0.911	0.949	0.917	0.921	PS_AS_CD_UC_PSC	5	0.678
48	1	4	103388565	104010837	rs3774937	NFKB1	1.120	0.992	1.041	1.167	1.107	PS_AS_UC_PSC	6	0.673
151	3	19	10364404	10625796	rs12720356	TYK2	1.085	1.083	0.775	0.897	1.101	PS_AS_CD_UC_PSC	4	0.673
8	4	1	67301096	67942593	rs183686347	IL23R	1.773	2.725	1.357	1.142	1.887	PS_AS_CD_UC_PSC	5	0.664
71	1	6	159322326	159545322	rs2451258	-	1.086	1.114	1.111	0.918	0.990	PS_AS_CD_PSC	4	0.658
151	2	19	10364404	10625796	rs35018800	TYK2	0.598	0.641	0.576	0.799	0.723	PS_AS_CD_UC_PSC	5	0.653
163	1	21	40413101	40483777	rs9977672	-	0.825	0.920	1.006	0.786	0.799	AS_CD_UC_PSC	4	0.652
165	1	21	45596207	45702354	rs4456788	AP001057.1	1.081	1.135	0.993	1.126	1.110	AS_CD_UC_PSC	4	0.652
11	1	1	152534954	152860452	rs6693105	LCE3B	0.994	0.998	0.799	1.006	1.006	PS	6	0.648
35	1	2	218877398	219266204	rs11676348	CXCR2	1.055	1.081	0.980	1.136	1.065	AS_CD_UC_PSC	4	0.638
160	1	20	62180117	62488635	rs6062496	TNFRSF6B	1.000	0.872	0.962	0.923	0.877	PS_CD_UC_PSC	6	0.628
110	1	11	114256749	114589971	rs661054	NXPE1	0.984	0.998	0.982	0.993	0.883	UC	6	0.614
4	3	1	20060965	20304744	rs4655215	RNF186	0.976	0.989	0.974	0.994	1.171	UC	6	0.603
32	1	2	162960873	163358537	rs2111485	FAP	1.030	1.054	0.852	1.074	1.082	PS_AS_CD_UC_PSC	5	0.6

Locus: number of the locus defined by annotation of association boundaries (see Methods); Signal: number of the independent signal (from conditional analysis) within a locus; Chr: chromosome; Locus_pos_L/Locus_pos_R: left/right association boundaries for the locus (see Methods). Genomic positions were retrieved from NCBI's dbSNP build v142 (genome build hg19); SNP: rs ID; Nearby gene: candidate gene nearest to the index SNP, provided that a gene was within 10 kb of the SNP; OR: single-disease odds ratio for ankylosing spondylitis (AS), Crohn's disease (CD), psoriasis (PS), primary sclerosing cholangitis (PSC) and ulcerative colitis (UC); Best model (VoteWinner): disease model with the highest posterior probability under six different priors; VoteCount: number of priors that voted for that model; MeanProb: mean posterior probability (MeanProbmodel) for the proposed model and risk variant across the six different priors.

53 in total

1. Protein kinase C-theta mediates negative feedback on regulatory T cell function.

Authors: Alexandra Zanin-Zhorov; Yi Ding; Sudha Kumari; Mukundan Attur; Keli L Hippen; Maryanne Brown; Bruce R Blazar; Steven B Abramson; Juan J Lafaille; Michael L Dustin
Journal: Science Date: 2010-03-25 Impact factor: 47.728

2. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

3. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors: Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal: Proc Natl Acad Sci U S A Date: 2009-05-27 Impact factor: 11.205

4. A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits.

Authors: Samsiddhi Bhattacharjee; Preetha Rajaraman; Kevin B Jacobs; William A Wheeler; Beatrice S Melin; Patricia Hartge; Meredith Yeager; Charles C Chung; Stephen J Chanock; Nilanjan Chatterjee
Journal: Am J Hum Genet Date: 2012-05-04 Impact factor: 11.025

Review 5. Genetic insights into common pathways and complex relationships among immune-mediated diseases.

Authors: Miles Parkes; Adrian Cortes; David A van Heel; Matthew A Brown
Journal: Nat Rev Genet Date: 2013-08-06 Impact factor: 53.242

6. A promoter-level mammalian expression atlas.

Authors: Alistair R R Forrest; Hideya Kawaji; Michael Rehli; J Kenneth Baillie; Michiel J L de Hoon; Vanja Haberle; Timo Lassmann; Ivan V Kulakovskiy; Marina Lizio; Masayoshi Itoh; Robin Andersson; Christopher J Mungall; Terrence F Meehan; Sebastian Schmeier; Nicolas Bertin; Mette Jørgensen; Emmanuel Dimont; Erik Arner; Christian Schmidl; Ulf Schaefer; Yulia A Medvedeva; Charles Plessy; Morana Vitezic; Jessica Severin; Colin A Semple; Yuri Ishizu; Robert S Young; Margherita Francescatto; Intikhab Alam; Davide Albanese; Gabriel M Altschuler; Takahiro Arakawa; John A C Archer; Peter Arner; Magda Babina; Sarah Rennie; Piotr J Balwierz; Anthony G Beckhouse; Swati Pradhan-Bhatt; Judith A Blake; Antje Blumenthal; Beatrice Bodega; Alessandro Bonetti; James Briggs; Frank Brombacher; A Maxwell Burroughs; Andrea Califano; Carlo V Cannistraci; Daniel Carbajo; Yun Chen; Marco Chierici; Yari Ciani; Hans C Clevers; Emiliano Dalla; Carrie A Davis; Michael Detmar; Alexander D Diehl; Taeko Dohi; Finn Drabløs; Albert S B Edge; Matthias Edinger; Karl Ekwall; Mitsuhiro Endoh; Hideki Enomoto; Michela Fagiolini; Lynsey Fairbairn; Hai Fang; Mary C Farach-Carson; Geoffrey J Faulkner; Alexander V Favorov; Malcolm E Fisher; Martin C Frith; Rie Fujita; Shiro Fukuda; Cesare Furlanello; Masaaki Furino; Jun-ichi Furusawa; Teunis B Geijtenbeek; Andrew P Gibson; Thomas Gingeras; Daniel Goldowitz; Julian Gough; Sven Guhl; Reto Guler; Stefano Gustincich; Thomas J Ha; Masahide Hamaguchi; Mitsuko Hara; Matthias Harbers; Jayson Harshbarger; Akira Hasegawa; Yuki Hasegawa; Takehiro Hashimoto; Meenhard Herlyn; Kelly J Hitchens; Shannan J Ho Sui; Oliver M Hofmann; Ilka Hoof; Furni Hori; Lukasz Huminiecki; Kei Iida; Tomokatsu Ikawa; Boris R Jankovic; Hui Jia; Anagha Joshi; Giuseppe Jurman; Bogumil Kaczkowski; Chieko Kai; Kaoru Kaida; Ai Kaiho; Kazuhiro Kajiyama; Mutsumi Kanamori-Katayama; Artem S Kasianov; Takeya Kasukawa; Shintaro Katayama; Sachi Kato; Shuji Kawaguchi; Hiroshi Kawamoto; Yuki I Kawamura; 
Tsugumi Kawashima; Judith S Kempfle; Tony J Kenna; Juha Kere; Levon M Khachigian; Toshio Kitamura; S Peter Klinken; Alan J Knox; Miki Kojima; Soichi Kojima; Naoto Kondo; Haruhiko Koseki; Shigeo Koyasu; Sarah Krampitz; Atsutaka Kubosaki; Andrew T Kwon; Jeroen F J Laros; Weonju Lee; Andreas Lennartsson; Kang Li; Berit Lilje; Leonard Lipovich; Alan Mackay-Sim; Ri-ichiroh Manabe; Jessica C Mar; Benoit Marchand; Anthony Mathelier; Niklas Mejhert; Alison Meynert; Yosuke Mizuno; David A de Lima Morais; Hiromasa Morikawa; Mitsuru Morimoto; Kazuyo Moro; Efthymios Motakis; Hozumi Motohashi; Christine L Mummery; Mitsuyoshi Murata; Sayaka Nagao-Sato; Yutaka Nakachi; Fumio Nakahara; Toshiyuki Nakamura; Yukio Nakamura; Kenichi Nakazato; Erik van Nimwegen; Noriko Ninomiya; Hiromi Nishiyori; Shohei Noma; Shohei Noma; Tadasuke Noazaki; Soichi Ogishima; Naganari Ohkura; Hiroko Ohimiya; Hiroshi Ohno; Mitsuhiro Ohshima; Mariko Okada-Hatakeyama; Yasushi Okazaki; Valerio Orlando; Dmitry A Ovchinnikov; Arnab Pain; Robert Passier; Margaret Patrikakis; Helena Persson; Silvano Piazza; James G D Prendergast; Owen J L Rackham; Jordan A Ramilowski; Mamoon Rashid; Timothy Ravasi; Patrizia Rizzu; Marco Roncador; Sugata Roy; Morten B Rye; Eri Saijyo; Antti Sajantila; Akiko Saka; Shimon Sakaguchi; Mizuho Sakai; Hiroki Sato; Suzana Savvi; Alka Saxena; Claudio Schneider; Erik A Schultes; Gundula G Schulze-Tanzil; Anita Schwegmann; Thierry Sengstag; Guojun Sheng; Hisashi Shimoji; Yishai Shimoni; Jay W Shin; Christophe Simon; Daisuke Sugiyama; Takaai Sugiyama; Masanori Suzuki; Naoko Suzuki; Rolf K Swoboda; Peter A C 't Hoen; Michihira Tagami; Naoko Takahashi; Jun Takai; Hiroshi Tanaka; Hideki Tatsukawa; Zuotian Tatum; Mark Thompson; Hiroo Toyodo; Tetsuro Toyoda; Elvind Valen; Marc van de Wetering; Linda M van den Berg; Roberto Verado; Dipti Vijayan; Ilya E Vorontsov; Wyeth W Wasserman; Shoko Watanabe; Christine A Wells; Louise N Winteringham; Ernst Wolvetang; Emily J Wood; Yoko Yamaguchi; Masayuki 
Yamamoto; Misako Yoneda; Yohei Yonekura; Shigehiro Yoshida; Susan E Zabierowski; Peter G Zhang; Xiaobei Zhao; Silvia Zucchelli; Kim M Summers; Harukazu Suzuki; Carsten O Daub; Jun Kawai; Peter Heutink; Winston Hide; Tom C Freeman; Boris Lenhard; Vladimir B Bajic; Martin S Taylor; Vsevolod J Makeev; Albin Sandelin; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal: Nature Date: 2014-03-27 Impact factor: 49.962

7. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47.

Authors: Carl A Anderson; Gabrielle Boucher; Charlie W Lees; Andre Franke; Mauro D'Amato; Kent D Taylor; James C Lee; Philippe Goyette; Marcin Imielinski; Anna Latiano; Caroline Lagacé; Regan Scott; Leila Amininejad; Suzannah Bumpstead; Leonard Baidoo; Robert N Baldassano; Murray Barclay; Theodore M Bayless; Stephan Brand; Carsten Büning; Jean-Frédéric Colombel; Lee A Denson; Martine De Vos; Marla Dubinsky; Cathryn Edwards; David Ellinghaus; Rudolf S N Fehrmann; James A B Floyd; Timothy Florin; Denis Franchimont; Lude Franke; Michel Georges; Jürgen Glas; Nicole L Glazer; Stephen L Guthery; Talin Haritunians; Nicholas K Hayward; Jean-Pierre Hugot; Gilles Jobin; Debby Laukens; Ian Lawrance; Marc Lémann; Arie Levine; Cecile Libioulle; Edouard Louis; Dermot P McGovern; Monica Milla; Grant W Montgomery; Katherine I Morley; Craig Mowat; Aylwin Ng; William Newman; Roel A Ophoff; Laura Papi; Orazio Palmieri; Laurent Peyrin-Biroulet; Julián Panés; Anne Phillips; Natalie J Prescott; Deborah D Proctor; Rebecca Roberts; Richard Russell; Paul Rutgeerts; Jeremy Sanderson; Miquel Sans; Philip Schumm; Frank Seibold; Yashoda Sharma; Lisa A Simms; Mark Seielstad; A Hillary Steinhart; Stephan R Targan; Leonard H van den Berg; Morten Vatn; Hein Verspaget; Thomas Walters; Cisca Wijmenga; David C Wilson; Harm-Jan Westra; Ramnik J Xavier; Zhen Z Zhao; Cyriel Y Ponsioen; Vibeke Andersen; Leif Torkvist; Maria Gazouli; Nicholas P Anagnou; Tom H Karlsen; Limas Kupcinskas; Jurgita Sventoraityte; John C Mansfield; Subra Kugathasan; Mark S Silverberg; Jonas Halfvarson; Jerome I Rotter; Christopher G Mathew; Anne M Griffiths; Richard Gearry; Tariq Ahmad; Steven R Brant; Mathias Chamaillard; Jack Satsangi; Judy H Cho; Stefan Schreiber; Mark J Daly; Jeffrey C Barrett; Miles Parkes; Vito Annese; Hakon Hakonarson; Graham Radford-Smith; Richard H Duerr; Séverine Vermeire; Rinse K Weersma; John D Rioux
Journal: Nat Genet Date: 2011-02-06 Impact factor: 38.330

8. Enhanced meta-analysis and replication studies identify five new psoriasis susceptibility loci.

Authors: Lam C Tsoi; Sarah L Spain; Eva Ellinghaus; Philip E Stuart; Francesca Capon; Jo Knight; Trilokraj Tejasvi; Hyun M Kang; Michael H Allen; Sylviane Lambert; Stefan W Stoll; Stephan Weidinger; Johann E Gudjonsson; Sulev Koks; Külli Kingo; Tonu Esko; Sayantan Das; Andres Metspalu; Michael Weichenthal; Charlotta Enerback; Gerald G Krueger; John J Voorhees; Vinod Chandran; Cheryl F Rosen; Proton Rahman; Dafna D Gladman; Andre Reis; Rajan P Nair; Andre Franke; Jonathan N W N Barker; Goncalo R Abecasis; Richard C Trembath; James T Elder
Journal: Nat Commun Date: 2015-05-05 Impact factor: 14.919

9. Fast principal component analysis of large-scale genome-wide data.

Authors: Gad Abraham; Michael Inouye
Journal: PLoS One Date: 2014-04-09 Impact factor: 3.240

10. Integrative analysis of 111 reference human epigenomes.

Authors: Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal: Nature Date: 2015-02-19 Impact factor: 69.504

245 in total

1. Subset-Based Analysis Using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes.

Authors: Youfei Yu; Lu Xia; Seunggeun Lee; Xiang Zhou; Heather M Stringham; Michael Boehnke; Bhramar Mukherjee
Journal: Hum Hered Date: 2019-05-27 Impact factor: 0.444

Review 2. Genetics and the Causes of Ankylosing Spondylitis.

Authors: Aimee Hanson; Matthew A Brown
Journal: Rheum Dis Clin North Am Date: 2017-08 Impact factor: 2.670

Review 3. Biomarker development for axial spondyloarthritis.

Authors: Matthew A Brown; Zhixiu Li; Kim-Anh Lê Cao
Journal: Nat Rev Rheumatol Date: 2020-06-30 Impact factor: 20.543

Review 4. Rare and common variant discovery in complex disease: the IBD case study.

Authors: Guhan R Venkataraman; Manuel A Rivas
Journal: Hum Mol Genet Date: 2019-11-21 Impact factor: 6.150

Review 5. Towards a Better Classification and Novel Therapies Based on the Genetics of Systemic Sclerosis.

Authors: Marialbert Acosta-Herrera; Elena López-Isac; Javier Martín
Journal: Curr Rheumatol Rep Date: 2019-07-15 Impact factor: 4.592

6. Differential miRNA Expression in Ileal and Colonic Tissues Reveals an Altered Immunoregulatory Molecular Profile in Individuals With Crohn's Disease versus Healthy Subjects.

Authors: Aylia Mohammadi; Orlaith B Kelly; Michelle I Smith; Boyko Kabakchiev; Mark S Silverberg
Journal: J Crohns Colitis Date: 2019-10-28 Impact factor: 9.071

Review 7. The genetics revolution in rheumatology: large scale genomic arrays and genetic mapping.

Authors: Stephen Eyre; Gisela Orozco; Jane Worthington
Journal: Nat Rev Rheumatol Date: 2017-06-01 Impact factor: 20.543

8. Epigenetics in the Primary Biliary Cholangitis and Primary Sclerosing Cholangitis.

Authors: Angela C Cheung; Nicholas F LaRusso; Gregory J Gores; Konstantinos N Lazaridis
Journal: Semin Liver Dis Date: 2017-05-31 Impact factor: 6.115

Review 9. Crohn's disease.

Authors: Giulia Roda; Siew Chien Ng; Paulo Gustavo Kotze; Marjorie Argollo; Remo Panaccione; Antonino Spinelli; Arthur Kaser; Laurent Peyrin-Biroulet; Silvio Danese
Journal: Nat Rev Dis Primers Date: 2020-04-02 Impact factor: 52.329

Review 10. New insights into the epigenetics of inflammatory rheumatic diseases.

Authors: Esteban Ballestar; Tianlu Li
Journal: Nat Rev Rheumatol Date: 2017-09-14 Impact factor: 20.543