
Pervasive sharing of genetic effects in autoimmune disease.

Chris Cotsapas, Benjamin F Voight, Elizabeth Rossin, Kasper Lage, Benjamin M Neale, Chris Wallace, Gonçalo R Abecasis, Jeffrey C Barrett, Timothy Behrens, Judy Cho, Philip L De Jager, James T Elder, Robert R Graham, Peter Gregersen, Lars Klareskog, Katherine A Siminovitch, David A van Heel, Cisca Wijmenga, Jane Worthington, John A Todd, David A Hafler, Stephen S Rich, Mark J Daly.

Abstract

Genome-wide association (GWA) studies have identified numerous, replicable, genetic associations between common single nucleotide polymorphisms (SNPs) and risk of common autoimmune and inflammatory (immune-mediated) diseases, some of which are shared between two diseases. Along with epidemiological and clinical evidence, this suggests that some genetic risk factors may be shared across diseases, as is the case with alleles in the Major Histocompatibility Locus. In this work we evaluate the extent of this sharing for 107 immune disease-risk SNPs in seven diseases: celiac disease, Crohn's disease, multiple sclerosis, psoriasis, rheumatoid arthritis, systemic lupus erythematosus, and type 1 diabetes. We have developed a novel statistic for Cross Phenotype Meta-Analysis (CPMA) which detects association of a SNP to multiple, but not necessarily all, phenotypes. With it, we find evidence that 47/107 (44%) immune-mediated disease risk SNPs are associated to multiple, but not all, immune-mediated diseases (SNP-wise P(CPMA)<0.01). We also show that distinct groups of interacting proteins are encoded near SNPs which predispose to the same subsets of diseases; we propose these as the mechanistic basis of shared disease risk. We are thus able to leverage genetic data across diseases to construct biological hypotheses about the underlying mechanism of pathogenesis.

Year:  2011        PMID: 21852963      PMCID: PMC3154137          DOI: 10.1371/journal.pgen.1002254

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   5.917


Introduction

The human immune-mediated diseases are the result of aberrant immune responses. These immune responses may lead to chronic inflammation and tissue destruction, often targeting a specific organ site. The outcome of this process is immune-mediated inflammatory and autoimmune disease, affecting approximately 5% of the population [1]. Extensive clinical and epidemiologic observations have shown that immune-mediated inflammatory and autoimmune diseases can occur either in the same individual or in closely related family members. This clustering of multiple diseases appears more frequently than expected if disease processes were independent. As each of the immune-mediated inflammatory and autoimmune diseases has strong genetic influences on disease risk [2]–[7], the observed clustering of multiple diseases could be due to an overlap in the causal genes and pathways [8], [9]. The patterns of clustering of diseases across the population are complex [10] – each disease has a prevalence between 0.01%–3%, so direct assessment of co-aggregation within individuals or families does not result in the very large samples required for genetic or epidemiological investigation. Thus it is unsurprising that to date, these observations have yet to be translated into determinants of the shared molecular etiologies of disease. Recent GWA studies in immune-mediated and autoimmune diseases have identified 140 regions of the genome with statistically significant and robust evidence of presence of disease susceptibility loci. A subset of these loci have been shown to modulate risk of multiple diseases [3], [6], [11]–[14]. In addition, there is evidence that loci predisposing to one disease can have effects on risk of a second disease [15], although the risk allele for one disease may not be the same as for the second [16]. Together, these observations support the hypothesis of a common genetic basis of immune-mediated and autoimmune diseases [17]. 
There is now the ability to estimate both the number of loci contributing to risk of multiple diseases and the spectrum of diseases that each locus influences. In addition, grouping variants by the diseases they influence should provide insight into the specific biological processes underlying co-morbidity and disease risk. In this report, we systematically investigate the genetic commonality in immune-mediated inflammatory and autoimmune diseases by examining the contributions of associated genomic risk regions in seven diseases: celiac disease (CeD), Crohn's disease (CD), multiple sclerosis (MS), psoriasis (Ps), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE) and type 1 diabetes (T1D). We find that nearly half of loci identified in GWA studies of an individual disease influence risk to at least two diseases, arguing for a genetic basis to co-morbidity. We also find several variants with opposing risk profiles in different diseases. Supporting the idea that common patterns of association implicate shared biological processes, we further demonstrate that loci clustered by the pattern of diseases they affect harbor genes encoding interacting proteins at a much higher rate than expected by chance. These results suggest that multi-phenotype mapping will identify the molecular mechanisms underlying co-morbid immune-mediated inflammatory and autoimmune diseases.

Results

We first test our hypothesis of common genetic determinants by examining evidence of association of genetic variants in known immune-mediated and autoimmune disease susceptibility loci to multiple disease phenotypes. We collated a list of 140 single nucleotide polymorphisms (SNPs) representing reported associations to at least one immune-mediated disease at genome-wide significance levels. Where data for the reported SNP itself were not available in our GWA studies (Table 1), we chose a proxy in high linkage disequilibrium to the reported marker (r² > 0.9 in HapMap/CEU). We excluded SNPs in the human Major Histocompatibility Complex (MHC) from this analysis, as its role in many of these diseases is well-established and the classically associated alleles in the HLA region are not well captured by SNPs [18]. We were able to acquire data for either the reported SNP or a good proxy in 107 of 140 cases, and assembled genotype test summaries for these from previously described GWA studies representing over 26,000 disease cases (Table 1).
Table 1

Participating studies.

Disease | Cases | Controls | Reference
Celiac disease | 3796 | 8154 | 22
Crohn's disease | 3230 | 4829 | 1
Multiple sclerosis | 2624 | 7220 | 4
Psoriasis | 1359 | 1400 | 5
Rheumatoid arthritis | 5539 | 20169 | 6
Systemic Lupus Erythematosus | 1963 | 4329 | 23
Type 1 diabetes | 7514 | 9045 | 24

Data were collated for seven phenotypes from meta-analyses incorporating all known genome-wide association studies. SLE is the exception as no comprehensive meta-analysis has yet been published; data were instead obtained from a recent meta-analysis including some, but not all, known genome-wide association studies. Note that controls overlap in some cases due to the use of common shared sample genotypes.

We have developed a cross-phenotype meta-analysis (CPMA) statistic to assess association across multiple phenotypes. The CPMA statistic determines evidence for the hypothesis that each independent SNP has multiple phenotypic associations. Support for this hypothesis would be shown by deviations from expected uniformity of the distribution of association p-values, indicative of multiple associations. The likelihood of the observed rate of exponential decay of −log10(p) is calculated and compared to the null expectation (the decay rate should be unity) as a likelihood ratio test (see Materials and Methods for details). This CPMA statistic has one degree of freedom, as it measures a deviation in p-value behavior instead of testing all possible combinations of diseases for association to each SNP. A total of 47 of the 107 SNPs tested have evidence of association to multiple diseases (SNP-wise P(CPMA) < 0.01; expectation roughly 1 by chance; binomial probability of observing this result p = 3×10⁻⁶⁴). This highly significant result confirms widespread sharing of genetic loci between immune-mediated inflammatory and autoimmune diseases. Further, these “multi-phenotype” SNPs include many loci not previously known to be shared across diseases, as well as new predictions of association for previously known shared loci (Table 2).
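The chance expectation and binomial probability quoted above can be checked with a few lines of Python. This is only a sketch: it assumes each of the 107 SNPs independently passes the 0.01 threshold with probability 0.01 under the null, and it sums the upper tail, which may or may not be exactly how the quoted figure was computed. The function name `binomial_tail` is illustrative.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_snps, n_hits, alpha = 107, 47, 0.01

# Number of SNPs expected to pass the threshold by chance alone.
expected_by_chance = n_snps * alpha

# Probability of seeing 47 or more hits among 107 SNPs under the null.
tail = binomial_tail(n_hits, n_snps, alpha)

print(f"expected hits by chance: {expected_by_chance:.2f}")
print(f"P(X >= {n_hits}) = {tail:.1e}")
```

Under these assumptions the expectation is about one SNP and the tail probability is of the order of 10⁻⁶⁴, in line with the values reported in the text.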
Table 2

SNPs associated with multiple phenotypes.

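SNP | Chr | Position | Aj | Am | Genes | RA Z | RA p | Psoriasis Z | Psoriasis p | MS Z | MS p | SLE Z | SLE p | Crohn Z | Crohn p | Coeliac Z | Coeliac p | T1D Z | T1D p | CPMA p | Reference
rs10889677 | 1 | 67437141 | C | A | IL23R | 0.0 | 9.8E-01 | 5.1 | 3.5E-07 | −2.6 | 4.9E-03 | −0.3 | 6.2E-01 | 10.3 | 9.0E-25 | 1.1 | 8.7E-01 | 0.1 | 5.3E-01 | 6.9E-25 | 37
rs3087243 | 2 | 204564425 | G | A | CTLA4 | −5.7 | 1.2E-08 | 0.3 | 8.0E-01 | −0.7 | 2.5E-01 | −0.5 | 3.0E-01 | −1.2 | 2.3E-01 | 2.9 | 2.2E-03 | −8.5 | 1.1E-17 | 2.8E-21 | 36
rs2542151 | 18 | 12769947 | T | G | PTPN2 | 4.2 | 3.0E-05 | 2.0 | 4.1E-02 | −0.8 | 2.2E-01 | 1.1 | 1.5E-01 | 6.8 | 1.2E-11 | 4.8 | 6.9E-07 | 7.1 | 5.9E-13 | 5.3E-19 | 1
rs2201841 | 1 | 67406223 | A | G | IL23R | 0.0 | 9.9E-01 | 5.2 | 2.7E-07 | −2.2 | 1.3E-02 | −0.3 | 6.3E-01 | 9.9 | 3.5E-23 | 1.0 | 8.4E-01 | 0.2 | 5.7E-01 | 3.7E-18 | 5
rs11209032 | 1 | 67452113 | G | A | IL23R | −0.6 | 5.2E-01 | 5.3 | 1.3E-07 | −1.2 | 1.2E-01 | −0.7 | 2.3E-01 | 8.4 | 3.1E-17 | 1.2 | 8.8E-01 | 0.2 | 4.4E-01 | 6.4E-18 | 33
rs1893217 | 18 | 12799340 | A | G | PTPN2 | 4.2 | 2.4E-05 | 2.1 | 3.9E-02 | −0.6 | 2.8E-01 | 1.0 | 1.5E-01 | 6.5 | 6.5E-11 | 4.9 | 6.1E-07 | 7.4 | 8.2E-14 | 3.4E-17 | 24
rs917997 | 2 | 102529086 | C | T | IL18RAP | 0.3 | 7.8E-01 | −0.2 | 8.8E-01 | 0.3 | 3.7E-01 | 1.1 | 1.3E-01 | 4.2 | 2.2E-05 | 7.6 | 1.1E-14 | −1.5 | 9.4E-01 | 4.9E-13 | 3
rs12708716 | 16 | 11087374 | A | G | CLEC16A | 0.2 | 8.3E-01 | −0.3 | 8.1E-01 | 3.7 | 1.1E-04 | −1.0 | 1.5E-01 | −0.6 | 5.5E-01 | 1.6 | 5.3E-02 | −8.2 | 1.2E-16 | 2.0E-12 | 32
rs2872507 | 17 | 35294289 | G | A | ORMDL3 | 4.1 | 4.7E-05 | 0.3 | 8.0E-01 | −3.3 | 5.5E-04 | −0.3 | 6.3E-01 | 4.7 | 2.1E-06 | 0.7 | 7.6E-01 | 5.0 | 2.5E-07 | 4.1E-09 | 1
rs3821236 | 2 | 191728264 | G | A | STAT4 | 4.7 | 2.5E-06 | −1.7 | 9.5E-02 | 0.1 | 4.6E-01 | 6.2 | 2.1E-10 | −1.7 | 9.6E-02 | 1.9 | 3.1E-02 | 3.4 | 3.5E-04 | 9.8E-08 | 2
rs6441961 | 3 | 46327388 | C | T | CCR1 | 1.1 | 2.7E-01 | −0.3 | 7.5E-01 | 0.1 | 4.7E-01 | 1.3 | 9.1E-01 | −0.5 | 6.1E-01 | 5.7 | 6.9E-09 | 3.7 | 1.0E-04 | 1.9E-07 | 3
rs2290400 | 17 | 35319766 | C | T | ORMDL3 | −3.9 | 1.1E-04 | 0.4 | 7.2E-01 | 3.3 | 5.5E-04 | 0.2 | 5.7E-01 | 3.7 | 2.4E-04 | 0.2 | 4.1E-01 | −5.1 | 1.4E-07 | 1.2E-06 | 24
rs7197475 | 16 | 30550368 | C | T | 16p11.2 | 2.9 | 3.8E-03 | 2.9 | 4.3E-03 | −0.9 | 1.7E-01 | 0.7 | 2.4E-01 | −4.2 | 2.5E-05 | 0.4 | 3.4E-01 | −0.9 | 1.9E-01 | 2.0E-06 | 35
rs4917014 | 7 | 50083124 | T | G | IKZF1 | −2.5 | 1.3E-02 | 0.9 | 3.8E-01 | 3.5 | 2.6E-04 | −2.9 | 1.9E-03 | 1.6 | 1.1E-01 | 0.9 | 1.8E-01 | −3.3 | 5.2E-04 | 4.9E-06 | 35
rs6822844 | 4 | 123867026 | G | T | IL2-IL21 | −3.4 | 6.5E-04 | −1.7 | 8.7E-02 | 1.2 | 1.1E-01 | −0.2 | 4.3E-01 | −2.4 | 1.5E-02 | 3.3 | 4.9E-04 | −1.1 | 1.5E-01 | 6.2E-06 | 38
rs10517086 | 4 | 25761780 | G | A | 4p15.2 | 5.1 | 2.8E-07 | 0.6 | 5.7E-01 | 0.4 | 3.6E-01 | 0.1 | 5.5E-01 | 2.8 | 5.5E-03 | 0.2 | 5.6E-01 | 4.9 | 5.7E-07 | 2.3E-05 | 24
rs11203203 | 21 | 42709255 | G | A | UBASH3A | 4.2 | 2.5E-05 | 1.3 | 1.9E-01 | 0.2 | 4.4E-01 | 1.6 | 5.3E-02 | 0.0 | 9.8E-01 | 3.1 | 1.0E-03 | 6.6 | 1.8E-11 | 2.5E-05 | 24
rs4728142 | 7 | 128167918 | G | A | IRF5 | 4.5 | 7.1E-06 | 1.4 | 1.8E-01 | −1.5 | 6.4E-02 | 6.2 | 2.9E-10 | 0.8 | 4.4E-01 | 2.1 | 1.6E-02 | −0.2 | 4.1E-01 | 4.4E-05 | 35
rs11755527 | 6 | 91014952 | C | G | BACH2 | −1.4 | 1.6E-01 | −0.7 | 5.1E-01 | −2.8 | 2.7E-03 | 1.0 | 8.5E-01 | −2.6 | 1.0E-02 | 3.5 | 2.8E-04 | 5.6 | 1.0E-08 | 8.0E-05 | 24
rs7709212 | 5 | 158696755 | T | C | IL12B | −0.6 | 5.7E-01 | −6.3 | 3.8E-10 | 3.6 | 1.9E-04 | −2.3 | 1.1E-02 | 3.3 | 9.8E-04 | 0.1 | 5.5E-01 | 1.2 | 8.9E-01 | 8.8E-05 | 39
rs947474 | 10 | 6430456 | A | G | PRKCQ | −4.4 | 8.7E-06 | −1.0 | 3.4E-01 | −0.3 | 6.3E-01 | −1.0 | 8.5E-01 | −2.4 | 1.9E-02 | 2.0 | 2.1E-02 | −3.7 | 1.1E-04 | 9.1E-05 | 34
rs2188962 | 5 | 131798704 | C | T | 5q31 | −0.3 | 7.7E-01 | 3.0 | 3.2E-03 | 1.3 | 9.6E-02 | 1.5 | 7.0E-02 | 5.9 | 4.6E-09 | 1.8 | 3.7E-02 | 2.9 | 2.0E-03 | 1.6E-04 | 1
rs744166 | 17 | 37767727 | A | G | STAT3 | −0.7 | 4.7E-01 | 1.3 | 1.9E-01 | −4.4 | 6.4E-06 | 0.7 | 7.6E-01 | −4.5 | 5.9E-06 | 2.2 | 9.8E-01 | −2.6 | 4.6E-03 | 2.2E-04 | 1
rs4788084 | 16 | 28447349 | C | T | IL27 | 2.6 | 1.0E-02 | 1.3 | 2.1E-01 | 0.3 | 6.1E-01 | 1.3 | 9.1E-02 | 2.9 | 3.5E-03 | 2.3 | 1.1E-02 | −6.5 | 5.1E-11 | 4.1E-04 | 24
rs2082412 | 5 | 158650367 | G | A | IL12B | −1.2 | 2.3E-01 | −6.2 | 8.8E-10 | NA | NA | −3.8 | 7.1E-05 | 2.6 | 9.6E-03 | 0.4 | 3.6E-01 | 0.0 | 5.0E-01 | 4.6E-04 | 5
rs11465804 | 1 | 67414547 | T | G | IL23R | −0.4 | 6.8E-01 | −4.9 | 1.3E-06 | −1.7 | 9.5E-01 | −0.4 | 3.6E-01 | −12.5 | 1.0E-35 | 1.2 | 1.2E-01 | −1.0 | 1.6E-01 | 5.2E-04 | 1
rs463426 | 22 | 20133739 | C | T | HIC2-UBE2L3 | 2.2 | 2.9E-02 | 1.4 | 1.6E-01 | 1.0 | 1.7E-01 | 0.3 | 3.7E-01 | 1.8 | 6.7E-02 | 3.3 | 4.1E-04 | 0.5 | 2.9E-01 | 5.2E-04 | 35
rs763361 | 18 | 65682622 | C | T | CD226 | 2.1 | 3.3E-02 | 1.9 | 6.5E-02 | −1.7 | 4.5E-02 | 0.0 | 4.9E-01 | 2.0 | 4.1E-02 | 2.5 | 6.7E-03 | 5.1 | 1.6E-07 | 9.0E-04 | 24
rs11584383 | 1 | 197667523 | T | C | KIF21B | 2.3 | 2.3E-02 | 0.0 | 9.7E-01 | 3.3 | 4.6E-04 | −0.3 | 6.2E-01 | −5.0 | 6.8E-07 | 0.8 | 2.0E-01 | −2.3 | 1.0E-02 | 9.0E-04 | 1
rs6590330 | 11 | 127816269 | G | A | ETS1 | 3.2 | 1.5E-03 | −0.5 | 6.0E-01 | −2.7 | 3.7E-03 | 2.1 | 1.9E-02 | 0.6 | 5.2E-01 | 1.4 | 9.2E-01 | 1.8 | 3.9E-02 | 2.6E-03 | 35
rs4900384 | 14 | 97568704 | A | G | 14q32.2 | 1.3 | 2.0E-01 | 0.8 | 4.4E-01 | 1.8 | 3.9E-02 | −1.9 | 3.0E-02 | 0.2 | 8.1E-01 | 3.2 | 7.6E-04 | 5.2 | 8.9E-08 | 2.8E-03 | 24
rs10758669 | 9 | 4971602 | A | C | JAK2 | 1.0 | 3.3E-01 | 1.2 | 2.4E-01 | −3.3 | 4.2E-04 | 0.3 | 4.0E-01 | 5.0 | 6.8E-07 | 1.8 | 3.7E-02 | 1.0 | 1.7E-01 | 3.4E-03 | 1
rs1913517 | 10 | 49789060 | A | G | LRRC18-WDFY4 | −4.4 | 1.1E-05 | 0.8 | 4.4E-01 | 0.4 | 3.5E-01 | −2.4 | 8.7E-03 | −0.4 | 6.9E-01 | 0.1 | 4.4E-01 | −0.8 | 2.2E-01 | 4.1E-03 | 35
rs4505848 | 4 | 123490097 | A | G | IL2 | −0.5 | 6.0E-01 | 0.1 | 9.0E-01 | 0.5 | 7.0E-01 | 1.1 | 1.3E-01 | 3.0 | 2.6E-03 | 3.1 | 9.6E-04 | 6.6 | 2.3E-11 | 4.4E-03 | 24
rs7804356 | 7 | 26664905 | T | C | 7p15.2 | −1.3 | 1.9E-01 | 1.2 | 2.2E-01 | 1.0 | 1.5E-01 | −0.1 | 4.8E-01 | 3.4 | 5.9E-04 | 1.3 | 9.9E-02 | −5.6 | 9.5E-09 | 5.6E-03 | 24
rs11258747 | 10 | 6512897 | G | T | PRKCQ | 3.0 | 2.4E-03 | 1.9 | 6.0E-02 | −2.5 | 6.7E-03 | 0.6 | 7.3E-01 | 0.1 | 9.2E-01 | 0.2 | 4.2E-01 | 4.8 | 7.7E-07 | 7.5E-03 | 24
rs703842 | 12 | 56449006 | A | G | CYP27B1 | −2.7 | 6.1E-03 | −0.1 | 8.9E-01 | 4.1 | 1.7E-05 | −0.4 | 6.7E-01 | −1.1 | 2.5E-01 | 2.0 | 2.3E-02 | −2.2 | 1.5E-02 | 8.3E-03 | 31
rs1990760 | 2 | 162949558 | T | C | IFIH1 | −1.0 | 3.0E-01 | −2.4 | 1.7E-02 | 0.1 | 5.4E-01 | −3.4 | 3.2E-04 | 0.6 | 5.6E-01 | 0.4 | 6.5E-01 | −6.2 | 2.5E-10 | 8.6E-03 | 24

Lower panel: SNPs with strong evidence of association in opposite directions across phenotypes
rs2476601 | 1 | 114089610 | G | A | PTPN22 | 18.2 | 9.1E-74 | 0.0 | 1.0E+00 | 0.4 | 3.5E-01 | 4.0 | 3.3E-05 | −4.3 | 1.8E-05 | 1.7 | 4.2E-02 | 20.4 | 1.5E-92 | 6.3E-160 | 1
rs3184504 | 12 | 110347328 | T | C | SH2B3 | −2.9 | 3.6E-03 | −2.0 | 4.1E-02 | 3.4 | 3.3E-04 | −2.7 | 3.6E-03 | −3.4 | 6.2E-04 | 7.3 | 1.2E-13 | −11.9 | 7.7E-33 | 4.3E-19 | 24
rs11865121 | 16 | 11074189 | C | A | CLEC16A | 0.3 | 7.7E-01 | 0.6 | 5.2E-01 | 4.3 | 8.7E-06 | −0.7 | 2.3E-01 | 0.9 | 3.8E-01 | 1.1 | 1.4E-01 | −8.9 | 2.1E-19 | 1.1E-14 | 4
rs2816316 | 1 | 189268470 | A | C | RGS1 | −1.0 | 3.3E-01 | −0.4 | 6.7E-01 | 3.1 | 9.0E-04 | −0.1 | 5.3E-01 | −0.5 | 6.4E-01 | 6.9 | 2.7E-12 | −3.9 | 4.2E-05 | 5.2E-13 | 3
rs2104286 | 10 | 6139051 | T | C | IL2RA | −3.1 | 1.8E-03 | −0.1 | 9.5E-01 | 6.2 | 3.5E-10 | 0.4 | 6.5E-01 | −0.8 | 4.4E-01 | 0.5 | 3.1E-01 | −6.4 | 5.9E-11 | 1.2E-08 | 32
rs3024505 | 1 | 203328299 | G | A | IL10 | −1.0 | 3.3E-01 | 0.9 | 3.8E-01 | 1.5 | 7.3E-02 | 4.2 | 1.3E-05 | 2.4 | 1.6E-02 | 1.6 | 5.4E-02 | −4.8 | 6.2E-07 | 2.2E-06 | 23
rs10045431 | 5 | 158747111 | C | A | IL12B | 0.4 | 6.5E-01 | 4.5 | 6.6E-06 | 0.3 | 6.3E-01 | 2.4 | 8.8E-03 | −5.8 | 8.8E-09 | 0.5 | 6.8E-01 | 0.1 | 4.6E-01 | 6.0E-04 | 1
rs610604 | 6 | 138241110 | T | G | TNFAIP3 | −4.2 | 3.3E-05 | 4.5 | 8.0E-06 | 0.3 | 3.8E-01 | −1.3 | 9.9E-02 | 1.4 | 1.8E-01 | 0.4 | 6.5E-01 | −0.2 | 4.3E-01 | 2.7E-03 | 5
rs4613763 | 5 | 40428485 | T | C | PTGER4 | 0.9 | 3.9E-01 | 0.7 | 4.7E-01 | −4.2 | 1.1E-05 | 0.4 | 3.4E-01 | 9.6 | 5.0E-22 | 0.1 | 5.2E-01 | −0.5 | 3.1E-01 | 4.0E-03 | 1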

47/107 SNPs tested showed significant evidence of association to multiple diseases (P<0.01), where only one is expected by chance. These SNPs are therefore candidate drivers for the shared genetic architecture between diseases. The SNPs shown in the lower panel also have strong evidence of association in opposite directions across phenotypes and may be crucial decision points in pathogenesis. Aj = major allele; Am = minor allele. Z scores are reported from published GWA studies and arbitrarily signed relative to the direction of effect in celiac disease. Note that no MS data were available for rs2082412 as it had not been imputed accurately in the participating MS study. Data for all SNPs is presented in Dataset S1.

Although our CPMA statistic is agnostic to effect direction in each disease, a subset of the 47 multi-phenotype (CPMA positive) SNPs appeared to have strong allelic effects in opposite directions in different diseases [16]. A total of 9 SNPs had strong evidence of such directional association (an association p < 1×10⁻⁴ with at least one protective and one risk effect; lower panel in Table 2). This suggests that shared associations have complex effects on disease outcomes and may be of particular importance in pathogenic processes. We next examined the patterns of association across the 47 multi-phenotype SNPs to determine evidence of either a global autoimmune process or biological pathways influencing sets of diseases. On visual inspection of these data we found a striking patterning of associations across diseases: only one SNP (rs3184504, in an exon of SH2B3) exhibited evidence of association to all seven diseases; the others appeared to associate only to subsets of diseases (Table 2).
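The opposite-direction criterion (at least one risk and one protective effect, each with p below 1×10⁻⁴) can be written as a small filter. The sketch below is an illustration only: the function name `has_opposing_effects` is hypothetical, the strict inequality on p is an assumption, and the two example SNPs use the rounded Z scores and p-values from Table 2 in the order RA, psoriasis, MS, SLE, Crohn's disease, celiac disease, T1D.

```python
def has_opposing_effects(z_scores, p_values, p_cutoff=1e-4):
    """True if strong associations (p < p_cutoff) occur with both signs of Z.

    Missing entries (None) are skipped; a Z of exactly zero has no direction.
    """
    strong_signs = {
        z > 0
        for z, p in zip(z_scores, p_values)
        if z is not None and p is not None and p < p_cutoff and z != 0
    }
    return len(strong_signs) == 2


# rs2476601 (PTPN22): strong positive Z in RA, SLE and T1D, strong negative Z in Crohn's.
ptpn22_z = [18.2, 0.0, 0.4, 4.0, -4.3, 1.7, 20.4]
ptpn22_p = [9.1e-74, 1.0, 3.5e-01, 3.3e-05, 1.8e-05, 4.2e-02, 1.5e-92]

# rs10889677 (IL23R): the strong effects (psoriasis, Crohn's) share one direction.
il23r_z = [0.0, 5.1, -2.6, -0.3, 10.3, 1.1, 0.1]
il23r_p = [9.8e-01, 3.5e-07, 4.9e-03, 6.2e-01, 9.0e-25, 8.7e-01, 5.3e-01]

print(has_opposing_effects(ptpn22_z, ptpn22_p))
print(has_opposing_effects(il23r_z, il23r_p))
```

With these inputs the PTPN22 SNP is flagged and the IL23R SNP is not, matching their placement in the lower and upper panels of Table 2.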
To formalize the analysis of association patterns across diseases, we determined specific patterns of associations across SNPs by computing SNP-SNP distances based on the level of association to each disease followed by hierarchical clustering to group them (Figure 1A; see Materials and Methods section for clustering details). SNPs in loci encoding proteins known to interact clustered together: for example, the independent effects at IL12B and IL23R, which encode subunits of a ligand-receptor pair are in the same region of the dendrogram. We next partitioned the dendrogram “tree” into four clusters and summarized the cumulative association of each cluster to each disease by combining our underlying dataset of association p-values per cluster, per disease using Fisher's omnibus test (Figure 1B; see Materials and Methods for details). Each cluster had a different pattern of associations across diseases; these patterns suggest that the clusters represent distinct co-morbid mechanisms.
Figure 1

Patterns of association across diseases correlate with protein-protein interactions.

A: 47 SNPs with evidence of association to multiple diseases (P<0.01) fall into groups clustered by the pattern of association across diseases. Clusters are numbered arbitrarily. B: Clusters show different patterns of association across diseases. We summarize the differential disease effects of each cluster with a cumulative association statistic (Fisher's method for combining p values). These patterns are different for each cluster, suggesting each represents a different co-morbid mechanism. Note that these figures are based on the same underlying association statistics the clustering in the first panel is derived from. C: proteins encoded within the linkage disequilibrium scope around SNPs in the same cluster interact either directly or via common intermediates. Three of our four clusters have significant protein inter-connectivity (permuted P<0.05; see Materials and Methods and [19] for details).

Our underlying hypothesis has been that phenotype-driven clusters represent distinct molecular mechanisms. This leads to the prediction that components of these clusters/pathways are encoded in associated loci; in other words, proteins encoded around SNPs in the same cluster should interact. We test this prediction by looking for connectivity between proteins encoded around SNPs within each cluster as described elsewhere [19]. Briefly, we define a genomic region around each SNP in terms of linkage disequilibrium and consider any protein overlapping that region. We then ask if proteins encoded around SNPs in the same cluster interact using protein-protein interaction maps, excluding interactions between proteins in the same region (see Materials and Methods and [19]). We find that three of the four clusters we define by patterns of association have significant connectivity (Figure 1C; permuted P<0.05) by this method, suggesting that these represent distinct molecular mechanisms affected by genetic risk variants. 
Two of these groups of interacting proteins are also preferentially expressed [19] in immune cell subtypes compared to other tissue types (Figure S1), supporting our hypothesis that these represent true pathways underlying pathogenesis.

Discussion

Immune-mediated inflammatory and autoimmune diseases have been known to cluster in families, suggesting a strong genetic component to risk. The genes in the human MHC (HLA complex) have been associated with disease risk, suggesting a common immune pathway. Less clear is whether other genetic variants associated with individual diseases also form common pathways/mechanisms for autoimmunity. Recent results from GWA studies suggest that common genetic mechanisms may underlie the observed clustering of multiple autoimmune diseases within a person or family. In this work we have tested the hypothesis that immunologically relevant genetic variation will either (1) underlie risk to all immune-mediated diseases, implicating a global immunological process; (2) influence risk to a discrete subset of diseases, implicating molecular entities underlying that co-morbidity; or (3) modulate risk for only one disorder thereby implying a disease-specific process. A central goal of complex disease genetics is to uncover the pathways perturbed in disease and shed light on the underlying cellular processes. Despite a wealth of molecular insight into immune function few key pathways underlying genetic susceptibility to immune-mediated diseases have been elucidated. To identify these processes in immune-mediated inflammatory and autoimmune disease, we tested genetic variation contributing to seven diseases. We observed an overwhelming abundance of commonality across these phenotypes, assorting into cohesive phenotype-genotype groups that appear to underlie co-morbidities. By analyzing loci known to associate to at least one disease, we are able to identify groups of diseases that should be considered as a unified phenotype and analyzed together. We further demonstrate that this approach generates novel biological insights into pathogenesis, often difficult to obtain from genomic studies of single traits [20]. 
We have described a novel statistic, CPMA, which assesses evidence for multiple associations to a marker. Rather than perform a meta-analysis, which would only detect association to all phenotypes (or suffer from heterogeneity), or test all combinations of phenotypes, which would increase the multiple testing burden, we look for deviation in the distribution of association p values. Our statistic thus detects markers associated to at least some, but not necessarily all, phenotypes; we note that this is a single degree of freedom test, providing high power to reject the null hypothesis. This power comes at the price of not knowing to which phenotypes the marker is associated; we overcome this with our clustering analysis, which resolves groups of markers associating to the same diseases. Thus our analytic strategy is able to both detect shared associations and identify the relevant phenotypes. Our approach appears capable of distinguishing distinct genetic effects in the same locus in addition to validated shared associations. For example, it is now clear that the two signals in the IL2/IL21 locus on chromosome 4q27 are distinct, with T1D mapping to IL2 and other diseases to IL21 [21]. Our analysis detects this difference, clustering the two SNPs representing these associations separately (Figure 1, labeled “IL2” and “IL2/IL21”, respectively). Conversely, previous reports of an overlap in association between T1D and celiac disease [15] were in regions encoding genes highly expressed in T lymphocytes (RGS1, PTPN2 and CTLA4 in celiac; PTPN2 and CTLA4 in T1D). Our analysis identifies all these regions as CPMA-positive and highlights the second associations in T1D and celiac shown by Smyth et al. [15], indicating that our approach could be used to prioritize marginal associations for replication.
We also observe other potential associations. For example, rs2816316 near RGS1 exhibits evidence of association to MS; rs2542151 and rs1893217 near PTPN2 have modest association to psoriasis. These last observations, whilst suggestive, require further investigation given the known effects of these regions on other diseases. In summary, our multi-disease approach is applicable beyond the immune-mediated inflammatory and autoimmune diseases, to current studies of related traits in pharmacology, metabolic and psychiatric disease and in genetic studies of cellular phenotypes such as gene expression. For most studies of the genetic basis of complex human phenotypes, the pathogenic processes are still far from understood and biological pathways may be identified using these methods. Ultimately, these results will contribute to an improved molecular nosology of mechanistic definitions and towards improving clinical care and human health.

Materials and Methods

Ethics statement

All data were drawn from previously published genome-wide association studies from consortia with appropriate ethics oversight from their respective institutional review boards. As only summary data from a small number of markers across the genome were used here, no further ethical issues arise.

Patient cohorts

Data were obtained from previously described case/control GWA studies of celiac disease [22], Crohn's disease [2], multiple sclerosis [5], psoriasis [6], rheumatoid arthritis [7], systemic lupus erythematosus [23] and type 1 diabetes [24] as shown in Table 1. We note that, with the exception of psoriasis, in these cohorts diagnosis of a second immune-mediated disease is a criterion for exclusion, thereby minimizing co-morbidity as a source of bias in our study.

Locus selection

For our analysis we selected 140 independent SNPs (r² < 0.2) with reported associations to an immune-mediated disease in a genome-wide association scan and replicated in independent samples in that disease to combined genome-wide significance [25]. We then chose proxies for those SNPs present on the major versions of Affymetrix and Illumina genome-wide genotyping platforms [26]; 107 SNPs had sufficient data coverage to be included. Where possible we used the SNP originally reported; if data were unavailable for that marker, we chose a high LD proxy (HapMap/CEU r² > 0.9) to represent the region.

Cross-phenotype meta-analysis

Our CPMA analysis relies on the expected distribution of p-values for each SNP across diseases. Under the null hypothesis of no additional associations beyond those already known, we expect association p-values to be uniformly distributed and hence -ln(p) to be exponentially decaying with a decay rate λ = 1. We calculate the likelihood of the observed and expected values of λ and express these as a likelihood ratio test:

$$\mathrm{CPMA} = -2 \ln\left( \frac{P[\mathrm{Data} \mid \lambda = 1]}{P[\mathrm{Data} \mid \lambda = \hat{\lambda}]} \right)$$

where $\hat{\lambda}$ is the decay rate estimated from the observed p-values. This statistic therefore measures the likelihood of the null hypothesis given the data; we can reject the null hypothesis if sufficient evidence to the contrary is present. We note that, because we only estimate a single parameter, our test is asymptotically distributed as $\chi^2_1$. This gives us more statistical power than relying on strategies combining association statistics, which would consume multiple degrees of freedom.
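A minimal Python sketch of this likelihood ratio test follows. It assumes that the observed decay rate is the maximum-likelihood estimate for an exponential distribution (the reciprocal of the mean of -ln(p)), and that the p-value comes from the upper tail of a chi-squared distribution with one degree of freedom; the function name `cpma` is illustrative and the original implementation may differ in detail. As written, the test does not distinguish the direction in which the decay rate departs from 1.

```python
from math import erfc, log, sqrt


def cpma(p_values):
    """Likelihood ratio test of exponential decay rate 1 for x = -ln(p).

    Returns (statistic, p-value). Entries equal to None are skipped.
    Requires at least one p-value below 1, so that the estimate is finite.
    """
    x = [-log(p) for p in p_values if p is not None]
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        raise ValueError("need at least one p-value below 1")
    lam_hat = n / total                               # MLE of the decay rate
    loglik_null = -total                              # n*ln(1) - 1*sum(x)
    loglik_alt = n * log(lam_hat) - lam_hat * total   # equals n*ln(lam_hat) - n
    stat = max(0.0, -2.0 * (loglik_null - loglik_alt))
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return stat, erfc(sqrt(stat / 2.0))


# Rounded p-values for rs10889677 (IL23R) from Table 2, one per disease.
il23r_p = [9.8e-01, 3.5e-07, 4.9e-03, 6.2e-01, 9.0e-25, 8.7e-01, 5.3e-01]
stat, p_cpma = cpma(il23r_p)
print(f"CPMA statistic = {stat:.1f}, p = {p_cpma:.1e}")
```

For this SNP the sketch gives a p-value of the order of 10⁻²⁵, the same order as the CPMA p reported in Table 2. Exact agreement should not be expected because the inputs are rounded.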

SNP–SNP distance calculation and clustering

To compare the patterns of association for multi-phenotype SNPs we first calculate SNP-SNP distances and then use hierarchical clustering on that distance matrix to assess relative relationships between SNP association patterns. Calculating distances based directly on p values or the underlying association statistics is problematic, as each contributing study has a different sample size and therefore different statistical power to detect associations. Thus, distance functions based on numeric data, which incorporate magnitude differences between observations, would be biased if studies have systematically different data. Normalization procedures can account for such systematic differences but may fail to remove all bias. To reduce the impact such systematic irregularities might have on our comparison, we bin associations into informal “levels of evidence” categories. We define four such classes and calculate SNP-SNP distances using the Gower metric [27], which accounts for the discrete nature of the data.

To compare the distance relationships between SNPs we use hierarchical agglomerative clustering. This process joins single entities (in this case, SNPs) or groups of entities together if certain criteria are met. Successive rounds of clustering are performed iteratively until all groups are joined, resulting in a tree of relationships where similar entities cluster on the same branches. In this analysis we cluster SNPs based on the Gower distance matrix using Ward's method for joining entities [28]. In contrast to linkage clustering methods, Ward's method seeks to minimize the information lost during the clustering process, calculated as the error sum of squares (ESS). The higher the ESS, the more information is lost to inaccurate grouping of entities. This method thus seeks compact, spherical clusters of data which are maximally similar. All distance and clustering analysis was done using the StatMatch and stats packages in the R programming language [29].
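The binning and distance steps can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the three cutoffs in `HYPOTHETICAL_CUTOFFS` are placeholders and not the class boundaries used in the study, the levels are treated as ordered ranks scaled by their range (one common way to apply Gower's coefficient to ordinal data), and missing values are dropped pairwise. The clustering step itself is not shown.

```python
# Placeholder cutoffs for four "levels of evidence"; NOT the study's boundaries.
HYPOTHETICAL_CUTOFFS = (5e-2, 1e-3, 1e-6)


def evidence_level(p):
    """Map a p-value to an ordinal level 0 (weakest) to 3 (strongest)."""
    if p is None:
        return None
    return sum(p < cutoff for cutoff in HYPOTHETICAL_CUTOFFS)


def gower_distance(levels_a, levels_b, n_levels=4):
    """Mean range-scaled rank difference over diseases observed in both SNPs."""
    terms = [
        abs(a - b) / (n_levels - 1)
        for a, b in zip(levels_a, levels_b)
        if a is not None and b is not None
    ]
    if not terms:
        raise ValueError("no diseases observed in both SNPs")
    return sum(terms) / len(terms)


# Rounded Table 2 p-values (RA, psoriasis, MS, SLE, Crohn's, celiac, T1D).
snp_p = {
    "rs10889677": [9.8e-01, 3.5e-07, 4.9e-03, 6.2e-01, 9.0e-25, 8.7e-01, 5.3e-01],
    "rs2201841": [9.9e-01, 2.7e-07, 1.3e-02, 6.3e-01, 3.5e-23, 8.4e-01, 5.7e-01],
    "rs2082412": [2.3e-01, 8.8e-10, None, 7.1e-05, 9.6e-03, 3.6e-01, 5.0e-01],
}
levels = {snp: [evidence_level(p) for p in ps] for snp, ps in snp_p.items()}

d_same_locus = gower_distance(levels["rs10889677"], levels["rs2201841"])
d_other_locus = gower_distance(levels["rs10889677"], levels["rs2082412"])
print(d_same_locus, d_other_locus)
```

With these placeholder cutoffs the two IL23R SNPs fall into identical categories in every disease and so sit at distance zero, while the IL12B SNP is further away. A full distance matrix built this way would then be passed to a Ward clustering routine.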

Cumulative association statistics

We compute per-cluster, per-disease cumulative association statistics by combining p values using Fisher's omnibus test, where the cumulative statistic S on N p-values is defined as:

$$S = -2 \sum_{i=1}^{N} \ln(p_i)$$

and S follows the $\chi^2$ distribution with 2N degrees of freedom.
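Fisher's omnibus test is short enough to write out in full. The sketch below uses only the Python standard library, relying on the closed form of the chi-squared upper tail for an even number of degrees of freedom; the function names are illustrative and the example p-values are made up.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    For df = 2N: P(X > x) = exp(-x/2) * sum_{k=0}^{N-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


def fisher_combined(p_values):
    """Return (S, combined p-value) with S = -2 * sum(ln p_i) on 2N df."""
    s = -2.0 * sum(log(p) for p in p_values)
    return s, chi2_sf_even_df(s, 2 * len(p_values))


# Made-up p-values standing in for one cluster's SNPs in one disease.
cluster_p = [0.03, 0.20, 0.004, 0.50]
s_stat, p_combined = fisher_combined(cluster_p)
print(f"S = {s_stat:.2f} on {2 * len(cluster_p)} df, combined p = {p_combined:.2e}")
```

A useful sanity check is that a single p-value is returned unchanged, since with two degrees of freedom the upper tail reduces to exp(-S/2) = p.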

Protein–protein interaction analysis

We use previously described methodology [19] to assess whether proteins encoded around SNPs in each cluster interact. Briefly, we first compile lists of all proteins that an association may affect by defining locus boundaries around each SNP in terms of linkage disequilibrium and including all proteins overlapping this region. We then use a high-confidence protein-protein interaction map ([30] as modified in [19]) to ask whether proteins encoded around SNPs in each cluster interact either directly or via a common intermediary. We assess the significance of such observations relative to the local structure of the protein-protein network as described elsewhere [19], using 4000 permutations. These data and methodology are publicly available for download and via a webserver (http://www.broadinstitute.org/mpg/dapple).

Complete SNP-wise association data.

Here we present the complete dataset on which we base our analysis. All data have been previously published as detailed in the main manuscript and in the key below, and are based on chi-square (1 df) or Z association statistics. Where not provided, we computed Z scores as the square root of the cognate chi-squared statistic. Sign was assigned with reference to the minor allele declared in the psoriasis GWAS (chosen arbitrarily).

SNP - marker name.
CHR - chromosome.
POS - physical position (hg18).
major_al - major SNP allele.
minor_al - minor SNP allele.
RA.Z, RA.P - association Z score and p value for rheumatoid arthritis (Stahl et al. Nat Genet 2010) [7].
PS.Z, PS.P - association Z score and p value for psoriasis (Nair et al. Nat Genet 2009) [6].
MS.Z, MS.P - association Z score and p value for multiple sclerosis (De Jager et al. Nat Genet 2009) [5].
SLE.Z, SLE.P - association Z score and p value for systemic lupus erythematosus (Gateva et al. Nat Genet 2009) [23].
CD.Z, CD.P - association Z score and p value for Crohn's disease (Barrett et al. Nat Genet 2008) [2].
CeD.Z, CeD.P - association Z score and p value for celiac disease (Hunt et al. Nat Genet 2008) [4].
T1D.Z, T1D.P - association Z score and p value for type 1 diabetes (Barrett et al. Nat Genet 2009) [24].
Disease - disease in which the SNP was originally reported: AITD, autoimmune thyroid disease; AS, ankylosing spondylitis; BD; CD, Crohn's disease; MS, multiple sclerosis; PS, psoriasis; SLE, systemic lupus erythematosus; T1D, type 1 diabetes; UC, ulcerative colitis.
cpma.p - p value for CPMA statistic (chi-squared, 1 df).
Genes - nearby notable genes.

(TAB)

Enrichment in immune tissue expression for interacting genes encoded close to SNPs in (A) cluster 1 and (B) cluster 4.

Following Rossin et al. [19] we looked for preferential expression of significant network genes in tissue subsets. Of the genes encoded around SNPs in clusters 1 and 4 (as defined in Figure 1), we found that those participating in significant networks are enriched in expression (purple circles) in immune tissues (red bars). Other genes encoded around those SNPs are not enriched in the same tissues (black circles). Thus interacting genes encoded around SNPs associated with the same immune diseases are preferentially expressed in immune tissues. Interacting genes for the remaining significant group, cluster 2, were not enriched.

(PDF)
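The dataset key above states that Z scores were derived as the square root of a 1-df chi-squared statistic, with a sign set relative to a reference minor allele. A minimal sketch of that conversion follows. The function names `signed_z` and `two_sided_p` are hypothetical, and the `direction` argument is an assumption: the caller is expected to supply +1 or -1 according to whatever allele convention is in use, since the source does not spell out how the direction was encoded.

```python
import math


def signed_z(chi2, direction):
    """Convert a 1-df chi-squared statistic into a signed Z score.

    `direction` is +1 or -1 and encodes the effect direction relative to
    the chosen reference allele (an assumed convention, supplied by the
    caller).
    """
    if chi2 < 0:
        raise ValueError("chi-squared statistic must be non-negative")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return direction * math.sqrt(chi2)


def two_sided_p(z):
    """Two-sided p value of a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical statistic for one SNP in one disease.
    z = signed_z(6.25, -1)
    print(f"Z = {z:.2f}, p = {two_sided_p(z):.4g}")
```

Because the square of a standard normal variable is chi-squared with 1 df, the two-sided p value of the Z score equals the p value of the original chi-squared statistic, so the sign carries the only extra information.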
  27 in total

Review 1.  The genetics of complex autoimmune diseases: non-MHC susceptibility genes.

Authors:  A Wandstrat; E Wakeland
Journal:  Nat Immunol       Date:  2001-09       Impact factor: 25.606

2.  A human phenome-interactome network of protein complexes implicated in genetic disorders.

Authors:  Kasper Lage; E Olof Karlberg; Zenia M Størling; Páll I Olason; Anders G Pedersen; Olga Rigina; Anders M Hinsby; Zeynep Tümer; Flemming Pociot; Niels Tommerup; Yves Moreau; Søren Brunak
Journal:  Nat Biotechnol       Date:  2007-03       Impact factor: 54.908

3.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors:  Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-27       Impact factor: 11.205

Review 4.  Detecting shared pathogenesis from the shared genetics of immune-related diseases.

Authors:  Alexandra Zhernakova; Cleo C van Diemen; Cisca Wijmenga
Journal:  Nat Rev Genet       Date:  2009-01       Impact factor: 53.242

5.  A large-scale replication study identifies TNIP1, PRDM1, JAZF1, UHRF1BP1 and IL10 as risk loci for systemic lupus erythematosus.

Authors:  Vesela Gateva; Johanna K Sandling; Geoff Hom; Kimberly E Taylor; Sharon A Chung; Xin Sun; Ward Ortmann; Roman Kosoy; Ricardo C Ferreira; Gunnel Nordmark; Iva Gunnarsson; Elisabet Svenungsson; Leonid Padyukov; Gunnar Sturfelt; Andreas Jönsen; Anders A Bengtsson; Solbritt Rantapää-Dahlqvist; Emily C Baechler; Elizabeth E Brown; Graciela S Alarcón; Jeffrey C Edberg; Rosalind Ramsey-Goldman; Gerald McGwin; John D Reveille; Luis M Vilá; Robert P Kimberly; Susan Manzi; Michelle A Petri; Annette Lee; Peter K Gregersen; Michael F Seldin; Lars Rönnblom; Lindsey A Criswell; Ann-Christine Syvänen; Timothy W Behrens; Robert R Graham
Journal:  Nat Genet       Date:  2009-10-18       Impact factor: 38.330

6.  Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci.

Authors:  Philip L De Jager; Xiaoming Jia; Joanne Wang; Paul I W de Bakker; Linda Ottoboni; Neelum T Aggarwal; Laura Piccio; Soumya Raychaudhuri; Dong Tran; Cristin Aubin; Rebeccah Briskin; Susan Romano; Sergio E Baranzini; Jacob L McCauley; Margaret A Pericak-Vance; Jonathan L Haines; Rachel A Gibson; Yvonne Naeglin; Bernard Uitdehaag; Paul M Matthews; Ludwig Kappos; Chris Polman; Wendy L McArdle; David P Strachan; Denis Evans; Anne H Cross; Mark J Daly; Alastair Compston; Stephen J Sawcer; Howard L Weiner; Stephen L Hauser; David A Hafler; Jorge R Oksenberg
Journal:  Nat Genet       Date:  2009-06-14       Impact factor: 38.330

7.  Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology.

Authors:  Elizabeth J Rossin; Kasper Lage; Soumya Raychaudhuri; Ramnik J Xavier; Diana Tatar; Yair Benita; Chris Cotsapas; Mark J Daly
Journal:  PLoS Genet       Date:  2011-01-13       Impact factor: 5.917

8.  Autoimmune disease classification by inverse association with SNP alleles.

Authors:  Marina Sirota; Marc A Schaub; Serafim Batzoglou; William H Robinson; Atul J Butte
Journal:  PLoS Genet       Date:  2009-12-24       Impact factor: 5.917

9.  Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.

Authors:  Jeffrey C Barrett; David G Clayton; Patrick Concannon; Beena Akolkar; Jason D Cooper; Henry A Erlich; Cécile Julier; Grant Morahan; Jørn Nerup; Concepcion Nierras; Vincent Plagnol; Flemming Pociot; Helen Schuilenburg; Deborah J Smyth; Helen Stevens; John A Todd; Neil M Walker; Stephen S Rich
Journal:  Nat Genet       Date:  2009-05-10       Impact factor: 38.330

10.  Shared and distinct genetic variants in type 1 diabetes and celiac disease.

Authors:  Deborah J Smyth; Vincent Plagnol; Neil M Walker; Jason D Cooper; Kate Downes; Jennie H M Yang; Joanna M M Howson; Helen Stevens; Ross McManus; Cisca Wijmenga; Graham A Heap; Patrick C Dubois; David G Clayton; Karen A Hunt; David A van Heel; John A Todd
Journal:  N Engl J Med       Date:  2008-12-10       Impact factor: 91.245

