
Discovery of new risk loci for IgA nephropathy implicates genes involved in immunity against intestinal pathogens.

Krzysztof Kiryluk¹, Yifu Li¹, Francesco Scolari², Simone Sanna-Cherchi¹, Murim Choi³, Miguel Verbitsky¹, David Fasel¹, Sneh Lata¹, Sindhuri Prakash¹, Samantha Shapiro¹, Clara Fischman¹, Holly J Snyder¹, Gerald Appel¹, Claudia Izzi⁴, Battista Fabio Viola⁵, Nadia Dallera², Lucia Del Vecchio⁶, Cristina Barlassina⁶, Erika Salvi⁶, Francesca Eleonora Bertinetto⁷, Antonio Amoroso⁷, Silvana Savoldi⁸, Marcella Rocchietti⁸, Alessandro Amore⁹, Licia Peruzzi⁹, Rosanna Coppo⁹, Maurizio Salvadori¹⁰, Pietro Ravani¹¹, Riccardo Magistroni¹², Gian Marco Ghiggeri¹³, Gianluca Caridi¹³, Monica Bodria¹³, Francesca Lugani¹³, Landino Allegri¹⁴, Marco Delsante¹⁴, Mariarosa Maiorana¹⁴, Andrea Magnano¹⁴, Giovanni Frasca¹⁵, Emanuela Boer¹⁶, Giuliano Boscutti¹⁷, Claudio Ponticelli¹⁸, Renzo Mignani¹⁹, Carmelita Marcantoni²⁰, Domenico Di Landro²⁰, Domenico Santoro²¹, Antonello Pani²², Rosaria Polci²³, Sandro Feriozzi²³, Silvana Chicca²⁴, Marco Galliani²⁴, Maddalena Gigante²⁵, Loreto Gesualdo²⁶, Pasquale Zamboli²⁷, Giovanni Giorgio Battaglia²⁸, Maurizio Garozzo²⁸, Dita Maixnerová²⁹, Vladimir Tesar²⁹, Frank Eitner³⁰, Thomas Rauen³¹, Jürgen Floege³¹, Tibor Kovacs³², Judit Nagy³², Krzysztof Mucha³³, Leszek Pączek³³, Marcin Zaniew³⁴, Małgorzata Mizerska-Wasiak³⁵, Maria Roszkowska-Blaim³⁵, Krzysztof Pawlaczyk³⁶, Daniel Gale³⁷, Jonathan Barratt³⁸, Lise Thibaudin³⁹, Francois Berthoux³⁹, Guillaume Canaud⁴⁰, Anne Boland⁴¹, Marie Metzger⁴², Ulf Panzer⁴³, Hitoshi Suzuki⁴⁴, Shin Goto⁴⁵, Ichiei Narita⁴⁵, Yasar Caliskan⁴⁶, Jingyuan Xie⁴⁷, Ping Hou⁴⁸, Nan Chen⁴⁷, Hong Zhang⁴⁸, Robert J Wyatt⁴⁹, Jan Novak⁵⁰, Bruce A Julian⁵¹, John Feehally³⁸, Benedicte Stengel⁴², Daniele Cusi⁶, Richard P Lifton⁵², Ali G Gharavi¹.

Abstract

We performed a genome-wide association study (GWAS) of IgA nephropathy (IgAN), the most common form of glomerulonephritis, with discovery and follow-up in 20,612 individuals of European and East Asian ancestry. We identified six new genome-wide significant associations, four in ITGAM-ITGAX, VAV3 and CARD9 and two new independent signals at HLA-DQB1 and DEFA. We replicated the nine previously reported signals, including known SNPs in the HLA-DQB1 and DEFA loci. The cumulative burden of risk alleles is strongly associated with age at disease onset. Most loci are either directly associated with risk of inflammatory bowel disease (IBD) or maintenance of the intestinal epithelial barrier and response to mucosal pathogens. The geospatial distribution of risk alleles is highly suggestive of multi-locus adaptation, and genetic risk correlates strongly with variation in local pathogens, particularly helminth diversity, suggesting a possible role for host-intestinal pathogen interactions in shaping the genetic landscape of IgAN.

Year: 2014 PMID: 25305756 PMCID: PMC4213311 DOI: 10.1038/ng.3118

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

IgA nephropathy (IgAN) is the most common form of primary glomerulonephritis and the leading cause of end-stage kidney failure in China[1]. The diagnosis is made by kidney biopsy, which shows predominant deposition of IgA-containing immune complexes in the glomerular mesangium, leading to glomerulonephritis, glomerular sclerosis, and progressive loss of kidney function. The etiology of IgAN is poorly understood and the genetic architecture is complex. The disease is most prevalent in East Asians, less frequent in Europeans, and relatively rare in individuals of African ancestry. For example, Asian-Americans have a 4-fold higher incidence of end-stage renal disease due to IgAN compared to European-Americans, and nearly 7-fold higher compared to African-Americans[2]. IgAN affects individuals of all age groups, with a peak incidence in the 2nd or 3rd decade of life; the factors determining age of onset are unknown. To date, there have been three GWAS for IgAN[3-5]. The results of these studies demonstrate a strong contribution of the major histocompatibility (MHC) locus to disease risk. The two largest studies, both based on Asian discovery cohorts, detected four additional non-HLA loci, including chromosome 1q32, comprising a common deletion of the complement factor H related CFHR3 and CFHR1 genes (CFHR3,1-delta); 8p23 comprising the α-defensin (DEFA) gene cluster; 17p13 (including TNFSF13), and 22q12 (including HORMAD2 and several other genes)[3,4]. Cumulatively, these GWAS loci explain about 5% of the total disease risk. Additionally, variation in risk allele frequency explains a substantial fraction of the observed ethnic variation in disease prevalence, with risk alleles having substantially higher frequencies in Asians compared to Europeans[3]. These findings raise the possibility that additional disease loci might have been missed owing to fixation of risk alleles in Asian populations. 
To identify new disease loci, we performed a GWAS twice the size of the prior largest study and analyzed a discovery cohort based predominantly on European subjects.

RESULTS

Study Design

In stage I (discovery) we performed a genome-wide analysis in 2,747 biopsy-confirmed cases and 3,952 controls, including three new cohorts comprising 1,553 cases and 3,050 controls of European ancestry and the previously published Han Chinese discovery cohort of 1,194 cases and 902 controls (Table 1, Supplementary Tables 1–3, Supplementary Note). For each cohort, we performed principal component analyses to ensure adequate ancestry matching between cases and controls (Supplementary Figure 1). All individual samples were imputed to a common set of >1 million SNPs (Supplementary Table 4) using ancestry-matched HapMap-3 reference panels (Supplementary Figure 2). Primary association testing was performed after accounting for imputation uncertainty and significant principal components of ancestry. We detected minimal effect of population stratification within each cohort (genomic inflation factor λ 1.01–1.06, Supplementary Figure 3). The association results from individual cohorts were combined using genome-wide fixed effects meta-analysis. We identified multiple suggestive signals and several distinct peaks exceeding genome-wide significance in the joint analysis of the discovery cohorts (Supplementary Figure 4). Top signals, defined by P < 5 × 10−5, were genotyped in an additional 4,911 cases and 9,002 controls (stage II), followed by meta-analysis to identify genome-wide significant signals across the combined cohorts of 20,612 individuals. This two-stage design was adequately powered to detect ORs as small as 1.15–1.25 (Supplementary Table 1).
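The fixed-effects combination described above can be illustrated with a minimal inverse-variance sketch. It is not the authors' pipeline: the standard errors are back-calculated from the rounded stage-level ORs and P-values that Table 2 reports for rs17019602 (VAV3), and the helper names `se_from_or_and_p` and `fixed_effects_meta` are hypothetical.

```python
import math
from statistics import NormalDist

def se_from_or_and_p(odds_ratio, p_two_sided):
    """Back-calculate the standard error of ln(OR) from a two-sided P-value."""
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)
    return abs(math.log(odds_ratio)) / z

def fixed_effects_meta(estimates):
    """Inverse-variance weighted pooling of (ln_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided normal P-value; erfc stays accurate far into the tail.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled), pooled_se, p_value

# Discovery and replication summaries for rs17019602 (rounded, from Table 2).
stages = [(1.20, 4.7e-5), (1.16, 2.9e-5)]
estimates = [(math.log(o), se_from_or_and_p(o, p)) for o, p in stages]
pooled_or, pooled_se, pooled_p = fixed_effects_meta(estimates)
print(f"pooled OR = {pooled_or:.2f}, P = {pooled_p:.1e}")
```

Because the inputs are rounded, the pooled estimate should only approximate the combined OR and P-value listed for this SNP in Table 2.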

Table 1

Summary of study cohorts

The final numbers of included cases and controls by cohort after implementation of all quality control filters.

GWAS Cohorts*	Ancestry	N Cases	N Controls	N Total	Genotyping Rate
Italian Discovery Cohort	European	1,045	1,340	2,385	99.9%
French Discovery Cohort	European	205	159	364	99.6%
US Discovery Cohort	European	303	1,551	1,854	99.7%
Chinese Discovery Cohort	East Asian	1,194	902	2,096	99.9%

Total Discovery:		2,747	3,952	6,699	--

Chinese Replication Cohort	East Asian	2,046	1,385	3,431	99.4%
UK Replication Cohort	European	464	4,783	5,247	99.9%
Japanese Replication Cohort	East Asian	445	395	840	99.3%
German Replication Cohort	European	393	371	764	99.6%
French Replication Cohort	European	432	436	868	99.5%
Czech Replication Cohort	European	247	230	477	99.7%
Polish Replication Cohort	European	123	200	323	99.6%
Hungarian Replication Cohort	European	220	237	457	98.7%
Italian Replication Cohort	European	413	780	1,193	99.1%
Turkish Replication Cohort	European	128	185	313	99.5%

Total Replication:		4,911	9,002	13,913	--

Total All Cohorts:		7,658	12,954	20,612	--

* The summary of quality control analyses and case/control exclusions is provided in Supplementary Table 2.

In the combined analysis, we identified six new genome-wide significant signals (Figure 1, Supplementary Figure 5, Table 2, and Supplementary Tables 5, 6, and 7). These included four signals in three novel loci, chr.1p13 (VAV3 locus), chr.9q34 (CARD9 locus), and chr.16p11 (ITGAM-ITGAX locus), and two new independent signals within the previously known HLA-DQ/DR and DEFA regions. We also confirmed associations for all nine previously reported signals at chr.6p21 (HLA-DQ/DR, TAP1/PSMB8, and HLA-DP loci), chr.1q32 (CFHR3,1-delta locus), chr.8p23 (DEFA locus), chr.17p13 (TNFSF13 locus), and chr.22q12 (HORMAD2 locus).

Figure 1

Results of the combined meta-analysis across all 20,612 individuals

Manhattan plot (a) and regional plots for genome-wide significant loci outside of the HLA region: (b) ITGAM-ITGAX locus, (c) CARD9 locus, (d) VAV3 locus, (e) DEFA locus (shaded area represents the region of common duplications involving DEFA1 and DEFA3 genes), (f) CFHR3,1-delta locus (shaded area represents the deletion of CFHR3 and CFHR1 genes), (g) HORMAD2 locus. X-axis represents physical distance in kb (hg-18 coordinates); Y-axis represents -log P values for association statistics.

Table 2

Combined results for known and novel independent GWAS signals.

						Discovery Meta-analysis, N=6,699 (2,747 cases & 3,952 controls)		Replication Meta-analysis, N=13,913 (4,911 cases & 9,002 controls)		All Cohorts Combined, N=20,612 (7,658 cases & 12,954 controls)

Chr	Location* (bp)	SNP **	Risk Allele	Freq. European Controls	Freq. Asian Controls	OR	P-value	OR	P-value	OR	P-value	Q-test	I²	Locus Name	Novel
1	107,990,381	rs17019602	G	0.19	0.19	1.20	4.7E-05	1.16	2.9E-05	1.17	6.8E-09	0.50	0	VAV3	Novel
1	194,953,541	rs6677604	G	0.80	0.93	1.36	3.5E-08	1.33	2.6E-07	1.35	4.8E-14	0.53	0	CFHR3,1-del	Known
6	32,532,860	rs7763262	C	0.69	0.72	1.51	1.7E-20	1.35	5.5E-20	1.41	1.8E-38	0.07	39	HLA-DR/DQ	Novel
6	32,767,856	rs9275224	G	0.51	0.59	1.33	1.2E-13	1.38	5.6E-18	1.36	5.9E-30	0.56	0	HLA-DR/DQ	Known
6	32,778,286	rs2856717	G	0.62	0.77	1.26	6.4E-08	1.27	3.1E-09	1.27	1.1E-15	0.27	19	HLA-DR/DQ	Known
6	32,789,609	rs9275596	T	0.65	0.80	1.43	7.7E-15	1.46	4.1E-18	1.44	2.5E-31	0.09	39	HLA-DR/DQ	Known
6	32,919,607	rs2071543	G	0.87	0.80	1.22	2.3E-04	1.09	8.8E-02	1.15	1.5E-04	<0.01	76	TAP2/PSMB9	Known
6	33,194,426	rs1883414	G	0.68	0.78	1.27	1.3E-08	1.17	1.1E-04	1.22	1.5E-11	0.79	0	HLA-DP	Known
8	6,810,195	rs2738048	T	0.69	0.68	1.05	2.1E-01	1.12	1.6E-04	1.10	1.6E-04	0.04	44	DEFA	Known
8	6,887,746	rs10086568	A	0.33	0.27	1.17	1.2E-04	1.16	2.1E-06	1.16	1.0E-09	0.78	0	DEFA	Novel
9	138,386,317	rs4077515	T	0.40	0.28	1.22	4.1E-07	1.12	1.5E-04	1.16	1.2E-09	0.55	0	CARD9	Novel
16	31,265,261	rs11150612	A	0.36	0.75	1.21	4.4E-06	1.17	5.1E-07	1.18	1.3E-11	0.57	0	ITGAM-ITGAX	Novel
16	31,276,375	rs11574637	T	0.82	1.00	1.47	2.8E-10	1.22	5.6E-05	1.32	8.1E-13	0.70	0	ITGAM-ITGAX	Novel
17	7,403,693	rs3803800	A	0.20	0.32	1.12	1.2E-02	1.13	2.5E-04	1.12	9.3E-06	0.38	7	TNFSF13	Known
22	28,824,371	rs2412971	G	0.54	0.67	1.21	4.6E-07	1.20	2.2E-06	1.20	4.8E-12	0.12	35	HORMAD2	Known

* Based on NCBI version 36 (hg-18) genome assembly.

** Only non-redundant SNPs with mutually independent effects are included; the complete list of analyzed SNPs is provided in Supplementary Table 5.

New IgAN susceptibility loci

Chr.16p11: ITGAM-ITGAX locus

This locus represented the strongest novel non-HLA signal (Figure 1b). The top signal, rs11574637, is an intronic SNP in ITGAX encoding leukocyte-specific integrin αX, a component of complement receptor 4 (CR4) involved in leukocyte cell adhesion, migration, and phagocytosis of complement-coated particles by monocytes and macrophages[6]. This SNP was genome-wide significant in the discovery phase (OR 1.47, P = 2.8 × 10−10) and in the combined meta-analysis (OR 1.32, P = 8.1 × 10−13). It is noteworthy that the risk allele (T) at this locus represents an ancestral (chimp) allele with frequency of 0.82 in Europeans and 1.0 in Asians, explaining why this strong signal was not detected in prior GWAS based on Asian discovery cohorts. Prior studies have shown that rs11574637 is associated with risk of systemic lupus erythematosus (SLE)[7]. Interestingly, the IgAN risk allele (T) is protective against SLE, suggesting complex interplay between these two disorders causing nephritis. In addition, we detected another genome-wide significant intergenic SNP in this region, rs11150612 (P = 1.3 × 10−11), which is poorly correlated with rs11574637 (r2 = 0 in Asians and r2 = 0.12 in Europeans). Stratified conditional analysis suggests that rs11150612 represents an independent signal, although this will require confirmation in larger European cohorts (conditioned OR 1.13, P = 1.6 × 10−6, Supplementary Table 8). The risk allele at rs11150612 is a derived (non-chimp) allele with frequency of 0.36 in Europeans and 0.75 in Asians. This allele is also associated with increased expression of ITGAX in peripheral blood cells[8] (Supplementary Table 9). Moreover, examination of 1000 Genomes data revealed that this risk allele is in strong LD with an ITGAX missense variant predicted to be damaging (rs2230429, P517R, r2 = 0.97, not typed in our study; Supplementary Table 10).

Chr.9q34: CARD9 locus

We observed a genome-wide significant signal at rs4077515 (OR 1.16, P = 1.2 × 10−9, Figure 1c), which was supported by both Asian and European cohorts (Supplementary Table 6). The rs4077515-T risk allele results in p.Ser12Asn substitution in CARD9 (encoding Caspase recruitment domain-containing protein 9, an adapter protein that promotes activation of NF-κB in macrophages). This substitution is associated with higher expression of CARD9 in monocytes[9], lymphoblastoid cell lines[10], and peripheral blood cells[8] (Supplementary Table 9). This same allele also confers increased risk of ulcerative colitis and Crohn’s disease[11,12] (Supplementary Table 11).

Chr.1p13: VAV3 locus

The top signal, rs17019602 (Figure 1d) is an intronic SNP in VAV3, a gene encoding a guanine nucleotide exchange factor for Rho GTPases that is important for B- and T-lymphocyte development and antigen presentation[13,14] (OR 1.17, P = 6.8 × 10−9). Both Asian and European cohorts support this association (Supplementary Table 6). A common variant in VAV3 has previously been associated with hypothyroidism, likely secondary to autoimmune etiology[15]. However, the hypothyroidism risk allele shows no linkage disequilibrium with rs17019602 (r2 = 0), indicating that the IgAN signal represents a distinct allele at this locus.

Identification of novel and ethnicity-specific signals at known loci

Chr.6p21: Novel signal at HLA-DQ/DR locus

The strongest signal in the present GWAS represents a novel association within the HLA-DQ/DR locus (rs7763262, OR 1.41, P = 1.8 × 10−38; Supplementary Figure 6). This signal persisted after conditioning on the previously described SNPs in the region (conditioned OR 1.31, P = 6.2 × 10−14, Supplementary Table 12); the three previously reported SNPs remained significant after conditioning on rs7763262. Notably, we detect a stronger effect of rs7763262 in Europeans (OR 1.49, P = 1.2 × 10−30) compared to Asians (OR 1.30, P = 1.2 × 10−10, Supplementary Table 6, OR difference P = 0.012). To identify specific HLA alleles that may underlie associations in this region, we imputed classical HLA alleles (Supplementary Table 13). Stepwise conditional analysis identified four independent genome-wide significant associations (Supplementary Table 14), including DQA1*0101 (OR 1.53, P = 1.7 × 10−15), DQA1*0102 (OR 0.68, P = 1.7 × 10−14), DQB1*0201 (OR 0.71, P = 2.6 × 10−13), and DQB1*0301 (OR 1.33, P = 2.2 × 10−12). On conditional analysis, these classical alleles account for most of the SNP associations at this interval (Supplementary Table 15).

Chr.6p21: Population-specific effects at TAP1/PSMB8 locus

The previously reported risk allele at this locus (rs2071543, a Q49K missense variant in PSMB8)[3] represents a strong cis-eQTL associated with increased peripheral blood expression of TAP2, PSMB8, and PSMB9[8], which encode proteins involved in antigen processing and presentation (Supplementary Table 9). In this study, rs2071543 displayed significant heterogeneity across different cohorts (I2 = 76%, Cochran’s Q test P < 0.05) attributable to ethnicity-specific effects (Supplementary Table 6). This SNP was genome-wide significant in Asians (OR 1.41, P = 2.1 × 10−9), but no association was observed in Europeans (OR 0.99, P = 0.85). This difference was not explained by differences in risk allele frequency in Asian and European controls (0.80 and 0.87 respectively), suggesting variation in LD structure between Europeans and Asians, or the presence of an Asian-specific risk allele at this locus.
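The heterogeneity statistics quoted above (Cochran's Q and I2) can be sketched as follows. This is an illustration only: it compares the two ancestry-stratified estimates for rs2071543 rather than the individual cohorts, and the standard errors are back-calculated from rounded ORs and P-values, so it is not expected to reproduce the I2 of 76% reported across all cohorts. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def se_from_or_and_p(odds_ratio, p_two_sided):
    """Back-calculate the standard error of ln(OR) from a two-sided P-value."""
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)
    return abs(math.log(odds_ratio)) / z

def cochran_q_i2(estimates):
    """Return (Q, degrees of freedom, I^2 in percent) for (ln_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

# Ancestry-stratified results for rs2071543 as quoted in the text (rounded).
strata = {"Asian": (1.41, 2.1e-9), "European": (0.99, 0.85)}
estimates = [(math.log(o), se_from_or_and_p(o, p)) for o, p in strata.values()]
q, df, i2 = cochran_q_i2(estimates)

# With one degree of freedom, the chi-square upper tail equals erfc(sqrt(Q / 2)).
p_het = math.erfc(math.sqrt(q / 2.0))
print(f"Q = {q:.1f} on {df} df, I2 = {i2:.0f}%, P = {p_het:.1e}")
```

With only two strata, Q reduces to the squared z-statistic for the difference between the two log odds ratios, which is why a single-degree-of-freedom tail is sufficient here.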

Chr.8p23: DEFA locus

A GWAS in Asians previously implicated rs2738048 in this locus, which contains a cluster of related genes encoding the α-defensin anti-microbial peptides[4]. We detected a new genome-wide significant signal in this region represented by rs10086568 (OR 1.16, P = 1.0 × 10−9, Figure 1e). All cohorts regardless of ethnicity supported this new association. In contrast, we observed only a weak association at rs2738048 (OR 1.10, P = 1.6 × 10−4), with evidence of significant heterogeneity across different cohorts (Cochran’s Q test P < 0.05). In the ethnicity-specific analyses, the association of rs2738048 was evident only in Asian cohorts (OR 1.23, P = 1.3 × 10−7 in Asians; OR 1.02, P = 0.58 in Europeans; Supplementary Table 6), and this finding was not explained by differences in risk allele frequency in Asian and European controls (0.68 and 0.69 respectively). Because rs2738048 and rs10086568 are not in linkage disequilibrium (r2 < 0.03), mutual conditioning had little effect on these results (Supplementary Table 16). To date, variation at this locus has not been identified by GWAS of other phenotypes, suggesting that the DEFA association may be specific to IgAN.

Replication of four other known loci and total variance explained

Our GWAS provided genome-wide significant confirmation of three previously reported loci on chr.1q32 (CFHR3,1-delta), chr.6p21 (HLA-DP), and chr.22q12 (HORMAD2) and confirmed one of the two previously reported SNPs on chr.17p13 (TNFSF13, rs3803800) (Table 2, Figure 1, and Supplementary Figure 6). We also confirmed the additive effect of the TNFSF13 and HORMAD2 risk alleles on serum IgA levels (Supplementary Figure 7). Cumulatively, the 15 new and replicated GWAS signals explained 6.2% of the variation in disease risk in the European cohorts and 7.6% in the Chinese cohorts.

The genetic risk score is associated with the age of disease onset

We hypothesized that a higher burden of genetic susceptibility alleles may also influence the severity or onset of kidney disease. To test this hypothesis, we computed a genetic risk score as the weighted sum of the number of risk alleles at each locus multiplied by the log of the OR for that locus. We detected a highly significant association between the genetic risk score and age of diagnosis among the 3,409 cases with available data, with 14 of 15 risk alleles individually contributing to this association. Risk alleles promoted earlier disease onset (Figure 2b and c, Supplementary Table 17), with each quintile increase in the genetic risk score associated with a 1.2-year earlier age of onset (P = 2.8 × 10−13). This effect was robust to adjustments for cohort or ethnicity. Nonetheless, these loci explained only about 1.4% of the total variance in age of disease onset. Additional analysis of single SNP-phenotype correlations pointed to the rs7763262-C risk allele (HLA-DQ/DR locus) as most strongly associated with age of diagnosis (P = 3.2 × 10−4) and greater risk of progression to end-stage kidney disease (per allele HR 1.72, P = 3.6 × 10−3). Exploratory analyses of other parameters of disease severity and progression were generally not statistically significant (Supplementary Tables 17–19).
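The risk score definition above can be written out as a short sketch. It assumes, for illustration, that the weights are the combined-analysis ORs from Table 2; the text does not state which ORs were used as weights, and the example genotypes are hypothetical.

```python
import math

# Combined-analysis ORs for the 15 SNPs in Table 2 (assumed weights).
COMBINED_OR = {
    "rs17019602": 1.17, "rs6677604": 1.35, "rs7763262": 1.41,
    "rs9275224": 1.36, "rs2856717": 1.27, "rs9275596": 1.44,
    "rs2071543": 1.15, "rs1883414": 1.22, "rs2738048": 1.10,
    "rs10086568": 1.16, "rs4077515": 1.16, "rs11150612": 1.18,
    "rs11574637": 1.32, "rs3803800": 1.12, "rs2412971": 1.20,
}

def genetic_risk_score(risk_allele_counts):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) weighted by ln(OR)."""
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2")
        score += count * math.log(COMBINED_OR[snp])
    return score

# Hypothetical individuals: heterozygous or homozygous at every SNP.
all_het = {snp: 1 for snp in COMBINED_OR}
all_hom = {snp: 2 for snp in COMBINED_OR}
print(genetic_risk_score(all_het), genetic_risk_score(all_hom))
```

Weighting by ln(OR) makes the score additive on the log-odds scale, so a SNP with a larger effect contributes more per risk allele than one with a smaller effect.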

Figure 2

Pleiotropic effects of IgAN GWAS loci and their cumulative effect on the age at disease onset

(a) A genetic susceptibility map was constructed based on all overlapping genome-wide significant loci reported in the NHGRI GWAS catalogue: diseases sharing a single locus with IgAN are indicated in yellow; diseases sharing multiple loci with IgAN are indicated in orange; solid arrows represent allelic associations that are identical to, or in tight LD (r2 > 0.5) with the IgAN risk alleles: concordant effects are indicated in red and opposed effects in blue; dotted arrows represent all other phenotype associations in the region. Of note, candidate gene or regional association studies were not included in this analysis. Inset: collapsed representation of pleiotropic relationships between IgAN and other phenotypes (only shared allelic effects are included with concordant effects indicated in red and opposed effects in blue). (b) Average age at diagnosis as a function of an individual’s risk allele burden (N=3,409 individuals with available data). (c) Average age at diagnosis by quintile of genetic risk (error bars represent 95% confidence interval for the mean). Abbreviations: IgAD: IgA Deficiency; RA: Rheumatoid Arthritis; PBC: Primary Biliary Cirrhosis; MN: Membranous Nephropathy; OA: Osteoarthritis; HCC: Hepatocellular Carcinoma; SLE: Systemic Lupus Erythematosus; UC: Ulcerative Colitis; CD: Crohn’s Disease; T1D: Type I Diabetes; AMD: Age-related Macular Degeneration.

Geospatial pattern of genetic risk suggests polygenic adaptation

We previously demonstrated that the worldwide distribution of IgAN risk alleles was correlated with distance from Africa and paralleled the prevalence of IgAN[2,3]. The distribution for the 15-SNP risk score derived from the present study showed an even greater difference among worldwide populations and was more correlated with geography (52 HGDP populations, r = 0.33, p < 1.0 × 10−16, Supplementary Figure 8a). We observed no evidence of hard selective sweeps at any of the individual loci by haplotype-based selection tests in Asians and Europeans[16]. For several loci, ancestral alleles have lower frequencies in Africans, suggesting that local selective pressures could be operating in Africa. The observed correlation of risk score with distance from Africa is unlikely to be a chance event; based on 10,000 permutations of 15 randomly drawn SNPs matched for average allele frequency to each IgAN SNP, we found that the observed geo-spatial correlation was in the upper tail of the null distribution (empiric P = 0.026, Supplementary Figure 8b). The IgAN risk allele frequencies were also highly differentiated across HapMap III populations (average Fst of 0.237, Supplementary Table 20). Notably, the risk alleles with larger effect size displayed greater differences in frequency among populations, further suggesting a non-random change in allele frequencies across populations (Supplementary Figures 8d and e). Taken together, these observations are best explained by polygenic adaptation to local environments (soft selective sweeps acting simultaneously on multiple existing loci) or more complex selective pressures not easily detectable by classical tests of selection[17,18].
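The empirical P-value from the permutation analysis above follows a simple counting rule, sketched here. The null values are toy integers rather than the authors' permuted correlations, and the add-one correction is an assumption, since the text does not say which convention was used.

```python
def empirical_p(observed, null_statistics):
    """One-sided empirical P: share of null draws at least as large as observed.

    The +1 in numerator and denominator counts the observed value itself,
    which keeps the estimate above zero (an assumed convention).
    """
    exceed = sum(1 for s in null_statistics if s >= observed)
    return (exceed + 1) / (len(null_statistics) + 1)

# Toy null distribution: 100 placeholder statistics, 0 through 99.
toy_null = list(range(100))
p_toy = empirical_p(97, toy_null)
print(p_toy)
```

In the analysis described in the text, each null statistic would be the geo-spatial correlation obtained from one random draw of 15 frequency-matched SNPs.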

Overlap with susceptibility loci for other phenotypes

We identified many overlaps with susceptibility loci for other phenotypes documented in the NHGRI GWAS catalogue, suggesting shared pathogenic pathways (Figure 2a and Supplementary Table 11). We found both concordant and opposing effects with other immune mediated diseases. The HLA-DQ/DR region had the largest number of overlapping associations; IgAN risk alleles within this locus conferred increased risk of rheumatoid arthritis[19], systemic sclerosis[20], alopecia areata[21], Graves’ disease[22], follicular lymphoma[23], type I diabetes[19] and IgA deficiency[24]. However, these risk alleles for IgAN also reduced risk for SLE[25], multiple sclerosis[26], ulcerative colitis[27], and hepatocellular carcinoma[28]. At the same time, because of extensive LD within the HLA region, some of these associations may be reflective of signal inter-correlation rather than true pleiotropic effects. Among non-HLA loci, IgAN risk alleles also conferred increased risk for IBD (CARD9 locus)[11,12], elevated serum non-albumin protein and IgA levels (TNFSF13 locus)[29], AMD (CFHR3,1-delta locus)[30], and T1D (HORMAD2 locus)[31]. Opposing effects were detected for SLE (ITGAM-ITGAX and CFHR3,1-delta)[7,32] and IBD (HORMAD2 locus)[12,33]. Notably, detailed annotations revealed that the majority of IgAN loci encode proteins implicated in maintenance of the intestinal barrier and regulation of mucosal immune response to pathogens (Table 3). Three IgAN risk loci are associated with Crohn’s disease and/or ulcerative colitis (CARD9, HORMAD2 and HLA-DQB1)[11,12,34]. ITGAM and TNFSF13 participate in regulation of IgA-producing cells in the intestine[35,36]; ITGAM is also required for interaction between FcαR (CD89) and secretory IgA, the main form of IgA at mucosal sites[37,38]. α-defensins are expressed by the intestinal Paneth cells and protect from food- and water-borne pathogens in the intestine; deficiencies in α-defensins-5 and -6 have been associated with Crohn’s disease[39,40]. Finally, CARD9, VAV3 and PSMB8/9 are involved in NF-κB activation and are essential for maintenance of the intestinal epithelial barrier and control of the local inflammatory response to infection; in addition, CARD9 deficiency produces susceptibility to invasive fungal infections[41-43].

Table 3

IgAN GWAS loci and their role in the intestinal immunity and inflammation.

Locus (Genes)	Canonical Pathways *	Function and role in intestinal mucosal immunity
ITGAM, ITGAX	Granulocyte pathway, Monocyte pathway, Cell adhesion molecules (CAMs), Hematopoietic cell lineage, Leishmania infection, Leukocyte transendothelial migration, Regulation of actin cytoskeleton	ITGAM and ITGAX encode integrins αM and αX that mark intestinal dendritic cells that maintain the balance between inflammation and tolerance. ITGAM and ITGAX also combine with integrin β2 chain to form leukocyte-specific complement receptors 3 and 4 (CR3 and CR4, respectively). ITGAM is involved in the regulation of intestinal IgA-producing plasma cells in mice[36]. Integrin-αM-positive IgA plasma cells reside in Peyer’s patches, require microbial stimulation for development, and exhibit more proliferation and more IgA production compared to integrin-αM-negative cells[36]. In mice, intestinal dendritic cells that express high level of both αM and αX integrins are CD103+, express TLR5, produce retinoic acid, and induce T-cell-independent IgA class-switch recombination[52, 53]. Schistosome infection specifically impairs the ability of ITGAM-positive (CD11b+) dendritic cells to stimulate CD4+ T-cells[49].
CARD9	NOD-like receptor signaling pathway, Innate immune system, Tuberculosis, Fungal infection	CARD9 encodes a molecular scaffold for the assembly of a BCL10 signaling complex that activates NF-κB, which is responsible for both innate and adaptive immune responses[54]. The rs4077515 risk allele is associated with increased expression of CARD9, and has known association with increased risk of ulcerative colitis and Crohn’s disease[11, 12, 34, 55, 56]. Conversely, a rare protein-truncating splice variant in CARD9 confers additive protection from inflammatory bowel disease[56, 57]. Familial CARD9 deficiency predisposes to invasive fungal infections[58]. CARD9 mediates intestinal repair, T-helper 17 responses, and control of bacterial infection after intestinal epithelial injury in mice[41].
VAV3	Chemokine signaling pathway, Focal adhesion, Natural killer cell mediated cytotoxicity, T cell receptor signaling pathway, B cell receptor signaling pathway, Fc epsilon RI signaling pathway, Fc gamma R-mediated phagocytosis, Leukocyte transendothelial migration, Regulation of actin cytoskeleton	VAV proteins (Vav1, 2, and 3) are guanine nucleotide exchange factors essential for adaptive immune function[13, 14] and NF-κB activation in B-cells, a process that stimulates IgA production[43]. VAV proteins are also required for proper differentiation of colonic enterocytes and preventing spontaneous ulcerations of intestinal mucosa[42]. VAV3 is a positional candidate for QTL for mouse intestinal inflammation in a parasite-induced (Trichuris muris) model of infection[59].
DEFA1, DEFA3, DEFA4, DEFA5, DEFA6	Innate immune system	α-defensins are antimicrobial peptides involved in mucosal defense. DEFA5 and DEFA6 genes are expressed by the intestinal Paneth cells. Deficiencies in α-defensins-5 and -6 have been associated with Crohn’s disease[39, 40]. While α-defensin-5 is broadly antimicrobial, α-defensin-6 promotes mucosal innate immunity through self-assembled peptide nanonets[60].
TNFSF13	Cytokine-cytokine receptor interaction, Intestinal immune network for IgA production	TNFSF13 encodes APRIL, a powerful B-cell stimulating cytokine that promotes CD40-independent IgA class switching[35]. The IgAN risk allele is associated with increased IgA levels[4]. TNFSF13 is induced by intestinal bacteria resulting in IgA class switching. APRIL levels are elevated in some patients with IgAN[61]. Mutations in the TNFSF13 receptor (TACI) produce IgA deficiency or common variable immunodeficiency, with increased propensity to mucosal infections[62].
LIF, OSM, HORMAD2, MTMR3	Cytokine-cytokine receptor interaction, Jak-STAT signaling pathway	The IgAN risk allele at this locus is protective against Crohn’s disease[11, 12, 63] and associated with increased serum IgA levels[3]. LIF and OSM are IL-6 related cytokines that use gp130 for signal transduction, and have been previously implicated in mucosal immunity[64, 65]. Genetic disruption of gp130 signaling leads to gastrointestinal ulceration and inflammatory joint disease in mice[66]. LIF is secreted by pericrypt fibroblasts[67] and may be critical for proliferation and renewal of enterocytes[68].
PSMB8, PSMB9, TAP1, TAP2	Phagosome pathway, Antigen processing and presentation, Primary immunodeficiency, Proteasome, Activation of NFkB in B-cells	PSMB8 and PSMB9 are interferon-induced subunits of the immunoproteasome that mediate intestinal NF-κB activation in IBD[69]. PSMB8 is up-regulated in human intestinal tissue with active IBD lesions[70]. Treatment with bortezomib (PSMB8 inhibitor) or psmb8 deletion in mice attenuates experimental colitis[71].
HLA-DQA1, HLA-DQB1, HLA-DRB1	Antigen processing and presentation, Adaptive immune system, Intestinal immune network for IgA production, Allograft rejection, Graft versus host disease, Asthma, Autoimmune thyroid disease, Leishmania infection	The IgAN risk allele is associated with increased risk of celiac disease[72, 73] and increased risk of IgA deficiency[24]. The IgAN risk allele has an opposed (protective) effect on the risk of ulcerative colitis[27, 74].

* Canonical pathways based on the Molecular Signature Database (KEGG, Biocarta, and Reactome).

Enrichment of the GWAS for SNPs implicated in autoimmune or inflammatory traits

We hypothesized that additional associations with other autoimmune and inflammatory disorders may be present below our replication threshold. Therefore, we performed a gene-set analysis of 582 non-HLA SNPs previously associated with any autoimmune or inflammatory trait listed in the NHGRI GWAS catalogue. In total, 87/582 (15%) were associated with the risk of IgAN at a nominal P < 0.05 (Figure 3a, Supplementary Table 21). This distribution was never observed in 10,000 permutations of phenotype on genotype, indicating a highly significant excess of positive associations (empiric P < 0.0001, Supplementary Figure 9). We also detected a consistent excess of direct protein-protein interactions among gene products encoded by the significant and suggestive loci (Supplementary Figure 10). Among the most prominent autoimmune signals was the PADI4 locus, previously associated with risk of rheumatoid arthritis[44] (rs12568771, OR 1.12, P = 1.8 × 10−6, Supplementary Table 5). These data suggest that additional associations with other autoimmune and inflammatory disorders are likely present below our replication threshold and should be pursued in follow-up studies.
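As a rough cross-check on the size of this excess, the count of 87 nominal associations can be compared with what a binomial null would predict. This is a simplification rather than the permutation procedure used in the study: it assumes the 582 SNPs are independent and that each has a 5% chance of reaching P < 0.05 under the null.

```python
import math

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(
        math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )

n_snps, n_nominal, alpha = 582, 87, 0.05
expected = n_snps * alpha  # about 29 SNPs expected by chance alone
tail = binomial_upper_tail(n_nominal, n_snps, alpha)
print(f"expected = {expected:.1f}, observed = {n_nominal}, P(X >= 87) = {tail:.1e}")
```

Permutation of phenotype on genotype, as performed in the study, has the advantage of preserving correlation among SNPs, which the independence assumption here ignores.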

Figure 3

Autoimmunity/inflammatory loci and risk of IgAN

(a) A quantile-quantile plot of IgAN associations for 582 unique non-HLA SNPs previously associated with autoimmune or immune-mediated diseases at p < 5 × 10−8 in the NHGRI GWAS catalogue. When tested for association with IgAN, an unexpectedly large number of SNPs deviate from the null expectation (empiric p < 1 × 10−4, Supplementary Figure 9). (b) The KEGG enrichment map for the genes residing within autoimmunity loci associated with IgAN at p < 0.05 (q < 0.25). The size of nodes reflects −log10-transformed P-values of the adjusted hypergeometric enrichment test in GSEA. The edges represent pathway similarity as defined by an overlap coefficient. The top overrepresented KEGG pathway is the “Intestinal Immune Network for IgA Production” (gene set overlap coefficient = 25%, enrichment p < 1.0 × 10−16). Individual genes intersecting top-ranked KEGG pathways are provided in Supplementary Figure 10c.

When the suggestive and significant loci were tested for enrichment in KEGG pathways, the top overrepresented pathways were “Intestinal Immune Network for IgA Production” (overlap coefficient of 25%, P < 1.0 × 10−16, Figure 3b) and “Leishmania Infection”, a protozoan infection involving the skin, viscera and mucosa (overlap coefficient of 15%, P = 6.8 × 10−15). Notably, the pathway enrichment scores and all network connectivity parameters were consistently increased with the addition of the top SNPs at varying FDR levels, providing additional support for the role of these loci in the pathogenesis of IgAN (Supplementary Figure 10).
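
The two quantities reported for each pathway can be illustrated with a minimal sketch. It assumes the overlap coefficient is the shared gene count divided by the size of the smaller gene set, and it uses a plain (unadjusted) hypergeometric upper tail as a stand-in for the adjusted enrichment test; the function names and the toy gene sets are hypothetical and are not taken from the study.

```python
from math import comb


def overlap_coefficient(set_a, set_b):
    """Return |A ∩ B| / min(|A|, |B|), or 0.0 if either set is empty."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def hypergeom_upper_tail(k, n_draw, n_pathway, n_total):
    """P(X >= k), where X counts pathway genes among n_draw genes drawn
    without replacement from n_total genes, n_pathway of which are in the pathway."""
    upper = min(n_draw, n_pathway)
    favourable = sum(
        comb(n_pathway, i) * comb(n_total - n_pathway, n_draw - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_draw)


# Toy example with hypothetical gene labels (not real study data).
candidate_genes = {"G1", "G2", "G3", "G4"}
pathway_genes = {"G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"}
shared = len(candidate_genes & pathway_genes)
print(overlap_coefficient(candidate_genes, pathway_genes))
print(hypergeom_upper_tail(shared, len(candidate_genes), len(pathway_genes), 100))
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for very small tail probabilities until the final division.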

Association of the IgAN genetic risk score with pathogen diversity

The enrichment for pathways involving intestinal immunity and mucosal pathogens strongly suggested that the distinctive geographic pattern of IgAN risk alleles might have been shaped by an adaptation to local environment. To better define potential environmental factors that could account for such an adaptive process, we performed an association analysis of the IgAN genetic risk score for HGDP populations with 14 ecological variables previously defined for these populations reflecting local climate, pathogen load, and dietary factors[45] (Supplementary Table 22a). The genetic risk was nominally associated with climatic and dietary factors. However, there was a very strong positive association of the IgAN genetic risk score with local pathogen diversity (measured as the number of different pathogen species in the area, including viruses, bacteria, protozoa, and helminths, r = 0.62, P = 6.0 × 10−7, Figure 4a). In the analysis of individual pathogen classes, the strongest association was for helminth diversity (r = 0.68, P = 1.0 × 10−8, Figure 4b), which accounted for nearly all the association with pathogen diversity on a stepwise regression analysis. In the final combined model, only helminth diversity and geography were independently associated with the IgAN genetic risk score (Supplementary Table 22b).

Figure 4

IgAN genetic risk is correlated with worldwide pathogen diversity

(a) Correlation between IgAN genetic risk score (X-axis) and the level of local pathogen diversity (Y-axis) among the HGDP populations (linear regression line and its 95% confidence intervals, Pearson’s correlation coefficient = 0.62, P = 6 × 10−7); (b) Stepwise feature selection among all pathogen subgroups confirmed helminth diversity as the single best predictor of IgAN genetic risk (Pearson’s correlation coefficient = 0.68, P = 1 × 10−8); (c) Weaker correlation was also evident for bacterial diversity (top panel, Pearson’s correlation coefficient = 0.56, P = 1 × 10−5), but not for protozoan or viral diversity (middle and bottom panels). The pathogen diversity metrics were scaled and standardized across all populations; error bars represent 95% confidence interval for the mean; for detailed analysis refer to Supplementary Table 22.

Discussion

In this study, we identify six novel signals that contribute to IgAN, including four in novel loci (ITGAM-ITGAX, VAV3 and CARD9) and two in known regions (HLA-DQB1, DEFA), and replicate nine of the previously reported genome-wide significant signals. The loci discovered in this study reside at the intersection of multiple canonical pathways, and point to critical steps in the pathogenesis of IgAN (maintenance of the intestinal mucosal barrier, activation of mucosal IgA production, NF-κB signaling, defense against intracellular pathogens, and complement activation). Collectively, these 15 independent risk alleles significantly influence the age of disease onset. Moreover, we demonstrate significant overlap of these loci with other autoimmune and inflammatory disorders, placing IgAN in this disease spectrum. The striking association of risk allele frequencies with geography and local helminth diversity is most consistent with multi-locus adaptation to environment. While our analysis cannot exclude unmeasured environmental factors or other pathogens that are associated with helminth diversity, helminth infection itself is a potential source of selection pressure. Helminth infection has been a major source of morbidity and mortality in human history, and even today occurs in 25% of the world population[46], with the highest global burden of soil-transmitted helminth infections occurring in Asia, significantly contributing to pediatric mortality[46,47]. Intriguingly, secondary forms of IgAN are known to develop in the setting of schistosomiasis, a common helminth infection[48]. Recent data also indicate that schistosome infection specifically impairs the ability of ITGAM-positive (CD11b+) dendritic cells to stimulate CD4+ T-cells[49]. These findings strongly suggest that the increased incidence of IgAN in some geographic areas may represent an untoward consequence of protective adaptation to mucosal invasion by local pathogens.
The enhanced immune response conferred by risk alleles would simultaneously explain the known association of mucosal infections as a trigger for IgAN. Host-pathogen interactions have similarly exerted a critical influence on the genetic architecture of IBD[12]. Consistent with this finding, IgAN loci are either directly associated with risk of IBD (HLA-DQ/DR, CARD9, HORMAD2) or encode proteins involved in maintenance of the intestinal mucosal barrier or regulation of mucosal immune response (DEFA, TNFSF13, VAV3, ITGAM-ITGAX, PSMB8; Table 3). Network and enrichment analyses further point to perturbations of the immune pathway of intestinal IgA production as a central defect in the disease pathogenesis (Figure 3, Supplementary Figure 10, and Supplementary Table 21). These results link intestinal mucosal inflammatory disorders and IBD with risk of IgAN and may explain why these two diseases co-occur more often than expected by chance[50]. These data are also consistent with the clinical observation that mucosal infections frequently trigger episodes of glomerulonephritis in IgAN, and with the key role of IgA in defense at mucosal surfaces[51]. Finally, these results demonstrate that most IgAN risk loci are shared with other immune-mediated diseases and identify 87 suggestive associations with non-HLA autoimmune and inflammatory SNPs. These analyses predict that follow-up studies of autoimmune and inflammatory variants, particularly among patients with early onset of disease, will yield additional genome-wide significant associations and further clarify links to environmental risk factors.

Methods

Study Design and Power Analysis

The study was designed in two stages. Stage I (the discovery phase) involved a genome-wide meta-analysis of four discovery cohorts (2,747 cases and 3,952 controls) imputed to a common set of >1 million SNPs. Stage II (the replication phase) involved genotyping of the top signals from stage I in ten additional cohorts of European and Asian ancestry (4,911 cases and 9,002 controls). We carried out power calculations for this design under the following assumptions: a disease prevalence of 1%; a log-additive risk model; perfect LD between a marker and a disease allele; a follow-up significance threshold of 5×10−5; and joint (stage I and II) significance level of 5×10−8. The power of our study was calculated for a range of disease allele frequencies (0.10–0.50) and effect sizes (genotypic risk ratio 1.10–1.50). The effect sizes detectable at α = 5×10−8 with a power of 80% were also estimated (Supplementary Table 1). The calculations were performed using CaTS software[75]. All subjects provided informed consent to participate in genetic studies and the Institutional Review Board of Columbia University as well as local ethics review committees for each of the individual cohorts approved our study protocol.

GWAS Discovery Study (Stage I)

The cohorts, genome-wide genotyping, genotype quality control, ancestry analysis, and imputations are described in detail in the Supplementary Note and Supplementary Tables 2–4. We implemented strict quality control filters for each of the cohorts, including elimination of samples with low call rates, duplicates, ancestry outliers, samples with cryptic relatedness or samples with detected gender mismatch (Supplementary Table 2). We applied principal component (PC)-based ancestry-matching algorithms to reduce any potential bias of population stratification (Supplementary Table 3). After implementation of ancestry matching, we dramatically reduced the number of significant PCs for each cohort and we demonstrated that cases and controls were evenly distributed along the PC axes without significant outliers (Supplementary Figure 1). To improve coverage across different platforms, we performed imputation to a common set of >1 million HapMap-III SNPs (Supplementary Table 4 and Supplementary Note). Only SNPs with high imputation quality (r2>0.8) were included in association analyses. After ancestry matching, imputation, and quality control, there were four cohorts included in stage I: the Italian Discovery Cohort of 1,045 cases and 1,340 controls (1,132,157 imputed markers), the Chinese Discovery Cohort of 1,194 cases and 902 controls (1,027,812 imputed markers), the French Discovery Cohort of 205 cases and 159 controls (1,032,453 imputed markers) and the US Discovery Cohort of 303 cases and 1,551 controls (1,118,683 imputed markers). The primary association testing was performed within each cohort individually under a multiplicative (log-additive) model and after accounting for imputation uncertainty using an allelic dosage method. Significant principal components of ancestry were included as covariates in the association analysis of each individual cohort.
Ancestry-adjusted effect estimates and standard errors were derived for each SNP and the results were combined genome-wide using fixed effects. The meta-analysis results were verified using two independent software packages (PLINK v.1.07[76] and METAL[77]). The genome-wide distributions of P values were examined visually using QQ-plots for each individual cohort as well as for the combined analysis. We also estimated genomic inflation factors for each genome-wide analysis[78] (Supplementary Figure 3). The final meta-analysis QQ-plot showed no global departures from the expected distribution of P values and the overall genomic inflation factor was estimated at 1.047 (Supplementary Figure 4).
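
The fixed-effects combination and the genomic inflation factor described above can be sketched as follows. This is an illustrative inverse-variance implementation, not the PLINK or METAL code; the per-cohort effect estimates in the usage lines are made-up numbers, and the inflation factor is assumed to be the ratio of the median observed 1-d.f. chi-square statistic to its null median (about 0.455).

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooling of per-cohort log-odds ratios.

    Returns (pooled beta, pooled standard error, Z score, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, z, p


def genomic_inflation(chi2_stats):
    """Lambda: median observed 1-d.f. chi-square over its null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical per-cohort estimates for one SNP (four discovery cohorts).
beta, se, z, p = fixed_effects_meta([0.15, 0.22, 0.10, 0.18], [0.06, 0.07, 0.15, 0.09])
print(round(math.exp(beta), 3), p)
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest and best-genotyped cohorts.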

Follow-up of Suggestive Signals (Stage II)

Based on the examination of QQ-plots from Stage I, we selected a P-value threshold of 5×10−5 to define signals for follow-up analyses. This threshold corresponds to the positive FDR of 13% (Q-value software)[79]. The threshold defined 435 top SNPs that were subsequently prioritized for replication. Of the 435 SNPs, 320 (74%) were localized within the known susceptibility loci, including 286 SNPs across the HLA loci, 30 SNPs on chr.22q12.2 (HORMAD2 locus) and an additional 4 SNPs on chr.1q32 (CFHR3/1-delta locus). The remaining 115 SNPs were clustered into distinct loci on the basis of their physical location and regional patterns of LD. Conditional logistic regression analysis was carried out to confirm correct SNP grouping and to detect independent signals. For follow-up genotyping, we prioritized independent SNPs with the lowest P-value within each independent locus. We additionally required that each SNP be successfully typed or imputed in at least three of the four analyzed cohorts. We excluded loci supported only by a single SNP (“singleton signals” defined by absence of supporting signals with P<0.01 within the same block of LD). In case genotyping failed, we selected a back-up SNP based on strength of association, LD with the top SNP, quality of genotyping or imputation, and ability to design working primers. Additionally, we included representative SNPs for the two recently discovered GWAS loci in Chinese[4], the TNFSF13 locus (rs3803800 and rs4227) and the DEFA locus (rs2738048). In total, we successfully acquired and analyzed genotype data for 50 carefully selected SNPs representative of the top 37 distinct genomic regions in 13,913 replication samples (4,911 cases and 9,002 controls). The composition of the replication cohorts, genotyping methods and genotype quality control are summarized in the Supplementary Note and Supplementary Table 2. The association analyses were first carried out individually within each of the 10 included cohorts.
Similar to stage I, the results were next combined using a fixed effects model. For each SNP, we derived pooled effect estimates, their standard errors, and 95% confidence intervals. We also estimated the degree of heterogeneity using heterogeneity index (I2) and Cochran’s Q test in the combined analysis[80]. The complete summary of association results for all 50 SNPs tested in replication cohorts is provided in Supplementary Tables 5, 6, and 7.
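
The heterogeneity measures named above follow directly from the fixed-effects weights. The sketch below computes Cochran's Q and the I2 index from per-cohort effect estimates; the function name is hypothetical, and the P value for Q (a chi-square test with k − 1 degrees of freedom) is left out because it needs a chi-square distribution function that the Python standard library does not provide.

```python
def heterogeneity(betas, ses):
    """Return (Cochran's Q, degrees of freedom, I2 in percent)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2 is the share of total variation beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log-odds ratios and standard errors from three cohorts.
print(heterogeneity([0.10, 0.25, 0.18], [0.05, 0.08, 0.06]))
```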

Imputation Analysis of Classical HLA Alleles

For each of the cohorts with available genome-wide genotype data, we imputed classical HLA alleles at -A, -B, -C, -DQB1, -DQA1, and -DRB1 loci. We used HapMap Caucasian Utah (CEU) samples as reference for imputation of Caucasian cohorts and combined Han Chinese Beijing (CHB) and Japanese of Tokyo (JPT) samples for Asians. The reference panels were constructed by phasing combined SNP genotype and HLA typing data. The phasing and imputation were performed using two independent methods: MACH[81] and BEAGLE-3[82]. Any poorly imputed alleles (R-sq < 0.3) were eliminated from association testing at the level of individual cohorts. The imputed allelic concordance rate between the two methods was 98.1%. In addition, direct sequencing of the informative coding segments of HLA-DQB1 gene in a random subset of 155 samples demonstrated that our imputation had 89.0% sensitivity and 91.5% specificity. The association testing in each cohort was performed using allelic dosage method with adjustment for significant principal components in PLINK[76]. The final results were combined across cohorts using fixed effects meta-analysis in METAL[77] (Supplementary Table 13). Conditional analyses were performed using stepwise logistic regression with Bayesian Information Criterion (BIC) as a selection criterion (Supplementary Table 14, Step function, R version 3.0).

Pairwise Epistasis Screen

We screened all possible pairwise interaction terms for association with disease using a 1-df LRT comparing two nested logistic models: one with main effects only and one with main effects and a multiplicative (log-additive) interaction term. We included cohort membership as a fixed covariate in both models. We excluded 7 pairwise interaction terms between SNPs in partial linkage disequilibrium (r2>0.1) resulting in a total of 98 independent interactions tested (Supplementary Table 23). The results were ranked in the order of significance and positive false discovery rates (q-values) were calculated. Suggestive interaction terms were defined as exceeding a significance threshold that was Bonferroni-corrected for the number of independent tests (p < 0.05/98 or 5×10−4).
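
The arithmetic behind the screen can be checked with a short sketch. It assumes the log-likelihoods of the two nested logistic models have already been obtained from a fitting routine (not shown), and it uses the closed form of the 1-d.f. chi-square tail; the function names and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_1df(loglik_main, loglik_interaction):
    """1-d.f. likelihood ratio test between nested models.

    loglik_main: log-likelihood of the main-effects-only model.
    loglik_interaction: log-likelihood with the added interaction term.
    """
    stat = max(0.0, 2.0 * (loglik_interaction - loglik_main))
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def n_pairwise_tests(n_snps, n_excluded_pairs):
    """Number of SNP pairs left after dropping pairs in partial LD."""
    return math.comb(n_snps, 2) - n_excluded_pairs


n_tests = n_pairwise_tests(15, 7)   # 105 possible pairs minus 7 excluded
threshold = 0.05 / n_tests          # Bonferroni-corrected threshold
stat, p = lrt_pvalue_1df(-4210.7, -4203.1)  # made-up log-likelihoods
print(n_tests, threshold, stat, p, p < threshold)
```

With 15 SNPs there are 105 possible pairs, so removing 7 pairs in partial LD leaves the 98 tests used for the correction.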

Interrogation of Protein-Protein Interaction (PPI) Networks

We interrogated two comprehensive PPI network datasets using two independent methods. First, we used the Disease Association Protein-Protein Link Evaluator (DAPPLE)[83]. This is a network connectivity tool based on InWeb[84], an integrated database of known PPIs with 12,793 nodes and 169,810 high-confidence interactions based on MINT, IntAct, BIND, PPrel, ECrel, and Reactome. Statistical significance of network connectivity parameters for individual proteins and for the entire seed set was assessed using 1,000 within-degree node-label permutations (Supplementary Figure 10). As an independent confirmatory analysis, we downloaded the Protein Interaction Network Analysis (PINA) dataset[85], which combines annotated PPI data from 6 databases (MINT, IntAct, DIP, BioGRID, HPRD, and MIPS/MPact). This large network consisted of 14,784 nodes and 107,802 unique edges (last release December 10th, 2012). To integrate our GWAS results with PPI data, and to identify modules enriched in disease-associated genes, we used a dense module searching method (dmGWAS v.2.0)[86]. Briefly, we performed a global search for modules with maximum proportion of low P-values by designating the top-scoring GWAS genes as seeds and selecting neighboring nodes (with a shortest path to any node in the module ≤ 2) that optimize subgraph’s overall significance. The extracted subnetworks were merged and visualized using R (igraph v.0.5.2).

Other Methods of Prioritizing Candidate Genes

To interrogate putative functional SNPs that were not typed or imputed in our dataset, we systematically identified all variants that were in high LD (r2 > 0.5) with the 15 IgAN GWAS SNPs based on 1000 Genomes data. These variants were further annotated using ANNOVAR[87], SeattleSeq[88], and HaploReg2[89] (Supplementary Table 10). We also analyzed a subset of 1,073 SNPs that represented tags for the known common copy number polymorphisms[90]. Additionally, we identified all genes whose expression was correlated with the IgAN susceptibility SNPs in cis- or trans- and at P < 10−5 (Supplementary Table 9). For this purpose, we used the following recently published eQTL datasets: (1) meta-analysis of transcriptional profiles from peripheral blood cells of 5,311 Europeans[8], (2) primary immune cells (B-cells and monocytes) from 288 healthy Europeans[9]; (3) 400 lymphoblastoid cell lines (LCL) derived from asthmatic children[10], and (4) eqtl.uchicago browser with compiled data across several tissues. Finally, we utilized GRAIL (Gene Relationships Across Implicated Loci), an online tool that uses PubMed text mining results to assess network connectivity between genes residing in implicated GWAS loci[91]. To prioritize candidate genes, each individual gene was tested for significant enrichment in GRAIL connectivity to genes residing in other loci.

Genetic Risk Score

To assess cumulative effects of the newly detected loci, we built a logistic regression model based on the 15 SNP predictors with independent contribution to disease risk. The risk score was calculated as a weighted sum of the number of risk alleles at each locus multiplied by the log of the adjusted OR for each of the individual loci. The percentage of the total variance explained was estimated by Nagelkerke’s pseudo R2 from the logistic regression model with the risk score as a quantitative predictor and disease state as an outcome (SPSS Statistics v.21.0, IBM 2013).
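
As a minimal sketch of the weighted score described above, the example below sums risk-allele counts weighted by the natural log of each locus-specific odds ratio. The function name, the three-locus genotype, and the odds ratios are hypothetical placeholders rather than the 15 adjusted estimates from the study.

```python
import math


def genetic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted sum: (risk alleles at locus i) x ln(adjusted OR at locus i)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per locus")
    return sum(n * math.log(or_i) for n, or_i in zip(risk_allele_counts, odds_ratios))


# One individual carrying 2, 1 and 0 risk alleles at three hypothetical loci.
print(genetic_risk_score([2, 1, 0], [1.2, 1.5, 1.3]))
```

For imputed SNPs, the integer counts could be replaced by allelic dosages between 0 and 2 without changing the function.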

Geospatial Risk Analysis

For the geospatial analysis, we used publicly available genotype data of HapMap III (1,184 individuals representative of 11 populations) and the Human Genome Diversity Panel (HGDP; 1,050 individuals representative of 52 worldwide populations). The HGDP individuals have been previously genotyped for 660,918 markers using Illumina 650Y arrays (Stanford University). High-quality genotype data were available for 13 out of 15 IgAN SNPs, with missing genotypes for rs10086568 and rs7763262. We imputed rs7763262 with high confidence (imputation r2 > 0.99) using all combined HapMap-III populations for reference. Instead of rs10086568, we used a near-perfect proxy rs9644778 (r2 = 0.94, D′ = 1.00), which was also genome-wide significant in our study (P = 1.8 × 10−9). Using these data, we calculated individual risk score profiles for all individuals in the HGDP dataset. The risk score was standardized across populations using a Z-score method: Standardized Risk Score = (Individual Risk Score – Worldwide Mean)/Worldwide Standard Deviation. The median standardized risk scores for each population were compared across continents. We correlated standardized risk profiles with the longitude, latitude, and geographic distance from Africa.
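
The standardization and per-population summary can be sketched as below. The sketch assumes the worldwide standard deviation is the sample standard deviation across all individuals (the text does not say which estimator was used); the function names, scores, and population labels are illustrative only.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def standardize(scores):
    """Z-score each individual against the worldwide mean and SD."""
    mu = mean(scores)
    sd = stdev(scores)  # assumption: sample SD; the source does not specify
    return [(s - mu) / sd for s in scores]


def median_by_population(z_scores, populations):
    """Median standardized risk score within each population label."""
    grouped = defaultdict(list)
    for z, pop in zip(z_scores, populations):
        grouped[pop].append(z)
    return {pop: median(values) for pop, values in grouped.items()}


# Made-up raw risk scores for five individuals from two populations.
raw = [1.0, 2.0, 3.0, 4.0, 5.0]
labels = ["PopA", "PopA", "PopB", "PopB", "PopB"]
print(median_by_population(standardize(raw), labels))
```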

Testing for Genetic Drift

To evaluate if the observed allelic differentiation is due to genetic drift, we analyzed 10,000 sets of SNPs randomly drawn from the genome but matched to the IgAN SNPs based on average minor allele frequency on a per-SNP basis. In each permutation round, we scored all 1,050 HGDP individuals with the risk score calculated from the set of randomly selected SNPs. The risk scores were correlated with the distance from Africa to generate distributions of null statistics against which we compared the observed geospatial correlation. The empirical P-value was defined as the number of permuted statistics more extreme than the observed statistic divided by the total number of permutations (Supplementary Figure 8). An empirical P-value < 0.05 was considered statistically significant. The permutation procedure was implemented using a custom script in the Perl programming language.
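
A rough Python sketch of the two ingredients of this procedure is given here; it is not the custom Perl script. It assumes that matching means each random SNP lies within a fixed tolerance of the target minor allele frequency and that "more extreme" is judged on the absolute value of the statistic; the tolerance, the function names, and the SNP labels are hypothetical.

```python
import random


def draw_matched_set(target_mafs, pool_mafs, rng, tol=0.02):
    """Draw one random SNP per target, matched on minor allele frequency.

    pool_mafs maps SNP identifier -> MAF; tol is an assumed matching window.
    """
    drawn = []
    for target in target_mafs:
        candidates = [
            snp for snp, maf in pool_mafs.items()
            if abs(maf - target) <= tol and snp not in drawn
        ]
        if not candidates:
            raise ValueError(f"no unused SNP within {tol} of MAF {target}")
        drawn.append(rng.choice(candidates))
    return drawn


def empirical_p(observed, null_stats):
    """Share of null statistics strictly more extreme than the observed one."""
    more_extreme = sum(1 for s in null_stats if abs(s) > abs(observed))
    return more_extreme / len(null_stats)


rng = random.Random(2014)
pool = {"snpA": 0.11, "snpB": 0.29, "snpC": 0.12, "snpD": 0.31}
print(draw_matched_set([0.10, 0.30], pool, rng))
print(empirical_p(0.5, [0.1, -0.6, 0.2, 0.7]))
```

In a full run, each matched set would be scored in all HGDP individuals and correlated with distance from Africa to produce one entry of `null_stats`.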

Correlations with Environmental Variables

We investigated correlations between the newly defined genetic risk score and 14 environmental variables previously defined for each of the HGDP populations (Supplementary Table 22a). The environmental variables were downloaded directly from Fumagalli et al.[45], and included climatic factors (relative humidity, mean annual temperature, precipitation rate, net short wave radiation flux, and physical distance from the sea), subsistence strategies (relative amount of agriculture, animal husbandry, fishing, hunting, and gathering) and pathogen diversity (number of different species of viruses, bacteria, protozoa, and helminths). We applied Pearson’s correlation analysis, as well as partial correlation to test median standardized genetic risk before and after controlling for geographic distance from Africa (SPSS Statistics v.21.0). Because many of the ecological factors are inter-correlated, we also applied a stepwise feature selection algorithm (BIC selection criterion) to construct the best predictive regression model of genetic risk (step function, R v.3.0). At entry, we included each of the broad predictor categories separately (climate, subsistence, pathogens), followed by all 14 predictors combined, with additional adjustment for the distance from Africa (Supplementary Table 22b).
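
The correlation before and after controlling for distance from Africa can be illustrated with the first-order partial correlation formula. This is a sketch of the standard textbook computation rather than the SPSS procedure; the function names and the toy vectors (one value per population) are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy per-population values: median risk score, helminth diversity, distance.
risk = [-0.8, -0.3, 0.1, 0.4, 0.9]
helminths = [-1.0, -0.2, 0.0, 0.6, 0.7]
distance = [1.0, 5.0, 9.0, 12.0, 20.0]
print(pearson_r(risk, helminths), partial_r(risk, helminths, distance))
```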

Clinical Phenotype-Genotype Correlations

We analyzed baseline demographic and clinical data from the time of renal biopsy, including age, gender, body mass index, serum creatinine (SCr), albumin (Alb), hemoglobin (Hgb), 24-hour protein excretion (P24), microscopic hematuria, systolic blood pressure (SBP), diastolic blood pressure (DBP), and history of gross hematuria. The diagnosis of hypertension was based on SBP ≥ 140 mmHg, or DBP ≥ 90 mmHg, or history of antihypertensive medication use. The level of protein excretion was measured by a 24-hour urine collection or estimated based on urinary protein-to-creatinine ratio; the proteinuria values were normalized using ln(P24+1) transformation. The degree of renal tissue injury was graded using the Haas[92] classification. Estimated glomerular filtration rate (eGFR) was evaluated using the Modification of Diet in Renal Disease (MDRD) equation for Europeans[93] and the modified MDRD version for Chinese[94]. Chronic kidney disease (CKD) was classified based on the eGFR intervals according to the Kidney Disease Outcomes Quality Initiative (K/DOQI) practice guidelines[95]. End stage renal disease (ESRD) was defined by eGFR < 15 ml/min/1.73m2 or initiation of renal replacement therapy (dialysis or kidney transplantation). Longitudinal data after kidney biopsy were available for 1,607 patients with a mean follow-up time of 7.9 years. Out of 1,607 patients, 459 reached the endpoint of ESRD within the follow-up period. For screening genotype-phenotype correlations, we used linear regression for quantitative traits, logistic regression for binary traits, and Cox proportional hazards models for survival analysis with SNP predictors coded under an additive genetic model. The associations for eGFR, P24, Alb, Hgb, histopathology scores, and serum levels of IgA and IgA1 were adjusted for age, gender, and cohort/ethnicity. Association testing for the age of diagnosis and onset of ESRD was performed before and after adjustment for sex and cohort/ethnicity.
The analysis of kidney disease progression was adjusted for age, sex, cohort/ethnicity, baseline eGFR (minimally adjusted model) as well as P24 and Haas histopathology score (full model). Statistical analyses were implemented in R version 3.0 and SPSS Statistics version 21 (IBM).
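
The phenotype definitions above translate into a few simple rules, sketched here for illustration. The function names are hypothetical, P24 is assumed to be expressed in g/day, and the CKD stage cut-offs (90, 60, 30 and 15 ml/min/1.73m2) are the standard K/DOQI eGFR intervals, which the text references but does not spell out.

```python
import math


def has_hypertension(sbp, dbp, on_antihypertensives):
    """SBP >= 140 mmHg, or DBP >= 90 mmHg, or antihypertensive medication use."""
    return sbp >= 140 or dbp >= 90 or on_antihypertensives


def transform_proteinuria(p24):
    """ln(P24 + 1) normalization of 24-hour protein excretion (assumed g/day)."""
    return math.log(p24 + 1.0)


def is_esrd(egfr, on_renal_replacement):
    """eGFR < 15 ml/min/1.73m2 or dialysis/transplantation."""
    return egfr < 15 or on_renal_replacement


def ckd_stage(egfr):
    """CKD stage from eGFR, using the assumed standard K/DOQI intervals."""
    for stage, lower_bound in ((1, 90), (2, 60), (3, 30), (4, 15)):
        if egfr >= lower_bound:
            return stage
    return 5


print(has_hypertension(138, 92, False), transform_proteinuria(1.5),
      is_esrd(22.0, False), ckd_stage(45.0))
```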

Genetic Overlap with Other Phenotypes

To systematically cross-annotate IgAN susceptibility loci against all previously published GWAS findings, we downloaded the latest NHGRI GWAS catalogue (September 2013)[96]. We filtered all published SNPs that were (1) associated with any disease phenotype or trait at a genome-wide significance (p < 5 × 10−8) and (2) resided within the genomic regions of association with IgAN. For each SNP association, we manually verified the direction of effect for a reference allele based on original publications. Next, each selected SNP from the catalogue was queried against our GWAS results to extract the odds ratios and p-values for associations with IgAN. The directionality of allelic effects was assessed to identify pleiotropic alleles with concordant or opposed effects (Supplementary Table 11). We calculated a maximum r2 between SNPs associated with each catalogued trait and the 15 SNPs from our study based on the data from HapMap-III and 1000 Genomes project. We defined overlapping susceptibility alleles if r2 exceeded 0.50. Lastly, we constructed a susceptibility overlap map that connects each of the IgAN loci to the previously associated GWAS traits and highlights associations with SNPs in high LD with the top IgAN signals (Figure 2a).

Testing Inflammatory/Autoimmune Subset Hypothesis

We analyzed 582 unique SNPs representative of all non-HLA autoimmune and inflammatory disease-associated GWAS loci out of the 11,276 listed in the NHGRI GWAS catalogue (September, 2013)[96]. The association results for this set were visually examined for overrepresentation of significant signals using a QQ-plot (Figure 3a). Next, we tested the autoimmune hypothesis using a previously published GWAS-HD approach[97]. This involved testing 582 unique SNPs simultaneously for association with IgAN using the GWAS discovery cohorts. To preserve the LD pattern between SNPs, the IgAN phenotype was permuted 10,000 times within each cohort. In each round of permutation, corresponding association analysis was performed using logistic regression after adjustment for cohort membership, and a sum of the Wald (1-d.f.) association statistics of the 582 SNPs was calculated. The empirical P value was calculated as the proportion of the permutation samples whose sum statistic was larger than that in the observed sample (Supplementary Figure 9).
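
The permutation logic can be sketched in simplified form. The example below shuffles case-control labels within each cohort, which leaves the genotype matrix (and hence the LD between SNPs) untouched, and sums a 1-d.f. statistic over all SNPs. As a labeled simplification, it swaps in the Armitage trend statistic (N × r²) for the cohort-adjusted Wald statistic from logistic regression, so it does not adjust for cohort membership; all names and data are hypothetical.

```python
import random


def trend_chi2(genotypes, phenotypes):
    """1-d.f. Armitage trend statistic, N * r^2 (stand-in for the Wald test)."""
    n = len(genotypes)
    mg, mp = sum(genotypes) / n, sum(phenotypes) / n
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    sgg = sum((g - mg) ** 2 for g in genotypes)
    spp = sum((p - mp) ** 2 for p in phenotypes)
    if sgg == 0 or spp == 0:
        return 0.0  # monomorphic SNP or a single phenotype class
    return n * sgp ** 2 / (sgg * spp)


def sum_statistic(genotype_matrix, phenotypes):
    """Sum of per-SNP statistics; genotype_matrix holds one row per SNP."""
    return sum(trend_chi2(snp, phenotypes) for snp in genotype_matrix)


def permute_within_cohort(phenotypes, cohorts, rng):
    """Shuffle phenotype labels separately inside each cohort."""
    permuted = list(phenotypes)
    by_cohort = {}
    for idx, cohort in enumerate(cohorts):
        by_cohort.setdefault(cohort, []).append(idx)
    for indices in by_cohort.values():
        values = [phenotypes[i] for i in indices]
        rng.shuffle(values)
        for i, v in zip(indices, values):
            permuted[i] = v
    return permuted


def gwas_hd_empirical_p(genotype_matrix, phenotypes, cohorts, n_perm=10000, seed=0):
    """Proportion of permutations whose sum statistic exceeds the observed one."""
    rng = random.Random(seed)
    observed = sum_statistic(genotype_matrix, phenotypes)
    larger = 0
    for _ in range(n_perm):
        shuffled = permute_within_cohort(phenotypes, cohorts, rng)
        if sum_statistic(genotype_matrix, shuffled) > observed:
            larger += 1
    return larger / n_perm
```

With 10,000 permutations the smallest resolvable non-zero value is 1 × 10−4, which is why a result of zero exceedances is reported as P < 0.0001.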

Gene Annotation and Network Analysis of Autoimmune/Inflammatory SNPs

Based on the observed distribution of P-values, we defined two arbitrary thresholds for inclusion of suggestive signals in downstream network analyses: positive FDR < 10% (corresponding to P < 5.9 × 10−3) and positive FDR < 25% (corresponding to P < 0.05). The SNPs meeting these criteria were clustered into distinct loci based on genomic location and pairwise linkage disequilibrium. The disease locus was defined by nearest recombination hotspots in the 3′ and 5′ direction of the top SNP and overlapping intervals were merged into a single locus. All genes that intersect this interval, including 100-kb upstream and 40-kb downstream of the largest isoform (to include regulatory DNA), were considered as contained within the disease locus. The candidate gene sets (union of all genes within the candidate loci), were used as seeds in the sequential GRAIL and DAPPLE analyses (Supplementary Figure 10). These gene sets were also used for pathway analysis using Gene Set Enrichment Analysis (GSEA)[98]. The KEGG pathway enrichment map (Figure 3b) was constructed using the Enrichment Map (v.1.2)[99]. Network graphs were visualized in Cytoscape (v.2.8).
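
The locus definition and gene assignment can be sketched as interval operations. The example assumes that "upstream" and "downstream" are interpreted relative to the transcribed strand and that coordinates are base-pair positions on a single chromosome; the function names, gene labels, and coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into single loci."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gene_window(start, end, strand, upstream=100_000, downstream=40_000):
    """Extend a gene by 100 kb upstream and 40 kb downstream (strand-aware)."""
    if strand == "+":
        return max(0, start - upstream), end + downstream
    if strand == "-":
        return max(0, start - downstream), end + upstream
    raise ValueError("strand must be '+' or '-'")


def genes_in_locus(locus_start, locus_end, genes):
    """Names of genes whose extended window intersects the locus.

    genes maps gene name -> (start, end, strand) of the largest isoform.
    """
    hits = []
    for name, (start, end, strand) in genes.items():
        w_start, w_end = gene_window(start, end, strand)
        if w_start <= locus_end and locus_start <= w_end:
            hits.append(name)
    return sorted(hits)


loci = merge_intervals([(1_000_000, 1_030_000), (1_020_000, 1_050_000)])
genes = {
    "GENE_A": (1_120_000, 1_140_000, "+"),
    "GENE_B": (1_120_000, 1_140_000, "-"),
}
print(loci, genes_in_locus(*loci[0], genes))
```

In this toy case only the plus-strand gene is captured, because its 100-kb upstream extension reaches back into the locus while the minus-strand gene is extended by only 40 kb on that side.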

98 in total

1. DCs induce CD40-independent immunoglobulin class switching through BLyS and APRIL.

Authors: Mikhail B Litinskiy; Bernardetta Nardelli; David M Hilbert; Bing He; Andras Schaffer; Paolo Casali; Andrea Cerutti
Journal: Nat Immunol Date: 2002-08-05 Impact factor: 25.606

2. Human α-defensin 6 promotes mucosal innate immunity through self-assembled peptide nanonets.

Authors: Hiutung Chu; Marzena Pazgier; Grace Jung; Sean-Paul Nuccio; Patricia A Castillo; Maarten F de Jong; Maria G Winter; Sebastian E Winter; Jan Wehkamp; Bo Shen; Nita H Salzman; Mark A Underwood; Renee M Tsolis; Glenn M Young; Wuyuan Lu; Robert I Lehrer; Andreas J Bäumler; Charles L Bevins
Journal: Science Date: 2012-06-21 Impact factor: 47.728

3. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.

Authors: Sharon R Browning; Brian L Browning
Journal: Am J Hum Genet Date: 2007-09-21 Impact factor: 11.025

4. A new subset of CD103+CD8alpha+ dendritic cells in the small intestine expresses TLR3, TLR7, and TLR9 and induces Th1 response and CTL activity.

Authors: Kosuke Fujimoto; Thangaraj Karuppuchamy; Naoki Takemura; Masaki Shimohigoshi; Tomohisa Machida; Yasunari Haseda; Taiki Aoshi; Ken J Ishii; Shizuo Akira; Satoshi Uematsu
Journal: J Immunol Date: 2011-04-27 Impact factor: 5.422

5. A genome-wide association study identifies two new risk loci for Graves' disease.

Authors: Xun Chu; Chun-Ming Pan; Shuang-Xia Zhao; Jun Liang; Guan-Qi Gao; Xiao-Mei Zhang; Guo-Yue Yuan; Chang-Gui Li; Li-Qiong Xue; Min Shen; Wei Liu; Fang Xie; Shao-Ying Yang; Hai-Feng Wang; Jing-Yi Shi; Wei-Wei Sun; Wen-Hua Du; Chun-Lin Zuo; Jin-Xiu Shi; Bing-Li Liu; Cui-Cui Guo; Ming Zhan; Zhao-Hui Gu; Xiao-Na Zhang; Fei Sun; Zhi-Quan Wang; Zhi-Yi Song; Cai-Yan Zou; Wei-Hua Sun; Ting Guo; Huang-Ming Cao; Jun-Hua Ma; Bing Han; Ping Li; He Jiang; Qiu-Hua Huang; Liming Liang; Li-Bin Liu; Gang Chen; Qing Su; Yong-De Peng; Jia-Jun Zhao; Guang Ning; Zhu Chen; Jia-Lun Chen; Sai-Juan Chen; Wei Huang; Huai-Dong Song
Journal: Nat Genet Date: 2011-08-14 Impact factor: 38.330

6. Expression of interleukin-6, leukemia inhibitory factor and their receptors by colonic epithelium and pericryptal fibroblasts.

Authors: S P Rockman; K Demmler; N Roczo; A Cosgriff; W A Phillips; R J Thomas; R H Whitehead
Journal: J Gastroenterol Hepatol Date: 2001-09 Impact factor: 4.029

7. Regulation of humoral and cellular gut immunity by lamina propria dendritic cells expressing Toll-like receptor 5.

Authors: Satoshi Uematsu; Kosuke Fujimoto; Myoung Ho Jang; Bo-Gie Yang; Yun-Jae Jung; Mika Nishiyama; Shintaro Sato; Tohru Tsujimura; Masafumi Yamamoto; Yoshifumi Yokota; Hiroshi Kiyono; Masayuki Miyasaka; Ken J Ishii; Shizuo Akira
Journal: Nat Immunol Date: 2008-05-30 Impact factor: 25.606

8. GWAS of follicular lymphoma reveals allelic heterogeneity at 6p21.32 and suggests shared genetic susceptibility with diffuse large B-cell lymphoma.

Authors: Karin E Smedby; Jia Nee Foo; Christine F Skibola; Hatef Darabi; Lucia Conde; Henrik Hjalgrim; Vikrant Kumar; Ellen T Chang; Nathaniel Rothman; James R Cerhan; Angela R Brooks-Wilson; Emil Rehnberg; Ishak D Irwan; Lars P Ryder; Peter N Brown; Paige M Bracci; Luz Agana; Jacques Riby; Wendy Cozen; Scott Davis; Patricia Hartge; Lindsay M Morton; Richard K Severson; Sophia S Wang; Susan L Slager; Zachary S Fredericksen; Anne J Novak; Neil E Kay; Thomas M Habermann; Bruce Armstrong; Anne Kricker; Sam Milliken; Mark P Purdue; Claire M Vajdic; Peter Boyle; Qing Lan; Shelia H Zahm; Yawei Zhang; Tongzhang Zheng; Stephen Leach; John J Spinelli; Martyn T Smith; Stephen J Chanock; Leonid Padyukov; Lars Alfredsson; Lars Klareskog; Bengt Glimelius; Mads Melbye; Edison T Liu; Hans-Olov Adami; Keith Humphreys; Jianjun Liu
Journal: PLoS Genet Date: 2011-04-21 Impact factor: 5.917

9. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.

Authors: Jeffrey C Barrett; David G Clayton; Patrick Concannon; Beena Akolkar; Jason D Cooper; Henry A Erlich; Cécile Julier; Grant Morahan; Jørn Nerup; Concepcion Nierras; Vincent Plagnol; Flemming Pociot; Helen Schuilenburg; Deborah J Smyth; Helen Stevens; John A Todd; Neil M Walker; Stephen S Rich
Journal: Nat Genet Date: 2009-05-10 Impact factor: 38.330

10. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease.

Authors: Luke Jostins; Stephan Ripke; Rinse K Weersma; Richard H Duerr; Dermot P McGovern; Ken Y Hui; James C Lee; L Philip Schumm; Yashoda Sharma; Carl A Anderson; Jonah Essers; Mitja Mitrovic; Kaida Ning; Isabelle Cleynen; Emilie Theatre; Sarah L Spain; Soumya Raychaudhuri; Philippe Goyette; Zhi Wei; Clara Abraham; Jean-Paul Achkar; Tariq Ahmad; Leila Amininejad; Ashwin N Ananthakrishnan; Vibeke Andersen; Jane M Andrews; Leonard Baidoo; Tobias Balschun; Peter A Bampton; Alain Bitton; Gabrielle Boucher; Stephan Brand; Carsten Büning; Ariella Cohain; Sven Cichon; Mauro D'Amato; Dirk De Jong; Kathy L Devaney; Marla Dubinsky; Cathryn Edwards; David Ellinghaus; Lynnette R Ferguson; Denis Franchimont; Karin Fransen; Richard Gearry; Michel Georges; Christian Gieger; Jürgen Glas; Talin Haritunians; Ailsa Hart; Chris Hawkey; Matija Hedl; Xinli Hu; Tom H Karlsen; Limas Kupcinskas; Subra Kugathasan; Anna Latiano; Debby Laukens; Ian C Lawrance; Charlie W Lees; Edouard Louis; Gillian Mahy; John Mansfield; Angharad R Morgan; Craig Mowat; William Newman; Orazio Palmieri; Cyriel Y Ponsioen; Uros Potocnik; Natalie J Prescott; Miguel Regueiro; Jerome I Rotter; Richard K Russell; Jeremy D Sanderson; Miquel Sans; Jack Satsangi; Stefan Schreiber; Lisa A Simms; Jurgita Sventoraityte; Stephan R Targan; Kent D Taylor; Mark Tremelling; Hein W Verspaget; Martine De Vos; Cisca Wijmenga; David C Wilson; Juliane Winkelmann; Ramnik J Xavier; Sebastian Zeissig; Bin Zhang; Clarence K Zhang; Hongyu Zhao; Mark S Silverberg; Vito Annese; Hakon Hakonarson; Steven R Brant; Graham Radford-Smith; Christopher G Mathew; John D Rioux; Eric E Schadt; Mark J Daly; Andre Franke; Miles Parkes; Severine Vermeire; Jeffrey C Barrett; Judy H Cho
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

220 in total

Review 1. [Glomerulonephritides].

Authors: J Floege
Journal: Internist (Berl) Date: 2015-11 Impact factor: 0.743

2. Ubiquitin Ligase TRIM62 Regulates CARD9-Mediated Anti-fungal Immunity and Intestinal Inflammation.

Authors: Zhifang Cao; Kara L Conway; Robert J Heath; Jason S Rush; Elizaveta S Leshchiner; Zaida G Ramirez-Ortiz; Natalia B Nedelsky; Hailiang Huang; Aylwin Ng; Agnès Gardet; Shih-Chin Cheng; Alykhan F Shamji; John D Rioux; Cisca Wijmenga; Mihai G Netea; Terry K Means; Mark J Daly; Ramnik J Xavier
Journal: Immunity Date: 2015-10-20 Impact factor: 31.745

3. Association of ITGAX and ITGAM gene polymorphisms with susceptibility to IgA nephropathy.

Authors: Dianchun Shi; Zhong Zhong; Ricong Xu; Bin Li; Jianbo Li; Ullah Habib; Yuan Peng; Haiping Mao; Zhijian Li; Fengxian Huang; Xueqing Yu; Ming Li
Journal: J Hum Genet Date: 2019-06-21 Impact factor: 3.172

Review 4. IgA Nephropathy.

Authors: Jennifer C Rodrigues; Mark Haas; Heather N Reich
Journal: Clin J Am Soc Nephrol Date: 2017-02-03 Impact factor: 8.237

Review 5. CFHR Gene Variations Provide Insights in the Pathogenesis of the Kidney Diseases Atypical Hemolytic Uremic Syndrome and C3 Glomerulopathy.

Authors: Peter F Zipfel; Thorsten Wiech; Emma D Stea; Christine Skerka
Journal: J Am Soc Nephrol Date: 2020-01-24 Impact factor: 10.121

6. Intestinal Microbiota and Kidney Diseases.

Authors: Ao Xie; Jie Sheng; Feng Zheng
Journal: Chin J Integr Med Date: 2018-04-12 Impact factor: 1.978

7. Update on immunoglobulin A nephropathy, Part I: Pathophysiology.

Authors: Maurizio Salvadori; Giuseppina Rosso
Journal: World J Nephrol Date: 2015-09-06

8. Transethnic, Genome-Wide Analysis Reveals Immune-Related Risk Alleles and Phenotypic Correlates in Pediatric Steroid-Sensitive Nephrotic Syndrome.

Authors: Hanna Debiec; Claire Dossier; Eric Letouzé; Christopher E Gillies; Marina Vivarelli; Rosemary K Putler; Elisabet Ars; Evelyne Jacqz-Aigrain; Valery Elie; Manuela Colucci; Stéphanie Debette; Philippe Amouyel; Siham C Elalaoui; Abdelaziz Sefiani; Valérie Dubois; Tabassome Simon; Matthias Kretzler; Jose Ballarin; Francesco Emma; Matthew G Sampson; Georges Deschênes; Pierre Ronco
Journal: J Am Soc Nephrol Date: 2018-06-14 Impact factor: 10.121

Review 9. Genomic approaches in the search for molecular biomarkers in chronic kidney disease.

Authors: M Cañadas-Garre; K Anderson; J McGoldrick; A P Maxwell; A J McKnight
Journal: J Transl Med Date: 2018-10-25 Impact factor: 5.531

Review 10. Genes and environment in chronic kidney disease hotspots.

Authors: David J Friedman
Journal: Curr Opin Nephrol Hypertens Date: 2019-01 Impact factor: 2.894