Literature DB >> 31672989

GWAS for systemic sclerosis identifies multiple risk loci and highlights fibrotic and vasculopathy pathways.

Elena López-Isac1, Marialbert Acosta-Herrera2, Martin Kerick2, Shervin Assassi3, Ansuman T Satpathy4,5, Jeffrey Granja4,5, Maxwell R Mumbach4,5, Lorenzo Beretta6, Carmen P Simeón7, Patricia Carreira8, Norberto Ortego-Centeno9, Ivan Castellvi10, Lara Bossini-Castillo11, F David Carmona12, Gisela Orozco13, Nicolas Hunzelmann14, Jörg H W Distler15, Andre Franke16, Claudio Lunardi17, Gianluca Moroncini18, Armando Gabrielli18, Jeska de Vries-Bouwstra19, Cisca Wijmenga20, Bobby P C Koeleman21, Annika Nordin22, Leonid Padyukov22, Anna-Maria Hoffmann-Vold23, Benedicte Lie24, Susanna Proudman25, Wendy Stevens26, Mandana Nikpour27, Timothy Vyse28, Ariane L Herrick29,30, Jane Worthington13, Christopher P Denton31, Yannick Allanore32, Matthew A Brown33, Timothy R D J Radstake34, Carmen Fonseca31, Howard Y Chang4,5, Maureen D Mayes3, Javier Martin35.   

Abstract

Systemic sclerosis (SSc) is an autoimmune disease that shows one of the highest mortality rates among rheumatic diseases. We perform a large genome-wide association study (GWAS), and meta-analysis with previous GWASs, in 26,679 individuals and identify 27 independent genome-wide associated signals, including 13 new risk loci. The novel associations nearly double the number of genome-wide hits reported for SSc thus far. We define 95% credible sets of less than 5 likely causal variants in 12 loci. Additionally, we identify specific SSc subtype-associated signals. Functional analysis of high-priority variants shows the potential function of SSc signals, with the identification of 43 robust target genes through HiChIP. Our results point towards molecular pathways potentially involved in vasculopathy and fibrosis, two main hallmarks in SSc, and highlight the spectrum of critical cell types for the disease. This work supports a better understanding of the genetic basis of SSc and provides directions for future functional experiments.

Entities:  

Mesh:

Year:  2019        PMID: 31672989      PMCID: PMC6823490          DOI: 10.1038/s41467-019-12760-y

Source DB:  PubMed          Journal:  Nat Commun        ISSN: 2041-1723            Impact factor:   14.919


Introduction

Rheumatic diseases are one of the main causes of physical disability of non-mental origin in the Western world according to the World Health Organization. Rheumatic diseases have a marked impact on the quality of life of patients. Among them, systemic sclerosis (SSc) has one of the highest mortality rates[1]. SSc is a chronic autoimmune disease (AD) that affects the connective tissue, with very heterogeneous clinical manifestations. The pathogenesis of the disease involves extensive fibrosis of the skin and internal organs, vascular damage, and immune imbalance, including autoantibody production[2,3]. Lung involvement—both pulmonary hypertension and/or pulmonary fibrosis—is the leading cause of death[4]. As most ADs, SSc has a complex genetic component and its etiology is poorly understood. Genome-wide association studies (GWASs) have been successful in the identification of thousands of genetic variants associated with the susceptibility of complex traits. Moreover, GWASs provide invaluable information on disease aetiopathogenesis and contribute to drug discovery and repurposing[5,6]. Several GWASs of SSc have been published, which greatly contributed to the understanding of SSc pathogenesis, and pointed out to relevant pathways for the disease, such as the interferon pathway, the interleukin 12 pathway, and apoptosis[7-12]. Nonetheless, the rate of discovery of previous studies was limited owing to the relatively small sample sizes of the study cohorts. To continue unraveling the partially known genetic background of SSc, we perform a powerful meta-GWAS in European population that includes ~ 10,000 patients. We also hypothesize that an integrative approach combining all SSc association signals, fine-mapping, and the identification of target genes based on chromatin contacts would provide further insights into the biology of the disease.

Results

Twenty-seven signals independently associated with SSc

We performed genome-wide association analyses in 14 independent European cohorts comprising a total of 26,679 individuals (9,095 SSc patients and 17,584 healthy controls). Nine out of the 14 SSc GWAS cohorts were so far unreported, whereas 5 had been previously published[7,8] (Supplementary Data 1). After correcting for sex and the first five principal components (PCs) (Methods), we did not observe genomic inflation in any of the independent GWAS cohorts, with the exception of the Italian cohort, which remained with residual inflation (Supplementary Data 1 and Supplementary Fig. 1). Overall, the meta-analysis showed a genomic inflation factor (λ) of 1.10, with a rescaled λ1000 of 1.008 for an equivalent study of 1000 cases/1000 controls (Supplementary Data 1). We undertook an inverse variance-weighted meta-analysis with a high-density genotyped and imputed SNP panel (4.72 million SNPs) to combine all independent GWASs. We considered all the SNPs that were shared by at least two data sets to avoid SNP data loss. This approach yielded 431 significantly associated SNPs (association test p value ≤ 5 × 10−8) excluding the well-known HLA region. Significant signals involved 23 genomic regions, of which 13 were new genome-wide significant loci for SSc and 10 corresponded to previously reported GWAS signals (Table 1, Fig. 1).
Table 1

Twenty-seven signals independently associated with systemic sclerosis in the meta-GWAS

ChrLocusBpSNPIndex SNPRef.MAF N P valueOR Q I P cond Func refgene
1 IL12RB2 67814440rs3790566YesT0.24133.84E-101.160.800- Intronic
1 CD247 167420425rs2056626YesG0.3961.31E-110.810.570- Intronic
1 TNFSF4- LOC100506023 -PRDX6 173238736 rs2022449 No T 0.23 12 6.28E-08 1.15 0.90 0 6.63E-08 Regulatory region
1 TNFSF4- LOC100506023 -PRDX6 173332629 rs1857066 Yes A 0.25 13 5.02E-09 0.87 0.84 0 - ncRNA intronic
2 NAB1* 191534372 rs16832798 Yes C 0.14 14 5.20E-09 1.18 0.41 3.79 3.84E-07 Intronic
2 STAT4 191902758rs3821236YesA0.20121.94E-231.310.0348.21- Intronic
2 STAT4 191959489rs4853458NoA0.2394.86E-181.350.421.795.58E-08 Intronic
3 FLNB -DNASE1L3-PXK 58131515rs7355798NoT0.24131.24E-081.140.1430.57.42E-07 Intronic
3 FLNB-DNASE1L3- PXK 58375286rs4076852YesG0.26131.04E-101.160.710- Intronic
3 POGLUT1-TIMMDC1-CD80- ARHGAP31 119116150 rs9884090 Yes A 0.16 13 1.89E-10 0.83 0.92 0 - Intronic
3 IL12A 159733527rs589446YesT0.35111.95E-100.860.850-ncRNA intronic
4 DGKQ 965779 rs11724804 Yes A 0.44 12 5.31E-11 1.17 0.24 21.04 - Intronic
4 NFKB1 103449041 rs230534 Yes T 0.34 10 5.38E-09 1.15 0.92 0 - Intronic
5 TNIP1 150455732rs3792783YesG0.16142.42E-121.200.0347.41- Intronic
6 ATG5 106734040rs633724YesT0.35142.84E-091.130.3113.41- Intronic
7 IRF5- TNPO3 128651522rs36073657YesT0.10123.10E-211.400.2123.35- Intronic
7 IRF5- TNPO3 128658739rs12155080NoG0.37132.87E-130.850.6902.22E-07 Intronic
8 FAM167A-BLK 11343973 rs2736340 Yes T 0.24 14 3.33E-21 1.24 0.17 26.76 - Intergenic
8 RAB2A-CHD7 61564964 rs685985 Yes T 0.47 11 3.82E-08 0.87 0.15 30.84 - Intergenic
11 CDHR5 -IRF7 618172 rs6598008 Yes A 0.44 4 1.97E-08 0.80 0.16 42.27 - Intronic
11 TSPAN32,CD81-AS1 2348619 rs2651804 Yes T 0.17 12 2.54E-10 0.82 0.67 0 - Intergenic
11 DDX6 118639353 rs11217020 Yes A 0.20 14 2.08E-11 0.84 0.80 0 - Intronic
15 CSK 75077367rs1378942YesC0.39131.84E-141.180.900- Intronic
16 IRF8 85971922 rs11117420 Yes C 0.19 12 3.82E-15 0.81 0.47 0 - Intergenic
17 IKZF3 -GSDMB 38063381 rs883770 Yes T 0.50 14 4.79E-09 1.13 0.75 0 - Intronic
17 NUP85 -GRB2 73224639 rs1005714 Yes G 0.20 13 1.87E-08 0.85 0.68 0 - Intronic
19 IL12RB1 18193191rs2305743YesA0.20124.64E-100.830.2816.88- Intronic

The new genome-wide significant loci for systemic sclerosis are highlighted in bold. NAB1-rs16832798 p value conditioned on conditioned on STAT4-rs3821236 and STAT4-rs4853458. For those intronic or regulatory SNPs that are located in a high gene density region, the gene they lie in was underlined

Bp base pair, Chr chromosome, MAF minor allele frequency in the 1000 Genome Project European Population, N number of cohorts, OR odds ratio, Pcond p value conditioned on index SNP, Ref. reference allele, SNP single-nucleotide polymorphism

Fig. 1

Association signals for systemic sclerosis in a large meta-GWAS. a Manhattan plot representing the meta-GWAS results. The −log10 of the p values are plotted against their physical chromosomal position. The red and blue lines represent the genome-wide level of significance (p < 5 × 10−8) and p value threshold at p < 1 × 10−5, respectively. The plot has been truncated at p < 1 × 10−30. The lowest p value was observed within the MHC region for rs6457617 (association test p = 3.25 × 10−43). b Locuszoom to depict independent association signals in IRF5-TNPO3. From left to right, locuszoom of the association signals in IRF5-TNPO3 for the global meta-analysis; association signals conditioned on the lead SNP (rs36073657), and conditioned on rs36073657 and the secondary signal at the locus (rs12155080)

Twenty-seven signals independently associated with systemic sclerosis in the meta-GWAS The new genome-wide significant loci for systemic sclerosis are highlighted in bold. NAB1-rs16832798 p value conditioned on conditioned on STAT4-rs3821236 and STAT4-rs4853458. For those intronic or regulatory SNPs that are located in a high gene density region, the gene they lie in was underlined Bp base pair, Chr chromosome, MAF minor allele frequency in the 1000 Genome Project European Population, N number of cohorts, OR odds ratio, Pcond p value conditioned on index SNP, Ref. reference allele, SNP single-nucleotide polymorphism Association signals for systemic sclerosis in a large meta-GWAS. a Manhattan plot representing the meta-GWAS results. The −log10 of the p values are plotted against their physical chromosomal position. The red and blue lines represent the genome-wide level of significance (p < 5 × 10−8) and p value threshold at p < 1 × 10−5, respectively. The plot has been truncated at p < 1 × 10−30. The lowest p value was observed within the MHC region for rs6457617 (association test p = 3.25 × 10−43). b Locuszoom to depict independent association signals in IRF5-TNPO3. From left to right, locuszoom of the association signals in IRF5-TNPO3 for the global meta-analysis; association signals conditioned on the lead SNP (rs36073657), and conditioned on rs36073657 and the secondary signal at the locus (rs12155080) The presence of independent signals in the genomic regions that showed significant associations was investigated by stepwise conditional analysis using summary statistics from the meta-analysis (Methods). Four genomic regions—TNFSF4 (1q25.1), STAT4 (2q32.2-q32.3), DNASE1L3 (3p14.3), and IRF5-TNPO3 (7q32.1)—showed additional significant signals after conditioning on the lead SNP of each locus (conditional association test p value (Pcond) <  5  ×  10−6) (Table 1, Fig. 1, Supplementary Fig. 2). Hence, a total of 27 independent signals associated with SSc were identified. The two independent signals identified in the DNASE1L3 genomic region (3p14.3) (rs4076852, rs7355798) were intronic variants at PXK and FLNB, respectively. PXK-rs4076852 is in high linkage disequilibrium (LD) with PXK-rs2176082 (r2 = 0.92), which was reported to be associated with SSc in Martin et al.[13] However, two Immunochip studies conducted by Mayes et al.[9] and Zochling et al.[12] showed that the primary association in this genomic region was with the nonsynonymous SNP DNASE1L3-rs35677470 (R206C), not present in our SNP panel. Mayes et al.[9] showed that the PXK-rs2176082 association was dependent on the rs35677470 (R206C). Therefore, although we could not analyze the dependence in our GWAS data, we presumed that PXK-rs4076852 signal was also dependent on DNASE1L3-rs35677470 on the basis of previous evidence. Regarding the intronic signal in FLNB (rs7355798), we could not estimate whether it was dependent on DNASE1L3-rs35677470 or not. However, given its role in vascular injury repair[14], FLNB may be an interesting SSc locus and should be the object of future research. In the case of the STAT4 genomic region (2q32.2-q32.3), we observed three independent signals, of which the third was an intronic variant at NAB1 (rs16832798). This finding—added to further functional evidence provided below—revealed NAB1 as a new SSc risk locus. We also observed that the genome-wide signal in GSDMB (17q21.1; rs883770) was independent (Pcond = 1.27  ×  10−7) from the recently reported signal at GSDMA (rs3894194), which is located in the same genomic region[10].

Fine-mapping of SSc-associated loci in a Bayesian framework

The identification of the causal SNPs driving the association signals remains an open question after completion of a GWAS. To address this question, Bayesian fine-mapping was performed to define 95% credible sets (the smallest set of variants that summed together at least a 95% probability of including the likely causal variant) in each of the independently associated loci (the two independent signals in IRF5-TNPO3 were excluded as fine-mapping was not feasible). To improve SNP prioritization accuracy, the probabilistic method integrated association strength with functional annotation data (Methods). Eighteen (72%) and 12 (48%) out of the 25 loci were fine-mapped to ≤10 and to <5 plausible causal variants, respectively (Table 2, Supplementary Data 2). In six loci, the 95% credible set comprised a single variant (ARHGAP31, BLK, CD247, TNIP1, CSK, STAT4-a), and for four others the credible set contained two SNPs (DGKQ, NUP85-GRB2, STAT4-b, IL12RB1). Moreover, in 64% of the credible sets, the index SNP showed the maximum posterior probability (PPmax) of being causal. The SNPs with PPmax were intergenic, intronic, or noncoding RNA intronic (ncRNA intronic) variants, although the remaining credible set SNPs involved additional SNP categories, namely: UTR3′, downstream, exonic synonymous, and exonic nonsynonymous (Table 2).
Table 2

Posterior probabilities of systemic sclerosis fine-mapped loci

ChrCredible set locusSNPs Cred. SetIndex SNPPP IndexSNP PP MaxPP MaxFunc.refgene SNP PP MaxFunc.refgene in the 95% credible set
1 IL12RB2 6rs37905660.195rs37905670.321IntronicIntronic
1 CD247 1rs20566260.999rs20566260.999Intronic-
1 TNFSF4- LOC100506023 -PRDX6 6rs20224490.659rs20224490.659ncRNA IntronicncRNA_intronic
1 TNFSF4- LOC100506023 -PRDX6 43rs18570660.046rs115765470.265ncRNA intronicncRNA_intronic
2 NAB1 11rs168327980.191rs7162540.242IntronicIntronic; intergenic; downstream
2 STAT4-a 1 1rs38212361.000rs38212361.000Intronic-
2 STAT4-b 2 2rs48534580.905rs48534580.905IntronicIntronic
3 FLNB -DNASE1L3-PXK 6rs73557980.365rs73557980.365IntronicIntronic
3 FLNB-DNASE1L3- PXK 27rs40768520.123rs76537340.292IntronicIntronic; intergenic
3 POGLUT1-TIMMDC1-CD80- ARHGAP31 1rs98840900.956rs98840900.956Intronic-
3 IL12A 23rs5894460.385rs5894460.385ncRNA intronicncRNA_intronic
4 DGKQ 2rs117248040.793rs117248040.793IntronicIntronic
4 NFKB1 6rs2305340.200rs2305170.329IntronicIntronic
5 TNIP1 1rs37927830.999rs37927830.999Intronic-
6 ATG5 3rs6337240.588rs6337240.588IntronicIntronic
8 FAM167A-BLK 1rs27363401.000rs27363401.000Intergenic-
8 RAB2A-CHD7 80rs6859850.003rs69870840.139IntronicIntronic; UTR3; intergenic
11 CDHR5 -IRF7 4rs65980080.760rs65980080.760IntronicIntronic; exonic synonymous SNV; UTR3; exonic nonsynonymous
11 TSPAN32,CD81-AS1 20rs26518040.184rs26518040.184IntergenicIntergenic
11 DDX6 7rs112170200.021rs108922860.775IntronicIntronic; intergenic
15 CSK 1rs13789420.993rs13789420.993Intronic-
16 IRF8 6rs111174200.202rs111174220.54IntergenicIntergenic
17 IKZF3 -GSDMB 17rs8837700.032rs93032770.157IntronicIntergenic; intronic; exonic synonymous SNV; exonic nonsynonymous
17 NUP85 -GRB2 2rs10057140.940rs10057140.940IntronicIntronic
19 IL12RB1 2rs23057430.944rs23057430.944IntronicIntronic

Chr chromosome, PP posterior probability, SNP single-nucleotide polymorphism

1Name of the credible set that comprised the index SNP from STAT4 genomic region (2q32.2-q32.3)

2Name of the credible set that comprised the secondary association signal in STAT4 genomic region (2q32.2-q32.3)

Posterior probabilities of systemic sclerosis fine-mapped loci Chr chromosome, PP posterior probability, SNP single-nucleotide polymorphism 1Name of the credible set that comprised the index SNP from STAT4 genomic region (2q32.2-q32.3) 2Name of the credible set that comprised the secondary association signal in STAT4 genomic region (2q32.2-q32.3)

Functional annotation of SNPs from credible sets

Since most of the likely causal variants were linked to regulatory functions rather than affecting the function of proteins encoded by surrounding genes, we further explored their regulatory effects. For this purpose, we performed functional annotation of SNPs from credible sets through eQTL analysis (Methods). In addition, we explored overlap with histone marks of active promoters and active enhancers (H3K9ac, H3K4me1, and H3K27ac) of cell types relevant to the disease using data from the Roadmap Epigenomics Project[15] (Supplementary Table 1) (Methods). Supplementary Figure 3 summarizes the results of the functional characterization of credible set SNPs. When the 95% credible set was not well resolved (credible sets that contained > 15 likely causal variants), we selected the SNP with PPmax and the index SNP. In the case of IRF5-TNPO3, where the credible set was not feasible, we selected the two independent signals identified at this locus. We obtained a final reduced list of credible set SNPs containing a total of 81 variants. As it can be observed in Supplementary Fig. 3, the vast majority of the likely causal variants overlapped with promoter and enhancer histone marks in the cell types interrogated. These observations suggest that most of the genetic variations involved in the susceptibility to SSc modulate transcriptional regulatory mechanisms. In this regard, we found that 61 out of the 81 interrogated variants (75.31% of the 81 listed credible set SNPs) represent eQTLs, thus altering gene expression in different tissues and cell types (Supplementary Data 3). In fact, the credible set SNPs were significantly enriched for eQTLs in blood and non-blood tissues (odds ratio (OR) = 3.05, Fisher’s exact test P = 5.65 × 10−6; OR = 1.61, Fisher’s exact test P = 4.48 × 10−2, respectively) (Supplementary Table 2). Many SNPs were shown to impact the expression of the closest gene (a priori candidate gene) (Supplementary Data 3). In addition, we also found genetic variants affecting the expression of a priori candidate genes and other genes. However, some SNPs only showed eQTL signals for genes other than the closest one. As an example, the SNP rs9884090—which is an intronic variant at ARHGAP31—was found to alter the expression of POGLUT1 and TIMMDC1 in several tissues (Supplementary Data 3). These results highlight that assigning association signals to the nearby gene is not always the most appropriate strategy and the functional role of certain SSc signals may expand to different target genes. Five 95% credible sets comprised exonic nonsynonymous variants or contained SNPs in high-to-moderate LD with exonic nonsynonymous variants (r2 ≥ 0.8, r2 ≥ 0.6, respectively) (ARHGAP31, IRF7, GSDMB, NUP85-GRB2, and IL12RB1) (Supplementary Fig. 3, Supplementary Data 4). However, based on SIFT and PolyPhen, none of these exonic variants showed a clear consensus to be deleterious[16,17] (Supplementary Data 4). Finally, we assessed pleiotropic effect of our signals by determining whether the likely causal SNPs were also risk factors for other diseases. The results showed extensive overlap especially with other two ADs: systemic lupus erythematosus and primary biliary cholangitis (Supplementary Data 5). These findings were consistent with previous reports that identified shared risk loci for SSc and other immune-mediated diseases[11,13].

H3K27ac HiChIP in T cells expands and refines target genes

As stated above, assigning disease-associated variants to the closest gene is not always an appropriate strategy to determine the potential mechanistic effect of association signals. With the aim of identifying the putative drivers of SSc association hits on the basis of functional evidence, we performed an analysis of experimentally derived high-resolution maps of enhancer-promoter interactions generated by H3K27ac HiChIP experiments in human CD4 + T cells[18] (Methods). HiChIP interactions were detected in 18 out of the 27 (66.67%) independently associated loci, using the SNPs with PPmax as anchor points (Table 3). Several intronic variants were linked to the target gene promoter in which they are mapped. This was the case of CD247-rs2056626, which showed a strong H3K27ac HiChIP signal to the CD247 promoter (Fig. 2). Other relevant examples of this type of interactions were found in IL12RB2 and NFKB1. The intronic variants in STAT4, rs3821236 and rs4853458, showed strong normalized HiChIP signal to STAT4 and STAT1 promoters (Fig. 2). We also observed HiChIP contacts that linked intergenic SNPs to the closest genes. For example, rs11117422, located ~40 kb downstream of IRF8 transcriptional start site, showed interactions with the promoter region of IRF8 (Fig. 2). In addition, several other enhancer-promoter interactions linked intronic and intergenic SNPs to distant genes. In total, H3K27ac HiChIP signals nominated 155 target genes from 18 SSc likely causal variants (~8 genes per SNP on average) (Table 3).
Table 3

H3K27ac HiChIP target genes and nominated target genes for the 27 systemic sclerosis association signals

ChLocusSNP PP MaxHiChIP target genes with SNP PP MaxNominated genes by H3K27ac HiChIP + eQTL validationNominated genes by CHi-C + eQTL validationPrioritized genes by DEPICT/other criteria
1 IL12RB2 rs3790567 IL12RB2 , IL23R IL12RB2 IL12RB2, SERBP1 IL12RB2
1 CD247 rs2056626 CD247 , POU2F1, TADA1, GPA33, MAEL, DUSP27, CREG1, RCSD1, MPZL1, DCAF6, MPC2 CD247 CD247 CD247
1 TNFSF4- LOC100506023 -PRDX6 rs2022449 TNFSF4
1 TNFSF4- LOC100506023 -PRDX6 rs11576547 TNFSF4 a
2 NAB1 rs716254 NAB1 , GLS, TMEM194B, MFSD6 NAB1 NAB1, TMEM194B
2 STAT4 rs3821236 STAT4, STAT1, GLS, MYO1B, NABP1, SDPR STAT4, STAT1** STAT4
2 STAT4 rs4853458 STAT4, STAT1, GLS, MYO1B, NABP1, SDPR STAT4, STAT1**
3 FLNB -DNASE1L3-PXK rs7355798 PXK, FLNB, RPP14 FLNB-AS1
3 FLNB-DNASE1L3- PXK rs7653734 PXK, RPP14 PXK, DNASE1L3 b
3 POGLUT1-TIMMDC1-CD80- ARHGAP31 rs9884090 POGLUT1, TIMMDC1 CD80, ARHGAP31, POGLUT1
3 IL12A rs589446 IL12A IL12A IL12A
4 DGKQ rs11724804 DGKQ, GAK, TMEM175
4 NFKB1 rs230517 NFKB1, MANBA , UBE2D3, CISD2, SLC9B1, SLC39A8 NFKB1, MANBA NFKB1, MANBA, BDH2 NFKB1
5 TNIP1 rs3792783TNIP1, GPX3, CD71, RPS14, RMB22, DCTN4, IRGM, SMIM3,ANXA6, GM2A, SLC36A3 TNIP1, ANXA6 TNIP1, ANXA6 TNIP1
6 ATG5 rs633724 ATG5 b
7 IRF5- TNPO3* rs36073657TNPO3, IRF5 IRF5 IRF5, FAM71F2
7 IRF5- TNPO3* rs12155080TNPO3, IRF5 TNPO3, IRF5 TNPO3 IRF5
8 FAM167A,BLK rs2736340 BLK, RP11-481A20.11, FDFT1, NEIL2 BLK, C8orf14
8 RAB2A-CHD7 rs6987084CHD7, RAB2A, CLVS1 RAB2A
11 CDHR5 -IRF7 rs6598008IRF7, RIC8A, BET1L, PSMD13, SIRT3, NLRP6, PTDSS2, RNH1, HRAS, RASSF7, PHRF1, DEAF1, DRD4, TALDO1, PDDC1, CEND1, EPS8L2 IRF7, RNH1, HRAS, PHRF1, DRD4, EPS8L2 DRD4, C11orf35, IRF7, PHRF1, RNH1 IRF7
11 TSPAN32,CD81-AS1 rs2651804 CD81, ASCL2 TSPAN32
11 DDX6 rs10892286DDX6, KMT2A, TREH, CXCR5, BCL9L, UPK2, ATP5L, TRAPPC4, FOXR1, VPS11 DDX6, TREH PHLDB1, TREH
15 CSK rs1378942CSK, PML, STOML1, ARID3B, SEMA7A, CLK3, EDC3, CYP1A1, LMAN1L, CPLX3, ULK3, SCAMP2, MPI, COX5A, RPP25, PPCDC, GOLGA6C, MAN2C1, NEIL1, SIN3A, SNUPN, SNX33 CSK, CYP1A1, LMAN1L, CPLX3, ULK3, SCAMP2, MPI, RPP25, PPCDC CPLX3, CSK, CYP1A1, FAM219B, LMAN1L, MPI, PPCDC, RPP25, SCAMP2, SCAMP5, ULK3 CSK, SCAMP2, ULK3
16 IRF8 rs11117422 IRF8 , COX4I1, EMC8, FOXF1 IRF8** IRF8
17 IKZF3- GSDMB rs9303277GSDMB, IKZF3, ZPBP2, GRB7, MED1, CDK12, PPP1R1B,PNMT, PGAP3, ERBB2, ORMDL3, NEUROD2, STARD3, TCAP, MIEN1, RPL19, CACNB1, FBXL20, LRRC3C, GSDMA, PSMD3, THRA, NR1D1, MSL1, RARA, GJD3, TNS4, IGFBP4, CCR7, SMARCE1, KRTAP17, KRT33A GSDMB, IKZF3, ZPBP2, PNMT, PGAP3, ORMDL3, GSDMA, PSMD3 IKZF3, ZPBP2, GSDMB, ORMDL3 GSDMB, IKZF3, ORMDL3
17 NUP85- GRB2 rs1005714NUP85, MRPS7, ATP5H, ICT1 NUP85, MRPS7 ARMC7, GGA3, SLC16A5, MRPS7, NT5C, NUP85 ARMC7, GGA3, NT5C
19 IL12RB1 rs2305743IL12RB1, MAST3, ARRDC2, ISYNA1, ELL, SSBP4, LSM4, JUND, RAB3A, PDE4C, KIAA1683, MPV17L2, IFI30, PIK3R2, B3GNT3, FCHO1, INSL3, JAK3, RPL18A, SLC5A IL12RB1, KIAA1683 KIAA1683 IL12RB1, MAST3

In column ‘HiChIP Target Genes withSNP PP Max’, genes with HiChIP and eQTL signals are highlighted in bold

Ch, chromosome; PP, posterior probability from the statistical fine-mapping; SNP, single-nucleotide polymorphism

*Credible set not feasible

**NO eQTL signals found but strong HiChIP signal

aGene nominated by proximity

bGenes nominated according to the results from the first Immunochip in systemic sclerosis (9)

Fig. 2

H3K27ac HiChIP signals at systemic sclerosis loci in human CD4+ T cells. The SNPs with maximum Posterior Probabilities in each locus were set as anchor points to assess promoter-enhancer chromatin interactions. Representation of overlap with ATAC-seq peaks is included. Chr, chromosome; Kb, kilo base; Mb, mega base; Res, resolution

H3K27ac HiChIP target genes and nominated target genes for the 27 systemic sclerosis association signals In column ‘HiChIP Target Genes withSNP PP Max’, genes with HiChIP and eQTL signals are highlighted in bold Ch, chromosome; PP, posterior probability from the statistical fine-mapping; SNP, single-nucleotide polymorphism *Credible set not feasible **NO eQTL signals found but strong HiChIP signal aGene nominated by proximity bGenes nominated according to the results from the first Immunochip in systemic sclerosis (9) H3K27ac HiChIP signals at systemic sclerosis loci in human CD4+ T cells. The SNPs with maximum Posterior Probabilities in each locus were set as anchor points to assess promoter-enhancer chromatin interactions. Representation of overlap with ATAC-seq peaks is included. Chr, chromosome; Kb, kilo base; Mb, mega base; Res, resolution Subsequently, we further validated the functional relevance of H3K27ac HiChIP results by investigating whether the explored SNPs were eQTLs for the nominated target genes. Forty enhancer-target gene relationships showed overlap with SSc eQTL genes (eGenes) (OR = 10.1, Fisher’s exact test P = 2.92 × 10−19, Supplementary Table 3). Although no eQTL to IRF8, STAT4, and STAT1 signals were found, the enrichment of the HiChIP signal observed at these loci (q value < 1e-60, Methods) over a global background of distance-matched interactions, and the crucial role of these genes in the immune response, provided evidence to prioritize them as candidate genes. Remarkably, the third independent association signal observed in the STAT4 genomic region (2q32.2-q32.3)—mapped in a NAB1 intron—was linked to NAB1 promoter by our H3K27ac HiChIP analysis. The interaction was validated by an eQTL signal (Supplementary Data 3). These results supported NAB1 as a new SSc risk locus. In total, we provided strong evidence to nominate 43 genes as robust SSc target genes in CD4 + T cells (Table 3). Interestingly, some of them pinpointed to new mechanistic insights relevant for the diseases (see Discussion).

Chromatin interaction analyses in other relevant cell types

It is noteworthy that the epigenomic profiles are cell type specific[19,20]. Considering that the HiChIP analyses were performed in CD4+ T cells, and that the pathogenesis of SSc is not only mediated by T cells, we also explored chromatin interaction maps derived from promoter capture Hi-C experiments in additional immune cell types[21,22] (Methods). These analyses identified promoter interactions not observed in CD4+ T cells, which targeted new genes (Table 3, Supplementary Data 6). For example, we observed that the FLNB-intronic variant rs7355798 interacted with FLNB and PXK promoters in B cells and macrophages. Moreover, some of the interactions observed with H3K27ac HiChIP in CD4+ T cells were also found in other immune cell types. The functional relevance of the observed chromatin interactions by promoter capture Hi-C analyses was also validated by eQTLs analysis. In total, these analyses nominated 25 additional target genes for SSc (Table 3).

Candidate genes prioritized by DEPICT

In addition, we also conducted gene prioritization by means of DEPICT (Data-driven Expression-Prioritized Integration for Complex Traits) (http://www.broadinstitute.org/mpg/depict/index.html), which pinpoints the most likely candidate gene/s in associated loci based on predicted gene functions[23]. Significant gene prioritization p values were found in 19 of the 27 queried loci (Table 3, Supplementary Data 7). Most of the prioritized genes were previously nominated by the chromatin interaction analyses. In addition, this method nominated TNFSF4, TMEM194B, FLNB-AS1, CD80, ARHGAP31, C8orf14, TSPAN32, and MAST3.

Tissue-specific enrichment of SSc loci in epigenetic marks

The majority of credible set SNPs overlapped with epigenetic marks related to active regions (Supplementary Fig. 3). To quantify the extent of this overlap, we investigated whether SSc associations were non-randomly distributed in histone marks of active promoters (H3K9ac, H3K4me2, H3K4me3, H3K4ac), active enhancers (H3K27ac, H3K4me1, H2BK20ac), and active (or at least accessible) genes (H3K79me1, H2BK15ac) across the 127 reference epigenomes available from the Roadmap Epigenomics Consortium and the Encyclopedia of DNA Elements (ENCODE) projects[15,24]. We used a nonparametric approach (GARFIELD[25,26]) to compute ORs and estimate the significance of functional enrichment at various GWAS p value cutoffs (5 × 10−6, 5 × 10−7, and 5 × 10−8) (Methods). Our results showed 363 significant enrichments (p value < 1.25 × 10−4) in 59 out of the 127 cell and tissue types analyzed (Fig. 3, Supplementary Data 8). Most of the significant enrichments were found in immune cells. SSc-associated variants displayed the most significant enrichments in H2BK15ac and H2BK20ac marks in the GM12878 lymphoblastoid cell line (OR = 37.33, enrichment p value (Penr) = 4.84 × 10−14; OR = 12.79, Penr = 2.40 × 10−10, respectively), followed by H3K79me1 in primary Natural Killer (NK) cells (BLD.CD56.PC) (OR = 12.23, Penr = 5.73 × 10−09) and primary T cells (BLD.CD3.PPC) (OR = 12.58, Penr = 9.78 × 10−09). The spleen also showed a strong functional enrichment in H2BK15ac (OR = 14.69, Penr = 9.81 × 10−09). There were significant enrichments of associations with SSc within H3K27ac, H3K4me1, H3K4me2, and H3K9ac marks—among others—of several CD4+ T cells (T helper, T regulatory, etc), CD8+ T cells, primary B cells, monocytes, primary neutrophils, and thymus (Fig. 3, Supplementary Data 8). Moreover, the SSc association signals showed different epigenetic enrichment patterns in non-immune cell/tissue types, such as lung, fibroblasts, chondrocytes, keratinocytes, osteoblasts, intestinal mucosa, and esophagus, among others (Fig. 3, Supplementary Data 8).
Fig. 3

Tissue-specific enrichment for systemic sclerosis associations in epigenetic marks. The heatmap displays the significant enrichment (p value < 1.25 × 10−4) in 59 out of the 127 cell and tissue types in Roadmap Epigenomics Consortium and the Encyclopedia of DNA Elements (ENCODE) projects. The enrichment p values are plotted with different colors according to the strength of the significance. Since the enrichments were computed at various GWAS p value cutoffs (5 × 10−6, 5 × 10−7, 5 × 10−8), the most significant p value was selected if a cell type/epigenetic mark combination showed more than one significant enrichment across the different cutoffs. Supplementary Data 13 provides the correspondence between cell codes and cell types

Tissue-specific enrichment for systemic sclerosis associations in epigenetic marks. The heatmap displays the significant enrichment (p value < 1.25 × 10−4) in 59 out of the 127 cell and tissue types in Roadmap Epigenomics Consortium and the Encyclopedia of DNA Elements (ENCODE) projects. The enrichment p values are plotted with different colors according to the strength of the significance. Since the enrichments were computed at various GWAS p value cutoffs (5 × 10−6, 5 × 10−7, 5 × 10−8), the most significant p value was selected if a cell type/epigenetic mark combination showed more than one significant enrichment across the different cutoffs. Supplementary Data 13 provides the correspondence between cell codes and cell types Interestingly, our large panel of reference epigenomes allowed us to identify cell type specific patterns of enrichment. This specificity was especially relevant for some tissue/cell types that showed enrichment for a single histone mark. For example, dermal fibroblast primary cells (SKIN.NHDFAD) only showed significant enrichment for H3K4me2 (OR = 7.91, Penr = 6.54 × 10−6).

Specific patterns of associations for the main SSc subtypes

We performed GWAS stratified analyses considering the main SSc clinical subtypes (limited cutaneous SSc (lcSSc) or diffuse cutaneous SSc (dcSSc)) and autoantibody status according to the presence of anticentromere (ACA), antitopoisomerase (ATA), and anti-RNA polymerase III (ARA) autoantibodies (Supplementary Table 4) (Methods). A total of 18 and 5 non-HLA significant signals were identified for lcSSc and dcSSc, respectively, representing 15 new genome-wide significant signals for lcSSc and 3 for dcSSc (Supplementary Data 9). Among the associations, there were loci that yielded stronger associations and larger effect size in the subtype analyses than in the global meta-analysis; all despite the reduction of the sample and, consequently, of statistical power. To assess whether the more powerful genetic signals were randomly observed, we performed 10,000 permutation analyses (Methods) and computed empirical p values (p*) taking into account the proportion of permuted genetic signals that were at least as extreme as the observed signals. As an example, DNASE1L3 genomic region (3p14.3, rs7652027) was associated with the global disease with an OR of 1.15, whereas the OR observed for the lcSSc subtype was 1.20. Permutation analysis showed significant empirical p value (p* = 1.9 × 10−3) for DNASE1L3-rs7652027 thus confirming a larger effect of this risk factor in lcSSc patients (Supplementary Data 9, Supplementary Fig. 4). Remarkably, we observed two subtype-specific signals that did not show statistical significance in the global analysis. In the case of lcSSc, there was a significant association in the MERTK genomic region (2q13) (association test p value = 1.04 × 10−08, OR = 1.15) that was not significant in the global meta-analysis (association test p value = 3.49 × 10−05, OR = 1.09) nor in dcSSc sub-analysis (association test p value = 0.503, OR = 1.02) (Supplementary Data 9 and Supplementary Fig. 5) (Supplementary Fig. 4, p* = 1.0 × 10−4). MERTK is a tyrosine kinase member of the MER/AXL/TYRO3 receptor kinase family that is associated with multiple sclerosis[27] and hepatitis C-induced liver fibrosis[28]. Moreover, the associated variant (rs3761700), affects the expression of MERTK in whole blood (p value = 1.22 × 10−35). Regarding dcSSc, we found an association signal in the ANKRD12 genomic region (18p11.22) (association test p value = 3.97 × 10−08, OR = 1.22). This locus did not show significant associations in the global meta-analysis (association test p value = 2.30 × 10−05, OR = 1.10) nor in the lcSSc subtype (association test p value = 0.034, OR = 1.06) (Supplementary Data 9 and Supplementary Fig. 5) (Supplementary Fig. 4, p* = 4.0 × 10−4). ANKRD12 encodes a member of the ankyrin repeats-containing cofactor family involved in the modulation of gene transcription. The associated variant—rs4798783—is an eQTL for TWSG1 (p value = 6.44 × 10−06) in transformed fibroblasts. Interestingly, TWSG1 enhances TGF-beta signaling (which profibrotic role is well-known) in activated T lymphocytes[29]. These findings and the specific association with dcSSc are of utmost importance, given the more aggressive and rapidly progressing fibrosis observed in this clinical subtype[2,3]. When data were stratified by patients positive for ACA, nine genome-wide significant signals were found, of which two had been previously reported (IRF5-TNPO3, and DNASE1L3) and seven were novel associations (Supplementary Data 10). Notably, the locus CDHR5-IRF7 was strongly associated with this autoantibody presentation (association test p value = 5.32 × 10−09, OR = 0.73) and showed stronger effect as compared to the global meta-analysis (association test p value = 1.97 × 10−08, OR = 0.80) (Supplementary Fig. 4, p* = 2.1 × 10−3). Regarding the ATA-positive subgroup, we replicated the previously reported association with IRF5-TNPO3 (Supplementary Data 10)[30]. Finally, in the case of the RNA pol III-positive SSc patients, we did not observe any signal at the genome-wide significance level outside the HLA region. Nonetheless, we found suggestive associations in FAM167A-BLK, GUSBP1-CDH12, and STEAP2 (Supplementary Data 10). Overall, our findings highlights how performing GWASs in more homogeneous group of patients can increase the success of case–control studies by improving association strengths, thus avoiding reduction of statistical power owing to phenotypic heterogeneity[31]. Moreover, consistent with previous studies, suggesting genetic differences in the susceptibility to SSc subtypes[9,32] the identification of specific patterns of association in each SSc subphenotype emphasizes the importance of classification biomarkers to predict more accurately the best therapeutic approach in each group of patients.

Drug target enrichment analysis

The advantage of using human genetic evidence in drug discovery and repurposing has been comprehensively addressed in the last years[5,6]. In this line, we assessed whether any of the 78 SSc target genes identified in the present study (genes from the last three columns of Table 3) encode proteins that are drug targets in any phase of development. Seven out of the 78 genes (9%) overlapped with pharmacological active targets (CD80, BLK, TNFSF4, IL12A, DRD4, PSMD3, FDFT1) in the Open Targets Platform[33] (Supplementary Data 11). Among them, CD80 and BLK were targets of drugs for SSc in any phase of clinical trial (i.e., abatacept and dasatinib, respectively). We assessed the significance of the overlap and found that our SSc target genes were significantly enriched in pharmacological active targets for SSc (OR = 6.0; Fisher’s exact test P = 4.7 × 10−2) (Supplementary Table 5).

Discussion

The large cohort of SSc included in the present study allowed us to identify 13 new risk loci for the disease, almost doubling the number of genome-wide association signals reported for SSc, bringing the total number of SSc risk loci up to 28. In the present study, some of the cohorts were genotyped using different genotyping platforms between cases and controls. To control for the potential spurious associations that this fact may lead owing to the possibility of differential imputation quality for the SNPs, we applied additional steps along with the standard QC procedures. Prior to the imputation, we carefully excluded multi-allelic, or A/T-C/G variants with MAF > 0.4 from all the data sets. After imputation, we applied an in-house Perl script that compares the genotypic frequencies between cases and controls and excluded all SNPs showing genotypic inconsistencies. In addition, manual inspection of the individual Manhattan plots from the 14 independent cohorts was performed and any suspicious false positive signal was carefully analyzed and removed, if necessary. In addition, the genome-wide significant signals identified in the present study were the results of combining the effect of the signals across several independent cohorts. Moreover, no significant heterogeneity of the ORs was observed. Applying a statistical fine-mapping approach, we reduced associated signals to 95% credible sets of 10 likely causal SNPs or fewer for 18 loci (72%). Notably, 95% credible sets comprised a single variant in 6 loci (ARHGAP31, BLK, CD247, TNIP1, CSK, STAT4-a). In other four loci, the credible sets contained two SNPs (DGKQ, NUP85-GRB2, STAT4-b, IL12RB1). Functional annotation of likely causal variants from credible sets revealed that all variants with PPmax were intronic, intergenic, or ncRNA intronic SNPs. These observations suggest that most of the genetic variations underlying SSc susceptibility are related to transcriptional regulatory mechanisms, including mRNA processing or stability mediated by ncRNAs. Our results are in accordance with emerging evidence that suggest a role of ncRNAs in autoimmunity[34]. Moreover, the exonic nonsynonymous variants included in the 95% credible sets or in high LD to credible set SNPs did not show clear evidence of being damaging mutations, as in the case of CDHR5-rs2740375 located close to IRF7. As an exception, it is worth mentioning that the nonsynonymous variant rs35677470 (R206C) in DNASE1L3, not present in our SNP panel, was reported to impact the DNase activity of the encoded protein in in vitro studies[35]. This fact is consistent with the result of our statistical fine-mapping since the 95% credible set for this locus was not well resolved (27 likely causal variants comprised the credible set). Further functional studies will be needed to confirm that rs35677470 (R206C) is the actual causal variant underlying the association or whether there are secondary signals that also influence the role of this genomic region in SSc susceptibility. As expected, the results from gene expression data (eQTLs) suggested that the functional role of certain SSc signals may expand to several target genes. This hypothesis was confirmed through the experimentally derived high-resolution maps of enhancer-promoter interactions generated by H3K27ac HiChIP in human CD4+ T cells. On average, HiChIP results found physical interactions for approximately eight genes per SNP across the 18 SSc likely causal variants that were mapped in the H3K27ac HiChIP analysis, consistent with previous findings for other ADs[20]. Strong interactions were observed in relation to some SNPs. For example, CD247-rs2056626 (intronic SNP with PPmax = 0.99 in the fine-mapping) showed a strong normalized signal of HiChIP interaction to the CD247 promoter, suggesting that the SNP affects an intronic regulatory element that controls gene expression. This interaction between the SSc risk SNP and the promoter of the CD247 gene in CD4 + T-cells was also observed by the promoter capture Hi-C technique[21,36], further supporting that the SNP may be involved in the transcriptional regulation of this gene. In fact, rs2056626 is a cis-eQTL for CD247 (p value = 2.411 × 10−48; FDR = 0) in whole blood. The identification of target genes for GWAS signals is one of the most challenging questions. In the present study, aggregate analysis of chromatin interaction maps in a wide spectrum of immune cell lines and eQTLs provided strong support to nominate 68 genes as robust SSc target genes (Table 3). Interestingly, the function of some of these experimentally nominated target genes are related to relevant pathways or biological processes in SSc. For example, DDX6 encodes a RNA helicase essential for efficient miRNA-induced gene silencing. De Vries et al. demonstrated the role of DDX6 in the regulation of vascular endothelial growth factor under hypoxic conditions[37]. Therefore, this hit may establish a link between vasculopathy and SSc unknown so far. Other two new loci providing relevant mechanistic insights are RAB2A and GSDMB. RAB2A belongs to the Rab family, a group of membrane-bound proteins, involved in vesicular fusion and trafficking. Specifically, RAB2A has been proposed to be a key factor in autophagosome clearance[38], thus it is another SSc risk locus involved in autophagy apart from the previously described ATG5[9]. These results reinforce the role of autophagy in SSc pathogenesis. In regard to GSDMB, it encodes a member of the gasdermin-domain containing protein family. The functional mechanism of gasdermin proteins is not clearly understood yet. However, recent evidence demonstrated that some gasdermin-N domains—including GSDMB—play a role in the induction of pyroptosis[39,40], an inflammatory form of cell death that is crucial for the immune response. In line of these observations, our results also suggest a role of defective pyroptosis in SSc. Enrichment analyses of SSc loci in epigenetic marks of active gene regulation showed a strong immune signature. We identified relevant cell types and tissues for disease pathogenesis. Noteworthy, primary NK cells represented one of the highest enrichment signals across almost the entire panel. Our results are consistent with previous reports linking NK cells to SSc[41]. In a very recent publication, Benyamine et al.[42] reported a particular expression profile of NK cells in SSc and showed that this cell type induced endothelial activation[42]. These findings may provide a link between vascular damage and the immune imbalance in SSc. The inclusion of a wide panel of tissue and cell types captured cell type-specific patterns of enrichment. For example, there were some cell types that showed enrichment for a single histone mark. It is important to note that these results add valuable information to design future functional studies on the basis of accurately and well-chosen cell types or tissues, thus increasing the rate of success of the experiment. Finally, it has been demonstrated that human genetic evidence positively impacts the success rate in clinical development[5]. The drug target enrichment found in the identified SSc target genes supports that our results might also be informative in drug repurposing. As an example, the present study supports the possibility to consider ustekinumab for SSc treatment, which is a drug currently approved for related diseases, such as psoriasis, active psoriatic arthritis, and Crohn’s disease.

Methods

Study cohorts and GWAS quality control

This study included 14 independent epidemiological cohorts comprising a total of 28,179 unrelated and genome-wide genotyped individuals (9846 SSc) patients and 18,333 healthy controls), after genotyping quality control (QC) steps. In brief, nine new SSc GWAS cohorts and five previously published SSc GWAS cohorts[7,8] of European ancestry were included (Spain 1, Germany 1, The Netherlands 1, USA 1, France, Spain 2, Germany 2, The Netherlands 2, USA 2, Italy, UK, Sweden, Norway and Australia/UK) (Supplementary Table 1). SSc patients fulfilled the 1980 American College of Rheumatology classification criteria for this disease or the criteria proposed by LeRoy and Medsger for early-SSc[43,44]. In addition, patients were classified as having lcSSc or dcSSc, as described in LeRoy et al.[45] Patients were also subdivided by autoantibody status according to the presence of ACA, ATA, or ARA autoantibodies. The main clinical features are shown in Supplementary Table 4. This study complied with all relevant ethical regulations. CSIC’s Ethics Committee approved the study protocol, and written informed consent was obtained in accordance with the tenets of the Declaration of Helsinki. Genome-wide genotyping was undertaken using the arrays specified in detail in Supplementary Table 1. Stringent QC measures were applied to all GWAS data sets as follows: SNPs with call rates < 0.98; minor allele frequencies (MAFs) < 0.01; and those that deviated from Hardy-Weinberg equilibrium (HWE; p < 0.001 in both case and control subjects) were filtered out from further analysis; samples with call rates < 0.95 were removed. The presence of relatives and/or duplicates was assessed by computing identity-by-descent (IBD) estimation using PLINK[46]. An individual from each pair of relatives (Pi_Hat > 0.45) or duplicates (Pi_Hat > 0.99) was removed. Additionally, duplicate/relatedness testing was also performed between different GWAS data sets with the same country origin.

PC analysis and identification of outliers

To identify ancestry outliers, ~100,000 quality-filtered independent SNPs were selected from each case-control GWAS cohort. PC analysis was performed using PLINK and GCTA64 and R-base software under GNU Public license v.2. The first ten PCs per individual were calculated and plotted. Samples showing > 4 standard deviations from the cluster centroids of each cohort were considered outliers and removed from further analyses. The total number of individuals that remained in the final filtered data sets after this procedure was 26,679 (9095 SSc patients and 17,584 healthy controls).

Imputation

QC-filtered GWAS data sets were subjected to whole-genome genotype imputation using IMPUTE2[47] and the 1000 Genome Project Phase III (1KGPh3) data as reference panel[48]. GTOOL was used to convert data sets into the file format used by IMPUTE2. SNPs that were duplicated, multi-allelic, or A/T-C/G with MAF > 0.4 were excluded. Imputation was done separately for each independent study. A probability threshold of 0.9 was set for merging genotypes using GTOOL. Imputed data sets were also QC-filtered by removing SNPs with call rates < 0.98, with MAFs < 0.01 and those that deviated from HWE (p < 0.001). In addition, singleton SNPs (which are not informative for phasing) and those that showed genotypic inconsistency between cases and controls were also excluded from analysis using an in house Perl script.

Genome-wide association analysis

Genome-wide association analyses were performed in PLINK[46] using a logistic regression model of additive effects, including sex and the five first PCs as covariates in each of the 14 independent European cohorts. Genomic inflation factor (λ) was calculated by cohort and rescaled for an equivalent study of 1000 cases and 1000 controls when necessary (λ1000). Quantile–quantile (Q–Q) plots were generated and plotted with an in house R script to compare genome-wide distribution of the test statistic with the expected null distribution (Supplementary Fig. 1). We conducted a fixed effects inverse variance meta-analysis in PLINK[46] to combine the ORs obtained in each independent GWAS study. Heterogeneity values (I2 and Q) were calculated with PLINK[46] to evaluate possible OR heterogeneity across the 14 individuals cohorts. Novel signals of associations were defined as the genome-wide significant associations (p value ≤ 5 × 10−8) that did not overlap with previously SSc reported signals at the genome-wide significance level of association.

Stratified analysis in clinical and serological SSc subtypes

Stratification of patients according to SSc subtype (lcSSc, dcSSc) or autoantibody status (ACA, ATA, and ARA) was performed to conduct stratified genome-wide association analyses using the same procedure as for global analysis. All sub-analyses included the 14 independent cohorts, with the exception of the GWAS analysis for ARA-positive patients, which included Spain 2, USA 2, Italy, and UK cohorts according to data availability.

Permutation analysis for subphenotype hits

A number of loci exhibited stronger genetic signals in stratified analysis (lcSSc, dcSSc, ACA) as compared with SSc as a whole despite the loss of statistical power caused by smaller numbers of the subphenotypes. To investigate whether these outcomes could have occurred by chance, we randomly shuffled 10,000 times a number of cases from each cohort (while keeping controls constant) and reran association testing and subsequent meta-analysis on the reshuffled data sets. The p values were converted to z scores to generate a null distribution of this test statistic. In detail, for each subtype, the number of cases randomly selected was determined by the prevalence of the subtype observed in the present study: we selected 62.52% of cases for lcSSc, 27.75% for dcSSc, and 36.77% for ACA +. The empirical p value (p*) was calculated as the number of permuted z scores that were at least as extreme as the actual z score + 1 divided by the number of permutations + 1[31].

Stepwise conditional analysis in SSc-associated loci

The presence of independent signals in the genomic regions with significant signals in the meta-analysis was assessed by joint conditional analysis by GCTA[49]. This method uses summary-level statistics from meta-analysis and applies LD correction between SNPs estimated from a reference sample set. Conditional analysis of each associated locus was performed within a standard region of 1.5 Mb-window centered on the most associated SNP (index or lead SNP), with the exception of DNASE1L3 region, where we explored the locus to 2 Mb owing to the extent of the haplotype block. LD patterns were estimated using genotype data from the 14 individual cohorts as reference. Conditional association analysis was performed including the lead SNP as covariate. Any SNP showing a conditional association p value < 5 × 10−6 was considered as independent signal and was further included in a new round of conditional analysis. This process was repeated until no SNP with p value < 5 × 10−6 remained in any of the genomic regions explored. The observed independent signals were confirmed using PLINK[46] by dependence analysis at cohort level scans through stepwise logistic regression with adjustment for the most associated signals in each locus, followed by inverse variance weighted meta-analysis under a fixed effects model. After the assessment of independent signals in significant loci from the meta-analysis, statistical fine-mapping was carried out using PAINTOR (Probabilistic Annotation INTegratOR) v3.0[50,51] searching for one causal SNP per independent associated region. PAINTOR performs probabilistic inference and computes posterior probabilities (PP) for SNPs to be casual considering the strength of association (Z score) and the LD pattern across genomic regions. The association strength was quantified using Wald statistic (“Equation (1)”) from ref. [50], and the LD information was provided by a LD matrix containing pairwise Pearson correlations coefficients between each SNP. In addition, PAINTOR leverages functional annotation data as a prior probability to improve SNP prioritization. Finally, the method uses Bayes theorem to obtain PP for SNPs to be casual, which in turn were used to generate 95% credible sets (the smallest list of variants that jointly have a probability of including the causal variant ≥ 95%). Associated regions that contained more than one independent signal were split to obtain regions containing only one independent signal by integrating local LD information as well as the recombination rates using the online-tool LDlink (https://ldlink.nci.nih.gov/)[52]. The selection of functional annotations for PAINTOR fine-mapping was carried out by stratified information enrichment calculations using GARFIELD[25,26] (http://www.ebi.ac.uk/birney-srv/GARFIELD/) (method explained in more detail in ‘Enrichment analysis of SSc risk loci in epigenetic marks and cell types’) with the annotation panel distributed in GARFIELD package. The purpose was to systematically select annotations relevant to SSc on the basis of functional enrichment analysis. GARFIELD tests its robustness by calculating functional enrichment for at least four significance cutoffs (p value < 1e −5/−6/−7/−8) applied to the variants. GARFIELD analysis was carried out in our genome-wide SNP panel by setting default parameters and omitting SNPs from chromosome 6 between Mb25 and Mb34. We determined a set of nine annotations to be used for fine-mapping that showed: A significant enrichment (FDR < 0.05) of GWAS SNPs for at least two out of the four significance cutoffs analyzed (p value < 1e −5/−6/−7/−8); and b) A low inter-annotation correlation as suggested by PAINTOR (median inter-annotation Pearson correlation  < 0.35) (Supplementary Data 12). Functional characterization of the SNPs included in credible sets was performed by assessing SNP functional categories by means of wANNOVAR using default parameters[53]. Then we explored overlap with eQTLs, epigenetic histone marks of active promoters and active enhancers (H3K9ac, H3K4me1, and H3K27ac), and the presence of exonic nonsynonymous variants in high or moderate LD (r2 ≥ 0.8, r2 ≥ 0.6, respectively) using HaploReg v4.1[54]. For eQTL interrogation, we used blood eQTL from Westra et al.[55], the Geuvadis data set[56]—which contains expression data from lymphoblastoid cell lines—and the Genotype–Tissue Expression (GTEx) project[57]—which provides RNA sequencing-based eQTL for a wide range of human tissues. Overlap of SNPs with chromatin marks was interrogated in selected cell lines from the Roadmap Epigenomics Project[15] (Supplementary Table 1). Cell lines were selected according to the results of the functional enrichment analysis from ‘Enrichment analysis of SSc risk loci in epigenetic marks and cell types’ for H3K9ac, H3K4me1, and H3K27ac histone marks. Finally, we assessed pleiotropic effect of our signals by determining whether the SNPs included in the credible sets had been reported to be associated with other ADs. For this, we interrogated the new NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/)[58] through the web tool FUMA GWAS (http://fuma.ctglab.nl/)[59].

H3K27ac HiChIP analysis in human CD4+ T cells

Experimentally derived high-resolution maps of enhancer-promoter interactions generated by H3K27ac HiChIP experiments[20] were explored to identify target genes of SSc-associated variants. HiChIP was developed by Mumbach et al.[18] for the analysis of protein-directed chromosome conformation in a very efficient and sensitive way. The H3K27ac HiChIP experiments were performed by Mumbach et al. in human CD4 + T cells from healthy donors: Primary human naïve T cells (CD4+CD45RA+CD25−CD127hi), regulatory T (Treg) cells (CD4+CD25+CD127lo) and T helper 17 (Th17) cells (CD4+CD45RA−CD25−CD127hiCCR6+CXCR5−)[20]. Virtual 4 C plots were generated from dumped matrices generated with Juicebox. The Juicebox tools dump command was used to extract the chromosome of interest from the.hic file[60,61]. The interaction profile of a specific 5 kb or 10 kb bin containing the anchor was then plotted in R. Replicate reproducibility was visualized with the mean profile shown as a line and the shading surrounding the mean representing the standard deviation between replicates. We explored chromatin interactions of the most likely causal variants by setting as anchor points the SNPs with maximum PPs in each of the independent associated loci. To identify the connectivity of candidate SNPs to target genes, we called interactions by manual inspection of individual SNP virtual 4 C interaction files and subset these interactions to those containing a transcription start site and SNP[18,20]. Fit-Hi-C algorithm was used to identify statistically significant (q value ≤ 1e-60) distance-matched enrichment of interaction over background[18,62]. The functional relevance of the H3K27ac HiChIP findings was further validated by evaluating whether the explored SNPs were eQTLs for the HiChIP nominated target genes.

Promoter capture Hi-C analysis

Chromatin interaction maps obtained by promoter capture Hi-C experiments in a wide spectrum of immune cell types[21,22] were assessed using the web-based tool Capture Hi-C Plotter (CHiCP) (https://www.chicp.org/)[63]. The SNPs with maximum PPs in each of the independent associated loci were used as anchor points to explore physical interactions between restriction fragments containing the variants and gene promoters.

Enrichment of SSc loci in epigenetic marks and cell types

To assess whether our SSc GWAS SNPs were not randomly distributed among functional or regulatory elements in the genome, we performed functional enrichment analysis of non-MHC SNPs using GARFIELD v2.0[25,26]. This method estimates enrichment of overlap on functional information computing ORs at various GWAS p value cutoffs, and tests the significance of the enrichment under a generalized linear model. GARFIELD accounts for major sources of confounding factors by incorporating high-LD proxies (r2 > 0.8), MAF, and transcription start site distance as categorical covariates in the logistic regression model. Enrichment was tested on independent SNPs after pruning of GWAS SNPs (r2  > 0.1). We omitted SNPs of chromosome 6 between Mb25 and Mb34 to avoid bias. GARFIELD provides an annotation panel that includes 1005 annotations (genetic annotations, chromatin states, histone modifications, DNase I hypersensitive sites and transcription factor binding sites in different cell lines) from ENCODE, GENCODE and Roadmap Epigenomics projects[15,24,64]. Moreover, GARFIELD can be run using a custom annotation panel. The second option was selected for our enrichment analysis using annotations for 127 reference epigenomes (Supplementary Data 13) and 9 epigenetic marks (H2BK20ac, H3K27ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K4ac, H3K79me1, H2BK15ac) obtained from the Roadmap Epigenomics Consortium and the Encyclopedia of DNA Elements (ENCODE) projects[15,24]. Annotations used were Imputed Narrow Peaks as generated by the software Chrom-Impute[65] and obtained from https://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidatedImputed/narrowPeak/. The estimated ORs were computed at various GWAS p value cutoffs (5 × 10−6, 5 × 10−7, 5 × 10−8) and the R code Garfield-Meff-Padj.R provided by GARFIELD was used to calculate an enrichment p value threshold adjusted for multiple testing (P value = 1.25 × 10−4) on the effective number of annotations (Meff = 400.4454).

Drug-target gene enrichment analysis

Target genes nominated in the present study were used to query the Open Target Platform[33] in order to assess whether any of the genes encode proteins that are drug targets in any phase of clinical trial (phase I–IV). Enrichment of overlap between SSc target genes with pharmacological active targets for the diseases was calculated by Fisher’s exact test.
  31 in total

Review 1.  A Summary on the Genetics of Systemic Lupus Erythematosus, Rheumatoid Arthritis, Systemic Sclerosis, and Sjögren's Syndrome.

Authors:  Lourdes Ortíz-Fernández; Javier Martín; Marta E Alarcón-Riquelme
Journal:  Clin Rev Allergy Immunol       Date:  2022-06-24       Impact factor: 8.667

2.  Dysregulation of Type 1 Interferon Signaling in Systemic Sclerosis: a Promising Therapeutic Target?

Authors:  Minghua Wu; Shervin Assassi
Journal:  Curr Treatm Opt Rheumatol       Date:  2021-10-19

Review 3.  Construction and Application of Polygenic Risk Scores in Autoimmune Diseases.

Authors:  Chachrit Khunsriraksakul; Havell Markus; Nancy J Olsen; Laura Carrel; Bibo Jiang; Dajiang J Liu
Journal:  Front Immunol       Date:  2022-06-27       Impact factor: 8.786

Review 4.  The Pathophysiological Roles of Regulatory T Cells in the Early Phase of Systemic Sclerosis.

Authors:  Satomi Kobayashi; Yasuo Nagafuchi; Hirofumi Shoda; Keishi Fujio
Journal:  Front Immunol       Date:  2022-05-24       Impact factor: 8.786

5.  Altered Balance of Pro-Inflammatory Immune Cells to T Regulatory Cells Differentiates Symptomatic From Asymptomatic Individuals With Anti-Nuclear Antibodies.

Authors:  Rashi Gupta; Emma Vanlieshout; Kieran Manion; Dennisse Bonilla; Michael Kim; Carolina Muñoz-Grajales; Carol Nassar; Sindhu R Johnson; Linda T Hiraki; Zareen Ahmad; Zahi Touma; Arthur Bookman; Joan E Wither
Journal:  Front Immunol       Date:  2022-06-30       Impact factor: 8.786

6.  Interpreting type 1 diabetes risk with genetics and single-cell epigenomics.

Authors:  Joshua Chiou; Ryan J Geusz; Mei-Lin Okino; Jee Yun Han; Michael Miller; Rebecca Melton; Elisha Beebe; Paola Benaglio; Serina Huang; Katha Korgaonkar; Sandra Heller; Alexander Kleger; Sebastian Preissl; David U Gorkin; Maike Sander; Kyle J Gaulton
Journal:  Nature       Date:  2021-05-19       Impact factor: 49.962

7.  Shared heritability of human face and brain shape.

Authors:  Sahin Naqvi; Yoeri Sleyp; Hanne Hoskens; Karlijne Indencleef; Jeffrey P Spence; Rose Bruffaerts; Ahmed Radwan; Ryan J Eller; Stephen Richmond; Mark D Shriver; John R Shaffer; Seth M Weinberg; Susan Walsh; James Thompson; Jonathan K Pritchard; Stefan Sunaert; Hilde Peeters; Joanna Wysocka; Peter Claes
Journal:  Nat Genet       Date:  2021-04-05       Impact factor: 38.330

Review 8.  Artificial intelligence to deep learning: machine intelligence approach for drug discovery.

Authors:  Rohan Gupta; Devesh Srivastava; Mehar Sahu; Swati Tiwari; Rashmi K Ambasta; Pravir Kumar
Journal:  Mol Divers       Date:  2021-04-12       Impact factor: 3.364

Review 9.  Updates on genetics in systemic sclerosis.

Authors:  Yuko Ota; Masataka Kuwana
Journal:  Inflamm Regen       Date:  2021-06-15

10.  Association of GTF2I, NFKB1, and TYK2 Regional Polymorphisms With Systemic Sclerosis in a Chinese Han Population.

Authors:  Chenxi Liu; Songxin Yan; Haizhen Chen; Ziyan Wu; Liubing Li; Linlin Cheng; Haolong Li; Yongzhe Li
Journal:  Front Immunol       Date:  2021-06-23       Impact factor: 7.561

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.