Literature DB >> 20018071

Allelic based gene-gene interactions in rheumatoid arthritis.

Jeesun Jung1, Joon Jin Song, Deukwoo Kwon.   

Abstract

The detection of gene-gene interaction is an important approach to understand the etiology of rheumatoid arthritis (RA). The goal of this study is to identify gene-gene interaction of SNPs at the allelic level contributing to RA using real data sets (Problem 1) of North American Rheumatoid Arthritis Consortium (NARAC) provided by Genetic Analysis Workshop 16 (GAW16). We applied our novel method that can detect the interaction by a definition of nonrandom association of alleles that occurs when the contribution to RA of a particular allele inherited in one gene depends on a particular allele inherited at other unlinked genes. Starting with 639 single-nucleotide polymorphisms (SNPs) from 26 candidate genes, we identified ten two-way interacting genes and one case of three-way interacting genes. SNP rs2476601 on PTPN22 interacts with rs2306772 on SLC22A4, which interacts with rs881372 on TRAF1 and rs2900180 on C5, respectively. SNP rs2900180 on C5 interacts with rs2242720 on RUNX1, which interacts with rs881375 on TRAF1. Furthermore, rs2476601 on PTPN22 also interacts with three SNPs (rs2905325, rs1476482, and rs2106549) in linkage disequilibrium (LD) on IL6. The other three SNPs (rs2961280, rs2961283, and rs2905308) in LD on IL6 interact with two SNPs (rs477515 and rs2516049) on HLA-DRB1. SNPs rs660895 and rs532098 on HLA-DRB1 interact with rs2834779 and four SNPs in LD on RUNX1. Three-way interacting genes of rs10229203 on IL6, rs4816502 on RUNX1, and rs10818500 on C5 were also detected.

Entities:  

Year:  2009        PMID: 20018071      PMCID: PMC2795978          DOI: 10.1186/1753-6561-3-S7-S76

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

Rheumatoid arthritis (RA) is a complex, chronic inflammatory disease whose etiology remains unknown. It has been known that RA is a result of the complicated networks of multiple genes along with the environmental factors such as smoking. It is more common in females. Through a combined linkage and association study [1], the HLA gene cluster on 6p21 has been shown to have the most likely predisposing loci for RA. In addition to HLA, numerous genetic variants influence the pathology of RA. Unfortunately, detection of gene-gene interaction has been challenging due to an issue of high dimensionality from multi-locus combinations that require a large sample size. In this study, we applied a novel approach to detect gene-gene interaction influencing RA using the case-control subjects provided by the North American Rheumatoid Arthritis Consortium (NARAC). In contrast to the previously available method searching for the interaction at the genotype level, our approach focuses on the detection of interaction of at the allelic level with a novel definition: the allele-based gene-gene interaction occurs when a particular allele in one gene and a particular allele at another unlinked genes are dependent on the contribution to RA (Figure 1) [1]. Based on the 639 SNPs from 26 candidate genes related to RA pathology, we performed a score test based on logistic regression and a F-test based on the Cochran-Armitage regression model developed for the detection of allelic based gene-gene interaction [2,3].
Figure 1

Cell combinations of two unlinked markers; .

Cell combinations of two unlinked markers; .

Methods

Characteristics of data

As a regular quality control procedure, population stratification analysis using all 531,688 single-nucleotide polymorphisms (SNPs) was performed by EIGENSTRAT as we included the subjects of the four populations from HapMap database (Yoruba, CEPH, Japanese, and Han Chinese). The result showed that the case and control subjects are confirmed as European Americans. Additionally, we tested sex inconsistency between X chromosome and the clinical report and removed seven subjects whose data were discrepant. After the removal, 866 cases and 1189 controls were used for the further analysis. The 26 candidate genes were selected based on the following reasons: 1) previous reported results (HLA-DRB1, PTPN22, and TNF) [4,5]; 2) genes related to macrophage migration inhibitory factor and linked to the production of inflammatory cytokines (MIF, IL6, IL1B, IL3, IL4, and IL13) [8,9]; 3) genes playing an immunologically important role in down-regulating the immune response (CTLA4, RUNX1, STAT4, and SLC22A4) [8]; 4) genes used to test for interaction by Mei et al. and Ding et al. [6,7]; 5) genes relevant to inflammatory disease [8]. We also removed SNPs deviating from Hardy-Weinberg equilibrium (p-value < 10-5) and having a minor allele frequency of less than 0.01. The names, locations, and the number of SNPs being tested for the 26 genes are provided in Table 1.
Table 1

List of genes selected for analysis

Gene SymbolLocusNo. of SNPs
TNFRSF1B1p36.2220
PADI41p36.1316
PTPN221p13.37
FCGR3A1q23.31
IL1B2q1414
ITGAV2q32.116
STAT42q32.335
CTLA42q3316
IL35q31.13
IL135q31.13
IL45q31.14
SLC22A45q31.114
HAVCR15q33.313
NFKBIL16p21.37
HLA-DRB16p21.36
LTA6p21.334
TNF6p21.31
MAP3K7IP26q25.152
IL67p2196
TRAF19q333
C59q3327
DLG510q22.326
MS4A111q12.211
CARD1516q12.110
RUNX121q22.12216
MIF22q11.2318
Total639
List of genes selected for analysis

Statistical model

The underlying principle of our method is to identify the association of allelic combination between two unlinked markers with a disease trait so that subjects are assigned an allelic score given their observed genotype information. The score is a conditional probability of obtaining the particular allelic combination given the observed genotypes at the two loci of each subject. For example, a subject with AA (at marker M1) and Bb (at marker M2) genotype has 1/2 in the AB combination and 1/2 in the Ab combination, X= P(AB|M1 = AA, M2 = Bb) = 1/2, X= P(Ab|M1 = AA, M2 = Bb) = 1/2 Table 2 shows the allelic scores of a subject whose genotype is given [2].
Table 2

Allelic scores

Allelic score

GenotypeABAbaBab
(AA, BB)1000
(AA, Bb)1/21/200
(AA, bb)0100
(Aa, BB)1/201/20
(Aa, Bb)1/41/41/41/4
(Aa, bb)01/201/2
(aa, BB)0010
(aa, Bb)001/21/2
(aa, bb)0001
Allelic scores

Score statistic by logistic regression model

Denote y= 1 if ith subject is affected by RA and y= 0 otherwise. In the non-parametric maximum likelihood solution that allows an arbitrary covariate distribution, fitting a standard prospective logistic regression in case-control sampling design is equivalent to fitting a retrospective logistic regression except that an intercept in case-control sampling needs the information of sampling fraction of cases and controls [10,11]. Therefore, the prospective logistic regression model is used due to the equivalence in parameter estimates of interaction effect. The likelihood function of the standard logistic regression is where . X= P(AB|M1, M1) is the allelic score of A allele and B allele from M1 and M2 genotypes of ith subject and βis interaction effect of kth ∈ {AB, Ab, aB} allelic combination. The overall proportion of y is R/N, where R is the number of case subjects and N is the number of total subjects. Under the assumption of no covariates, let U= (U, U, U)be the score function, which is a derivative of the log-likelihood function with respect to β = (β, β, β) respectively. Under the null hypothesis of no interaction effect H0: β= β= β= 0 the efficient score test statistic under the null hypothesis is where and V-1 is the submatrix of I-1(α, β, β, β), which is the observed Fisher information matrix corresponding to U= (U, U, U). Detailed derivation and theoretical justification were published by Jung and Zhao [3].

Extension of Cochran-Armitage trend regression

With the same allelic scores in Table 2 at the two markers, we can model a linear trend of proportion of cases over total number of subjects at each allelic combination, p= r/n, where n= r/sfor k ∈ (AB, Ab, aB, ab), j ∈ (0,1/4,1/2,1) for two markers. rand sare the number of affected subjects and unaffected subjects having j score at k allelic combination, respectively. It has been shown that regressing pon Z, Z, Zis equivalent to regressing yon Z, Z, Z[12]. As an extension of Cochran-Armitage trend method, the interaction effect of two markers on RA trait can be modeled as Under the assumption of no covariates, the theoretical regression coefficients are functions of linkage disequilibrium (LD) between a marker and a disease locus as follows: where , are the average effect of the gene substitution at each disease loci and is the magnitude of interaction effect. The global test statistic for interaction effect over all allelic combinations under the null hypothesis H0: β= β= β= 0 is , which follows F(3, N-4) distribution with λ = 0. The analytical properties of two methods were derived by Jung and Zhao [3] and simulation studies showed that the score test and F test by Cochran-Armitage trend are asymptotically equivalent.

Simulation study of power and type I error rates and comparison of genotype-based method

A simulation study was performed to study power and type I error rates at the 1% significance level [3]. Six two-way interaction models were simulated using the simulation of software SNaP [3]. Most of the models were designed based on the combination of dominant and recessive inheritance at the genotypic level at each marker. These models are 1) dominant or recessive (Dom ∪ Rec), 2) recessive or recessive (RecRec), 3) dominant and dominant (Dom ∩ Dom), 4) dominant and recessive (Dom ∩ Rec), 5) threshold model in which the disease risk is increased when two or more high-risk alleles from either locus are present, 6) modified model in which the homozygosity at either locus confers disease risk [1]. For each model for type I error rates, we simulated 5,000 data sets. Each data set has 200 case and 200 controls under no LD between markers and disease loci. The disease risk allele frequency at each disease loci is 0.2 and the minor allele frequency of each SNP is 0.3, which is close to the real data. For power calculation, 2,000 data sets were simulated, with at each model. The rest of parameters are the same as used for type I error rate calculation. For comparison of genotype-based method, logistic regression of two-way interaction is modeled as follows: , where j, l = (A, or a)m, n = (B, or b) and . Additionally, we calculated the empirical power of multifactor dimensionality reduction (MDR) in the same simulated data sets. Table 3 illustrates that the power of the score test of the allelic based method over the six two-interaction models are higher than that of the two genotype-based methods. Type I error rates of the allele-based method are close to nominal value of 0.01. Further detailed results of simulation and analytical derivation are available in Jung and Zhao [3].
Table 3

Type I error rates and power over six two-way interaction models

Allele-based methodGenotype-based method


Type I error ratePowerPower



ModelScoreF-testScoreF-testScoreMDR
Dom ∪ Rec1.11.117.316.659.856.3
Modified0.780.8423.9523.4513.759.3
Dom ∩ Dom0.80.8446.7546.1529.324.5
Rec ∪ Rec1.221.2658.257.738.131.9
Threshold11.049291.7580.4573.45
Dom ∩ Rec0.920.9496.4596.388.9582.6
Type I error rates and power over six two-way interaction models

Non-nested model comparison using Cochran-Armitage regression method

Technically, the proposed two-way interaction and three-way interaction model are not nested models, so an artificial nesting approach was employed to select the best model as follows: Under the assumption of normal errors, an artificial nesting approach called the J test [14] was utilized as follows: Because f is linear in β, the comparison requires that one estimates δ and then fits a linear regression and test for α = 0 using the ordinary t-statistic [13,14]. On the other hand, we can compare Akaike information criterion (AIC) and Bayesian information criterion (BIC) for a two-way model with that of a three-way interaction model.

Analysis procedure

The procedure to search for the best interaction model consists of multiple steps based on the proposed methods.

Step 1

When performing a two-way interaction analysis [Model (1) and (3)] of two SNPs, each is selected from each gene and the global test for an interaction is performed. Note that two SNPs in the same gene are removed from the interaction analysis. We then compared the interaction model (3) with a main effect model (6) in order to search for the pure interacting SNPs that are not confounded with the main effects, and selected the best two-way interaction models which met three criteria: 1) the p-value less than 2.5 × 10-7 from interaction test (the total 203,841 combination; adjusted for Bonferroni correction), 2) the p-value of the test for comparison between the interaction model and the main effect model less than 0.01, and 3) the testing SNPs should have both the smallest AIC and the smallest BIC. The following models were considered:

Step 2

Based on the pair-wise SNPs selected by Step 1, we conducted three-way interaction model analysis as we added one SNP at a time from one of the remaining genes, which is the scheme of the forward selection procedure. With the same procedure of a two-way model selection and an additional comparison of the three-way model with two-way interaction model, the best three-way interaction models were selected by the same criteria described in Step 1.

Step 3

We continued these steps until no further high-dimensional interaction model was identified.

Results

Table 4 lists ten pairs of two-way interacting genes and the function of SNPs identified. Because we analyzed all SNPs in LD with a gene, there are multiple SNPs in a gene interacting with a SNP of the other gene. SNP rs2476601 on PTPN22 interacts with rs2306772 on SLC22A4, which interacts with rs881372 on TRAF1 and rs2900180 on C5, respectively. SNP rs2900180 interacts with rs2242720 on RUNX1, which interacts with rs881375 on TRAF1. SNPs rs881375 and rs2900180 are in LD (R2 = 0.89). Furthermore, rs2476601 on PTPN22 interacts with three SNPs (rs2905325, rs1476482, and rs2106549) on IL6. Three SNPs that are not in LD on IL6 interact with two SNPs (rs477515 and rs2516049) on HLA-DRB1. SNP rs660895 on the same gene interacts with rs2834779 on the 5' UTR region on RUNX1, and rs660895 and rs532098 interact with four SNPs on RUNX1. Additionally, rs4947324 on NFKBIL1 is not in LD with three SNPs on HLA-DRB1, but it interacts with them. Two SNPs (rs6586516 and rs2477142) on PADI4 interact with rs3761847 on TRAF1, which interacts with five SNPs in LD on RUNX1. Furthermore, we detected three-way interacting genes which are rs10229203 on IL6, rs4816502 on RUNX1 and rs10818500 on C5. Figure 2 summarized the pathway of ten pairs genes and one group of three interacting genes (indicated in red).
Table 4

Results of two-way (two genes) interaction and the characteristics of genes

p-valueb

Gene 1 symbol (location)Gene 2 symbol (location)SNP1a from gene 1SNP 2a from gene 2Function of SNP 1Function of SNP 2ScoreF-testMain vs. interaction
PADI4 (1p36.13)XTRAF1(9q33)rs6586516, rs2477142rs37618475' UTR5' UTR1.66 × 10-81.43 × 10-80.005
PTPN22 (1p13.3)XSLC22A4(5q31.1)rs2476601rs2306772Codingintron1.79 × 10-121.25 × 10-120.0054
PTPN2 (1p13.3)XIL6 (7p21)rs2476601rs2905325, rs1476482, rs2106549Coding5' UTR3.80 × 10-132.53 × 10-130.0003
SLC22A4 (5q31.1)XTRAF1 (9q33)rs2073838, rs2306772rs881375Intron3' UTR6.19 × 10-95.22 × 10-90.0037
SLC22A4 (5q31.1)XC5 (9q33)rs2073838, rs2306772rs2900180Intron3' UTR1.37 × 10-91.13 × 10-90.0029
NFKBIL1 (6p21.3)XHLA-DRB1(6p21.3)rs4947324crs477515, rs2516049, rs5320983' UTR5' UTR<1.0 × 10-15<1.0 × 10-150.0012
HLA-DRB1 (6p21.3)XIL6 (7p21)rs477515, rs2516049rs2961280, rs2961283, rs29053085' UTR5' UTR<1.0 × 10-15<1.0 × 10-150.0023
HLA-DRB1 (6p21.3)XRUNX1(21q22.12)rs660895rs28347795' UTR5' UTR<1.0 × 10-15<1.0 × 10-150.0028
HLA-DRB1 (6p21.3)XRUNX1 (21q22.12)rs660895, rs532098rs4817699, rs8131102, rs9984470, rs99791535' UTRIntron<1.0 × 10-15<1.0 × 10-150.0037
TRAF1 (9q33)XRUNX1 (21q22.12)rs881375drs4816502, rs2242720e3' UTRIntron/5' UTR3.03 × 10-82.62 × 10-80.0027
TRAF1 (9q33)XRUNX1 (21q22.12)rs3761847rs1981392, rs2834714, rs4816502, rs2242882, rs9322845' UTRIntron1.29 × 10-119.47 × 10-120.0001
C5 (9q33)XRUNX1 (21q22.12)rs10760130rs1981392, rs2834714, rs4816502, rs22428823' UTRIntron7.19 × 10-115.53 × 10-110.0001
C5 (9q33)XRUNX1 (21q22.12)rs2900180drs4816502, rs2242720e3' UTRIntron/5' UTR2.83 × 10-102.24 × 10-100.0046
C5 (9q33)XRUNX1 (21q22.12)rs1468673, rs10818500rs2834714, rs4816502IntronIntron1.85 × 10-81.59 × 10-80.0001

aThe SNPs listed under each gene are in LD (within R2 > 0.8).

bThe smallest p-value of each combination is reported.

crs4947324 on NFKBIL1 and three SNPs (rs477515, rs2516049, rs532098) are not in LD.

drs881375 on TRAF1 and rs2900180 on C5 are in LD with r2 = 0.89.

ers4816502 and rs2242720 on RUNX1 are not in LD, and the function of rs4817502 is intron, that of rs2242720 is 5' UTR, respectively.

Figure 2

Graphical view of the interaction of two-way and three-way interaction (red).

Results of two-way (two genes) interaction and the characteristics of genes aThe SNPs listed under each gene are in LD (within R2 > 0.8). bThe smallest p-value of each combination is reported. crs4947324 on NFKBIL1 and three SNPs (rs477515, rs2516049, rs532098) are not in LD. drs881375 on TRAF1 and rs2900180 on C5 are in LD with r2 = 0.89. ers4816502 and rs2242720 on RUNX1 are not in LD, and the function of rs4817502 is intron, that of rs2242720 is 5' UTR, respectively. Graphical view of the interaction of two-way and three-way interaction (red).

Discussion

In this study, the allele-based gene-gene interaction analyses were applied to case-control data sets of RA. Based on the analysis with 639 SNPs from 26 candidate genes that were previously detected through linkage study or fine mapping, we identified ten two-way interacting genes with multiple SNPs in LD from a gene and one three-way interaction. We have not identified any four-way interaction effects. However, the 26 candidate genes selected in this study may not represent all candidate genes for RA and we observed that Illumina 550k chip may not have a good gene-wide coverage for SNPs because no SNPs of SUMO4 and VEGFA in the platform are available. A more standard interaction model using a logistic regression consisting of two main effect terms (X = 0, 1, 2 according to the number of alleles) and a multiplicative term of the main effect (additive × additive) was applied to the same data set. There is no interacting SNPs by an even more lenient criteria (p-value<10-5). Three criteria to justify the significant interaction models were used. For the interaction models, Bonferroni correction was used for multiple testing, and for the comparison of the interaction model with a main-effect model (significance level of 0.01), the smaller AIC and BIC were utilized. The final selected interacting SNPs satisfied all of the criteria, which may be conservative and may cause false-negative error. There still remains the issue of multiple comparisons in the high-dimensional interactions and the complexity of the procedure to screen the interaction effects.

Conclusion

As shown in the results, the proposed allele-based approach allows us to identify multiple interactions that may not have been identified as risk factors for RA. PTPN22, SLC22A4, HLA-DRB1, IL6, PADI4, TRAF1, NFkBIL1, C5, and RUNX1 may play interactive roles for RA, especially PTPN22 and SLC22A4, which are related to the reaction of antigen for RA. Therefore, our method taking into account the nonrandom association of all allelic combinations may help detect novel genetic variants and interpret biological pathways.

List of abbreviations used

AIC: Akaike Information Criterion; BIC: Bayesian Information Criterion; GAW16: Genetic Analysis Workshop 16; LD: Linkage disequilibrium; MDR: Multifactor dimensionality reduction; NARAC: North American Rheumatoid Arthritis Consortium; RA: Rheumatoid arthritis; SNP: Single-nucleotide polymorphism

Competing interests

The authors declare that have no competing interests.

Authors' contributions

JJ developed statistical models, performed the analysis, and wrote the manuscript. JJS checked SAS/IML codes that JJ has written. DK contributed on the interpretation of the analysis.
  8 in total

Review 1.  Susceptibility to JRA/JIA: complementing general autoimmune and arthritis traits.

Authors:  J D Phelan; S D Thompson; D N Glass
Journal:  Genes Immun       Date:  2006-01       Impact factor: 2.676

Review 2.  Genetic basis of rheumatoid arthritis.

Authors:  Philippe Dieudé; François Cornélis
Journal:  Joint Bone Spine       Date:  2005-10-19       Impact factor: 4.929

3.  Tumor necrosis factor a microsatellite polymorphism is associated with rheumatoid arthritis severity through an interaction with the HLA-DRB1 shared epitope.

Authors:  H Mu; J J Chen; Y Jiang; M C King; G Thomson; L A Criswell
Journal:  Arthritis Rheum       Date:  1999-03

4.  Gene-gene and gene-environment interactions involving HLA-DRB1, PTPN22, and smoking in two subsets of rheumatoid arthritis.

Authors:  Henrik Kallberg; Leonid Padyukov; Robert M Plenge; Johan Ronnelid; Peter K Gregersen; Annette H M van der Helm-van Mil; Rene E M Toes; Tom W Huizinga; Lars Klareskog; Lars Alfredsson
Journal:  Am J Hum Genet       Date:  2007-04-02       Impact factor: 11.025

5.  Allelic-based gene-gene interaction associated with quantitative traits.

Authors:  Jeesun Jung; Bin Sun; Deukwoo Kwon; Daniel L Koller; Tatiana M Foroud
Journal:  Genet Epidemiol       Date:  2009-05       Impact factor: 2.135

6.  TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study.

Authors:  Robert M Plenge; Mark Seielstad; Leonid Padyukov; Annette T Lee; Elaine F Remmers; Bo Ding; Anthony Liew; Houman Khalili; Alamelu Chandrasekaran; Leela R L Davies; Wentian Li; Adrian K S Tan; Carine Bonnard; Rick T H Ong; Anbupalam Thalamuthu; Sven Pettersson; Chunyu Liu; Chao Tian; Wei V Chen; John P Carulli; Evan M Beckman; David Altshuler; Lars Alfredsson; Lindsey A Criswell; Christopher I Amos; Michael F Seldin; Daniel L Kastner; Lars Klareskog; Peter K Gregersen
Journal:  N Engl J Med       Date:  2007-09-05       Impact factor: 91.245

7.  Evaluating gene x gene and gene x smoking interaction in rheumatoid arthritis using candidate genes in GAW15.

Authors:  Ling Mei; Xiaohui Li; Kai Yang; Jinrui Cui; Belle Fang; Xiuqing Guo; Jerome I Rotter
Journal:  BMC Proc       Date:  2007-12-18

8.  Constructing gene association networks for rheumatoid arthritis using the backward genotype-trait association (BGTA) algorithm.

Authors:  Yuejing Ding; Lei Cong; Iuliana Ionita-Laza; Shaw-Hwa Lo; Tian Zheng
Journal:  BMC Proc       Date:  2007-12-18
  8 in total
  4 in total

1.  Common obesity-related genetic variants and papillary thyroid cancer risk.

Authors:  Cari M Kitahara; Gila Neta; Ruth M Pfeiffer; Deukwoo Kwon; Li Xu; Neal D Freedman; Amy A Hutchinson; Stephen J Chanock; Erich M Sturgis; Alice J Sigurdson; Alina V Brenner
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2012-10-11       Impact factor: 4.254

2.  The challenge of detecting epistasis (G x G interactions): Genetic Analysis Workshop 16.

Authors:  Ping An; Odity Mukherjee; Pritam Chanda; Li Yao; Corinne D Engelman; Chien-Hsun Huang; Tian Zheng; Ilija P Kovac; Marie-Pierre Dubé; Xueying Liang; Jia Li; Mariza de Andrade; Robert Culverhouse; Doerthe Malzahn; Alisa K Manning; Geraldine M Clarke; Jeesun Jung; Michael A Province
Journal:  Genet Epidemiol       Date:  2009       Impact factor: 2.135

3.  Structure of tumor necrosis factor-alpha haploblocks in European populations.

Authors:  Aimee M Merino; Kui Zhang; Richard A Kaslow; Brahim Aissani
Journal:  Immunogenetics       Date:  2013-04-12       Impact factor: 2.846

4.  Comparative study for haplotype block partitioning methods - Evidence from chromosome 6 of the North American Rheumatoid Arthritis Consortium (NARAC) dataset.

Authors:  Mohamed N Saad; Mai S Mabrouk; Ayman M Eldeib; Olfat G Shaker
Journal:  PLoS One       Date:  2018-12-31       Impact factor: 3.240

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.