A genome-wide association scan for rheumatoid arthritis data by Hotelling's T2 tests.

Lianfu Chen, Ming Zhong, Wei Vivien Chen, Christopher I Amos, Ruzong Fan.

Abstract

We performed a genome-wide association scan on the North American Rheumatoid Arthritis Consortium (NARAC) data using Hotelling's T2 tests, i.e., TH based on allele coding and TG based on genotype coding. The objective was to identify associations between single-nucleotide polymorphisms (SNPs) or markers and rheumatoid arthritis. In specific candidate gene regions, we evaluated the performance of Hotelling's T2 tests. Then Hotelling's T2 tests were used as a tool to identify new regions that contain SNPs showing strong associations with disease. As expected, the strongest association evidence was found in the region of the HLA-DRB1 locus on chromosome 6. In the region of the TRAF1-C5 genes, we identified two SNPs, rs2900180 and rs3761847, with the largest and the second largest TH and TG scores among all SNPs on chromosome 9. We also identified one SNP, rs2476601, in the region of the PTPN22 gene that had the largest TH score and the second largest TG score among all SNPs on chromosome 1. In addition, SNPs with the largest TH score on each chromosome were identified. These SNPs may be located in the regions of genes that have modest effects on rheumatoid arthritis. These regions deserve further investigation.

Year:  2009        PMID: 20018053      PMCID: PMC2795960          DOI: 10.1186/1753-6561-3-s7-s6

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

Rheumatoid arthritis (RA) is the most common inflammatory joint disease and has an autoimmune etiology. The exact cause of RA is still unknown, but it is well known that RA has a strong genetic component [1]. The HLA-DRB1 locus has been clearly demonstrated to be associated with RA [2-4]. Other candidate genes, such as PTPN22 and TRAF1-C5, which confer a modest level of risk of RA, have also been identified recently [5,6]. We conducted a genome-wide association analysis on the data of the North American Rheumatoid Arthritis Consortium (NARAC). The objective of this analysis was to identify associations between single-nucleotide polymorphisms (SNPs) or markers and RA. In specific candidate gene regions, we evaluated the performance of Hotelling's T2 tests on known associations. Then, we used the Hotelling's T2 tests to identify additional SNPs that showed strong association with RA. These SNPs are located in regions that are very likely related to the disease and deserve further investigation.

Methods

We used the Hotelling's T2 test developed by Fan and Knapp [7] and Xiong et al. [8] to analyze the NARAC data. Consider a case-control design with N cases from an affected population and M controls from an unaffected population. The SNPs analyzed are bi-allelic markers whose two alleles, denoted 1 and 2, form three genotypes: 1/1, 1/2, and 2/2. A coding vector can then be defined for each case or control by either i) genotype coding or ii) allele coding. Let $X_i$ and $Y_j$ denote the coding vectors of the ith case and the jth control, respectively. In the genotype coding, we used $X_i = (1,0)^\tau$ for genotype 1/1, $X_i = (0,1)^\tau$ for genotype 1/2, and $X_i = (0,0)^\tau$ for genotype 2/2, whereas the allele coding simply counts the number of copies of allele 1 in a genotype. If multiple markers are available, the coding vectors of each case or control can be combined. For instance, the allele coding vector of a case or control for n SNPs is n-dimensional, and the genotype coding vector for n SNPs is 2n-dimensional. For multi-allelic markers, the coding method is described by Fan and Knapp [7].

Let us define a pooled-sample variance-covariance matrix by

$$S = \frac{1}{N+M-2}\left[\sum_{i=1}^{N}(X_i-\bar{X})(X_i-\bar{X})^\tau + \sum_{j=1}^{M}(Y_j-\bar{Y})(Y_j-\bar{Y})^\tau\right],$$

where $\bar{X}$ and $\bar{Y}$ are the mean vectors of cases and controls, respectively. The Hotelling's T2 test statistic [9] is defined as

$$T^2 = \frac{NM}{N+M}\,(\bar{X}-\bar{Y})^\tau S^{-1}(\bar{X}-\bar{Y}).$$

In the following, we denote the Hotelling's T2 for allele coding as TH and the Hotelling's T2 for genotype coding as TG. Assume the sample sizes N and M are large enough that large-sample theory applies. Under the null hypothesis of no association, the statistic TH (or TG) is asymptotically distributed as a central chi-square (χ2) statistic with n (or 2n) degrees of freedom if n SNPs are used in the analysis. Under the alternative hypothesis of association, TH (or TG) is asymptotically distributed as a non-central χ2 statistic [7,8,10].
Based on the Hotelling's T2 test statistics, we have developed a SAS Macro (hotel_cc.sas) to implement the method, which is available online [11].
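The single-SNP case of the statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the hotel_cc.sas macro: the function names and the toy genotype counts are invented for the example, only one bi-allelic SNP is handled (so the coding vector has one component under allele coding and two under genotype coding), and the pooled covariance matrix is assumed to be non-singular.

```python
def allele_code(genotype):
    """Allele coding: number of copies of allele 1 in a genotype such as (1, 2)."""
    return [sum(1 for a in genotype if a == 1)]


def genotype_code(genotype):
    """Genotype coding: (1, 0) for 1/1, (0, 1) for 1/2, (0, 0) for 2/2."""
    copies = sum(1 for a in genotype if a == 1)
    return [1 if copies == 2 else 0, 1 if copies == 1 else 0]


def mean_vector(rows):
    k = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(k)]


def scatter_matrix(rows, mean):
    """Sum over individuals of (x - mean)(x - mean)^T."""
    k = len(mean)
    s = [[0.0] * k for _ in range(k)]
    for r in rows:
        for a in range(k):
            for b in range(k):
                s[a][b] += (r[a] - mean[a]) * (r[b] - mean[b])
    return s


def hotelling_t2(cases, controls):
    """T^2 = NM/(N+M) * d^T S^-1 d for coding vectors of length 1 or 2."""
    n, m = len(cases), len(controls)
    xbar, ybar = mean_vector(cases), mean_vector(controls)
    sx, sy = scatter_matrix(cases, xbar), scatter_matrix(controls, ybar)
    k = len(xbar)
    # Pooled-sample variance-covariance matrix S.
    s = [[(sx[a][b] + sy[a][b]) / (n + m - 2) for b in range(k)] for a in range(k)]
    d = [xbar[a] - ybar[a] for a in range(k)]
    if k == 1:
        quad = d[0] ** 2 / s[0][0]
    elif k == 2:
        # Closed-form inverse of a symmetric 2x2 matrix.
        det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
        quad = (s[1][1] * d[0] ** 2 - 2 * s[0][1] * d[0] * d[1] + s[0][0] * d[1] ** 2) / det
    else:
        raise ValueError("this sketch handles one bi-allelic SNP only")
    return n * m / (n + m) * quad


# Toy data (hypothetical): genotypes of 10 cases and 10 controls at one SNP.
cases = [(1, 1)] * 6 + [(1, 2)] * 3 + [(2, 2)] * 1
controls = [(1, 1)] * 2 + [(1, 2)] * 4 + [(2, 2)] * 4

th = hotelling_t2([allele_code(g) for g in cases], [allele_code(g) for g in controls])
tg = hotelling_t2([genotype_code(g) for g in cases], [genotype_code(g) for g in controls])
print(f"TH = {th:.4f} (1 df), TG = {tg:.4f} (2 df)")
```

For a single SNP, TH would be referred to a χ2 distribution with 1 degree of freedom and TG to one with 2 degrees of freedom, as stated above.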

Results

First, we applied the Hotelling's test statistics and performed a genome-wide scan on the NARAC data by analyzing one SNP at a time. The NARAC data contained a total of 2062 individuals (868 cases and 1194 controls). Our analysis used data from 22 autosomes. The RA data of Genetic Analysis Workshop (GAW) 16 included 545,080 SNP-genotype fields from an Illumina 550 k chip (22 autosomes, sex chromosomes, and mitochondria). We dropped all SNPs with low call rates (less than 95%), all SNPs not in Hardy-Weinberg equilibrium in the controls (p-value < 10^-5), and all SNPs not on the autosomes. After this filtering, 490,613 SNPs on 22 autosomes were used in our analysis. The strongest signal was found in the region of the HLA-DRB1 gene on chromosome 6 at location 32,654,524-32,686,031 bp. In Figure 1, Graphs I and II show the Hotelling's test scores for chromosome 6. Both the TH and TG scores reached their highest values around 32.5 Mb, in the region of HLA-DRB1. Graphs III and IV show the results in the region of the HLA-DRB1 gene (the legend indicates the location of the HLA-DRB1 gene). Most of the test scores in the region were highly significant.
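The quality-control filter described in this paragraph can be sketched as follows. The thresholds (call rate of at least 95%, Hardy-Weinberg p-value of at least 10^-5 in controls) come from the text; everything else is an assumption made for illustration. In particular, the text does not say which Hardy-Weinberg test was used, so the 1-degree-of-freedom Pearson chi-square test below is a stand-in, and the function names and genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n11, n12, n22):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions from genotype counts."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_called, n_total, control_counts, min_call_rate=0.95, hwe_alpha=1e-5):
    """Keep a SNP only if its call rate and control-sample HWE p-value clear both thresholds."""
    call_rate = n_called / n_total
    return call_rate >= min_call_rate and hwe_chi2_pvalue(*control_counts) >= hwe_alpha


# Hypothetical SNPs: (genotypes called, individuals typed, control genotype counts).
print(passes_qc(2000, 2062, (250, 500, 250)))  # in HWE, call rate about 97%
print(passes_qc(2000, 2062, (500, 0, 500)))    # no heterozygotes: strong HWE departure
print(passes_qc(1900, 2062, (250, 500, 250)))  # call rate about 92%
```

A restriction to autosomal SNPs would be applied alongside this filter using the chromosome annotation of each marker.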
Figure 1. Hotelling's test scores for chromosomes 6 and 9 data.

We present the six SNPs on chromosome 6 with the highest test scores in the left-hand part of Table 1. The most significant result was found at SNP rs2395175 (p-value = 9.25 × 10^-144). These SNPs are all located around the HLA-DRB1 gene. It is interesting that both TH and TG reached their highest scores at the same four SNPs (rs2395175, rs660895, rs6910071, and rs2395163). Interestingly, TH reached the 5th highest score at SNP rs3763309 and the 6th highest at SNP rs3763312; conversely, TG reached the 5th highest score at SNP rs3763312 and the 6th highest at SNP rs3763309. The order of the two SNPs with the 7th and 8th highest scores also switches between TH and TG; in addition, TH and TG reached the 9th to 13th highest scores at the same SNPs (data not shown). Thus, the region of the HLA-DRB1 gene contains multiple SNPs that are highly associated with RA. In addition, the p-values of the test TG were generally smaller than those of TH, i.e., the genotype coding test TG leads to more significant results than the allele coding test TH. This observation is consistent with the evidence for non-additivity of DRB1 effects [12].
Table 1. Six SNPs on chromosomes 6 and 9 with the highest test scores

| Statistic | Chr 6 SNP | Position (bp) | Test score | p-Value | Chr 9 SNP | Position (bp) | Test score | p-Value |
|---|---|---|---|---|---|---|---|---|
| TH | rs2395175 | 32513004 | 651.76 | 9.25 × 10^-144 | rs2900180 | 120785936 | 34.21 | 4.95 × 10^-9 |
| TH | rs660895 | 32685358 | 635.76 | 2.79 × 10^-140 | rs3761847 | 120769793 | 32.17 | 1.41 × 10^-8 |
| TH | rs6910071 | 32390832 | 527.23 | 1.13 × 10^-116 | rs881375 | 120732452 | 31.64 | 1.86 × 10^-8 |
| TH | rs2395163 | 32495787 | 509.44 | 8.4 × 10^-113 | rs1953126 | 120720054 | 31.19 | 2.34 × 10^-8 |
| TH | rs3763309 | 32483951 | 474.63 | 3.15 × 10^-105 | rs10760130 | 120781544 | 30.1 | 4.10 × 10^-8 |
| TH | rs3763312 | 32484326 | 472.10 | 1.12 × 10^-104 | rs10985073 | 120723409 | 28.16 | 1.12 × 10^-7 |
| TG | rs2395175 | 32513004 | 694.14 | 1.87 × 10^-151 | rs2900180 | 120785936 | 34.51 | 3.21 × 10^-8 |
| TG | rs660895 | 32685358 | 653.09 | 1.53 × 10^-142 | rs3761847 | 120769793 | 32.82 | 7.48 × 10^-8 |
| TG | rs6910071 | 32390832 | 548.25 | 8.89 × 10^-120 | rs881375 | 120732452 | 31.88 | 1.19 × 10^-7 |
| TG | rs2395163 | 32495787 | 526.00 | 6.03 × 10^-115 | rs1953126 | 120720054 | 31.69 | 1.32 × 10^-7 |
| TG | rs3763312 | 32484326 | 496.90 | 1.26 × 10^-108 | rs10760130 | 120781544 | 31.27 | 1.62 × 10^-7 |
| TG | rs3763309 | 32483951 | 495.07 | 3.14 × 10^-108 | rs10985073 | 120723409 | 29.18 | 4.62 × 10^-7 |
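The p-values in Table 1 follow from the asymptotic χ2 distribution described in the Methods: 1 degree of freedom for TH and 2 degrees of freedom for TG when a single SNP is analyzed. As a rough check, the sketch below converts two of the chromosome 9 scores back to p-values using the closed-form upper tails for 1 and 2 degrees of freedom. The function name is hypothetical, and the sketch deliberately covers only these two cases.

```python
import math


def chi2_upper_tail(score, df):
    """P(chi-square with df degrees of freedom > score), for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(score / 2))
    if df == 2:
        return math.exp(-score / 2)
    raise ValueError("closed forms are given here for df = 1 and df = 2 only")


# Scores for rs2900180 on chromosome 9, taken from Table 1.
p_th = chi2_upper_tail(34.21, df=1)  # allele coding, 1 SNP
p_tg = chi2_upper_tail(34.51, df=2)  # genotype coding, 1 SNP
print(f"TH = 34.21 -> p = {p_th:.3g}")
print(f"TG = 34.51 -> p = {p_tg:.3g}")
```

This also shows why a slightly larger TG score can carry a larger p-value than the corresponding TH score: the extra degree of freedom raises the reference distribution.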
It is well known that the HLA-DRB1 alleles are associated with RA [1,2]. We performed an analysis in which the HLA-DRB1 alleles *0101, *0102, *0401, *0404, *0405, *0408, and *1001, which are components of the shared epitope, were treated as risk alleles, and the other alleles were collapsed into one. Here we used the multi-allelic version of the Hotelling's T2 tests [7]. The test score for allele coding was TH = 650.81 with 7 degrees of freedom (p-value = 2.76 × 10^-136), and the test score for genotype coding was TG = 694.82 with 35 degrees of freedom (p-value = 1.36 × 10^-123). The results were consistent with those from the individual SNPs above.

On the basis of the individual SNP analysis, we performed a forward analysis of multiple SNPs. Using the most significant SNP, rs2395175, as the baseline, we added one SNP at a time for an analysis of two SNPs. Each of three SNPs, rs660895, rs6910071, and rs3763312, contributed significant association in addition to the contribution of the base SNP rs2395175 (p-value < 0.01). Moreover, the most significant result was from the two SNPs rs2395175 and rs660895. Then, we added one SNP at a time to the two most significant SNPs; each of the two SNPs rs6910071 and rs3763312 contributed significant association (p-value < 0.01). Finally, four SNPs together were found to be significantly associated with RA (rs2395175, rs660895, rs6910071, and rs3763312; p-value < 0.01).

Graphs V-VIII of Figure 1 show the results for chromosome 9 (the legend indicates the location of the TRAF1-C5 genes). In Plenge et al. [6], SNP rs3761847 at position 120,769,793 bp and SNP rs2900180 at position 120,785,936 bp were found to be significantly associated with RA in the region of the TRAF1-C5 genes.
We found consistent results: TH = 34.21 of SNP rs2900180 was the largest (p-value = 4.95 × 10^-9), and TH = 32.17 of SNP rs3761847 was the second largest among all SNPs on chromosome 9 (p-value = 1.41 × 10^-8). The other SNPs on chromosome 9 with the highest scores are reported on the right-hand side of Table 1. Interestingly, the SNPs identified via TH were the same as those identified via TG (the right-hand side of Table 1). As with the HLA-DRB1 region on chromosome 6, we performed a forward analysis of multiple SNPs. Using rs2900180 as the baseline, we found no other SNP that contributed significant association (p-value > 0.05). Thus, all of the association in the region of the TRAF1-C5 genes is accounted for by SNP rs2900180.

In the region of the PTPN22 gene on chromosome 1, we identified one SNP (rs2476601) that was reported to be associated with RA by Begovich et al. [5]. The SNP is located at position 114,089,610 bp on the left-hand side of the PTPN22 gene. The TH = 48.88 of rs2476601 was the largest TH score among all SNPs on chromosome 1 (p-value = 2.72 × 10^-12), and the TG = 49.99 of rs2476601 was the second largest TG score (p-value = 1.4 × 10^-11, data not shown). In this region, only SNP rs2476601 stood out; the other SNPs with the top 20 test scores are not located in the region. Hence, we did not analyze multiple SNPs.

From the results in the candidate regions on chromosomes 6, 9, and 1, we noticed that the highest test scores of TH and TG were from SNPs located very close to the candidate genes HLA-DRB1, TRAF1-C5, and PTPN22, respectively. Therefore, the SNPs with high test scores are of interest for further investigation to identify genes that have a modest effect on RA. In Table 2, we present the SNPs that showed the highest TH scores among all SNPs on each chromosome. We chose to present the results based on the test statistic TH because it is more robust than TG, in the sense of having more stable type I error rates [7]. For comparison, we also present the most significant results from PLINK in Table 2. The SNPs identified by statistic TH are the same as those identified by PLINK, except for rank switches on chromosomes 11 and 16. It is possible that other SNPs with high test scores are worthy of further study. Because of the limited length of this article, we cannot present detailed genome-wide test data here, but we will provide detailed information on request.
Table 2. SNPs and positions of the highest TH scores on each chromosome

ChrSNPPositionTH (p-value)χ2 (p-value)
1rs247660111408961048.88 (2.72 × 10-12)50.62 (1.12 × 10-12)
2rs643330917234365823.37 (1.34 × 10-6)22.88 (1.72 × 10-6)
3rs929045217404523625.01 (5.70 × 10-7)25.04 (5.62 × 10-7)
4rs138802115775704829.12 (6.81 × 10-8)28.97 (7.35 × 10-8)
5rs659614713307567435.82 (2.16 × 10-9)36.21 (1.77 × 10-9)
6rs239517532513004651.76 (9.25 × 10-144)534.10 (3.62 × 10-118)
7rs697882014662980223.69 (1.13 × 10-6)23.54 (1.22 × 10-6)
8rs97851332040289829.44 (5.78 × 10-8)30.20 (3.90 × 10-8)
9rs290018012078593634.21 (4.94 × 10-9)33.76 (6.23 × 10-9)
10rs26716924976782530.61 (3.16 × 10-8)30.94 (2.66 × 10-8)
11rs16935797807787620.72 (5.32 × 10-6)20.80 (5.10 × 10-6)
11rs376813822163319.1 (1.24 × 10-5)21.33 (3.87 × 10-6)
12rs10222325323133223.01 (1.61 × 10-6)22.81 (1.79 × 10-6)
13rs11776374159082320.28 (6.71 × 10-6)19.96 (7.91 × 10-6)
14rs128851669219503525.38 (4.72 × 10-7)25.53 (4.36 × 10-7)
15rs129138322603921322.11 (2.58 × 10-6)23.27 (1.41 × 10-6)
16rs1875206996650823.92 (1.01 × 10-6)23.81 (1.06 × 10-6)
16rs25216692679585423.64 (1.16 × 10-6)24.08 (9.24 × 10-7)
17rs98960527093045723.00 (1.62 × 10-6)23.53 (1.23 × 10-6)
18rs124558942687379326.23 (3.03 × 10-7)25.61 (4.18 × 10-7)
19rs81043093685868626.89 (2.16 × 10-7)26.15 (3.16 × 10-7)
20rs11825315782639734.96 (3.37 × 10-9)33.67 (6.53 × 10-9)
21rs10417782274732322.72 (1.87 × 10-6)23.25 (1.42 × 10-6)
22rs7137564311884728.16 (1.12 × 10-7)31.04 (2.53 × 10-8)

Chr, chromosome. The values of statistic χ2 and the related p-values are from the program PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml).

Discussion

The results of our genome-wide scan provided a large number of SNPs that have high test scores. One reason for this is the large sample size of the NARAC data. For further study, one may start with the regions that contain the SNPs with the highest test scores, i.e., the regions with the strongest signals. The Hotelling's T2 tests do not adjust for population substructure. Thus, some of the strong signals could be false positives. Further study is necessary to clarify these issues.

The Hotelling's T2 test does not include a multiplicity adjustment. However, we can perform a very conservative Bonferroni analysis (assuming independence of the tests) as follows. In the RA study, we analyzed 490,163 SNPs in total across the whole human genome. Therefore, there are 490,163 TH (or TG) tests. For the most significant SNP on chromosome 9 (rs2900180), shown on the right-hand side of Table 1, the p-value of TH = 34.21 is 4.95 × 10^-9. After adjusting for the multiple tests, the probability of getting such a result by chance is 4.95 × 10^-9 × 490,163 = 0.0024. Hence, the result is still very significant. For the least significant SNP (rs10985073), the p-value of TH = 28.16 is 1.12 × 10^-7. After adjusting for the multiple tests, the probability of getting such a result by chance is 1.12 × 10^-7 × 490,163 = 0.055, which is close to the 0.05 significance level. The rest of the results in Tables 1 and 2 can be analyzed similarly.

We compared our results with those in the literature [2-6] and found them to be consistent. In addition, we analyzed the data using PLINK and found results similar to those of Table 1 and Table 2; partial results are presented in Table 2. Hence, our analyses of candidate regions and of the genome-wide scan showed that the Hotelling's tests performed well. Furthermore, multiple SNPs can be used jointly in the analysis, as we did for the data of chromosomes 6 and 9.
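The Bonferroni arithmetic above can be reproduced with a short sketch. The p-values and the number of tests are taken from the Discussion; the function name is hypothetical, and capping the adjusted value at 1 is a common convention added here rather than something stated in the text.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value: raw p-value times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


N_TESTS = 490_163  # number of SNPs quoted in the Discussion

adj_rs2900180 = bonferroni_adjust(4.95e-9, N_TESTS)   # TH = 34.21
adj_rs10985073 = bonferroni_adjust(1.12e-7, N_TESTS)  # TH = 28.16
print(f"rs2900180:  adjusted p = {adj_rs2900180:.4f}")
print(f"rs10985073: adjusted p = {adj_rs10985073:.3f}")
```

Equivalently, a raw p-value would need to fall below 0.05 / 490,163, roughly 1.0 × 10^-7, to remain below the 0.05 level after this adjustment.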

Conclusion

We performed a genome-wide association scan for RA data by applying Hotelling's T2 tests. In the candidate regions of the HLA-DRB1, TRAF1-C5, and PTPN22 genes, we identified SNPs that have the highest test scores across chromosomes 6, 9, and 1, respectively. Given the encouraging results in the candidate gene regions, the regions containing SNPs with high test scores are of interest for further investigation to map genes which have modest effects on RA. We provided the SNPs and their positions that had the largest scores for each chromosome. The regions of these SNPs deserve more investigation to map RA genes.

List of abbreviations used

GAW: Genetic Analysis Workshop; NARAC: North American Rheumatoid Arthritis Consortium; RA: Rheumatoid arthritis; SNP: Single-nucleotide polymorphism

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CIA and RF conceived the main idea of the study. LC, MZ, and WVC performed statistical analysis under the direction of CIA and RF. LC, CIA, and RF wrote the manuscript. MZ and WVC provided comments to improve the writings of the manuscript. All authors read and approved the final manuscript.
  10 in total

1.  Generalized T2 test for genome association studies.

Authors:  Momiao Xiong; Jinying Zhao; Eric Boerwinkle
Journal:  Am J Hum Genet       Date:  2002-03-29       Impact factor: 11.025

2.  Genome association studies of complex diseases by case-control designs.

Authors:  Ruzong Fan; Michael Knapp
Journal:  Am J Hum Genet       Date:  2003-03-19       Impact factor: 11.025

3.  New classification of HLA-DRB1 alleles supports the shared epitope hypothesis of rheumatoid arthritis susceptibility.

Authors:  Sophie Tezenas du Montcel; Laetitia Michou; Elisabeth Petit-Teixeira; José Osorio; Isabelle Lemaire; Sandra Lasbleiz; Céline Pierlot; Patrick Quillet; Thomas Bardin; Bernard Prum; François Cornelis; Françoise Clerget-Darpoux
Journal:  Arthritis Rheum       Date:  2005-04

4.  Genome screens using linkage disequilibrium tests: optimal marker characteristics and feasibility.

Authors:  N H Chapman; E M Wijsman
Journal:  Am J Hum Genet       Date:  1998-12       Impact factor: 11.025

Review 5.  The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis.

Authors:  P K Gregersen; J Silver; R J Winchester
Journal:  Arthritis Rheum       Date:  1987-11

6.  Refining the complex rheumatoid arthritis phenotype based on specificity of the HLA-DRB1 shared epitope for antibodies to citrullinated proteins.

Authors:  Tom W J Huizinga; Christopher I Amos; Annette H M van der Helm-van Mil; Wei Chen; Floris A van Gaalen; Damini Jawaheer; Geziena M T Schreuder; Mark Wener; Ferdinand C Breedveld; Naila Ahmad; Raymond F Lum; Rene R P de Vries; Peter K Gregersen; Rene E M Toes; Lindsey A Criswell
Journal:  Arthritis Rheum       Date:  2005-11

7.  TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study.

Authors:  Robert M Plenge; Mark Seielstad; Leonid Padyukov; Annette T Lee; Elaine F Remmers; Bo Ding; Anthony Liew; Houman Khalili; Alamelu Chandrasekaran; Leela R L Davies; Wentian Li; Adrian K S Tan; Carine Bonnard; Rick T H Ong; Anbupalam Thalamuthu; Sven Pettersson; Chunyu Liu; Chao Tian; Wei V Chen; John P Carulli; Evan M Beckman; David Altshuler; Lars Alfredsson; Lindsey A Criswell; Christopher I Amos; Michael F Seldin; Daniel L Kastner; Lars Klareskog; Peter K Gregersen
Journal:  N Engl J Med       Date:  2007-09-05       Impact factor: 91.245

8.  A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis.

Authors:  Ann B Begovich; Victoria E H Carlton; Lee A Honigberg; Steven J Schrodi; Anand P Chokkalingam; Heather C Alexander; Kristin G Ardlie; Qiqing Huang; Ashley M Smith; Jill M Spoerke; Marion T Conn; Monica Chang; Sheng-Yung P Chang; Randall K Saiki; Joseph J Catanese; Diane U Leong; Veronica E Garcia; Linda B McAllister; Douglas A Jeffery; Annette T Lee; Franak Batliwalla; Elaine Remmers; Lindsey A Criswell; Michael F Seldin; Daniel L Kastner; Christopher I Amos; John J Sninsky; Peter K Gregersen
Journal:  Am J Hum Genet       Date:  2004-06-18       Impact factor: 11.025

9.  The shared epitope hypothesis in rheumatoid arthritis: evaluation of alternative classification criteria in a large UK Caucasian cohort.

Authors:  Ann W Morgan; Lubna Haroon-Rashid; Stephen G Martin; Hock-Chye Gooi; Jane Worthington; Wendy Thomson; Jennifer H Barrett; Paul Emery
Journal:  Arthritis Rheum       Date:  2008-05

Review 10.  A review of the MHC genetics of rheumatoid arthritis.

Authors:  J L Newton; S M J Harney; B P Wordsworth; M A Brown
Journal:  Genes Immun       Date:  2004-05       Impact factor: 2.676
