On the association between rheumatoid arthritis and classical HLA class I and class II alleles predicted from single-nucleotide polymorphism data.

Mathieu Lemire

Abstract

Using single-nucleotide polymorphisms (SNPs), we sought to predict classical class I and class II human leukocyte antigen (HLA) alleles, and test for their associations with rheumatoid arthritis (RA) in the North American Rheumatoid Arthritis Consortium sample of cases and controls, genotyped on the Illumina HumanHap550 BeadChip. We use publicly available databases of SNP data and HLA data to find SNPs or SNP-haplotypes to be used as surrogates for each HLA allele. To reduce the confounding effects of linkage disequilibrium with the HLA-DRB1 locus, we tested for the association conditional on the presence or absence of a shared epitope allele on the same haplotype as the target HLA allele. Using SNP surrogates, we find that components of the DQ8 serotype (DQA1*0301:DQB1*0302) are associated with RA, irrespective of the presence or absence of a shared epitope allele on their respective haplotypes. Knowledge of the haplotype structure in the HLA region is still necessary for better interpretation of the results.

Year:  2009        PMID: 20018024      PMCID: PMC2795931          DOI: 10.1186/1753-6561-3-s7-s33

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

The human leukocyte antigen (HLA)-DRB1 locus has long been recognized as a strong genetic risk factor for rheumatoid arthritis (RA), yet it explains less than half of the estimated genetic susceptibility to the disease [1]. Large-scale studies that interrogate the whole genome have uncovered, at strict significance thresholds, genetic risk variants outside the major histocompatibility complex (MHC) [2-4], while also replicating known associations from candidate gene studies [5]. The difficulty in evaluating the role of other candidate loci from the MHC region in the etiology of the disease resides in the strong linkage disequilibrium (LD) and the extended haplotype structure that exists in this highly polymorphic portion of the genome. We seek to evaluate the risk of other HLA loci, by using SNP data and publicly available HLA data, in order to predict and evaluate the effect of classical class I and class II HLA alleles in the North American Rheumatoid Arthritis Consortium (NARAC) case-control dataset [3], as distributed for use at the Genetic Analysis Workshop 16.

Methods

The NARAC sample consists of 868 cases of RA and 1194 controls, all genotyped on the Illumina HumanHap550 BeadChip, or equivalent. The sample is fully described by Plenge et al. [3]. The sample also includes HLA-DRB1 alleles, typed at various resolutions. The susceptibility alleles at DRB1 tend to share the RAA motif at positions 72-74 of the amino acid sequence, an observation that led to the hypothesis of a functional unit, called the shared epitope (SE) [6]. Amino acids found at positions 70-71 provide further refinement of the classification of DRB1 risk alleles [7].
To predict classical HLA alleles from SNP data, we followed the methods described by de Bakker et al. [8]. They typed six class I and class II HLA genes (A, B, C, DQA1, DQB1, and DRB1) in a set of samples that includes the CEU samples from the HapMap (Utah residents with ancestry from northern and western Europe). Most of the HLA alleles they report are at a resolution of four digits. We used this publicly available dataset, combined with SNP genotype data from the HapMap that are in the broad MHC region (chr6:25,990,507-33,893,423 [hg18]) and that overlap with the set of SNPs on the Illumina HumanHap550 BeadChip. Using the CEU HapMap data combined with the CEU HLA data from de Bakker et al. [8], we searched for tags for each of the HLA alleles, considering haplotypes of up to three SNPs as potential predictors. The best predictor was chosen based on the largest observed r² measure of LD (where, for each target HLA allele, we merged all other alleles at that locus into a single one, to mimic a biallelic locus; the same was done for multi-SNP haplotypes). To be considered a potential predictor of HLA alleles, a SNP had to be in Hardy-Weinberg equilibrium (p > 0.00001) in the set of controls from the NARAC dataset, and had to have a call rate above 95% over all samples. We used the program Tagger [9] as implemented in the computer program Haploview [10] to predict the HLA alleles from the HapMap SNP data.
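The selection criterion described above (collapse each locus to a biallelic indicator, then keep the SNP haplotype with the largest r²) can be illustrated with a minimal sketch. It assumes phased chromosomes are available as parallel lists; the function names and the toy data are hypothetical, and this is not the Tagger implementation.

```python
def collapse(values, target):
    """Recode a multi-allelic locus as biallelic: 1 for the target, 0 for all others."""
    return [1 if v == target else 0 for v in values]


def r_squared(x, y):
    """r^2 measure of LD between two 0/1 indicators scored on the same phased chromosomes."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("indicators must be non-empty and of equal length")
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        return 0.0  # a monomorphic indicator carries no LD information
    d = p_xy - p_x * p_y  # classical disequilibrium coefficient D
    return d * d / denom


def best_predictor(hla_alleles, snp_haplotypes, target):
    """Return (haplotype, r2) for the SNP haplotype with the largest r^2 to the target allele."""
    x = collapse(hla_alleles, target)
    best = None
    for hap in sorted(set(snp_haplotypes)):  # sorted for a deterministic scan order
        r2 = r_squared(x, collapse(snp_haplotypes, hap))
        if best is None or r2 > best[1]:
            best = (hap, r2)
    return best


if __name__ == "__main__":
    # Toy phased data (hypothetical): one HLA allele and one two-SNP haplotype per chromosome.
    hla = ["0301", "0301", "0101", "0501", "0301", "0101", "0501", "0501"]
    haps = ["GA", "GA", "GC", "AC", "GA", "GC", "AC", "AC"]
    print(best_predictor(hla, haps, "0301"))
```

In the toy data the haplotype "GA" occurs on exactly the chromosomes that carry the target allele, so it is a perfect surrogate; real predictors are usually imperfect, which is why the r² threshold used in Results matters.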
We tested for the association between RA and the class I and class II (non-DRB1) HLA alleles, using the SNP predictors as surrogates. Because the DRB1 locus is a strong risk factor for RA, we reduced the confounding effects of LD by performing the analysis conditional on whether each of the two alleles found at the DRB1 locus is a member of the SE class of alleles, considering this conditioning argument as if it were a biallelic locus. We used the computer program UNPHASED [11] to perform the conditional tests of association. For each target HLA allele, we report two conditional odds ratios (ORs): the ORs for the target HLA allele given the presence (SE+) or absence (SE-) of an SE allele on its haplotype. Among the four-digit alleles that are classified as SE+ (according to the classification of du Montcel et al. [7]), only DRB1*0101, *0102, *0401, *0404, *0405, *0408, and *1001 were actually observed in the NARAC samples.
The NARAC sample is affected by population substructure, with chi-square statistics reported to be inflated on average by a factor of ~1.4 [3]. To account for the hidden ancestry of all cases and controls, we computed the spectral decomposition of a covariance matrix between all DNA samples and used its eigenvectors as surrogates for ancestry [12]. The covariance matrix was calculated using a set of ~120,000 autosomal SNPs that are at most modestly correlated (pairwise r² < 0.30), a set that excludes SNPs on the short arm of chromosome 6 and on the short arm of chromosome 8 (for reasons explained by Plenge et al. [3]). As in Plenge et al. [3], we found seven outliers by inspecting the eigenvectors associated with the top 10 eigenvalues: their respective entries in at least one eigenvector differed from the mean by more than six standard deviations. We removed these seven outliers from any downstream analyses, and recomputed the eigenvectors. As in Plenge et al. [3], the top three vectors that are statistically significant predictors of case-control status were used as surrogates for the hidden ancestry of all samples, and were used to correct for the effects of population stratification. With these vectors used as covariates in a logistic regression framework, the inflation factor of all association results, excluding results on the short arm of chromosome 6, was calculated to be 1.035. This value is similar to what has been calculated by Plenge et al. [3]. We used these three vectors as potential confounders in UNPHASED.
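Two of the quality-control steps above lend themselves to a short illustration: flagging samples whose eigenvector entries lie more than six standard deviations from the mean, and summarizing inflation of the chi-square statistics. The sketch below makes two assumptions that the text does not confirm: the eigenvectors are already computed and passed in as plain lists, and the inflation factor is taken as the median observed statistic divided by the median of a chi-square distribution with one degree of freedom (the usual genomic-control definition). Function names are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def flag_outliers(eigenvectors, n_sd=6.0):
    """Return sorted sample indices whose entry in at least one eigenvector
    differs from that eigenvector's mean by more than n_sd standard deviations.

    eigenvectors: list of lists, one inner list per eigenvector, one entry per sample.
    """
    flagged = set()
    for vec in eigenvectors:
        mean = statistics.fmean(vec)
        sd = statistics.pstdev(vec)
        if sd == 0:
            continue  # a constant vector cannot single out any sample
        for i, value in enumerate(vec):
            if abs(value - mean) > n_sd * sd:
                flagged.add(i)
    return sorted(flagged)


def inflation_factor(chi2_stats):
    """Median-based inflation factor (assumed definition, see lead-in)."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Toy eigenvector (hypothetical): 99 samples at 0 and one extreme sample.
    vec = [0.0] * 99 + [1.0]
    print(flag_outliers([vec]))
    # Toy statistics (hypothetical) inflated by a constant factor of 1.4.
    print(inflation_factor([0.4549364 * 1.4] * 7))
```

After removing flagged samples, the eigenvectors would be recomputed, as described in the text, before being used as covariates.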

Results

We only report the results of the conditional tests of association for those HLA alleles that can be predicted from the set of SNPs described in Methods at an r² > 0.80 (47 out of the 70 non-DRB1 HLA alleles, or 67%), and that moreover show conditional association at the level p < 0.001. Table 1 shows, for each HLA allele, its frequency as estimated from the data from de Bakker et al. [8], the SNP or the combination of SNPs that can be used to predict the HLA allele, along with the predictor allele or haplotype, and the strength of the prediction in terms of the r² measure of LD. It also shows the results of the conditional tests of association, including the conditional ORs and their confidence intervals.
Table 1

Conditional tests of association between RA and classical HLA alleles through SNP surrogates

| HLA allele | Frequency (a) | Surrogate (allele/haplotype) (b) | r² (c) | p-value (d) | SE- (e) OR | SE- CI-low | SE- CI-high | SE+ (f) OR | SE+ CI-low | SE+ CI-high |
|---|---|---|---|---|---|---|---|---|---|---|
| DQA*0301 | 0.27 | rs660895 (G) | 0.96 | 2.13 × 10^-12 | 2.11 | 1.38 | 3.24 | 2.11 | 1.65 | 2.71 |
| DQB*0501 | 0.11 | rs17533090, rs9275406, rs9275439 (AAG) | 1.00 | 4.30 × 10^-11 | 0.33 | 0.04 | 2.61 | 0.43 | 0.33 | 0.55 |
| DQA*0101 | 0.13 | rs9268832, rs2395185, rs7774434 (GCG) | 0.80 | 5.12 × 10^-8 | 0.58 | 0.20 | 1.68 | 0.48 | 0.37 | 0.62 |
| B*0801 | 0.16 | rs3134792 (C) | 1.00 | 1.86 × 10^-4 | 1.54 | 1.03 | 2.30 | 3.03 | 1.27 | 7.23 |
| DQB*0302 | 0.19 | rs9275312 (G) | 0.94 | 1.96 × 10^-4 | 2.23 | 1.28 | 3.89 | 1.38 | 1.07 | 1.79 |
| C*0401 | 0.08 | rs9264904 (A) | 1.00 | 1.99 × 10^-4 | 0.85 | 0.56 | 1.30 | 0.59 | 0.44 | 0.79 |

(a) Frequency estimated from the data of de Bakker et al. [8]

(b) Alleles or haplotypes used as surrogate for the HLA alleles, with corresponding SNPs

(c) r² measure of linkage disequilibrium between the HLA allele and the surrogate

(d) p-value for the two degrees of freedom conditional test of association

(e) Odds ratio and 95% confidence interval for HLA alleles on non-SE allele bearing haplotypes

(f) Odds ratio and 95% confidence interval for HLA alleles on SE allele bearing haplotypes

We find that two class II alleles, DQA1*0301 and DQB1*0302, and one class I allele, B*0801, show significant association with RA irrespective of the presence or absence, on their respective haplotypes, of an SE allele at the DRB1 locus. For DQA1*0301, both conditional ORs are estimated to take the same value, 2.11 (p = 2 × 10^-12). For DQB1*0302, the risk is higher when not combined with an SE allele (2.23 versus 1.38, p = 0.0002). The alpha and beta chains DQA1*0301/DQB1*0302 together form the DQ8 serotype [13]. That they are co-associated is thus not surprising. DQ8 has been shown to be associated with RA in humans [14], but this association was thought to be due to LD with DRB1*0401 and *0405 (two SE alleles). Our results are indicative of DQ8 being a risk factor independent of the risk alleles, or non-risk alleles, found at DRB1 (but see Discussion). For the class I allele B*0801, the OR is 1.54 when its haplotype does not contain an SE allele, while it is 3.03 otherwise (p = 0.00018). B*0801 is found on the ancestral 8.1 haplotype, which has been shown to carry risk for RA as well as DRB1*03, a non-SE allele [15]. All other HLA alleles from Table 1 show significant decrease in risk only when combined with an SE allele (see Discussion).
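The pattern described in this paragraph can be read directly off the confidence intervals in Table 1. The sketch below classifies each stratum by whether its 95% confidence interval excludes 1. The OR and CI values are copied from Table 1; the labels and function names are illustrative assumptions, and this per-stratum reading is not the two-degrees-of-freedom conditional test that produced the reported p-values.

```python
# (OR, CI-low, CI-high) per stratum, copied from Table 1.
TABLE1 = {
    "DQA*0301": {"SE-": (2.11, 1.38, 3.24), "SE+": (2.11, 1.65, 2.71)},
    "DQB*0501": {"SE-": (0.33, 0.04, 2.61), "SE+": (0.43, 0.33, 0.55)},
    "DQA*0101": {"SE-": (0.58, 0.20, 1.68), "SE+": (0.48, 0.37, 0.62)},
    "B*0801": {"SE-": (1.54, 1.03, 2.30), "SE+": (3.03, 1.27, 7.23)},
    "DQB*0302": {"SE-": (2.23, 1.28, 3.89), "SE+": (1.38, 1.07, 1.79)},
    "C*0401": {"SE-": (0.85, 0.56, 1.30), "SE+": (0.59, 0.44, 0.79)},
}


def direction(ci_low, ci_high):
    """Label a stratum by whether its 95% CI lies entirely above or below OR = 1."""
    if ci_low > 1.0:
        return "risk"
    if ci_high < 1.0:
        return "protective"
    return "not significant"


def summarize(table):
    """Map each allele to a {stratum: label} dictionary."""
    return {
        allele: {stratum: direction(lo, hi) for stratum, (_, lo, hi) in strata.items()}
        for allele, strata in table.items()
    }


if __name__ == "__main__":
    for allele, labels in summarize(TABLE1).items():
        print(allele, labels)
```

Read this way, DQA*0301, DQB*0302, and B*0801 have intervals above 1 in both strata, while DQB*0501, DQA*0101, and C*0401 have intervals below 1 only on SE-bearing haplotypes, in line with the text.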

Discussion

Conditioning on the presence or absence of SE alleles on the same haplotype as the test allele at other HLA loci helps reduce the confounding effects of LD with the DRB1 locus, but since different DRB1 alleles, or combinations thereof, show a wide spectrum of risks, this conditioning argument is not sufficient on its own to fully account for DRB1. Knowledge of the haplotype structure in the MHC region is still necessary for a better interpretation of the results. For instance, the apparent protection that seems to be conferred by DQB1*0501 or DQA1*0101 (Table 1) is a mere reflection of the fact that these two alleles are in LD with DR1/DR10 [16], which, although risk factors for RA, are not the most prominent ones [14]. Moreover, the classical HLA alleles that we report in the present study are only predicted from the SNP data at hand, sometimes imperfectly, and based on only a small sample (in our case, the HapMap CEU samples). Thus, it is still unclear whether the associations seen between DQA1*0301/DQB1*0302 (the DQ8 serotype) and RA truly reflect risks that are independent of DRB1, or rather are artifacts of the measurement errors inherent to any tagging procedure. In terms of the power to detect disease-associated HLA alleles, a penalty is incurred when using SNPs or SNP-haplotypes as surrogates for them, because the sample size required to achieve a given power is inversely proportional to the r² measure of LD between them [17]. Yet, as a proof-of-concept and justification for the more expensive typing of HLA alleles at high resolution, using SNP data and publicly available databases of HLA data to predict classical class I and class II alleles is an efficient method for preliminary evaluation of the role of HLA genes in the etiology of autoimmune, infectious or other relevant diseases.
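The power penalty mentioned above follows from the inverse proportionality between required sample size and r². As a minimal sketch, if n samples would suffice when the HLA allele is typed directly, roughly n / r² samples are needed when testing its surrogate. The baseline of 1000 samples below is a hypothetical figure chosen for illustration; the r² values are those reported in Table 1.

```python
def required_sample_size(n_direct, r2):
    """Approximate sample size needed when testing a surrogate with LD r2
    to the causal allele, given that n_direct suffices for direct typing."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    return n_direct / r2


if __name__ == "__main__":
    n_direct = 1000  # hypothetical baseline, not a figure from the study
    for r2 in (1.00, 0.96, 0.94, 0.80):  # r2 values reported in Table 1
        print(r2, round(required_sample_size(n_direct, r2)))
```

At the inclusion threshold of r² = 0.80, the required sample size is about 25% larger than with direct typing.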

List of abbreviations used

HLA: Human leukocyte antigen; LD: Linkage disequilibrium; MHC: Major histocompatibility complex; NARAC: North American Rheumatoid Arthritis Consortium; OR: Odds ratio; RA: Rheumatoid arthritis; SE: Shared epitope; SNP: Single-nucleotide polymorphism

Competing interests

The author declares that he has no competing interests.

References

Review 1.  Do the HLA-DQ and DP genes play a role in rheumatoid arthritis?

Authors:  A Perdriger
Journal:  Joint Bone Spine       Date:  2001-02       Impact factor: 4.929

2.  Efficiency and power in genetic association studies.

Authors:  Paul I W de Bakker; Roman Yelensky; Itsik Pe'er; Stacey B Gabriel; Mark J Daly; David Altshuler
Journal:  Nat Genet       Date:  2005-10-23       Impact factor: 38.330

Review 3.  Genome-wide association studies: theoretical and practical concerns.

Authors:  William Y S Wang; Bryan J Barratt; David G Clayton; John A Todd
Journal:  Nat Rev Genet       Date:  2005-02       Impact factor: 53.242

4.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

5.  Two independent alleles at 6q23 associated with risk of rheumatoid arthritis.

Authors:  Robert M Plenge; Chris Cotsapas; Leela Davies; Alkes L Price; Paul I W de Bakker; Julian Maller; Itsik Pe'er; Noel P Burtt; Brendan Blumenstiel; Matt DeFelice; Melissa Parkin; Rachel Barry; Wendy Winslow; Claire Healy; Robert R Graham; Benjamin M Neale; Elena Izmailova; Ronenn Roubenoff; Alexander N Parker; Roberta Glass; Elizabeth W Karlson; Nancy Maher; David A Hafler; David M Lee; Michael F Seldin; Elaine F Remmers; Annette T Lee; Leonid Padyukov; Lars Alfredsson; Jonathan Coblyn; Michael E Weinblatt; Stacey B Gabriel; Shaun Purcell; Lars Klareskog; Peter K Gregersen; Nancy A Shadick; Mark J Daly; David Altshuler
Journal:  Nat Genet       Date:  2007-11-04       Impact factor: 38.330

6.  New classification of HLA-DRB1 alleles supports the shared epitope hypothesis of rheumatoid arthritis susceptibility.

Authors:  Sophie Tezenas du Montcel; Laetitia Michou; Elisabeth Petit-Teixeira; José Osorio; Isabelle Lemaire; Sandra Lasbleiz; Céline Pierlot; Patrick Quillet; Thomas Bardin; Bernard Prum; François Cornelis; Françoise Clerget-Darpoux
Journal:  Arthritis Rheum       Date:  2005-04

Review 7.  The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis.

Authors:  P K Gregersen; J Silver; R J Winchester
Journal:  Arthritis Rheum       Date:  1987-11

8.  TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study.

Authors:  Robert M Plenge; Mark Seielstad; Leonid Padyukov; Annette T Lee; Elaine F Remmers; Bo Ding; Anthony Liew; Houman Khalili; Alamelu Chandrasekaran; Leela R L Davies; Wentian Li; Adrian K S Tan; Carine Bonnard; Rick T H Ong; Anbupalam Thalamuthu; Sven Pettersson; Chunyu Liu; Chao Tian; Wei V Chen; John P Carulli; Evan M Beckman; David Altshuler; Lars Alfredsson; Lindsey A Criswell; Christopher I Amos; Michael F Seldin; Daniel L Kastner; Lars Klareskog; Peter K Gregersen
Journal:  N Engl J Med       Date:  2007-09-05       Impact factor: 91.245

9.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

Authors: 
Journal:  Nature       Date:  2007-06-07       Impact factor: 49.962

10.  A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC.

Authors:  Paul I W de Bakker; Gil McVean; Pardis C Sabeti; Marcos M Miretti; Todd Green; Jonathan Marchini; Xiayi Ke; Alienke J Monsuur; Pamela Whittaker; Marcos Delgado; Jonathan Morrison; Angela Richardson; Emily C Walsh; Xiaojiang Gao; Luana Galver; John Hart; David A Hafler; Margaret Pericak-Vance; John A Todd; Mark J Daly; John Trowsdale; Cisca Wijmenga; Tim J Vyse; Stephan Beck; Sarah Shaw Murray; Mary Carrington; Simon Gregory; Panos Deloukas; John D Rioux
Journal:  Nat Genet       Date:  2006-09-24       Impact factor: 38.330
