
Armitage's trend test for genome-wide association analysis: one-sided or two-sided?

Yixin Fang, Yuanjia Wang, Nanshi Sha.

Abstract

The importance of considering confounding due to population stratification in genome-wide association analysis using case-control designs has been a source of debate. Armitage's trend test, together with some other methods developed from it, can correct for population stratification to some extent. However, there is a question of whether the one-sided or the two-sided alternative hypothesis is appropriate or, to put it another way, whether examining both the one-sided and the two-sided alternative hypotheses can give more information. The dataset for Problem 1 of Genetic Analysis Workshop 16 provides us with a chance to address this question. Because it is a part of a combined sample from the North American Rheumatoid Arthritis Consortium (NARAC) and the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA), the results from the combined sample can be used as references. To this aim, the last 10,000 single-nucleotide polymorphisms (SNPs) on chromosome 9, which contain the common genetic variant at the TRAF1-C5 locus, were examined by conducting Armitage's trend tests. Examining the two-sided alternative hypothesis shows that SNPs rs12380341 (p = 9.7 × 10^-11) and rs872863 (p = 1.7 × 10^-15), along with six SNPs across the TRAF1-C5 locus, rs1953126, rs10985073, rs881375, rs3761847, rs10760130, and rs2900180 (p ≈ 1 × 10^-7), are significantly associated with anti-cyclic citrullinated peptide-positive rheumatoid arthritis. But examining the one-sided alternative hypothesis that the minor allele is positively associated with the disease shows that only those six SNPs across the TRAF1-C5 locus are significantly associated with the disease (p ≈ 1 × 10^-8), which is consistent with the results from the combined sample of the NARAC and the EIRA.


Year:  2009        PMID: 20018028      PMCID: PMC2795935          DOI: 10.1186/1753-6561-3-s7-s37

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

The Genetic Analysis Workshop 16 (GAW16) rheumatoid arthritis (RA) dataset is the initial batch of genome-wide association study (GWAS) data for the North American Rheumatoid Arthritis Consortium (NARAC) cases (N1 = 868) and controls (N0 = 1194) after removing duplicated and contaminated samples [1]. The high-throughput genotyping technology [~550 k single-nucleotide polymorphisms (SNPs)] in the NARAC data makes it a challenge to interpret this GWAS. One disadvantage of case-control GWAS is that they are prone to a number of biases, including population stratification [2]. The importance of considering confounding due to population stratification in GWAS using case-control designs [3,4] has been a source of debate. Armitage's trend test can correct for population stratification to some extent [5-7]; some other methods based on Armitage's trend test have also been developed, such as the genomic control approach [8,9]. However, there is still a question as to whether the one-sided or the two-sided alternative hypothesis is appropriate or, to put it another way, whether examining both the one-sided and the two-sided alternative hypotheses can give more information. The dataset for Problem 1 of GAW16 provides us with a chance to address this question. Because it is a part of a combined sample from the NARAC and the Epidemiological Investigation of Rheumatoid Arthritis (EIRA), the results from the combined sample can be used as references. To this aim, the last 10,000 SNPs on chromosome 9, which contain the common genetic variant at the TRAF1-C5 locus, were examined by conducting Armitage's trend tests. Two alternative hypotheses were considered: the two-sided alternative hypothesis that the genotypes at a locus are associated with the disease, and the one-sided alternative hypothesis that the minor allele at a locus is positively associated with the disease. Three types of scores (co-dominant, dominant, and recessive) were chosen to construct the Armitage's trend tests.

Methods

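At any SNP, the data can be summarized in a contingency table as in Table 1, where "M" always denotes the major allele and "m" the minor allele. Scores x0, x1, and x2 are chosen to construct Armitage's trend test. Armitage's trend test statistic is defined as [5,6]

$$\chi^2_A = \frac{N\left[\sum_{i=0}^{2} x_i \left(N_0 n_{1i} - N_1 n_{0i}\right)\right]^2}{N_1 N_0 \left[N \sum_{i=0}^{2} x_i^2 N_{+i} - \left(\sum_{i=0}^{2} x_i N_{+i}\right)^2\right]}. \qquad (1)$$
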
Table 1

Contingency table at any SNP (M is major allele and m is minor allele)

           Genotype
           MM     Mm     mm     Total
Case       n10    n11    n12    N1
Control    n00    n01    n02    N0
Total      N+0    N+1    N+2    N
Score      x0     x1     x2

Under the null hypothesis, the trend test statistic $\chi^2_A$ is approximately distributed as $\chi^2$ with 1 degree of freedom. This test statistic is suitable for the two-sided alternative hypothesis that the genotypes at a SNP are associated with the disease of interest. As discussed in Armitage [5], the choice of scoring system does not affect the validity of the test, but it does affect its power. There are three common choices of scoring system: 1) co-dominant score: x0 = 0, x1 = 1, and x2 = 2; 2) dominant score: x0 = 0, x1 = 1, and x2 = 1; 3) recessive score: x0 = 0, x1 = 0, and x2 = 1. Here, the scoring systems are named with respect to the minor allele "m".

From the rationale of genetic association analysis (see, for example, Risch and Merikangas [10]), it is more informative to look at two one-sided alternative hypotheses: i) the alternative that the minor allele is positively associated with the disease and ii) the alternative that the major allele is positively associated with the disease. Furthermore, because the disease of interest is rare, it is more reasonable to concentrate on the first alternative, although in practice it would be better to consider both alternatives if no prior information is available on which allele is positively associated with the disease. Another reason is that restricting attention to one direction can reduce the false-positive rate. Hereafter, we concentrate on the alternative hypothesis that the minor allele is positively associated with the disease. To this aim, the one-sided test statistic can be defined as

$$Z^A = \frac{\sqrt{N}\,\sum_{i=0}^{2} x_i \left(N_0 n_{1i} - N_1 n_{0i}\right)}{\sqrt{N_1 N_0 \left[N \sum_{i=0}^{2} x_i^2 N_{+i} - \left(\sum_{i=0}^{2} x_i N_{+i}\right)^2\right]}}. \qquad (2)$$

Under the null hypothesis, $Z^A$ is approximately distributed as N(0,1). The same three scoring systems can also be used here. It is shown in Knapp [11] that if the co-dominant scoring system is chosen, then $Z^A = Z/\sqrt{1+F}$, where F is Wright's coefficient of inbreeding and Z is the test statistic that simply compares the frequencies of the minor allele "m" in the case and control groups. Here the value of F automatically corrects for population stratification to some extent.
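
The statistics described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the genotype counts are made up (chosen only so that they sum to N1 = 868 and N0 = 1194), and F is assumed to be estimated from the pooled case-control sample as one minus the ratio of observed to expected heterozygosity.

```python
import math

# Scoring systems from the text, named with respect to the minor allele "m".
SCORES = {
    "co-dominant": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def trend_z(cases, controls, scores):
    """One-sided trend statistic; counts are ordered (MM, Mm, mm)."""
    n1, n0 = sum(cases), sum(controls)
    n = n1 + n0
    totals = [a + b for a, b in zip(cases, controls)]
    numerator = sum(x * (n0 * a - n1 * b)
                    for x, a, b in zip(scores, cases, controls))
    spread = (n * sum(x * x * t for x, t in zip(scores, totals))
              - sum(x * t for x, t in zip(scores, totals)) ** 2)
    return math.sqrt(n) * numerator / math.sqrt(n1 * n0 * spread)


def allelic_z_and_f(cases, controls):
    """Allele-frequency z statistic and pooled inbreeding coefficient F.

    Assumption: F = 1 - observed/expected heterozygosity in the pooled sample.
    """
    n1, n0 = sum(cases), sum(controls)
    n = n1 + n0
    p1 = (cases[1] + 2 * cases[2]) / (2 * n1)        # minor allele freq, cases
    p0 = (controls[1] + 2 * controls[2]) / (2 * n0)  # minor allele freq, controls
    p = (n1 * p1 + n0 * p0) / n                      # pooled frequency
    z = (p1 - p0) / math.sqrt(p * (1 - p) * (1 / (2 * n1) + 1 / (2 * n0)))
    observed_het = (cases[1] + controls[1]) / n
    f = 1 - observed_het / (2 * p * (1 - p))
    return z, f


def p_two_sided(z):
    """P(|N(0,1)| >= |z|), equivalent to the 1-df chi-square test on z**2."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_one_sided(z):
    """P(N(0,1) >= z): minor allele positively associated with disease."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical genotype counts (MM, Mm, mm), not taken from the GAW16 data.
cases = (400, 380, 88)
controls = (640, 470, 84)

for name, scores in SCORES.items():
    z_a = trend_z(cases, controls, scores)
    print(name, round(z_a, 3), p_two_sided(z_a), p_one_sided(z_a))

# Numerical check of the co-dominant relation Z^A = Z / sqrt(1 + F).
z, f = allelic_z_and_f(cases, controls)
z_a1 = trend_z(cases, controls, SCORES["co-dominant"])
print(abs(z_a1 - z / math.sqrt(1 + f)) < 1e-9)
```

Squaring the value returned by `trend_z` gives the two-sided statistic of equation (1), so the same function serves both alternatives; only the p-value conversion differs.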

Results

For simplicity of interpretation, we consider only the last 10,000 SNPs on chromosome 9, which contain the common genetic variant at the TRAF1-C5 locus. The same analysis can be extended to the whole genome of approximately 550,000 SNPs. For the two-sided alternative that the genotypes at a SNP are associated with the disease, Table 2 summarizes the LOD scores (-log10 p) of the test $Z^2$, which simply compares the frequencies of the minor allele in the two groups, and of Armitage's trend tests with the co-dominant, dominant, and recessive scores, together with Wright's coefficient of inbreeding F; only SNPs with LOD > 6 are reported. The SNPs across the TRAF1-C5 locus are marked with asterisks.
Table 2

LOD values for the two-sided alternative

SNP^a          $Z^2$^b   $\chi^2_{A1}$^c   $\chi^2_{A2}$   $\chi^2_{A3}$   F^d
rs4078292       6.14      5.23              0.91^e          6.96            0.1958
rs12380341     11.61     10.01              0.36           13.45            0.1722
rs16929545      7.22      6.71              1.01            7.62            0.0850
*rs1953126      7.56      7.53              5.05            5.44            0.0037^f
*rs10985073     6.98      6.87              5.63            4.19            0.0173
*rs881375       7.64      7.63              4.81            5.71            0.0020
*rs3761847      7.91      7.75              5.92            5.03            0.0230
*rs10760130     7.42      7.30              6.03            4.40            0.0190
*rs2900180      8.21      8.19              5.20            6.09            0.0022
rs872863       15.65     14.78              1.51           15.11            0.0617
rs888229        6.17      5.27              1.09            6.10            0.1914
rs11185665      7.54      6.48              0.32            9.68            0.1817
rs11792145      8.58      6.53              0.04           12.24            0.3488

a Asterisks indicate SNPs located on TRAF1-C5.

b $Z^2$ is the chi-square test comparing the frequencies of the minor allele in the two groups.

c Subscripts A1, A2, and A3 denote test (1) with score systems 1, 2, and 3, respectively.

d F is Wright's coefficient of inbreeding.

e Bold font indicates that $\chi^2_{A2}$ is significantly smaller than $\chi^2_{A1}$ and $\chi^2_{A3}$.

f Italic font indicates that the F value is smaller than 0.03.
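
As a rough consistency check on Table 2, the reported LOD values can be converted back to 1-df chi-square statistics and the ratio of the allelic statistic to the co-dominant trend statistic compared with 1 + F, which is what the Knapp [11] relation implies for the two-sided statistics. The sketch below uses only the Python standard library; the helper names are hypothetical, and because the LOD values are rounded to two decimals the agreement can only be approximate.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_from_lod(lod, upper=400.0):
    """Invert LOD = -log10(p) to a 1-df chi-square value by bisection."""
    target = 10.0 ** (-lod)
    lower = 0.0
    for _ in range(200):
        mid = (lower + upper) / 2.0
        if chi2_sf_1df(mid) > target:
            lower = mid   # p still too large: need a bigger statistic
        else:
            upper = mid
    return (lower + upper) / 2.0


# (SNP, LOD of allelic test, LOD of co-dominant trend test, F) from Table 2.
rows = [
    ("rs12380341", 11.61, 10.01, 0.1722),
    ("rs1953126", 7.56, 7.53, 0.0037),
    ("rs11792145", 8.58, 6.53, 0.3488),
]

for snp, lod_z2, lod_a1, f in rows:
    ratio = chi2_from_lod(lod_z2) / chi2_from_lod(lod_a1)
    print(snp, round(ratio, 3), round(1 + f, 3))
```

A large F therefore translates directly into a large gap between the second and third columns, while the asterisked SNPs, with F close to zero, show almost no gap.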

In Table 2, the six SNPs marked with asterisks have small F (<0.03), which explains why their $\chi^2_{A1}$ values in the third column, which correct for population stratification, are almost the same as $Z^2$ in the second column. Also, for these six SNPs, $\chi^2_{A1}$ is a bit more significant than $\chi^2_{A2}$ and $\chi^2_{A3}$, and the latter two are close to each other, which means that these SNPs are very likely co-dominant. For the other seven SNPs, $\chi^2_{A3}$ is a bit more significant than $\chi^2_{A1}$, but $\chi^2_{A2}$ is not significant at all. This shows that these SNPs are very likely recessive. Another thing learned from Table 2 is that two SNPs, rs12380341 and rs872863, have extremely large LOD scores for $Z^2$, $\chi^2_{A1}$, and $\chi^2_{A3}$, but surprisingly they were not reported by Plenge et al. [1], whose analysis was based on the combined sample from the NARAC and the EIRA. Are these two SNPs truly associated with the disease, or are they just false positives?

Table 3 summarizes the LOD values for the one-sided alternative that the minor allele at a SNP is positively associated with the disease. Similarly, $Z^{A1}$ is the statistic $Z^A$ with co-dominant score, $Z^{A2}$ with dominant score, and $Z^{A3}$ with recessive score. From Table 3, only the six SNPs marked with asterisks are significant for the one-sided alternative that the minor allele is positively associated with the disease. These results are completely consistent with the ones in Plenge et al. [1]. When the other one-sided alternative is considered, that the major allele is positively associated with the disease, the other seven SNPs are significant. Therefore, as discussed in the preceding section, and particularly for this dataset, it seems more reasonable to consider the one-sided alternative that the minor allele is positively associated with the disease.
Table 3

LOD values for the one-sided alternative of the minor allele

SNP^a          $Z$^b    $Z^{A1}$^c   $Z^{A2}$   $Z^{A3}$
rs4078292      0.00     0.00         0.03       0.00
rs12380341     0.00     0.00         0.11       0.00
rs16929545     0.00     0.00         0.02       0.00
*rs1953126     7.86     7.83         5.35       5.74
*rs10985073    7.28     7.17         5.93       4.49
*rs881375      7.95     7.93         5.12       6.01
*rs3761847     8.21     8.05         6.22       5.33
*rs10760130    7.72     7.60         6.33       4.70
*rs2900180     8.51     8.49         5.50       6.39
rs872863       0.00     0.00         0.01       0.00
rs888229       0.00     0.00         0.02       0.00
rs11185665     0.00     0.00         0.63       0.00
rs11792145     0.00     0.00         0.27       0.00

a Asterisks indicate SNPs located on TRAF1-C5.

b $Z$ is the z-test comparing the frequencies of the minor allele in the two groups.

c Superscripts A1, A2, and A3 denote test (2) with score systems 1, 2, and 3, respectively.
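
The way the two-sided LOD values in Table 2 map onto the one-sided values in Table 3 can be illustrated with a small sketch. It assumes the usual relation between the tail probabilities of a standard normal statistic: when the effect lies in the hypothesized direction the one-sided p-value is half the two-sided one, and otherwise it is one minus that half. The function name and the boolean direction flag are hypothetical; Table 2 itself does not report the direction of effect.

```python
import math


def one_sided_lod(two_sided_lod, minor_allele_enriched_in_cases):
    """Convert a two-sided LOD (-log10 p) into the one-sided LOD for the
    alternative that the minor allele is positively associated with disease."""
    half_p = 10.0 ** (-two_sided_lod) / 2.0
    if minor_allele_enriched_in_cases:
        return -math.log10(half_p)
    # p = 1 - half_p; log1p keeps precision when half_p is tiny.
    return -math.log1p(-half_p) / math.log(10.0)


# Co-dominant trend test for rs1953126 (Table 2: 7.53, Table 3: 7.83).
print(round(one_sided_lod(7.53, True), 2))

# Co-dominant trend test for rs12380341 (Table 2: 10.01, Table 3: 0.00),
# assuming its minor allele is depleted rather than enriched in cases.
print(round(one_sided_lod(10.01, False), 2))
```

In the favorable direction the LOD simply gains log10(2), about 0.30, which matches the offset between the asterisked rows of the two tables; in the opposite direction the LOD collapses to essentially zero.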


Discussion

The question of whether the two-sided alternative or the one-sided alternatives should be considered is intractable, but this manuscript attempts to raise the question and address it to some extent. Table 3 shows that if we concentrate on the one-sided alternative that the minor allele is positively associated with the disease, we get exactly the same results as Plenge et al. [1]. For rare diseases, we have reason to believe that the alleles positively associated with them have low frequencies in the general population. Based on this belief (or alternative hypothesis), it seems that the SNPs without asterisks are false positives under the two-sided alternative. But if we are not willing to assume that the minor allele is the one positively associated with the disease and do not want to miss any SNPs related to the disease, we had better consider the two-sided alternative.

Conclusion

More information can be gained from GWAS by using multiple scoring systems in Armitage's trend tests and examining both the one-sided and the two-sided alternative hypotheses.

List of abbreviations used

EIRA: Epidemiological Investigation of Rheumatoid Arthritis; GAW16: Genetic Analysis Workshop 16; GWAS: Genome-wide association study; NARAC: North American Rheumatoid Arthritis Consortium; RA: Rheumatoid arthritis; SNP: Single-nucleotide polymorphism(s).

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YF conceived the idea, performed the statistical analysis, and drafted the manuscript. YW and NS helped to perform the statistical analysis and draft the manuscript. All authors read and approved the final manuscript.

References

Review 1.  Detecting association in a case-control study while correcting for population stratification.

Authors:  D E Reich; D B Goldstein
Journal:  Genet Epidemiol       Date:  2001-01       Impact factor: 2.135

2.  Biased tests of association: comparisons of allele frequencies when departing from Hardy-Weinberg proportions.

Authors:  D J Schaid; S J Jacobsen
Journal:  Am J Epidemiol       Date:  1999-04-15       Impact factor: 4.897

3.  Re: "Biased tests of association: comparisons of allele frequencies when departing from Hardy-Weinberg proportions".

Authors:  M Knapp
Journal:  Am J Epidemiol       Date:  2001-08-01       Impact factor: 4.897

4.  Genomic control for association studies.

Authors:  B Devlin; K Roeder
Journal:  Biometrics       Date:  1999-12       Impact factor: 2.571

5.  Point: population stratification: a problem for case-control studies of candidate-gene associations?

Authors:  Duncan C Thomas; John S Witte
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2002-06       Impact factor: 4.254

6.  Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer.

Authors:  Sholom Wacholder; Nathaniel Rothman; Neil Caporaso
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2002-06       Impact factor: 4.254

7.  How to interpret a genome-wide association study.

Authors:  Thomas A Pearson; Teri A Manolio
Journal:  JAMA       Date:  2008-03-19       Impact factor: 56.272

8.  From genotypes to genes: doubling the sample size.

Authors:  P D Sasieni
Journal:  Biometrics       Date:  1997-12       Impact factor: 2.571

9.  The future of genetic studies of complex human diseases.

Authors:  N Risch; K Merikangas
Journal:  Science       Date:  1996-09-13       Impact factor: 47.728

10.  TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study.

Authors:  Robert M Plenge; Mark Seielstad; Leonid Padyukov; Annette T Lee; Elaine F Remmers; Bo Ding; Anthony Liew; Houman Khalili; Alamelu Chandrasekaran; Leela R L Davies; Wentian Li; Adrian K S Tan; Carine Bonnard; Rick T H Ong; Anbupalam Thalamuthu; Sven Pettersson; Chunyu Liu; Chao Tian; Wei V Chen; John P Carulli; Evan M Beckman; David Altshuler; Lars Alfredsson; Lindsey A Criswell; Christopher I Amos; Michael F Seldin; Daniel L Kastner; Lars Klareskog; Peter K Gregersen
Journal:  N Engl J Med       Date:  2007-09-05       Impact factor: 91.245
