Geraldine M Clarke, Manuel A Rivas, Andrew P Morris.
Abstract
Multiple rare variants either within or across genes have been hypothesised to collectively influence complex human traits. The increasing availability of high-throughput sequencing technologies offers the opportunity to study the effect of rare variants on these traits. However, appropriate and computationally efficient analytical methods are required to account for collections of rare variants that display a combination of protective, deleterious and null effects on the trait. We have developed a novel method for the analysis of rare genetic variation in a gene, region or pathway that, by simply aggregating summary statistics at each variant, can: (i) test for the presence of a mixture of effects on a trait; (ii) be applied to both binary and quantitative traits in population-based and family-based data; (iii) adjust for covariates to allow for non-genetic risk factors; and (iv) incorporate imputed genetic variation. In addition, for preliminary identification of promising genes, the method can be applied to association summary statistics, available from meta-analysis of published data, for example, without the need for individual-level genotype data. Through simulation, we show that our method is immune to the presence of bi-directional effects, with no apparent loss in power across a range of different mixtures, and can achieve greater power than existing approaches as long as summary statistics at each variant are robust. We apply our method to investigate association of type-1 diabetes with imputed rare variants within genes in the major histocompatibility complex using genotype data from the Wellcome Trust Case Control Consortium.
Year: 2013 PMID: 23966874 PMCID: PMC3744430 DOI: 10.1371/journal.pgen.1003694
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
The C-alpha test statistic Z has a standard Gaussian distribution under the null hypothesis of no association; the null is rejected when Z exceeds the critical value of a one-tailed test of size α. The Binomial C-alpha test cannot adjust for covariates and cannot be directly applied to imputed data.
Null simulations.
| Minimum MAF (%) | Mean no. of variants in region | Type I error rate at significance level (95% confidence interval) | | | |
| | | 1×10⁻⁵ | 1×10⁻⁴ | 1×10⁻³ | 1×10⁻² |
| Quantitative trait | | | | | |
| 0.2 | 34 | <0.00001 (0.00000–0.00003) | 0.00008 (0.00001–0.00016) | 0.00081 (0.00059–0.00104) | 0.00947 (0.00870–0.01025) |
| 0.5 | 15 | 0.00001 (0.00000–0.00003) | 0.00009 (0.00002–0.00016) | 0.00102 (0.00079–0.00126) | 0.00993 (0.00919–0.01067) |
| Dichotomised quantitative trait | | | | | |
| 0.2 | 34 | <0.00001 (0.00000–0.00003) | 0.00010 (0.00002–0.00018) | 0.00091 (0.00067–0.00115) | 0.00952 (0.00875–0.01030) |
| 0.5 | 15 | 0.00001 (0.00000–0.00003) | 0.00013 (0.00005–0.00021) | 0.00115 (0.00090–0.00141) | 0.01061 (0.00984–0.01137) |
Observed type I errors at selected significance levels for the Generalised C-alpha test for association with a quantitative trait and a dichotomised version of a quantitative trait in a 50 kb region where the rare variants tested do not account for any of the trait variance. Tests only consider variants in the region with a maximum MAF of 1% and a minimum MAF as indicated in the table. Type I error is estimated over 100,000 replicates of data for a sample of size 2,000. Significance in each replicate of data is assessed empirically by random permutation of the quantitative trait value and recalculation of the test statistic 1,000 times as described in Text S1.
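The empirical assessment described in the legend above (permute the trait, recalculate the statistic, count exceedances) can be sketched as follows. This is a minimal illustration only: `toy_statistic` is a hypothetical stand-in and is not the Generalised C-alpha statistic, the add-one p-value convention is an assumption, and the exact procedure in Text S1 may differ.

```python
import random

def toy_statistic(genotypes, trait):
    """Hypothetical stand-in statistic (NOT the Generalised C-alpha test):
    sum over variants of the squared score between allele counts and the
    mean-centred trait, so effects in either direction add up."""
    mean_y = sum(trait) / len(trait)
    total = 0.0
    for variant in genotypes:  # one list of allele counts per variant
        score = sum(g * (y - mean_y) for g, y in zip(variant, trait))
        total += score * score
    return total

def permutation_p_value(genotypes, trait, n_perm=1000, seed=0):
    """Empirical p-value: shuffle the trait values across individuals,
    recompute the statistic, and count how often it reaches the observed one."""
    rng = random.Random(seed)
    observed = toy_statistic(genotypes, trait)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if toy_statistic(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one convention (an assumption here) keeps the estimate above zero.
    return (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy data: one variant carried by exactly the individuals with trait value 1.
    genotypes = [[0] * 50 + [1] * 50]
    trait = [0.0] * 50 + [1.0] * 50
    print(permutation_p_value(genotypes, trait, n_perm=200))
```

Under this convention the smallest attainable p-value is 1/(n_perm + 1), so the number of permutations bounds how small a significance level can be resolved.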
Figure 1. Power comparisons.
Power to detect association in a region is shown for the Generalised C-alpha test, SKAT-O and the GRANVIL test applied directly to the quantitative trait, and for the Generalised C-alpha and Binomial C-alpha tests applied to the dichotomised quantitative trait. (A) Power is shown as a function of the percentage of causal variants in a 100 kb region that are risk as opposed to protective, with the minimum MAF of variants considered fixed at 0.5% and a sample size of 10,000. As the proportion of risk causal variants approaches 50%, the C-alpha and SKAT-O tests maintain power, and the Generalised C-alpha test applied directly to the quantitative trait has optimal power. (B) Power is shown as a function of the minimum MAF of variants considered, with the percentage of risk causal variants in a 100 kb region fixed at 50% and a sample of 10,000 individuals. The power of the Generalised C-alpha test is optimal for variants with MAF>∼0.3%, but SKAT-O is optimal for lower MAF. For quantitative traits, the power of the Generalised C-alpha test remains better than that of the Binomial C-alpha test applied to a dichotomised version of the trait as long as variants have MAF>∼0.1%. For binary traits, the Binomial C-alpha test has power greater than or equivalent to that of the Generalised C-alpha test.
Genes demonstrating genome-wide significant evidence of rare variant association with type-1 diabetes on chromosome 6.
| Gene symbol | NCBI Build 37 chromosome 6 position (BP) | | Number of rare variants | Unconditional analysis | Conditional analysis: adjusted for lead MHC SNP | |
| | Start | Stop | | | | |
| | | | | | | |
| | 32,485,162 | 32,557,562 | 189 | 60.5 | 40.3 | 5.2×10⁻⁶ |
| | | | | | | |
| | 30,513,695 | 30,525,008 | 9 | 51.0 | 14.1 | 1.7×10⁻⁶ |
| | 30,620,896 | 30,640,830 | 7 | 45.5 | 14.0 | 3.5×10⁻⁶ |
| | 31,865,561 | 31,913,448 | 12 | 20.7 | 12.9 | 4.0×10⁻⁵ |
| | 31,895,265 | 31,919,860 | 8 | 19.9 | 9.7 | 1.3×10⁻⁴ |
| | 32,008,931 | 32,077,151 | 21 | 29.9 | 34.5 | <1.7×10⁻⁶ |
| | 32,223,487 | 32,233,615 | 18 | 41.2 | 24.1 | 1.0×10⁻⁴ |
| | 32,256,302 | 32,339,656 | 97 | 89.1 | 57.8 | 3.0×10⁻⁶ |
| | 32,362,512 | 32,374,900 | 6 | 26.0 | 15.7 | <1.7×10⁻⁶ |
| | 32,485,162 | 32,557,562 | 62 | 43.2 | 29.4 | <1.7×10⁻⁶ |
| | 32,520,489 | 32,552,155 | 34 | 44.1 | 27.6 | <2.5×10⁻⁶ |
| | 32,709,162 | 32,715,219 | 6 | 28.6 | 13.1 | 2.3×10⁻⁵ |
| | 32,723,875 | 32,731,330 | 6 | 18.5 | 17.0 | <1.7×10⁻⁶ |
| | 32,781,499 | 32,806,547 | 18 | 21.8 | 18.9 | <1.7×10⁻⁶ |
| | 32,902,409 | 32,908,817 | 8 | 10.3 | 8.2 | 2.8×10⁻⁵ |
| | 32,936,436 | 32,949,281 | 13 | 14.9 | 11.2 | 8.7×10⁻⁶ |
| | 33,130,468 | 33,160,245 | 13 | 31.2 | 17.4 | 1.7×10⁻⁶ |
Genes with a permuted p-value less than 1.7×10⁻⁶ (indicating genome-wide significance assuming a significance level of 5% and that there are 30,000 genes in the human genome [25]) in a Generalised C-alpha test.
For these genes, results are also shown when effects are adjusted for the lead common MHC SNP (rs9268645). Both analyses are adjusted for 3 principal components to account for population structure. Results are based on 600,000 permutations for the unconditional analysis and 575,000 permutations for the conditional analysis. BP, base pair; MAF, minor allele frequency; MHC, major histocompatibility complex; NCBI, National Center for Biotechnology Information.
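The genome-wide threshold quoted in the legend follows from dividing the 5% significance level by the assumed 30,000 genes. The short sketch below reproduces that arithmetic and, for context, the smallest proportion that the stated permutation counts can resolve. The function names are hypothetical, and reading 1/n as the resolution of a permutation test is an assumption for illustration, not the authors' stated rationale.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for n_tests tests at family-wise level alpha."""
    return alpha / n_tests

def permutation_resolution(n_perm):
    """Smallest non-zero proportion representable with n_perm permutations
    (taken as 1 / n_perm, an illustrative assumption)."""
    return 1.0 / n_perm

threshold = bonferroni_threshold(0.05, 30_000)
print(f"genome-wide threshold: {threshold:.2e}")                        # 1.67e-06
print(f"600,000 permutations:  {permutation_resolution(600_000):.2e}")  # 1.67e-06
print(f"575,000 permutations:  {permutation_resolution(575_000):.2e}")  # 1.74e-06
```

Both permutation counts give a resolution close to the threshold itself, which is the order of magnitude needed to assess genome-wide significance empirically.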