
Powerful association test combining rare variant and gene expression using family data from Genetic Analysis Workshop 19.

Yen-Yi Ho, Weihua Guan, Michael O'Connell, Saonli Basu.

Abstract

BACKGROUND: Genetic association studies aim to test for association between a disease or trait and genetic variants, either throughout the human genome or in regions of interest. However, for most diseases and traits, the combined effects of the associated genetic variants explain only a small proportion of the genetic variation. This "missing heritability" may be a result of the small effects of the common variants considered in genetic association studies. Rare variants may also play an important role in understanding the missing heritability of complex traits.
METHOD: We propose a novel weight-adjustment approach that incorporates gene expression into rare variant analysis. Results from previous simulation studies suggested that incorporating gene expression information can lead to substantial gains in statistical power.
RESULTS: Using the family data set provided through Genetic Analysis Workshop 19, we identified susceptibility genes associated with blood pressure regulation.
CONCLUSIONS: These findings provide valuable information for further functional studies of blood pressure control and its mechanisms.


Year: 2016 | PMID: 27980645 | PMCID: PMC5133497 | DOI: 10.1186/s12919-016-0039-4

Source DB: PubMed | Journal: BMC Proc | ISSN: 1753-6561


Background

In the past decade, genome-wide association studies (GWAS) have been successful in identifying susceptibility loci for many complex traits [1]. However, Eichler et al. reported that the proportion of genetic variation explained by GWAS findings for a given disease or complex trait is often notably smaller than the estimated heritability of the trait [2]. One explanation is that the common variants examined by GWAS often have small effects, whereas rare variants, which may have larger genetic effects, are often excluded from GWAS analyses. Rare variants may therefore play an important role in explaining the “missing heritability” of complex traits. With recent advances in high-throughput sequencing technology, it is becoming financially feasible to assay rare genetic variation in thousands of individuals. Here, rare variants are defined as genetic variants with a minor allele frequency of less than 1 %. Because few individuals carry any given rare variant, the typical GWAS strategy of analyzing one variant at a time is often underpowered for rare variant detection unless the effect size of the variant or the sample size is very large. A number of methods have been developed to analyze multiple rare variants jointly [3-5]. In this paper, we consider the Seq-aSum-VS approach developed by Basu and Pan [3]. This approach uses dimension-reduction and data-adaptive variable selection strategies to identify the nonnull variants within a group of genetic variants, and it uses a score test to test for association between the group of variants and the disease of interest. Nevertheless, rare variant analyses often still require relatively large sample sizes. To boost the statistical power of genetic association analysis, one research direction is to integrate various types of genomic information, such as gene expression, rare variants, copy number variation, methylation, transcriptional regulation, and protein abundance.
With both rare variant genotypes and gene expression measurements available in the family data set from Genetic Analysis Workshop 19 (GAW19), in this paper we propose a novel approach to incorporate gene expression into rare variant association analysis. Genovese et al. [6] introduced the idea of using prior knowledge to weight p values from a genome-wide association study and provided a theoretical proof that the weighted procedure controls the family-wise error rate (FWER). Our contribution is a formal mechanism for constructing weights from the available information on rare variants and gene expression. Studies by Li et al. [7] and Ho et al. [8] demonstrate that, by incorporating additional genomic information, the weight-adjustment procedure can increase statistical power substantially compared with traditional genetic association analysis.

Methods

There are 259 participants with full genotype, expression, and blood pressure data available in the GAW19 family data set. Our analysis focuses on systolic blood pressure (SBP) and diastolic blood pressure (DBP) as outcomes of interest; for each participant, SBP and DBP were averaged over all visits and these averages were used as the summary measurements. To adjust for pedigree structure, we estimated the identity-by-descent (IBD) matrix (Σ) from the genome-wide single nucleotide polymorphism (SNP) marker data (Illumina platform) provided through GAW19. To remove the dependence induced by family structure, we transformed the average blood pressure measurements as $Y^* = \Sigma^{-1/2} Y$, so that individuals from the same family are independent in the transformed phenotype ($Y^*$). In addition, to account for population stratification in this Mexican American sample, we performed multidimensional scaling and calculated the first 3 principal components. In the following analysis, we used the residuals of the transformed phenotype after adjustment for the first 3 principal components. We defined rare variants as genetic variants with a minor allele frequency of less than 0.01, and we performed rare variant analysis for the genes (hg19 build) on the odd-numbered autosomes that contain more than 1 and fewer than 50 rare variants.
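The phenotype preprocessing can be sketched in Python as follows. This is a minimal illustration, assuming the family transformation is $Y^* = \Sigma^{-1/2} Y$ computed with the symmetric inverse square root, and that, as the text describes, $Y^*$ is then regressed on an intercept and the (untransformed) principal components. The function names are hypothetical, and an estimated IBD matrix may need regularization if it is not strictly positive definite.

```python
import numpy as np


def inverse_sqrt_pd(sigma):
    """Symmetric inverse square root of a positive-definite matrix (eigendecomposition)."""
    vals, vecs = np.linalg.eigh(sigma)
    if np.any(vals <= 0):
        raise ValueError("sigma must be positive definite")
    # V diag(1 / sqrt(lambda)) V^T: dividing columns of V by sqrt(lambda) via broadcasting
    return (vecs / np.sqrt(vals)) @ vecs.T


def transformed_residuals(y, sigma, pcs):
    """Residuals of Y* = Sigma^{-1/2} y after regressing Y* on an intercept and the PCs."""
    y_star = inverse_sqrt_pd(sigma) @ y
    X = np.column_stack([np.ones(len(y_star)), pcs])
    coef, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return y_star - X @ coef
```

In this setting, `sigma` would be the estimated IBD matrix, `y` the vector of visit-averaged SBP or DBP values, and `pcs` an n × 3 array holding the first 3 principal components.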

Sequential sum test

Consider $k$ rare variants in a gene, and let $\mathrm{SNP}_i$ denote the number of rare variant alleles at variant $i$. The general regression equation is

$$f\{E(Y)\} = \beta_0 + \beta \sum_{i=1}^{k} \gamma_i\, \mathrm{SNP}_i, \qquad \gamma_i = v_i s_i,$$

where $f$ is a link function, $s_i$ is 1, −1, or 0, indicating whether the effect of rare variant $i$ is positive, negative, or excluded from the equation, and $v_i$ is a weight assigned to rare variant $i$. In our analysis, we assumed $v_i = 1$. In addition, $\beta$ represents the common effect of the rare variants in the gene on the trait (a common log odds ratio when $f$ is the logit link). We applied the Seq-aSum-VS approach described in Basu and Pan [3] and obtained a p value for each gene from 500 permutations.
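The idea of a sequential sum test with variable selection can be sketched in Python for a quantitative trait. This is a simplified stand-in, not Basu and Pan's published Seq-aSum-VS implementation: it assumes $v_i = 1$ as in our analysis, visits variants in the given column order, chooses each $s_i \in \{1, -1, 0\}$ to maximize an n·r² score-type statistic for $X = \sum_i s_i \mathrm{SNP}_i$, and computes a permutation p value by rerunning the selection on each permuted phenotype. The function names and the add-one p value convention are illustrative assumptions.

```python
import numpy as np


def score_stat(x, y):
    """n * r^2: score-type statistic for H0: beta = 0 in y = b0 + beta * x + e."""
    xc = x - x.mean()
    yc = y - y.mean()
    sxx, syy = xc @ xc, yc @ yc
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # no variation, so no evidence of association
    return len(y) * (xc @ yc) ** 2 / (sxx * syy)


def seq_asum_vs(G, y):
    """Simplified sequential sign/selection search over the columns of G.

    G: (n, k) rare-allele counts SNP_i; y: (n,) phenotype (e.g., residualized Y*).
    Each s_i in {1, -1, 0} is chosen one variant at a time, holding the others
    fixed, to maximize the statistic for X = sum_i s_i * SNP_i (v_i = 1).
    """
    s = np.ones(G.shape[1])
    for i in range(G.shape[1]):
        best_val, best_stat = s[i], -np.inf
        for cand in (1.0, -1.0, 0.0):
            s[i] = cand
            stat = score_stat(G @ s, y)
            if stat > best_stat:
                best_val, best_stat = cand, stat
        s[i] = best_val
    return s, score_stat(G @ s, y)


def permutation_pvalue(G, y, n_perm=500, seed=0):
    """Permutation p value; the selection step is rerun on every permuted phenotype."""
    rng = np.random.default_rng(seed)
    _, t_obs = seq_asum_vs(G, y)
    exceed = 0
    for _ in range(n_perm):
        _, t_perm = seq_asum_vs(G, rng.permutation(y))
        if t_perm >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one convention (an assumption)
```

Because the sign and selection choices are data adaptive, the selected statistic does not follow a standard χ² distribution under the null, which is why the selection is repeated inside every permutation.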

Constructing weights using expression measurements

After obtaining the p value for each gene from the Seq-aSum-VS test, we used gene expression information to construct a weight for each gene. In Roeder et al. [9, 10], the authors suggested using a weight $w_i > 0$ to adjust the p value $p_i$ of gene $i$ and rejecting the null hypothesis for every gene $i$ with $p_i / w_i \le \alpha / m$ (a weighted Bonferroni procedure), where $m$ is the number of genes tested. The weight-adjustment procedure maintains proper FWER control as long as $w_i > 0$ and $\frac{1}{m}\sum_{i=1}^{m} w_i = 1$. Building on these theoretical findings, we developed a novel weight-adjustment approach for rare variant association analysis. After weight adjustment, genes that contribute strongly to phenotype-associated gene expression are assigned weights greater than 1 and hence achieve smaller weight-adjusted p values.

The weighting mechanism is as follows. The weight for gene $i$ is built from the product of 2 parts, $w_{ij}^{(1)}$ and $w_j^{(2)}$. The first term, $w_{ij}^{(1)}$, indicates the effect of gene $i$ on the $j$th gene expression measurement, $E_j$. The second term, $w_j^{(2)}$, describes whether gene expression measurement $E_j$ is associated with the phenotypic outcome $P$. $w_{ij}^{(1)}$ is obtained from the fit of

$$E_j = a_{0j} + a_{ij}\, g_i + \epsilon, \qquad (1)$$

where $g_i$ is the total number of rare variant alleles in gene $i$, calculated by collapsing the genotypes across rare variant loci. $w_j^{(2)}$ is obtained from the fit of a second equation relating $E_j$ to the phenotype,

$$P = b_{0j} + b_j E_j + \epsilon. \qquad (2)$$

The benefit of taking the product of 2 weights is that if either $w_{ij}^{(1)}$ or $w_j^{(2)}$ is zero, the product is zero; on the other hand, if both are substantially large, their product is an amplified overall weight. In other words, if rare variants in gene $i$ contribute strongly to outcome $P$ through $E_j$, then $w_{ij}^{(1)} w_j^{(2)}$ will be large. The crude weight for gene $i$ is the maximum of these products over all gene expression measurements, $\tilde{w}_i = \max_j w_{ij}^{(1)} w_j^{(2)}$. To ensure that $\frac{1}{m}\sum_{i=1}^{m} w_i = 1$, as required by Roeder and Wasserman [10], we divide each crude weight by the average crude weight $\bar{w} = \frac{1}{m}\sum_{i=1}^{m} \tilde{w}_i$, giving $w_i = \tilde{w}_i / \bar{w}$. Thus, if $\tilde{w}_i$ is larger than the average, $w_i$ will be greater than 1.
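The weighting steps can be sketched in Python. The text fixes the structure (a product of a gene-to-expression term and an expression-to-phenotype term, a maximum over expression measurements, and normalization to mean 1) but not the exact statistic used for each term; purely as an illustrative assumption, this sketch uses the absolute t statistic of the slope in each simple linear regression. The function names are hypothetical.

```python
import numpy as np


def abs_slope_t(x, y):
    """|t| for the slope in y = a + b * x + e (hypothetical choice of weight component)."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0  # constant input: no evidence of association
    r = np.corrcoef(x, y)[0, 1]
    r2 = min(r * r, 1.0 - 1e-12)  # guard against division by zero when |r| = 1
    return abs(r) * np.sqrt((len(y) - 2) / (1.0 - r2))


def gene_weights(g, E, P):
    """Normalized weights w_i = crude_i / mean(crude).

    g: (n, m) collapsed rare-allele counts g_i, one column per gene
    E: (n, J) gene expression measurements E_j
    P: (n,) phenotype
    """
    m, J = g.shape[1], E.shape[1]
    w2 = np.array([abs_slope_t(E[:, j], P) for j in range(J)])  # E_j -> P, Eq. (2)
    crude = np.empty(m)
    for i in range(m):
        w1 = np.array([abs_slope_t(g[:, i], E[:, j]) for j in range(J)])  # g_i -> E_j, Eq. (1)
        crude[i] = np.max(w1 * w2)  # maximum product over expression measurements
    return crude / crude.mean()  # mean weight of 1, as Roeder and Wasserman require
```

If every crude weight were zero, the normalization would be undefined, so a real analysis would need a guard for that case.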
We calculate the weight-adjusted p value for the $i$th gene as $p_i^* = p_i / w_i$. If the adjusted p value exceeds 1, it is set to 1, so that $p_i^* = \min(1, p_i / w_i)$. For the genes with adjusted p values less than 0.05, we performed gene set enrichment analysis using the biological process categories defined in Gene Ontology (GO). To account for the hierarchical structure of GO terms, we used the conditional hypergeometric test [11].
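A minimal Python sketch of the adjustment $p_i^* = \min(1, p_i / w_i)$, together with the upper-tail hypergeometric probability that underlies GO enrichment testing. This is the ordinary (unconditional) hypergeometric test; the conditional test [11] additionally uses the GO hierarchy to discount genes already accounted for by significant child terms, which this sketch does not do. The function names are illustrative.

```python
import numpy as np
from math import comb


def adjust_pvalues(p, w):
    """Weight-adjusted p values p_i* = min(1, p_i / w_i); genes with w_i = 0 get 1."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    adj = np.ones_like(p)
    pos = w > 0
    adj[pos] = np.minimum(1.0, p[pos] / w[pos])
    return adj


def hypergeom_upper_tail(k, universe, term_size, list_size):
    """P(X >= k) for X ~ hypergeometric: list_size genes drawn from universe genes,
    term_size of which are annotated to the GO term (unconditional test)."""
    upper = min(term_size, list_size)
    hits = sum(
        comb(term_size, x) * comb(universe - term_size, list_size - x)
        for x in range(k, upper + 1)
    )
    return hits / comb(universe, list_size)
```

For example, for UROD in Table 1, `adjust_pvalues([0.002, 0.040], [1.523, 1.870])` gives approximately 0.0013 and 0.0214, which round to the reported weight-adjusted p values of 0.001 (SBP) and 0.021 (DBP).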

Results

Of the 13,711 genes on the odd-numbered autosomes in the hg19 build, we analyzed the 6118 genes with more than 1 and fewer than 50 rare variants. We identified 153 genes with weight-adjusted p values less than 0.05 for SBP or DBP; the top 20 genes are listed in Table 1. Genes with strong contributions to phenotype-associated gene expression levels are assigned weights greater than 1. In Table 1, 17 genes have weights for SBP greater than 1 and 18 genes have weights for DBP greater than 1, indicating that these genes may contribute to alterations of phenotype-associated gene expression levels.
Table 1

Top 20 genes with weight-adjusted p values <0.05 from the Seq-aSum-VS test for either SBP or DBP

Rank  Gene           Chr  # RVs  p_S     W_S    p_S*    p_D     W_D    p_D*
1     NAIF1          9    23     <0.002  1.306  <0.002  0.002   0.825  0.002
2     SPATA13-AS1    13   7      <0.002  0.986  <0.003  0.018   0.835  0.022
3     UTP11L         1    34     <0.002  0.828  <0.003  0.006   1.168  0.005
4     C1orf174       1    36     <0.002  2.473  <0.002  <0.002  2.049  <0.002
5     KRTAP23-1      21   2      <0.002  0.955  <0.003  <0.002  0.586  <0.004
6     ZNF14          19   49     <0.002  0.625  <0.004  0.012   0.788  0.015
7     LOC101926911   15   28     <0.002  0.526  <0.004  0.008   0.856  0.009
8     SGSH           17   45     0.002   1.839  0.001   0.004   0.842  0.005
9     UROD           1    6      0.002   1.523  0.001   0.040   1.870  0.021
10    HES2           1    13     0.002   0.858  0.002   0.016   0.694  0.023
11    CHRNB1         17   50     0.008   2.413  0.003   0.022   1.422  0.015
12    ZBTB47         3    42     0.004   1.106  0.004   0.076   1.465  0.052
13    MIR4467        7    2      0.004   0.879  0.005   0.006   0.848  0.007
14    GTF3A          13   31     0.006   0.918  0.007   0.002   1.178  0.002
15    HMGB4          1    20     0.012   1.593  0.008   0.062   1.248  0.050
16    TEKT2          1    27     0.012   1.481  0.008   0.006   1.782  0.003
17    COX8A          11   12     0.006   0.722  0.008   0.006   1.043  0.006
18    MED29          19   27     0.010   1.202  0.008   0.026   0.637  0.041
19    C11orf21       11   32     0.008   0.951  0.008   0.038   1.172  0.032
20    PRAMEF17       1    5      0.014   1.656  0.008   0.054   0.721  0.075

Chr: chromosome; # RVs: number of rare variants identified in the gene; p_S: p value for SBP; W_S: weight for SBP; p_S*: weight-adjusted p value for SBP

Subscript D denotes the corresponding statistics for DBP.

We performed gene set enrichment analysis for these 153 significant genes using GO biological process categories. Table 2 reports the top 15 enriched gene sets containing more than ten genes, all with p values less than 0.05. The results suggest that the 153 genes are involved in the regulation of blood pressure and of blood vessel size (p < 0.03). Interestingly, these 153 blood pressure–associated genes are also significantly enriched for sensory perception of sound. Hypertension has been clinically observed to be correlated with hearing loss [12], so this result could suggest a genetic basis for the correlation between hypertension and hearing function. However, further studies are needed to validate these findings.
Table 2

Enriched GO biological processes (p value <0.05) for the top 153 blood pressure–associated genes

Rank  GOBPID      Count  Size  p value  Term
1     GO:0007600  10     107   0.0031   Sensory perception
2     GO:0050873  3      12    0.0069   Brown fat cell differentiation
3     GO:0007605  4      26    0.0109   Sensory perception of sound
4     GO:0048869  26     495   0.0123   Cellular developmental process
5     GO:0003013  6      57    0.0123   Circulatory system process
6     GO:0042981  15     239   0.0142   Regulation of apoptotic process
7     GO:0012501  17     288   0.0161   Programmed cell death
8     GO:0008544  7      79    0.0173   Epidermis development
9     GO:0031424  4      30    0.0180   Keratinization
10    GO:0010941  15     246   0.0182   Regulation of cell death
11    GO:0007369  3      18    0.0220   Gastrulation
12    GO:0045638  3      18    0.0220   Negative regulation of myeloid cell differentiation
13    GO:0050880  3      19    0.0255   Regulation of blood vessel size
14    GO:0008217  3      20    0.0277   Regulation of blood pressure
15    GO:0016265  17     312   0.0331   Death

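GOBPID: GO biological process ID; Count: number of the 153 genes annotated to the term; Size: total number of genes annotated to the term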


Discussion

Many of the genes reported in Table 1 achieve smaller p values after the weight-adjustment procedure. Table 1 lists genes with weight-adjusted p values less than 0.05; a multiple-comparison correction, such as a Bonferroni threshold, could also be applied to the weight-adjusted p values. In this analysis, the Bonferroni threshold is $0.05/6118 \approx 8 \times 10^{-6}$, and none of the genes reported in Table 1 reached this stringent threshold. In addition, we used the Seq-aSum-VS test to obtain a p value for each gene in this paper; other approaches, such as the sequence kernel association test (SKAT) [5], can also be used to obtain gene-level p values, and the weighting scheme proposed in this study can then be applied to calculate weight-adjusted p values. To obtain the first weight component (the effect of gene $i$ on expression measurement $E_j$), we summed the rare variant alleles across all loci in a gene in equation (1), which assumes that all rare variants have similar effects on $E_j$. An alternative is to replace equation (1) with the Seq-aSum-VS test statistic computed with gene expression as the outcome. This alternative does not assume that all rare variants have the same effects on $E_j$, but it may be more computationally intensive. In this analysis, we did not consider genes with more than 50 rare variants. In a gene with a large number of rare variants, many of the variants may be null, which dilutes the estimated effect for the gene. For such genes, we suggest a moving-window approach that considers only a feasible number of rare variants in each window. Furthermore, for data collected in a case–control design, our approach can be readily adapted by using logistic regression. The proposed weighting scheme can also be used when gene expression measurements are available for only a subset of the individuals.
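The moving-window suggestion could be implemented along these lines. This is a hypothetical sketch: the window size of 49 variants simply mirrors the fewer-than-50 limit used in this analysis, the step of 25 is arbitrary, and variants are assumed to be indexed in genomic order.

```python
def variant_windows(n_variants, window=49, step=25):
    """Overlapping windows of variant indices 0..n_variants-1 (assumed sorted by position).

    window=49 mirrors the fewer-than-50-variant limit used here; step is hypothetical.
    """
    if not 1 <= step <= window:
        raise ValueError("need 1 <= step <= window so that no variant is skipped")
    if n_variants <= window:
        return [range(0, n_variants)]
    starts = list(range(0, n_variants - window + 1, step))
    if starts[-1] + window < n_variants:
        starts.append(n_variants - window)  # final window ends at the last variant
    return [range(s, s + window) for s in starts]
```

Each window would then be tested as a separate unit, which increases the number of tests, so the multiple-comparison correction (for example, the Bonferroni denominator) would need to count every window.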
The weighting scheme can also be adapted when the SNP and gene expression data and the gene expression and phenotype data come from two different cohorts, rather than from a single cohort with paired gene expression and GWAS data. However, paired gene expression and GWAS data from the same cohort may be preferable: based on the simulation study described in [8], such data have increased power to detect the causal relationship (SNP→E→P) but not the reactive relationship (SNP→P→E). In the data analysis, we observed genes with small p values but without evidence of association with gene expression. It is biologically possible for genetic variants to cause phenotypic changes without altering gene expression levels. Thus, in practice, we suggest pursuing genes with either (a) small p values from the Seq-aSum-VS test or (b) small weight-adjusted p values.

Conclusions

In this paper, we proposed a novel approach to incorporate gene expression information into rare variant association analysis. Using the weight-adjustment approach, this method upweights the genes that contribute to phenotype-associated gene expression and downweights others. This weight-adjustment approach is expected to boost the power of association analysis by incorporating additional genomic information while keeping the FWER controlled at a nominal level. Both simulation studies and experimental findings reported in Li et al. [7] and Ho et al. [8] support the expected gain in power through the weight-adjustment procedure.
References

Kathryn Roeder, Silvi-Alin Bacanu, Larry Wasserman, B Devlin. Using linkage genome scans to improve power of association in genome scans. Am J Hum Genet. 2006-01-03.

S Falcon, R Gentleman. Using GOstats to test gene lists for GO term association. Bioinformatics. 2006-11-10.

Lucia A Hindorff, Praveen Sethupathy, Heather A Junkins, Erin M Ramos, Jayashri P Mehta, Francis S Collins, Teri A Manolio. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009-05-27.

Saonli Basu, Wei Pan. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol. 2011-07-18.

Kathryn Roeder, Larry Wasserman. Genome-wide significance levels and weighted hypothesis testing. Stat Sci. 2009-11.

Wei Pan. Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genet Epidemiol. 2009-09.

Yen-Yi Ho, Emily C Baechler, Ward Ortmann, Timothy W Behrens, Robert R Graham, Tushar R Bhangale, Wei Pan. Using gene expression to improve the power of genome-wide association analysis. Hum Hered. 2014-07-30.

G A Gates, J L Cobb, R B D'Agostino, P A Wolf. The relation of hearing in the elderly to the presence of cardiovascular disease and cardiovascular risk factors. Arch Otolaryngol Head Neck Surg. 1993-02.

Evan E Eichler, Jonathan Flint, Greg Gibson, Augustine Kong, Suzanne M Leal, Jason H Moore, Joseph H Nadeau. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010-06.

Lin Li, Michael Kabesch, Emmanuelle Bouzigon, Florence Demenais, Martin Farrall, Miriam F Moffatt, Xihong Lin, Liming Liang. Using eQTL weights to improve power for genome-wide association studies: a genetic study of childhood asthma. Front Genet. 2013-05-31.