
Using eQTL weights to improve power for genome-wide association studies: a genetic study of childhood asthma.

Lin Li, Michael Kabesch, Emmanuelle Bouzigon, Florence Demenais, Martin Farrall, Miriam F Moffatt, Xihong Lin, Liming Liang.

Abstract

Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are more likely to be expression quantitative trait loci (eQTLs). Incorporating eQTL information hence has potential to increase power of genome-wide association studies (GWAS). In this paper, we propose using eQTL weights as prior information in SNP based association tests to improve test power while maintaining control of the family-wise error rate (FWER) or the false discovery rate (FDR). We apply the proposed methods to the analysis of a GWAS for childhood asthma consisting of 1296 unrelated individuals with German ancestry. The results confirm that eQTLs are enriched for previously reported asthma SNPs. We also find that some SNPs are insignificant using procedures without eQTL weighting, but become significant using eQTL-weighted Bonferroni or Benjamini-Hochberg procedures, while controlling the same FWER or FDR level. Some of these SNPs have been reported by independent studies in recent literature. The results suggest that the eQTL-weighted procedures provide a promising approach for improving power of GWAS. We also report the results of our methods applied to the large-scale European GABRIEL consortium data.


Keywords:  asthma; eQTL; false discovery rate; family-wise error rate; genome-wide association study; weighted hypothesis test

Year:  2013        PMID: 23755072      PMCID: PMC3668139          DOI: 10.3389/fgene.2013.00103

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


Introduction

Asthma is a disorder characterized by inflamed mucosa of small airways of lung, causing wheezing and shortness of breath (Moffatt et al., 2010). Among the most common chronic diseases of childhood, asthma has been reported to affect more than 10% of children in many westernized societies (Cookson, 2004). It is caused by a combination of genetic and environmental factors (Cookson, 2004; Moffatt et al., 2007), and several genome-wide association studies (GWAS) have been conducted to study the genetic basis underlying the complex disorder. More than 50 single nucleotide polymorphisms (SNPs) have been reported to be associated with asthma, according to the GWAS catalog (www.genome.gov/gwastudies, accessed on January 15, 2013). Remarkably, the recent report (Moffatt et al., 2010) from the GABRIEL (A Multidisciplinary Study to Identify the Genetic and Environmental Causes of Asthma in the European Community) consortium identified several SNPs reaching genome-wide significance through a large-scale meta-analysis. Prior biological information, often available in practice, has potential to increase power of GWAS. The common practice of GWAS, “agnostic” in some sense, assumes no prior information about any of the SNPs under investigation, meaning that all the SNPs have an equal likelihood of being causal. Some recent studies have taken advantage of information from linkage analysis (Roeder et al., 2006) and gene expression (Xiong et al., 2012) in genome-wide association scans. In genetic studies of etiology of asthma, it is of our particular interest to employ similar approaches and explore potentials of power gain in identifying asthma-associated SNPs by incorporating expression quantitative trait loci (eQTL) information. Catalogs of eQTLs in multiple tissues have been made publicly available, resulting from recent efforts of GWAS of gene expressions (Stranger et al., 2005, 2007, 2012; Dixon et al., 2007; Dimas et al., 2009; Yang et al., 2010). 
eQTLs provide insight into biology of transcription regulation. It has been shown that eQTLs are enriched for SNPs associated with complex diseases and traits using GWAS (Cookson et al., 2009; Nicolae et al., 2010). eQTL results can be used to provide functional interpretation for findings from GWAS (Moffatt et al., 2007; Heid et al., 2010; Hsu et al., 2010; Lango Allen et al., 2010; Speliotes et al., 2010; Chu et al., 2011; Wu et al., 2012) and prioritize genes in an association region for carrying out functional experiments using animal models (Teslovich et al., 2010). Focusing on eQTLs may also be useful to identify genetic pathways associated with the risk of complex diseases and traits, such as basal cell carcinoma in a skin cancer GWAS (Zhang et al., 2012) and type 2 diabetes (Zhong et al., 2010). Other results show that many cis eQTLs are shared across tissues (Ding et al., 2010) and that a comprehensive eQTL catalog in one tissue might be used to increase the power of capturing relevant transcripts for other diseases (including those that are only weakly or incidentally expressed in tissues where eQTL information was collected). As single-SNP analysis still remains the most popular in GWAS, we focus on those methods designed for this type of analysis. Single-SNP analysis tests one SNP at a time for association by scanning across the whole genome, and hence involves a large number of hypotheses. To correct for multiple comparisons, statistical methods have been proposed and applied to control for the family-wise error rate (FWER) (Bonferroni, 1936; Holm, 1979) or the false discovery rate (FDR) (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). Recent advances in statistical methodology make it possible to incorporate prior information through weighted hypothesis testing. 
In several of such methods (Genovese et al., 2006; Roeder et al., 2006, 2007), hypotheses are up-weighted or down-weighted based on prior likelihood of association with the trait of interest. While keeping the FWER or FDR under control, the procedures can improve power with informative weights and suffer small loss in power with uninformative weights (Genovese et al., 2006; Roeder and Wasserman, 2009). This feature is appealing as compared to prescreening SNPs based on prior information (e.g., to consider only eQTLs for association testing). In this paper, we propose to use eQTLs as prior information, and apply these weighted hypothesis testing methods to reanalyze the MAGICS (Multicentre Asthma Genetics in Childhood Study) data of asthma GWAS (Moffatt et al., 2007) as well as the GABRIEL meta-study of asthma (Moffatt et al., 2010).

Results

Published asthma associations are enriched with eQTLs

We extracted published asthma associations from the GWAS catalog maintained by the National Human Genome Research Institute. As of January 15, 2013, 52 distinct reference SNPs in or near more than 40 genes have been reported to be associated with asthma (Table A1). According to the eQTL database (described in Materials and Methods), 20 of these 52 SNPs (38.5%) are eQTLs. Using the proxy SNP search tool SNAP (Johnson et al., 2008), we then obtained an extended list of 506 SNPs that were either in the GWAS catalog or in strong linkage disequilibrium (LD) with the 52 SNPs (r² ≥ 0.8). We called these 506 SNPs the extended set of asthma-associated SNPs.
Table A1

Published asthma associated SNPs in the GWAS catalog as of January 15, 2013.

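| Region | Chr | SNP | Context | Gene | eQTL |
|---|---|---|---|---|---|
| 1q21.3 | 1 | rs4129267 | Intron | IL6R | |
| 1q23.1 | 1 | rs1101999 | Intron | PYHIN1 | |
| 1q21.3 | 1 | rs4845783 | Intergenic | CRCT1 | |
| 1q31.3 | 1 | rs2786098 | Intron | DENND1B, CRB1 | |
| 2q12.1 | 2 | rs13408661 | Intron | IL1RL1, IL18R1 | |
| 2q12.1 | 2 | rs9807989 | Intergenic | IL18R1, IL1R1 | |
| 2q12.1 | 2 | rs3771180 | Intron | IL1RL1 | |
| 2q12.1 | 2 | rs3771166 | Intron | IL18R1 | Y |
| 4q31.21 | 4 | rs7686660 | Intergenic | LOC729675 | |
| 4q31.21 | 4 | rs3805236 | Intron | GAB1 | |
| 5q31.3 | 5 | rs6867913 | Intergenic | NDFIP1 | |
| 5q31.1 | 5 | rs11745587 | Intron | C5orf56 | Y |
| 5q22.1 | 5 | rs1837253 | Intergenic | TSLP | |
| 5q31.1 | 5 | rs1295686 | Intron | IL13 | |
| 5q31.1 | 5 | rs2073643 | Intron | SLC22A5 | Y |
| 5q31.1 | 5 | rs2244012 | Intron | RAD50 | Y |
| 5q12.1 | 5 | rs1588265 | Intron | PDE4D | |
| 6p21.32 | 6 | rs9268516 | Intergenic | BTNL2, HLA-DRA | |
| 6q27 | 6 | rs6456042 | Intergenic | T | |
| 6p21.32 | 6 | rs9500927 | Intergenic | HLA-DOA | Y |
| 6p21.32 | 6 | rs9275698 | Intergenic | HLA-DQA2 | Y |
| 6p21.32 | 6 | rs7775228 | Intergenic | HLA-DQB1 | Y |
| 6p21.32 | 6 | rs3129890 | Intergenic | HLA-DRA | |
| 6p21.32 | 6 | rs3117098 | Intergenic | BTNL2 | Y |
| 6p21.32 | 6 | rs204993 | Intron | PBX2 | Y |
| 6p21.32 | 6 | rs404860 | Intron | NOTCH4 | |
| 6p21.32 | 6 | rs3129943 | Intron | C6orf10 | Y |
| 6p21.32 | 6 | rs987870 | Intron; nearGene-5 | HLA, DPB1 | Y |
| 6p21.32 | 6 | rs9273349 | Intergenic | HLA-DQ | |
| 8q24.11 | 8 | rs3019885 | Intron | SLC30A8 | |
| 9p21.1 | 9 | rs10970976 | Intron | ACO1 | |
| 9p24.1 | 9 | rs2381416 | Intergenic | IL33 | |
| 9p24.1 | 9 | rs1342326 | Intergenic | IL33 | |
| 10q21.1 | 10 | rs7922491 | Intron | PRKG1 | |
| 10p14 | 10 | rs10508372 | Intergenic | LOC338591 | |
| 11q13.5 | 11 | rs7130588 | Intergenic | LRRC32 | |
| 11q23.2 | 11 | rs11214966 | Intergenic | C11orf71 | |
| 12q13.2 | 12 | rs1701704 | Intergenic | IKZF4 | Y |
| 12q13.2 | 12 | rs2069408 | Intron | CDK2 | Y |
| 13q21.31 | 13 | rs3119939 | Intergenic | PCDH20 | |
| 15q22.2 | 15 | rs11071559 | Intron | RORA | |
| 15q22.33 | 15 | rs744910 | Intron | SMAD3 | |
| 15q21.2 | 15 | rs17525472 | Intergenic | SCG3 | Y |
| 17q12 | 17 | rs4794820 | Intergenic | ORMDL3 | Y |
| 17q12 | 17 | rs11078927 | Intron | GSDMB | Y |
| 17q12 | 17 | rs6503525 | Intergenic | ORMDL3 | Y |
| 17q21.1 | 17 | rs3894194 | Missense | GSDMA | Y |
| 17q12 | 17 | rs2305480 | Missense | GSDMB | Y |
| 17q12 | 17 | rs7216389 | Intron | ORMDL3 | Y |
| 19q13.42 | 19 | rs16984547 | Intron | ZNF665 | |
| 20p13 | 20 | rs4815617 | nearGene-5 | KIAA1271 | |
| 22q12.3 | 22 | rs2284033 | Intron | IL2RB | |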

“Y” in the eQTL column means the corresponding SNP is an eQTL SNP according to the eQTL database described in Materials and Methods. “nearGene-5” is an NCBI dbSNP function code, meaning that the SNP is 5′ to a gene and within 2 kb of it.

We calculated an eQTL enrichment p-value (Hosack et al., 2003) using the MAGICS data. There are 300,821 SNPs that passed quality control in the MAGICS data. Among these SNPs, 29 SNPs are in the GWAS catalog, and 64 SNPs are among the 506 extended asthma-associated SNPs defined previously. To account for the LD between SNPs in the calculation of the enrichment p-value, we conducted LD pruning with an r² threshold of 0.8 on the 300,821 SNPs. This resulted in 251,826 SNPs, 38 of which are extended asthma-associated SNPs according to the GWAS catalog. According to the eQTL database, 22,922 SNPs (9.1% of 251,826 SNPs) are eQTLs, and 13 asthma-associated SNPs (34.2% of 38 SNPs) are eQTLs. The corresponding enrichment p-value is 6.78 × 10^−5, suggesting the asthma associations are enriched with eQTLs in the MAGICS data. Note that other analyses considered all the SNPs rather than the pruned set of SNPs. These results are in line with the previous findings (Nicolae et al., 2010), which studied the eQTLs in lymphoblastoid cell lines (LCL) from the HapMap samples and the GWAS catalog. Their results suggest that SNPs associated with complex traits are more likely to be eQTLs.
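The enrichment calculation above can be illustrated with a hypergeometric upper tail on the counts quoted in the text. This is only a sketch: the exact statistic of Hosack et al. (2003) is not spelled out here, so the function name `hypergeom_upper_tail` and the "one fewer overlap" conservative variant are assumptions made for illustration, and the result is not claimed to reproduce the reported p-value exactly.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n SNPs are drawn without replacement from N SNPs,
    K of which are eQTLs (hypothetical helper, not from the paper)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# Counts quoted in the text (LD-pruned MAGICS SNP set).
N_SNPS, N_EQTL = 251_826, 22_922
N_ASTHMA, N_ASTHMA_EQTL = 38, 13

# Plain one-sided hypergeometric (Fisher-type) tail.
p_tail = hypergeom_upper_tail(N_ASTHMA_EQTL, N_ASTHMA, N_EQTL, N_SNPS)
# Assumed conservative variant: count one fewer overlapping SNP.
p_conservative = hypergeom_upper_tail(N_ASTHMA_EQTL - 1, N_ASTHMA, N_EQTL, N_SNPS)

print(f"tail p-value: {p_tail:.3g}, conservative variant: {p_conservative:.3g}")
```

Both quantities are small because 13 of 38 asthma-associated SNPs are eQTLs, against a background rate of about 9.1%.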

Weights using eQTL information

We calculated two kinds of weights using eQTL information for the MAGICS data. All 300,821 SNPs passing quality control were considered. First, we defined a SNP as an eQTL SNP if it was labeled as an eQTL in the eQTL database (see details in Materials and Methods). There are 31,781 cis eQTL SNPs (10.6% of 300,821 SNPs) according to this definition, and for each of them we retrieved an eQTL p-value $p_{eQTL}$. Next, we considered two choices of weights, the general weight $w_g$ and the binary weight $w_b$. The general weight is $w_g = \sqrt{-\log_{10} p_{eQTL}}$ for an eQTL SNP, and $w_g = 1$ otherwise. The binary weight takes only two possible values, $w_b = 3.70$ for any eQTL SNP and $w_b = 0.68$ otherwise. The two values of the binary weight were chosen to maximize the minimum power while keeping at least 10.6% (also the percentage of eQTL SNPs) of all the hypotheses with a power of 60%. The parameters for calculating the binary weight are ϵ = 0.106, α = 0.05, and β = 0.4 (see details in Materials and Methods). Last, both weights were normalized to have mean 1, which is necessary for the weighted hypothesis testing methods to maintain the correct FWER or FDR (Genovese et al., 2006). After normalization, the general weight $w_g$ has a mean of 2.44 and a median of 2.21 among eQTL SNPs, while the binary weight $w_b$ is 3.70 for all eQTL SNPs (Figure 1).
Figure 1

Weights used in the MAGICS analysis. Each weight corresponds to a SNP and a hypothesis. The weights have been normalized to have mean 1 and shown in the ascending order. (A) The weights are based on the square root of −log10 peQTL where peQTL is the eQTL p-value; (B) the weights take only two possible values, which are decided using the method described in Materials and Methods.
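The weight construction described above can be sketched in a few lines. The general weight follows the description "square root of −log10 of the eQTL p-value", and both weights are rescaled to mean 1. The function names and the toy p-values are hypothetical; the binary values 3.70 and 0.68 are taken from the text rather than derived.

```python
import math

def general_weights(eqtl_pvalues):
    """eqtl_pvalues holds an eQTL p-value for eQTL SNPs and None otherwise.
    Raw weight is sqrt(-log10 p) for eQTLs and 1 for non-eQTLs, then
    all weights are rescaled so that their mean is 1."""
    raw = [math.sqrt(-math.log10(p)) if p is not None else 1.0 for p in eqtl_pvalues]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

def binary_weights(is_eqtl, w_eqtl=3.70, w_other=0.68):
    """Two-valued weight, rescaled to mean 1 for the given eQTL fraction."""
    raw = [w_eqtl if flag else w_other for flag in is_eqtl]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# Toy example (made-up p-values): 2 eQTL SNPs out of 10.
toy = [1e-12, None, None, 1e-6, None, None, None, None, None, None]
wg = general_weights(toy)
wb = binary_weights([p is not None for p in toy])

# With the eQTL fraction from the text, 3.70 and 0.68 already average to about 1.
mean_binary_in_paper = 0.106 * 3.70 + (1 - 0.106) * 0.68
print(wg[:4], wb[:2], round(mean_binary_in_paper, 4))
```

Rescaling to mean 1 is the step that keeps the overall error budget unchanged: up-weighting some hypotheses is paid for by down-weighting the others.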


Weighted hypothesis testing

We applied the weighted hypothesis testing methods (Genovese et al., 2006; Roeder and Wasserman, 2009) using the general weight $w_g$ and the binary weight $w_b$ to the MAGICS data. For each of the 300,821 SNPs, we calculated the trait association p-value, $p$, from the single-SNP association test on asthma status, as well as the weighted p-values $Q_g = p/w_g$ and $Q_b = p/w_b$. Multiple testing adjustments were done for both the original p-values ($p$) and the weighted p-values ($Q_g$ and $Q_b$). Bonferroni (1936) and Holm's (1979) methods were considered to control the FWER, and Benjamini and Hochberg's (1995) method was used to control the FDR. We first ranked the SNPs by their p-values in ascending order, and compared the ranks based on the weighted p-values with those based on the original p-values (Figure 2). Since only 10.6% of SNPs are eQTL SNPs according to the eQTL database, and hypotheses for eQTL SNPs are up-weighted, eQTLs generally have higher ranks after weighting, while non-eQTLs have lower ranks, although the magnitude of these changes is small. This is true for both the general and binary weights. When restricting to the 29 asthma-associated SNPs reported in the GWAS catalog, we observed similar behavior, suggesting that weighting hypotheses may improve power with informative weights and sacrifice only a little power with uninformative weights (Roeder and Wasserman, 2009).
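As a small illustration of the rank comparison described above, the sketch below divides each p-value by its weight and compares ranks before and after. The four p-values and the weights (which average 1) are made up for the example and are not taken from the MAGICS data.

```python
def ranks(values):
    """Rank values in ascending order (1 = smallest); ties are not handled specially."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

# Hypothetical data: the first SNP is an eQTL (up-weighted), the rest are not.
p = [2.0e-6, 1.5e-6, 4.0e-4, 3.0e-2]
w = [2.5, 0.5, 0.5, 0.5]                 # mean weight is 1
q = [pi / wi for pi, wi in zip(p, w)]    # weighted p-values Q = p / w

print(ranks(p))  # ranks from the original p-values
print(ranks(q))  # ranks from the weighted p-values
```

In this toy case the eQTL SNP moves from second to first place, while the down-weighted SNPs shift by at most one position.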
Figure 2

Rankings of the SNPs based on original and weighted p-values. (A) Original ranks of eQTLs compared to their new ranks based on the general weight; (B) original ranks of non-eQTLs compared to their new ranks based on the general weight; (C) original ranks of eQTLs compared to their new ranks based on the binary weight; (D) original ranks of non-eQTLs compared to their new ranks based on the binary weight. The black circles represent the reported asthma-associated SNPs in the GWAS catalog, and the gray circles represent the rest of the SNPs in the data.

We then looked at the Q–Q plots of the original and weighted p-values. For p-values greater than 0.0001, the Q–Q curves (Figure 3) are similar between the original and weighted p-values, regardless of the weights used. For p-values less than 0.0001, some weighted p-values are smaller than the original ones, and the difference is larger using the binary weight. We also observed that 3 asthma-associated SNPs in the GWAS catalog are among the top SNPs with original p-values less than 10^−6. The weighted p-values for all 3 of these asthma-associated SNPs are smaller than the original ones.
Figure 3

Q–Q plots of original and weighted p-values. The weighted p-values are based on (A) the general weight, or (B) the binary weight. The reported asthma-associated SNPs in the GWAS catalog are shown in circles.

Next, we applied the methods to control for the FWER. An effective ratio of 0.791 (Li et al., 2012) was used to calculate the effective number of SNPs (300,821 × 0.791). Controlling the FWER at a level of 0.05, we obtained significant SNPs using both the original and weighted p-values. Bonferroni and Holm's methods gave the same results, and both weights (binary and general) also gave the same results (Tables A2, A3). The unweighted hypothesis testing claimed 6 SNPs to be significant, all on chromosome 17, including 2 asthma-associated SNPs (rs3894194 with GSDMA, and rs7216389 with ORMDL3) that have been reported previously (Moffatt et al., 2007, 2010). After applying the weighted hypothesis testing, we obtained 9 significant SNPs including all 6 SNPs identified by the unweighted method, although the ranks are not exactly the same. The 3 SNPs additionally identified by eQTL weighting were rs3902025, rs4795405, and rs2305480. The SNP rs2305480, a missense SNP in the gene GSDMB, was not reported in the previous GWAS (Moffatt et al., 2007) but has been reported as an asthma-associated SNP in a later, larger-scale study by the GABRIEL consortium (including the MAGICS data, Moffatt et al., 2010) and was found to interact strongly with exposure to tobacco smoke in early life (Bouzigon et al., 2008). We found that rs2305480 is in LD (r² = 0.702, D′ = 0.926) with rs7216389, which was identified by the unweighted methods, suggesting that rs2305480 may not represent a new association. Using a stringent r² threshold of 0.4, we found that the other two SNPs, rs3902025 and rs4795405, are also in LD with at least one SNP identified by the unweighted methods. So no new association is identified by the weighted methods in this particular analysis.
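The FWER comparison above can be reproduced arithmetically from the numbers given in the text and in Table A4. The sketch below is a minimal weighted Bonferroni rule, assuming the per-test threshold is α divided by the effective number of SNPs; the function name `weighted_bonferroni` is hypothetical, and the three SNP rows are copied from the tables rather than recomputed from genotype data.

```python
def weighted_bonferroni(pvalues, weights, alpha=0.05, n_effective=None):
    """Reject hypothesis i when p_i / w_i <= alpha / m, where m is the
    effective number of tests (defaults to the number of p-values)."""
    m = n_effective if n_effective is not None else len(pvalues)
    threshold = alpha / m
    return [p / w <= threshold for p, w in zip(pvalues, weights)]

# Effective number of SNPs and per-test threshold for the MAGICS analysis.
M_EFF = 300_821 * 0.791
THRESHOLD = 0.05 / M_EFF  # roughly 2.1e-7

# (SNP, original p-value, Q_g, Q_b) for the three SNPs gained by weighting.
rows = [
    ("rs3902025", 2.75e-7, 1.05e-7, 7.44e-8),
    ("rs4795405", 3.64e-7, 1.07e-7, 9.83e-8),
    ("rs2305480", 5.68e-7, 1.40e-7, 1.54e-7),
]
for snp, p, qg, qb in rows:
    print(snp, p <= THRESHOLD, qg <= THRESHOLD, qb <= THRESHOLD)
```

Each of the three original p-values lies just above the threshold, and each weighted value lies below it, which is why these SNPs appear only in the weighted columns of Tables A2 and A3.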
Table A2

The significant SNPs identified by the unweighted method and the weighted methods (using general weight or binary weight) in the MAGICS analysis.

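| Rank | Chr (unweighted) | SNP (unweighted) | p-value | Chr (general weight) | SNP (general weight) | Qg | Chr (binary weight) | SNP (binary weight) | Qb |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 17 | rs3894194* | 4.01E-09 | 17 | rs3894194* | 1.68E-09 | 17 | rs3894194* | 1.08E-09 |
| 2 | 17 | rs8079416* | 6.86E-09 | 17 | rs8079416* | 2.70E-09 | 17 | rs8079416* | 1.85E-09 |
| 3 | 17 | rs4795408* | 2.40E-08 | 17 | rs4795408* | 9.66E-09 | 17 | rs4795408* | 6.49E-09 |
| 4 | 17 | rs3859192 | 3.58E-08 | 17 | rs2290400* | 1.75E-08 | 17 | rs2290400* | 2.00E-08 |
| 5 | 17 | rs2290400* | 7.39E-08 | 17 | rs7216389* | 1.76E-08 | 17 | rs7216389* | 2.02E-08 |
| 6 | 17 | rs7216389* | 7.48E-08 | 17 | rs3859192 | 4.31E-08 | 17 | rs3859192 | 5.25E-08 |
| 7 | | | | 17 | rs3902025* | 1.05E-07 | 17 | rs3902025* | 7.44E-08 |
| 8 | | | | 17 | rs4795405* | 1.07E-07 | 17 | rs4795405* | 9.83E-08 |
| 9 | | | | 17 | rs2305480* | 1.40E-07 | 17 | rs2305480* | 1.54E-07 |

Bonferroni's correction is used with an FWER level of 0.05.

*eQTL.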

Table A3

The significant SNPs identified by the unweighted method and the weighted methods (using general weight or binary weight) in the MAGICS analysis.

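| Rank | Chr (unweighted) | SNP (unweighted) | p-value | Chr (general weight) | SNP (general weight) | Qg | Chr (binary weight) | SNP (binary weight) | Qb |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 17 | rs3894194* | 4.01E-09 | 17 | rs3894194* | 1.68E-09 | 17 | rs3894194* | 1.08E-09 |
| 2 | 17 | rs8079416* | 6.86E-09 | 17 | rs8079416* | 2.70E-09 | 17 | rs8079416* | 1.85E-09 |
| 3 | 17 | rs4795408* | 2.40E-08 | 17 | rs4795408* | 9.66E-09 | 17 | rs4795408* | 6.49E-09 |
| 4 | 17 | rs3859192 | 3.58E-08 | 17 | rs2290400* | 1.75E-08 | 17 | rs2290400* | 2.00E-08 |
| 5 | 17 | rs2290400* | 7.39E-08 | 17 | rs7216389* | 1.76E-08 | 17 | rs7216389* | 2.02E-08 |
| 6 | 17 | rs7216389* | 7.48E-08 | 17 | rs3859192 | 4.31E-08 | 17 | rs3859192 | 5.25E-08 |
| 7 | | | | 17 | rs3902025* | 1.05E-07 | 17 | rs3902025* | 7.44E-08 |
| 8 | | | | 17 | rs4795405* | 1.07E-07 | 17 | rs4795405* | 9.83E-08 |
| 9 | | | | 17 | rs2305480* | 1.40E-07 | 17 | rs2305480* | 1.54E-07 |

Holm's method is used with an FWER level of 0.05.

*eQTL.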

Besides controlling the FWER, we also used Benjamini and Hochberg's (BH) procedure (Benjamini and Hochberg, 1995) to control the FDR at a level of 0.05 (Table A4). Based on the original p-values without weighting, the BH procedure gave 11 positive results (SNPs). The weighted BH procedures based on the general weight and the binary weight resulted in 7 and 8 additional positive results (SNPs), respectively. Using a stringent r² threshold of 0.4, we found that 5 SNPs (Table 1) are not in LD with any of the SNPs identified without weighting. Although none of the 5 SNPs is, or is in LD with, an asthma-associated SNP according to the GWAS catalog, some of them are of interest. Some are in or close to the genes PGAP3 and STARD3 on chromosome 17, and rs2941504 has been reported in a recent independent study (Anantharaman et al., 2011) to be associated with asthma, although it does not meet the criteria for inclusion in the GWAS catalog. This suggests that the reanalysis using the eQTL weighting approaches is promising and potentially useful.
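The weighted BH step described above amounts to running the usual BH step-up rule on the weighted p-values Q = p/w. The sketch below is a minimal version under that reading; the function name `weighted_bh`, the toy p-values, and the toy weights (which average 1) are assumptions for illustration, and it uses the raw number of tests rather than any effective number.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Benjamini-Hochberg step-up applied to weighted p-values Q_i = p_i / w_i.
    Returns a list of booleans marking the rejected hypotheses."""
    m = len(pvalues)
    q = [p / w for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: q[i])
    # Largest rank k such that the k-th smallest Q is <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if q[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Toy data: SNPs 2 and 3 are "eQTLs" and get up-weighted.
p = [0.001, 0.020, 0.030, 0.2, 0.5, 0.8, 0.9, 0.95]
w = [1.0, 2.5, 2.5, 0.4, 0.4, 0.4, 0.4, 0.4]

unweighted = weighted_bh(p, [1.0] * len(p))
weighted = weighted_bh(p, w)
print(sum(unweighted), sum(weighted))
```

In the toy data only the first hypothesis is rejected without weights, while up-weighting brings the second and third below their step-up thresholds as well.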
Table A4

The positive results (SNPs) given by the unweighted method and the weighted methods (using general weight or binary weight) in the MAGICS analysis.

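| Rank | Chr (unweighted) | SNP (unweighted) | p-value | Chr (general weight) | SNP (general weight) | Qg | Chr (binary weight) | SNP (binary weight) | Qb |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 17 | rs3894194* | 4.01E-09 | 17 | rs3894194* | 1.68E-09 | 17 | rs3894194* | 1.08E-09 |
| 2 | 17 | rs8079416* | 6.86E-09 | 17 | rs8079416* | 2.70E-09 | 17 | rs8079416* | 1.85E-09 |
| 3 | 17 | rs4795408* | 2.40E-08 | 17 | rs4795408* | 9.66E-09 | 17 | rs4795408* | 6.49E-09 |
| 4 | 17 | rs3859192 | 3.58E-08 | 17 | rs2290400* | 1.75E-08 | 17 | rs2290400* | 2.00E-08 |
| 5 | 17 | rs2290400* | 7.39E-08 | 17 | rs7216389* | 1.76E-08 | 17 | rs7216389* | 2.02E-08 |
| 6 | 17 | rs7216389* | 7.48E-08 | 17 | rs3859192 | 4.31E-08 | 17 | rs3859192 | 5.25E-08 |
| 7 | 17 | rs3902025* | 2.75E-07 | 17 | rs4795405* | 1.07E-07 | 17 | rs3902025* | 7.44E-08 |
| 8 | 17 | rs4795405* | 3.64E-07 | 17 | rs3902025* | 1.05E-07 | 17 | rs4795405* | 9.83E-08 |
| 9 | 17 | rs2305480* | 5.68E-07 | 17 | rs2305480* | 1.40E-07 | 17 | rs2305480* | 1.54E-07 |
| 10 | 17 | rs11557467 | 2.19E-06 | 17 | rs8067378* | 4.58E-07 | 17 | rs8067378* | 5.45E-07 |
| 11 | 17 | rs8067378* | 2.02E-06 | 17 | rs9303277* | 7.94E-07 | 17 | rs9303277* | 9.30E-07 |
| 12 | | | | 17 | rs1877031* | 1.54E-06 | 17 | rs1877031* | 1.17E-06 |
| 13 | | | | 17 | rs907092* | 1.56E-06 | 17 | rs931992* | 1.52E-06 |
| 14 | | | | 17 | rs931992* | 1.96E-06 | 17 | rs907092 | 1.68E-06 |
| 15 | | | | 17 | rs2941504* | 2.68E-06 | 17 | rs2941504[*] | 2.06E-06 |
| 16 | | | | 17 | rs1565922[*] | 2.55E-06 | 17 | rs1565922[*] | 1.97E-06 |
| 17 | | | | 17 | rs11557467 | 2.64E-06 | 10 | rs11191325[*] | 2.29E-06 |
| 18 | | | | 10 | rs11191325[*] | 3.37E-06 | 17 | rs11557467 | 3.22E-06 |
| 19 | | | | | | | 17 | rs1007654[*] | 3.11E-06 |

The BH procedure is used with an FDR level of 0.05.

*eQTL.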

Table 1

Additional significant SNPs or positive results identified by eQTL weighting methods after accounting for linkage disequilibrium in the MAGICS analysis.

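| Chr | SNP | Gene | p-value | Qg | Qb | Method |
|---|---|---|---|---|---|---|
| 17 | rs1877031 | STARD3 | 4.32 × 10^−6 | 1.54 × 10^−6 | 1.17 × 10^−6 | BH |
| 17 | rs931992 | TCAP, STARD3 | 5.61 × 10^−6 | 1.96 × 10^−6 | 1.52 × 10^−6 | BH |
| 17 | rs1565922 | PGAP3 | 7.28 × 10^−6 | 2.55 × 10^−6 | 1.97 × 10^−6 | BH |
| 17 | rs2941504 | PGAP3 | 7.64 × 10^−6 | 2.68 × 10^−6 | 2.06 × 10^−6 | BH |
| 10 | rs11191325 | SUFU | 8.49 × 10^−6 | 3.37 × 10^−6 | 2.29 × 10^−6 | BH |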


Reanalysis of the GABRIEL data

As another application, we reanalyzed the GABRIEL data using the eQTL-weighted approaches. Since only the p-values are necessary for the use of eQTL weighting, we took the p-values of the meta-analysis of 37 studies that were calculated based on imputed data. In total, there are 2,473,850 SNPs with p-values available in the GABRIEL study, which include 267,350 out of 268,204 eQTL SNPs in the eQTL database. The weights based on eQTL information were calculated in a similar way to the MAGICS data analysis. We applied Bonferroni and Holm's methods with an FWER level of 0.05, as well as the BH procedure with an FDR level of 0.05. An effective ratio of 0.30 (Li et al., 2012) was used to calculate the effective number of SNPs (2,473,850 × 0.30). After obtaining the lists of significant SNPs using different methods, we report any SNPs identified by eQTL weighting that are not in LD with any SNPs identified by unweighted methods using an r² threshold of 0.4. Such SNPs may be informative and suggest new associations. Tables 2 and 3 show the SNPs that were identified based on the general weight and the binary weight, respectively.
Table 2

Additional significant SNPs or positive results identified by eQTL weighting methods after accounting for linkage disequilibrium in the GABRIEL analysis.

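| Chr | SNP | Gene | p-value | Qg | Qb | Method |
|---|---|---|---|---|---|---|
| 5 | rs244749 | MEF2C | 6.25 × 10^−5 | 2.51 × 10^−5 | 1.54 × 10^−5 | BH |
| 5 | rs10075941 | | 6.13 × 10^−5 | 3.03 × 10^−5 | 1.51 × 10^−5 | BH |
| 17 | rs7503195 | | 7.66 × 10^−5 | 3.07 × 10^−5 | 1.88 × 10^−5 | BH |
| 17 | rs17637472 | | 6.13 × 10^−5 | 3.56 × 10^−5 | 1.51 × 10^−5 | BH |
| 9 | rs7047575 | | 6.87 × 10^−5 | 3.92 × 10^−5 | 1.69 × 10^−5 | BH |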

The results are based on the general weight.

Table 3

Additional significant SNPs or positive results identified by eQTL weighting methods after accounting for linkage disequilibrium in the GABRIEL analysis.

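| Chr | SNP | Gene | p-value | Qg | Qb | Method |
|---|---|---|---|---|---|---|
| 5 | rs736801 | | 2.15 × 10^−7 | 8.81 × 10^−8 | 5.29 × 10^−8 | Bonferroni, Holm |
| 6 | rs2596450 | HCG26 | 2.72 × 10^−7 | 1.16 × 10^−7 | 6.69 × 10^−8 | Bonferroni, Holm |
| 5 | rs10075941 | | 6.13 × 10^−5 | 3.03 × 10^−5 | 1.51 × 10^−5 | BH |
| 17 | rs17637472 | | 6.13 × 10^−5 | 3.56 × 10^−5 | 1.51 × 10^−5 | BH |
| 5 | rs244749 | MEF2C | 6.25 × 10^−5 | 2.51 × 10^−5 | 1.54 × 10^−5 | BH |
| 9 | rs7047575 | | 6.87 × 10^−5 | 3.92 × 10^−5 | 1.69 × 10^−5 | BH |
| 17 | rs7503195 | | 7.66 × 10^−5 | 3.07 × 10^−5 | 1.88 × 10^−5 | BH |
| 6 | rs9273363 | HLA_DQB1 | 8.38 × 10^−5 | 4.41 × 10^−5 | 2.06 × 10^−5 | BH |
| 5 | rs4351182 | | 9.98 × 10^−5 | 4.91 × 10^−5 | 2.45 × 10^−5 | BH |
| 2 | rs13391794 | | 1.01 × 10^−4 | 5.62 × 10^−5 | 2.49 × 10^−5 | BH |
| 5 | rs10044342 | MEF2C | 1.10 × 10^−4 | 4.59 × 10^−5 | 2.71 × 10^−5 | BH |
| 2 | rs6751196 | | 1.14 × 10^−4 | 6.22 × 10^−5 | 2.80 × 10^−5 | BH |
| 6 | rs176095 | PBX2, GPSM3 | 1.25 × 10^−4 | 5.12 × 10^−5 | 3.08 × 10^−5 | BH |
| 2 | rs2675073 | | 1.34 × 10^−4 | 5.40 × 10^−5 | 3.29 × 10^−5 | BH |
| 2 | rs1913621 | | 1.46 × 10^−4 | 7.92 × 10^−5 | 3.60 × 10^−5 | BH |
| 2 | rs10497621 | NUP35 | 1.59 × 10^−4 | 7.55 × 10^−5 | 3.91 × 10^−5 | BH |
| 5 | rs244750 | MEF2C | 1.70 × 10^−4 | 7.08 × 10^−5 | 4.17 × 10^−5 | BH |
| 6 | rs9366689 | POM121L2 | 1.72 × 10^−4 | 8.73 × 10^−5 | 4.23 × 10^−5 | BH |
| 6 | rs7775759 | | 1.81 × 10^−4 | 6.43 × 10^−5 | 4.46 × 10^−5 | BH |
| 6 | rs7741091 | | 1.81 × 10^−4 | 6.43 × 10^−5 | 4.46 × 10^−5 | BH |
| 8 | rs6601649 | | 1.90 × 10^−4 | 9.68 × 10^−5 | 4.68 × 10^−5 | BH |
| 6 | rs204994 | | 1.94 × 10^−4 | 8.02 × 10^−5 | 4.76 × 10^−5 | BH |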

The results are based on the binary weight.


Size simulations on FWER

We conducted simulations using 5000 permutations based on the MAGICS data, and calculated the percentage of having at least one false positive claimed by Bonferroni and Holm's methods (α = 0.05, with an effective ratio of 0.791). In fact, any SNPs claimed significant using the two methods would be a false positive. The calculated percentages (Table 4) provide estimates of the FWER. Bonferroni and Holm's methods give the same results. The results suggest that, under the null hypothesis, the FWER level is controlled for the methods based on both the original (unweighted) and the weighted p-values. The simulations confirm the validity of the weighted hypothesis method (Genovese et al., 2006).
Table 4

Family-wise error rate estimates in 5000 permutations.

| P-value | FWER |
|---|---|
| Original | 0.0454 |
| Weighted by general weight | 0.0458 |
| Weighted by binary weight | 0.0460 |

Both Bonferroni and Holm's methods gave the same results in the same scenarios.
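The size check reported in Table 4 can be mimicked with a much simpler simulation. The sketch below does not permute phenotypes as in the paper; it assumes independent uniform null p-values, a toy panel of 500 SNPs, and the binary weight values from the text, and it estimates how often weighted Bonferroni makes at least one rejection. The function name `estimate_fwer` and all sizes are hypothetical choices.

```python
import random

def estimate_fwer(weights, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated null data sets with at least one rejection
    under weighted Bonferroni (reject when p / w <= alpha / m)."""
    rng = random.Random(seed)
    m = len(weights)
    hits = 0
    for _ in range(n_sim):
        # Every hypothesis is null, so any rejection is a false positive.
        if any(rng.random() / w <= alpha / m for w in weights):
            hits += 1
    return hits / n_sim

m = 500
n_eqtl = 53  # about 10.6% of the toy panel
binary = [3.70] * n_eqtl + [0.68] * (m - n_eqtl)
mean = sum(binary) / m
binary = [w / mean for w in binary]  # rescale to mean 1

fwer_unweighted = estimate_fwer([1.0] * m)
fwer_weighted = estimate_fwer(binary)
print(fwer_unweighted, fwer_weighted)
```

Under these assumptions both estimates should land near the nominal 0.05, since weights with mean 1 leave the total error budget unchanged. For scale, an estimate based on 5000 replicates at a true rate of 0.05 has a binomial standard error of about 0.003.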


Discussion

It is of substantial interest to enhance the power for identifying associations in the era of post-GWAS. Besides meta-analysis that has been proved successful in power gain (Moffatt et al., 2010), incorporating prior information has also received increasing attention. Such information can be obtained from various sources and levels, such as linkage analysis (Roeder et al., 2006), gene expression (Yang et al., 2010), and annotation information of variants (Adzhubei et al., 2010), genes (Saccone et al., 2007), and pathways (Wang et al., 2007). The so-called “agnostic” GWAS may benefit from incorporating useful prior information. In our study of asthma, gene expression information is of particular interest, as a recent study (Moffatt et al., 2007) identified several eQTLs associated with asthma. In the reanalysis of the MAGICS data (Moffatt et al., 2007), we applied recently developed statistical methods that can improve power by weighting hypothesis (Genovese et al., 2006; Roeder and Wasserman, 2009). Using eQTL information obtained from an independent dataset, we employed weighted procedures that up-weighted eQTL SNPs and down-weighted non-eQTL SNPs while controlling for the FWER or the FDR. It has been proved (Genovese et al., 2006) that any set of nonnegative weights can guarantee substantial power gain given informative weights and little power loss for uninformative weights. The property implies that the weighted procedures are robust to informativeness of weights and to the uneven coverage of genes and expression targets on the genome. We took advantage of this robustness and applied the procedures to an asthma study. We found additional SNPs that were significantly associated with asthma according to the weighting hypothesis methods. Some of them were interesting after we accounted for LD and compared them to literature. 
Our analysis was the first application of this approach to asthma GWAS studies, and the results successfully illustrated the use of eQTL weighting in the context of asthma studies. As another application, we also reanalyzed the GABRIEL meta-analysis p-values and reported corresponding results. It is noted that the weighted procedures can utilize eQTL information from a reference database. Multiple choices of eQTL databases have already been made available (e.g., Yang et al., 2010; Liang et al., 2013), and future efforts may provide even better reference of eQTL information. For example, the eQTL information considered in our reanalysis was obtained through a single platform (Affymetrix HG-U133 Plus 2.0), and better coverage of gene expression profiling may be achieved through RNA-Seq technologies or by combining information from various platforms. Besides the weighted procedures, an alternative method of using eQTL information is to simply test association between eQTL SNPs and the trait of interest. Such a method is not recommended in GWAS as it excludes non-eQTL completely and relies on the prior information too heavily. By contrast, weighted procedures make it possible to consider eQTLs and non-eQTLs simultaneously. More importantly, they can possibly increase power if the prior information is useful and are able to maintain the type I error under the null. Applying the weighted procedures in our reanalysis only requires p-value of eQTL SNPs. This flexibility means that such analyses can be applied to any existing GWAS data, even if they do not have accompanying gene expression data. Although gene expression may have tissue-specific patterns, a substantial fraction of eQTLs may be shared across tissues (Ding et al., 2010). Hence eQTLs developed from tissues that are not directly relevant to the outcome of interest, such as those from publicly available eQTL databases based on LCL, can be used to improve power on GWAS. 
It is possible that using eQTL information from relevant tissues may result in even more power gain, if such information is available. Besides the particular weighted hypothesis testing method (Genovese et al., 2006) we adopted, Bayesian methods are potential alternative strategies for incorporating eQTL information. Bayes factors have been applied to genetic association studies (Wellcome Trust Case Control Consortium, 2007; Stephens and Balding, 2009). In single-SNP analysis, a prior is assumed for each SNP effect [e.g., N(0, 0.2²) under a model of association in Wellcome Trust Case Control Consortium (2007)]. eQTL information can be naturally incorporated into the prior, although it may be challenging to choose a realistic yet tractable alternative model and to assess error rates (Hoggart et al., 2008), especially with the eQTL weight. One possible choice is to modify the variance of the prior, for example assuming a prior N(0, 0.2²w), where w is a weight reflecting the eQTL signal. Another possible choice is to keep the variance the same and increase the prior probability of association for eQTLs. It is of interest in future research to explore these possibilities and consider the extension of Bayesian methods to incorporate eQTL information. In our analysis, we took into account the LD between SNPs by considering the effective number of SNPs (Li et al., 2012). As an alternative, testing SNP sets for association has the potential to improve power and reduce the correlation between tests. Since the focus of this paper is to demonstrate the use of eQTL information in association testing, we will consider weighted correlated hypotheses in future research. Two choices of weights were applied in our analysis, a binary weight and a weight using the strength of eQTLs, and the results using the two weights were similar.
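To make the variance-scaling idea for the Bayesian prior concrete, the sketch below computes a normal-approximation Bayes factor for a single SNP with prior N(0, 0.2²w) on the effect. This is an illustration under stated assumptions, not a method from the paper: it assumes the effect estimate is approximately normal with known standard error, and the function name and the example values of the estimate, standard error, and weight are hypothetical.

```python
import math

def approx_bayes_factor(beta_hat, se, w=1.0, prior_sd=0.2):
    """Bayes factor for association versus the null, assuming
    beta_hat ~ N(beta, se^2) and prior beta ~ N(0, prior_sd^2 * w)."""
    v = se ** 2                 # sampling variance of the estimate
    prior_var = prior_sd ** 2 * w
    z2 = beta_hat ** 2 / v
    shrink = prior_var / (v + prior_var)
    return math.sqrt(v / (v + prior_var)) * math.exp(0.5 * z2 * shrink)

# Hypothetical SNP with a strong signal, without and with eQTL up-weighting.
bf_plain = approx_bayes_factor(0.3, 0.05, w=1.0)
bf_eqtl = approx_bayes_factor(0.3, 0.05, w=2.0)
# Hypothetical SNP with no signal at all.
bf_null_plain = approx_bayes_factor(0.0, 0.05, w=1.0)
bf_null_eqtl = approx_bayes_factor(0.0, 0.05, w=2.0)
print(bf_plain, bf_eqtl, bf_null_plain, bf_null_eqtl)
```

In this example a wider prior raises the Bayes factor for the strong signal and lowers it when the estimate is zero, so under this particular choice the weight changes how evidence is scaled rather than acting as a uniform boost.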
Theoretical results exist (Roeder and Wasserman, 2009) for the optimal binary weight, which provide guidance in choosing the values of the weight. The weight that takes advantage of eQTL strength may provide more useful information, and the best choice of weights remains an open research question. Through an application to an asthma GWAS, we demonstrated the usefulness of eQTL weights in GWAS. Although results may vary depending on the trait of interest and the underlying biological mechanism, the potential for increased power and the small investment required for reanalysis make the eQTL-weighted procedures desirable for reanalysis of existing GWAS data and useful for the design and analysis of future studies.

Materials and methods

The MAGICS asthma GWAS samples and data

The MAGICS (Multicentre Asthma Genetics in Childhood Study) data (Moffatt et al., 2007), part of the GABRIEL consortium, were reanalyzed by incorporating eQTL information. Quality control procedures followed a published protocol (Anderson et al., 2010). Individuals with missing phenotypes, elevated missing rates (≥ 5%), or outlying heterozygosity rates were removed. Markers with an excessive missing rate (≥ 5%), low MAF (< 5%), or failing the HWE test (p-value < $10^{-5}$) were also excluded. The remaining dataset contains 1296 individuals (647 affected and 649 unaffected) genotyped across 300,821 SNPs. To account for possible divergent ancestry and population stratification, principal component analysis (PCA) was conducted using EIGENSOFT 4.2 (Patterson et al., 2006; Price et al., 2006). The genotype data were pruned for LD prior to the PCA. The PCA result (Figure A1) suggests that no obvious stratification exists, and the signal of the first principal component is very weak. In the subsequent analysis, we still included the first principal component as a covariate.
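The marker-level filters described above can be sketched as follows. This is only an illustration under stated assumptions: genotypes are coded 0/1/2 with NaN for missing calls, the HWE p-values are assumed to be precomputed by another tool, and the function name `marker_qc_mask` is hypothetical rather than part of the study's pipeline.

```python
import numpy as np

def marker_qc_mask(genotypes, hwe_p, max_missing=0.05, min_maf=0.05, hwe_threshold=1e-5):
    """Return a boolean mask of markers that pass the three filters.

    genotypes: array (individuals x markers), coded 0/1/2, NaN = missing (assumed coding).
    hwe_p:     precomputed HWE test p-values, one per marker (assumed input).
    """
    g = np.asarray(genotypes, dtype=float)
    hwe_p = np.asarray(hwe_p, dtype=float)
    missing_rate = np.isnan(g).mean(axis=0)        # fraction of missing calls per marker
    allele_freq = np.nanmean(g, axis=0) / 2.0      # frequency of the counted allele
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    # Exclude if missing rate >= 5%, MAF < 5%, or HWE p-value < 1e-5.
    return (missing_rate < max_missing) & (maf >= min_maf) & (hwe_p >= hwe_threshold)

# Toy data: 4 individuals x 4 markers (hypothetical values).
g = np.array([[0, 0, 1,      0],
              [1, 0, np.nan, 1],
              [2, 0, 1,      1],
              [1, 0, 2,      2]], dtype=float)
mask = marker_qc_mask(g, hwe_p=[0.5, 0.9, 0.7, 1e-7])
```

In the toy data only the first marker survives: the second is monomorphic, the third has a missing call, and the fourth fails the HWE threshold.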
Figure A1

Principal components analysis for the MAGICS study. The scatter plots of (A) PC1 vs. PC2, and (B) PC2 vs. PC3 are shown, with each circle corresponding to an individual in the asthma GWAS.

LD pruning was considered only in the calculation of the enrichment p-value. It was conducted using PLINK (v1.07, downloaded from http://pngu.mgh.harvard.edu/purcell/plink/) (Purcell et al., 2007). A moving window with a width of 50 SNPs and a step size of 5 SNPs was used; pairwise LD was calculated within each window, and SNPs were pruned if $r^2 > 0.8$ (corresponding PLINK arguments: "--indep-pairwise 50 5 0.8").

The GABRIEL meta-analysis p-values

Association testing results, including SNP IDs and p-values, were obtained from a reanalysis of the GABRIEL consortium data using imputed SNPs (Bouzigon et al., personal communication). The meta-analysis imputed SNP genotypes using the HapMap 2 reference data for 37 studies, and calculated a meta-analysis p-value for each SNP using the available data. Imputed SNPs were kept for analysis if their imputation scores (Rsq) were ≥ 0.5 and if their minor allele frequencies were ≥ 1%. In total, 2,473,850 SNPs passed quality control. Only the SNP IDs and the p-values of these SNPs were obtained and used for the reanalysis described in this paper.

Expression quantitative trait loci data

An eQTL database (http://www.hsph.harvard.edu/liming-liang/software/eqtl/) resulting from an independent dataset was used as prior information to be incorporated in the GWAS. The sample contains 405 siblings from a panel of families of British descent (MRC-A) (Dixon et al., 2007). Global gene expression in LCLs was measured using Affymetrix HG-U133 Plus 2.0 chips. All siblings were genotyped using the Illumina Sentrix HumanHap300 BeadChip (ILMN300K) and/or the Illumina Sentrix Human-1 Genotyping BeadChip (ILMN100K). The SNP genotype data were further imputed using the MaCH program, and each SNP was tested for association with probes in the gene expression data. Restricting to cis eQTLs (1 Mb region) and controlling for the FDR of 1%, there are 515,947 tests with logarithm of odds (LOD) scores greater than 3.172, corresponding to 268,204 unique SNPs. In case a SNP has multiple p-values reported for associations with different probes, the minimal p-value was used for that SNP. These 268,204 SNPs are considered as eQTLs, and the database contains information of their physical positions, LOD scores, p-values, and residing or nearby genes. Details of the database are described by Liang et al. (2013).
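The reduction from probe-level tests to one eQTL p-value per SNP can be sketched as below. The record layout `(snp_id, probe_id, lod, p_value)`, the identifiers, and the function name are assumptions made for illustration; the actual database format may differ.

```python
def min_pvalue_per_snp(tests, lod_threshold=3.172):
    """Keep tests with LOD above the threshold and return {snp_id: smallest p-value}."""
    best = {}
    for snp_id, probe_id, lod, p_value in tests:
        if lod <= lod_threshold:
            continue  # not significant at the stated LOD cutoff
        if snp_id not in best or p_value < best[snp_id]:
            best[snp_id] = p_value
    return best

# Hypothetical records: snpA is associated with two probes, snpB misses the cutoff.
tests = [
    ("snpA", "probe1", 5.0, 1e-6),
    ("snpA", "probe2", 8.2, 1e-9),
    ("snpB", "probe3", 2.0, 1e-2),
    ("snpC", "probe4", 3.5, 4e-5),
]
eqtl_p = min_pvalue_per_snp(tests)
```

SNPs that appear in the returned dictionary would be treated as eQTLs; all others would be treated as non-eQTLs.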

Genetic association analysis

Genetic association analysis of the MAGICS data was conducted in PLINK. Logistic regression was used to test for association between each SNP and disease status while adjusting for gender and the first principal component. Meta-analysis of the GABRIEL data was carried out by combining association results from 37 studies using a random-effects model, and all computations were done using Stata software.

p-value weighting methods

Consider $m$ hypotheses $H_1, \ldots, H_m$ and their test statistic p-values $P_1, \ldots, P_m$. Suppose there are weights $W_1, \ldots, W_m$ available for the $m$ tests, respectively, satisfying $W_j > 0$ and $\sum_{j=1}^{m} W_j = m$. Define $Q_j = P_j / W_j$ and let $Q_{(1)} \le \ldots \le Q_{(m)}$ be the sorted values. Let $P_{(1)}, \ldots, P_{(m)}$ and $W_{(1)}, \ldots, W_{(m)}$ be the values in the corresponding order. $Q_j$ is sometimes referred to as a "weighted p-value" (e.g., Roeder and Wasserman, 2009), although it is not a p-value. The weighted Bonferroni procedure rejects any hypothesis $H_j$ ($1 \le j \le m$) that satisfies $Q_j \le \alpha/m$, where $\alpha$ is the desired level of FWER. Genovese et al. (2006) showed that this procedure controls FWER at level no greater than $\alpha$. Holm's weighted procedure (1979) is carried out as follows: given the desired $\alpha$ level of FWER, if $Q_{(1)} > \alpha/m$, no hypothesis is rejected; otherwise, find the largest $j$ that satisfies $Q_{(i)} \le \alpha / \sum_{k=i}^{m} W_{(k)}$ for all $i \le j$, and reject the hypotheses corresponding to the $j$ smallest $Q$'s. Genovese et al. (2006) also prove that this procedure works for a general setting of weights. We also consider Benjamini and Hochberg's procedure (1995) for controlling FDR. Given the desired level $\alpha$, find the largest $j$ such that $Q_{(j)} \le \alpha \cdot j/m$, and reject the hypotheses corresponding to the $j$ smallest $Q$'s. Genovese et al. (2006) prove that this procedure controls FDR at level $\alpha$.
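The three weighted procedures can be sketched in a few lines of NumPy. This is a minimal illustration of the rules as described above, not the authors' code: the function names are hypothetical, the weights are assumed to be positive and to sum to m, and the Holm threshold uses the sum of the weights of the hypotheses not yet rejected.

```python
import numpy as np

def weighted_bonferroni(p, w, alpha=0.05):
    """Reject H_j when Q_j = P_j / W_j <= alpha / m."""
    p, w = np.asarray(p, dtype=float), np.asarray(w, dtype=float)
    return p / w <= alpha / p.size

def weighted_holm(p, w, alpha=0.05):
    """Step-down rule: reject the j smallest Q's while Q_(i) <= alpha / sum_{k>=i} W_(k)."""
    p, w = np.asarray(p, dtype=float), np.asarray(w, dtype=float)
    q = p / w
    order = np.argsort(q)
    tail_weight = np.cumsum(w[order][::-1])[::-1]   # sum of W_(k) for k = i..m
    ok = q[order] <= alpha / tail_weight
    n_reject = ok.size if ok.all() else int(np.argmin(ok))  # index of first failure
    reject = np.zeros(p.size, dtype=bool)
    reject[order[:n_reject]] = True
    return reject

def weighted_bh(p, w, alpha=0.05):
    """Step-up rule: find the largest j with Q_(j) <= alpha * j / m, reject the j smallest Q's."""
    p, w = np.asarray(p, dtype=float), np.asarray(w, dtype=float)
    q = p / w
    m = q.size
    order = np.argsort(q)
    below = np.nonzero(q[order] <= alpha * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[:below[-1] + 1]] = True
    return reject

# Toy example (hypothetical values): the first SNP is up-weighted.
p = [0.02, 0.2, 0.5, 0.9]
w = [2.0, 1.0, 0.5, 0.5]          # positive weights summing to m = 4
unweighted = weighted_bonferroni(p, np.ones(4))
weighted = weighted_bonferroni(p, w)
```

In the toy example the first p-value (0.02) misses the unweighted Bonferroni threshold of 0.0125, but its weighted value 0.02/2 = 0.01 passes, which is the mechanism by which a well-chosen weight can turn a borderline SNP into a rejection.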

eQTL information as weights

The eQTL p-values were used to construct weights for the SNPs in the asthma GWAS reanalysis. We considered two kinds of weights, the binary weight $w^{B}$ and the general weight $w^{G}$. The binary weight takes only two possible predefined values, denoted by $w_{\mathrm{eQTL}}$ and $w_{\mathrm{non\text{-}eQTL}}$. For $m$ hypotheses, a binary weight is defined as $w^{B} = (w^{B}_1, \ldots, w^{B}_m)$, where $w^{B}_j = w_{\mathrm{eQTL}}$ if the $j$th SNP is an eQTL SNP, and $w^{B}_j = w_{\mathrm{non\text{-}eQTL}}$ if it is not. Given the values of $\alpha$, $\beta$, and $\epsilon$, the optimal values of $w_{\mathrm{eQTL}}$ and $w_{\mathrm{non\text{-}eQTL}}$ were chosen (Roeder and Wasserman, 2009) to maximize the minimum power among all the hypotheses while having at least a fraction $\epsilon$ with high power $1-\beta$. Here $\alpha$ is the level of either FWER or FDR. We also considered a general weight $w^{G} = (w^{G}_1, \ldots, w^{G}_m)$, where $w^{G}_j$ depends on the eQTL p-value $p_{\mathrm{eQTL}}$ if the $j$th SNP is an eQTL SNP, and $w^{G}_j = 1$ otherwise. The particular form was chosen intuitively, prior to the reanalysis of the GWAS data, to avoid up-weighting top eQTL SNPs too much. Both $w^{B}$ and $w^{G}$ were then normalized so that their means equal 1, i.e., $\frac{1}{m}\sum_{j=1}^{m} w^{B}_j = 1$ and $\frac{1}{m}\sum_{j=1}^{m} w^{G}_j = 1$.
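As a small illustration of the binary weight and its normalization, the sketch below assigns one raw value to eQTL SNPs and another to non-eQTL SNPs, then rescales so that the mean weight is 1 (equivalently, the weights sum to m). The raw values 4 and 1 are arbitrary placeholders, not the optimal values derived from Roeder and Wasserman (2009), and the function name is hypothetical.

```python
import numpy as np

def binary_weights(is_eqtl, w_eqtl, w_non_eqtl):
    """Assign w_eqtl / w_non_eqtl per SNP, then normalize so the mean weight is 1."""
    is_eqtl = np.asarray(is_eqtl, dtype=bool)
    raw = np.where(is_eqtl, float(w_eqtl), float(w_non_eqtl))
    return raw / raw.mean()

# One eQTL SNP among four; raw values 4 and 1 are placeholders for illustration.
weights = binary_weights([True, False, False, False], w_eqtl=4.0, w_non_eqtl=1.0)
```

Normalization preserves the ratio between the two weight values while making the vector usable directly in the weighted Bonferroni, Holm, and Benjamini-Hochberg rules.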

Reported associations in the GWAS catalog

Asthma-associated SNPs and genes reported in publications were retrieved from the online catalog of published GWAS on January 15, 2013. The catalog limits the associations to those with p-values less than $1.0 \times 10^{-5}$ and records only one SNP within a gene or region of high LD unless there is evidence of independent association. The reported associations were compared against the findings in the asthma GWAS data we reanalyzed.

Linkage disequilibrium information

To account for LD between SNPs, LD information was obtained with the SNAP proxy search tool (http://www.broadinstitute.org/mpg/snap/ldsearch.php), based on the HapMap 2 (rel22) reference and a distance limit of 500 kb.

Size simulation using the asthma GWAS data

Besides analyzing the MAGICS asthma GWAS data, we also conducted size simulations by permuting the disease status in the data. Logistic regression was used, where the dependent variable was the disease status (affected or unaffected) and the independent variables included a single SNP effect, gender, and the first principal component. The regression was applied to all the ~300,000 SNPs across the whole genome. Five thousand permutations were performed by permuting the disease status among all the individuals, and the model was refitted for each SNP in each permutation. At the end of the simulations, 5000 permutation p-values were obtained for each of the ~300,000 SNPs.
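One way such permutation p-values could be summarized is to estimate the empirical FWER of the weighted Bonferroni rule as the fraction of permuted datasets with at least one rejection. The sketch below assumes the permutation p-values are stored as an array of shape (permutations, SNPs); that layout, the toy numbers, and the function name are illustrative assumptions, not details reported in the paper.

```python
import numpy as np

def empirical_fwer(perm_pvalues, weights, alpha=0.05):
    """Fraction of permutations in which weighted Bonferroni rejects at least one SNP."""
    perm_pvalues = np.asarray(perm_pvalues, dtype=float)   # shape (n_perm, m)
    weights = np.asarray(weights, dtype=float)             # shape (m,), mean 1
    m = perm_pvalues.shape[1]
    q = perm_pvalues / weights                              # weighted p-values per permutation
    any_rejection = (q <= alpha / m).any(axis=1)
    return float(any_rejection.mean())

# Toy input: 4 permutations x 2 SNPs (hypothetical values).
perm_p = [[0.50, 0.60],
          [0.01, 0.90],
          [0.30, 0.02],
          [0.80, 0.70]]
fwer_weighted = empirical_fwer(perm_p, weights=[1.5, 0.5])
fwer_unweighted = empirical_fwer(perm_p, weights=[1.0, 1.0])
```

With real data, an estimate close to or below the nominal alpha across the permutations would indicate that the weighted procedure keeps its size under the null.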

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References (43 in total)

1.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

2.  A genome-wide association study of global gene expression.

Authors:  Anna L Dixon; Liming Liang; Miriam F Moffatt; Wei Chen; Simon Heath; Kenny C C Wong; Jenny Taylor; Edward Burnett; Ivo Gut; Martin Farrall; G Mark Lathrop; Gonçalo R Abecasis; William O C Cookson
Journal:  Nat Genet       Date:  2007-09-16       Impact factor: 38.330

3.  Effect of 17q21 variants and smoking exposure in early-onset asthma.

Authors:  Emmanuelle Bouzigon; Eve Corda; Hugues Aschard; Marie-Hélène Dizier; Anne Boland; Jean Bousquet; Nicolas Chateigner; Frédéric Gormand; Jocelyne Just; Nicole Le Moual; Pierre Scheinmann; Valérie Siroux; Daniel Vervloet; Diana Zelenika; Isabelle Pin; Francine Kauffmann; Mark Lathrop; Florence Demenais
Journal:  N Engl J Med       Date:  2008-10-15       Impact factor: 91.245

4.  SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

Authors:  Andrew D Johnson; Robert E Handsaker; Sara L Pulit; Marcia M Nizzari; Christopher J O'Donnell; Paul I W de Bakker
Journal:  Bioinformatics       Date:  2008-10-30       Impact factor: 6.937

5.  A genome-wide association study identifies two new risk loci for Graves' disease.

Authors:  Xun Chu; Chun-Ming Pan; Shuang-Xia Zhao; Jun Liang; Guan-Qi Gao; Xiao-Mei Zhang; Guo-Yue Yuan; Chang-Gui Li; Li-Qiong Xue; Min Shen; Wei Liu; Fang Xie; Shao-Ying Yang; Hai-Feng Wang; Jing-Yi Shi; Wei-Wei Sun; Wen-Hua Du; Chun-Lin Zuo; Jin-Xiu Shi; Bing-Li Liu; Cui-Cui Guo; Ming Zhan; Zhao-Hui Gu; Xiao-Na Zhang; Fei Sun; Zhi-Quan Wang; Zhi-Yi Song; Cai-Yan Zou; Wei-Hua Sun; Ting Guo; Huang-Ming Cao; Jun-Hua Ma; Bing Han; Ping Li; He Jiang; Qiu-Hua Huang; Liming Liang; Li-Bin Liu; Gang Chen; Qing Su; Yong-De Peng; Jia-Jun Zhao; Guang Ning; Zhu Chen; Jia-Lun Chen; Sai-Juan Chen; Wei Huang; Huai-Dong Song
Journal:  Nat Genet       Date:  2011-08-14       Impact factor: 38.330

6.  Data quality control in genetic case-control association studies.

Authors:  Carl A Anderson; Fredrik H Pettersson; Geraldine M Clarke; Lon R Cardon; Andrew P Morris; Krina T Zondervan
Journal:  Nat Protoc       Date:  2010-08-26       Impact factor: 13.491

7.  Identifying biological themes within lists of genes with EASE.

Authors:  Douglas A Hosack; Glynn Dennis; Brad T Sherman; H Clifford Lane; Richard A Lempicki
Journal:  Genome Biol       Date:  2003-09-11       Impact factor: 13.583

8.  Genome-wide association study identifies PERLD1 as asthma candidate gene.

Authors:  Ramani Anantharaman; Anand Kumar Andiappan; Pallavi Parate Nilkanth; Bani Kaur Suri; De Yun Wang; Fook Tim Chew
Journal:  BMC Med Genet       Date:  2011-12-21       Impact factor: 2.103

9.  Patterns of cis regulatory variation in diverse human populations.

Authors:  Barbara E Stranger; Stephen B Montgomery; Antigone S Dimas; Leopold Parts; Oliver Stegle; Catherine E Ingle; Magda Sekowska; George Davey Smith; David Evans; Maria Gutierrez-Arcelus; Alkes Price; Towfique Raj; James Nisbett; Alexandra C Nica; Claude Beazley; Richard Durbin; Panos Deloukas; Emmanouil T Dermitzakis
Journal:  PLoS Genet       Date:  2012-04-19       Impact factor: 5.917

10.  Population structure and eigenanalysis.

Authors:  Nick Patterson; Alkes L Price; David Reich
Journal:  PLoS Genet       Date:  2006-12       Impact factor: 5.917
