Secondary analyses for genome-wide association studies using expression quantitative trait loci.

Julius S Ngwa¹, Lisa R Yanek², Kai Kammers³, Kanika Kanchan², Margaret A Taub¹, Robert B Scharpf³, Nauder Faraday⁴, Lewis C Becker², Rasika A Mathias², Ingo Ruczinski¹.

Abstract

Genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with complex traits; however, the identified SNPs account for a fraction of trait heritability, and identifying the functional elements through which genetic variants exert their effects remains a challenge. Recent evidence suggests that SNPs associated with complex traits are more likely to be expression quantitative trait loci (eQTL). Thus, incorporating eQTL information can potentially improve power to detect causal variants missed by traditional GWAS approaches. Using genomic, transcriptomic, and platelet phenotype data from the Genetic Study of Atherosclerosis Risk family-based study, we investigated the potential to detect novel genomic risk loci by incorporating information from eQTL in the relevant target tissues (i.e., platelets and megakaryocytes) using established statistical principles in a novel way. Permutation analyses were performed to obtain family-wise error rates for eQTL associations, substantially lowering the genome-wide significance threshold for SNP-phenotype associations. In addition to confirming the well known association between PEAR1 and platelet aggregation, our eQTL-focused approach identified a novel locus (rs1354034) and gene (ARHGEF3) not previously identified in a GWAS of platelet aggregation phenotypes. A colocalization analysis showed strong evidence for a functional role of this eQTL.

Keywords: expression quantitative trait loci; family wise error rate; genome-wide association studies; permutations; platelet aggregation; whole-genome sequencing

Year: 2022 PMID： 35312098 PMCID： PMC9086181 DOI： 10.1002/gepi.22448

Source DB: PubMed Journal: Genet Epidemiol ISSN： 0741-0395 Impact factor: 2.344

INTRODUCTION

Platelet aggregation is critical for normal hemostasis and pathologic thrombus formation (Jackson, 2007). Platelets are known to play an important role in the pathogenesis of atherosclerosis and in the acute thrombotic events that characterize acute coronary syndromes (Freedman & Loscalzo, 2013; Libby, 2001). High residual levels of platelet reactivity despite antiplatelet therapy are also associated with an increased likelihood of major adverse cardiovascular events after percutaneous coronary intervention (de Prado et al., 2006). Several large cohorts have documented the highly variable interindividual platelet responsiveness to a variety of agonists (Kunicki & Nugent, 2010). Furthermore, a number of genetic and environmental factors contribute to the substantial variation in platelet function seen among normal persons. Genome‐wide association studies (GWAS) have successfully identified several single nucleotide polymorphisms (SNPs) that are associated with platelet aggregation phenotypes (Johnson et al., 2010; Keramati et al., 2021, 2019; Kim et al., 2013; Lewis & Ryan, 2013; Mathias et al., 2010; Qayyum et al., 2015). Previous family‐based studies have shown that the majority of these platelet traits are heritable, with estimates up to 70% in African Americans (AAs) and almost 60% in European Americans (EAs) (Bray et al., 2007; Faraday et al., 2007). But even in aggregate, the SNPs identified from prior GWAS explain only a small proportion of this heritability. This phenomenon is observed in most complex traits, because the effect size of most SNPs is small, providing limited power to pass the GWAS significance threshold (He et al., 2013; Manolio et al., 2009). With the implementation of stringent thresholds, variants that confer small disease risks are likely to be missed among the millions of SNPs that are tested. Hence, additional analytical approaches that exploit genetic information beyond SNP association are useful to uncover additional important genetic variants.
Establishing connections between genetic variants identified in GWAS and their biological mechanisms has been challenging (Gupta & Musunuru, 2013). Some studies have looked at the overlap between complex trait‐associated variants and expression quantitative trait loci (eQTL) variants as evidence of common causal molecular mechanisms (Dubois et al., 2010; Nica et al., 2010). The concept is that a GWAS variant, in some tissues, may affect expression at a nearby gene and that both the gene and the tissue might play a role in the disease mechanism (Huang et al., 2015). Others have also explored approaches that integrate summary‐level data from GWAS with eQTL data in a Mendelian randomization style to identify genes whose expression levels are associated with a complex trait because of pleiotropy (Zhu et al., 2016). There is also increasing evidence that SNPs associated with complex traits are more likely to be eQTL and that a substantial proportion of these GWAS risk variants influence complex traits by regulating gene expression levels of their target genes (Albert & Kruglyak, 2015; Emilsson et al., 2008; Nica et al., 2010; Nicolae et al., 2010). Integrating this information in GWAS can enhance the discovery of trait‐associated SNPs for complex phenotypes, as gene expression analyses can yield important information about genetic architecture and can point to mechanisms that link genetics and disease (Gupta & Musunuru, 2013). Annotating SNPs with information on expression can improve our understanding of variants that underlie biological control of gene expression and of genes involved in platelet aggregation. Our goal in this study was to investigate the potential to leverage eQTL from a target tissue to identify novel loci associated with a phenotype from prior GWAS.
In this example, we leverage eQTL information from platelets (PLTs) and megakaryocytes (MKs) to identify novel loci associated with platelet aggregation phenotypes using Whole Genome Sequencing (WGS) data from EAs and AAs from the GeneSTAR family‐based study, generated as part of the NHLBI's Trans‐Omics for Precision Medicine (TOPMed) program. We incorporate eQTL information from RNA‐seq data on PLTs and induced pluripotent stem cell (iPSC) derived MKs (Kammers, Taub, Rodriguez, et al., 2021) to uncover novel genetic variants that determine platelet aggregation, using permutation tests to assess statistical significance.

MATERIALS AND METHODS

Genetic study of atherosclerosis risk cohort

GeneSTAR is an ongoing prospective study begun in 1983 designed to determine environmental, phenotypic, and genetic causes of premature cardiovascular disease. Participants came from EA and AA families identified from 1983 to 2006 from probands with a premature coronary disease event before 60 years of age who were identified at the time of hospitalization in any of 10 Baltimore area hospitals. Their apparently healthy 30–59 year old siblings without known coronary artery disease (CAD) were recruited and underwent initial phenotypic measurement and characterization between 1983 and 2007 (Vaidya et al., 2007; Yanek et al., 2013). Adult offspring (over 21 years of age) of siblings and probands along with the coparents of the offspring were recruited and underwent initial phenotypic measurement and characterization between 2003 and 2006. Participants for the current study took part in a 2‐week trial of aspirin from 2003 to 2006, and were apparently healthy, free of CAD, and had not used aspirin or antiplatelet medications for 2 weeks before the baseline visit (Becker et al., 2006). Platelet function was assessed before and after 2 weeks of aspirin in whole blood and platelet‐rich plasma (PRP) with multiple agonists such as collagen, ADP, and epinephrine (EPI) as described previously (Becker et al., 2006). Maximal aggregation (%) of PRP to 2 µM ADP was the phenotype we examined as proof of concept in this study.

Whole genome sequencing data

We used the sequencing data available through the NHLBI's Trans‐Omics for Precision Medicine (TOPMed) program (https://nhlbiwgs.org). WGS was performed to an average depth of 38X using DNA isolated from blood, PCR‐free library construction, and Illumina HiSeq X technology. Variant calling and quality control are described in detail in Taliun et al. (2021). In brief, variant discovery and genotype calling were performed jointly across all the available TOPMed studies using the GotCloud 6 pipeline, resulting in a single, multistudy, genotype call set. Sample‐level quality control was performed to check for pedigree errors, discrepancies between self‐reported and genetic sex, and concordance with prior genotyping array data. Among the GeneSTAR samples in TOPMed Freeze 6, 806 EAs in 196 families and 661 AAs in 190 families had complete phenotype data.

RNA sequencing data

The iPSC‐derived MK and PLT samples used in the RNA sequencing are described in detail elsewhere (Kammers et al., 2017; Kammers, Taub, Mathias, et al., 2021; Kammers, Taub, Rodriguez, et al., 2021). Briefly, for 185 iPSC‐derived MK cell lines and for 290 PLT samples with WGS data we also obtained RNA‐seq data from extracted nonribosomal RNA. This included iPSC‐derived MKs on 84 AA and 101 EA subjects as well as platelets on 110 AA and 180 EA subjects. Details on data processing are provided in Kammers, Taub, Rodriguez, et al. (2021). In brief, we used the HISAT‐StringTie suite (Pertea et al., 2016) for alignment and assembly of RNA‐seq data and the Ballgown package (Frazee et al., 2015) for efficient data storage, processing, and analysis. Gene expression was quantified as fragments per kilobase per million reads mapped (FPKM) and log‐transformed, and genes with median FPKM across all samples less than or equal to 1 (for MKs) or 0.3 (for PLTs) were excluded.
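The expression filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary input, the log base (2), and the pseudocount of 1 are hypothetical choices that the text does not specify, and the actual processing was done with Ballgown.

```python
import math
from statistics import median


def filter_and_log_fpkm(fpkm_by_gene, median_cutoff, pseudocount=1.0):
    """Drop lowly expressed genes, then log-transform the rest.

    fpkm_by_gene: dict mapping gene name -> list of FPKM values (one per sample).
    median_cutoff: genes with median FPKM <= this value are excluded
                   (1 for MKs and 0.3 for PLTs in the text).
    pseudocount and log base 2 are assumptions of this sketch.
    """
    kept = {}
    for gene, values in fpkm_by_gene.items():
        if median(values) > median_cutoff:
            kept[gene] = [math.log2(v + pseudocount) for v in values]
    return kept


example = {"geneA": [0.0, 0.5, 2.0], "geneB": [3.0, 4.0, 5.0]}
filtered = filter_and_log_fpkm(example, median_cutoff=1.0)
print(sorted(filtered))  # only geneB has median FPKM above 1
```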

Genome‐wide association studies

A linear mixed effects model for genetic association was applied to the WGS data using the GENESIS package (Conomos & Thornton, 2016), and analysis was first performed separately in each ethnic group (EA and AA). A genetic relationship matrix (GRM) was created using the PC‐Relate function to account for phenotype correlations due to the family structure of the GeneSTAR samples. The WGS‐based GWAS association analysis was conducted using an age‐ and sex‐adjusted inverse normal transformation of the platelet phenotypes. In each group, family‐aware SNP quality control filtering was carried out using PLINK (http://zzz.bwh.harvard.edu/plink/). Only SNPs with minor allele frequency (MAF) greater than 1% in the respective group, Hardy–Weinberg equilibrium test p value larger than 10⁻⁶, and missing genotype frequency less than 5% were tested for association and reported. Further, SNPs with inflated estimated standard errors (larger than 10) due to collinearity were omitted.
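As an illustration of the phenotype transformation, the following Python sketch applies a rank‐based inverse normal transformation to covariate‐adjusted residuals. It assumes the residuals from the age and sex regression are already available; the function name and the Blom offset of 0.375 are assumptions of this sketch, since the text does not state which variant of the transformation was used.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transform (Blom offset assumed).

    Ties receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


residuals = [10.0, 30.0, 20.0]  # hypothetical adjusted phenotype residuals
print(inverse_normal_transform(residuals))
```

The transformed values depend only on the ranks of the residuals, which is why the approach is robust to skewed phenotype distributions.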

Meta‐analysis

SNPs with MAF larger than 1% in both groups were then included in a meta‐analysis. Inverse variance weighted fixed effects meta‐analyses based on the slope and standard error estimates were conducted using the metagen function implemented in the R package meta, combining the stratified EA and AA results. Quantile–quantile (qq) plots of observed versus expected p values were examined to assess potential type I error inflation. Manhattan plots and regional association plots of the GWAS results using LocusZoom (Pruim et al., 2010) were created based on the Human Genome version 38 (hg38) build. Conditional analyses to potentially identify multiple causal variants in all regions identified using the GWAS WGS meta‐analysis approach were performed by conditioning on the most significant SNP in the regions of interest, and re‐assessing the strength of association in the respective regions.
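The inverse variance weighted fixed effects estimate for a single SNP can be written out directly. The sketch below is a plain Python illustration of the standard formula, not the code of the R package meta; the function name and the example numbers are hypothetical.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Inverse variance weighted fixed effects meta-analysis for one SNP.

    betas, ses: per-stratum slope estimates and standard errors
                (for example, one entry each for EA and AA).
    Returns the pooled slope, its standard error, the z statistic,
    and the two-sided p value under a standard normal null.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical stratum-specific estimates, for illustration only.
print(ivw_fixed_effect([0.2, 0.4], [0.1, 0.1]))
```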

eQTL analysis

Details of the eQTL analyses are provided in Kammers, Taub, Rodriguez, et al. (2021). In brief, eQTL analyses were carried out for both MK and PLT at the gene level stratified by ancestry (AA and EA), focusing on a 1 Mb window around each SNP and adjusting for sex, age, percent CD41+ CD42a+ MK pellets (MKs only), RNA‐seq batch, and 15 principal components (PCs) of the filtered and log‐transformed gene expression matrix. Only SNPs with at least two samples for each genotype and a call rate greater than 80% were tested, using the R package MatrixEQTL (Shabalin, 2012).
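The SNP inclusion rule for the eQTL tests (at least two samples per genotype and a call rate above 80%) can be expressed as a small predicate. This Python sketch is illustrative only; the function name and the encoding of genotypes as 0/1/2 with `None` for missing calls are assumptions, and the actual tests were run with MatrixEQTL.

```python
def passes_eqtl_filter(genotypes, min_per_genotype=2, min_call_rate=0.8):
    """Return True if a SNP would be tested under the stated rule.

    genotypes: list of calls coded 0, 1, or 2, with None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) <= min_call_rate:
        return False  # call rate must be strictly greater than the cutoff
    return all(called.count(g) >= min_per_genotype for g in (0, 1, 2))


print(passes_eqtl_filter([0, 0, 1, 1, 2, 2]))  # every genotype seen twice
print(passes_eqtl_filter([0, 0, 1, 1, 2]))     # only one homozygous alternate sample
```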

Permutation analysis

To simulate null distributions for tests of association between the set of eQTL identified SNPs and the trait, residuals (obtained after regressing the phenotype on the covariates) were randomly shuffled while SNP genotypes were kept the same, to preserve the SNP correlation structure (Churchill & Doerge, 1994). The 396 GeneSTAR families ranged from 1 to 15 members in size. For multiple‐member families, residuals were shuffled within families to also maintain within family phenotype correlation structure. Residuals were randomly swapped between singletons. To estimate the threshold for the 5% family‐wise error rate (FWER) under the global null of no association across all eQTL identified SNPs, we permuted each set of residuals 1000 times as described above, carried out 1000 separate GENESIS association analyses on the set of all eQTL identified SNPs, recorded the minimum p value for each of these 1000 analyses, and selected the 5th percentile of these 1000 minimum p values.
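The permutation scheme can be sketched as follows. This Python illustration keeps the family‐aware shuffling and the minimum p value logic described above, but it replaces the GENESIS mixed model with a simple correlation z‐test so that the example is self‐contained; that substitution, the function names, and the toy data are assumptions of the sketch, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def permute_residuals(residuals, family_ids, rng):
    """Shuffle residuals within multi-member families; swap singletons among themselves."""
    groups = defaultdict(list)
    for idx, fam in enumerate(family_ids):
        groups[fam].append(idx)
    singletons = [idxs[0] for idxs in groups.values() if len(idxs) == 1]
    blocks = [idxs for idxs in groups.values() if len(idxs) > 1]
    if singletons:
        blocks.append(singletons)
    permuted = list(residuals)
    for idxs in blocks:
        shuffled = idxs[:]
        rng.shuffle(shuffled)
        for src, dst in zip(idxs, shuffled):
            permuted[dst] = residuals[src]
    return permuted


def association_p(genotypes, y):
    """Two-sided p value from a correlation z-test (stand-in for the mixed model)."""
    n = len(y)
    mg, my = sum(genotypes) / n, sum(y) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((v - my) ** 2 for v in y)
    if sgg == 0 or syy == 0:
        return 1.0
    sgy = sum((g - mg) * (v - my) for g, v in zip(genotypes, y))
    z = sgy / math.sqrt(sgg * syy) * math.sqrt(n - 1)
    return math.erfc(abs(z) / math.sqrt(2.0))


def fwer_threshold(snp_genotypes, residuals, family_ids, n_perm=1000, alpha=0.05, seed=0):
    """alpha-quantile of the per-permutation minimum p value over all SNPs in the set."""
    rng = random.Random(seed)
    min_ps = []
    for _ in range(n_perm):
        y = permute_residuals(residuals, family_ids, rng)  # genotypes stay fixed
        min_ps.append(min(association_p(g, y) for g in snp_genotypes))
    min_ps.sort()
    k = max(1, round(alpha * n_perm))
    return min_ps[k - 1]


# Toy data: seven people in four families, two SNPs (illustration only).
fam = ["A", "A", "A", "B", "C", "D", "D"]
res = [1.0, 2.0, 3.0, 10.0, 20.0, 100.0, 200.0]
snps = [[0, 1, 2, 0, 1, 2, 0], [2, 1, 0, 0, 0, 1, 1]]
print(fwer_threshold(snps, res, fam, n_perm=200))
```

Because genotypes are never permuted, the correlation among SNPs is preserved, and the resulting threshold reflects the effective number of tests in the eQTL SNP set rather than the raw count.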

Colocalization

We performed a Bayesian colocalization analysis to investigate whether an observed association signal in the GWAS and eQTL analysis is consistent with a shared causal variant, using the framework described by Giambartolomei et al. (2014). In brief, for two separate traits (here, the phenotypes in the GWAS and the gene expression for the gene of interest in the eQTL analyses), five different hypotheses are considered under the assumption of a single causal variant for each trait: H0: no association with either trait; H1: association with trait 1, not with trait 2; H2: association with trait 2, not with trait 1; H3: association with trait 1 and trait 2, two independent SNPs; H4: association with trait 1 and trait 2, one shared SNP. Colocalization under the assumption of a single causal variant for each trait is inferred by support of hypothesis H4, calculating Bayes factors using the approximation proposed by Wakefield (2009). Prior probabilities for association with one or both traits were chosen as the default parameters in the coloc.abf function from the coloc R package (1 × 10⁻⁴ that a SNP is associated with either of the two traits, and 1 × 10⁻⁵ that a SNP is associated with both).
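To make the five hypotheses concrete, the sketch below computes posterior probabilities for H0 to H4 from per‐SNP log approximate Bayes factors, following the formulas published by Giambartolomei et al. (2014) and Wakefield (2009). It is a Python illustration written for this text, not the code of the coloc package; the function names and the prior standard deviation of 0.15 for the effect size are assumptions of the sketch.

```python
import math


def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def wakefield_log_abf(beta, se, prior_sd=0.15):
    """Natural-log approximate Bayes factor for one SNP (prior_sd is assumed)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.

    labf1, labf2: per-SNP natural-log Bayes factors for trait 1 and trait 2,
                  aligned on the same SNPs.
    """
    joint = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum_exp(labf1), log_sum_exp(labf2), log_sum_exp(joint)
    log_h = [
        0.0,                                                      # H0
        math.log(p1) + s1,                                        # H1
        math.log(p2) + s2,                                        # H2
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12), # H3
        math.log(p12) + s12,                                      # H4
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]


# Two toy SNPs: the same SNP is strong for both traits, so H4 dominates.
print(coloc_posteriors([20.0, 0.0], [20.0, 0.0]))
# The strong SNP differs between traits, so H3 dominates.
print(coloc_posteriors([20.0, 0.0], [0.0, 20.0]))
```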

RESULTS

A total of 9,769,070 SNPs in the EA families and 16,415,214 SNPs in the AA families met the QC filtering criteria (described in Section 2). In the stratified association analysis, one SNP in gene GTF2IRD1 on chromosome 7 (rs13221023) exceeded the GWAS p value threshold in the EA families. In the AAs, one SNP (rs12041331) located in the PEAR1 gene met this GWAS threshold (Table 1A and Figure S1). The meta‐analysis of the 8,242,287 SNPs with a MAF of 1% or larger in both groups only yielded SNP rs12041331 in the PEAR1 gene (also identified in the stratified AA analysis) meeting the GWAS threshold (Table 1A and Figure 1A). The test statistics in the meta‐analysis and the stratified analyses were well‐calibrated, with genomic control parameters (Devlin & Roeder, 1999) of 1.011 in the meta‐analysis, and 1.014 and 1.012 in the EA and AA stratified analyses, respectively (Figure S2).
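The genomic control parameter quoted above is the ratio of the median observed association statistic to the median of a chi‐square distribution with one degree of freedom. A minimal Python sketch, assuming two‐sided p values from a normal test statistic as input (the function name is hypothetical):

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(p_values):
    """Genomic control parameter from two-sided p values, each in (0, 1]."""
    nd = NormalDist()
    # Convert each p value back to a 1-df chi-square statistic via the lower tail.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spread p values mimic a well-calibrated test.
uniform_ps = [(i + 0.5) / 1001 for i in range(1001)]
print(genomic_control_lambda(uniform_ps))
```

Values close to 1, such as those reported for the meta‐analysis and the stratified analyses, indicate little inflation of the test statistics.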

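Table 1

Loci identified using the standard genome‐wide significance level of 5 × 10⁻⁸ through the WGS‐based GWAS meta‐analysis (A), and the eQTL PLTs (B) and MKs based (C) permutation tests using the respective FWER permutation thresholds
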
(A) Loci identified through the WGS‐based GWAS meta‐analysis
SNP	Model	CHR	Position	MEA	MAA	p	Gene
rs12041331	META	1	156,899,922	0.09	0.35	2.05×10⁻¹⁰	PEAR1
rs12041331	AA	1	156,899,922	0.09	0.35	4.35×10⁻⁸	PEAR1
rs13221023	EA	7	74,528,803	0.04	0.07	2.40×10⁻⁸	GTF2IRD1

Note: Column names as follows. SNP: the locus rs number when available. Model: the model used to identify the locus (EA/AA stratified, or META analysis). CHR: chromosome of the identified locus. Position: genomic position of the locus identified. Gene: gene the locus resides in. If intergenic, the flanking genes are reported. MEA/MAA: minor allele frequencies of the EA and AA families. P: statistical significance (p‐value) from the hypothesis test of no association based on a standard Gaussian null distribution. eGEA/eGAA: gene for which the reported SNP is an eQTL in the EA and AA families. An italicized MAF in column MAA indicates that the reference allele was switched.

Figure 1

GWAS meta‐analysis results. (a) Manhattan plot of the GWAS for all 8,242,287 SNPs passing quality control. The dashed horizontal line is at p = 5 × 10⁻⁸, representing the standard GWAS cut‐off for significance. (b) Manhattan plot of the GWAS for the 229,674 eQTL in platelets. The dashed horizontal line is at 6.00 (p ≈ 1.0 × 10⁻⁶), representing the cut‐off for a 5% FWER derived using permutations. (c) Manhattan plot of the GWAS for the 55,088 eQTL in megakaryocytes. The dashed horizontal line is at 5.12 (p ≈ 7.6 × 10⁻⁶), representing the cut‐off for a 5% FWER derived using permutations. SNPs passing the respective significance threshold at the PEAR1 (chromosome 1) and ARHGEF3 (chromosome 3) loci are highlighted with a red background. eQTL, expression quantitative trait loci; FWER, family‐wise error rate; GWAS, genome‐wide association studies; SNP, single nucleotide polymorphism

In the eQTL analysis, a total of 16,641,225 SNP‐gene pairs were tested in the EA families and 20,101,156 pairs were tested in the AA families for PLTs, as previously described. Among those, 208,230 PLT eQTL SNP associations in the EA families met a false discovery rate of 5%, and 54,085 PLT eQTL SNP associations met the same threshold in the AA families. A combined total of 229,674 unique SNPs from the EA and AA platelet eQTL analyses were used for the permutation approach applied to the GWAS meta‐analysis results. The MK data had a total of 30,802,119 SNP‐gene pairs tested in the EA families and 34,673,581 in the AA families for eQTL analysis. A total of 50,255 MK eQTL SNP associations in the EA families and 9,046 in the AA families met a false discovery rate of 5%. A combined total of 55,088 unique MK eQTL SNPs from the EA and AA eQTL results were then used for the permutation approach applied to the meta‐analysis of the GWAS signals. In the GWAS meta‐analysis based on the 229,674 platelet‐identified eQTL, three SNPs met the PLT eQTL permutation FWER threshold of p ≈ 1.0 × 10⁻⁶ (−log10 p = 6.00) in two genes, PEAR1 on chromosome 1 and ARHGEF3 on chromosome 3 (Table 1B and Figure 1B). In the GWAS meta‐analysis based on the 55,088 MK‐identified eQTL, only the intronic SNP rs1354034 in the ARHGEF3 gene met the permutation threshold of p ≈ 7.6 × 10⁻⁶ (−log10 p = 5.12) (Table 1C and Figures 1C, S3, and S4). While PEAR1 has been firmly established as a gene modifying platelet aggregation in response to agonists (Johnson et al., 2010; Kim et al., 2013; Lewis & Ryan, 2013; Mathias et al., 2010; Qayyum et al., 2015), the exchange factor ARHGEF3 found in platelets has largely gone unnoticed in that particular role.
Associations of ARHGEF3, and in particular its intronic variant rs1354034, have been reported in the GWAS catalogue for many platelet and blood related phenotypes, such as platelet count, mean platelet volume, reticulocyte fraction of red cells, reticulocyte count, red blood cell count, blood protein levels, lymphocyte counts, hematocrit, hemoglobin concentration, mean corpuscular hemoglobin, and plateletcrit (https://www.ebi.ac.uk/gwas/). However, to our knowledge, ARHGEF3 has not been previously identified in a genome‐wide analysis as modifying platelet aggregation in response to agonists. The intronic ARHGEF3 SNP rs12485738, reported by Meisinger et al. (2009) as strongly associated with mean platelet volume, was considered by Johnson et al. (2010) as a platelet aggregation candidate SNP; its p value when tested for association in a meta‐analysis with response to lower ADP levels is given in table S5a in Johnson et al. (2010). When ARHGEF3 was considered as a candidate gene (table S5b in Johnson et al., 2010), no SNPs were significant after multiple comparisons correction, but low p values were reported for SNPs rs4455300 (ADP, p = 0.0006), rs9851853 (epinephrine, p = 0.0029) and rs11716680 (collagen, p = 0.016). Also noteworthy, another exchange factor (ARHGEF11) was highlighted as a gene within proximity (60 kb) of the PEAR1 peak SNP (Johnson et al., 2010, table 4).
A Bayesian colocalization analysis using the platelet aggregation phenotype and gene expressions strongly supported the notion of a single shared common genetic causal variant in the newly detected gene ARHGEF3. Meta‐analysis p values for the association of the 7598 SNPs within 1 Mb of rs1354034 with the platelet aggregation trait were considered, of which 4128 were PLT eQTL for ARHGEF3 gene expression, and 3809 were MK eQTL. The posterior probability of one common causal variant for association with the trait and ARHGEF3 gene expression (Hypothesis 4 as described in Giambartolomei et al., 2014) was 65.4% in the PLT and 99.8% in the MK. SNP rs1354034 had the strongest association with the phenotype (p = 7.55 × 10⁻⁷) and the 8th smallest PLT eQTL p value (p = 4.07 × 10⁻¹⁰), resulting in a posterior probability of 96.3% being the causal variant under the COLOC assumptions (Table 2 and Figure 2, PLT). However, because several SNPs had a stronger association with ARHGEF3 expression in the PLT than rs1354034 and the GWAS p value did not pass the traditional threshold of genome‐wide association, the posterior probabilities that the causal variant is only associated with gene expression (Hypothesis 2) or that two independent SNPs underlie the associations (Hypothesis 3) also have appreciable support from the observed data (posterior probabilities of 14.8% and 19.8%, respectively). Among the ARHGEF3 MK eQTL, on the other hand, rs1354034 also had the smallest eQTL p value, resulting in a posterior probability of virtually 100% being the causal variant (Table 2 and Figure 2, MK). A conditional analysis in this region supported the notion of a single independent variant affecting this platelet aggregation trait (Figure S5).

Table 2

Bayesian colocalization results for the PLT and MK ARHGEF3 eQTL

Bayesian colocalization results for PLT ARHGEF3 eQTL
PPH₀ = 0.000, PPH₁ = 0.000, PPH₂ = 0.148, PPH₃ = 0.198, PPH₄ = 0.654
SNP	CHR	Position	MEA	MAA	P/GWAS	P/eQTL	BF/G	BF/E	BF	PP
rs1354034	3	56,815,721	0.40	0.25	7.55×10⁻⁷	4.07×10⁻¹⁰	9.12	16.46	25.58	0.963
rs12488986	3	56,816,160	0.18	0.14	1.32×10⁻²	7.67×10⁻¹²	1.58	19.65	21.23	0.012
rs1039383	3	56,815,027	0.23	0.16	1.09×10⁻¹	2.48×10⁻¹²	0.27	20.87	21.14	0.011
rs1039384	3	56,815,161	0.23	0.18	1.68×10⁻¹	2.48×10⁻¹²	0.01	20.87	20.88	0.009
rs17288922	3	56,817,359	0.17	0.13	1.13×10⁻²	3.07×10⁻¹¹	1.67	18.36	20.03	0.004

Note: PPH0–PPH4: posterior probabilities for Hypotheses 0–4 as described in Section 2 and Giambartolomei et al. (2014). Column names as in Table 1, and as follows. P/GWAS: p value from WGS GWAS. P/eQTL: p value from eQTL analysis. Bayes factors as described in Giambartolomei et al. (2014). BF/G: log10 Bayes factor for the SNP‐phenotype association. BF/E: log10 Bayes factor for the SNP‐gene association. BF: log10 Bayes factor for the joint association of the SNP with phenotype and gene expression.

Abbreviations: BF, Bayes factor; eQTL, expression quantitative trait loci; GWAS, Genome‐wide association studies; MK, megakaryocytes; PLT, platelets; PP, posterior probability of colocalization.
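Under the single‐causal‐variant assumption, the per‐SNP posterior probability in the PP column is proportional to the SNP's joint Bayes factor, normalized over all SNPs considered in the region. The Python sketch below illustrates that normalization; the function name and the `base` parameter are hypothetical. Because Table 2 lists only the top five of the SNPs considered, the published PP values cannot be reproduced from the table alone.

```python
import math


def snp_posteriors(log_bfs, base=math.e):
    """Posterior probability that each SNP is the shared causal variant, given H4.

    log_bfs: joint log Bayes factors, one per SNP in the region.
    base: base of the logarithm. The table note describes the values as
          log10, so pass base=10 to follow it literally.
    Informal check, not from the source: exp(21.23 - 25.58) is about 0.013,
    close to the ratio 0.012 / 0.963 of the first two PP values in Table 2,
    which is why natural logs are the default in this sketch.
    """
    scale = math.log(base)
    scaled = [lb * scale for lb in log_bfs]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Toy values, for illustration only.
print(snp_posteriors([2.0, 1.0, 0.0]))
```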

Figure 2

Colocalization using meta‐analysis p values (dark grey) and eQTL p values for association with ARHGEF3 (light grey), separately for platelets (PLT) and megakaryocytes (MK) eQTL. For clarity of display, the x‐axis represents the index in the SNP set, not the genomic locations. The respective p values for SNP rs1354034 are highlighted with a red background. eQTL, expression quantitative trait loci; SNP, single nucleotide polymorphism


DISCUSSION

GWAS have successfully identified tens of thousands of SNPs associated with complex traits, including genetic variants that affect platelet function by modifying platelet parameters such as platelet aggregation, platelet count, and mean platelet volume, and by altering the expression of key platelet receptors. In general, SNPs that influence gene expression (eQTL) are significantly enriched among trait‐associated SNPs (Nicolae et al., 2010), and consequently, researchers have explored various ways of incorporating eQTL into GWAS. Using the ENCODE database, Nicolae et al. (2010) constructed a score quantifying the likelihood that a SNP has a function in regulating transcript levels. They concluded that annotating SNPs with a score reflecting the strength of evidence that a SNP is an eQTL can improve the ability to discover true associations. Gupta and Musunuru (2013) discussed the use of eQTL databases in the study of noncoding variants in cardiovascular and metabolic phenotypes, and reviewed successes in using eQTL to link variants with functional candidate genes. Zhu et al. (2016) proposed a new method called SMR that integrates summary‐level data from GWAS with expression data from eQTL to identify genes whose expression levels are associated with complex traits due to pleiotropy. The authors adopt a Mendelian randomization approach to estimate and test for the causative effect of an exposure variable on an outcome. J. Li et al. (2013) used eQTL weights as prior information in SNP‐based association tests to improve test power while maintaining control of the family‐wise error rate or false‐discovery rate. Some SNPs that were not significant without eQTL weighting became significant using eQTL‐weighted Bonferroni or Benjamini–Hochberg procedures. The authors concluded that using informative weights may improve power, and that little power is lost when uninformative weights are used. Saccone et al. (2010) developed an online prioritization tool (SPOT), which systematically combines multiple biological databases to prioritize SNPs using a genomic information network. SNPs are assigned a prioritization score based on pathway information, comparative genomics, a linkage scan, and results from other independent GWAS. Wu and Pan (2018) exploit the fact that complex traits are often affected by multiple genes in annotated gene pathways, and extend TWAS from a gene‐based to a pathway‐based analysis. Integrating KEGG and GO pathways with GWAS and eQTL information, the authors were able to identify several novel pathways associated with schizophrenia. Zeng et al. (2019) investigated the prevalence and role of secondary cis‐eQTLs regulating gene expression in peripheral blood in two large cohort studies. A colocalization analysis of eQTL signals with GWAS hits detected 1349 genes whose expression in peripheral blood was associated with a total of 591 phenotypes, with more than 10% of the colocalized signals due to nonprimary cis‐eQTLs. After conducting GWAS on 12 complex eye diseases or traits, Strunz et al. (2020) examined regulation of gene expression in healthy retina as a disease relevant tissue, and identified more than 400,000 significant eQTL variants regulating more than 3000 genes. The authors found that expression of 10 of those genes was regulated by significant eQTLs associated with multiple eye diseases or traits, providing evidence for their role in the etiology of these conditions.
These studies demonstrate that integrating eQTL information in GWAS can potentially improve power in highlighting causal genes. In our study we presented an approach to improve power to detect GWAS signals among eQTL by using permutation analyses to derive a genome‐wide significance threshold that is substantially less stringent than that of the standard Bonferroni procedure. In addition to improving power, focusing on eQTL is also more likely to yield functional variants.
We also stress the importance of using relevant target tissues for eQTL analyses, particularly for the study of platelet related phenotypes. We previously investigated the transcriptional profile of platelets and iPSC‐derived megakaryocytes, and compared those with peak‐associated SNP‐expressed gene pairs of 48 other tissue types from the GTEx catalogue (GTEx Consortium, 2020). One of our key findings was that the eQTLs we detected were largely unique to MKs and PLTs, with a sizeable fraction not seen among any of the 48 GTEx tissues (Kammers, Taub, Rodriguez, et al., 2021). The analysis strategy presented here shares some of the characteristics of transcriptome‐wide association studies (TWAS). For example, both TWAS and our approach exploit the fact that gene expression may be a molecular mediator between genotype and phenotype. However, there are also important distinctions. TWAS were initially motivated by the fact that the majority of SNPs in the GWAS catalogue (https://www.ebi.ac.uk/gwas/) were in noncoding regions of the human genome, rendering functionality unclear. Thus, TWAS were introduced as an approach to directly investigate associations between genetically regulated gene expression and the phenotype of interest (Gamazon et al., 2015; Gusev et al., 2016). In general, a TWAS is a two‐step procedure where genetically regulated gene expression is first assessed in a reference data set for which transcriptomic data are available, and a prediction model is delineated based on these findings. The prediction model is then used in the second step to impute gene expression for the GWAS cohort, and the imputed gene expressions are then correlated with the phenotype of interest.
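For contrast with the approach taken in this study, the two TWAS steps can be sketched for a single gene. This Python illustration is a deliberate simplification: it uses per‐SNP marginal least‐squares slopes as the prediction weights, whereas published TWAS methods fit penalized multi‐SNP models, and all function names and data are hypothetical.

```python
import math


def _center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def fit_weights(ref_genotypes, ref_expression):
    """Step 1: per-SNP marginal slopes of expression on dosage in the reference panel.

    ref_genotypes: list of per-SNP dosage lists, each aligned with ref_expression.
    """
    e = _center(ref_expression)
    weights = []
    for snp in ref_genotypes:
        g = _center(snp)
        sgg = sum(v * v for v in g)
        weights.append(sum(a * b for a, b in zip(g, e)) / sgg if sgg > 0 else 0.0)
    return weights


def impute_expression(genotypes, weights):
    """Step 2a: predicted expression for each person in the GWAS cohort."""
    n = len(genotypes[0])
    return [sum(w * snp[i] for w, snp in zip(weights, genotypes)) for i in range(n)]


def correlation(xs, ys):
    """Step 2b: correlate imputed expression with the phenotype."""
    x, y = _center(xs), _center(ys)
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y)
    )


# Reference panel with expression data (toy values).
w = fit_weights([[0, 1, 2, 1], [0, 0, 1, 2]], [0.0, 1.0, 2.0, 1.0])
# GWAS cohort with genotypes and phenotype but no expression data.
predicted = impute_expression([[0, 2, 1], [0, 0, 1]], w)
print(correlation(predicted, [0.1, 1.9, 1.2]))
```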
Since the introduction of TWAS, several other TWAS‐like methods have been put forward (Barbeira et al., 2018, 2019; Hu et al., 2019; Luningham et al., 2020; Pividori et al., 2020; Xu et al., 2017; Zhou et al., 2020) that differ in aspects such as the type of GWAS input and the statistical approaches used to generate the eQTLs underlying the predictors (see B. Li & Ritchie, 2021 for a review). A TWAS can be considered a multimarker association approach that allows for the identification of genes underlying the etiology of the phenotype, while our approach identifies single variants that have the potential to be causal. A TWAS is a more complicated procedure than our approach, and in particular the development of the prediction models used in Step 1 is an active area of research (B. Li & Ritchie, 2021). Our approach is, for the most part, simply based on a GWAS and an eQTL analysis, with the latter being used to select SNPs from the former. The association p values from the GWAS do not change in our approach; only the significance threshold, derived by the permutation test, does. In that sense, it could also be argued that both TWAS and our method improve power in part by reducing the multiple‐testing burden.
In addition to confirming the well‐known PEAR1 platelet aggregation locus, we also identified a novel platelet and megakaryocyte eQTL, rs1354034 (ARHGEF3), associated with aggregation to ADP after exposure to aspirin. The SNP rs1354034 falls within the protein‐coding gene ARHGEF3 (Rho guanine nucleotide exchange factor 3, RhoGEF3), which activates RhoGTPases and plays an important role in the regulation of cell morphology, cell aggregation, cytoskeletal rearrangements, and transcriptional activation. ARHGEF3 regulates the switch of RhoGTPase from the inactive GDP‐bound state to the active GTP‐bound state and is one of the most abundant GEFs found in the human megakaryocyte lineage and in platelets (Astle et al., 2016; Eicher et al., 2016).
ARHGEF3 has been shown in previous GWAS to be associated with platelet count and mean platelet volume (Gieger et al., 2011; J. Li et al., 2013; Lin et al., 2017; Read et al., 2019; Schick et al., 2016; Shameer et al., 2014). Silencing of ARHGEF3 has been shown to completely ablate erythropoiesis and thrombocyte formation in a zebrafish model (Serbanovic‐Canic et al., 2011). Serbanovic‐Canic et al. (2011) also reported that disruption of the ARHGEF3 target, RhoA, produced severe anemia, which was corrected by iron injection. Zou et al. (2017) reported that rs1354034, which is located in a DNase I hypersensitive region in human megakaryocytes, is an eQTL associated with ARHGEF3 expression level in human platelets. They also suggested that it may be the causal SNP that accounts for the variation observed in human platelet traits and ARHGEF3 expression. They further reported that in vitro human platelet activation assays revealed that rs1354034 is highly correlated with human platelet activation by ADP, and concluded that modulation of ARHGEF3 gene expression in humans by a promoter‐localized SNP may play a role in human megakaryocytes and platelets. Our Bayesian colocalization analysis showed compelling evidence for a functional role of this eQTL.

CONFLICTS OF INTEREST

The authors declare no conflict of interest.

65 in total

1. Heritability of platelet function in families with premature coronary artery disease.

Authors: P F Bray; R A Mathias; N Faraday; L R Yanek; M D Fallin; J E Herrera-Galeano; A F Wilson; L C Becker; D M Becker
Journal: J Thromb Haemost Date: 2007-08 Impact factor: 5.824

2. iGWAS: Integrative Genome-Wide Association Studies of Genetic and Genomic Data for Disease Susceptibility Using Mediation Analysis.

Authors: Yen-Tsung Huang; Liming Liang; Miriam F Moffatt; William O C M Cookson; Xihong Lin
Journal: Genet Epidemiol Date: 2015-05-22 Impact factor: 2.135

3. A Powerful Framework for Integrating eQTL and GWAS Summary Data.

Authors: Zhiyuan Xu; Chong Wu; Peng Wei; Wei Pan
Journal: Genetics Date: 2017-09-11 Impact factor: 4.562

4. Integrative approaches for large-scale transcriptome-wide association studies.

Authors: Alexander Gusev; Arthur Ko; Huwenbo Shi; Gaurav Bhatia; Wonil Chung; Brenda W J H Penninx; Rick Jansen; Eco J C de Geus; Dorret I Boomsma; Fred A Wright; Patrick F Sullivan; Elina Nikkola; Marcus Alvarez; Mete Civelek; Aldons J Lusis; Terho Lehtimäki; Emma Raitoharju; Mika Kähönen; Ilkka Seppälä; Olli T Raitakari; Johanna Kuusisto; Markku Laakso; Alkes L Price; Päivi Pajukanta; Bogdan Pasaniuc
Journal: Nat Genet Date: 2016-02-08 Impact factor: 38.330

5. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.

Authors: Mihaela Pertea; Daehwan Kim; Geo M Pertea; Jeffrey T Leek; Steven L Salzberg
Journal: Nat Protoc Date: 2016-08-11 Impact factor: 13.491

6. Genetics of gene expression and its effect on disease.

Authors: Valur Emilsson; Gudmar Thorleifsson; Bin Zhang; Amy S Leonardson; Florian Zink; Jun Zhu; Sonia Carlson; Agnar Helgason; G Bragi Walters; Steinunn Gunnarsdottir; Magali Mouy; Valgerdur Steinthorsdottir; Gudrun H Eiriksdottir; Gyda Bjornsdottir; Inga Reynisdottir; Daniel Gudbjartsson; Anna Helgadottir; Aslaug Jonasdottir; Adalbjorg Jonasdottir; Unnur Styrkarsdottir; Solveig Gretarsdottir; Kristinn P Magnusson; Hreinn Stefansson; Ragnheidur Fossdal; Kristleifur Kristjansson; Hjortur G Gislason; Tryggvi Stefansson; Bjorn G Leifsson; Unnur Thorsteinsdottir; John R Lamb; Jeffrey R Gulcher; Marc L Reitman; Augustine Kong; Eric E Schadt; Kari Stefansson
Journal: Nature Date: 2008-03-16 Impact factor: 49.962

7. 2SNP heritability and effects of genetic variants for neutrophil-to-lymphocyte and platelet-to-lymphocyte ratio.

Authors: Bochao Danae Lin; Elena Carnero-Montoro; Jordana T Bell; Dorret I Boomsma; Eco J de Geus; Rick Jansen; Cornelis Kluft; Massimo Mangino; Brenda Penninx; Tim D Spector; Gonneke Willemsen; Jouke-Jan Hottenga
Journal: J Hum Genet Date: 2017-08-03 Impact factor: 3.172

8. Integrity of Induced Pluripotent Stem Cell (iPSC) Derived Megakaryocytes as Assessed by Genetic and Transcriptomic Analysis.

Authors: Kai Kammers; Margaret A Taub; Ingo Ruczinski; Joshua Martin; Lisa R Yanek; Alyssa Frazee; Yongxing Gao; Dixie Hoyle; Nauder Faraday; Diane M Becker; Linzhao Cheng; Zack Z Wang; Jeff T Leek; Lewis C Becker; Rasika A Mathias
Journal: PLoS One Date: 2017-01-20 Impact factor: 3.240

9. Mapping Novel Pathways in Cardiovascular Disease Using eQTL Data: The Past, Present, and Future of Gene Expression Analysis.

Authors: Rajat M Gupta; Kiran Musunuru
Journal: Front Genet Date: 2013-05-31 Impact factor: 4.599

2 in total

1. Secondary analyses for genome-wide association studies using expression quantitative trait loci.

Authors: Julius S Ngwa; Lisa R Yanek; Kai Kammers; Kanika Kanchan; Margaret A Taub; Robert B Scharpf; Nauder Faraday; Lewis C Becker; Rasika A Mathias; Ingo Ruczinski
Journal: Genet Epidemiol Date: 2022-03-21 Impact factor: 2.344

Review 2. Benchmarking statistical methods for analyzing parent-child dyads in genetic association studies.

Authors: Debashree Ray; Candelaria Vergara; Margaret A Taub; Genevieve Wojcik; Christine Ladd-Acosta; Terri H Beaty; Priya Duggal
Journal: Genet Epidemiol Date: 2022-04-22 Impact factor: 2.344
