Literature DB >> 30925861

Identification of trans-eQTLs using mediation analysis with multiple mediators.

Nayang Shan1,2, Zuoheng Wang3, Lin Hou4,5,6.   

Abstract

BACKGROUND: Mapping expression quantitative trait loci (eQTLs) has provided insight into gene regulation. Compared to cis-eQTLs, the regulatory mechanisms of trans-eQTLs are less known. Previous studies suggest that trans-eQTLs may regulate expression of remote genes by altering the expression of nearby genes. Trans-association has been studied in the mediation analysis with a single mediator. However, prior applications with one mediator are prone to model misspecification due to correlations between genes. Motivated from the observation that trans-eQTLs are more likely to associate with more than one cis-gene than randomly selected SNPs in the GTEx dataset, we developed a computational method to identify trans-eQTLs that are mediated by multiple mediators.
RESULTS: We proposed two hypothesis tests for testing the total mediation effect (TME) and the component-wise mediation effects (CME), respectively. We demonstrated in simulation studies that the type I error rates were controlled in both tests despite model misspecification. The TME test was more powerful than the CME test when the two mediation effects are in the same direction, while the CME test was more powerful than the TME test when the two mediation effects are in opposite direction. Multiple mediator analysis had increased power to detect mediated trans-eQTLs, especially in large samples. In the HapMap3 data, we identified 11 mediated trans-eQTLs that were not detected by the single mediator analysis in the combined samples of African populations. Moreover, the mediated trans-eQTLs in the HapMap3 samples are more likely to be trait-associated SNPs. In terms of computation, although there is no limit in the number of mediators in our model, analysis takes more time when adding additional mediators. In the analysis of the HapMap3 samples, we included at most 5 cis-gene mediators. Majority of the trios we considered have one or two mediators.
CONCLUSIONS: Trans-eQTLs are more likely to associate with multiple cis-genes than randomly selected SNPs. Mediation analysis with multiple mediators improves power of identification of mediated trans-eQTLs, especially in large samples.

Entities:  

Keywords:  Mediation analysis; Multiple mediators; Trans-eQTL

Mesh:

Year:  2019        PMID: 30925861      PMCID: PMC6440281          DOI: 10.1186/s12859-019-2651-6

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Expression quantitative trait loci (eQTLs) are genetic variants that influence expression levels of mRNA transcripts. Cis-eQTLs commonly refer to genetic variations that act on local genes (Fig. 1a), and trans-eQTLs are those that act on distant genes and genes residing on different chromosomes (Fig. 1b). Identification of eQTLs can help advance our understanding of genetics and regulatory mechanisms of gene expression in various organisms [1]. Consistent findings suggest that many genes are regulated by nearby single nucleotide polymorphisms (SNPs), and the identified cis-eQTLs are typically close to transcription start sites. In contrast to cis-eQTLs, trans-eQTL identification is much more challenging because a greater number of SNP-gene pairs are tested for trans-association. In order to achieve the same power, analysis of trans-eQTLs requires a much larger sample size and/or effect than that in the cis-eQTL analysis. However, trans-eQTLs tend to have weaker effects than cis-eQTLs [2]. Several methods have been developed to improve trans-eQTL detection, such as reducing the multiple-testing burden based on pairwise partial correlations from the gene expression data to increase power [3], and constructing or selecting variables to control for unmeasured confounders that may lead to spurious association [4-6].
Fig. 1

Graphical representation of eQTLs. a cis-eQTL, b trans-eQTL, c mediated trans-eQTL with a single cis-mediator, and d mediated trans-eQTL with multiple cis-mediators

Graphical representation of eQTLs. a cis-eQTL, b trans-eQTL, c mediated trans-eQTL with a single cis-mediator, and d mediated trans-eQTL with multiple cis-mediators Moreover, the biological mechanisms underlying trans-eQTLs are less understood. Previous studies have shown that trans-eQTLs are more likely to be cis-eQTLs than randomly selected SNPs in the human genome [2, 7], suggesting that trans-eQTLs may regulate expression of remote genes by altering the expression of nearby genes. Recently, mediation analysis has become a popular tool to explore trans-association mediated by cis-regulators [2, 6, 8]. These studies used mediation test assuming a single mediator (Fig. 1c). However, gene expression levels are not independent due to the complex regulatory mechanisms. Correlation between genes may violate the assumptions that are required to identify mediation effects if other cis-genes also affect the trans-gene in study. Mediation analysis with multiple mediators has been applied in genomics [9-11], epigenetics [12], and epidemiological studies [13]. Mediation with two mediators was used in [9, 10, 13] and mediation with high dimensional mediators was implemented in [11, 12]. In this paper, we showed that the assumptions in the multivariate extension of mediation analysis are more likely to be satisfied than that in the single-mediator model (Additional file 1). We also found that trans-eQTLs are more likely to associate with more than one cis-gene than randomly selected SNPs in various tissues from the GTEx database. Then, we developed a computational method to identify trans-eQTLs that are mediated by multiple mediators (Fig. 1d). In simulation studies, we demonstrated that the multiple mediator approach increases the statistical power of identification of mediated trans-eQTLs. The improvement is more pronounced in large sample size. We applied the method to the HapMap3 dataset and identified 11 mediated trans-eQTLs that were not detected by the single mediator analysis in the combined samples of African populations. Lastly, we illustrated that mediated trans-eQTLs are more likely to be trait-associated SNPs in genome-wide association studies (GWAS). These findings advance our knowledge of gene regulation.

Methods

Dataset description

Genotype and gene expression data were retrieved from six HapMap3 populations, LWK (Luhya in Webuye, Kenya), MKK (Maasai in Kinyawa, Kenya), YRI (Yoruba in Ibadan, Nigeria), CEU (Utah residents with Northern and Western European ancestry from the CEPH collection), CHB (Han Chinese in Beijing, China), and JPT (Japanese in Tokyo, Japan) [14]. There are 83, 135, 107, 107, 79, and 81 individuals in each population, respectively. The greater genetic diversity in African populations (LWK, MKK, YRI) tends to increase the power of eQTL detection [15]. Therefore, we performed analyses in the three populations separately and in the combined samples. Due to the sample size below 100 in CHB and JPT, we combined the two populations into a sample of Asian populations. Processed expression data profiling on the Illumina Human-6 v2 Expression BeadChip array for the HapMap3 samples were downloaded from ArrayExpress (accession numbers E-MTAB-264 and E-MTAB-198). We also downloaded the V6p release from the GTEx database, which provides a complete list of cis- and trans-eQTLs identified in the GTEx study [16].

Genotype data processing

In the quality control step, a series of filters were applied to remove samples and SNPs with poor quality in each population. We removed samples with a call rate less than 0.97; next we retained autosomal SNPs with a missing rate less than 0.08 and minor allele frequency (MAF) greater than 0.10; finally, SNPs that failed the Hardy-Weinberg test (p-value <10-5) were removed. We then converted the SNP coordinates according to the human reference genome hg38. In addition, we found that some SNPs were in complete Linkage Disequilibrium (LD) with each other or mapped to identical genome positions. For such cases, we randomly selected one SNP to be included in the analysis. The number of individuals and SNPs before and after quality control were listed in Table 1. In total, there are 740,158 SNPs retained in the combined samples of African populations and 540,684 SNPs in the combined samples of Asian populations.
Table 1

The number of individuals and SNPs before and after quality control in the HapMap3 data

Sample Size# SNPs before quality control# SNPs after quality control
LWK831533540953834
MKK1351541375989807
YRI1071505108943161
LWK+MKK+YRI325NAa740158
CEU1071416121787357
CHB791332120675811
JPT811300764643419
CHB+JPT160NA540684

aNA: data is not available

The number of individuals and SNPs before and after quality control in the HapMap3 data aNA: data is not available

Gene expression data processing

There are 21,800 probes in the microarray gene expression in the HapMap3 samples. Among them, 20,439 probes were mapped to the reference genome. We then removed probes that were mapped to multiple genes or non-autosomes, resulting in 19,832 probes corresponding to 19,643 unique genes. We further removed probes with low variance or low intensity and performed quantile normalization to reduce inter-individual variation [17]. Mediation analysis was applied to the probe level data, mainly because multiple probes in a gene represent different isoforms of this gene and merging them may lose information. More importantly, probes mapped to the same gene were weakly correlated in the HapMap3 data.

Population stratification and confounders in gene expression data

In the single population analysis (LWK, MKK, YRI, CEU), we adopted the strategy of [18] to correct for population admixture in LWK and MKK. We used the EIGENSTRAT program [19] to select the top 10 principal components (PCs) generated from the SNP genotype data as covariates. In the combined samples of African populations and Asian populations, 20 PCs from the genotype data were included in the analysis. To adjust for batch effects and unmeasured confounders in the gene expression datasets, we used the probabilistic estimation of expression residuals (PEER) method [20]. Following the GTEx analysis [21], the number of factors for PEER was determined by the sample size. We included 15 factors for datasets with less than 150 samples, 30 factors for datasets with sample size between 150 and 250, and 35 factors for datasets with more than 250 samples. Gender was also included as a covariate in all analyses.

eQTL analysis

We conducted genome-wide eQTL analysis using the R package, Matrix eQTL [22]. SNPs and probes within 1 Mb were tested for cis-association. All inter-chromosomal SNP-probe pairs as well as intra-chromosomal SNP-probe pairs that are more than 1 Mb apart were tested for trans-association.

Enrichment analysis

The motivation of our work was based on the observation that many trans-eQTLs are also identified as cis-eQTLs and they are often associated with more than one cis-gene in the GTEx database. In order to test whether the association with multiple cis-genes is over-represented in trans-eQTLs, we compared the proportion of trans-eQTLs that are associated with more than one cis-gene with that in the human genome. We considered the trans-eQTLs reported in the GTEx V6p dataset and those identified in the HapMap3 dataset. Permutation tests were used to assess significance. To elaborate, for the trans-eQTLs reported in the GTEx V6p dataset, we randomly sampled the same number of SNPs with matched MAF from the 1000 Genomes Project [23] and calculated the proportion of SNPs that are associated with multiple cis-genes. The empirical p-value was obtained by resampling 1000 times. The same test procedure was applied to the trans-eQTLs identified in the HapMap3 dataset. To understand the role of mediated trans-eQTLs in disease association, we performed Fisher’s exact test to assess the enrichment of trait-associated SNPs in the trans-eQTLs identified by our method. The trait-associated SNPs were obtained from the NHGRI GWAS catalog [24].

Mediation analysis

To identify trans-eQTLs that are mediated by one or more cis-genes, we first selected candidate trios, composed of SNP, one or multiple cis-genes, and trans-gene. The trios were selected based on the following criteria. First, trans SNP-gene pairs were selected if their p-value is less than 10-6. The p-value cutoff was chosen to reduce the multiple-testing burden [6]. Second, cis-genes that are associated with the SNPs identified from the first step at a genome-wide false discovery rate (FDR) less than 0.05 were selected as candidate cis-mediators. In all the analyses described below, we assume gene expression data have been normalized and transformed so that the expression values approximately follow a normal distribution. In mediation analysis with a single mediator, we followed the test procedure in [25]. The bootstrap p-value was used to assess significance when testing the single mediation effect (SME). In mediation analysis with multiple mediators, we considered the following model. For the i subject, let Y be the expression level of a trans-gene, X be the SNP genotype coded by the number of minor alleles,  = (M, ⋯⋯, M)be the expression levels of the p cis-genes,  = (C, ⋯⋯, C)be the q covariates. The mediation model is stated below: where is the effect of the p cis-genes on the trans-gene adjusting for the SNP and covariates,  = (α, ⋯⋯, α) is the effect of the SNP on the p cis-genes adjusting for covariates. and are measurement errors on gene expression. Here we assume , , and and are independent, but we allow dependence among cis-genes, i.e., the off-diagonal elements in the covariance matrix Σ can be non-zero [11]. Denote the total mediation effect (TME) as and the component-wise mediation effects (CME) as  = (δ1, ⋯⋯, δ), where [11]. In the following, we focus on the hypothesis tests of TME and CME: where (2) consists of a broader class of null than (3). For example, when δ ’s are nonzero in different directions and sum to 0, the TME is zero while the CME is not. Thus, the CME test is of particular interest in the presence of the cancellation effect, which is evident in the HapMap3 dataset. That is, if a SNP has a positive mediation effect through one cis-gene and a negative mediation effect through another cis-gene, the CME test can be more powerful than the TME test, as demonstrated in the simulation studies. Conventional multivariate tests for CME, such as likelihood ratio test, have limited power when there are a large number of mediators [26]. In our problem, we were less concerned because there are 1 or 2 cis-mediators in majority of the trios (see results in Additional file 2, Additional file 3, Additional file 4, Additional file 5, Additional file 6 and Additional file 7). We used the bootstrap method to assess significance. For comparison, we also tested the SME for each mediator in the trios that have multiple mediators, and the mediation effect was considered to be significant if at least one of the SME tests is significant.

Simulation setup

We conducted simulation studies to evaluate the impact of model misspecification on type I error and statistical power. In detail, we considered three types of model misspecification: Scenario I, the true model has only one mediator while the analysis includes the true mediator and another irrelevant variable as the mediators; Scenario II, the true model has two mediators and the mediation effects are in the same direction; Scenario III, the true model has two mediators and the mediation effects are in opposite direction. In all three scenarios, the performance of the TME, CME, and SME tests are evaluated and compared. We considered sample size of 100 and 300 to mimic the sample size in the HapMap3 single population analysis and combined analysis. Scenario I: The MAF of the SNP is set to 0.3. For the cis-regulatory effect in model (1), α varies from 0.2 to 1, α is fixed at 0.6, and. We assume an exchangeable covariance structure for  with the variance being 1 and the correlation coefficient being 0.2, and ε follows the standard normal distribution. We set in the type I error experiments and in the power evaluation. The parameters are chosen to mimic the effects estimated in the HapMap3 dataset. Scenario II: We set and 0.1 to evaluate the type I error and the power respectively. The other parameters are set the same as in Scenario I. Scenario III: We set and −0.1 to evaluate the type I error and the power respectively. The other parameters are set the same as in Scenario I.

Results

Trans-eQTLs are more likely to associate with multiple cis-genes

Previous studies showed that trans-eQTLs are more likely to associate with cis-genes [2, 7], which lays the foundation for the employment of mediation analysis in trans-eQTL studies. To justify multiple mediators, we hypothesized that trans-eQTLs tend to associate with more than one cis-gene, and validated this hypothesis in the GTEx dataset. In 14 out of the 22 tissues available in the GTEx database, trans-eQTLs were found to be significantly associated with two or more cis-genes, and the sample sizes are all greater than 100 (Table 2). In the remaining 8 tissues, sample size is less than 100 in 4 tissues, and no more than 3 trans-associations were observed in 5 tissues. Consistent with the GTEx dataset, we also observed an enrichment of multiple cis-genes in trans-eQTLs in MKK, YRI, CEU, and the combined samples of African populations and Asian populations in the HapMap3 dataset (Table 3). The only exception is the LWK population, possibly because the power of identifying cis- and trans-eQTLs is limited at the sample size of 83. Thus, multiple mediators are prevalent among trans-eQTLs. In the upcoming sections, we developed and evaluated statistical tests to identify trans-eQTLs in a multiple-mediator setup, and then applied the method in the HapMap3 dataset.
Table 2

Enrichment results in different tissues in the GTEx database

TissueSample sizeCis-association (FDR<0.05)Trans-association (FDR<0.1)Trans-associationaTrans-associationbEmpirical p-value
Adipose subcutaneous29812828414510100.009
Adrenal gland1263960981001
Artery aorta1978537942882243<0.001
Artery tibial256121070912310.156
Brain hypothalamus811504152001
Brain nucleus accumbens (basal ganglia)932449292001
Brain putamen (basal ganglia)8218324011001
Cells transformed fibroblasts272128334065819376<0.001
Colon transverse16958185418060.006
Esophagus mucosa2411089061980122145<0.001
Esophagus muscularis218997653155100.001
Heart left ventricle1906052533301
Lung278106886098080.041
Muscle skeletal361110053259035<0.001
Nerve tibial2561454889301010.288
Pancreas1495156652830244<0.001
Prostate871779941001
Skin not sun-exposed (suprapubic)19672286811060.004
Skin sun-exposed (lower leg)3021306762641325<0.001
Testis157112172720334680.013
Thyroid278155166821202301390<0.001
Whole blood338103623935224<0.001

aTrans-association in which the trans-eQTL is also associated with 1 cis-gene

bTrans-association in which the trans-eQTL is also associated with 2 or more cis-genes

Table 3

Enrichment results in the HapMap3 data

PopulationSample sizeCis-association (FDR<0.05)Trans-association (FDR<0.1)Trans-associationaTrans-associationbEmpirical p-value
LWK8368387101
MKK135178894663<0.001
YRI10718239511810<0.001
LWK+MKK+YRI325564371923564<0.001
CEU107265062101860<0.001
CHB+JPT160429531355629<0.001

aTrans-association in which the trans-eQTL is also associated with 1 cis-gene

bTrans-association in which the trans-eQTL is also associated with 2 or more cis-genes

Enrichment results in different tissues in the GTEx database aTrans-association in which the trans-eQTL is also associated with 1 cis-gene bTrans-association in which the trans-eQTL is also associated with 2 or more cis-genes Enrichment results in the HapMap3 data aTrans-association in which the trans-eQTL is also associated with 1 cis-gene bTrans-association in which the trans-eQTL is also associated with 2 or more cis-genes

Simulation studies

In simulations, we studied the effect of model misspecification in three scenarios of trans-eQTL identification (see Simulation setup in Methods) from two perspectives, the type I error and the statistical power, and we compared three tests, TME, CME, and SME. In Scenario I, the type I error rates did not differ significantly from the nominal level of 0.05 even though a second mediator was falsely included in the analysis (Fig. 2a), and the results were consistent in all three tests we considered. In terms of power, as we expected, the SME test (true model) achieved the highest power, while the TME and CME tests had reduced power due to falsely including a cis-gene that does not mediate the trans-association (Fig. 3a). However, the power difference between SME and TME quickly diminishes as the mediated effect of the true mediator increases. In the mediated trans-eQTL problem, we pre-select trios for mediation test, and there is no guarantee that false mediators are excluded at this step. However, as shown in simulations, the type I error was under control, at the expense of power loss.
Fig. 2

Empirical type I error of the TME, CME, and SME tests, based on 1,000 simulation replicates, α=0.05. a type I error in Scenario I, and b type I error in Scenario II/III. The two horizontal grey dashed lines are the 95% confidence interval (0.0365-0.0635)

Fig. 3

Empirical power of the TME, CME, and SME tests, based on 1,000 simulation replicates, α=0.05. a power in Scenario I, b power in Scenario II, and c power in Scenario III

Empirical type I error of the TME, CME, and SME tests, based on 1,000 simulation replicates, α=0.05. a type I error in Scenario I, and b type I error in Scenario II/III. The two horizontal grey dashed lines are the 95% confidence interval (0.0365-0.0635) Empirical power of the TME, CME, and SME tests, based on 1,000 simulation replicates, α=0.05. a power in Scenario I, b power in Scenario II, and c power in Scenario III In Scenarios II and III, the null hypotheses are identical for each test respectively, thus we presented the type I error results in one graph (Fig. 2b). We can see that the type I error was under control when either one or both mediators are included in the model. In terms of power, the TME test was more powerful than the CME test in Scenario II when the two mediation effects are in the same direction. In contrast, the TME test was less powerful than the CME test in Scenario III when the two mediation effects are in opposite direction. The SME test lost power due to leaving out one of the two cis-mediators, and its power fell between that of TME and CME (Fig. 3b, c). It is noteworthy that for the SME test, we actually performed two separate tests for each of the mediators, and the rejection of either one leads to the final rejection. The results were consistent with [11], where similar directionality effect was reported. The power difference of the three tests increases as the sample size increases. When the mediation effects were in different direction, the power of the TME test declined until α reached 0.6 when the two mediation effects were cancelled. After that, the power of the TME test rose with the value of α, but was still inferior to that of CME and SME (Fig. 3c). In summary, the single mediator model loses power when multiple mediators are present, and the optimal choice of the hypothesis test depends on the unknown directionality of the mediation pathways.

Identification of mediated trans-eQTLs: application to the HapMap3 dataset

We applied the mediation tests to LWK, MKK, YRI, CEU, and the combined samples of African populations and Asian populations in the HapMap3 dataset. In each population, mediated trans-eQTLs with p-values less than 0.05 are shown in Table 4 (more details in Additional file 2, Additional file 3, Additional file 4, Additional file 5, Additional file 6 and Additional file 7). The three tests gave similar results in the single population analysis, perhaps due to the small sample size. In the combined samples of African populations, 291 (24.3%) trans-eQTLs were associated with two or more cis-genes. Among the 248 trans-eQTLs associated with two cis-genes, 70 trios were identified by both the TME and CME tests, 13 trios in which the estimated mediation effects were in the same direction were identified by the TME test but not the CME test, and 17 trios in which the estimated mediation effects were in opposite direction were identified by the CME test but not the TME test. All the 89 trios detected by the SME test were also identified by either the TME or CME test. In total, we identified 11 mediated trans-eQTLs that were not detected by the single mediator analysis (Table 5). In the Asian populations, 254 (26.1%) trans-eQTLs were associated with two or more cis-genes. Among the 195 trans-eQTLs associated with two cis-genes, 33 trios were identified by both the TME and CME tests, 12 trios in which the estimated mediation effects were in the same direction were identified by the TME test but not the CME test, 2 trios were identified by the CME test but not the TME test, and 4 trios in which the estimated mediation effects were in opposite direction were identified by the CME test but not the TME test. All the 45 trios detected by the SME test were also identified by either the TME or CME test. There are 6 mediated trans-eQTLs that were not detected by the single mediator analysis. Similar results were obtained when the trans-eQTL p-value threshold was set at 10-7 (data not shown).
Table 4

Mediated trans-eQTLs with p-value <0.05 in the HapMap3 data

PopulationSample Size# cis-mediators# trios tested# trios with p-value<0.05
TMECMETME+CMESME
LWK831139NAaNANA8
2265354
MKK1351426NANANA54
2324667
420000
YRI1071463NANANA43
24710111111
385555
LWK+MKK+YRI3251905NANANA77
2248838710089
3356111111
471222
511111
CEU1071527NANANA60
221557617573
3141332
CHB+JPT1601721NANANA64
219545395145
3414343
4131111
550000

aNA: the tests are not applicable

Table 5

Mediated trans-eQTLs that were detected by the multiple mediator analysis but not detected by the single mediator analysis in the combined samples of African populations

SNPChr (SNP)PositionCis-gene1Cis-gene2Chr (trans-gene)Trans-gene
rs2024679629259340 ZKSCAN3 PGBD1 17 NCOR1
rs3117327629271373 ZKSCAN3 PGBD1 17 NCOR1
rs3135392632441465 HLA-DRB5 HLA-DRB1 4 RPL34
rs2239804632443746 HLA-DRB5 HLA-DRB1 4 RPL34
rs9270623632597554 HLA-DRB5 HLA-DRB1 4 RPL34
rs642093632614298 HLA-DRB5 HLA-DRB1 4 RPL34
rs2097431632623056 HLA-DRB5 HLA-DRB1 12 ATP5MFP5
rs9272105632632222 HLA-DRB5 HLA-DRB1 4 RPL34
rs109876429127411687 SLC2A8 ZNF79 17 RPL12P38
rs10511793926924623 CAAP1 IFT74 7 BRI3
rs28351872135967194 SETD4 CBR1 3 PCOLCE2
Mediated trans-eQTLs with p-value <0.05 in the HapMap3 data aNA: the tests are not applicable Mediated trans-eQTLs that were detected by the multiple mediator analysis but not detected by the single mediator analysis in the combined samples of African populations

Replication of trans-eQTLs and mediated trans-eQTLs

We demonstrated the replication of trans-eQTLs from LWK, MKK, YRI, and the combined samples of African populations. When the FDR was controlled at 0.1, the trans-eQTLs identified in LWK, MKK, YRI, and the combined samples have a large overlap (Additional file 8). Among the 7 trans-eQTLs identified in LWK, all of them were also identified in another population or the combined samples. Among the 46 trans-eQTLs identified in MKK, 23 of them were also identified in another population or the combined samples. Among the 51 trans-eQTLs identified in YRI, 30 of them were also identified in another population or the combined samples. Additionally, we compared the results with that from a previous study in which the FDR of trans-eQTLs was set at 0.05 [2]. There were 2, 5, 5, and 20 trans-eQTLs identified by our method in LWK, MKK, YRI, and the combined samples respectively that were previously reported (more details in Additional file 9). The relatively low rates of replication with the previous study may be explained by genetic and environmental differences between populations [2]. Next, we evaluated the replication of mediated trans-eQTLs across populations (Additional file 10). 7 of 13 mediated trans-eQTLs identified in LWK were also identified in another population or the combined samples. 13 of 61 mediated trans-eQTLs identified in MKK were also identified in another population or the combined samples. 10 of 59 mediated trans-eQTLs identified in YRI were also identified in another population or the combined samples. For those trans-eQTLs that have inconsistent mediation across populations, it may be due to different gene regulatory mechanisms between populations [27]. Lastly, we observed that trait-associated SNPs are enriched in the mediated trans-eQTLs identified in the combined samples of African populations and Asian populations (Table 6).
Table 6

Enrichment results of trait-associated SNPs in the mediated trans-eQTLs identified in the HapMap3 data

PopulationMediated trans-eQTLTrait associationEnrichment p-value
YesNo
LWKYes2110.013
No13180940641
MKKYes3560.055
No14365975383
YRIYes2530.172
No12836930270
LWK+MKK+YRIYes181551.484 × 10−10
No10901729084
CEUYes24106<2.2 × 10−16
No13403773824
CHB+JPTYes15981.019 × 10−9
No9144531427
Enrichment results of trait-associated SNPs in the mediated trans-eQTLs identified in the HapMap3 data

Examples of mediated trans-eQTLs

The identified trans-eQTLs that are mediated by multiple mediators may bring new biological insight in gene regulation. For example, the RPL34 gene on chromosome 4 was found to be trans-associated with 5 SNPs on chromosome 6 through the mediation of HLA-DRB5 and HLA-DRB1 (Table 5). RPL34 was previously reported to be trans-associated with the SNP rs2395185 in human monocytes [28], and the association was found unique to TLR4 activation, which plays a key role in innate immunity [29]. However, the biological mechanism underlying this trans-association is unknown. Our study identified the mediated trans-association of RPL34 and the SNP rs2239804 which is in LD with rs2395185 (r2 = 0.53), suggesting a mediating pathway of the previously reported trans-regulation of RPL34 (Fig. 4). The SNP rs2395185 and the two cis-mediators, HLA-DRB5 and HLA-DRB1, were reported to be susceptible to ulcerative colitis [30], and the two HLA genes were also identified in the rheumatoid arthritis GWAS [31]. The dysfunction of innate immunity is critically important in the pathogenesis of ulcerative colitis [32] and rheumatoid arthritis [33]. Thus, the identified mediated trans-eQTLs not only suggest a biological mechanism for the trans-association of rs2239804 and RPL34, but also suggest a role of the mediated pathway in the disease etiology of ulcerative colitis and rheumatoid arthritis.
Fig. 4

Mediation diagram of the trans-association between rs2239804 and RPL34

Mediation diagram of the trans-association between rs2239804 and RPL34

Discussion

eQTL studies have shed enormous light on gene regulatory mechanisms. Significant progress has been made to integrate eQTL information with genome-wide association signals to explain SNP-phenotype associations and prioritize genes and variants for functional studies [34-36]. The ongoing efforts such as GTEx and the HapMap Project have greatly expanded current knowledge of eQTLs. However, the identification and interpretation of trans-eQTLs remain a challenging yet important topic. In this work, we developed a computational method to identify trans-eQTLs that are mediated by multiple mediators, and demonstrated its superiority to the single mediator test in mediation analysis. Previous studies considered the identification of cis-transcripts that mediate the effects of trans-eQTLs on distant genes in a single-mediator setting [2, 6, 8], and may be subject to potential model misspecification. One innovative aspect of our work is to employ the multiple-mediator analysis to identify mediated trans-eQTLs. We observed that the associations of trans-eQTLs with more than one cis-gene are prevalent in the GTEx and HapMap3 datasets. Thus, mediation analysis allowing for multiple mediators would be less sensitive to model misspecification, and as a result improve the statistical power of the tests. Applied to the HapMap3 data, our approach allowing for multiple mediators identified 11 mediated trans-eQTLs that were not detected in the single mediator analysis. There are several caveats in our work. First, unmeasured confounders may not be fully accounted for in the mediation analysis due to the biological complexity in gene regulatory networks. The influence of potential confounders was further evaluated in the single-mediator setting [37]. Sensitivity analysis in the mediation with multiple mediators will be investigated in future studies. Second, we cannot make causal claims based on the detected mediation effects because the observed mediations simply explain trans-associations but not establish causal relationships. Third, the selection of cis-gene mediators is completely data-driven in the current study. It would be of great interest to integrate the knowledge of gene networks into the mediation framework.

Conclusions

We implemented a multiple-mediator analysis approach to identify mediated trans-eQTLs. In simulation studies, we illustrated that our method improves the statistical power of identification of mediated trans-eQTLs compared to the single mediator analysis. Furthermore, we identified 11 mediated trans-eQTLs that were not detected by the single mediator analysis in the HapMap3 data. Assumptions in mediation analysis. This document explaining that the assumptions in the multivariate extension of mediation analysis are more likely to be satisfied than that in the single-mediator model. (DOCX 14 kb) Mediated trans-eQTLs in LWK. (XLS 64 kb) Mediated trans-eQTLs in MKK. (XLS 73 kb) Mediated trans-eQTLs in YRI. (XLS 71 kb) Mediated trans-eQTLs in the combined samples of African populations. (XLS 106 kb) Mediated trans-eQTLs in CEU. (XLS 94 kb) Mediated trans-eQTLs in the combined samples of Asian populations. (XLS 91 kb) Venn diagram of trans-eQTLs in African populations. (PNG 81 kb) Overlap of trans-eQTLs with a previous study [2]. (XLS 52 kb) Venn diagram of mediated trans-eQTLs in African populations. (PNG 82 kb)
  11 in total

1.  Joint eQTL mapping and Inference of Gene Regulatory Network Improves Power of Detecting both cis- and trans-eQTLs.

Authors:  Xin Zhou; Xiaodong Cai
Journal:  Bioinformatics       Date:  2021-09-06       Impact factor: 6.931

2.  DeCompress: tissue compartment deconvolution of targeted mRNA expression panels using compressed sensing.

Authors:  Arjun Bhattacharya; Alina M Hamilton; Melissa A Troester; Michael I Love
Journal:  Nucleic Acids Res       Date:  2021-05-07       Impact factor: 16.971

3.  DNA variants affecting the expression of numerous genes in trans have diverse mechanisms of action and evolutionary histories.

Authors:  Sheila Lutz; Christian Brion; Margaret Kliebhan; Frank W Albert
Journal:  PLoS Genet       Date:  2019-11-18       Impact factor: 5.917

4.  A framework for transcriptome-wide association studies in breast cancer in diverse study populations.

Authors:  Arjun Bhattacharya; Montserrat García-Closas; Andrew F Olshan; Charles M Perou; Melissa A Troester; Michael I Love
Journal:  Genome Biol       Date:  2020-02-20       Impact factor: 13.583

5.  MOSTWAS: Multi-Omic Strategies for Transcriptome-Wide Association Studies.

Authors:  Arjun Bhattacharya; Yun Li; Michael I Love
Journal:  PLoS Genet       Date:  2021-03-08       Impact factor: 5.917

6.  Proteins as Mediators of the Association Between Diet Quality and Incident Cardiovascular Disease and All-Cause Mortality: The Framingham Heart Study.

Authors:  Youjin Kim; Sophia Lu; Jennifer E Ho; Shih-Jen Hwang; Chen Yao; Tianxiao Huan; Daniel Levy; Jiantao Ma
Journal:  J Am Heart Assoc       Date:  2021-09-06       Impact factor: 5.501

7.  Parent-of-origin effects propagate through networks to shape metabolic traits.

Authors:  Juan F Macias-Velasco; Celine L St Pierre; Jessica P Wayhart; Li Yin; Larry Spears; Mario A Miranda; Caryn Carson; Katsuhiko Funai; James M Cheverud; Clay F Semenkovich; Heather A Lawson
Journal:  Elife       Date:  2022-03-31       Impact factor: 8.713

8.  Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype-phenotype interactions.

Authors:  Xinpeng Guo; Jinyu Han; Yafei Song; Zhilei Yin; Shuaichen Liu; Xuequn Shang
Journal:  Front Genet       Date:  2022-08-15       Impact factor: 4.772

Review 9.  Statistical methods for mediation analysis in the era of high-throughput genomics: Current successes and future challenges.

Authors:  Ping Zeng; Zhonghe Shao; Xiang Zhou
Journal:  Comput Struct Biotechnol J       Date:  2021-05-26       Impact factor: 7.271

10.  Linking genotype to phenotype in multi-omics data of small sample.

Authors:  Xinpeng Guo; Yafei Song; Shuhui Liu; Meihong Gao; Yang Qi; Xuequn Shang
Journal:  BMC Genomics       Date:  2021-07-13       Impact factor: 3.969

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.