Ryan Abo, Gregory D Jenkins, Liewei Wang, Brooke L Fridley.
Abstract
Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods for mapping genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods. However, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Approaches that account for the high dimensionality and the multiple testing burden may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set expression quantitative trait loci analysis (GS-eQTL). The method first uses GSs to group SNPs and mRNA expression, then applies principal components analysis (PCA) within each GS to collapse the variation and reduce the dimensionality. We applied GS-eQTL to assess the association between SNP and mRNA expression data collected from a cell-based model system, using GSs defined by PharmGKB and KEGG. We observed a large number of significant GS-eQTL associations, the most significant of which arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations of eQTL studies and provides biological context for SNP-expression associations.
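The two core steps described above, grouping features by GS and collapsing each group with PCA before testing, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the 80% variance cutoff used to pick the number of PCs is an assumed parameter, and the test statistic (share of expression-PC variance explained by a regression on the SNP PCs) with a permutation p-value is a stand-in chosen for the example. It requires NumPy.

```python
import numpy as np

def collapse_gene_set(X, var_explained=0.8):
    """Collapse a samples x features matrix (SNP genotypes or probe-set
    expression for one GS) into principal-component scores.

    var_explained is a hypothetical cutoff: keep the fewest PCs whose
    cumulative share of variance reaches it.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)
    keep = sd > 0                                  # drop constant columns
    Xs = Xc[:, keep] / sd[keep]                    # standardize each feature
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    cum_share = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cum_share, var_explained)) + 1, len(s))
    return U[:, :k] * s[:k]                        # samples x k PC scores

def gs_eqtl_pvalue(expr_pcs, snp_pcs, n_perm=999, seed=0):
    """Permutation p-value for association between two sets of PC scores.

    Assumed statistic (not necessarily the authors' test): share of total
    expression-PC variance explained by a linear regression on the SNP PCs.
    """
    rng = np.random.default_rng(seed)
    Y = expr_pcs - expr_pcs.mean(axis=0)
    n = Y.shape[0]

    def r2(Z):
        Z1 = np.column_stack([np.ones(n), Z])      # intercept + SNP PCs
        beta, *_ = np.linalg.lstsq(Z1, Y, rcond=None)
        resid = Y - Z1 @ beta
        return 1.0 - np.sum(resid**2) / np.sum(Y**2)

    observed = r2(snp_pcs)
    # Shuffle samples in the SNP PCs to break any SNP-expression link.
    null = np.array([r2(snp_pcs[rng.permutation(n)]) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

# Toy data: 100 samples, 30 SNPs (0/1/2 genotypes) and 10 probe sets in one GS.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(100, 30)).astype(float)
E = 2.0 * G[:, :10] + 0.1 * rng.normal(size=(100, 10))  # expression driven by SNPs
p = gs_eqtl_pvalue(collapse_gene_set(E), collapse_gene_set(G))
print(p)
```

Because each GS contributes only a handful of PCs, the number of tests grows with the number of GS pairs rather than the number of SNP-probe set pairs, which is the dimensionality reduction the abstract describes.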
Year: 2012 PMID: 22905253 PMCID: PMC3419168 DOI: 10.1371/journal.pone.0043301
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of gene expression and SNP GS mappings for PharmGKB and KEGG.
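| Source | Total genes mapped | Genes per GS (avg.) | Genes per GS (max) | Genes per GS (min) | Gene overlap (avg.) | Gene overlap (max) | Gene overlap (min) | SNPs mapped per GS (max) | SNPs mapped per GS (min) | Expression probe sets mapped per GS (max) | Expression probe sets mapped per GS (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PharmGKB | 511 | 13.93 | 64 | 2 | 0.76 | 18 | 0 | 4384 | 172 | 192 | 2 |
| KEGG | 5333 | 70.02 | 1100 | 1 | 1.96 | 126 | 0 | 50871 | 35 | 2149 | 1 |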
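Summary statistics of the kind shown in this table could be computed roughly as follows. This is a minimal sketch assuming each GS is given as a Python set of gene identifiers and that "gene overlap" means the number of genes shared by a pair of GSs; both the input format and that reading of the column are assumptions, the function name is hypothetical, and the gene names are placeholders.

```python
from itertools import combinations
from statistics import mean

def summarize_gene_sets(gene_sets):
    """Summary statistics for a dict mapping GS name -> set of gene IDs.

    'gene_overlap' is computed as the number of genes shared by each pair
    of GSs (an assumed reading of the column); at least two GSs are needed.
    """
    sizes = [len(genes) for genes in gene_sets.values()]
    overlaps = [len(a & b) for a, b in combinations(gene_sets.values(), 2)]
    return {
        "total_genes": len(set().union(*gene_sets.values())),
        "genes_per_gs": (mean(sizes), max(sizes), min(sizes)),
        "gene_overlap": (mean(overlaps), max(overlaps), min(overlaps)),
    }

# Placeholder gene sets for illustration only.
toy = {"A": {"g1", "g2", "g3"}, "B": {"g2", "g3", "g4"}, "C": {"g5"}}
print(summarize_gene_sets(toy))
```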
Figure 1. Barplots of the number of GSs in PharmGKB and KEGG categories.
Top 20 PharmGKB GS-eQTL associations.
| Type | Expression GS | No. genes | No. probe sets | No. PCs | SNP GS | No. genes | No. SNPs | No. PCs | GS-eQTL p-value | FDR |
|---|---|---|---|---|---|---|---|---|---|---|
| cis | VEGF Pathway | 15 | 41 | 9 | VEGF Pathway | 15 | 700 | 57 | 1.4×10⁻²³ | 0 |
| trans | EGFR Inhibitors Pathway PD | 64 | 192 | 26 | VEGF Pathway | 15 | 700 | 57 | 1.4×10⁻¹² | 0 |
| trans | Thiopurine Pathway | 32 | 82 | 14 | Antiarrhythmic Drug Pathways | 55 | 3667 | 143 | 5.1×10⁻¹¹ | 0 |
| cis | Glucocorticoid and Inflammatory genes Pathway PD | 9 | 31 | 6 | Glucocorticoid and Inflammatory genes Pathway PD | 9 | 571 | 31 | 1.5×10⁻¹⁰ | 0 |
| cis | Methotrexate Pathway | 29 | 74 | 16 | Methotrexate Pathway | 29 | 2415 | 87 | 3.9×10⁻¹⁰ | 0 |
| cis | EGFR Inhibitors Pathway PD | 64 | 192 | 26 | EGFR Inhibitors Pathway PD | 64 | 4384 | 130 | 3.9×10⁻¹⁰ | 0 |
| trans | Fluoropyrimidine PK | 24 | 62 | 12 | Antiarrhythmic Drug Pathways | 55 | 3667 | 143 | 7.7×10⁻¹⁰ | 0 |
| trans | EGFR Inhibitors Pathway PD | 64 | 192 | 26 | Antiarrhythmic Drug Pathways | 55 | 3667 | 143 | 7.9×10⁻⁹ | 1.1×10⁻⁵ |
| cis | Etoposide Pathway | 12 | 33 | 4 | Etoposide Pathway | 12 | 1104 | 47 | 4.3×10⁻⁸ | 3.8×10⁻⁵ |
| cis | Bisphosphonate Pathway | 19 | 55 | 13 | Bisphosphonate Pathway | 19 | 775 | 51 | 4.9×10⁻⁸ | 4.0×10⁻⁵ |
| trans | Bisphosphonate Pathway | 19 | 55 | 13 | Antiarrhythmic Drug Pathways | 55 | 3667 | 143 | 5.1×10⁻⁸ | 4.0×10⁻⁵ |
| trans | Fluoropyrimidine PK | 24 | 62 | 12 | Methotrexate Pathway | 29 | 2415 | 87 | 7.7×10⁻⁸ | 4.9×10⁻⁵ |
| cis | Statin Pathway Cholesterol and Lipoprotein Transport PD | 26 | 55 | 11 | Statin Pathway Cholesterol and Lipoprotein Transport PD | 26 | 1286 | 83 | 8.0×10⁻⁸ | 4.9×10⁻⁵ |
| trans | Selective Serotonin Reuptake Inhibitors SSRI Pathway | 28 | 72 | 12 | Antiarrhythmic Drug Pathways | 55 | 3667 | 143 | 9.9×10⁻⁸ | 5.2×10⁻⁵ |
| cis | Fluoropyrimidine PK | 24 | 62 | 12 | Fluoropyrimidine PK | 24 | 2483 | 83 | 1.1×10⁻⁷ | 5.2×10⁻⁵ |
| trans | EGFR Inhibitors Pathway PD | 64 | 192 | 26 | Selective Serotonin Reuptake Inhibitors SSRI Pathway | 28 | 1975 | 98 | 1.2×10⁻⁷ | 5.2×10⁻⁵ |
| cis | Nicotine PD Pathway Dopaminergic Neuron | 20 | 42 | 9 | Nicotine PD Pathway Dopaminergic Neuron | 20 | 1049 | 74 | 1.9×10⁻⁷ | 6.3×10⁻⁵ |
| trans | Methotrexate Pathway | 29 | 74 | 16 | Antiarrhythmic Drug Pathways | 55 | 3667 | 143 | 2.0×10⁻⁷ | 6.3×10⁻⁵ |
| cis | Antiarrhythmic Drug Pathways | 55 | 127 | 14 | Antiarrhythmic Drug Pathways | 55 | 3667 | 143 | 2.9×10⁻⁷ | 9.3×10⁻⁵ |
| trans | VEGF Pathway | 15 | 41 | 9 | EGFR Inhibitors Pathway PD | 64 | 4384 | 130 | 4.1×10⁻⁷ | 0.0001 |
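Each GS-eQTL p-value in these tables is reported with an FDR. This excerpt does not state which FDR procedure was used; as an assumption, the sketch below applies the Benjamini-Hochberg step-up adjustment, which is one common choice, to a list of p-values. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# In GS-eQTL, m would be the number of (expression GS, SNP GS) pairs tested.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```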
Top 20 KEGG GS-eQTL associations after removing the genes within the HLA region.
| Type | Expression GS | No. genes | No. probe sets | No. PCs | SNP GS | No. genes | No. SNPs | No. PCs | GS-eQTL p-value | FDR |
|---|---|---|---|---|---|---|---|---|---|---|
| cis | Metabolic pathways | 1100 | 2149 | 67 | Metabolic pathways | 1100 | 50871 | 189 | 7.9×10⁻⁸⁵ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Neuroactive ligand receptor interaction | 302 | 14725 | 173 | 2.6×10⁻⁵⁸ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Calcium signaling pathway | 178 | 12520 | 172 | 5.3×10⁻⁵⁰ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Pathways in cancer | 330 | 18667 | 175 | 1.1×10⁻⁴⁵ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Vascular smooth muscle contraction | 125 | 8588 | 162 | 5.9×10⁻⁴⁵ | <5×10⁻⁸ |
| trans | MAPK signaling pathway | 273 | 683 | 48 | Metabolic pathways | 1100 | 50871 | 189 | 7.5×10⁻⁴³ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Cytokine cytokine receptor interaction | 278 | 9377 | 162 | 1.1×10⁻⁴² | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Focal adhesion | 201 | 12846 | 169 | 7.3×10⁻⁴² | <5×10⁻⁸ |
| trans | Pathways in cancer | 330 | 893 | 50 | Metabolic pathways | 1100 | 50871 | 189 | 1.3×10⁻³⁹ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Axon guidance | 129 | 9757 | 163 | 2.8×10⁻³⁹ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Dilated cardiomyopathy | 92 | 7964 | 155 | 4.5×10⁻³⁹ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Chemokine signaling pathway | 190 | 9422 | 157 | 7.5×10⁻³⁹ | <5×10⁻⁸ |
| cis | Pathways in cancer | 330 | 893 | 50 | Pathways in cancer | 330 | 18667 | 175 | 4.9×10⁻³⁸ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Neurotrophin signaling pathway | 126 | 6436 | 146 | 2.1×10⁻³⁷ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Arrhythmogenic right ventricular cardiomyopathy ARVC | 76 | 8274 | 158 | 2.7×10⁻³⁷ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Hypertrophic cardiomyopathy HCM | 89 | 7492 | 154 | 3.2×10⁻³⁷ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | Purine metabolism | 159 | 9165 | 159 | 7.2×10⁻³⁷ | <5×10⁻⁸ |
| trans | Pathways in cancer | 330 | 893 | 50 | Neuroactive ligand receptor interaction | 302 | 14725 | 173 | 1.9×10⁻³⁶ | <5×10⁻⁸ |
| trans | MAPK signaling pathway | 273 | 683 | 48 | Pathways in cancer | 330 | 18667 | 175 | 2.1×10⁻³⁶ | <5×10⁻⁸ |
| trans | Metabolic pathways | 1100 | 2149 | 67 | GnRH signaling pathway | 101 | 6256 | 152 | 4.1×10⁻³⁶ | <5×10⁻⁸ |
GS-eQTL associations assessed in HapMap for replication.
Figure 2. Heatmaps with points indicating associations (FDR < 5%) between SNP GSs (x-axis) and expression GSs (y-axis).
SNP and expression GSs are ordered according to hierarchical clustering of the distances between GSs (distance determined by the average proportion of genes shared between GSs). The color of each point indicates the level of association significance (blue = less significant, red = more significant).
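The caption's distance between GSs, based on the average proportion of shared genes, might be computed as below. This is a sketch under an assumed reading of the caption: the distance is 1 minus the average of |A ∩ B|/|A| and |A ∩ B|/|B|, so identical GSs are at distance 0 and disjoint GSs at distance 1. The function names and gene sets are hypothetical placeholders.

```python
from itertools import combinations

def gs_distance(a, b):
    """Distance between two GSs given as sets of genes.

    Assumed reading of the caption: 1 minus the average of the proportions
    |a & b| / |a| and |a & b| / |b|.
    """
    shared = len(a & b)
    return 1.0 - (shared / len(a) + shared / len(b)) / 2.0

def distance_matrix(gene_sets):
    """Symmetric distance lookup for every pair of GS names."""
    names = sorted(gene_sets)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = gs_distance(gene_sets[x], gene_sets[y])
    return names, dist

# Placeholder gene sets for illustration only.
toy = {"A": {"g1", "g2", "g3", "g4"}, "B": {"g3", "g4"}, "C": {"g5"}}
names, dist = distance_matrix(toy)
print(dist[("A", "B")], dist[("A", "C")])  # 0.25 1.0
```

To obtain an ordering like the one in the heatmaps, a square matrix built from `dist` could be converted to condensed form with SciPy's `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage`; the linkage method (for example, average linkage) is a further assumption.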
Top five SNP and expression GSs involved in the most associations (FDR <5%).
| Source | Data type | Gene set | No. significant associations |
|---|---|---|---|
| PharmGKB | Expression | EGFR Inhibitors Pathway PD | 31 |
| PharmGKB | Expression | Selective Serotonin Reuptake Inhibitors SSRI Pathway | 29 |
| PharmGKB | Expression | Methotrexate Pathway | 28 |
| PharmGKB | Expression | Doxorubicin Pathway | 26 |
| PharmGKB | Expression | Antiarrhythmic Drug Pathways | 22 |
| PharmGKB | SNP | Antiarrhythmic Drug Pathways | 32 |
| PharmGKB | SNP | Taxane Pathway | 23 |
| PharmGKB | SNP | Imatinib | 19 |
| PharmGKB | SNP | Doxorubicin Pathway | 18 |
| PharmGKB | SNP | Fluoropyrimidine PK | 16 |
| KEGG | Expression | Pathways in cancer | 142 |
| KEGG | Expression | Metabolic pathways | 141 |
| KEGG | Expression | Cysteine and methionine metabolism | 136 |
| KEGG | Expression | ABC transporters | 131 |
| KEGG | Expression | Insulin signaling pathway | 130 |
| KEGG | SNP | Calcium signaling pathway | 147 |
| KEGG | SNP | Tyrosine metabolism | 142 |
| KEGG | SNP | Ether lipid metabolism | 142 |
| KEGG | SNP | Nucleotide excision repair | 141 |
| KEGG | SNP | Antigen processing and presentation | 141 |
Figure 3. Scatter plot of the number of associations (FDR < 5%) each GS was involved in against the average distance between that GS and the GSs associated with it.
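The two quantities plotted in this figure, a GS's number of significant associations and its average distance to the GSs it is associated with, could be tabulated as follows. This is a sketch with assumptions: the function name is hypothetical, the GS distance is supplied as a callable (for example, one based on shared genes), and the expression and SNP roles are pooled, so a GS counts an association whether it appears on the expression or the SNP side.

```python
from collections import defaultdict
from statistics import mean

def associations_vs_distance(significant_pairs, dist):
    """Per-GS (number of significant associations, mean distance to partners).

    significant_pairs: (expression GS, SNP GS) name pairs passing FDR < 5%.
    dist: callable returning the distance between two GS names.
    Roles are pooled (an assumption); a cis pair counts once for its GS,
    using dist(gs, gs) as its distance.
    """
    partners = defaultdict(list)
    for expr_gs, snp_gs in significant_pairs:
        partners[expr_gs].append(snp_gs)
        if snp_gs != expr_gs:
            partners[snp_gs].append(expr_gs)
    return {gs: (len(ps), mean(dist(gs, q) for q in ps))
            for gs, ps in partners.items()}

# Hypothetical distances between three placeholder GSs.
D = {frozenset({"A", "B"}): 0.25, frozenset({"A", "C"}): 1.0}

def toy_dist(x, y):
    return 0.0 if x == y else D[frozenset({x, y})]

pairs = [("A", "A"), ("A", "B"), ("C", "A")]
summary = associations_vs_distance(pairs, toy_dist)
print(summary)
```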
Figure 4. Boxplots of log-transformed p-values for GS-eQTL association results by category.
(A) PharmGKB and (B) KEGG.
Figure 5. Boxplots of log-transformed p-values for cis- and trans-GS association results.
(A) PharmGKB and (B) KEGG.
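The cis/trans split used in Figure 5 follows the Type column of the association tables: a result is cis when the expression GS and the SNP GS are the same GS, and trans otherwise. The sketch below labels results that way and reports the median of the log-transformed p-values per type; the -log10 scale is an assumption about the figure's transform, and the function name is hypothetical. The example rows are taken from the PharmGKB table above.

```python
import math
from statistics import median

def cis_trans_summary(results):
    """Median -log10(p) for cis and trans GS-eQTL results.

    results: iterable of (expression GS, SNP GS, p-value). A result is cis
    when both GSs are the same, trans otherwise. The -log10 scale is an
    assumed choice of log transform.
    """
    groups = {"cis": [], "trans": []}
    for expr_gs, snp_gs, p in results:
        kind = "cis" if expr_gs == snp_gs else "trans"
        groups[kind].append(-math.log10(p))
    return {kind: median(vals) for kind, vals in groups.items() if vals}

# Three rows taken from the PharmGKB table above.
rows = [
    ("VEGF Pathway", "VEGF Pathway", 1.4e-23),
    ("EGFR Inhibitors Pathway PD", "VEGF Pathway", 1.4e-12),
    ("Methotrexate Pathway", "Methotrexate Pathway", 3.9e-10),
]
print(cis_trans_summary(rows))
```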