Holger Kirsten¹, Hoor Al-Hasani², Lesca Holdt³, Arnd Gross⁴, Frank Beutner⁵, Knut Krohn⁶, Katrin Horn⁴, Peter Ahnert⁴, Ralph Burkhardt⁷, Kristin Reiche⁸, Jörg Hackermüller⁸, Markus Löffler⁴, Daniel Teupser³, Joachim Thiery⁷, Markus Scholz⁹.
Abstract
Genetics of gene expression (eQTLs or expression QTLs) has proved an indispensable tool for understanding biological pathways and pathomechanisms of trait-associated SNPs. However, the power of most genome-wide eQTL studies is still limited. We performed a large eQTL study in peripheral blood mononuclear cells of 2112 individuals, increasing the power to detect trans-effects genome-wide. Going beyond univariate SNP-transcript associations, we analyse relations of eQTLs to biological pathways, polygenic effects of expression regulation, trans-clusters and enrichment of co-localized functional elements. We found eQTLs for about 85% of analysed genes, and 18% of genes were trans-regulated. Local eSNPs were enriched up to a distance of 5 Mb from the transcript, challenging the ranges typically used to define cis-regulation. Pathway enrichment within genes regulated by GWAS-related eSNPs supported the functional relevance of the identified eQTLs. We demonstrate that the nearest genes of GWAS SNPs may frequently be misleading functional candidates. We identified novel trans-clusters of potential functional relevance for GWAS SNPs of several phenotypes, including obesity-related traits, HDL-cholesterol levels and haematological phenotypes. We used chromatin immunoprecipitation data to demonstrate biological effects. For strongly heritable transcripts, we show that all identified trans-eSNPs together still explain little of the trans-chromosomal heritability, whereas our data suggest that most of the cis-heritability of these transcripts is explained. Dissection of co-localized functional elements indicated a prominent role of SNPs in loci of pseudogenes and non-coding RNAs in the regulation of coding genes. In summary, our study substantially extends the catalogue of human eQTLs and improves our understanding of the complex genetic regulation of gene expression, pathways and disease-related processes.
Year: 2015 PMID: 26019233 PMCID: PMC4512630 DOI: 10.1093/hmg/ddv194
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Distribution of eQTLs (FDR ≤ 5%) at different significance cut-offs
| Max. P | Min. R² | cis-eQTLs | cis-eSNPs (%) | Pruned cis-eSNPs (%) | cis-regulated genes (%) | trans-eQTLs | trans-eSNPs (%) | Pruned trans-eSNPs (%) | trans-regulated genes (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.00285 | >0.0042 | 1 739 991 | 779 042 (30) | 81 148 (28) | 11 098 (83) | – | – | – | – |
| <10⁻⁵ | >0.0092 | 940 389 | 483 797 (18) | 40 314 (14) | 6718 (50) | – | – | – | – |
| <1.02 × 10⁻⁷ | >0.013 | 709 956 | 393 740 (15) | 30 225 (11) | 5788 (43) | 100 241 | 38 034 (1.4) | 3800 (1.3) | 2354 (18) |
| <10⁻¹⁰ | >0.02 | 519 833 | 311 001 (12) | 22 045 (8) | 4884 (37) | 58 072 | 23 809 (0.91) | 1356 (0.47) | 600 (4.5) |
| <10⁻¹⁵ | >0.03 | 360 548 | 231 440 (8.8) | 14 939 (5) | 3977 (30) | 31 660 | 15 732 (0.6) | 820 (0.28) | 374 (2.8) |
| <10⁻²⁰ | >0.04 | 274 719 | 184 139 (7) | 11 093 (4) | 3366 (25) | 20 943 | 11 629 (0.44) | 553 (0.19) | 269 (2) |
| <10⁻⁵⁰ | >0.1 | 103 318 | 77 368 (2.9) | 3705 (1.3) | 1747 (13) | 5772 | 3846 (0.15) | 131 (0.045) | 77 (0.58) |
| <10⁻¹⁰⁰ | >0.19 | 41 375 | 32 415 (1.2) | 1309 (0.46) | 924 (6.9) | 1864 | 1579 (0.06) | 38 (0.013) | 28 (0.21) |
| <10⁻²⁰⁰ | >0.35 | 14 257 | 10 995 (0.42) | 420 (0.15) | 396 (3) | 955 | 869 (0.033) | 9 (0.003) | 11 (0.082) |
| <10⁻³⁰⁰ | >0.48 | 6971 | 5606 (0.21) | 221 (0.08) | 223 (1.7) | 821 | 754 (0.029) | 5 (0.002) | 7 (0.052) |
R² corresponds to the variance in transcript levels explained by the respective eSNP. Note that a gene can be both cis- and trans-associated. After all pre-processing and filtering steps, we analysed a total of 2 625 374 autosomal SNPs and 18 738 expression probes in 2112 individuals. Pruning was done separately for cis- and trans-eQTLs.
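The P-value and R² cut-offs in the table above are linked. The following minimal sketch converts between them for n = 2112. It assumes each eQTL is a simple single-SNP linear regression of expression on genotype dosage without covariates, so that t² = R²(n − 2)/(1 − R²). The models actually used in the study may differ. Because n is large, the sketch approximates the t distribution with a standard normal. The function names are hypothetical.

```python
import math


def r2_to_p(r2: float, n: int) -> float:
    """Two-sided P-value of a single-SNP linear regression explaining a fraction r2 of variance.

    Assumes no covariates, so the slope t-statistic satisfies t^2 = r2 * (n - 2) / (1 - r2).
    For n in the thousands the t distribution is close to normal, so the normal tail is used.
    """
    t = math.sqrt(r2 * (n - 2) / (1.0 - r2))
    return math.erfc(t / math.sqrt(2.0))  # P(|Z| > t) for a standard normal Z


def p_to_r2(p: float, n: int) -> float:
    """Smallest r2 whose P-value is at most p, found by bisection (r2_to_p decreases in r2)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if r2_to_p(mid, n) > p:
            lo = mid  # not yet significant enough: a larger r2 is needed
        else:
            hi = mid
    return hi


n_individuals = 2112
for r2 in (0.0042, 0.0092, 0.013):
    print(f"R2 > {r2}: P < {r2_to_p(r2, n_individuals):.2e}")
```

Under these assumptions, R² = 0.0042 gives a P-value of roughly 0.003. This is close to the 0.00285 cut-off in the first row, but it is a consistency check only, not the authors' procedure.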
Figure 1. Distances between eSNPs and regulated genes. Histogram of the distance in kilobases between eSNPs and transcription start sites (TSS; shown on the left) and between eSNPs and transcription end sites (TES) of the corresponding genes (shown on the right). Dark grey bars represent the start and end of transcribed regions. Vertical lines and adjacent numbers are percentiles of all upstream and downstream distances found within 5 Mb 3′ from TSS and 5′ from TES. The upper panels show all eSNPs at FDR ≤ 5%; the lower panels are restricted to the strongest eSNP per regulated gene.
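The distances in Figure 1 are measured relative to the TSS and TES of the regulated gene. The following minimal sketch shows how strand-aware distances and their percentiles could be computed. It makes three assumptions: gene coordinates are 1-based and inclusive, eSNPs inside the transcribed region get distance 0, and percentiles come from Python's `statistics.quantiles` with its default exclusive method. `Gene`, `classify_distance` and `distance_percentiles` are hypothetical names, not the authors' code.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # lower genomic coordinate of the transcribed region
    end: int     # higher genomic coordinate of the transcribed region
    strand: str  # "+" or "-"


def classify_distance(snp_pos: int, gene: Gene) -> tuple[str, int]:
    """Locate an eSNP relative to a gene's TSS and TES, taking strand into account.

    Returns ("upstream", d) for SNPs 5' of the TSS, ("downstream", d) for SNPs 3' of the TES,
    or ("within", 0) for SNPs inside the transcribed region; d is in base pairs.
    """
    if gene.start <= snp_pos <= gene.end:
        return ("within", 0)
    if gene.strand == "+":
        # plus strand: the TSS is the lower coordinate, the TES the higher one
        if snp_pos < gene.start:
            return ("upstream", gene.start - snp_pos)
        return ("downstream", snp_pos - gene.end)
    # minus strand: the TSS is the higher coordinate, the TES the lower one
    if snp_pos > gene.end:
        return ("upstream", snp_pos - gene.end)
    return ("downstream", gene.start - snp_pos)


def distance_percentiles(distances_bp: list[int], window_bp: int = 5_000_000, n: int = 10) -> list[float]:
    """Cut points splitting the distances inside the window into n equal-sized groups (deciles by default)."""
    kept = [d for d in distances_bp if d <= window_bp]
    return statistics.quantiles(kept, n=n)


# Example with a made-up plus-strand gene spanning positions 1,000 to 2,000
gene = Gene(start=1_000, end=2_000, strand="+")
print([classify_distance(pos, gene) for pos in (500, 1_500, 2_600)])
# [('upstream', 500), ('within', 0), ('downstream', 600)]
```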
Enrichment of KEGG pathways within regulated genes of GWAS traits
| GWAS trait | GWAS P-value range | KEGG term | Found in trait (%) | Enrichment factor | P-value | Genes found in pathway |
|---|---|---|---|---|---|---|
| Acute lymphoblastic leukaemia (childhood) | 6 × 10⁻⁴⁶–9 × 10⁻⁶ | PPAR signaling pathway | 4 (8.5) | 19 | 4.1 × 10⁻⁵ | |
| Asthma and hay fever | 5 × 10⁻¹²–2 × 10⁻⁶ | Cytokine–cytokine receptor interaction | 10 (6.0) | 9 | 2.8 × 10⁻⁸ | |
| Bitter taste perception | 2 × 10⁻⁶²–3 × 10⁻⁸ | Taste transduction | 3 (13.6) | 112 | 1.3 × 10⁻⁶ | |
| Blood pressure measurement (cold pressor test) | 4 × 10⁻⁹–3 × 10⁻⁶ | RNA polymerase | 2 (8.7) | 178 | 3.0 × 10⁻⁵ | |
| Comprehensive strength and appendicular lean mass | 2 × 10⁻⁷–8 × 10⁻⁷ | Biosynthesis of unsaturated fatty acids | 2 (10.5) | 216 | 2.0 × 10⁻⁵ | |
| Economic and political preferences (immigration/crime) | 2 × 10⁻⁶–6 × 10⁻⁶ | Steroid hormone biosynthesis | 3 (14.3) | 195 | 1.2 × 10⁻⁷ | |
| | 1 × 10⁻¹⁸–2 × 10⁻⁸ | Rheumatoid arthritis | 7 (9.9) | 18 | 4.8 × 10⁻⁸ | |
| Lipoprotein-associated phospholipase A2 activity and mass | 2 × 10⁻²³–5 × 10⁻⁶ | Drug metabolism: cytochrome P450 | 3 (9.7) | 40 | 4.5 × 10⁻⁵ | |
| Mean platelet volume | 1 × 10⁻¹⁰³–7 × 10⁻⁶ | ECM-receptor interaction | 11 (20.0) | 6 | 9.1 × 10⁻⁷ | |
| Metabolite levels (HVA-5-HIAA Factor score) | 2 × 10⁻⁶–6 × 10⁻⁶ | Fatty acid elongation in mitochondria | 2 (28.6) | 585 | 2.5 × 10⁻⁶ | |
| Platelet counts | 3 × 10⁻⁵⁴–7 × 10⁻⁶ | Focal adhesion | 17 (12.0) | 3 | 3.9 × 10⁻⁵ | |
| Serum uric acid levels | 1 × 10⁻⁸⁰–3 × 10⁻⁶ | Systemic lupus erythematosus | 10 (11.8) | 44 | 9.3 × 10⁻¹⁷ | |
| Stearic acid (18:0) plasma levels | 1 × 10⁻²⁰–5 × 10⁻⁶ | Glutathione metabolism | 4 (10.0) | 37 | 2.4 × 10⁻⁶ | |
| Type 1 diabetes autoantibodies | 2 × 10⁻¹¹¹–2 × 10⁻⁶ | Sulphur metabolism | 3 (42.9) | 47 | 2.3 × 10⁻⁵ | |
Found in trait (%): number of genes belonging to the KEGG term that are also regulated by an eSNP; the percentage relates to all genes belonging to the KEGG term. Enrichment factor: ratio of found genes to the number of genes expected without any enrichment. P-value: nominal enrichment P-value. Asterisks indicate novel eQTLs; the ‘+’ sign indicates that the respective gene was not mentioned as ‘reported gene’ or ‘mapped gene’ in the GWAS catalogue.
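The enrichment factor compares the number of found genes with the number expected without enrichment. The following minimal sketch computes both the factor and a P-value. It assumes that the expected count is pathway size × regulated genes / all analysed genes, and that the nominal P-value comes from a one-sided hypergeometric test. The footnote does not name the test, so that choice is an assumption. The function name and the example numbers are hypothetical.

```python
from math import comb


def kegg_enrichment(found: int, pathway_size: int, regulated: int, universe: int) -> tuple[float, float]:
    """Enrichment factor and one-sided hypergeometric P-value.

    found:        regulated genes that belong to the pathway
    pathway_size: analysed genes annotated to the pathway
    regulated:    genes regulated by the trait's eSNPs
    universe:     all analysed genes
    """
    expected = pathway_size * regulated / universe
    factor = found / expected
    # P(X >= found) for X ~ Hypergeometric(universe, pathway_size, regulated)
    upper = min(pathway_size, regulated)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, regulated - k)
        for k in range(found, upper + 1)
    )
    # exact integer arithmetic; Python's int / int division rounds the ratio correctly
    return factor, tail / comb(universe, regulated)


# Hypothetical numbers: 3 of 20 pathway genes among 10 regulated genes out of 1000 analysed genes
factor, p = kegg_enrichment(found=3, pathway_size=20, regulated=10, universe=1000)
print(f"enrichment factor = {factor:.1f}, P = {p:.2g}")
```

Exact binomial coefficients keep the sketch dependency-free. For genome-wide gene sets, a library routine such as a hypergeometric survival function would be the more practical choice.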
Trans-clusters that are correlated with GWAS SNPs
| eSNP | Chr | GWAS phenotype | GWAS SNP(s) | R² | P-value | Novel trans-regulated genes (%) | | Mean correl. change (%) |
|---|---|---|---|---|---|---|---|---|
| rs34856868 | 1 | Obesity-related traits | rs34856868 | 1 | 1.1 × 10⁻⁴⁵ | 100 | 7 | −4.4 |
| rs17616434 | 4 | Alcohol consumption, Allergic sensitization, Asthma and hay fever, | rs10004195, rs17616434, rs2101521, rs4543123, rs4833095 | 0.88–1 | 9.8 × 10⁻⁵⁹ | 100 | 38 | −4.8 |
| rs9275698 | 6 | Asthma | rs9275698 | 1 | 4.3 × 10⁻²⁵ | 80 | 5 | −6.3 |
| rs3132468 | 6 | Dengue shock syndrome | rs3132468 | 1 | 3.1 × 10⁻⁵² | 80 | 6 | −11 |
| rs2858870 | 6 | Nodular sclerosis Hodgkin lymphoma | rs204999, rs2858870, rs6903608, rs9268528, rs9268542 | 1 | 3.8 × 10⁻¹²² | 87.5 | 9 | −2.3 |
| rs2293889 | 8 | HDL cholesterol | rs2293889 | 1 | 7.1 × 10⁻²⁸ | 100 | 5 | −5.2 |
| rs5016282 | 11 | Attention-deficit hyperactivity disorder | rs5016282 | 1 | 4.1 × 10⁻¹⁸ | 100 | 3 | −5.2 |
| rs10876864 | 12 | Alopecia areata, Asthma, Polycystic ovary syndrome, Rheumatoid arthritis, Type 1 diabetes and autoantibodies, Vitiligo | rs10876864, rs11171739, rs1701704, rs2292239, rs2456973, rs705702, rs773125 | 0.6–1 | <10⁻²²⁰ | 76.9 | 13 | −42 |
| rs10512472 | 17 | Mean platelet volume, Platelet counts | rs10512472, rs16971217 | 1 | 2.6 × 10⁻⁹ | 77.8 | 9 | −0.4 |
| rs3027234 | 17 | Parkinson's disease, Telomere length | rs3027234, rs3027247 | 0.67–1 | 7.5 × 10⁻³⁷ | 100 | 3 | −19 |
Shown are trans-clusters with >70% novel trans-regulated genes that are correlated with a GWAS SNP. Regulated genes are ordered according to explained variance; asterisk (*): novel regulated genes. R²: linkage disequilibrium between eSNP and GWAS SNP; P-value: MANOVA P-value of the eSNP and all trans-regulated transcripts; mean correl. change: relative change of the correlation between trans-regulated transcripts when adjusting expression levels for the eSNP considered.
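The mean correlation change can be illustrated with a minimal sketch built on two assumptions. First, "adjusting expression levels for the eSNP" is taken to mean replacing each transcript by its residuals from a least-squares regression on genotype dosage. Second, the mean correlation is taken to be the mean absolute pairwise Pearson correlation. The footnote specifies neither choice, and all names here are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(expr: list[float], genotype: list[float]) -> list[float]:
    """Residuals of expression after a least-squares fit on genotype dosage (with intercept)."""
    mg, me = fmean(genotype), fmean(expr)
    beta = sum((g - mg) * (e - me) for g, e in zip(genotype, expr)) / sum((g - mg) ** 2 for g in genotype)
    return [e - me - beta * (g - mg) for g, e in zip(genotype, expr)]


def mean_abs_correlation(transcripts: list[list[float]]) -> float:
    """Mean absolute pairwise Pearson correlation among transcripts."""
    return fmean(abs(pearson(a, b)) for a, b in combinations(transcripts, 2))


def mean_correlation_change(transcripts: list[list[float]], genotype: list[float]) -> float:
    """Relative change (%) of the mean absolute correlation after adjusting every transcript for the eSNP."""
    before = mean_abs_correlation(transcripts)
    after = mean_abs_correlation([residualize(t, genotype) for t in transcripts])
    return 100.0 * (after - before) / before


# Made-up example: two transcripts whose shared signal comes entirely from the eSNP
dosage = [0, 1, 2, 0, 1, 2]
t1 = [1, -1, 3, 1, -1, 3]  # dosage + noise pattern 1
t2 = [1, 1, 1, -1, 1, 3]   # dosage + noise pattern 2, uncorrelated with pattern 1
print(mean_correlation_change([t1, t2], dosage))  # -100.0
```

In the example, all co-expression is driven by the genotype, so adjusting for it removes the correlation completely. Real trans-clusters in the table show much smaller relative changes.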
Figure 2. Estimating the gap between explained and predicted heritability of gene expression. To estimate this gap, we compared the variance of gene expression explained by the combined eSNPs with the genetic variance of gene-expression levels resulting from all imputed SNPs (CW-heritability). This is shown on the left for all SNPs, in the middle for all SNPs on the chromosome where the regulated transcript is located (cis-regulation), and on the right for all SNPs on the chromosomes where the regulated transcript is not located (trans-regulation). Triangles indicate transcripts with significant genome-wide CW-heritability (P ≤ 0.05). Each graph shows a loess estimator with confidence bounds. Note that, for convenience, the ordinate in (C) is log10-transformed. Transcripts for which the combined trans-eSNPs explain zero variance are shown at the bottom of the graph.
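The gap shown in Figure 2 can be quantified as sketched below. The sketch assumes that the "explained variance of combined eSNPs" is the R² of an ordinary least-squares fit of a transcript on all of its eSNP dosages jointly. It also takes the CW-heritability as an already estimated input, because estimating it from all imputed SNPs requires a mixed model that is outside this sketch. The helper names are hypothetical.

```python
def ols_r2(predictors: list[list[float]], y: list[float]) -> float:
    """R^2 of an ordinary least-squares fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in col] for col in predictors]
    k = len(cols)
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination with partial pivoting
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    b = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[piv] = a[piv], a[p]
        b[p], b[piv] = b[piv], b[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for c in range(p, k):
                a[r][c] -= f * a[p][c]
            b[r] -= f * b[p]
    # Back substitution for the regression coefficients
    beta = [0.0] * k
    for p in reversed(range(k)):
        beta[p] = (b[p] - sum(a[p][c] * beta[c] for c in range(p + 1, k))) / a[p][p]
    fitted = [sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot


def heritability_gap(h2: float, r2_explained: float) -> float:
    """Part of the (separately estimated) heritability not explained by the combined eSNPs."""
    return h2 - r2_explained


# Made-up dosages for two eSNPs and a transcript that depends on them exactly
snp1 = [0, 1, 2, 0, 1, 2]
snp2 = [1, 0, 1, 0, 1, 0]
expr = [2 * u - v + 3 for u, v in zip(snp1, snp2)]
print(ols_r2([snp1, snp2], expr))
```

With real data, collinear eSNPs (strong LD) make the normal equations ill-conditioned. This is one reason the table reports pruned eSNP sets.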
Enrichment of functionally annotated genomic loci co-localizing with eSNPs
| Annotation | OR | P-value | Overlap |
|---|---|---|---|
| Exons of long non-coding RNAs | | | |
| Cabili | 1.13 (1.08–1.18) | 4.3 × 10⁻⁸ | 4041 |
| Introns of long non-coding RNAs | | | |
| Gencode v13 | 0.77 (0.75–0.8) | 1.9 × 10⁻⁷⁴ | 8688 |
| Cabili | 0.83 (0.81–0.84) | 2.5 × 10⁻⁷⁵ | 17 311 |
| Bona fide non-coding RNAs regulated in cell cycle, TP53 pathway or STAT3 pathway | | | |
| Cell cycle (transcript located in introns of protein-coding genes) | 1.48 (1.33–1.65) | 3.4 × 10⁻¹³ | 783 |
| TP53 (transcript located in intergenic space) | 1.51 (1.2–1.9) | 3.0 × 10⁻⁴ | 181 |
| TP53 (transcript located in introns of protein-coding genes) | 1.61 (1.5–1.74) | 9.8 × 10⁻³⁸ | 1714 |
| Bona fide non-coding genomic regions predicted to contain conserved secondary structure motifs | | | |
| SISSIz (motif located in intron of protein-coding gene) | 1.35 (1.31–1.39) | 4.2 × 10⁻⁹⁴ | 10 658 |
| RNAz (motif located in intergenic space) | 1.16 (1.12–1.22) | 1.1 × 10⁻¹² | 4459 |
| RNAz (motif located in intron of protein-coding gene) | 1.58 (1.53–1.65) | 2.3 × 10⁻¹²⁸ | 6555 |
| miRNA target sites | | | |
| TsmiRNA (conserved miRNA target sites, UCSC track) | 1.66 (1.37–2) | 6.5 × 10⁻⁸ | 277 |
| Novel transcripts with putative coding function | | | |
| Exons of transcripts of uncertain coding potential (TUCP) | 1.26 (1.19–1.34) | 7.2 × 10⁻¹⁵ | 2414 |
| Predicted ORF in intergenic space (RNAcode) | 1.27 (1.19–1.35) | 1.7 × 10⁻¹² | 1897 |
| Pseudogenes | | | |
| Gencode v13 | 1.35 (1.32–1.39) | 2.1 × 10⁻¹²⁵ | 13 933 |
| Protein-coding gene annotation (Gencode v13) | | | |
| 5′UTRs | 1.57 (1.52–1.62) | 7.2 × 10⁻¹⁸⁸ | 10 084 |
| Coding exons | 1.43 (1.4–1.45) | 4.0 × 10⁻²⁸⁸ | 24 729 |
| 3′UTRs | 1.54 (1.51–1.57) | <1 × 10⁻²²⁰ | 24 828 |
| Intergenic space | 0.9 (0.9–0.91) | 4.3 × 10⁻⁹³ | 174 239 |
| Intron | 1.33 (1.32–1.34) | <1 × 10⁻²²⁰ | 180 540 |
| Regulatory sites | | | |
| CpG islands (UCSC track) | 1.54 (1.5–1.59) | 2.5 × 10⁻¹⁸⁶ | 10 779 |
| Most conserved sequences (MCS, UCSC track) | 1.21 (1.19–1.22) | 1.2 × 10⁻¹⁴⁰ | 42 451 |
| Open source for Regulatory Annotation (OregAnno, UCSC track) | 1.26 (1.23–1.3) | 2.4 × 10⁻⁵² | 9074 |
| Promoter regions (2 kb upstream of 5′UTR) | 1.51 (1.49–1.53) | <1 × 10⁻²²⁰ | 46 383 |
| Promoter regions (5 kb upstream of 5′UTR) | 1.5 (1.48–1.51) | <1 × 10⁻²²⁰ | 69 202 |
| Pol-II binding sites (ENCODE) | 1.44 (1.43–1.46) | <1 × 10⁻²²⁰ | 85 931 |
| Transcription factor binding sites (ENCODE) | 1.27 (1.25–1.28) | <1 × 10⁻²²⁰ | 102 616 |
| Transcription factors from Transfac database | 1.17 (1.15–1.19) | 2.6 × 10⁻⁶⁶ | 26 801 |
| DNaseI hypersensitivity sites (ENCODE) | 1.24 (1.23–1.25) | <1 × 10⁻²²⁰ | 115 291 |
| Chromatin marks associated with enhancer or promoter sites (ENCODE) | | | |
| H3K4 monomethylation | 1.25 (1.24–1.27) | <1 × 10⁻²²⁰ | 220 210 |
| H3K4 trimethylation | 1.4 (1.38–1.41) | <1 × 10⁻²²⁰ | 108 138 |
| H3K27 acetylation | 1.36 (1.34–1.37) | <1 × 10⁻²²⁰ | 135 114 |
| Chromatin marks associated with active regions of POL-II transcripts (ENCODE) | | | |
| H3K36 trimethylation | 1.55 (1.54–1.57) | <1 × 10⁻²²⁰ | 213 995 |
| Chromatin marks associated with repressed regions of POL-II transcripts (ENCODE) | | | |
| H3K27 trimethylation | 0.81 (0.8–0.82) | <1 × 10⁻²²⁰ | 251 569 |
OR, odds ratio; P-value, P-value of Fisher's exact test; Overlap, number of eSNPs overlapping with an annotation. A non-coding transcript is considered bona fide non-coding if it shows no evidence of open reading frames and no sequence similarity to known amino-acid coding sequences. This analysis included 787 378 unique eSNPs. Categories are reported as enriched or depleted if P < 0.05 after Bonferroni correction for the 42 categories considered.
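The footnote names Fisher's exact test with Bonferroni correction for 42 categories. The following minimal sketch covers a single annotation and rests on three assumptions:

- A 2 × 2 table cross-classifies SNPs as eSNP or not and as overlapping the annotation or not.
- The two-sided test sums all tables that are no more probable than the observed one.
- The odds-ratio interval is a Woolf (log-scale) 95% interval. How the intervals in the table were computed is not stated.

Probabilities are handled on the log scale via `math.lgamma`, so genome-scale counts do not overflow. The function names are hypothetical; `scipy.stats.fisher_exact` is an existing library alternative for the test itself.

```python
from math import exp, lgamma, log, sqrt


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact(a: int, b: int, c: int, d: int):
    """Odds ratio, Woolf 95% CI and two-sided Fisher P for the table [[a, b], [c, d]].

    Rows: eSNP / non-eSNP; columns: overlaps annotation / does not.
    a corresponds to the 'Overlap' count; all four cells must be positive for the OR and CI.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (exp(log(odds_ratio) - 1.96 * se), exp(log(odds_ratio) + 1.96 * se))
    n, row1, col1 = a + b + c + d, a + b, a + c
    log_total = log_comb(n, row1)

    def log_pmf(x: int) -> float:
        # Hypergeometric log-probability of x eSNPs in the annotation, given the table margins
        return log_comb(col1, x) + log_comb(n - col1, row1 - x) - log_total

    observed = log_pmf(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Two-sided: add every table that is no more probable than the observed one (small tolerance)
    p = sum(exp(lp) for lp in map(log_pmf, support) if lp <= observed + 1e-7)
    return odds_ratio, ci, min(p, 1.0)


def bonferroni(p: float, n_tests: int = 42) -> float:
    """Bonferroni-adjusted P-value, by default for the 42 annotation categories."""
    return min(1.0, p * n_tests)


# Small made-up table to show the call
odds, interval, p_value = fisher_exact(3, 1, 1, 3)
print(odds, interval, p_value, bonferroni(p_value))
```

Very small probabilities underflow to 0 in double precision. Extreme results are therefore better reported as bounds, as the table does.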
Examples of pseudogenes co-located with an eSNP that regulates the pseudogene's parent gene
| Pseudogene ID | Pseudogene position | Pseudogene biotype | eSNPs | Regulated gene = pseudogene parent gene | Regulated gene position | Corresponding GWAS trait |
|---|---|---|---|---|---|---|
| ENST00000427240.1 | chr1: 39 997 510–40 024 379 (−) | Unprocessed | rs2746050 (0.006) | | chr1: 40 204 517–40 229 585 (+) | C-reactive protein (rs12037222–rs2746050, |
| ENST00000428767.1 | chr2: 73 898 157–73 912 212 (+) | Unprocessed | rs1052162 (0.027), rs10206899 (0.022) | | chr2: 73 612 886–73 837 046 (+) | Chronic kidney disease (rs13538–rs10206899, |
| ENST00000475455.1 | chr3: 133 407 036–133 431 646 (+) | Unprocessed | rs1006097 (0.005) | | chr3: 133 464 977–133 497 849 (+) | Iron status biomarkers (rs2718812–rs1006097, |
| ENST00000377662.2 | chr6: 26 422 347–26 431 843 (+) | Processed | rs6456723 (0.005) | | chr6: 26 458 189–26 469 865 (+) | Iron levels (rs17342717–rs6456723, |
| ENST00000435769.1 | chr7: 72 040 483–72 298 654 (−) | Unprocessed | rs3015844 (0.105), rs13238203 (0.005) | | chr7: 66 461 817–66 704 496 (+) | Subcutaneous adipose tissue (rs2058059–rs3015844, |
| ENST00000415709.1 | chr22: 25 851 679–25 855 648 (+) | Unprocessed | rs6423498 (0.373) | | chr22: 25 615 612–25 627 836 (+) | Bipolar disorder (mood-incongruent) (rs1930961–rs6423498, |
Pseudogenes were restricted to those reported to be transcribed (58); additionally, a corresponding GWAS trait had to exist. Pseudogene biotype: ‘processed’, the pseudogene originates from retrotransposition; ‘unprocessed’, the pseudogene originates from gene duplication (58). eSNPs: all co-localized eSNPs that are also associated with expression levels of the pseudogene's parent gene; values following SNP IDs show the explained variance of the regulated gene's expression level. Corresponding GWAS trait: GWAS phenotype with a GWAS SNP in LD with the eSNPs; LD is shown for the GWAS SNP–eSNP pairs joined by a hyphen.
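The selection criteria in the footnote can be expressed as a simple filter. The following minimal sketch assumes in-memory records with 1-based inclusive coordinates. The record types, their field names and the set of GWAS-linked eSNPs are hypothetical, and the example data are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Pseudogene:
    transcript_id: str
    chrom: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    parent_gene: str
    transcribed: bool   # reported to be transcribed


@dataclass(frozen=True)
class EQTL:
    snp: str
    chrom: str
    pos: int
    gene: str           # regulated gene
    r2: float           # explained variance of the regulated gene's expression


def pseudogene_parent_hits(
    pseudogenes: Iterable[Pseudogene], eqtls: list[EQTL], gwas_linked_snps: set[str]
) -> Iterator[tuple[Pseudogene, EQTL]]:
    """Yield (pseudogene, eQTL) pairs in which a transcribed pseudogene contains an eSNP that
    regulates the pseudogene's parent gene and is in LD with a GWAS SNP."""
    for pg in pseudogenes:
        if not pg.transcribed:
            continue
        for q in eqtls:
            inside = q.chrom == pg.chrom and pg.start <= q.pos <= pg.end
            if inside and q.gene == pg.parent_gene and q.snp in gwas_linked_snps:
                yield pg, q


# Made-up records: only PG1 is transcribed and contains an eSNP for its parent gene
pgs = [
    Pseudogene("PG1", "chr1", 100, 200, "GENE_A", True),
    Pseudogene("PG2", "chr1", 300, 400, "GENE_B", False),
]
qs = [EQTL("snp1", "chr1", 150, "GENE_A", 0.01), EQTL("snp2", "chr1", 350, "GENE_B", 0.02)]
for pg, q in pseudogene_parent_hits(pgs, qs, {"snp1", "snp2"}):
    print(pg.transcript_id, q.snp, q.gene, q.r2)
```

The nested loop is fine for a handful of records. For genome-wide data, an interval index per chromosome would avoid the quadratic scan.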