Maiko Narahara1, Koichiro Higasa2, Seiji Nakamura3, Yasuharu Tabara2, Takahisa Kawaguchi2, Miho Ishii3, Kenichi Matsubara3, Fumihiko Matsuda2, Ryo Yamada1.
Abstract
Profiles of sequence variants that influence gene transcription are important for understanding the mechanisms that affect phenotypic variation and disease susceptibility. Using genotypes at 1.4 million SNPs and a comprehensive transcriptional profile of 15,454 coding genes and 6,113 lincRNA genes obtained from peripheral blood cells of 298 Japanese individuals, we mapped expression quantitative trait loci (eQTLs). We identified 3,804 cis-eQTLs (within 500 kb of target genes) and 165 trans-eQTLs (>500 kb away or on different chromosomes). Cis-eQTLs were often located in transcribed or adjacent regions of genes; among these regions, 5′ untranslated regions and 5′ flanking regions had the largest effects. Epigenetic evidence for regulatory potential accumulated in public databases explained the magnitude of the effects of our eQTLs. Cis-eQTLs were often located near their target genes, if not within them. Large effect sizes were observed for eQTLs near target genes, and effect sizes were clearly attenuated as the distance between the eQTL and the gene increased. Using a very stringent significance threshold, we identified 165 large-effect trans-eQTLs. We used our eQTL map to assess 8,069 disease-associated SNPs identified in 1,436 genome-wide association studies (GWAS). For 148 of the GWAS-identified SNPs, we identified genes that might be truly causative but that the GWAS may have failed to identify; for example, TUFM (P = 3.3E-48) for early-onset inflammatory bowel disease, ZFP90 (P = 4.4E-34) for ulcerative colitis, and IDUA (P = 2.2E-11) for Parkinson's disease. We identified four genes (P < 2.0E-14) that might be related to three diseases and two hematological traits; the expression of each is regulated by trans-eQTLs on a chromosome different from that of the gene.
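The cis/trans split described above is a distance rule. The sketch below assumes that distance is measured to the nearest gene boundary and that exactly 500 kb still counts as cis; the abstract specifies neither detail, and `GeneSpan`, `distance_to_gene`, and `classify_eqtl` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import Optional

CIS_WINDOW_BP = 500_000  # 500 kb window, as stated in the abstract


@dataclass(frozen=True)
class GeneSpan:
    """Hypothetical gene record: chromosome and 1-based inclusive start/end."""
    chrom: str
    start: int
    end: int


def distance_to_gene(snp_chrom: str, snp_pos: int, gene: GeneSpan) -> Optional[int]:
    """Distance in bp from a SNP to the nearest gene boundary (assumption).

    Returns 0 for a SNP inside the gene and None for a SNP on another chromosome.
    """
    if snp_chrom != gene.chrom:
        return None
    if gene.start <= snp_pos <= gene.end:
        return 0
    return min(abs(snp_pos - gene.start), abs(snp_pos - gene.end))


def classify_eqtl(snp_chrom: str, snp_pos: int, gene: GeneSpan,
                  window: int = CIS_WINDOW_BP) -> str:
    """'cis' if the SNP lies within `window` bp of the gene, otherwise 'trans'."""
    d = distance_to_gene(snp_chrom, snp_pos, gene)
    return "cis" if d is not None and d <= window else "trans"
```

For example, with `gene = GeneSpan("1", 1_200_000, 1_210_000)`, a SNP at chromosome 1 position 1,000,000 (200 kb upstream) is classified as cis, while the same position on chromosome 2 is classified as trans.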
Year: 2014 PMID: 24956270 PMCID: PMC4067418 DOI: 10.1371/journal.pone.0100924
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Cis-eQTL map.
−log10 P values of cis-eQTLs are plotted against their chromosomal positions. eQTLs for mRNA transcripts are shown in red, those for lincRNA transcripts in green, and those for other transcripts in black. Vertical dashed lines separate chromosomes.
Summary statistics and counts of cis- and trans-eQTLs at thresholds of R² or |β|.

| Cis: All | Cis: mRNA | Cis: lincRNA | Cis: Other | Trans: All | Trans: mRNA | Trans: lincRNA | Trans: Other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3,804 | 2,995 | 293 | 516 | 165 | 91 | 49 | 25 |
| 3,385 | 2,779 | 244 | 440 | 105 | 65 | 34 | 21 |
| 3,804 (12.5%) | 2,995 (15.1%) | 293 (4.8%) | 516 (11.6%) | 114 (0.4%) | 60 (0.3%) | 34 (0.6%) | 20 (0.4%) |
| 2,973 (16.9%) | 2,667 (17.3%) | 0 | 357 (10.9%) | 74 (0.4%) | 57 (0.4%) | 0 | 17 (0.5%) |
| 455 | 28 | 293 | 134 | 54 | 1 | 49 | 4 |
| 0.19±0.15 | 0.19±0.15 | 0.20±0.16 | 0.21±0.18 | 0.27±0.12 | 0.27±0.12 | 0.29±0.13 | 0.23±0.07 |
| 0.13±0.15 | 0.13±0.14 | 0.13±0.15 | 0.15±0.17 | 0.23±0.12 | 0.23±0.12 | 0.26±0.11 | 0.21±0.09 |
| 0.33±0.33 | 0.32±0.32 | 0.38±0.33 | 0.38±0.34 | 0.53±0.35 | 0.50±0.38 | 0.59±0.31 | 0.51±0.33 |
| 0.24±0.24 | 0.23±0.23 | 0.30±0.31 | 0.28±0.25 | 0.47±0.41 | 0.40±0.44 | 0.58±0.34 | 0.43±0.34 |
| | | | | | | | |
| 2,568 | 2,022 | 192 | 354 | 165 | 91 | 49 | 25 |
| 665 | 500 | 56 | 109 | 45 | 25 | 15 | 5 |
| 245 | 171 | 22 | 52 | 10 | 5 | 5 | 0 |
| 67 | 43 | 8 | 16 | 2 | 2 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | |
| 1,440 | 1,053 | 146 | 241 | 118 | 60 | 40 | 18 |
| 155 | 122 | 11 | 22 | 23 | 13 | 6 | 4 |
| 55 | 44 | 3 | 8 | 3 | 1 | 1 | 1 |
| 26 | 19 | 2 | 5 | 1 | 1 | 0 | 0 |
| 12 | 9 | 1 | 2 | 1 | 1 | 0 | 0 |
FDR: false discovery rate; FWER: family-wise error rate; SD: standard deviation; IQR: interquartile range; R²: proportion of phenotypic variance explained by genotype; |β|: absolute value of the genotype coefficient.
The sum of the numbers of unique eQTLs counted within each RNA type is not necessarily equal to the number of unique eQTLs counted across all transcripts, because the same eQTL may be counted in more than one RNA type. For the same reason, the numbers of genes for All and for the individual RNA types do not add up.
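The R² and |β| summarized in the table come from regressing expression on genotype. A minimal sketch of that single-SNP fit is shown below, assuming additive 0/1/2 genotype coding and no covariates; the excerpt does not describe the authors' exact model (covariates, expression normalization), and `eqtl_effect` is a hypothetical name.

```python
from statistics import fmean


def eqtl_effect(dosages, expression):
    """Simple OLS fit of expression = a + beta * dosage.

    dosages: minor-allele counts (0, 1, 2) per individual (assumed additive coding).
    expression: expression level per individual, in the same order.
    Returns (beta, r2): the genotype coefficient and the proportion of
    expression variance explained by genotype.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need matching vectors with at least 3 individuals")
    mg, me = fmean(dosages), fmean(expression)
    sgg = sum((g - mg) ** 2 for g in dosages)
    see = sum((e - me) ** 2 for e in expression)
    sge = sum((g - mg) * (e - me) for g, e in zip(dosages, expression))
    if sgg == 0 or see == 0:
        raise ValueError("genotype or expression is constant")
    beta = sge / sgg
    r2 = sge * sge / (sgg * see)  # squared Pearson correlation in simple regression
    return beta, r2
```

For genotypes `[0, 0, 1, 1, 2, 2]` and expression `[0, 1, 1, 2, 2, 3]`, the fit gives β = 1.0 and R² = 16/22 (about 0.73); the table reports `abs(beta)` as |β|.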
Figure 2. Histograms of eQTL effects.
A, B) Histograms of |β| values (A) and R² values (B) of cis-eQTLs. C, D) Histograms of |β| values (C) and R² values (D) of trans-eQTLs.
Counts and proportions of gene structure-based categories and protein consequences in local SNPs and cis-eQTLs.
| Category | Local SNPs (%) | Cis-eQTLs (%) | Enrich | Mean absolute β | Mean R² |
| --- | --- | --- | --- | --- | --- |
| Intergenic | 10,268,814 (93.11) | 1,523 (50.85) | 0.55 | 0.31 | 0.17 |
| Genic | 716,576 (6.50) | 1,370 (45.74) | 7.04 | 0.33 | 0.21 |
| Exonic | 25,822 (0.23) | 109 (3.64) | 15.54 | 0.32 | 0.19 |
| Splicing | 28 (0.00) | 0 (0.00) | - | - | - |
| Intronic | 633,398 (5.74) | 889 (29.68) | 5.17 | 0.33 | 0.20 |
| 3′ UTR | 29,609 (0.27) | 188 (6.28) | 23.38 | 0.28 | 0.22 |
| 5′ UTR | 3,965 (0.04) | 45 (1.50) | 41.79 | 0.41 | 0.22 |
| Upstream | 11,727 (0.11) | 89 (2.97) | 27.95 | 0.47 | 0.23 |
| Downstream | 12,027 (0.11) | 50 (1.67) | 15.31 | 0.27 | 0.19 |
| N.A. | 42,870 (0.39) | 102 (3.41) | 8.76 | 0.38 | 0.21 |

| Exonic: protein consequence | Local SNPs (%) | Cis-eQTLs (%) | Enrich | Mean absolute β | Mean R² |
| --- | --- | --- | --- | --- | --- |
| nonsyn | 11,739 (45.46) | 52 (47.71) | 1.05 | 0.35 | 0.20 |
| syn | 13,662 (52.91) | 53 (48.62) | 0.92 | 0.29 | 0.18 |
| stopgain | 52 (0.20) | 2 (1.83) | 9.11 | 0.45 | 0.24 |
| stoploss | 10 (0.04) | 0 (0.00) | - | - | - |
| N.A. | 359 (1.39) | 2 (1.83) | 1.32 | 0.32 | 0.19 |
Local SNPs and cis-eQTLs that affect mRNA transcripts are counted within each gene-based functional category (upper panel) and for each protein consequence (lower panel).
Enrich: fold enrichment, i.e., the proportion of cis-eQTLs that fall in a category divided by the proportion of all local SNPs that fall in that category.
The category “Exonic” does not include 5′ or 3′ untranslated regions (UTRs). “Upstream” and “Downstream” include regions within 1 kb of the transcription start site and the transcription end site of a gene, respectively. “Splicing” includes SNPs within 2 bp of an exon-intron splice junction and located in the intron (SNPs within 2 bp of a splice junction but located in the exon are designated “Exonic”). “Intronic” includes SNPs in introns, excluding those within 2 bp of a splice junction. Among exonic SNPs, “nonsyn” denotes a nonsynonymous SNP, “syn” a synonymous SNP, “stopgain” a SNP whose variant allele creates a stop codon, and “stoploss” a SNP whose variant allele eliminates a stop codon.
N.A. (“not available”) includes SNPs that were found in a gene but could not be assigned to a specific functional category.
Totals for gene-structure-based classification and protein consequences are shown in bold font.
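The Enrich column is a ratio of two proportions. The sketch below recomputes it from the counts in the table, assuming the denominators are the sums of the top-level categories (Intergenic, Genic, N.A.), which give 11,028,260 local SNPs and 2,995 cis-eQTLs, the latter matching the mRNA cis-eQTL count in the summary table. For the protein-consequence rows, the exonic totals (109 and 25,822) serve as denominators. `fold_enrichment` is a hypothetical helper name.

```python
# Counts copied from the upper panel of the table: category -> (local SNPs, cis-eQTLs).
# Intergenic, Genic and N.A. are the top-level categories, so their sums are the totals.
TOP_LEVEL = {
    "Intergenic": (10_268_814, 1_523),
    "Genic": (716_576, 1_370),
    "N.A.": (42_870, 102),
}


def fold_enrichment(n_eqtl: int, total_eqtl: int, n_snp: int, total_snp: int) -> float:
    """Share of cis-eQTLs in a category divided by the share of local SNPs in it."""
    return (n_eqtl / total_eqtl) / (n_snp / total_snp)


TOTAL_SNP = sum(snps for snps, _ in TOP_LEVEL.values())     # 11,028,260
TOTAL_EQTL = sum(eqtls for _, eqtls in TOP_LEVEL.values())  # 2,995

for name, (snps, eqtls) in TOP_LEVEL.items():
    print(f"{name}: {fold_enrichment(eqtls, TOTAL_EQTL, snps, TOTAL_SNP):.2f}")
# Genic subcategories reuse the same totals, e.g. 5' UTR:
print(f"5' UTR: {fold_enrichment(45, TOTAL_EQTL, 3_965, TOTAL_SNP):.2f}")
# Protein-consequence rows use the exonic totals (109 cis-eQTLs, 25,822 SNPs):
print(f"stopgain: {fold_enrichment(2, 109, 52, 25_822):.2f}")
```

Under these assumptions the printed values (0.55, 7.04, 8.76, 41.79, and 9.11) agree with the Enrich column, which suggests that the column was computed from unrounded proportions.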
Figure 3. Cumulative curves of effect magnitudes of cis-eQTLs in gene-structure-based functional categories.
Cumulative curves show the distributions of |β| values or R² values of cis-eQTLs in each category. The cumulative distribution of all cis-eQTLs (A–D) or of all exonic cis-eQTLs (E, F) is shown in grey. The x-axis is on a log scale. A, B) Distributions of genic and intergenic cis-eQTLs for |β| values (A) and R² values (B). C, D) Distributions of genic subcategories and intergenic cis-eQTLs for |β| values (C) and R² values (D). E, F) Distributions of nonsynonymous and synonymous cis-eQTLs for |β| values (E) and R² values (F).
Figure 4. Trend in effects across regulatory classes of intergenic cis-eQTLs.
Box-and-whisker plots show the distributions of |β| values (A) and R² values (B) of intergenic cis-eQTLs that affect mRNA transcripts, for each regulatory class defined by RegulomeDB. A cross indicates the mean effect of each class. The number of cis-eQTLs in each class is shown in parentheses after the class name. A Jonckheere-Terpstra permutation test was used to test each trend; the results are shown below the plots. N.A.: not available.
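The Jonckheere-Terpstra test named in the caption checks for a monotone trend across ordered groups. A minimal permutation version is sketched below; it is not the authors' code, and the number of permutations, the seed, and the function names are arbitrary choices. It tests a one-sided increasing trend, so groups must be supplied in the hypothesized order. For example, since RegulomeDB assigns lower class numbers to variants with more regulatory evidence, testing whether effects grow with evidence would mean listing the classes from the highest number to the lowest.

```python
import random
from itertools import combinations


def jt_statistic(groups):
    """Jonckheere-Terpstra J for groups listed in hypothesized increasing order.

    Counts pairs (x from an earlier group, y from a later group) with y > x;
    ties count 0.5.
    """
    j = 0.0
    for a, b in combinations(range(len(groups)), 2):  # all pairs with a < b
        for x in groups[a]:
            for y in groups[b]:
                if y > x:
                    j += 1.0
                elif y == x:
                    j += 0.5
    return j


def jt_permutation_test(groups, n_perm=10_000, seed=0):
    """One-sided permutation P value for an increasing trend.

    Shuffles the pooled values across groups while keeping the group sizes fixed.
    Returns (observed J, P), with P = (hits + 1) / (n_perm + 1).
    """
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    observed = jt_statistic(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted, i = [], 0
        for s in sizes:
            permuted.append(pooled[i:i + s])
            i += s
        if jt_statistic(permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For three perfectly ordered groups such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, J reaches its maximum of 27 and the permutation P value is small. Reversing the group order gives J = 0 and P = 1.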
Figure 5. Relationships between the effects of cis-eQTLs and distance from genes.
|β| values (A) and R² values (D) of cis-eQTLs that affect mRNA transcripts are plotted against distance from their target genes, as scatter plots for non-transcribed regions and as box-and-whisker plots for transcribed regions. eQTLs in transcribed regions are shown for each gene-structure-based category; the number after each category name is the number of cis-eQTLs in that category. Negative distances indicate that the eQTL is upstream of the target gene and positive distances that it is downstream, relative to the direction of transcription. Distributions of distances are shown as box-and-whisker plots below the scatter plots. Magnified views of the region within 5 kb of genes are shown for |β| (B) and R² (E). Cumulative distributions of |β| (C) and R² (F) are shown for each division of eQTLs, where each division represents a defined distance range (kb) from the target gene; the number in parentheses after each distance range in the legend is the number of cis-eQTLs in that range. The x-axis is on a log scale. One eQTL located within a gene (C16orf55) that was assigned the function “downstream” is shown as “unknown”; therefore, the number of “In gene” eQTLs shown in (A) and (D) equals the sum of the Exonic, Splicing, Intronic, 5′ UTR, 3′ UTR, and N.A. counts in Table 2 plus 1.
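The sign convention in the caption (negative = upstream, positive = downstream, relative to transcription) depends on the gene's strand. A small sketch follows, assuming distances are measured from the transcription start site for upstream SNPs and from the transcription end site for downstream SNPs; `signed_distance` is a hypothetical helper, not the authors' code.

```python
def signed_distance(snp_pos: int, tx_start: int, tx_end: int, strand: str) -> int:
    """Signed SNP-to-gene distance in bp, relative to the direction of transcription.

    tx_start <= tx_end are genomic (left/right) coordinates of the transcript.
    Returns a negative value upstream, a positive value downstream, and 0 inside.
    """
    if tx_start > tx_end:
        raise ValueError("expected tx_start <= tx_end")
    if tx_start <= snp_pos <= tx_end:
        return 0
    if strand == "+":
        # Transcription runs left to right: the TSS is tx_start.
        return snp_pos - tx_start if snp_pos < tx_start else snp_pos - tx_end
    if strand == "-":
        # Transcription runs right to left: the TSS is tx_end.
        return tx_end - snp_pos if snp_pos > tx_end else tx_start - snp_pos
    raise ValueError("strand must be '+' or '-'")
```

A SNP 100 bp to the left of a plus-strand gene is therefore at −100 (upstream), whereas the same position relative to a minus-strand gene is at +100 (downstream).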
Multi-regulatory cis-eQTLs and trans-eQTLs.
| eQTL | Chr | Position | MAF | HWE-P | LD block start | LD block end | LD block length (bp) | Gene symbol |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs7522860 | 1 | 156,275,281 | 0.49 | 0.644 | 156,208,230 | 156,314,627 | 106,398 | |
| rs6464103 | 7 | 150,478,385 | 0.37 | 0.711 | 150,476,888 | 150,478,385 | 1,498 | |
| rs4390300 | 10 | 60,144,207 | 0.47 | 0.817 | 60,144,207 | 60,168,003 | 23,797 | |
| rs2416549 | 12 | 11,325,804 | 0.24 | 0.116 | 11,045,512 | 11,349,454 | 303,943 | |
| rs35969491 | 12 | 11,339,020 | 0.24 | 0.084 | 11,045,512 | 11,349,454 | 303,943 | |
| rs7226263 | 17 | 44,814,884 | 0.32 | 0.111 | 44,788,310 | 44,853,872 | 65,563 | |
| | | | | | | | | |
| rs116711766 | 1 | 160,093,165 | 0.075 | 0.3909 | 160,093,165 | 160,093,165 | 1 | |
| rs11718621 | 3 | 40,362,122 | 0.288 | 1.0000 | 40,362,122 | 40,463,063 | 100,942 | |
| rs6773917 | 3 | 40,469,254 | 0.492 | 0.4881 | 40,373,259 | 40,498,845 | 125,587 | |
| rs7801498 | 7 | 102,089,595 | 0.368 | 0.8039 | 102,089,595 | 102,089,595 | 1 | |
| rs10873415 | 14 | 92,558,171 | 0.380 | 0.0097 | 92,434,957 | 92,558,171 | 123,215 | |
Chr, Position: chromosome and position of the eQTL; MAF: minor allele frequency; HWE-P: P value of the Hardy-Weinberg equilibrium test; LD block: range spanned by SNPs in LD (r² > 0.8) with the eQTL; Length: LD block length in base pairs (End − Start + 1).
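The columns defined in this note can be sketched from genotype dosages. The code below makes three assumptions that the table does not confirm. HWE-P is computed with a Pearson chi-square test with 1 degree of freedom (an exact test is a common alternative). LD is measured as the squared correlation of genotype dosages rather than as haplotype r² from phased data. The block is the span of all SNPs with r² > 0.8 with the eQTL, including the eQTL itself. All function names are hypothetical.

```python
import math
from statistics import fmean


def maf(dosages):
    """Minor allele frequency from 0/1/2 allele counts."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def hwe_chisq_p(dosages):
    """Pearson chi-square HWE test (1 df) on genotype counts (assumed test)."""
    n = len(dosages)
    obs = [dosages.count(0), dosages.count(1), dosages.count(2)]
    p = (obs[1] + 2 * obs[2]) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    exp = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def genotype_r2(a, b):
    """Squared Pearson correlation of two dosage vectors (genotype-based LD)."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab * sab / (saa * sbb)


def ld_block(eqtl_pos, snps, threshold=0.8):
    """snps: dict position -> dosage list (must contain eqtl_pos).

    Returns (start, end, length) of the span covered by SNPs with
    r^2 > threshold with the eQTL; length = end - start + 1.
    """
    ref = snps[eqtl_pos]
    linked = [pos for pos, d in snps.items()
              if pos == eqtl_pos or genotype_r2(ref, d) > threshold]
    start, end = min(linked), max(linked)
    return start, end, end - start + 1
```

An eQTL with no SNP in LD above the threshold yields a block of length 1, as for rs116711766 and rs7801498 in the table.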
Summary of GWAS records associated with Crohn's disease and eQTL mapping results.
| Case | Record | Suggested gene (GWAS) | Suggested gene (eQTL) | GWAS SNP | eQTL SNP | r² | β | P | Conditional P | Top local SNP for GWAS gene | β | P | r² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Case 1 | 1 | | | rs415890 | rs400837 | 0.99 | −0.36 | 2.7E-39 | 0.87 | Not tested | | | |
| | 2 | | | rs102275 | rs108499 | 0.97 | 0.16 | 3.2E-10 | 0.74 | rs174570 | 0.17 | 6.2E-07 | 0.99 |
| | 3 | | | rs694739 | rs600377 | 0.85 | −0.26 | 1.0E-06 | 0.51 | rs2286614 | 0.42 | 4.5E-23 | 0.01 |
| | | | | | | | | | | rs641811 | 0.06 | n.s. | 0.01 |
| | 4 | | | rs2872507 | rs1008723 | 0.98 | −0.38 | 6.9E-38 | 0.81 | rs56030650 | 0.05 | n.s. | 0.01 |
| | | | | | | | | | | rs62065216 | −0.09 | n.s. | 0.01 |
| | | | | | | | | | | rs1054609 | −0.18 | 3.6E-14 | 0.98 |
| | | | | | | | | | | Not tested | | | |
| | 5 | | | rs2872507 | rs1008723 | 0.98 | −0.38 | 6.9E-38 | 0.81 | rs1054609 | −0.18 | 3.6E-14 | 0.98 |
| | 6 | | | rs4809330 | rs6011058 | 1.00 | 0.09 | 2.9E-07 | 1.00 | rs2252258 | −0.05 | n.s. | 0.002 |
| | | | | | | | | | | rs310609 | −0.07 | n.s. | 0.02 |
| | | | | | | | | | | Not tested | | | |
| Case 2 | 7 | | | rs6738825 | rs1866664 | 0.98 | −0.25 | 3.0E-07 | 0.81 | rs1866664 | −0.25 | 3.0E-07 | 1 |
| | 8 | | | rs7714584 | rs1428554 | 1.00 | −0.40 | 3.4E-13 | 0.98 | rs1428554 | −0.40 | 3.4E-13 | 1 |
| | 9 | | | rs11747270 | rs1428554 | 1.00 | −0.40 | 3.4E-13 | 0.98 | rs1428554 | −0.40 | 3.4E-13 | 1 |
| | 10 | | | rs13361189 | rs1428554 | 1.00 | −0.40 | 3.4E-13 | 0.98 | rs1428554 | −0.40 | 3.4E-13 | 1 |
| Case 3 | 11 | | | rs4656940 | rs11265498 | 1.00 | −0.67 | 2.4E-17 | 1.00 | rs11265498 | −0.67 | 2.4E-17 | 1 |
| | | | | | | | | | | rs574610 | −0.12 | n.s. | 0.16 |
| | 12 | | | rs2149085 | rs400837 | 0.99 | −0.36 | 2.7E-39 | 0.87 | rs400837 | −0.36 | 2.7E-39 | 1 |
| | | | | | | | | | | rs73039162 | 0.68 | 7.5E-45 | 0.078 |
| | | | | | | | | | | Not tested | | | |
| | | | | | | | | | | Not tested | | | |
The GWAS-reported gene was not included in our study.
GWAS-reported genes that match the eQTL-suggested genes in Case 3.
r²: linkage disequilibrium (squared genotype correlation) between the GWAS-identified SNP and the cis-eQTL (in the “SNPs” column), or between the top local SNP for the GWAS gene and the cis-eQTL (in the “Top local SNP for GWAS gene” column).
P: P value of a regression conditioned on the genotypes of the GWAS-identified SNP.
Genes suggested by GWAS and our eQTL map are listed in the “Suggested genes” column; eQTL statistics are listed in the “eQTL statistics” column; most significant local SNP for the GWAS-reported gene is shown in the “Top local SNP for GWAS gene” column.
n.s.: not significant.
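A conditional regression of this kind fits expression on the eQTL genotype while also including the GWAS SNP genotype. One way to sketch it is through the Frisch-Waugh-Lovell theorem. First, residualize both expression and the eQTL dosage on the GWAS SNP dosage. Then regress the residuals on each other. This gives the same eQTL coefficient as the full model `expression ~ 1 + GWAS SNP + eQTL`. The two-sided P value below uses a normal approximation to the t distribution, which is an assumption, reasonable only for sample sizes in the hundreds. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
from statistics import fmean


def _residualize(y, x):
    """Residuals of the OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return [b - my for b in y]
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def conditional_eqtl_test(expression, eqtl, gwas_snp):
    """eQTL effect on expression after adjusting for the GWAS SNP genotype.

    Returns (beta, t, p); p is a two-sided normal-approximation P value.
    """
    n = len(expression)
    ry = _residualize(expression, gwas_snp)
    rx = _residualize(eqtl, gwas_snp)
    sxx = sum(v * v for v in rx)
    if sxx == 0:
        # eQTL dosage is fully explained by the GWAS SNP: nothing left to test.
        return 0.0, 0.0, 1.0
    beta = sum(a * b for a, b in zip(rx, ry)) / sxx
    resid = [b - beta * a for a, b in zip(rx, ry)]
    df = n - 3  # intercept, GWAS SNP, and eQTL coefficients
    sigma2 = sum(r * r for r in resid) / df
    if sigma2 == 0:
        return beta, math.inf, 0.0
    t = beta / math.sqrt(sigma2 / sxx)
    p = math.erfc(abs(t) / math.sqrt(2))  # 2 * (1 - Phi(|t|))
    return beta, t, p
```

Under this reading, a large conditional P (as in most records of the table) would be consistent with the eQTL and the GWAS SNP reflecting the same association signal, whereas a small one would point to an effect beyond what the GWAS SNP captures.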