Geoff Macintyre, Antonio Jimeno Yepes, Cheng Soon Ong, Karin Verspoor.
Abstract
We present a method to assist in interpreting the functional impact of intergenic disease-associated SNPs that is not limited to search strategies proximal to the SNP. The method builds on two sources of external knowledge: the growing understanding of three-dimensional spatial relationships in the genome, and the substantial repository of information about relationships among genetic variants, genes, and diseases captured in the published biomedical literature. We integrate chromatin conformation capture data (HiC) with literature support to rank putative target genes of intergenic disease-associated SNPs. We demonstrate that this hybrid method outperforms a genomic distance baseline, as well as either method individually, on a small test set of expression quantitative trait loci. In addition, we show the potential for this method to uncover relationships between intergenic SNPs and target genes across chromosomes. With more extensive chromatin conformation capture data becoming readily available, this method provides a way forward towards functional interpretation of SNPs in the context of the three-dimensional structure of the genome in the nucleus.
Keywords: Data integration; HiC; Non-coding variants; Text mining; eQTL
Year: 2014 PMID: 25374782 PMCID: PMC4217187 DOI: 10.7717/peerj.639
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Scores vs ranks: for a list of length n, the fractional rank is the position i of the object divided by n + 1.
| Distance | 43k | 2k | 500k | 8k | 1292k |
|---|---|---|---|---|---|
| Position (i) | 3 | 1 | 4 | 2 | 5 |
| Rank (i/(n + 1)) | 3/6 | 1/6 | 4/6 | 2/6 | 5/6 |
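The fractional rank defined above can be illustrated with a minimal sketch. It assumes that smaller distances are better (position 1 is the closest gene), that the "k" suffix denotes thousands of base pairs, and that ties are broken by input order; the function name `fractional_ranks` is hypothetical and not taken from the paper.

```python
def fractional_ranks(scores, ascending=True):
    """Return (positions, ranks) for a list of scores.

    positions[k] is the 1-based position i of item k in sorted order;
    ranks[k] is the fractional rank i / (n + 1).
    Ties are broken by input order (an assumption of this sketch).
    """
    n = len(scores)
    # Indices of the items, sorted by score.
    order = sorted(range(n), key=lambda k: scores[k], reverse=not ascending)
    positions = [0] * n
    for i, k in enumerate(order, start=1):
        positions[k] = i
    ranks = [i / (n + 1) for i in positions]
    return positions, ranks


# Distances from the table, read as base pairs (assumed unit).
distances = [43_000, 2_000, 500_000, 8_000, 1_292_000]
positions, ranks = fractional_ranks(distances)
print(positions)  # [3, 1, 4, 2, 5]
print(ranks)
```

Dividing by n + 1 rather than n keeps every fractional rank strictly between 0 and 1, which allows ranks from lists of different lengths to be compared on one scale.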
eQTL data used for validation.
The eQTL column gives the p-value of the eQTL data. GD is the genomic distance, HiC is the HiC score, and Lit. is the score resulting from the literature. rD, rHiC and rL are the ranks by genomic distance, HiC and literature respectively, out of the Total number of genes that lie more than 500 Kbp from the SNP under consideration. rHy is the geometric mean rank. A nan value in Lit. means that no evidence was found in the literature.
| PubMedID | SNP | Disease | chr | Location | Gene | GeneID | GeneWindow | eQTL | GD | HiC | Lit. | rD | rHiC | rL | rHy | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | rs652625 | Carcinoma, NSC Lung | chr1 | 12147937 | MTOR | 2475 | chr1:11M-12M | 3.94E-05 | 902741 | 6.84 | 1.17 | 0.0091 | 0.0046 | 0.0282 | 0.0012 | 2408 |
| | rs344781 | Carcinoma, NSC Lung | chr19 | 48866627 | PSG11 | 5680 | chr19:48M-49M | 1.96E-05 | 644155 | 7.16 | nan | 0.0031 | 0.0006 | 0.2842 | 0.0013 | 1594 |
| | rs344781 | Endometrial Neoplasms | chr19 | 48866627 | PSG11 | 5680 | chr19:48M-49M | 1.96E-05 | 644155 | 7.16 | nan | 0.0031 | 0.0006 | 0.2748 | 0.0013 | 1594 |
| | rs344781 | Lung Neoplasms | chr19 | 48866627 | PSG11 | 5680 | chr19:48M-49M | 1.96E-05 | 644155 | 7.16 | nan | 0.0031 | 0.0006 | 0.2992 | 0.0013 | 1594 |
| | rs7187167 | Breast Neoplasms | chr16 | 1289209 | WDR24 | 84219 | chr16:0-1M | 2.73E-05 | 608807 | 7.21 | nan | 0.0156 | 0.0021 | 0.7419 | 0.0042 | 961 |
| | rs652625 | Lung Neoplasms | chr1 | 12147937 | MTOR | 2475 | chr1:11M-12M | 3.94E-05 | 902741 | 6.84 | −5.59 | 0.0091 | 0.0046 | 0.2263 | 0.0046 | 2408 |
| | rs12983047 | Carcinoma, Squamous Cell | chr19 | 46526338 | CIC | 23152 | chr19:47M-48M | 0.00036 | 954318 | 7.15 | 1.42 | 0.0218 | 0.0230 | 0.0131 | 0.0075 | 1607 |
| | rs7187167 | Lymphatic Metastasis | chr16 | 1289209 | WDR24 | 84219 | chr16:0-1M | 2.73E-05 | 608807 | 7.21 | nan | 0.0156 | 0.0021 | 0.7034 | 0.0083 | 961 |
| | rs12983047 | Adenocarcinoma | chr19 | 46526338 | CIC | 23152 | chr19:47M-48M | 0.00036 | 954318 | 7.15 | 0.33 | 0.0218 | 0.0230 | 0.0492 | 0.0137 | 1607 |
| | rs12983047 | Lung Neoplasms | chr19 | 46526338 | CIC | 23152 | chr19:47M-48M | 0.00036 | 954318 | 7.15 | −0.61 | 0.0218 | 0.0230 | 0.0834 | 0.0168 | 1607 |
| | rs2823093 | Breast Neoplasms | chr21 | 15442702 | USP25 | 29761 | chr21:16M-17M | 0.00010 | 581512 | 6.57 | −1.75 | 0.0101 | 0.0709 | 0.1723 | 0.0169 | 296 |
| | rs3213182 | Carcinoma, Squamous Cell | chr20 | 31726893 | ITCH | 83737 | chr20:32M-33M | 0.00021 | 687808 | 6.95 | −5.09 | 0.0142 | 0.0236 | 0.2079 | 0.0173 | 635 |
| | rs12983047 | Carcinoma, NSC Lung | chr19 | 46526338 | CIC | 23152 | chr19:47M-48M | 0.00036 | 954318 | 7.15 | −2.16 | 0.0218 | 0.0230 | 0.1176 | 0.0243 | 1607 |
| | rs2029166 | Breast Neoplasms | chr12 | 52876365 | AAAS | 8086 | chr12:52M-53M | 7.79E-05 | 874685 | 6.40 | −2.42 | 0.0131 | 0.0533 | 0.2238 | 0.0262 | 1144 |
| | rs7296239 | Breast Neoplasms | chr12 | 52877970 | AAAS | 8086 | chr12:52M-53M | 0.00011 | 876290 | 6.40 | −2.42 | 0.0131 | 0.0533 | 0.2238 | 0.0262 | 1144 |
| | rs3213182 | Head and Neck Neoplasms | chr20 | 31726893 | ITCH | 83737 | chr20:32M-33M | 0.00021 | 687808 | 6.95 | nan | 0.0142 | 0.0236 | 0.7858 | 0.0283 | 635 |
| | rs7096206 | Carcinoma, Hepatocellular | chr10 | 54201690 | CHUK | 1147 | chr10:101M-102M | 1.43E-06 | 47777644 | 2.00 | −1.08 | 0.6705 | 0.9153 | 0.1695 | 0.5931 | 956 |
| | rs7096206 | Liver Neoplasms | chr10 | 54201690 | CHUK | 1147 | chr10:101M-102M | 1.43E-06 | 47777644 | 2.00 | −4.22 | 0.6705 | 0.9153 | 0.2029 | 0.6234 | 956 |
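The caption describes rHy as the geometric mean rank. One way to read this, sketched below, is to combine the three per-source fractional ranks of each candidate gene by their geometric mean and then convert the combined values back into fractional ranks. The re-ranking step, the function names and the gene labels are assumptions of this sketch, not details confirmed by the source.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def hybrid_ranks(source_ranks):
    """Map each gene to a fractional rank based on the geometric mean
    of its per-source ranks (lower is better).

    source_ranks: dict of gene -> (rD, rHiC, rL), each in (0, 1).
    """
    combined = {gene: geometric_mean(r) for gene, r in source_ranks.items()}
    ordered = sorted(combined, key=combined.get)
    n = len(ordered)
    # Same fractional rank convention as the sources: position / (n + 1).
    return {gene: i / (n + 1) for i, gene in enumerate(ordered, start=1)}


# Hypothetical candidates with made-up ranks, for illustration only.
candidates = {
    "geneA": (0.25, 0.50, 0.75),
    "geneB": (0.50, 0.25, 0.25),
    "geneC": (0.75, 0.75, 0.50),
}
print(hybrid_ranks(candidates))
```

Because a geometric mean is pulled towards small values, a gene ranked very highly by one source can still score well overall even when another source ranks it poorly.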
Figure 1. Ranks of individual data sources vs. hybrid rank.
For each eQTL SNP, the rank of the target gene under each of the three sources, genomic distance (rD), HiC (rHiC) and literature (rL), is plotted against the rank of that target gene under the hybrid method (rHy). Points in the upper left corner (the green region) occur when the hybrid rank is superior. The hybrid rank is better than the respective source in 12/17, 10/17 and 15/17 cases. Panels (A)–(C) show the whole range of ranks [0, 1], whereas (D)–(F) show the same data zoomed in to the range [0, 0.1].
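The win counts quoted in this caption amount to a per-source comparison of the hybrid rank against each individual rank. The sketch below shows that tally on three rows copied from the validation table. It assumes a strict comparison (a tie does not count as a win) and does not attempt to reproduce the reported totals; the function name `count_hybrid_wins` is hypothetical.

```python
def count_hybrid_wins(rows):
    """Count how often the hybrid rank beats each source rank.

    rows: iterable of (rD, rHiC, rL, rHy) tuples; lower rank is better.
    A strict '<' is used, so ties are not counted as wins (assumption).
    """
    wins = {"rD": 0, "rHiC": 0, "rL": 0}
    for r_d, r_hic, r_l, r_hy in rows:
        if r_hy < r_d:
            wins["rD"] += 1
        if r_hy < r_hic:
            wins["rHiC"] += 1
        if r_hy < r_l:
            wins["rL"] += 1
    return wins


# Three rows from the validation table: (rD, rHiC, rL, rHy).
rows = [
    (0.0091, 0.0046, 0.0282, 0.0012),  # rs652625, MTOR
    (0.0031, 0.0006, 0.2842, 0.0013),  # rs344781, PSG11
    (0.6705, 0.9153, 0.1695, 0.5931),  # rs7096206, CHUK
]
wins = count_hybrid_wins(rows)
print(wins)  # {'rD': 3, 'rHiC': 2, 'rL': 2}
```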
Figure 2. Pairwise rank comparison of each of the three source rankings.
For each eQTL SNP, the ranks from the three sources of information, genomic distance (rD), HiC (rHiC) and literature (rL), are plotted against each other. Panels (A)–(C) show the whole range of ranks [0, 1], whereas (D)–(F) show the same data zoomed in to the range [0, 0.1].
Data used for SNP target gene discovery.
HiC is the HiC score and Lit is the score resulting from the literature. rD, rHiC and rL are the ranks by genomic distance, HiC and literature respectively. rHy is the geometric mean rank. We report the significant SNPs, confirmed by an independent eQTL study on GeneVar at a p-value threshold of 0.05 for Spearman's ρ.
| PubMedID | SNP | Disease | chr | Location | Gene | GeneWindow | HiC | Lit | rHiC | rL | rHy | Total | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | rs4796793 | Carcinoma, NSC Lung | chr17 | 37795735 | ERCC1 | chr19:50M-51M | 1.52 | 13.52 | 0.047955 | 0.000258 | 0.000043 | 23230 | 0.0152 |
| | rs4796793 | Neoplasm Metastasis | chr17 | 37795735 | CEACAM7 | chr19:46M-47M | 1.47 | 15.74 | 0.062118 | 0.000086 | 0.000043 | 23230 | 0.0459 |
| | rs12983047 | Carcinoma, NSC Lung | chr19 | 46526338 | TP53 | chr17:7M-8M | 1.37 | 19.21 | 0.045003 | 0.000174 | 0.000044 | 22976 | 0.0173 |
| | rs12983047 | Carcinoma, Squamous Cell | chr19 | 46526338 | TP53 | chr17:7M-8M | 1.37 | 27.91 | 0.045003 | 0.000087 | 0.000044 | 22976 | 0.0173 |
| | rs652625 | Lung Neoplasms | chr1 | 12147937 | TTF1 | chr9:134M-135M | 1.49 | 16.41 | 0.004690 | 0.000451 | 0.000045 | 22175 | 0.0231 |
| | rs3213182 | Carcinoma, Squamous Cell | chr20 | 31726893 | PDXP | chr22:36M-37M | 1.23 | 19.24 | 0.036609 | 0.000292 | 0.000083 | 23956 | 0.0326 |
| | rs4796793 | Carcinoma, Renal Cell | chr17 | 37795735 | XRCC1 | chr19:48M-49M | 1.53 | 73.29 | 0.046965 | 0.000043 | 0.000086 | 23230 | 0.0149 |
| | rs4796793 | Lung Neoplasms | chr17 | 37795735 | CEACAM7 | chr19:46M-47M | 1.47 | 18.66 | 0.062118 | 0.000172 | 0.000086 | 23230 | 0.0459 |
| | rs652625 | Carcinoma, NSC Lung | chr1 | 12147937 | ERCC1 | chr19:50M-51M | 1.05 | 13.52 | 0.053890 | 0.000316 | 0.000090 | 22175 | 0.038 |
| | rs652625 | Lung Neoplasms | chr1 | 12147937 | CEACAM7 | chr19:46M-47M | 1.10 | 18.66 | 0.041082 | 0.000180 | 0.000090 | 22175 | 0.0359 |
| | rs4796793 | Lung Neoplasms | chr17 | 37795735 | CEACAM5 | chr19:46M-47M | 1.47 | 18.66 | 0.061601 | 0.000258 | 0.000129 | 23230 | 0.0008 |
Figure 3. eQTL of SNP rs4796793 against CEACAM5.
This figure shows an eQTL evaluation of SNP rs4796793 against CEACAM5 for the LWK population using GeneVar.
Figure 4. SNP rs4796793 in the UCSC genome browser.
This figure shows SNP rs4796793 mapped to the UCSC genome browser along with tracks showing ENCODE information for histone modifications, DNaseI Hypersensitivity and transcription factor binding.