Dat Duong, Lisa Gai, Sagi Snir, Eun Yong Kang, Buhm Han, Jae Hoon Sul, Eleazar Eskin.
Abstract
MOTIVATION: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Conditioned on a tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide strong supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong motivation to study this variant or gene further. Multi-tissue gene expression datasets such as the Genotype-Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, many meta-analysis methods have been designed to combine gene expression data across tissues to increase the power to find eQTLs and eGenes. However, existing techniques do not scale to datasets containing many tissues, such as the GTEx data. Furthermore, these methods ignore the biological insight that the same variant may be associated with the same gene across similar tissues.
Year: 2017 PMID: 28881962 PMCID: PMC5870567 DOI: 10.1093/bioinformatics/btx227
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Correlation of the likelihood ratios of a pair of cis-variants versus their LD. Consider the correlation between the likelihood ratios of variants u and v over all genes in which both are cis-SNPs. Empirically, this correlation is close to the LD of u and v. To show this, we randomly select many pairs of cis-SNPs from the gene ENSG00000204219.5 that also appear together in at least two other genes. These pairs are then grouped into bins by their LD (bin width 0.05). We compute the likelihood ratio for each SNP in each pair over all the genes in which both are cis-variants. Using these likelihood ratios, we estimate the correlation for the pair u, v. We average this correlation over all pairs u, v in each LD bin and then plot the absolute value of the average against the LD value. The identity line is shown in red. Plots for additional pairs chosen from other genes are shown in Supplementary Figure S1.
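The binning-and-averaging procedure in the Fig. 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each pair's LD is already given as a number in [0, 1], that the likelihood ratios of u and v are supplied as arrays aligned over the genes in which both are cis-SNPs, and that the per-pair correlation is Pearson's. The function name `binned_lr_correlation` and the `min_genes` parameter are hypothetical.

```python
import numpy as np


def binned_lr_correlation(pairs, bin_width=0.05, min_genes=3):
    """Average the per-pair correlation of likelihood ratios within LD bins.

    pairs: iterable of (ld, lr_u, lr_v). ld is assumed to lie in [0, 1];
    lr_u and lr_v are the likelihood ratios of SNPs u and v, aligned over the
    genes in which both are cis-SNPs.
    Returns {lower bin edge: |mean correlation of the pairs in that bin|}.
    """
    n_bins = int(round(1.0 / bin_width))
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins, dtype=int)
    for ld, lr_u, lr_v in pairs:
        lr_u = np.asarray(lr_u, dtype=float)
        lr_v = np.asarray(lr_v, dtype=float)
        # Skip pairs shared by too few genes (the caption requires the source
        # gene plus at least two others) or with constant likelihood ratios,
        # for which a correlation is undefined.
        if lr_u.size < min_genes or lr_u.std() == 0 or lr_v.std() == 0:
            continue
        r = np.corrcoef(lr_u, lr_v)[0, 1]  # Pearson correlation (assumed)
        # A small epsilon guards against float error at bin edges;
        # LD = 1.0 is folded into the last bin.
        b = min(int(np.floor(ld / bin_width + 1e-9)), n_bins - 1)
        sums[b] += r
        counts[b] += 1
    # Absolute value of the per-bin average, as plotted in Fig. 1.
    return {
        round(b * bin_width, 2): abs(sums[b] / counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    }
```

Averaging the signed correlations before taking the absolute value mirrors the order described in the caption; taking absolute values per pair first would give a different (larger) summary.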
Fig. 2. Shared individuals among the 44 tissues in the GTEx dataset. The degree of sample sharing between two tissues is measured using the Jaccard index.
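The Jaccard index used in Fig. 2 is the number of individuals sampled in both tissues divided by the number sampled in either. A small sketch, assuming each tissue's samples are given as a set of individual IDs; the function names `jaccard` and `sample_sharing` and the tissue labels in the test are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets of individual IDs."""
    union = a | b
    # Define the index as 0 for two empty sets to avoid division by zero.
    return len(a & b) / len(union) if union else 0.0


def sample_sharing(samples):
    """Pairwise Jaccard index for a dict mapping tissue -> set of individuals."""
    names = sorted(samples)
    # combinations() over sorted names yields each unordered tissue pair once.
    return {(s, t): jaccard(samples[s], samples[t]) for s, t in combinations(names, 2)}
```

For 44 tissues this produces 44 × 43 / 2 = 946 pairwise values, which can be arranged into a symmetric matrix for display.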
Fig. 3. (A) RECOV and (B) RE2 applied to datasets where the tissues do not share individuals. (C) RECOV and (D) RE2 applied to datasets where the tissues share individuals.
Fig. 4. (A) Venn diagram of the numbers of eGenes found by TBT, RE2 and RECOV. (B) The correlation of SNP effects for the gene ENSG00000134508.8 in 44 tissues (tissue names are omitted). The correlation is computed from the matrix in Subsection 2.3.2 (after proper scaling and removal of nearby SNPs). The black box marks the brain tissues. ENSG00000134508.8 is found to be an eGene only by the RECOV method. The correlation of SNP effects for the genes (C) ENSG00000178234.8 and (D) ENSG00000269981.1 in 44 tissues (tissue names are omitted). ENSG00000178234.8 and ENSG00000269981.1 are found to be eGenes only by the RE2 method.