Mulin Jun Li, Miaoxin Li, Zipeng Liu, Bin Yan, Zhicheng Pan, Dandan Huang, Qian Liang, Dingge Ying, Feng Xu, Hongcheng Yao, Panwen Wang, Jean-Pierre A Kocher, Zhengyuan Xia, Pak Chung Sham, Jun S Liu, Junwen Wang.
Abstract
It remains challenging to predict regulatory variants in particular tissues or cell types due to highly context-specific gene regulation. By connecting large-scale epigenomic profiles to expression quantitative trait loci (eQTLs) in a wide range of human tissues/cell types, we identify critical chromatin features that predict variant regulatory potential. We present cepip, a joint likelihood framework, for estimating a variant's regulatory probability in a context-dependent manner. Our method exhibits significant GWAS signal enrichment and is superior to existing cell type-specific methods. Furthermore, using phenotypically relevant epigenomes to weight the GWAS single-nucleotide polymorphisms, we improve the statistical power of the gene-based association test.
Keywords: Cell type-specific; Disease-susceptible gene; Epigenome; Regulatory variant; Variant prioritization
Year: 2017 PMID: 28302177 PMCID: PMC5356314 DOI: 10.1186/s13059-017-1177-3
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Critical chromatin features and correlations among tissues/cell types. a Tissue/cell type-specific or generalized logit models trained by ten selected chromatin features using the eQTL fine-mapping dataset (occurrence indicates the number of models sharing the feature after the feature selection procedure; * indicates that the P value of the coefficient for the corresponding feature is < 0.05; heatmap color is rendered by exponential coefficients). CPL CAP_LCL, STL Stranger_LCL, HCE Harvard_cerebellum, HPC Harvard_prefrontal_cortex, HVC Harvard_visual_cortex, GCF GenCord_fibroblast, GCL GenCord_LC, GCT GenCord_tcell, CLI UChicago_liver, MLI Merck_liver, MBR Myers_brain, All combined dataset, Geuvadis Geuvadis_LCL. b Spearman’s rank correlation tests between the coefficients of the selected features in each cell type-specific logit model using ten selected chromatin features. c Tissue/cell type-specific logit models trained by 11 selected chromatin features using the GTEx eQTL dataset (occurrence indicates the number of models sharing the feature after the feature selection procedure; * indicates that the P value of the coefficient for the corresponding feature is < 0.05; heatmap color is rendered by exponential coefficients). d Spearman’s rank correlation between the coefficients of selected features in each GTEx cell type-specific logit model using 11 selected chromatin features.
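Panels b and d compare models by the Spearman rank correlation of their feature coefficients, and the heatmaps show exponentiated coefficients. The following is a minimal sketch of both computations in plain Python. The coefficient values and the names `coef_model_a` and `coef_model_b` are hypothetical illustrations, not values from the paper, and the tie handling (average ranks) is an assumption.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical logit coefficients for the same five chromatin features
# in two tissue-specific models (illustrative numbers only).
coef_model_a = [0.9, 0.1, -0.3, 0.5, 0.2]
coef_model_b = [0.7, 0.2, -0.1, 0.6, 0.0]

rho = spearman(coef_model_a, coef_model_b)
# Exponentiated coefficients, i.e. odds ratios, as used for heatmap color.
odds_ratios_a = [math.exp(c) for c in coef_model_a]
```

A rank-based measure is a natural choice here because it compares the ordering of feature importance across models without assuming the coefficients are on the same scale.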
Fig. 2 Clustering of 127 tissues/cell types using the normalized mean regulatory potential for fine-mapped GWAS SNPs of 38 immune and non-immune diseases/traits. See Additional file 2 for abbreviations of tissues/cell types.
Fig. 3 Evaluation of the generalized context-dependent model and combined model. a Boxplot of cell type-specific regulatory potentials for ImmVar 201 traits/diseases-associated eQTLs using 12 ENCODE cell lines. b Coverage of Geuvadis eQTLs with increasing top-ranked variants for three cell type-specific scores and the composite score. c Coverage of meta blood eQTLs with increasing top-ranked variants for three cell type-specific scores and the composite score. d Coverage of RA eQTLs with increasing top-ranked variants for three cell type-specific scores and the composite score.
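The coverage curves in panels b to d can be read as the fraction of known eQTLs recovered among the top-ranked variants as the cutoff grows. The sketch below illustrates that reading; the exact definition used in the paper is not given in the caption, so the formula, the function name `coverage_curve`, and the toy variant IDs and scores are all assumptions.

```python
def coverage_curve(scores, known_eqtls, cutoffs):
    """Fraction of known eQTLs found among the top-k scored variants.

    scores      -- dict mapping variant ID to a regulatory score
    known_eqtls -- set of variant IDs treated as true eQTLs
    cutoffs     -- iterable of k values (number of top-ranked variants)
    """
    if not known_eqtls:
        raise ValueError("known_eqtls must not be empty")
    # Rank variants from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    curve = []
    for k in cutoffs:
        hits = sum(1 for variant in ranked[:k] if variant in known_eqtls)
        curve.append(hits / len(known_eqtls))
    return curve

# Toy example with hypothetical variant IDs and scores.
scores = {"v1": 0.9, "v2": 0.8, "v3": 0.4, "v4": 0.3, "v5": 0.1}
known = {"v1", "v3", "v5"}
coverage = coverage_curve(scores, known, cutoffs=[1, 3, 5])
```

Under this reading, a scoring method whose curve rises faster places more of the known eQTLs near the top of its ranking.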
Fig. 4 Epigenomic profiles of 1p13.3 vary greatly in different cell lines. Critical chromatin features (H3K4me1 and DHS) around LDL-C-associated SNPs in 1p13.3 (rs12740374 indexed) show large differences among the 16 ENCODE cell lines and specific enrichment in HepG2 cell lines.
Fig. 5 Context-dependent prioritization shows GWAS signal enrichment in relevant cells. a The top 5% prioritized SNPs using blood cell line Mo-CD14+ (purple) display a greater leftward shift than those using skin (green), muscle (blue), or liver (gold) cell lines against permuted GWAS signals; gray area shows 95% intervals of permuted signals. b The top 5% prioritized SNPs using blood cell line GM12878 (red) display a greater leftward shift than those using skin (green), muscle (blue), or liver (gold) cell lines against permuted GWAS signals; gray area shows 95% intervals of permuted signals. c The empirical P values of permutations for blood cell lines (Mo-CD14+ and GM12878) are more significant than those for other tissue/cell types. d After the blood cell line Mo-CD14+ prioritization, the top-ranked SNPs display a more significant shift from permuted GWAS signals than lower-ranked ones; gray area shows 95% intervals of permuted signals. BLD blood, LNG lung, SKIN skin, BONE bone, LIV liver, VAS vascular, MUS muscle, BRN brain, CRVX cervix, BRST breast, Composite context-free composite model.
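Panel c reports empirical P values from permutations. The sketch below shows one generic way such a value can be computed: compare the GWAS signal of the top-scored SNPs with that of random SNP sets of the same size. This is not the paper's procedure, which the caption does not specify. The test statistic (mean of -log10 P), the function name `empirical_p`, and the synthetic data are all assumptions.

```python
import math
import random

def empirical_p(gwas_p, scores, top_fraction=0.05, n_perm=1000, seed=0):
    """Permutation P value for GWAS signal enrichment among top-scored SNPs.

    gwas_p -- dict mapping SNP ID to its GWAS P value
    scores -- dict mapping SNP ID to a prioritization score
    """
    rng = random.Random(seed)
    snps = list(gwas_p)
    k = max(1, round(len(snps) * top_fraction))

    def signal(subset):
        # Assumed statistic: mean -log10(P) over the SNP subset.
        return sum(-math.log10(gwas_p[s]) for s in subset) / len(subset)

    top = sorted(snps, key=lambda s: scores[s], reverse=True)[:k]
    observed = signal(top)
    # Count random same-size SNP sets with a signal at least as strong.
    exceed = sum(
        1 for _ in range(n_perm) if signal(rng.sample(snps, k)) >= observed
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)

# Synthetic data: the five highest-scored SNPs carry the strong signals.
gwas_p = {f"snp{i}": (1e-8 if i < 5 else 0.5) for i in range(100)}
scores = {f"snp{i}": (1.0 if i < 5 else 0.0) for i in range(100)}
p_value = empirical_p(gwas_p, scores)
```

The add-one correction is a common convention for permutation tests, since a P value of exactly zero cannot be supported by a finite number of permutations.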
Fig. 6 Context-dependent epigenomic weighting improves the detection of disease-associated genes. a Number of improved genes by the W_SNP-based gene association test relative to the NW_SNP-based gene association test. Blue bars represent the number of all improved genes; red bars represent the number of eGenes among the improved genes. b Immune system-related pathway enrichment analysis. Blood cell lines are colored in black and non-blood cell lines are colored in gray. BLD blood, LNG lung, SKIN skin, BONE bone, LIV liver, VAS vascular, MUS muscle, BRN brain, CRVX cervix, BRST breast.