Yu Zhang
Abstract
Most genetic variants identified in genome-wide association studies are noncoding and are likely tagging nearby causal variants. Pinpointing the precise locations of disease-causal variants and understanding their functions in disease remains a challenging task. A promising approach to improve fine mapping is to integrate the functional data currently available for hundreds of human tissues and cell types. Although several methods use functional data to prioritize disease variants, they mainly rely on linear models or equivalent naive likelihood-based models for prediction. Here, we investigate whether studying the combinatorial patterns of functional data across cell types can improve prediction accuracy for disease variants. Using functional annotation in 127 human cell types, we first introduce a Bayesian method to identify recurring cell-type-specificity partitions on the scale of the genome. We show that our de novo identification of epigenome partition patterns agrees well with known cell-type origins and that the associated functional elements are strongly enriched in disease variants. Using epigenetic cell-type specificity in addition to enrichment of functional elements, we further demonstrate that the power to predict disease variants can be greatly improved over that achievable with linear models. Our approach thus provides a new way to prioritize disease functional variants for testing.
Keywords: Bayesian method; GWAS; cell-type specificity; epigenetics; functional mutation
Year: 2017 PMID: 28611825 PMCID: PMC5447712 DOI: 10.3389/fgene.2017.00071
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Table 1. Ten clusters of the 127 Roadmap Epigenomics epigenomes.
| Cluster | Epigenomes |
| --- | --- |
| 1 | BLD.CD14.MONO, BLD.CD14.PC, BLD.CD15.PC, BLD.CD19.PPC, BLD.CD3.PPC, BLD.CD34.CC, BLD.CD4.CD25.CD127M.TREGPC, BLD.CD4.CD25I.CD127.TMEMPC, BLD.CD4.CD25M.CD45RA.NPC, BLD.CD4.CD25M.CD45RO.MPC, BLD.CD4.CD25M.IL17M.PL.TPC, BLD.CD4.CD25M.IL17P.PL.TPC, BLD.CD4.CD25M.TPC, BLD.CD4.MPC, BLD.CD4.NPC, BLD.CD56.PC, BLD.CD8.MPC, BLD.CD8.NPC, BLD.DND41.CNCR, BLD.MOB.CD34.PC.F, BLD.MOB.CD34.PC.M, BLD.PER.MONUC.PC, THYM.FET |
| 2 | ESC.4STAR, ESC.H1, ESC.HUES48, ESC.HUES6, ESC.HUES64, ESC.I3, ESDR.CD184.ENDO, ESDR.CD56.ECTO, ESDR.CD56.MESO, IPSC.15b, IPSC.18, IPSC.20B |
| 3 | BRN.FET.M, BRN.GRM.MTRX, ESC.H9, ESC.WA7, ESDR.H1.BMP4.MESO, ESDR.H1.NEUR.PROG, IPSC.DF.19.11, IPSC.DF.6.9 |
| 4 | BLD.CD19.CPC, BLD.CD3.CPC, BLD.CD34.PC, BLD.GM12878, THYM |
| 5 | BLD.K562.CNCR, ESDR.H1.BMP4.TROP, ESDR.H1.MSC, GI.CLN.MUC, GI.CLN.SIG, GI.ESO, GI.RECT.SM.MUS, GI.S.INT, GI.STMC.GAST, GI.STMC.MUC, HRT.ATR.R, HRT.FET, HRT.VENT.L, HRT.VNT.R, KID.FET, LNG, LNG.NHLF, MUS.PSOAS, OVRY, PANC, PANC.ISLT, PLCNT.AMN, SKIN.NHDFAD, SKIN.PEN.FRSK.MEL.01, SPLN, VAS.AOR |
| 6 | ADRL.GLND.FET, BRN.NHA, BRST.HMEC, BRST.HMEC.35, BRST.MYO, CRVX.HELAS3.CNCR, LNG.A549.ETOH002.CNCR, MUS.HSMM, MUS.HSMMT, SKIN.NHEK, SKIN.PEN.FRSK.KER.02, SKIN.PEN.FRSK.KER.03, SKIN.PEN.FRSK.MEL.03, VAS.HUVEC |
| 7 | BONE.OSTEO, FAT.ADIP.DR.MSC, FAT.MSC.DR.ADIP, LNG.IMR90, MUS.SAT, SKIN.PEN.FRSK.FIB.01, SKIN.PEN.FRSK.FIB.02, STRM.CHON.MRW.DR.MSC, STRM.MRW.MSC |
| 8 | BRN.ANG.GYR, BRN.ANT.CAUD, BRN.CING.GYR, BRN.DL.PRFRNTL.CRTX, BRN.HIPP.MID, BRN.INF.TMP, BRN.SUB.NIG |
| 9 | BRN.CRTX.DR.NRSPHR, BRN.FET.F, BRN.GANGEM.DR.NRSPHR, ESDR.H9.NEUR, ESDR.H9.NEUR.PROG, GI.CLN.SM.MUS, GI.DUO.MUC, GI.RECT.MUC.29, GI.RECT.MUC.31, GI.STMC.MUS, LIV.HEPG2.CNCR, MUS.SKLT.F, MUS.SKLT.M, PLCNT.FET |
| 10 | FAT.ADIP.NUC, GI.DUO.SM.MUS, GI.L.INT.FET, GI.S.INT.FET, GI.STMC.FET, LIV.ADLT, LNG.FET, MUS.LEG.FET, MUS.TRNK.FET |
Figure 1. Genome-wide patterns of cell type specificity. The x-axis denotes the 127 epigenomes, with epigenome color keys shown on the top and the right-hand side. The cell type abbreviations are those given by the Roadmap Epigenomics consortium. The y-axis shows the 48 cell-type-specificity patterns (CSPs). Each row in the matrix denotes one epigenome partition pattern, with different groups of epigenomes indicated by black, white, and gray. The percentage of the genome carrying each pattern is shown on the right. We also marked 10 clusters of epigenomes that were grouped together in most patterns.
Figure 2. Enrichment of the 48 cell type specificity patterns (CSPs) in disease variants. The heatmap at the top left corner shows the −log10 p-value of enrichment/depletion for the risk variants of all complex traits (y-axis) in each CSP (x-axis). Significant depletion is shown in blue and white (the −log10 p-value is multiplied by −1 for depletion), and significant enrichment is shown in red and yellow. The heatmap at the lower right corner shows the −log10 p-value of enrichment/depletion for the risk variants of the 52 traits that have at least 50 lead variants. The upper right panel shows the 48 CSPs.
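The signed −log10 p-value convention used in this heatmap can be illustrated with a minimal sketch. The caption does not state which statistical test was used, so the hypergeometric test below is an assumption chosen only for illustration, and all function and parameter names are hypothetical.

```python
import math


def hypergeom_pmf_sum(lo, hi, n_total, n_in_pattern, n_risk):
    """Sum of hypergeometric probabilities P(X = i) for i in [lo, hi].

    X counts how many of n_risk sampled variants fall in a pattern that
    covers n_in_pattern of n_total positions (illustrative setup only).
    """
    total = math.comb(n_total, n_risk)
    num = sum(
        math.comb(n_in_pattern, i) * math.comb(n_total - n_in_pattern, n_risk - i)
        for i in range(lo, hi + 1)
    )
    return num / total


def signed_neglog10_p(n_risk_in_pattern, n_total, n_in_pattern, n_risk):
    """Return -log10(p), multiplied by -1 when the pattern is depleted."""
    expected = n_risk * n_in_pattern / n_total
    if n_risk_in_pattern >= expected:
        # Enrichment: upper tail P(X >= observed), reported as a positive value.
        p = hypergeom_pmf_sum(n_risk_in_pattern, n_risk, n_total, n_in_pattern, n_risk)
        sign = 1.0
    else:
        # Depletion: lower tail P(X <= observed), reported as a negative value.
        p = hypergeom_pmf_sum(0, n_risk_in_pattern, n_total, n_in_pattern, n_risk)
        sign = -1.0
    p = max(p, 1e-300)  # guard against log10(0)
    return -sign * math.log10(p)


# Toy numbers (not from the paper): 1000 positions, 100 in the pattern, 50 risk variants.
enriched = signed_neglog10_p(15, 1000, 100, 50)
depleted = signed_neglog10_p(0, 1000, 100, 50)
```

With this convention, a single color scale can show both directions: large positive values mark enrichment and large negative values mark depletion.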
Figure 3. Enrichment of epigenetic states in disease variants with respect to the 10 epigenome clusters defined in Table 1. The leftmost heatmap shows the mean signal of histone marks in the 25 epigenetic states. State color keys and their putative functions are shown on the left. The two right panels show the enrichment of epigenetic states at the risk variants of each complex trait, with colors reflecting the most enriched states (given by the state color keys) and the strength of enrichment (brighter color means stronger enrichment).
Figure 4. Power for predicting disease variants. (A) Difference in the AUC of precision-recall curves between our model and the single best cell type model (y-axis), plotted against the mean AUC of the two models (x-axis); the mean AUC reflects how well the variants of a complex trait can be predicted overall. (B) Similar to (A), but comparing our model with the model that uses a linear combination of all cell types.
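The quantities plotted in this figure (AUC difference against mean AUC) can be sketched as follows. The caption does not say how the area under the precision-recall curve was computed, so the average-precision estimator used here is an assumption, and the function names and toy data are hypothetical.

```python
def average_precision(scores, labels):
    """Estimate the area under the precision-recall curve by average precision.

    scores: predicted scores, higher means more likely a disease variant.
    labels: 1 for a true disease variant, 0 otherwise.
    Tied scores are not treated specially in this sketch.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            ap += hits / rank  # precision at each recovered positive
    return ap / n_pos


def mean_and_difference(auc_a, auc_b):
    """Return (mean AUC, AUC difference) for one trait, i.e. one plotted point."""
    return (auc_a + auc_b) / 2, auc_a - auc_b


# Toy data (not from the paper): two models scoring the same four variants.
labels = [1, 0, 1, 0]
auc_model_a = average_precision([0.9, 0.1, 0.8, 0.2], labels)
auc_model_b = average_precision([0.9, 0.8, 0.7, 0.1], labels)
x, y = mean_and_difference(auc_model_a, auc_model_b)
```

Plotting the difference against the mean separates two questions: how predictable a trait is overall (x) and which model does better on it (the sign of y).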
Figure 5. Power comparison between the conditional test method using functional priors (gray) and the fixed threshold method without priors (black) for detecting causal variants in GWAS. The horizontal line marks the mean power without using priors. Only the top 70 traits whose mean AUCs were >0.05 in our model are shown.
Figure 6. Example of the change in IBD association probability before (top) and after (bottom) incorporating functional predictions at the SKAP2 gene. Black vertical lines show the probability of IBD association at 179 credible variants obtained by Huang et al. (2015) via fine mapping using genetic data alone. The probabilities are overlaid on the IDEAS functional annotation map to highlight how the probabilities changed with function. In the functional annotation map, green indicates transcription, blue indicates repression, red indicates promoter activity, yellow/orange indicates enhancer activity, and gray indicates no regulatory events. The dashed box marks the group (rows) of blood T cells; the remaining rows in the functional map are blood B and HSC cells.
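One generic way to see how functional predictions can shift association probabilities within a credible set is to reweight each variant's genetic-only probability by a functional prior and renormalize. The sketch below illustrates that general idea only; it is not taken from the paper, whose exact procedure is not described in this caption, and the function name and numbers are hypothetical.

```python
def reweight_with_functional_prior(genetic_probs, functional_priors):
    """Combine genetic-only probabilities with functional prior weights.

    Assumes the genetic-only probabilities came from a uniform prior over
    the credible variants, so multiplying by a prior weight and
    renormalizing gives an updated probability per variant.
    """
    if len(genetic_probs) != len(functional_priors):
        raise ValueError("inputs must have the same length")
    weighted = [p * w for p, w in zip(genetic_probs, functional_priors)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("all weighted probabilities are zero")
    return [w / total for w in weighted]


# Toy example (not from the paper): the third variant lies in a predicted
# functional element and receives a 4x larger prior weight.
before = [0.5, 0.3, 0.2]
after = reweight_with_functional_prior(before, [1.0, 1.0, 4.0])
```

In this toy case the third variant moves from the lowest to the highest probability, which is the kind of shift the top and bottom panels contrast.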