Chantriolnt-Andreas Kapourani (1), Guido Sanguinetti (2).
1. IANC, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK.
2. IANC, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK; Synthetic and Systems Biology, University of Edinburgh, Edinburgh EH9 3JD, UK.
Abstract
MOTIVATION: DNA methylation is an intensely studied epigenetic mark, yet its functional role is incompletely understood. Attempts to quantitatively associate average DNA methylation with gene expression yield poor correlations outside of the well-understood methylation switch at CpG islands.
RESULTS: Here, we use probabilistic machine learning to extract higher order features associated with the methylation profile across a defined region. These features precisely quantify the notion of the shape of a methylation profile, capturing spatial correlations in DNA methylation across genomic regions. Using these higher order features across promoter-proximal regions, we are able to construct a powerful machine learning predictor of gene expression, significantly improving upon the predictive power of average DNA methylation levels. Furthermore, we can use higher order features to cluster promoter-proximal regions, showing that five major patterns of methylation occur at promoters across different cell lines, and we provide evidence that methylation beyond CpG islands may be related to regulation of gene expression. Our results support previous reports of a functional role of spatial correlations in methylation patterns, and provide a means to quantify such features for downstream analyses.
AVAILABILITY AND IMPLEMENTATION: https://github.com/andreaskapou/BPRMeth
CONTACT: G.Sanguinetti@ed.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.