Tamar Sofer, Elizabeth D Schifano, Jane A Hoppin, Lifang Hou, Andrea A Baccarelli.
Affiliations: Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, SPH2, 4th floor, Boston, MA 02115, USA; Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, CT 06269, USA; NIEHS, Epidemiology Branch, MD A3-05, PO Box 12233, Research Triangle Park, NC 27709, USA; Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 680 N Lake Shore Drive, Suite 1400, Chicago, IL 60611, USA; Department of Environmental Health and Department of Epidemiology, Harvard School of Public Health, 401 Park Drive, Landmark Ctr Room 415E, Boston, MA 02215, USA.
Abstract
MOTIVATION: DNA methylation is a heritable, modifiable chemical process that affects gene transcription and is associated with other molecular markers (e.g. gene expression) and biomarkers (e.g. cancer or other diseases). Current technology measures methylation at hundreds of thousands or millions of CpG sites throughout the genome. Neighboring CpG sites are often highly correlated with each other, and current literature suggests that clusters of adjacent CpG sites are co-regulated.
RESULTS: We develop the Adjacent Site Clustering (A-clustering) algorithm to detect sets of neighboring CpG sites that are correlated with each other. To detect methylation regions associated with exposure, we propose an analysis pipeline for high-dimensional methylation data in which CpG sites within regions identified by A-clustering are modeled as multivariate responses to environmental exposure, using a generalized estimating equation approach that assumes exposure affects all sites in the cluster equally. We develop a correlation-preserving simulation scheme and study the proposed methodology via simulations. We also study the clusters detected by the algorithm in a high-dimensional dataset of peripheral blood methylation of pesticide applicators.
AVAILABILITY: We provide the R package Aclust, which efficiently implements A-clustering and the analysis pipeline and produces analysis reports. The package is available at http://www.hsph.harvard.edu/tamar-sofer/packages/
CONTACT: tsofer@hsph.harvard.edu
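As a rough illustration of the idea of grouping correlated neighboring CpG sites, the following Python sketch merges consecutive sites into a cluster when they lie close together and their methylation values are strongly correlated across samples. It is a deliberately simplified stand-in, not the A-clustering algorithm or the Aclust package: the function names, the `min_corr` and `max_gap` thresholds, and the toy data are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant site carries no correlation information
    return sxy / sqrt(sxx * syy)


def adjacent_clusters(positions, methylation, min_corr=0.8, max_gap=1000):
    """Group genomically ordered sites into runs of correlated neighbors.

    positions:   base-pair coordinates on one chromosome, sorted ascending.
    methylation: one list of per-sample values for each site, same order.
    min_corr, max_gap: hypothetical thresholds chosen for illustration.
    Returns a list of clusters, each a list of site indices (size >= 2).
    """
    if not positions:
        return []
    clusters = []
    current = [0]
    for i in range(1, len(positions)):
        close = positions[i] - positions[i - 1] <= max_gap
        if close and pearson(methylation[i - 1], methylation[i]) >= min_corr:
            current.append(i)  # extend the current run of correlated sites
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [i]  # start a new run at this site
    if len(current) > 1:
        clusters.append(current)
    return clusters


# Toy data: 5 sites measured in 6 samples (values are made up).
positions = [100, 250, 400, 5000, 5100]
methylation = [
    [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    [0.12, 0.19, 0.33, 0.41, 0.52, 0.58],
    [0.60, 0.10, 0.50, 0.20, 0.40, 0.30],
    [0.80, 0.70, 0.90, 0.60, 0.75, 0.85],
    [0.78, 0.72, 0.88, 0.63, 0.74, 0.86],
]

print(adjacent_clusters(positions, methylation))  # prints [[0, 1], [3, 4]]
```

In this toy run the third site is uncorrelated with its neighbor and sits far from the fourth, so it ends up in no cluster. In the pipeline the abstract describes, each detected cluster would then be modeled jointly as a multivariate response to exposure; that step is not sketched here.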