Jun Chen (1), Allan C Just (2), Joel Schwartz (2), Lifang Hou (3), Nadereh Jafari (4), Zhifu Sun (5), Jean-Pierre A Kocher (5), Andrea Baccarelli (2), Xihong Lin (6).
1. Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.
2. Department of Environmental Health, Harvard School of Public Health, Boston, MA 02115.
3. Department of Preventive Medicine and the Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL 60208.
4. Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60208, USA.
5. Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905.
6. Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115.
Abstract
SUMMARY: The Infinium HumanMethylation450 BeadChip enables epigenome-wide association studies at reduced cost. However, many of the CpG sites interrogated by the 450K BeadChip have very large measurement errors. Including these noisy CpGs reduces the statistical power to detect relevant associations, because every additional test makes the multiple testing correction more stringent. When technical replicates are available, we propose filtering CpGs by the intra-class correlation coefficient (ICC), which measures the relative contribution of biological variability to total variability. We estimate the ICC with a linear mixed effects model that pools all samples rather than using the technical replicates alone. An ultra-fast algorithm addresses the computational burden, so that CpG filtering of a 450K data set with over 1000 samples completes in minutes on a desktop computer. The method is flexible and accommodates any replicate design. Simulations and a real-data application demonstrate that our whole-sample ICC method outperforms both the replicate-sample ICC and a variance-based method.
AVAILABILITY AND IMPLEMENTATION: CpGFilter is implemented in R and is publicly available on CRAN as the R package 'CpGFilter'.
CONTACT: chen.jun2@mayo.edu or xlin@hsph.harvard.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
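The whole-sample ICC idea can be illustrated with a small sketch. This is not the CpGFilter code or its fitting algorithm, and it rests on several assumptions. First, methylation values are taken to be on a continuous scale (e.g. M-values). Second, the ICC is computed as sigma_b^2 / (sigma_b^2 + sigma_e^2) using the classical ANOVA (method-of-moments) estimator for an unbalanced one-way random-effects model, rather than the paper's mixed-model estimation. Third, the function names `icc_whole_sample` and `filter_cpgs` and the 0.5 cutoff are hypothetical.

Subjects measured only once still contribute to the between-subject mean square, which is how all samples are pooled. Only replicated subjects inform the within-subject (technical) variance.

```python
import numpy as np


def icc_whole_sample(y, subject):
    """Per-CpG ICC from an unbalanced one-way random-effects ANOVA (a sketch).

    y       : (n_cpg, n_arrays) methylation values, assumed continuous (e.g. M-values)
    subject : (n_arrays,) subject labels; technical replicates share a label,
              unreplicated samples carry a unique label and are still used.
    """
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(np.asarray(subject), return_inverse=True)
    a, N = labels.size, y.shape[1]          # subjects, arrays
    if a < 2 or N <= a:
        raise ValueError("need >= 2 subjects and at least one technical replicate")

    n_i = np.bincount(idx, minlength=a)     # arrays per subject
    Z = np.zeros((N, a))
    Z[np.arange(N), idx] = 1.0              # array-to-subject indicator matrix
    means = (y @ Z) / n_i                   # subject means, shape (n_cpg, a)
    grand = y.mean(axis=1, keepdims=True)   # overall mean over all arrays

    ssb = (n_i * (means - grand) ** 2).sum(axis=1)    # between-subject SS
    ssw = ((y - means[:, idx]) ** 2).sum(axis=1)      # within-subject SS (replicates only)
    msb = ssb / (a - 1)
    msw = ssw / (N - a)                     # estimate of technical variance sigma_e^2

    n0 = (N - (n_i ** 2).sum() / N) / (a - 1)         # effective group size, unbalanced design
    sigma_b2 = np.maximum((msb - msw) / n0, 0.0)      # biological variance, truncated at 0
    return sigma_b2 / (sigma_b2 + msw)      # NaN if a CpG has zero total variance


def filter_cpgs(y, subject, threshold=0.5):
    """Keep CpGs whose ICC reaches a (hypothetical) threshold."""
    icc = icc_whole_sample(y, subject)
    return icc >= threshold, icc


# Simulated example: 200 subjects, the first 50 measured twice (250 arrays).
rng = np.random.default_rng(0)
subjects = np.concatenate([np.arange(200), np.arange(50)])
b = rng.normal(0.0, 1.0, 200)                          # biological signal per subject
y0 = b[subjects] + rng.normal(0.0, 0.1, subjects.size)  # reliable CpG: small noise
y1 = rng.normal(0.0, 1.0, subjects.size)               # noisy CpG: no biological signal
y = np.vstack([y0, y1])

keep, icc = filter_cpgs(y, subjects, threshold=0.5)
print(icc.round(3), keep)
```

In this simulation the first CpG should have an ICC close to 1 and be kept, and the pure-noise CpG should fall below the cutoff. The indicator-matrix product keeps the computation vectorized across CpGs, so a full 450K matrix is processed in a few array operations rather than a per-CpG model fit.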
Authors: Hailong Meng; Andrew R Joyce; Daniel E Adkins; Priyadarshi Basu; Yankai Jia; Guoya Li; Tapas K Sengupta; Barbara K Zedler; E Lenn Murrelle; Edwin J C G van den Oord Journal: BMC Bioinformatics Date: 2010-05-05 Impact factor: 3.169
Authors: Riccardo E Marioni; Sonia Shah; Allan F McRae; Brian H Chen; Elena Colicino; Sarah E Harris; Jude Gibson; Anjali K Henders; Paul Redmond; Simon R Cox; Alison Pattie; Janie Corley; Lee Murphy; Nicholas G Martin; Grant W Montgomery; Andrew P Feinberg; M Daniele Fallin; Michael L Multhaup; Andrew E Jaffe; Roby Joehanes; Joel Schwartz; Allan C Just; Kathryn L Lunetta; Joanne M Murabito; John M Starr; Steve Horvath; Andrea A Baccarelli; Daniel Levy; Peter M Visscher; Naomi R Wray; Ian J Deary Journal: Genome Biol Date: 2015-01-30 Impact factor: 13.583
Authors: Maitreyee Bose; Chong Wu; James S Pankow; Ellen W Demerath; Jan Bressler; Myriam Fornage; Megan L Grove; Thomas H Mosley; Chindo Hicks; Kari North; Wen Hong Kao; Yu Zhang; Eric Boerwinkle; Weihua Guan Journal: BMC Bioinformatics Date: 2014-09-19 Impact factor: 3.169
Authors: Lauren E Wilson; Zongli Xu; Sophia Harlid; Alexandra J White; Melissa A Troester; Dale P Sandler; Jack A Taylor Journal: Am J Epidemiol Date: 2019-06-01 Impact factor: 4.897
Authors: Todd M Everson; Marta Vives-Usano; Emie Seyve; Johanna Lepeule; Marie-France Hivert; Mariona Bustamante; Andres Cardenas; Marina Lacasaña; Jeffrey M Craig; Corina Lesseur; Emily R Baker; Nora Fernandez-Jimenez; Barbara Heude; Patrice Perron; Beatriz Gónzalez-Alzaga; Jane Halliday; Maya A Deyssenroth; Margaret R Karagas; Carmen Íñiguez; Luigi Bouchard; Pedro Carmona-Sáez; Yuk J Loke; Ke Hao; Thalia Belmonte; Marie A Charles; Jordi Martorell-Marugán; Evelyne Muggli; Jia Chen; Mariana F Fernández; Jorg Tost; Antonio Gómez-Martín; Stephanie J London; Jordi Sunyer; Carmen J Marsit Journal: Nat Commun Date: 2021-08-24 Impact factor: 14.919
Authors: Xinyi Lin; Ai Ling Teh; Li Chen; Ives Yubin Lim; Pei Fang Tan; Julia L MacIsaac; Alexander M Morin; Fabian Yap; Kok Hian Tan; Seang Mei Saw; Yung Seng Lee; Joanna D Holbrook; Keith M Godfrey; Michael J Meaney; Michael S Kobor; Yap Seng Chong; Peter D Gluckman; Neerja Karnani Journal: BMC Med Date: 2017-12-05 Impact factor: 8.775
Authors: Mark W Logue; Alicia K Smith; Erika J Wolf; Hannah Maniates; Annjanette Stone; Steven A Schichman; Regina E McGlinchey; William Milberg; Mark W Miller Journal: Epigenomics Date: 2017-08-15 Impact factor: 4.778
Authors: Pierre-Antoine Dugué; Dallas R English; Robert J MacInnis; Chol-Hee Jung; Julie K Bassett; Liesel M FitzGerald; Ee Ming Wong; Jihoon E Joo; John L Hopper; Melissa C Southey; Graham G Giles; Roger L Milne Journal: Sci Rep Date: 2016-07-26 Impact factor: 4.379
Authors: Marie Forest; Kieran J O'Donnell; Greg Voisin; Helene Gaudreau; Julia L MacIsaac; Lisa M McEwen; Patricia P Silveira; Meir Steiner; Michael S Kobor; Michael J Meaney; Celia M T Greenwood Journal: Epigenetics Date: 2018-01-30 Impact factor: 4.528
Authors: Katie M O'Brien; Dale P Sandler; Zongli Xu; H Karimi Kinyamu; Jack A Taylor; Clarice R Weinberg Journal: Breast Cancer Res Date: 2018-07-11 Impact factor: 6.466