Jun Ding, Ziv Bar-Joseph. Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Abstract
MOTIVATION: Genome-wide DNA methylation profiling is now routinely performed when studying development, cancer and several other biological processes. Whole-genome bisulfite sequencing (WGBS) provides high-quality methylation measurements at single-nucleotide resolution, but it is relatively costly, so several studies have used alternative profiling methods. One of the most widely used low-cost alternatives is MeDIP-Seq. However, MeDIP-Seq is biased toward CpG-enriched regions, so its results must be corrected to determine accurate methylation levels. RESULTS: Here we present a method, based on Random Forest regression, for correcting MeDIP-Seq results. Applying the method to real data from several different tissues (brain, cortex, penis), we show that it achieves an almost 4-fold decrease in run time while increasing accuracy by as much as 20% over prior methods developed for this task. AVAILABILITY AND IMPLEMENTATION: MethRaFo is freely available as a Python package (with an R wrapper) at https://github.com/phoenixding/methrafo. CONTACT: zivbj@cs.cmu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
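To make the idea of "correcting MeDIP-Seq with Random Forest regression" concrete, the sketch below trains a small random forest (bootstrap resampling plus a random feature subset at each split, predictions averaged across trees) to map a biased enrichment signal back to a methylation level. It is a minimal, pure-Python illustration, not MethRaFo's implementation. The abstract does not state MethRaFo's features or hyperparameters, so the two per-bin features used here (a MeDIP-Seq read signal and a CpG count), the synthetic data generator, and every function name are hypothetical assumptions chosen only to mimic the CpG-density bias described above.

```python
import random
from statistics import mean

# Hypothetical sketch: random forest regression from a CpG-biased MeDIP-Seq-like
# signal to a WGBS-like methylation level. Not MethRaFo's actual code.


def best_split(X, y, feature_ids, min_leaf):
    """Return (score, feature, threshold) minimizing the children's summed squared error."""
    n = len(y)
    total, total_sq = sum(y), sum(v * v for v in y)
    best = None
    for f in feature_ids:
        order = sorted(range(n), key=lambda i: X[i][f])
        s = sq = 0.0  # running sum and sum of squares of y on the left side
        for k in range(n - 1):
            i, j = order[k], order[k + 1]
            s += y[i]
            sq += y[i] * y[i]
            n_left = k + 1
            if X[i][f] == X[j][f] or n_left < min_leaf or n - n_left < min_leaf:
                continue
            # SSE of a group = sum(y^2) - (sum y)^2 / count
            score = (sq - s * s / n_left) + ((total_sq - sq) - (total - s) ** 2 / (n - n_left))
            if best is None or score < best[0]:
                best = (score, f, X[i][f])  # split rule: x[f] <= threshold goes left
    return best


def build_tree(X, y, depth, max_depth, min_leaf, n_feats, rng):
    """Leaves are floats; internal nodes are (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    feature_ids = rng.sample(range(len(X[0])), n_feats)  # random feature subset
    split = best_split(X, y, feature_ids, min_leaf)
    if split is None:
        return mean(y)
    _, f, t = split
    children = []
    for keep in (lambda v: v <= t, lambda v: v > t):
        idx = [i for i in range(len(y)) if keep(X[i][f])]
        children.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                   depth + 1, max_depth, min_leaf, n_feats, rng))
    return (f, t, children[0], children[1])


def predict_tree(node, x):
    """Follow splits down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=20, max_depth=5, min_leaf=5, n_feats=1, seed=0):
    """Fit each tree on a bootstrap resample of the training bins."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, min_leaf, n_feats, rng))
    return forest


def predict_forest(forest, x):
    """Average the per-tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)


def simulate_bins(n, rng):
    """Synthetic toy bins: the read signal scales with methylated CpGs, so it
    confounds methylation level with CpG density (the bias being corrected)."""
    X, y = [], []
    for _ in range(n):
        meth = rng.random()                     # true methylation level in [0, 1]
        cpg = rng.randint(1, 20)                # CpG count in the bin
        reads = meth * cpg + rng.gauss(0, 0.5)  # biased enrichment signal
        X.append((reads, cpg))
        y.append(meth)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = simulate_bins(300, rng)
    X_test, y_test = simulate_bins(100, rng)
    forest = fit_forest(X_train, y_train)
    rf_mse = mean((predict_forest(forest, x) - t) ** 2 for x, t in zip(X_test, y_test))
    base_mse = mean((mean(y_train) - t) ** 2 for t in y_test)
    print(f"random forest MSE {rf_mse:.3f} vs constant baseline {base_mse:.3f}")
```

Because the toy signal depends on the product of methylation and CpG count, no single feature recovers methylation on its own; tree splits on both features can approximate that interaction, which is one reason a nonparametric regressor is a reasonable fit for this kind of bias correction.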