Jens Lichtenberg1, Laura Elnitski2, David M Bodine1. 1. Genetics and Molecular Biology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA. 2. Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
Abstract
MOTIVATION: Epigenetic data are invaluable when determining the regulatory programs governing a cell. Based on use of next-generation sequencing data for characterizing epigenetic marks and transcription factor binding, numerous peak-calling approaches have been developed to determine sites of genomic significance in these data. Such analyses can produce a large number of false positive predictions, suggesting that sites supported by multiple algorithms provide a stronger foundation for inferring and characterizing regulatory programs associated with the epigenetic data. Few methodologies integrate epigenetic based predictions of multiple approaches when combining profiles generated by different tools. RESULTS: The SigSeeker peak-calling ensemble uses multiple tools to identify peaks, and with user-defined thresholds for peak overlap and signal strength it retains only those peaks that are concordant across multiple tools. Peaks predicted to be co-localized by only a very small number of tools, discovered to be only marginally overlapping, or found to represent significant outliers to the approximation model are removed from the results, providing concise and high quality epigenetic datasets. SigSeeker has been validated using established benchmarks for transcription factor binding and histone modification ChIP-Seq data. These comparisons indicate that the quality of our ensemble technique exceeds that of single tool approaches, enhances existing peak-calling ensembles, and results in epigenetic profiles of higher confidence. AVAILABILITY AND IMPLEMENTATION: http://sigseeker.org. CONTACT: lichtenbergj@mail.nih.gov. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and are in the public domain in the US.
MOTIVATION: Epigenetic data are invaluable when determining the regulatory programs governing a cell. Based on use of next-generation sequencing data for characterizing epigenetic marks and transcription factor binding, numerous peak-calling approaches have been developed to determine sites of genomic significance in these data. Such analyses can produce a large number of false positive predictions, suggesting that sites supported by multiple algorithms provide a stronger foundation for inferring and characterizing regulatory programs associated with the epigenetic data. Few methodologies integrate epigenetic based predictions of multiple approaches when combining profiles generated by different tools. RESULTS: The SigSeeker peak-calling ensemble uses multiple tools to identify peaks, and with user-defined thresholds for peak overlap and signal strength it retains only those peaks that are concordant across multiple tools. Peaks predicted to be co-localized by only a very small number of tools, discovered to be only marginally overlapping, or found to represent significant outliers to the approximation model are removed from the results, providing concise and high quality epigenetic datasets. SigSeeker has been validated using established benchmarks for transcription factor binding and histone modification ChIP-Seq data. These comparisons indicate that the quality of our ensemble technique exceeds that of single tool approaches, enhances existing peak-calling ensembles, and results in epigenetic profiles of higher confidence. AVAILABILITY AND IMPLEMENTATION: http://sigseeker.org. CONTACT: lichtenbergj@mail.nih.gov. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and are in the public domain in the US.
Authors: Cory Y McLean; Dave Bristor; Michael Hiller; Shoa L Clarke; Bruce T Schaar; Craig B Lowe; Aaron M Wenger; Gill Bejerano Journal: Nat Biotechnol Date: 2010-05-02 Impact factor: 54.908
Authors: Stephen G Landt; Georgi K Marinov; Anshul Kundaje; Pouya Kheradpour; Florencia Pauli; Serafim Batzoglou; Bradley E Bernstein; Peter Bickel; James B Brown; Philip Cayting; Yiwen Chen; Gilberto DeSalvo; Charles Epstein; Katherine I Fisher-Aylor; Ghia Euskirchen; Mark Gerstein; Jason Gertz; Alexander J Hartemink; Michael M Hoffman; Vishwanath R Iyer; Youngsook L Jung; Subhradip Karmakar; Manolis Kellis; Peter V Kharchenko; Qunhua Li; Tao Liu; X Shirley Liu; Lijia Ma; Aleksandar Milosavljevic; Richard M Myers; Peter J Park; Michael J Pazin; Marc D Perry; Debasish Raha; Timothy E Reddy; Joel Rozowsky; Noam Shoresh; Arend Sidow; Matthew Slattery; John A Stamatoyannopoulos; Michael Y Tolstorukov; Kevin P White; Simon Xi; Peggy J Farnham; Jason D Lieb; Barbara J Wold; Michael Snyder Journal: Genome Res Date: 2012-09 Impact factor: 9.043
Authors: Amber Hogart; Jens Lichtenberg; Subramanian S Ajay; Stacie Anderson; Elliott H Margulies; David M Bodine Journal: Genome Res Date: 2012-06-08 Impact factor: 9.043
Authors: Elisabeth F Heuston; Cheryl A Keller; Jens Lichtenberg; Belinda Giardine; Stacie M Anderson; Ross C Hardison; David M Bodine Journal: Epigenetics Chromatin Date: 2018-05-28 Impact factor: 4.954