ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data.

Vida Ravanmehr, Minji Kim, Zhiying Wang, Olgica Milenkovic.

Abstract

Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and they produce massive datasets that pose significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. ChIPWig enables random access and summary statistics lookups, and it is based on the asymptotic theory of optimal point density design for nonuniform quantizers.
Results: We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced file sizes to 6% of the original and offered a 6-fold improvement in compression rate over bigWig. The lossy mode reduced file sizes a further 2-fold relative to the lossless mode, with little or no effect on peak calling and motif discovery using specialized NarrowPeaks methods. Compression and decompression speeds are on the order of 0.2 sec/MB on general-purpose computers.
Availability and implementation: The source code and binaries, implemented in C++, are freely available for download at https://github.com/vidarmehr/ChIPWig-v2.
Contact: milenkov@illinois.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
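
The abstract refers to optimal point density design for nonuniform quantizers without giving details. As a rough illustration only, the Python sketch below applies the classical high-resolution result for mean-squared error, in which reconstruction levels are placed with density proportional to the cube root of the data density. This is a generic textbook construction, not ChIPWig's actual quantizer: the function names `design_levels` and `quantize`, the histogram bin count, and the choice of mean-squared error as the distortion measure are all assumptions made for the example.

```python
import bisect
import random


def design_levels(samples, n_levels, n_bins=64):
    """Place n_levels reconstruction points with density ~ f(x)^(1/3).

    f(x) is estimated with a simple equal-width histogram (an assumption
    of this sketch, not something taken from the ChIPWig paper).
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    counts = [0] * n_bins
    for x in samples:
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1

    # Point density proportional to the cube root of the estimated density.
    weights = [c ** (1.0 / 3.0) for c in counts]
    total = sum(weights)

    # Cumulative point density at the bin edges (runs from 0 to 1).
    cum = [0.0]
    for w in weights:
        cum.append(cum[-1] + w / total)

    # Invert the cumulative density at evenly spaced targets.
    levels = []
    for k in range(n_levels):
        target = (k + 0.5) / n_levels
        b = min(bisect.bisect_right(cum, target) - 1, n_bins - 1)
        frac = (target - cum[b]) / (cum[b + 1] - cum[b])
        levels.append(lo + (b + frac) * width)
    return levels


def quantize(x, levels):
    """Return the index of the reconstruction level nearest to x."""
    i = bisect.bisect_left(levels, x)
    if i == 0:
        return 0
    if i == len(levels):
        return len(levels) - 1
    return i if levels[i] - x < x - levels[i - 1] else i - 1


if __name__ == "__main__":
    random.seed(0)
    # Skewed toy data standing in for per-position signal values.
    signal = [random.expovariate(1.0) for _ in range(5000)]
    levels = design_levels(signal, n_levels=16)
    indices = [quantize(x, levels) for x in signal]
    restored = [levels[i] for i in indices]
    mse = sum((a - b) ** 2 for a, b in zip(signal, restored)) / len(signal)
    print("levels:", [round(v, 3) for v in levels])
    print("mean squared error:", mse)
```

In this sketch the levels cluster where values are common and spread out in the sparse tail, so frequent small values are reproduced more finely than rare large ones. Only the level indices would need to be stored, which is what makes the scheme lossy.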

Year:  2018        PMID: 29087447      PMCID: PMC5860022          DOI: 10.1093/bioinformatics/btx685

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

