THiCweed: fast, sensitive detection of sequence features by clustering big datasets.

Ankit Agrawal, Snehal V Sambare, Leelavati Narlikar, Rahul Siddharthan.

Abstract

We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. THiCweed clusters bound regions using a divisive hierarchical approach based on sequence similarity within sliding windows, while exploring both strands. THiCweed is specially geared toward data containing mixtures of motifs, which present a challenge to traditional motif-finders. Our implementation is significantly faster than standard motif-finding programs, able to process 30 000 peaks in 1-2 h on a single CPU core of a desktop computer. On synthetic data containing mixtures of motifs it is as accurate as or more accurate than all other tested programs. THiCweed performs best with large 'window' sizes (≥50 bp), much longer than typical binding sites (7-15 bp). On real data it successfully recovers literature motifs, and also uncovers complex sequence characteristics in flanking DNA, variant motifs and secondary motifs even when they occur in <5% of the input, all of which appear biologically relevant. We also find recurring sequence patterns across diverse ChIP-seq datasets, possibly related to chromatin architecture and looping. THiCweed thus goes beyond traditional motif finding to give new insights into genomic transcription factor-binding complexity.
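
To make the idea of strand-aware divisive clustering concrete, here is a minimal Python sketch of a single split step. It is not the THiCweed implementation and should not be read as the authors' algorithm: the function names, the seeding rule (first sequence versus the sequence farthest from it in Hamming distance), the pseudocount and the reassignment loop are all assumptions made for illustration. It also assumes equal-length sequences over A/C/G/T, and it omits the sliding-window shifts and any criterion for deciding whether a split should be accepted.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_profile(seqs, pseudocount=0.5):
    """Per-column log-probabilities of each base (hypothetical pseudocount)."""
    total = len(seqs) + 4 * pseudocount
    profile = []
    for column in zip(*seqs):
        counts = Counter(column)
        profile.append(
            {b: math.log((counts[b] + pseudocount) / total) for b in "ACGT"}
        )
    return profile


def log_likelihood(seq, profile):
    """Log-probability of seq under a column-independent profile."""
    return sum(col[base] for base, col in zip(seq, profile))


def split_cluster(seqs, max_iter=20):
    """One illustrative divisive step: split seqs into two oriented groups."""
    # Seed the two groups with the first sequence and the sequence that is
    # farthest from it, taking the closer of its two strands.
    seed_a = seqs[0]
    seed_b = max(
        seqs,
        key=lambda s: min(hamming(seed_a, s),
                          hamming(seed_a, reverse_complement(s))),
    )
    groups = [[seed_a], [seed_b]]
    previous = None
    for _ in range(max_iter):
        profiles = [build_profile(g) for g in groups]
        assignment = []
        for s in seqs:
            # Try both groups and both strands; keep the best-scoring option.
            candidates = [
                (log_likelihood(o, profiles[k]), k, o)
                for k in (0, 1)
                for o in (s, reverse_complement(s))
            ]
            best = max(candidates, key=lambda c: c[0])
            assignment.append((best[1], best[2]))
        if assignment == previous:
            break  # converged: no sequence changed group or strand
        new_groups = [[o for k, o in assignment if k == g] for g in (0, 1)]
        if not all(new_groups):
            break  # one side emptied out; keep the last valid grouping
        groups, previous = new_groups, assignment
    return groups


# Toy input: two motif families, each with one member on the opposite strand.
toy = ["AAAAACCCCC", "GGGGGTTTTT", "AAAAACCCCA",
       "ACACACACAC", "GTGTGTGTGT", "ACACACACAT"]
left, right = split_cluster(toy)
```

In a full divisive hierarchy, a step like this would be applied recursively to each resulting group until some stopping rule is met; the sketch leaves that rule, and the choice of window position within each bound region, out of scope.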

Year:  2018        PMID: 29267972      PMCID: PMC5861420          DOI: 10.1093/nar/gkx1251

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


