EXTREME: an online EM algorithm for motif discovery.

Daniel Quang, Xiaohui Xie

Abstract

MOTIVATION: Identifying regulatory elements is a fundamental problem in the field of gene transcription. Motif discovery, the task of identifying the sequence preferences of the transcription factor proteins that bind to these elements, is an important step in this challenge. MEME is a popular motif discovery algorithm. Unfortunately, MEME's running time scales poorly with the size of the dataset. Experiments such as ChIP-Seq and DNase-Seq provide a wealth of information on the binding preferences of transcription factors, but MEME cannot discover motifs in data from these experiments in a practical amount of time without a compromising strategy such as discarding the majority of the sequences.
RESULTS: We present EXTREME, a motif discovery algorithm designed to find DNA-binding motifs in ChIP-Seq and DNase-Seq data. Unlike MEME, which uses the expectation-maximization (EM) algorithm, EXTREME uses the online EM algorithm to discover motifs. EXTREME can discover motifs in large datasets in a practical amount of time without discarding any sequences. Using EXTREME on ChIP-Seq and DNase-Seq data, we discover many motifs, including some novel and infrequent motifs that can only be discovered by using the entire dataset. Conservation analysis of one of these novel, infrequent motifs confirms that it is evolutionarily conserved and possibly functional.
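Batch EM, as used by MEME, runs an E-step over every sequence before each parameter update, so every iteration costs time proportional to the whole dataset. Online EM instead folds each new observation into running sufficient statistics and re-estimates the parameters immediately. The Python sketch below illustrates generic stepwise online EM on a deliberately simplified, hypothetical model: every width-w window is drawn either from a motif position weight matrix or from a fixed background, with mixing weight `lam`. It is not EXTREME's actual model or implementation; the function names (`online_em`, `window_likelihoods`, `consensus`), the step-size schedule eta_t = (t + 2)^(-alpha), the pseudocount, and the seeded starting matrix are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"
IDX = {base: i for i, base in enumerate(ALPHABET)}


def window_likelihoods(kmer, pwm, background):
    """Return (P(kmer | motif), P(kmer | background)) for one window."""
    p_motif, p_bg = 1.0, 1.0
    for pos, base in enumerate(kmer):
        p_motif *= pwm[pos][IDX[base]]
        p_bg *= background[IDX[base]]
    return p_motif, p_bg


def online_em(sequences, init_pwm, background, lam=0.1, alpha=0.6, pseudo=1e-4):
    """Stepwise online EM for a toy motif-vs-background mixture over all windows.

    Hypothetical illustration only; this is not EXTREME's model or code.
    """
    width = len(init_pwm)
    pwm = [row[:] for row in init_pwm]
    # Running sufficient statistics, normalised per window: expected letter
    # counts in each motif column and the expected motif fraction.
    s_counts = [[lam * p for p in row] for row in init_pwm]
    s_resp = lam
    for t, seq in enumerate(sequences):
        # E-step on this one sequence, using the current parameters.
        e_counts = [[0.0] * 4 for _ in range(width)]
        e_resp, n_windows = 0.0, 0
        for start in range(len(seq) - width + 1):
            kmer = seq[start:start + width]
            if any(base not in IDX for base in kmer):
                continue  # skip windows containing N or other non-ACGT letters
            p_motif, p_bg = window_likelihoods(kmer, pwm, background)
            z = lam * p_motif / (lam * p_motif + (1.0 - lam) * p_bg)
            e_resp += z
            for pos, base in enumerate(kmer):
                e_counts[pos][IDX[base]] += z
            n_windows += 1
        if n_windows == 0:
            continue
        # Stepwise update: s <- (1 - eta) * s + eta * (this sequence's statistics).
        eta = (t + 2) ** (-alpha)
        s_resp = (1.0 - eta) * s_resp + eta * e_resp / n_windows
        for pos in range(width):
            for b in range(4):
                s_counts[pos][b] = ((1.0 - eta) * s_counts[pos][b]
                                    + eta * e_counts[pos][b] / n_windows)
        # M-step from the running statistics; pseudocounts keep probabilities > 0.
        for pos in range(width):
            total = sum(s_counts[pos]) + 4 * pseudo
            pwm[pos] = [(c + pseudo) / total for c in s_counts[pos]]
        lam = min(max(s_resp, 1e-6), 1.0 - 1e-6)
    return pwm, lam


def consensus(pwm):
    """Most probable base in each motif column (first base wins ties)."""
    return "".join(ALPHABET[max(range(4), key=lambda b: row[b])] for row in pwm)


if __name__ == "__main__":
    random.seed(0)
    motif = "TTGACA"  # hypothetical planted motif for the demo
    seqs = []
    for _ in range(400):
        letters = [random.choice(ALPHABET) for _ in range(30)]
        start = random.randrange(30 - len(motif) + 1)
        letters[start:start + len(motif)] = motif
        seqs.append("".join(letters))
    seed_pwm = [[0.5 if b == m else 0.5 / 3 for b in ALPHABET] for m in motif]
    pwm, lam = online_em(seqs, seed_pwm, background=[0.25] * 4)
    print(consensus(pwm), round(lam, 3))
```

Running the script plants a hypothetical motif in random 30-bp sequences, seeds the model with a weakly informative matrix, and prints the learned consensus and mixing weight after a single pass. Because each update touches only one sequence, the cost of a parameter update does not grow with the number of sequences, which is the kind of scaling advantage the abstract describes for the online approach.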
AVAILABILITY AND IMPLEMENTATION: All source code is available at the Github repository http://github.com/uci-cbcl/EXTREME.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2014        PMID: 24532725      PMCID: PMC4058924          DOI: 10.1093/bioinformatics/btu093

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


