An Efficient Algorithm for Discovering Motifs in Large DNA Data Sets.

Qiang Yu, Hongwei Huo, Xiaoyang Chen, Haitao Guo, Jeffrey Scott Vitter, Jun Huan.   

Abstract

Planted (l,d) motif discovery has been used successfully over the past decade to locate transcription factor binding sites in sets of dozens of promoter sequences. However, little work has been done on identifying (l,d) motifs in next-generation sequencing (ChIP-seq) data sets, which contain thousands of input sequences and therefore pose a new challenge: making accurate identifications in reasonable time. To meet this need, we propose a new planted (l,d) motif discovery algorithm named MCES, which identifies motifs by mining and combining emerging substrings. Specifically, to handle larger data sets, we design a MapReduce-based strategy that mines emerging substrings in a distributed manner. Experimental results on simulated data show that i) MCES identifies (l,d) motifs efficiently and effectively in thousands to millions of input sequences, and runs faster than state-of-the-art (l,d) motif discovery algorithms such as F-motif and TraverStringsR; ii) MCES identifies motifs whose lengths are not known in advance, and achieves better identification accuracy than the competing algorithm CisFinder. The validity of MCES is also tested on real data sets. MCES is freely available at http://sites.google.com/site/feqond/mces.
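The two ideas named in the abstract can be illustrated with a small Python sketch. It is not the MCES implementation. It assumes the usual reading of the planted (l,d) problem (a motif of length l occurs in a sequence if some length-l window differs from it in at most d positions), and it assumes that an "emerging substring" is one whose sequence-level support is markedly higher in the test set than in a control set, which the abstract does not spell out. The function names, the `min_ratio` threshold, and the add-one smoothing are hypothetical choices made for illustration, and the map and reduce steps run in a single process rather than on a cluster.

```python
from collections import Counter
from functools import reduce


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of len(motif) in seq is within d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def map_substrings(seq: str, k: int) -> Counter:
    """Map step: count each distinct length-k substring of one sequence once."""
    return Counter({seq[i:i + k] for i in range(len(seq) - k + 1)})


def support(seqs, k: int) -> Counter:
    """Reduce step: sum per-sequence counts into sequence-level support."""
    return reduce(lambda a, b: a + b,
                  (map_substrings(s, k) for s in seqs), Counter())


def emerging_substrings(test, control, k: int, min_ratio: float = 2.0) -> dict:
    """Substrings whose support ratio (test vs. control) reaches min_ratio.

    Assumption: add-one smoothing on the control side avoids division by
    zero for substrings that never occur in the control set.
    """
    t, c = support(test, k), support(control, k)
    result = {}
    for sub, n in t.items():
        freq_test = n / len(test)
        freq_control = (c[sub] + 1) / (len(control) + 1)
        ratio = freq_test / freq_control
        if ratio >= min_ratio:
            result[sub] = ratio
    return result


if __name__ == "__main__":
    test_set = ["TTGACGTT", "GGGACGAA", "CCGACGCC", "ATGACGTA"]
    control_set = ["TTTTTTTT", "GGGGAAAA", "CCCCCCCC", "ATATATAT"]
    print(emerging_substrings(test_set, control_set, k=4, min_ratio=3.0))
```

Counting each substring once per sequence keeps the map output independent across sequences, so the reduce step is a plain sum that could be distributed. How MCES combines the mined substrings into motifs is not described in the abstract and is not attempted here.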

Year:  2015        PMID: 25872217     DOI: 10.1109/TNB.2015.2421340

Source DB:  PubMed          Journal:  IEEE Trans Nanobioscience        ISSN: 1536-1241            Impact factor:   2.935


  5 in total

1.  GSMC: Combining Parallel Gibbs Sampling with Maximal Cliques for Hunting DNA Motif.

Authors:  Chao Pei; Shu-Lin Wang; Jianwen Fang; Wei Zhang
Journal:  J Comput Biol       Date:  2017-11-08       Impact factor: 1.479

2.  PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets.

Authors:  Qiang Yu; Hongwei Huo; Dazheng Feng
Journal:  Biomed Res Int       Date:  2016-10-24       Impact factor: 3.411

3.  An Entropy-Based Position Projection Algorithm for Motif Discovery.

Authors:  Yipu Zhang; Ping Wang; Maode Yan
Journal:  Biomed Res Int       Date:  2016-11-02       Impact factor: 3.411

4.  Review of Different Sequence Motif Finding Algorithms. (Review)

Authors:  Fatma A Hashim; Mai S Mabrouk; Walid Al-Atabany
Journal:  Avicenna J Med Biotechnol       Date:  2019 Apr-Jun

5.  A survey on deep learning in DNA/RNA motif mining.

Authors:  Ying He; Zhen Shen; Qinhu Zhang; Siguo Wang; De-Shuang Huang
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622
