EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences.

Jianjun Hu, Yifeng D Yang, Daisuke Kihara.

Abstract

BACKGROUND: Understanding gene regulatory networks has become one of the central research problems in bioinformatics. Over the past thirty years, more than thirty algorithms have been proposed to identify DNA regulatory sites. However, the prediction accuracy of these algorithms remains quite low. Ensemble algorithms have emerged as an effective strategy in bioinformatics for improving prediction accuracy by exploiting the synergistic prediction capability of multiple algorithms.
RESULTS: We propose EMD, a novel clustering-based ensemble algorithm for de novo motif discovery that combines predictions from multiple runs of one or more base component algorithms. This is the first time the ensemble approach has been applied to the motif discovery problem. The algorithm was tested on a benchmark dataset generated from E. coli RegulonDB. EMD achieved a 22.4% improvement in nucleotide-level prediction accuracy over the best stand-alone component algorithm. The advantage of EMD is more pronounced for shorter input sequences; most importantly, it always outperforms or at least matches the stand-alone component algorithms, even for longer sequences.
CONCLUSION: We propose an ensemble approach to the motif discovery problem that takes advantage of the large number of available motif discovery programs. We show that the ensemble approach is an effective strategy for improving both sensitivity and specificity, and thus the overall accuracy of the prediction. A further advantage of EMD is its flexibility: a powerful new algorithm can easily be added to the system.
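
The abstract does not spell out how predictions from multiple runs are clustered and merged, so the following is only a minimal sketch of the general idea, using simple position voting as a stand-in rather than the published EMD procedure. The function name `ensemble_site` and its parameters are hypothetical, and the sketch assumes each run reports one 0-based start position per sequence for a motif of fixed width.

```python
def ensemble_site(predicted_starts, motif_width, seq_length):
    """Pick a consensus site for ONE sequence from several runs.

    predicted_starts: 0-based start positions, one per run (assumed input format).
    Returns the start of the window of length `motif_width` whose
    positions received the most votes across all runs.
    NOTE: illustrative position voting, not the actual EMD algorithm.
    """
    # Each predicted site casts one vote for every nucleotide it covers.
    votes = [0] * seq_length
    for start in predicted_starts:
        for pos in range(start, min(start + motif_width, seq_length)):
            votes[pos] += 1

    # Slide a window over the sequence and keep the best-supported one.
    # Ties are resolved in favor of the leftmost window.
    best_start, best_score = 0, -1
    for start in range(seq_length - motif_width + 1):
        score = sum(votes[start:start + motif_width])
        if score > best_score:
            best_start, best_score = start, score
    return best_start


# Four runs on a 60-nt sequence: three agree around position 10, one outlier.
consensus = ensemble_site([10, 10, 11, 42], motif_width=6, seq_length=60)
print(consensus)
```

In this toy input the outlier at position 42 is outvoted by the three runs that agree near position 10, which illustrates why combining runs may suppress spurious predictions from any single run.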

Year: 2006; PMID: 16839417; PMCID: PMC1539026; DOI: 10.1186/1471-2105-7-342

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
