fdrMotif: identifying cis-elements by an EM algorithm coupled with false discovery rate control.

Leping Li, Robert L Bass, Yu Liang.

Abstract

MOTIVATION: Most de novo motif identification methods optimize the motif model first and then separately test the statistical significance of the motif score. In the first stage, a motif abundance parameter needs to be specified or modeled. In the second stage, a Z-score or P-value is used as the test statistic. Error rates under multiple comparisons are not fully considered.
METHODOLOGY: We propose a simple but novel approach, fdrMotif, that selects as many binding sites as possible while controlling a user-specified false discovery rate (FDR). Unlike existing iterative methods, fdrMotif combines model optimization [e.g. position weight matrix (PWM)] and significance testing at each step. By monitoring the proportion of binding sites selected in many sets of background sequences, fdrMotif controls the FDR in the original data. The model is then updated using an expectation (E)- and maximization (M)-like procedure. We propose a new normalization procedure in the E-step for updating the model. This process is repeated until either the model converges or the number of iterations exceeds a maximum.
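The background-based FDR control described above can be illustrated with a minimal sketch. It is not the authors' C implementation: the function name `fdr_threshold`, the use of precomputed PWM scores, and the estimator (mean number of background windows passing the cutoff divided by the number of data windows passing it) are all assumptions made for illustration.

```python
from typing import List, Optional


def fdr_threshold(data_scores: List[float],
                  background_sets: List[List[float]],
                  target_fdr: float) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= target_fdr.

    Assumed estimator (hypothetical, for illustration only):
        FDR(t) = mean over background sets of #(background scores >= t)
                 / #(data scores >= t)
    Returns None when no cutoff meets the target.
    """
    best = None
    # Candidate cutoffs are the observed data scores, scanned from high to low,
    # so the last cutoff that passes is the one selecting the most sites.
    for t in sorted(set(data_scores), reverse=True):
        selected = sum(s >= t for s in data_scores)
        false_mean = sum(
            sum(s >= t for s in bg) for bg in background_sets
        ) / len(background_sets)
        if false_mean / selected <= target_fdr:
            best = t
    return best


# Toy scores: five candidate sites in the data, two background sets.
data = [9.0, 8.0, 7.0, 3.0, 2.0]
background = [[2.5, 1.0, 0.5], [3.5, 1.0, 0.2]]
cutoff = fdr_threshold(data, background, target_fdr=0.15)
```

In this toy example a looser FDR target admits more sites at a lower cutoff, which mirrors the stated goal of selecting as many binding sites as possible under a user-specified FDR. In the method itself this selection is repeated at every iteration, alternating with the EM-like model update.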
RESULTS: Simulation studies suggest that our normalization procedure assigns larger weights to the binding sites than do two other commonly used normalization procedures. Furthermore, fdrMotif requires only a user-specified FDR and an initial PWM. When tested on 542 high-confidence experimental p53 binding loci, fdrMotif identified 569 p53 binding sites in 505 (93.2%) sequences. In comparison, MEME identified more binding sites but in fewer ChIP sequences than fdrMotif. When tested on 500 sets of simulated 'ChIP' sequences with embedded known p53 binding sites, fdrMotif, compared to MEME, has higher sensitivity with similar positive predictive value. Furthermore, fdrMotif is robust to noise: it selected nearly identical binding sites in data adulterated with 50% added background sequences and in the unadulterated data. We suggest that fdrMotif represents an improvement over MEME.
AVAILABILITY: C code can be found at: http://www.niehs.nih.gov/research/resources/software/fdrMotif/.
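For readers unfamiliar with the two metrics used in the simulated comparison with MEME, the sketch below computes sensitivity and positive predictive value from predicted and embedded sites. Representing a site as a `(sequence_id, start)` pair and requiring an exact position match are assumptions for illustration; the abstract does not state the matching criterion used in the paper.

```python
from typing import Iterable, Tuple

Site = Tuple[str, int]  # (sequence id, start position): hypothetical representation


def sensitivity_and_ppv(predicted: Iterable[Site],
                        embedded: Iterable[Site]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) under exact-match scoring (an assumption).

    sensitivity = true positives / number of embedded (known) sites
    PPV         = true positives / number of predicted sites
    """
    predicted_set, embedded_set = set(predicted), set(embedded)
    true_pos = len(predicted_set & embedded_set)
    sensitivity = true_pos / len(embedded_set) if embedded_set else 0.0
    ppv = true_pos / len(predicted_set) if predicted_set else 0.0
    return sensitivity, ppv


# Toy example: 2 of 4 embedded sites recovered, 1 spurious prediction.
pred = [("s1", 10), ("s2", 5), ("s3", 7)]
truth = [("s1", 10), ("s2", 5), ("s4", 1), ("s5", 2)]
sens, ppv = sensitivity_and_ppv(pred, truth)
```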

Year:  2008        PMID: 18296465      PMCID: PMC2376047          DOI: 10.1093/bioinformatics/btn009

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


