Determination of local statistical significance of patterns in Markov sequences with application to promoter element identification.

Haiyan Huang, Ming-Chih J Kao, Xianghong Zhou, Jun S Liu, Wing H Wong.

Abstract

High-level eukaryotic genomes present a particular challenge to the computational identification of transcription factor binding sites (TFBSs) because of their long noncoding regions and large numbers of repeat elements. This is evidenced by the noisy results generated by most current methods. In this paper, we present a p-value-based scoring scheme that uses probability generating functions to evaluate the statistical significance of potential TFBSs. Furthermore, we introduce the local genomic context into the model, so that candidate sites are evaluated both on their similarity to known binding sites and on their contrast against their respective local genomic contexts. We demonstrate that our approach is advantageous in the prediction of myogenin and MEF2 binding sites in the human genome. We also applied our method, LMM, to large-scale human binding site sequences in situ and found that, compared to current popular methods, LMM analysis can reduce false positive errors by more than 50% without compromising sensitivity. This improvement will be of importance to any subsequent algorithm that aims to detect regulatory modules based on known position-specific scoring matrices (PSSMs).
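The idea of scoring a candidate site by a p-value can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an i.i.d. background rather than the Markov model named in the title, integer PSSM scores, and a plain base-frequency estimate from a flanking window as a stand-in for the local genomic context. The function names `score_distribution`, `p_value`, and `local_background` are hypothetical. The score distribution is built by repeated convolution, which amounts to multiplying out the per-position probability generating functions.

```python
from collections import defaultdict


def score_distribution(pssm, background):
    """Exact distribution of the total PSSM score under an i.i.d. background.

    pssm: list of dicts mapping base -> integer score, one dict per position.
    background: dict mapping base -> probability.
    Returns a dict score -> probability, i.e. the coefficients of the
    probability generating function of the total score.
    """
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, q in background.items():
                nxt[score + column[base]] += prob * q
        dist = dict(nxt)
    return dist


def p_value(pssm, background, observed_score):
    """Probability that a random site scores at least observed_score."""
    dist = score_distribution(pssm, background)
    return sum(p for s, p in dist.items() if s >= observed_score)


def local_background(window, pseudocount=1.0):
    """Base frequencies estimated from a flanking window (assumed proxy
    for the local genomic context); non-ACGT characters are ignored."""
    counts = {b: pseudocount for b in "ACGT"}
    for ch in window.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


# Toy two-position matrix that rewards "AT" (illustrative values only).
toy_pssm = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 2},
]
uniform = {b: 0.25 for b in "ACGT"}
at_rich = local_background("AAAATTTT")

p_uniform = p_value(toy_pssm, uniform, 4)
p_at_rich = p_value(toy_pssm, at_rich, 4)
```

With these toy inputs the same top-scoring site is less significant in the AT-rich window than under the uniform background, which is the kind of contrast against local context that the abstract describes.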

Year:  2004        PMID: 15072685     DOI: 10.1089/106652704773416858

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  16 in total

1.  Importance sampling of word patterns in DNA and protein sequences.

Authors:  Hock Peng Chan; Nancy Ruonan Zhang; Louis H Y Chen
Journal:  J Comput Biol       Date:  2010-12       Impact factor: 1.479

2.  Identification of context-dependent motifs by contrasting ChIP binding data.

Authors:  Mike J Mason; Kathrin Plath; Qing Zhou
Journal:  Bioinformatics       Date:  2010-09-23       Impact factor: 6.937

3.  Approximation of sojourn-times via maximal couplings: motif frequency distributions.

Authors:  Manuel E Lladser; Stephen R Chestnut
Journal:  J Math Biol       Date:  2013-06-06       Impact factor: 2.259

4.  Systematic prediction of cis-regulatory elements in the Chlamydomonas reinhardtii genome using comparative genomics.

Authors:  Jun Ding; Xiaoman Li; Haiyan Hu
Journal:  Plant Physiol       Date:  2012-08-22       Impact factor: 8.340

5.  FITBAR: a web tool for the robust prediction of prokaryotic regulons.

Authors:  Jacques Oberto
Journal:  BMC Bioinformatics       Date:  2010-11-11       Impact factor: 3.169

6.  Predicting transcription factor binding sites using local over-representation and comparative genomics.

Authors:  Matthieu Defrance; Hélène Touzet
Journal:  BMC Bioinformatics       Date:  2006-08-31       Impact factor: 3.169

7.  Optimizing the GATA-3 position weight matrix to improve the identification of novel binding sites.

Authors:  Soumyadeep Nandi; Ilya Ioshikhes
Journal:  BMC Genomics       Date:  2012-08-22       Impact factor: 3.969

8.  Efficient and accurate P-value computation for Position Weight Matrices.

Authors:  Hélène Touzet; Jean-Stéphane Varré
Journal:  Algorithms Mol Biol       Date:  2007-12-11       Impact factor: 1.405

9.  An efficient method for statistical significance calculation of transcription factor binding sites.

Authors:  Ziliang Qian; Lingyi Lu; Liu Qi; Yixue Li
Journal:  Bioinformation       Date:  2007-12-30

10.  The identification of complete domains within protein sequences using accurate E-values for semi-global alignment.

Authors:  Maricel G Kann; Sergey L Sheetlin; Yonil Park; Stephen H Bryant; John L Spouge
Journal:  Nucleic Acids Res       Date:  2007-06-27       Impact factor: 16.971
