Distribution patterns of over-represented k-mers in non-coding yeast DNA.

Steven Hampson, Dennis Kibler, Pierre Baldi.

Abstract

MOTIVATION: Over-represented k-mers in genomic DNA regions are often of particular biological interest. For example, over-represented k-mers in co-regulated families of genes are associated with the DNA binding sites of transcription factors. To measure over-representation, we introduce a statistical background model based on single-mismatches, and apply it to the pooled 500 bp ORF Upstream Regions (USRs) of yeast. More importantly, we investigate the context and spatial distribution of over-represented k-mers in yeast USRs.
RESULTS: Single- and double-stranded spatial distributions of most over-represented k-mers are highly non-random, and predominantly cluster into a small number of classes that are robust with respect to over-representation measures. Specifically, we show that the three most common distribution patterns can be related to DNA structure, function, and evolution and correspond to: (a) homologous ORF clusters associated with sharply localized distributions; (b) regulatory elements associated with a symmetric broad hill-shaped distribution in the 50-200 bp USR; and (c) runs of As, Ts, and ATs associated with a broad hill-shaped distribution also in the 50-200 bp USR, with extreme structural properties. Analyses of over-representation, homology, localization, and DNA structure are essential components of a general data-mining approach to finding biologically important k-mers in raw genomic DNA and understanding the 'lexicon' of regulatory regions.
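
ILLUSTRATION: The abstract does not give the exact form of the single-mismatch background model, so the following Python sketch is only one plausible reading of the idea, not the authors' statistic. It assumes that a k-mer's count is compared with the mean count of its 3k single-mismatch neighbors, and that each upstream sequence ends at the ORF start so that positions can be binned by distance upstream. The function names, the pseudocount, and the bin size are all hypothetical choices made for the example.

```python
from collections import Counter

ALPHABET = "ACGT"

def count_kmers(seqs, k):
    """Count every k-mer (overlapping occurrences) across the pooled sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set(ALPHABET):  # skip windows with N or other symbols
                counts[kmer] += 1
    return counts

def single_mismatch_neighbors(kmer):
    """Yield the 3k k-mers that differ from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in ALPHABET:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]

def mismatch_ratio(kmer, counts, pseudocount=1.0):
    """Assumed score: observed count over the mean count of the neighbors.

    The pseudocount (an assumption) avoids division by zero.
    """
    neighbors = list(single_mismatch_neighbors(kmer))
    mean_neighbor = sum(counts[n] for n in neighbors) / len(neighbors)
    return (counts[kmer] + pseudocount) / (mean_neighbor + pseudocount)

def position_histogram(kmer, seqs, bin_size=50):
    """Bin occurrences by distance upstream, assuming each sequence ends at the ORF start."""
    hist = Counter()
    for s in seqs:
        start = s.find(kmer)
        while start != -1:
            hist[(len(s) - start) // bin_size] += 1
            start = s.find(kmer, start + 1)  # allow overlapping matches
    return hist
```

Under these assumptions, a k-mer with a high ratio and a histogram concentrated in a few bins would be a candidate for the localized or hill-shaped patterns the abstract describes; the choice of a neighbor mean rather than, say, a maximum is arbitrary here.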

Year:  2002        PMID: 12016049     DOI: 10.1093/bioinformatics/18.4.513

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  13 in total

1.  SPA: Simple web tool to assess statistical significance of DNA patterns.

Authors:  H Richard; G Nuel
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

2.  Computational approaches to identify promoters and cis-regulatory elements in plant genomes. (Review)

Authors:  Stephane Rombauts; Kobe Florquin; Magali Lescot; Kathleen Marchal; Pierre Rouzé; Yves van de Peer
Journal:  Plant Physiol       Date:  2003-07       Impact factor: 8.340

3.  CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling.

Authors:  Qing Zhou; Wing H Wong
Journal:  Proc Natl Acad Sci U S A       Date:  2004-08-05       Impact factor: 11.205

4.  The wavelet-based cluster analysis for temporal gene expression data.

Authors:  J Z Song; K M Duan; T Ware; M Surette
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

5.  Integrating regulatory motif discovery and genome-wide expression analysis.

Authors:  Erin M Conlon; X Shirley Liu; Jason D Lieb; Jun S Liu
Journal:  Proc Natl Acad Sci U S A       Date:  2003-03-07       Impact factor: 11.205

6.  Abundant oligonucleotides common to most bacteria.

Authors:  Colin F Davenport; Burkhard Tümmler
Journal:  PLoS One       Date:  2010-03-23       Impact factor: 3.240

7.  Unravelling cis-regulatory elements in the genome of the smallest photosynthetic eukaryote: phylogenetic footprinting in Ostreococcus.

Authors:  Gwenael Piganeau; Klaas Vandepoele; Sébastien Gourbière; Yves Van de Peer; Hervé Moreau
Journal:  J Mol Evol       Date:  2009-08-20       Impact factor: 2.395

8.  Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm.

Authors:  Matko Glunčić; Vladimir Paar
Journal:  Nucleic Acids Res       Date:  2012-09-12       Impact factor: 16.971

9.  Comparative analysis of DNA word abundances in four yeast genomes using a novel statistical background model.

Authors:  Ramkumar Hariharan; Reji Simon; M Radhakrishna Pillai; Todd D Taylor
Journal:  PLoS One       Date:  2013-03-05       Impact factor: 3.240

10.  A new systematic computational approach to predicting target genes of transcription factors.

Authors:  Xinbin Dai; Ji He; Xuechun Zhao
Journal:  Nucleic Acids Res       Date:  2007-06-18       Impact factor: 16.971
