Reconsidering the significance of genomic word frequencies.

Miklós Csurös, Laurent Noé, Gregory Kucherov.

Abstract

By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence follows primarily a Pareto-lognormal distribution, which encapsulates lognormal and power-law features found across all known genomes. Such a distribution could be the result of completely random evolution by a copying process. Our characterization of the entire frequency distribution of genomic words opens a way to a more accurate reasoning about their over- and underrepresentation in genomic sequences.
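As an illustration only (this is not the authors' code or method), the quantity the abstract reasons about can be sketched in a few lines: count how often each word of length k occurs in a sequence, then tabulate how many distinct words share each occurrence count. The function names `kmer_counts` and `frequency_spectrum` and the toy sequence are hypothetical, and the sketch makes no attempt to fit a Pareto-lognormal distribution.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping length-k word made only of A, C, G, T."""
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(word) <= alphabet:
            counts[word] += 1
    return counts


def frequency_spectrum(counts):
    """Map each occurrence count to the number of distinct words with that count."""
    return Counter(counts.values())


# Toy input, assumed purely for illustration; real analyses use whole genomes.
toy = "ACGTACGTAC"
counts = kmer_counts(toy, 2)
spectrum = frequency_spectrum(counts)
```

On a real genome, the spectrum returned by `frequency_spectrum` is the empirical distribution that one would compare against a background model before calling a word over- or underrepresented.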

Year:  2007        PMID: 17964682     DOI: 10.1016/j.tig.2007.07.008

Source DB:  PubMed          Journal:  Trends Genet        ISSN: 0168-9525            Impact factor:   11.639


  12 in total

1.  The distribution of word matches between Markovian sequences with periodic boundary conditions.

Authors:  Conrad J Burden; Paul Leopardi; Sylvain Forêt
Journal:  J Comput Biol       Date:  2013-10-26       Impact factor: 1.479

2.  Predicting nucleosome binding motif set and analyzing their distributions around functional sites of human genes.

Authors:  Tonglaga Bao; Hong Li; Xiaoqing Zhao; Guoqing Liu
Journal:  Chromosome Res       Date:  2012-07-31       Impact factor: 5.239

3.  Space-efficient representation of genomic k-mer count tables.

Authors:  Yoshihiro Shibuya; Djamal Belazzougui; Gregory Kucherov
Journal:  Algorithms Mol Biol       Date:  2022-03-21       Impact factor: 1.405

4.  Within-genome evolution of REPINs: a new family of miniature mobile DNA in bacteria.

Authors:  Frederic Bertels; Paul B Rainey
Journal:  PLoS Genet       Date:  2011-06-16       Impact factor: 5.917

5.  Algebraic distribution of segmental duplication lengths in whole-genome sequence self-alignments.

Authors:  Kun Gao; Jonathan Miller
Journal:  PLoS One       Date:  2011-07-14       Impact factor: 3.240

6.  Organizational heterogeneity of vertebrate genomes.

Authors:  Svetlana Frenkel; Valery Kirzhner; Abraham Korol
Journal:  PLoS One       Date:  2012-02-27       Impact factor: 3.240

7.  Genomic DNA k-mer spectra: models and modalities.

Authors:  Benny Chor; David Horn; Nick Goldman; Yaron Levy; Tim Massingham
Journal:  Genome Biol       Date:  2009-10-08       Impact factor: 13.583

8.  Evolution of genomic sequence inhomogeneity at mid-range scales.

Authors:  Ashwin Prakash; Samuel S Shepard; Jie He; Benjamin Hart; Miao Chen; Surya P Amarachintha; Olga Mileyeva-Biebesheimer; Jason Bechtel; Alexei Fedorov
Journal:  BMC Genomics       Date:  2009-11-05       Impact factor: 3.969

9.  Comparative analysis of DNA word abundances in four yeast genomes using a novel statistical background model.

Authors:  Ramkumar Hariharan; Reji Simon; M Radhakrishna Pillai; Todd D Taylor
Journal:  PLoS One       Date:  2013-03-05       Impact factor: 3.240

10.  Protein languages differ depending on microorganism lifestyle.

Authors:  Joseph J Grzymski; Adam G Marsh
Journal:  PLoS One       Date:  2014-05-14       Impact factor: 3.240
