Miklós Csurös, Laurent Noé, Gregory Kucherov.
Abstract
By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence follows primarily a Pareto-lognormal distribution, which encapsulates lognormal and power-law features found across all known genomes. Such a distribution could be the result of completely random evolution by a copying process. Our characterization of the entire frequency distribution of genomic words opens the way to more accurate reasoning about their over- and underrepresentation in genomic sequences.
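As an illustration of the quantity the abstract describes, the following minimal sketch tabulates how often each oligonucleotide (word of length k) occurs in a sequence, and then how many distinct words share each occurrence count. It is not the authors' method: the function names `kmer_counts` and `frequency_spectrum`, the choice to count overlapping words on a single strand, and the rule of skipping words with non-ACGT characters are all assumptions made for the example. Fitting a Pareto-lognormal distribution to the resulting spectrum is not shown.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k in seq (single strand).

    Assumption for this sketch: words containing any character other
    than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= alphabet:
            counts[word] += 1
    return counts


def frequency_spectrum(counts: Counter) -> dict[int, int]:
    """Map each occurrence count n to the number of distinct words seen exactly n times.

    Words that never occur in the sequence are not represented.
    """
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    toy = "ACGTACGT"  # hypothetical toy input, not genomic data
    print(frequency_spectrum(kmer_counts(toy, 4)))
```

On a real genome, the spectrum returned by `frequency_spectrum` is the empirical distribution whose shape the abstract characterizes; a toy input like the one above is far too short to show any such shape.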
Year: 2007 PMID: 17964682 DOI: 10.1016/j.tig.2007.07.008
Source DB: PubMed Journal: Trends Genet ISSN: 0168-9525 Impact factor: 11.639