Michael G Sadovsky, Julia A Putintseva, Alexander S Shchepanovsky.
Abstract
The information capacity of a nucleotide sequence measures how unexpected the continuation of a given string of nucleotides is, and is thus soundly related to a variety of biological issues. A continuation is defined so as to maximize the entropy of the ensemble of such continuations. The capacity is defined as the mutual entropy of the real frequency dictionary of a sequence with respect to the dictionary bearing the most expected continuations; it does not depend on the length of the strings contained in a dictionary. Various genomes exhibit a multi-minima pattern in the dependence of information capacity on string length, which reflects an order within a sequence. Strings whose expected frequency deviates significantly from the real one are the words of increased information value. Such words are distributed non-randomly along a sequence, which makes it possible to retrieve the correlation between a structure and a function encoded within a sequence.
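As an illustration of the quantity described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It rests on assumptions the abstract does not spell out: the sequence is read as a ring so that frequencies of shorter strings are consistent marginals, the most expected frequency of a string of length q is reconstructed from strings of length q-1 as f(w[:-1]) * f(w[1:]) / f(w[1:-1]), and the capacity is the relative entropy of the real dictionary with respect to that reconstruction. The function names `frequency_dictionary` and `information_capacity` are hypothetical.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Relative frequencies of all length-q strings in seq.

    Assumption: the sequence is closed into a ring, so every position
    starts exactly one string and the frequencies sum to 1.
    """
    n = len(seq)
    if not 1 <= q <= n:
        raise ValueError("q must satisfy 1 <= q <= len(seq)")
    ext = seq + seq[:q - 1]  # wrap around the end of the ring
    counts = Counter(ext[i:i + q] for i in range(n))
    return {word: count / n for word, count in counts.items()}


def information_capacity(seq, q):
    """Relative entropy of the real length-q dictionary with respect to the
    dictionary reconstructed from length-(q-1) strings (assumed formula)."""
    if q < 2:
        raise ValueError("q must be at least 2")
    real = frequency_dictionary(seq, q)
    shorter = frequency_dictionary(seq, q - 1)
    # For q == 2 the overlap of prefix and suffix is the empty string.
    overlap = frequency_dictionary(seq, q - 2) if q > 2 else {"": 1.0}
    total = 0.0
    for word, f_real in real.items():
        # Assumed maximum-entropy estimate of the expected frequency.
        f_expected = shorter[word[:-1]] * shorter[word[1:]] / overlap[word[1:-1]]
        total += f_real * log(f_real / f_expected)
    return total


if __name__ == "__main__":
    toy = "ACGT" * 5  # toy periodic sequence, not real genomic data
    for q in (2, 3, 4):
        print(q, information_capacity(toy, q))
```

Under these assumptions, a string whose ratio of real to expected frequency is far from 1 would be a candidate word of increased information value; the threshold for "significant" is left open here because the abstract does not state one.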
Year: 2008 PMID: 18443840 DOI: 10.1007/s12064-008-0032-1
Source DB: PubMed Journal: Theory Biosci ISSN: 1431-7613 Impact factor: 1.919