V D Gusev¹, L A Nemytikova, N A Chuzhanova. ¹Institute of Mathematics, Siberian Branch of Russian Academy of Science, Novosibirsk, Russia. gusev@math.nsc.ru
Abstract
MOTIVATION: It is well known that the regulatory regions of genomes are highly repetitive. They are rich in direct, symmetric and complemented repeats, and there is no doubt about the functional significance of these repeats. Among known measures of complexity, the Ziv-Lempel complexity measure most adequately reflects the repeats occurring in a text. However, this measure does not take isomorphic repeats into account. By isomorphic repeats we mean fragments that are identical (or symmetric) modulo some permutation of the alphabet letters.
RESULTS: In this paper, two complexity measures of symbolic sequences are proposed that generalize the Ziv-Lempel complexity measure by taking into account any isomorphic repeats in the text (rather than just direct repeats, as in Ziv-Lempel). The first, the complexity vector, is designed for small alphabets such as the alphabet of nucleotides. The second is based on a search for the longest isomorphic fragment in the history of sequence synthesis and can be used for alphabets of arbitrary cardinality. These measures have been used to recognize structural regularities in DNA sequences. Some interesting structures related to the regulatory region of the human growth hormone are reported.
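The notion of an isomorphic repeat can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names (`letter_mapping`, `is_isomorphic`, `is_symmetric_isomorphic`, `lz_complexity`) are hypothetical, and the parsing rule is a simple quadratic-time variant of Ziv-Lempel production-history counting, assumed here only to show how allowing isomorphic copies can lower the component count. The complexity vector proposed in the paper is not reproduced.

```python
def letter_mapping(u, v):
    """Return the one-to-one letter mapping carrying u onto v, or None.

    Two equal-length fragments are treated as isomorphic when such a
    mapping exists; it can always be extended to a permutation of the
    whole alphabet.
    """
    if len(u) != len(v):
        return None
    forward, backward = {}, {}
    for a, b in zip(u, v):
        # Each letter of u must map to one letter of v, and vice versa.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return None
    return forward


def is_isomorphic(u, v):
    """Direct isomorphic repeat: v is u under some letter permutation."""
    return letter_mapping(u, v) is not None


def is_symmetric_isomorphic(u, v):
    """Symmetric isomorphic repeat: reversed v is u under a permutation."""
    return letter_mapping(u, v[::-1]) is not None


def lz_complexity(s, match=lambda u, v: u == v):
    """Count components of a Ziv-Lempel style parsing of s.

    Each component is the shortest prefix of the unparsed remainder that
    has no matching copy starting earlier in s (copies may overlap the
    current position). `match` decides what counts as a copy: exact
    equality by default, or is_isomorphic for the assumed generalization.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and any(
            match(s[j:j + length], s[i:i + length]) for j in range(i)
        ):
            length += 1
        count += 1
        i += length
    return count


if __name__ == "__main__":
    # A complemented repeat is one special case: A<->T, C<->G.
    print(letter_mapping("ACCGT", "TGGCA"))
    print(is_symmetric_isomorphic("AAC", "GTT"))
    print(lz_complexity("ACGT"), lz_complexity("ACGT", match=is_isomorphic))
```

In this sketch, a sequence with no direct repeats, such as `ACGT`, parses into fewer components once isomorphic copies are accepted, which is the effect the generalized measures are meant to capture.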