Valery Kirzhner, Eviatar Nevo, Abraham Korol, Alexander Bolshoy.
Abstract
We introduce a novel, linguistic-like method of genome analysis. We propose a natural approach to characterizing genomic sequences based on occurrences of fixed-length words from a predefined, sufficiently large set of words (strings over the alphabet {A, C, G, T}). A measure based on this approach is called the compositional spectrum; it is a histogram of imperfect word occurrences. Our results indicate that the compositional spectrum is an overall characteristic of a long sequence, i.e., a complete genome or an uninterrupted part of a chromosome. This property is manifested in the similarity of spectra obtained from different stretches of the same genome and, simultaneously, in a broad range of dissimilarities between the spectra of different genomes. Imperfect matching makes the approach highly flexible and, as a result, allows sets of relatively long words to be considered. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.
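As a rough illustration of the idea described in the abstract, the following Python sketch counts imperfect occurrences of each word from a fixed set along a sequence and normalizes the counts into a histogram. The abstract does not specify the matching rule, the word set, or the normalization, so several details here are assumptions: "imperfect" is taken to mean at most `max_mismatches` substitutions (Hamming distance), the word set is drawn at random, and the function names `make_word_set` and `compositional_spectrum` are hypothetical.

```python
import random

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def make_word_set(n_words, word_len, seed=0):
    """Draw n_words distinct random words of length word_len (assumed scheme)."""
    if n_words > len(ALPHABET) ** word_len:
        raise ValueError("more words requested than exist at this length")
    rng = random.Random(seed)
    words = set()
    while len(words) < n_words:
        words.add("".join(rng.choice(ALPHABET) for _ in range(word_len)))
    return sorted(words)


def compositional_spectrum(sequence, words, max_mismatches):
    """Histogram of imperfect word occurrences, normalized to sum to 1.

    A window of the sequence counts as an occurrence of a word when the two
    differ in at most max_mismatches positions (an assumed matching rule).
    """
    word_len = len(words[0])
    counts = [0] * len(words)
    # Slide a window of word_len over the sequence, one position at a time.
    for start in range(len(sequence) - word_len + 1):
        window = sequence[start:start + word_len]
        for i, word in enumerate(words):
            if hamming(window, word) <= max_mismatches:
                counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(words)
    return [c / total for c in counts]
```

This brute-force version costs time proportional to sequence length times the number of words times the word length, so it is only suitable for small experiments. Spectra computed on different stretches of one genome, or on different genomes, could then be compared with any vector distance; the abstract does not name one.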
Year: 2003 PMID: 12870770 DOI: 10.1023/a:1024553109779
Source DB: PubMed Journal: Acta Biotheor ISSN: 0001-5342 Impact factor: 1.774