Jan P Radomski, Piotr P Slonimski.
Abstract
A method is proposed to represent and analyze complete genome sequences (52 species of prokaryotes and eukaryotes), based on n-gram frequencies of amino acid pairs (bigrams) separated by a given number of other residues. For each species analyzed, it allows the construction of over-abundant and over-deficient occurrence profiles that summarize amino acid bigram frequencies over the entire genome. The method deals efficiently with the sparseness of statistical representations of individual sequences, and describes every gene sequence in the same way, independently of its length and of the genome size. The frequency of over-abundant and over-deficient occurrences of bigrams presents a singular periodicity of around 3.5 peptide bonds, suggesting a relation with the alpha-helical secondary structure.
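As an illustration of the counting step only, gapped bigrams can be tallied as in the minimal Python sketch below. The function names, the one-letter string encoding of protein sequences, and the normalization by total pair count are assumptions made for this example. The abstract does not say how frequencies are normalized or how over-abundance and over-deficiency are assessed, so the sketch does not attempt either.

```python
from collections import Counter


def gapped_bigram_counts(sequence, gap):
    """Count residue pairs (a, b) where b follows a with `gap` residues between them.

    gap=0 gives adjacent pairs; gap=2 pairs positions i and i+3.
    """
    step = gap + 1
    # zip stops at the shorter input, so only complete pairs are counted.
    return Counter(zip(sequence, sequence[step:]))


def gapped_bigram_frequencies(sequences, gap):
    """Pool counts over many sequences and normalize by the total number of pairs.

    This normalization is an assumption for illustration, not the paper's definition.
    """
    total = Counter()
    for seq in sequences:
        total.update(gapped_bigram_counts(seq, gap))
    n_pairs = sum(total.values())
    if n_pairs == 0:
        return {}
    return {pair: count / n_pairs for pair, count in total.items()}
```

Calling `gapped_bigram_frequencies(proteins, gap)` for a range of `gap` values gives one frequency table per separation, which is the kind of per-gap summary a periodicity analysis would start from.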
Year: 2006 PMID: 17241946 DOI: 10.1016/j.crvi.2006.11.001
Source DB: PubMed Journal: C R Biol ISSN: 1631-0691 Impact factor: 1.583