Primary sequences of proteins from complete genomes display a singular periodicity: Alignment-free N-gram analysis.

Jan P Radomski, Piotr P Slonimski.

Abstract

A method is proposed to represent and analyze complete genome sequences (52 species of prokaryotes and eukaryotes) based on the n-gram frequencies of amino acid pairs (bigrams) separated by a given number of other residues. For each species analyzed, the method yields over-abundant and over-deficient occurrence profiles that summarize amino acid bigram frequencies over the entire genome. It deals efficiently with the sparseness of statistical representations of individual sequences and describes every gene sequence in the same way, independently of its length and of the genome size. The frequency of over-abundant and over-deficient bigram occurrences shows a singular periodicity of about 3.5 peptide bonds, suggesting a relation to the alpha-helical secondary structure.
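To make the idea of gapped bigram profiles concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' published procedure. It assumes a "bigram with gap k" is an ordered residue pair with exactly k residues between them. Expected counts come from the product of pooled single-residue frequencies, and a bigram counts as over-abundant or under-represented when its observed/expected ratio crosses an arbitrary 1.2-fold threshold. All function names (`gapped_bigram_counts`, `over_under_profile`, `dominant_period`) and the threshold are hypothetical. The periodicity step uses a naive discrete Fourier transform as a stand-in for whatever analysis the paper actually performed.

```python
import math
from collections import Counter
from itertools import product

# The 20 standard amino acids; other symbols (X, U, *, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gapped_bigram_counts(sequences, gap):
    """Count ordered residue pairs (a, b) with exactly `gap` residues between them."""
    counts = Counter()
    step = gap + 1
    for seq in sequences:
        for i in range(len(seq) - step):
            a, b = seq[i], seq[i + step]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[(a, b)] += 1
    return counts


def residue_frequencies(sequences):
    """Single-residue composition pooled over all sequences (e.g. a whole proteome)."""
    comp = Counter(c for seq in sequences for c in seq if c in AMINO_ACIDS)
    total = sum(comp.values()) or 1  # avoid division by zero on empty input
    return {aa: comp[aa] / total for aa in AMINO_ACIDS}


def over_under_profile(sequences, max_gap, threshold=1.2):
    """For each gap 0..max_gap, count bigrams whose observed/expected ratio is
    >= threshold (over-abundant) or <= 1/threshold (under-represented).
    Expected counts assume independent residues (hypothetical null model)."""
    freqs = residue_frequencies(sequences)
    profile = []
    for gap in range(max_gap + 1):
        counts = gapped_bigram_counts(sequences, gap)
        n_pairs = sum(counts.values())
        over = under = 0
        for a, b in product(AMINO_ACIDS, repeat=2):
            expected = freqs[a] * freqs[b] * n_pairs
            if expected == 0:
                continue  # residue absent: ratio undefined, skip the pair
            ratio = counts[(a, b)] / expected
            if ratio >= threshold:
                over += 1
            elif ratio <= 1 / threshold:
                under += 1
        profile.append((gap, over, under))
    return profile


def dominant_period(values):
    """Period (in gap units) of the strongest non-zero frequency of a naive DFT."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k
```

For a proteome given as a list of strings `proteome`, one could take `over_counts = [o for _, o, _ in over_under_profile(proteome, max_gap=34)]` and then call `dominant_period(over_counts)`. A naive DFT over n points only resolves periods of the form n/k. With 35 gaps (0 to 34), a period of 3.5 corresponds to k = 10. Whether real data reproduce the reported 3.5 periodicity depends on the null model and threshold, which this sketch only assumes.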

Year:  2006        PMID: 17241946     DOI: 10.1016/j.crvi.2006.11.001

Source DB:  PubMed          Journal:  C R Biol        ISSN: 1631-0691            Impact factor:   1.583
