A novel bioinformatic strategy for unveiling hidden genome signatures of eukaryotes: self-organizing map of oligonucleotide frequency.

Takashi Abe, Shigehiko Kanaya, Makoto Kinouchi, Yuta Ichiba, Tokio Kozuki, Toshimichi Ikemura.

Abstract

As the number of available genome sequences grows, new tools are needed for comprehensive analysis of species-specific sequence characteristics across a wide variety of genomes. We used an unsupervised neural network algorithm, Kohonen's self-organizing map (SOM), to analyze di- and trinucleotide frequencies in 9 eukaryotic genomes of known sequence (1.2 Gb in total): S. cerevisiae, S. pombe, C. elegans, A. thaliana, D. melanogaster, Fugu, and rice, as well as P. falciparum chromosomes 2 and 3 and human chromosomes 14, 20, 21, and 22, which have been almost completely sequenced. Each genomic sequence, at each window size, was encoded as a 16- or 64-dimensional vector giving the relative frequencies of di- or trinucleotides, respectively. In an analysis of 120,000 nonoverlapping 10-kb sequences, and of overlapping 100-kb sequences taken with a moving step of 10 kb, all derived from the 1.2 Gb of genomic sequence, the SOMs separated most sequences clearly by species. In most of the 120,000 10-kb sequences, the unsupervised algorithm recognized the species-specific characteristics (key combinations of oligonucleotide frequencies) that serve as a signature of each genome. Because their classification power is very high, SOMs can provide fundamental bioinformatic strategies for extracting a wide range of genomic information that could not otherwise be obtained.
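The encoding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`kmer_frequencies`, `windows`, `encode`) are hypothetical, and several details are assumptions the abstract does not specify: words are counted at every position (overlapping words), words containing ambiguous bases such as N are skipped, only full-length windows are kept, and the two strands are not merged. The resulting vectors are the kind of input a SOM would then be trained on.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k k-mers, in lexicographic ACGT order.

    k=2 gives a 16-dimensional vector, k=3 a 64-dimensional one.
    Assumption: words are counted at every position, and any word
    containing a non-ACGT character is skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]


def windows(seq, size, step):
    """Yield (start, subsequence) for each full-length window.

    step == size gives nonoverlapping windows; step < size gives
    overlapping windows (for example size=100_000, step=10_000).
    """
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def encode(seq, size, step, k):
    """Encode every window of seq as a k-mer relative-frequency vector."""
    return [kmer_frequencies(w, k) for _, w in windows(seq, size, step)]


if __name__ == "__main__":
    # Toy sequence and toy window sizes, chosen only to keep the demo small.
    toy = "ACGTTGCAAC" * 5
    nonoverlapping = encode(toy, size=10, step=10, k=2)
    overlapping = encode(toy, size=20, step=10, k=3)
    print(len(nonoverlapping), len(nonoverlapping[0]))
    print(len(overlapping), len(overlapping[0]))
```

For real data, the window parameters in the abstract would correspond to `size=10_000, step=10_000` for the nonoverlapping 10-kb sequences and `size=100_000, step=10_000` for the overlapping 100-kb sequences.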

Year:  2002        PMID: 14571370

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454

