A mathematical consideration of the word-composition vector method in comparison of biological sequences.

Takuyo Aita, Yuzuru Husimi, Koichi Nishigaki.

Abstract

To measure the similarity or dissimilarity between two given biological sequences, several papers have proposed metrics based on the "word-composition vector". The essence of these metrics is as follows. First, the frequencies of all K-tuple words are counted across each of the two given sequences. The two sequences are thereby transformed into their respective word-composition vectors. Finally, a distance metric, for example the angle between the two vectors, is calculated. A significant issue is how to determine the optimal word size K. Using a mathematical model of the mutational events that occur in sequences (substitutions, insertions, deletions and duplications), we analyzed how the angle between the composition vectors depends on those events. We also examined the optimal word size (i.e., the resolution) using our own approach. Our results were verified by computational experiments using artificially generated sequences, amino acid sequences of hemoglobin and nucleotide sequences of 16S ribosomal RNA.
Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
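
The three steps named in the abstract (count K-tuple words, form the composition vectors, take the angle between them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `word_composition` and `angle_between` are hypothetical, and the use of overlapping windows and raw counts is an assumption made here. Scaling the counts to frequencies would not change the angle.

```python
import math
from collections import Counter


def word_composition(seq: str, k: int) -> Counter:
    """Count every overlapping K-tuple word in seq (assumed windowing)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def angle_between(seq_a: str, seq_b: str, k: int) -> float:
    """Angle in radians between the two word-composition vectors."""
    va = word_composition(seq_a, k)
    vb = word_composition(seq_b, k)
    # Words absent from a sequence contribute zero to the dot product.
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must contain at least one K-tuple")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)
```

Under these assumptions, two identical sequences give an angle of approximately zero, and two sequences that share no K-tuple word give an angle of π/2. Repeating the calculation over several values of `k` is one simple way to see how the word size changes the measured distance.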

Year:  2011        PMID: 21745534     DOI: 10.1016/j.biosystems.2011.06.009

Source DB:  PubMed          Journal:  Biosystems        ISSN: 0303-2647            Impact factor:   1.973


  3 in total

1.  A novel hierarchical clustering algorithm for gene sequences.

Authors:  Dan Wei; Qingshan Jiang; Yanjie Wei; Shengrui Wang
Journal:  BMC Bioinformatics       Date:  2012-07-23       Impact factor: 3.169

2.  An improved alignment-free model for DNA sequence similarity metric.

Authors:  Junpeng Bao; Ruiyu Yuan; Zhe Bao
Journal:  BMC Bioinformatics       Date:  2014-09-28       Impact factor: 3.169

3.  Word decoding of protein amino acid sequences with availability analysis: a linguistic approach.

Authors:  Kenta Motomura; Tomohiro Fujita; Motosuke Tsutsumi; Satsuki Kikuzato; Morikazu Nakamura; Joji M Otaki
Journal:  PLoS One       Date:  2012-11-21       Impact factor: 3.240
