Clustering DNA sequences by feature vectors.

Libin Liu, Yee-kin Ho, Stephen Yau.

Abstract

We represent all DNA sequences as points in twelve-dimensional space in such a way that homologous DNA sequences are clustered together, creating a new genomic space for the simultaneous global comparison of DNA sequences from millions of genes. More specifically, based on the contents of the four nucleotides, their distances from the origin, and their distribution along the sequence, a twelve-dimensional vector is assigned to any DNA sequence. The applicability of this analysis to the global comparison of gene structures was tested on the myoglobin, beta-globin, histone-4, lysozyme, and rhodopsin families. Members of the same family exhibit smaller vector distances than members of different families. The vector distance also distinguishes random sequences generated with the same base composition. Sequence comparisons were consistent with the BLAST method. Once a new gene is discovered, we can compute its location in our genomic space. It is natural to predict that the properties of this new gene are similar to those of known genes located nearby. Biologists can then design experiments to test these properties.
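The construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the choices below are assumptions: for each of the four nucleotides the sketch records its count ("content"), the mean of its 1-based positions ("distance from the origin"), and the population variance of those positions ("distribution along the sequence"), and it compares vectors with the Euclidean distance. The names `feature_vector` and `vector_distance` are hypothetical, and the normalization used in the paper may differ.

```python
import math

BASES = "ACGT"  # fixed order, so vector components are comparable across sequences


def feature_vector(seq):
    """Return a 12-dimensional vector: (count, mean position, position variance) per base.

    Assumption: positions are 1-based and the variance is the plain population
    variance; the published method may normalize these quantities differently.
    """
    seq = seq.upper()
    vec = []
    for base in BASES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n = len(positions)
        if n == 0:
            # Base absent from the sequence: use zeros for all three components.
            vec.extend([0.0, 0.0, 0.0])
            continue
        mean = sum(positions) / n
        var = sum((p - mean) ** 2 for p in positions) / n
        vec.extend([float(n), mean, var])
    return vec


def vector_distance(u, v):
    """Euclidean distance between two feature vectors (an assumed choice of metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Under these assumptions, two sequences would be compared with `vector_distance(feature_vector(s1), feature_vector(s2))`, with smaller values read as greater similarity. Because each sequence maps to a fixed-length vector, no alignment is needed.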

Year:  2006        PMID: 16843686     DOI: 10.1016/j.ympev.2006.05.019

Source DB:  PubMed          Journal:  Mol Phylogenet Evol        ISSN: 1055-7903            Impact factor:   4.286


  4 in total

1.  A rapid method for characterization of protein relatedness using feature vectors.

Authors:  Kareem Carr; Eleanor Murray; Ebenezer Armah; Rong L He; Stephen S-T Yau
Journal:  PLoS One       Date:  2010-03-05       Impact factor: 3.240

2.  A novel method of characterizing genetic sequences: genome space with biological distance and applications.

Authors:  Mo Deng; Chenglong Yu; Qian Liang; Rong L He; Stephen S-T Yau
Journal:  PLoS One       Date:  2011-03-02       Impact factor: 3.240

3.  An improved alignment-free model for DNA sequence similarity metric.

Authors:  Junpeng Bao; Ruiyu Yuan; Zhe Bao
Journal:  BMC Bioinformatics       Date:  2014-09-28       Impact factor: 3.169

4.  Dynamic order Markov model for categorical sequence clustering.

Authors:  Rongbo Chen; Haojun Sun; Lifei Chen; Jianfei Zhang; Shengrui Wang
Journal:  J Big Data       Date:  2021-12-07
