A new distribution vector and its application in genome clustering.

Bo Zhao, Rong L He, Stephen S-T Yau.

Abstract

In this paper we report a novel mathematical method that transforms DNA sequences into distribution vectors, which correspond to points in sixty-dimensional space. Each component of the distribution vector represents the distribution of one kind of nucleotide in k segments of the DNA sequence. The mathematical and statistical properties of the distribution vectors are demonstrated and examined with large datasets of human DNA sequences and random sequences. The expectation and standard deviation determined from these datasets make the mapping stable and practicable. We also apply the distribution vectors to clustering the haemagglutinin (HA) genes of 60 H1N1 viruses from human, swine and avian hosts, the complete mitochondrial genomes of 80 placental mammals, and the complete genomes of 50 bacteria. The 60 H1N1 viruses, 80 placental mammals and 50 bacteria are classified accurately and, compared with multiple sequence alignment methods, rapidly. The results indicate that the distribution vectors can reveal the similarity and evolutionary relationships among homologous DNA sequences, based on the distances between pairs of distribution vectors. Because they are fast to compute, the distribution vectors can handle very large numbers of DNA sequences efficiently.
Copyright © 2011 Elsevier Inc. All rights reserved.
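
The abstract does not give the exact formula for each component, so the following is only a minimal sketch of the general idea of an alignment-free, segment-based vector, not the authors' definition. It assumes that the sixty dimensions arise from 4 nucleotides times k = 15 segments and that each component is the frequency of one nucleotide within one segment. Both assumptions, and the names `segment_composition_vector` and `euclidean_distance`, are illustrative only.

```python
import math

NUCLEOTIDES = "ACGT"


def segment_composition_vector(seq, k=15):
    """Map a DNA sequence to a 4*k vector of per-segment nucleotide frequencies.

    ASSUMPTION: this layout (4 nucleotides x k segments, k=15 giving 60
    dimensions) is a guess for illustration, not the paper's definition.
    """
    seq = seq.upper()
    n = len(seq)
    if n < k:
        raise ValueError("sequence must have at least k nucleotides")
    vector = []
    for base in NUCLEOTIDES:
        for i in range(k):
            # Integer boundaries split the sequence into k nearly equal,
            # non-empty segments (guaranteed because n >= k).
            start = i * n // k
            end = (i + 1) * n // k
            segment = seq[start:end]
            vector.append(segment.count(base) / len(segment))
    return vector


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Under these assumptions, two sequences would be compared by computing `euclidean_distance(segment_composition_vector(a), segment_composition_vector(b))`, and the resulting pairwise distances could be passed to any standard clustering routine. No alignment step is involved, which is why approaches of this kind scale to large sequence collections.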


Year:  2011        PMID: 21385621     DOI: 10.1016/j.ympev.2011.02.020

Source DB:  PubMed          Journal:  Mol Phylogenet Evol        ISSN: 1055-7903            Impact factor:   4.286


  7 in total

1.  New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. (Review)

Authors:  Kai Song; Jie Ren; Gesine Reinert; Minghua Deng; Michael S Waterman; Fengzhu Sun
Journal:  Brief Bioinform       Date:  2013-09-23       Impact factor: 11.622

2.  An information-based network approach for protein classification.

Authors:  Xiaogeng Wan; Xin Zhao; Stephen S T Yau
Journal:  PLoS One       Date:  2017-03-28       Impact factor: 3.240

3.  Classification of genomic components and prediction of genes of Begomovirus based on subsequence natural vector and support vector machine.

Authors:  Shaojun Pei; Rui Dong; Yiming Bao; Rong Lucy He; Stephen S-T Yau
Journal:  PeerJ       Date:  2020-08-03       Impact factor: 2.984

4.  A novel hierarchical clustering algorithm for gene sequences.

Authors:  Dan Wei; Qingshan Jiang; Yanjie Wei; Shengrui Wang
Journal:  BMC Bioinformatics       Date:  2012-07-23       Impact factor: 3.169

5.  An improved alignment-free model for DNA sequence similarity metric.

Authors:  Junpeng Bao; Ruiyu Yuan; Zhe Bao
Journal:  BMC Bioinformatics       Date:  2014-09-28       Impact factor: 3.169

6.  A protein structural study based on the centrality analysis of protein sequence feature networks.

Authors:  Xiaogeng Wan; Xinying Tan
Journal:  PLoS One       Date:  2021-03-29       Impact factor: 3.240

7.  A study on separation of the protein structural types in amino acid sequence feature spaces.

Authors:  Xiaogeng Wan; Xinying Tan
Journal:  PLoS One       Date:  2019-12-23       Impact factor: 3.240
