Composition vector method based on maximum entropy principle for sequence comparison.

Raymond H Chan, Tony H Chan, Hau Man Yeung, Roger Wei Wang.

Abstract

The composition vector (CV) method is an alignment-free method for sequence comparison. Because it is simpler than multiple sequence alignment methods, it has been widely discussed lately, and several formulas based on probabilistic models, such as Hao’s and Yu’s formulas, have been proposed. In this paper, we improve these formulas by using the entropy principle, which can quantify the nonrandom occurrence of patterns in the sequences. More precisely, existing formulas are used to generate a set of possible formulas, from which we choose the one that maximizes the entropy. We give the closed-form solution to the resulting optimization problem. Hence, from any given CV formula, we can find the corresponding one that maximizes the entropy. In particular, we show that Hao’s formula itself maximizes the entropy, and we derive a new entropy-maximizing formula from Yu’s formula. We illustrate the accuracy of our new formula using both simulated and experimental data sets. For the simulated data sets, our new formula gives the best consensus and significant values for three different kinds of evolution models. For the data set of tetrapod 18S rRNA sequences, our new formula correctly groups the bird and reptile clades together, whereas Hao’s and Yu’s formulas fail to do so. Using real data sets of different sizes, we show that our formula is more accurate than Hao’s and Yu’s formulas even for small data sets.
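
To make the idea of a composition vector concrete, the following is a minimal Python sketch of a Hao-style CV as it is commonly described in the CV literature, not the new entropy-maximizing formula, which the abstract does not state. It assumes that the expected frequency of a k-mer is predicted from its (k-1)-mer and (k-2)-mer frequencies by a Markov-type estimate, that each CV component is the relative deviation of the observed frequency from that prediction, and that two vectors are compared by a cosine-based distance. The function names `kmer_freqs`, `composition_vector`, and `cv_distance` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def composition_vector(seq, k, alphabet="ACGT"):
    """Hao-style composition vector (assumed form, see lead-in).

    For each k-mer w = a1...ak, the expected frequency is estimated as
    f(a1...a(k-1)) * f(a2...ak) / f(a2...a(k-1)), and the component is
    (observed - expected) / expected, or 0 when expected is 0.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    fk = kmer_freqs(seq, k)
    fk1 = kmer_freqs(seq, k - 1)
    fk2 = kmer_freqs(seq, k - 2)
    vector = []
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        middle = fk2.get(word[1:-1], 0.0)
        if middle > 0:
            expected = fk1.get(word[:-1], 0.0) * fk1.get(word[1:], 0.0) / middle
        else:
            expected = 0.0
        observed = fk.get(word, 0.0)
        vector.append((observed - expected) / expected if expected > 0 else 0.0)
    return vector


def cv_distance(u, v):
    """Cosine-based distance (1 - cos) / 2, which lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


# Usage: compare two short toy DNA sequences with k = 3.
u = composition_vector("ACGTACGTAC", 3)
v = composition_vector("AAAACCCCGGGGTTTT", 3)
d = cv_distance(u, v)
```

A pairwise matrix of such distances is the kind of input a distance-based tree-building method would take. The sequences above are toy inputs, not data from the paper.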

Year:  2011        PMID: 21383416     DOI: 10.1109/TCBB.2011.45

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  5 in total

Review 1.  Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

Authors:  Oliver Bonham-Carter; Joe Steele; Dhundy Bastola
Journal:  Brief Bioinform       Date:  2013-07-31       Impact factor: 11.622

2.  K-mer natural vector and its application to the phylogenetic analysis of genetic sequences.

Authors:  Jia Wen; Raymond H F Chan; Shek-Chung Yau; Rong L He; Stephen S T Yau
Journal:  Gene       Date:  2014-05-22       Impact factor: 3.688

3.  Whole-Genome k-mer Topic Modeling Associates Bacterial Families.

Authors:  Ernesto Borrayo-Carbajal; Isaias May-Canche; Omar Paredes; J Alejandro Morales; Rebeca Romo-Vázquez; Hugo Vélez-Pérez
Journal:  Genes (Basel)       Date:  2020-02-14       Impact factor: 4.096

4.  CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy.

Authors:  Guanghong Zuo; Bailin Hao
Journal:  Genomics Proteomics Bioinformatics       Date:  2015-11-10       Impact factor: 7.691

5.  LAF: Logic Alignment Free and its application to bacterial genomes classification.

Authors:  Emanuel Weitschek; Fabio Cunial; Giovanni Felici
Journal:  BioData Min       Date:  2015-12-08       Impact factor: 2.522

