
Phylogenetic analysis of protein sequences based on a novel k-mer natural vector method.

YuYan Zhang, Jia Wen, Stephen S-T Yau.

Abstract

Based on the k-mer model for protein sequences, a novel k-mer natural vector method is proposed to characterize the k-mers in a protein sequence, taking into account both their counts and their positional distributions. The correspondence between a protein sequence and its k-mer natural vector is proved to be one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily, without requiring evolutionary models or human intervention. However, there is no established criterion for choosing a suitable k, even though k strongly influences both the results and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. The results obtained from phylogenetic analysis of three protein datasets demonstrate that the new method can precisely describe the evolutionary relationships of proteins and greatly improve computational efficiency.
Copyright © 2018 Elsevier Inc. All rights reserved.
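
To make the idea of "numbers and distributions of k-mers" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the per-k-mer features follow the classical natural vector pattern: occurrence count, mean position, and a normalized second central moment of the positions. The normalization by `n * total` and the function names `kmer_natural_vector` and `distance` are assumptions made for illustration; the abstract does not give the exact definition.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k):
    """Map each k-mer in seq to (count, mean position, second central moment).

    Assumption: the moment is normalized by count * (number of k-mers),
    as in the classical natural vector; the paper may define it differently.
    """
    total = len(seq) - k + 1  # number of overlapping k-mers in seq
    if total <= 0:
        raise ValueError("sequence is shorter than k")

    # Record the 1-based start position of every occurrence of each k-mer.
    positions = defaultdict(list)
    for i in range(total):
        positions[seq[i:i + k]].append(i + 1)

    features = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mu = sum(pos) / n
        d2 = sum((p - mu) ** 2 for p in pos) / (n * total)
        features[kmer] = (n, mu, d2)
    return features


def distance(seq_a, seq_b, k):
    """Euclidean distance between two sequences' k-mer feature vectors.

    k-mers absent from one sequence contribute (0, 0, 0) for that sequence.
    """
    fa = kmer_natural_vector(seq_a, k)
    fb = kmer_natural_vector(seq_b, k)
    zero = (0, 0.0, 0.0)
    squared = 0.0
    for kmer in set(fa) | set(fb):
        a = fa.get(kmer, zero)
        b = fb.get(kmer, zero)
        squared += sum((x - y) ** 2 for x, y in zip(a, b))
    return squared ** 0.5
```

A pairwise distance matrix built with such a function could then be passed to a neighbor-joining routine to obtain a tree. A compound vector, as mentioned in the abstract, would combine features over several values of k; how the paper combines them is not stated here.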

Keywords:  Natural vector; Neighbor joining; Phylogenetic analysis; Protein sequence; k-mer model

Year:  2018        PMID: 30195069     DOI: 10.1016/j.ygeno.2018.08.010

Source DB:  PubMed          Journal:  Genomics        ISSN: 0888-7543            Impact factor:   5.736


  4 in total

1.  An accurate alignment-free protein sequence comparator based on physicochemical properties of amino acids.

Authors:  Saeedeh Akbari Rokn Abadi; Azam Sadat Abdosalehi; Faezeh Pouyamehr; Somayyeh Koohi
Journal:  Sci Rep       Date:  2022-07-01       Impact factor: 4.996

2.  Residue Cluster Classes: A Unified Protein Representation for Efficient Structural and Functional Classification.

Authors:  Fernando Fontove; Gabriel Del Rio
Journal:  Entropy (Basel)       Date:  2020-04-20       Impact factor: 2.524

3.  Organizing the bacterial annotation space with amino acid sequence embeddings.

Authors:  Susanna R Grigson; Jody C McKerral; James G Mitchell; Robert A Edwards
Journal:  BMC Bioinformatics       Date:  2022-09-23       Impact factor: 3.307

4.  FEGS: a novel feature extraction model for protein sequences and its applications.

Authors:  Zengchao Mu; Ting Yu; Xiaoping Liu; Hongyu Zheng; Leyi Wei; Juntao Liu
Journal:  BMC Bioinformatics       Date:  2021-06-03       Impact factor: 3.169
