Protein sequence comparison based on K-string dictionary.

Chenglong Yu, Rong L He, Stephen S-T Yau.

Abstract

The current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K. In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to use a much lower dimensional K-string-based frequency or probability vector to represent a protein, and thus significantly reduce the computer memory requirements for their implementation. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees.
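To show why a dictionary shrinks the vector dimension, here is a minimal sketch. It assumes that the "K-string dictionary" is the sorted set of K-strings that occur in at least one sequence of the dataset, and that each protein is represented by relative K-string frequencies over that dictionary. The paper's exact construction may differ. The helper names, the toy sequences, and the choice of Euclidean distance are hypothetical illustrations. The SVD step described in the abstract is not shown; the resulting vectors could be stacked into a matrix for that purpose.

```python
from collections import Counter
from itertools import chain

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kstrings(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs: list[str], k: int) -> list[str]:
    """Hypothetical K-string dictionary: sorted K-strings seen in any sequence."""
    return sorted(set(chain.from_iterable(kstrings(s, k) for s in seqs)))


def frequency_vector(seq: str, k: int, dictionary: list[str]) -> list[float]:
    """Relative frequency of each dictionary K-string in seq."""
    counts = Counter(kstrings(seq, k))
    total = sum(counts.values()) or 1  # avoid division by zero for short seqs
    return [counts[w] / total for w in dictionary]


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Toy example with made-up sequences (not from the paper's datasets).
seqs = ["MKVLAAGIVG", "MKVLSAGIVA", "MTTQLLKAGE"]
k = 3
dictionary = build_dictionary(seqs, k)
full_dim = len(AMINO_ACIDS) ** k  # dimension of the naive 20^K representation
vectors = [frequency_vector(s, k, dictionary) for s in seqs]

print(f"naive dimension: {full_dim}, dictionary dimension: {len(dictionary)}")
print(f"d(seq1, seq2) = {euclidean(vectors[0], vectors[1]):.4f}")
print(f"d(seq1, seq3) = {euclidean(vectors[0], vectors[2]):.4f}")
```

In this toy case the naive vectors would have 20^3 = 8000 entries, while the dictionary holds only the 20 distinct 3-strings that actually occur. In general the dictionary size is bounded by the total number of K-string positions in the dataset, so it cannot grow exponentially with K the way 20^K does.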
© 2013.

Keywords:  Cardinality; Frequency vector; K-string; MSA; NADH dehydrogenase 1; ND1; SVD; Sequence comparison; Singular Value Decomposition; multiple sequence alignment

Year:  2013        PMID: 23939466     DOI: 10.1016/j.gene.2013.07.092

Source DB:  PubMed          Journal:  Gene        ISSN: 0378-1119            Impact factor:   3.688


  14 in total

1.  An information-based network approach for protein classification.

Authors:  Xiaogeng Wan; Xin Zhao; Stephen S T Yau
Journal:  PLoS One       Date:  2017-03-28       Impact factor: 3.240

2.  Efficient feature selection and classification of protein sequence data in bioinformatics.

Authors:  Muhammad Javed Iqbal; Ibrahima Faye; Brahim Belhaouari Samir; Abas Md Said
Journal:  ScientificWorldJournal       Date:  2014-06-19

3.  An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids.

Authors:  Yushuang Li; Tian Song; Jiasheng Yang; Yi Zhang; Jialiang Yang
Journal:  PLoS One       Date:  2016-12-05       Impact factor: 3.240

4.  Establishing the phylogeny of Prochlorococcus with a new alignment-free method.

Authors:  Xin Zhao; Kun Tian; Rong L He; Stephen S-T Yau
Journal:  Ecol Evol       Date:  2017-11-15       Impact factor: 2.912

5.  A latent genetic subtype of major depression identified by whole-exome genotyping data in a Mexican-American cohort.

Authors:  C Yu; M Arcos-Burgos; J Licinio; M-L Wong
Journal:  Transl Psychiatry       Date:  2017-05-16       Impact factor: 6.222

6.  Alignment-free similarity analysis for protein sequences based on fuzzy integral.

Authors:  Ajay Kumar Saw; Binod Chandra Tripathy; Soumyadeep Nandi
Journal:  Sci Rep       Date:  2019-02-26       Impact factor: 4.379

7.  Drug-Target Interaction Prediction Based on Drug Fingerprint Information and Protein Sequence.

Authors:  Yang Li; Yu-An Huang; Zhu-Hong You; Li-Ping Li; Zheng Wang
Journal:  Molecules       Date:  2019-08-19       Impact factor: 4.411

8.  Phylogenetic analysis of H7N9 avian influenza virus based on a novel mathematical descriptor.

Authors:  Yusheng Bai; Tingting Ma; Yuhua Yao; Qi Dai; Ping-an He
Journal:  Biomed Res Int       Date:  2014-06-16       Impact factor: 3.411

9.  A novel strategy for clustering major depression individuals using whole-genome sequencing variant data.

Authors:  Chenglong Yu; Bernhard T Baune; Julio Licinio; Ma-Li Wong
Journal:  Sci Rep       Date:  2017-03-13       Impact factor: 4.379

10.  Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector.

Authors:  Shaojun Pei; Rui Dong; Rong Lucy He; Stephen S-T Yau
Journal:  Comput Struct Biotechnol J       Date:  2019-07-11       Impact factor: 7.271
