Protein sequence comparison based on K-string dictionary.

Chenglong Yu, Rong L He, Stephen S-T Yau.

Abstract

The current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K. In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to use a much lower dimensional K-string-based frequency or probability vector to represent a protein, and thus significantly reduce the computer memory requirements for their implementation. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees.
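
The memory argument in the abstract can be illustrated with a minimal sketch. It assumes the "K-string dictionary" is simply the set of K-strings that actually occur in the dataset; the abstract does not spell out the definition, so treat this as an illustration rather than the authors' method. The function names, the toy sequences, and the Euclidean distance are all hypothetical choices, and the SVD step mentioned in the abstract is not covered.

```python
from collections import Counter
from math import sqrt


def k_strings(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs, k):
    """Sorted list of the distinct K-strings seen in any sequence.

    Assumption: this is what "K-string dictionary" means here.
    """
    return sorted({s for seq in seqs for s in k_strings(seq, k)})


def frequency_vectors(seqs, k):
    """Represent each sequence by K-string frequencies over the dictionary."""
    dictionary = build_dictionary(seqs, k)
    vectors = []
    for seq in seqs:
        counts = Counter(k_strings(seq, k))
        total = sum(counts.values())
        # A Counter returns 0 for K-strings absent from this sequence.
        vectors.append([counts[s] / total if total else 0.0
                        for s in dictionary])
    return dictionary, vectors


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences, invented for illustration only.
toy = ["MKVLAAGIV", "MKVLSAGIV", "GGHHWWPPQ"]
k = 3
dictionary, vectors = frequency_vectors(toy, k)

# Dictionary size versus the full 20**k space of K-strings.
print(len(dictionary), 20 ** k)
print(euclidean(vectors[0], vectors[1]), euclidean(vectors[0], vectors[2]))
```

For these toy inputs the vectors have as many components as there are observed 3-strings, rather than one component for each of the 20³ possible 3-strings, which is the source of the memory saving the abstract describes.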

Keywords: Cardinality; Frequency vector; K-string; MSA; NADH dehydrogenase 1; ND1; SVD; Sequence comparison; Singular Value Decomposition; multiple sequence alignment

Substances: Proteins

Year: 2013 PMID: 23939466 DOI: 10.1016/j.gene.2013.07.092

Source DB: PubMed Journal: Gene ISSN: 0378-1119 Impact factor: 3.688

Cited by: 14 in total

1. An information-based network approach for protein classification.

Authors: Xiaogeng Wan; Xin Zhao; Stephen S T Yau
Journal: PLoS One Date: 2017-03-28 Impact factor: 3.240

2. Efficient feature selection and classification of protein sequence data in bioinformatics.

Authors: Muhammad Javed Iqbal; Ibrahima Faye; Brahim Belhaouari Samir; Abas Md Said
Journal: ScientificWorldJournal Date: 2014-06-19

3. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids.

Authors: Yushuang Li; Tian Song; Jiasheng Yang; Yi Zhang; Jialiang Yang
Journal: PLoS One Date: 2016-12-05 Impact factor: 3.240

4. Establishing the phylogeny of Prochlorococcus with a new alignment-free method.

5. A latent genetic subtype of major depression identified by whole-exome genotyping data in a Mexican-American cohort.

6. Alignment-free similarity analysis for protein sequences based on fuzzy integral.

7. Drug-Target Interaction Prediction Based on Drug Fingerprint Information and Protein Sequence.

8. Phylogenetic analysis of H7N9 avian influenza virus based on a novel mathematical descriptor.

9. A novel strategy for clustering major depression individuals using whole-genome sequencing variant data.

10. Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector.

Authors: Shaojun Pei; Rui Dong; Rong Lucy He; Stephen S-T Yau
Journal: Comput Struct Biotechnol J Date: 2019-07-11 Impact factor: 7.271