A Comparative Analysis Between k-Mers and Community Detection-Based Features for the Task of Protein Classification.

Karthik Tangirala, Nic Herndon, Doina Caragea.

Abstract

Machine learning algorithms are widely used to annotate biological sequences. Low-dimensional, informative feature vectors can be crucial for the performance of these algorithms. In prior work, we proposed a community detection approach for constructing low-dimensional feature sets for nucleotide sequence classification. That approach used the Hamming distance between short nucleotide subsequences, called k-mers, to construct a network, and then used community detection to identify groups of k-mers that appear frequently in a set of sequences. Although this approach worked well for nucleotide sequence classification, it could not be applied directly to protein sequences, because the Hamming distance is not a good measure for comparing short protein k-mers. To address this limitation, we extended our prior approach by replacing the Hamming distance with substitution scores. Experimental results in different learning scenarios show that the features generated with the new approach are more informative than k-mers.
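
The pipeline described in the abstract (k-mers, a similarity network, grouping, then group-level features) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration: connected components stand in for a real community detection algorithm, `toy_score` with its residue classes is a hypothetical scoring rule and not a published substitution matrix, and the thresholds and all function names are invented here.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical residue classes (an assumption, NOT a real substitution matrix).
TOY_GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "CGP"]
GROUP_OF = {aa: i for i, group in enumerate(TOY_GROUPS) for aa in group}


def toy_score(a, b):
    """Per-position score: +2 identical, +1 same toy class, -1 otherwise."""
    total = 0
    for x, y in zip(a, b):
        if x == y:
            total += 2
        elif GROUP_OF.get(x) is not None and GROUP_OF.get(x) == GROUP_OF.get(y):
            total += 1
        else:
            total -= 1
    return total


def build_graph(nodes, similar):
    """Connect every pair of k-mers that the predicate `similar` accepts."""
    graph = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if similar(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Connected components, used here as a stand-in for community detection."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


def community_features(seqs, k, similar):
    """Return (groups, vectors): one count per k-mer group for each sequence."""
    vocab = sorted({km for s in seqs for km in kmers(s, k)})
    groups = components(build_graph(vocab, similar))
    index = {km: i for i, group in enumerate(groups) for km in group}
    vectors = []
    for s in seqs:
        counts = Counter(index[km] for km in kmers(s, k))
        vectors.append([counts.get(i, 0) for i in range(len(groups))])
    return groups, vectors
```

For nucleotide input one might call `community_features(seqs, 3, lambda a, b: hamming(a, b) <= 1)`, and for protein input `community_features(seqs, 2, lambda a, b: toy_score(a, b) >= 3)`. In both cases the feature vector has one entry per k-mer group instead of one per distinct k-mer, which is the source of the dimensionality reduction the abstract refers to.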

Substances: Proteins

Year: 2016 PMID: 26863669 PMCID: PMC6245644 DOI: 10.1109/TNB.2016.2523501

Source DB: PubMed Journal: IEEE Trans Nanobioscience ISSN: 1536-1241 Impact factor: 2.935

References

17 in total

1. Finding composite regulatory patterns in DNA sequences.

Authors: Eleazar Eskin; Pavel A Pevzner
Journal: Bioinformatics Date: 2002 Impact factor: 6.937

2. Community structure in social and biological networks. (Review)

Authors: M Girvan; M E J Newman
Journal: Proc Natl Acad Sci U S A Date: 2002-06-11 Impact factor: 11.205

3. PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis.

Authors: J L Gardy; M R Laird; F Chen; S Rey; C J Walsh; M Ester; F S L Brinkman
Journal: Bioinformatics Date: 2004-10-22 Impact factor: 6.937

4. Protein classification based on text document classification techniques.

Authors: Betty Yee Man Cheng; Jaime G Carbonell; Judith Klein-Seetharaman
Journal: Proteins Date: 2005-03-01

5. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences.

Authors: P Bucher
Journal: J Mol Biol Date: 1990-04-20 Impact factor: 5.469

6. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.

7. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.

8. A survey of motif discovery methods in an integrated framework.

9. DRIMust: a web server for discovering rank imbalanced motifs using suffix trees.

10. A fast weak motif-finding algorithm based on community detection in graphs.