
Application of latent semantic analysis to protein remote homology detection.

Qi-Wen Dong, Xiao-Long Wang, Lei Lin.

Abstract

MOTIVATION: Remote homology detection between protein sequences is a central problem in computational biology. Discriminative methods such as the support vector machine (SVM) are among the most effective approaches. Many SVM-based methods focus on finding useful representations of protein sequences, using either explicit feature vector representations or kernel functions. Such representations may suffer from the peaking phenomenon seen in many machine-learning methods, because the number of features is usually very large and noisy data may be introduced. Based on these observations, this research focuses on feature extraction and efficient representation of protein vectors for SVM-based protein classification.
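To make the dimensionality concern concrete, the sketch below builds an explicit N-gram count vector for a single sequence. It is an illustration only, not the authors' code: it assumes the 20 standard amino acids as the alphabet, and the function names and the example sequence are hypothetical. The vector length grows as 20^n (20, 400 and 8,000 for n = 1, 2, 3), while a sequence of length L contributes at most L - n + 1 non-zero counts, so the vectors quickly become large and sparse.

```python
from itertools import product

# Assumption: only the 20 standard residues are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def ngram_vocabulary(n):
    """All 20**n possible N-grams over the standard amino-acid alphabet."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]

def ngram_count_vector(seq, n):
    """Explicit feature vector: the count of every possible N-gram in seq."""
    vocab = ngram_vocabulary(n)
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if word in index:  # skip N-grams containing non-standard residues
            vec[index[word]] += 1
    return vec

# Hypothetical example sequence: compare vector length with non-zero entries.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
for n in (1, 2, 3):
    v = ngram_count_vector(seq, n)
    print(n, len(v), sum(1 for x in v if x))
```

Each line printed shows n, the full vector length and how many of those entries are actually non-zero for this short sequence.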
RESULTS: In this study, a latent semantic analysis (LSA) model, an efficient feature extraction technique from natural language processing, is introduced for protein remote homology detection. Several basic building blocks of protein sequences have been investigated as the 'words' of the 'protein sequence language', including N-grams, patterns and motifs. Each protein sequence is treated as a 'document' composed of a bag of words. A word-document matrix is constructed first. LSA is then performed on this matrix to produce latent semantic representation vectors of the protein sequences, which removes noise and yields a compact, low-dimensional description of each sequence. These latent semantic representation vectors are then evaluated with an SVM. The method is tested on the SCOP 1.53 database. The results show that the LSA model significantly improves remote homology detection performance compared with the basic formalisms. Furthermore, the performance of this method is comparable with that of complex kernel methods such as SVM-LA and better than that of other sequence-based methods such as PSI-BLAST and SVM-pairwise.
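The word-document construction and the LSA step can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the published implementation: it uses overlapping 2-grams as the only 'words', raw counts with no term weighting, and a plain truncated SVD; the helper names, the toy sequences and the choice k = 2 are hypothetical. Each training protein is represented by W^T U_k (equivalently the rows of V_k S_k), and a new sequence is projected the same way, so both kinds of vectors could then be passed to an SVM.

```python
import numpy as np
from collections import Counter

def ngrams(seq, n):
    """Overlapping N-grams of a sequence (the 'words' of a 'document')."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def word_document_matrix(sequences, n=2):
    """Build the word-document matrix W: rows = words, columns = proteins.

    Entries are raw N-gram counts; no term weighting is applied (assumption).
    """
    docs = [Counter(ngrams(s, n)) for s in sequences]
    vocab = sorted(set().union(*docs))
    W = np.array([[d[w] for d in docs] for w in vocab], dtype=float)
    return W, vocab

def lsa_fit(W, k):
    """Truncated SVD of W, keeping the k largest singular directions.

    k must not exceed min(W.shape). Returns U_k and the k-dimensional
    latent vectors of the training proteins, W^T U_k (= V_k S_k).
    """
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k]
    return U_k, W.T @ U_k

def lsa_transform(seq, vocab, U_k, n=2):
    """Fold a new sequence into the same latent space (unseen words ignored)."""
    counts = Counter(ngrams(seq, n))
    d = np.array([counts[w] for w in vocab], dtype=float)
    return d @ U_k

# Toy usage with hypothetical sequences: two pairs of near-identical fragments.
train = ["MKTAYIAKQR", "MKTAYLAKQR", "GSHMLEDPVA", "GSHMLEDPIA"]
W, vocab = word_document_matrix(train, n=2)
U_k, X_train = lsa_fit(W, k=2)               # X_train: one row per protein
x_query = lsa_transform("MKTAYIAKQK", vocab, U_k)
print(W.shape, X_train.shape, x_query.shape)
```

Singular vectors are only defined up to sign, so individual latent coordinates may differ between libraries; this does not matter as long as training and query sequences are projected with the same U_k.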


Year:  2005        PMID: 16317074     DOI: 10.1093/bioinformatics/bti801

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  20 in total

1.  Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis.

Authors:  Bin Liu; Junjie Chen; Xiaolong Wang
Journal:  Mol Genet Genomics       Date:  2015-04-21       Impact factor: 3.291

2.  Physicochemical property distributions for accurate and rapid pairwise protein homology detection.

Authors:  Bobbie-Jo M Webb-Robertson; Kyle G Ratuiste; Christopher S Oehmen
Journal:  BMC Bioinformatics       Date:  2010-03-19       Impact factor: 3.169

3.  Template-based protein modeling: recent methodological advances. [Review]

Authors:  Pankaj R Daga; Ronak Y Patel; Robert J Doerksen
Journal:  Curr Top Med Chem       Date:  2010       Impact factor: 3.295

4.  iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool.

Authors:  Xiao Yang; Xiucai Ye; Xuehong Li; Lesong Wei
Journal:  Front Genet       Date:  2021-03-31       Impact factor: 4.599

5.  Using amino acid physicochemical distance transformation for fast protein remote homology detection.

Authors:  Bin Liu; Xiaolong Wang; Qingcai Chen; Qiwen Dong; Xun Lan
Journal:  PLoS One       Date:  2012-09-28       Impact factor: 3.240

6.  A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models.

Authors:  Juliana S Bernardes; Alessandra Carbone; Gerson Zaverucha
Journal:  BMC Bioinformatics       Date:  2011-03-23       Impact factor: 3.169

7.  Motif kernel generated by genetic programming improves remote homology and fold detection.

Authors:  Tony Håndstad; Arne J H Hestnes; Pål Saetrom
Journal:  BMC Bioinformatics       Date:  2007-01-25       Impact factor: 3.169

8.  A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.

Authors:  Bin Liu; Xiaolong Wang; Lei Lin; Qiwen Dong; Xuan Wang
Journal:  BMC Bioinformatics       Date:  2008-12-01       Impact factor: 3.169

9.  4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism.

Authors:  Rao Zeng; Song Cheng; Minghong Liao
Journal:  Front Cell Dev Biol       Date:  2021-05-10

10.  MiRTif: a support vector machine-based microRNA target interaction filter.

Authors:  Yuchen Yang; Yu-Ping Wang; Kuo-Bin Li
Journal:  BMC Bioinformatics       Date:  2008-12-12       Impact factor: 3.169

