SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection.

Anuj R Shah, Christopher S Oehmen, Bobbie-Jo Webb-Robertson.

Abstract

MOTIVATION: As the amount of biological sequence data continues to grow exponentially, we face the increasing challenge of assigning function to this enormous molecular 'parts list'. The most popular approaches to this challenge rely on the simplifying assumption that functionally similar molecules, or proteins, sometimes have similar composition, or sequence. However, these algorithms often fail to identify remote homologs (proteins with similar function but dissimilar sequence), which are often a significant fraction of the total homolog collection for a given sequence. We introduce a Support Vector Machine (SVM)-based tool to detect homology using semi-supervised iterative learning (SVM-HUSTLE) that identifies significantly more remote homologs than current state-of-the-art sequence or cluster-based methods. Rather than building profiles or position-specific scoring matrices, SVM-HUSTLE builds an SVM classifier for a query sequence by training on a collection of representative high-confidence training sets, recruits additional sequences, and assigns a statistical measure of homology between a pair of sequences. SVM-HUSTLE combines principles of semi-supervised learning theory with statistical sampling to create many concurrent classifiers that iteratively detect and refine, on the fly, patterns indicating homology.
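The recruit-and-retrain idea described in MOTIVATION can be illustrated with a minimal self-training sketch. This is not the SVM-HUSTLE implementation: the abstract does not give its features, kernel, sampling scheme, or thresholds, so everything below is an assumption made for illustration. In particular, a simple centroid-difference scorer over 2-mer frequency profiles stands in for the SVM, only one classifier is trained (no statistical sampling or concurrent classifiers), and the names `kmer_profile`, `self_train`, and `threshold` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Unit-length k-mer count vector for a sequence, stored as a dict."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    norm = sqrt(sum(c * c for c in counts.values())) or 1.0
    return {kmer: c / norm for kmer, c in counts.items()}


def centroid(profiles):
    """Mean of a non-empty list of profile vectors."""
    total = Counter()
    for profile in profiles:
        for kmer, value in profile.items():
            total[kmer] += value
    return {kmer: value / len(profiles) for kmer, value in total.items()}


def dot(a, b):
    return sum(value * b.get(kmer, 0.0) for kmer, value in a.items())


def score(seq, pos_centroid, neg_centroid):
    """Stand-in for an SVM decision value (assumption, not an actual SVM)."""
    profile = kmer_profile(seq)
    return dot(profile, pos_centroid) - dot(profile, neg_centroid)


def self_train(positives, negatives, unlabeled, threshold=0.3, max_rounds=5):
    """Iteratively recruit confidently scored unlabeled sequences as positives.

    `negatives` must be non-empty. Returns the recruited sequences in order.
    """
    pos = list(positives)
    pool = list(unlabeled)
    neg_c = centroid([kmer_profile(s) for s in negatives])
    for _ in range(max_rounds):
        # Retrain on the current positive set, then score the remaining pool.
        pos_c = centroid([kmer_profile(s) for s in pos])
        recruited = [s for s in pool if score(s, pos_c, neg_c) >= threshold]
        if not recruited:
            break  # nothing new was recruited, so the model has converged
        pos.extend(recruited)
        pool = [s for s in pool if s not in recruited]
    return pos[len(positives):]
```

For example, `self_train(["ACACACAC"], ["GTGTGTGT"], ["ACACACGT", "GTGTGTGT", "TTTTTTTT"])` recruits only the sequence that resembles the positive seed. Swapping the stand-in scorer for a trained SVM would leave the loop structure unchanged.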
RESULTS: When compared against existing methods for identifying protein homologs (BLAST, PSI-BLAST, COMPASS, PROF_SIM, RANKPROP and their variants) on two different benchmark datasets, SVM-HUSTLE significantly outperforms each of the above methods using the most stringent ROC(1) statistic, with P-values less than 1e-20. SVM-HUSTLE also yields results comparable to HHSearch but at a substantially reduced computational cost, since we do not require the construction of HMMs.
AVAILABILITY: The software executable to run SVM-HUSTLE can be downloaded from http://www.sysbio.org/sysbio/networkbio/svm_hustle
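The ROC(1) statistic cited in RESULTS can be made concrete with a short sketch. It assumes that ROC(1) refers to the standard ROC_n score with n = 1, i.e. the area under the ROC curve up to the n-th false positive, normalized to lie between 0 and 1. The abstract does not describe how the authors computed it, so the function name `roc_n` and the handling of lists with fewer than n false positives are assumptions.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], n: int = 1) -> float:
    """ROC_n for hits sorted best-first (True = true homolog).

    For each of the first n false positives, count the true positives
    ranked above it, sum those counts, and divide by n * (total positives).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total_pos = sum(1 for label in ranked_labels if label)
    if total_pos == 0:
        raise ValueError("at least one true positive is required")

    tp_seen = 0
    fp_seen = 0
    area = 0
    for is_homolog in ranked_labels:
        if is_homolog:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break
    # Assumption: if the list holds fewer than n false positives, each
    # missing one is treated as ranked below every true positive.
    area += (n - fp_seen) * total_pos
    return area / (n * total_pos)
```

With n = 1 the score is simply the fraction of true homologs ranked ahead of the first false positive, which is why it is considered stringent: `roc_n([True, True, False, True])` evaluates to 2/3, and a single false positive at the top of the list gives 0.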

Year:  2008        PMID: 18245127     DOI: 10.1093/bioinformatics/btn028

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

