Profile-based string kernels for remote homology detection and motif extraction.

Rui Kuang, Eugene Ie, Ke Wang, Kai Wang, Mahira Siddiqi, Yoav Freund, Christina Leslie.

Abstract

We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of k-length subsequences ("k-mers") in the data. Thanks to an efficient data structure, the kernels are fast to compute once the profiles have been obtained. In fact, running PSI-BLAST to build the profiles takes significantly longer than both the kernel computation and the SVM training. We present remote homology detection experiments on the SCOP database in which profile-based string kernels used with SVM classifiers strongly outperform all recently presented supervised SVM methods. We also show how the learned SVM classifier can be used to extract "discriminative sequence motifs" (short regions of the original profile that contribute almost all the weight of the SVM classification score) and demonstrate that these discriminative motifs correspond to meaningful structural features in the protein data. The use of PSI-BLAST profiles can be seen as a semi-supervised learning technique, since PSI-BLAST leverages unlabeled data from a large sequence database to build more informative profiles. Recently presented "cluster kernels" give general semi-supervised methods for improving SVM protein classification performance. We show that our profile kernel results are comparable to those of cluster kernels while providing much better scalability to large datasets.
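
The abstract describes the kernel only in words, so here is a minimal Python sketch of one way to realize it, under stated assumptions. Each profile is assumed to be a list of per-position residue probabilities; a k-mer b_1 ... b_k is assumed to lie in the mutation neighborhood of the window starting at position j when the summed negative log-probabilities -sum_i log p_{j+i}(b_i) fall below a threshold sigma; and the kernel is taken as the inner product of the resulting k-mer count vectors. The helper names (toy_profile, kmer_neighborhood, profile_features, profile_kernel, primal_weights, window_scores), the toy profile generator and the default k and sigma values are hypothetical illustrations, not the authors' code, and a simple pruned recursive enumeration stands in for the paper's efficient data structure. The last two functions sketch the motif idea: for an SVM with this kernel, the decision value decomposes over sequence windows, so windows with large shares mark candidate discriminative regions. The paper's exact positional scoring may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Assumption: 20-letter amino-acid alphabet; a profile maps residue -> probability per position.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]


def toy_profile(seq: str, match: float = 0.9) -> Profile:
    """Hypothetical stand-in for a PSI-BLAST profile: `match` mass on the observed residue,
    the remainder spread evenly over the other 19 residues."""
    other = (1.0 - match) / (len(ALPHABET) - 1)
    return [{a: (match if a == r else other) for a in ALPHABET} for r in seq]


def kmer_neighborhood(columns: Sequence[Dict[str, float]], sigma: float) -> List[str]:
    """Assumed neighborhood rule: all k-mers b_1..b_k with -sum_i log p_i(b_i) < sigma."""
    k = len(columns)
    hits: List[str] = []

    def extend(prefix: str, cost: float) -> None:
        if len(prefix) == k:
            hits.append(prefix)
            return
        for a, p in columns[len(prefix)].items():
            if p <= 0.0:
                continue  # zero probability means infinite cost
            new_cost = cost - math.log(p)
            # -log p >= 0, so the cost never decreases as the k-mer grows:
            # a prefix at or above sigma is pruned together with all its extensions.
            if new_cost < sigma:
                extend(prefix + a, new_cost)

    extend("", 0.0)
    return hits


def profile_features(profile: Profile, k: int, sigma: float) -> Counter:
    """Phi(P(x)): for every k-length window, count each k-mer in that window's neighborhood."""
    phi: Counter = Counter()
    for j in range(len(profile) - k + 1):
        phi.update(kmer_neighborhood(profile[j:j + k], sigma))
    return phi


def profile_kernel(p1: Profile, p2: Profile, k: int = 3, sigma: float = 6.0) -> float:
    """K(x, y) = <Phi(P(x)), Phi(P(y))>, computed over the sparse k-mer counts."""
    f1, f2 = profile_features(p1, k, sigma), profile_features(p2, k, sigma)
    return float(sum(c * f2[b] for b, c in f1.items()))


def primal_weights(train_feats: Sequence[Counter], dual_coefs: Sequence[float]) -> Dict[str, float]:
    """w = sum_i (y_i * alpha_i) * Phi(x_i); `dual_coefs` holds the products y_i * alpha_i."""
    w: Dict[str, float] = {}
    for phi, coef in zip(train_feats, dual_coefs):
        for kmer, count in phi.items():
            w[kmer] = w.get(kmer, 0.0) + coef * count
    return w


def window_scores(profile: Profile, w: Dict[str, float], k: int, sigma: float) -> List[float]:
    """Share of <w, Phi(P(x))> contributed by each k-length window; the shares sum to it."""
    return [sum(w.get(b, 0.0) for b in kmer_neighborhood(profile[j:j + k], sigma))
            for j in range(len(profile) - k + 1)]


if __name__ == "__main__":
    a, b = toy_profile("ACDEFGH"), toy_profile("ACDEFKL")
    # At sigma = 1.0 only exact 3-mers survive; ACD, CDE and DEF are shared.
    print(profile_kernel(a, b, k=3, sigma=1.0))  # prints 3.0
```

Because every -log p term is non-negative, the partial cost of a prefix can only grow, which is what makes the early pruning in kmer_neighborhood safe. Raising sigma widens each neighborhood: with the toy profile at sigma = 6.0, each 3-mer window admits its exact 3-mer plus all 57 single-residue substitutions.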

Year:  2004        PMID: 16448009     DOI: 10.1109/csb.2004.1332428

Source DB:  PubMed          Journal:  Proc IEEE Comput Syst Bioinform Conf        ISSN: 1551-7497


Related articles: 18 in total (10 shown)

1.  Machine learning based prediction for peptide drift times in ion mobility spectrometry.

Authors:  Anuj R Shah; Khushbu Agarwal; Erin S Baker; Mudita Singhal; Anoop M Mayampurath; Yehia M Ibrahim; Lars J Kangas; Matthew E Monroe; Rui Zhao; Mikhail E Belov; Gordon A Anderson; Richard D Smith
Journal:  Bioinformatics       Date:  2010-05-21       Impact factor: 6.937

2.  A new prediction strategy for long local protein structures using an original description.

Authors:  Aurélie Bornot; Catherine Etchebest; Alexandre G de Brevern
Journal:  Proteins       Date:  2009-08-15

3.  LocTree3 prediction of localization.

Authors:  Tatyana Goldberg; Maximilian Hecht; Tobias Hamp; Timothy Karl; Guy Yachdav; Nadeem Ahmed; Uwe Altermann; Philipp Angerer; Sonja Ansorge; Kinga Balasz; Michael Bernhofer; Alexander Betz; Laura Cizmadija; Kieu Trinh Do; Julia Gerke; Robert Greil; Vadim Joerdens; Maximilian Hastreiter; Katharina Hembach; Max Herzog; Maria Kalemanov; Michael Kluge; Alice Meier; Hassan Nasir; Ulrich Neumaier; Verena Prade; Jonas Reeb; Aleksandr Sorokoumov; Ilira Troshani; Susann Vorberg; Sonja Waldraff; Jonas Zierer; Henrik Nielsen; Burkhard Rost
Journal:  Nucleic Acids Res       Date:  2014-05-21       Impact factor: 16.971

4.  Efficient alignment-free DNA barcode analytics.

Authors:  Pavel Kuksa; Vladimir Pavlovic
Journal:  BMC Bioinformatics       Date:  2009-11-10       Impact factor: 3.169

5.  Machine learning for in silico virtual screening and chemical genomics: new strategies. (Review)

Authors:  Jean-Philippe Vert; Laurent Jacob
Journal:  Comb Chem High Throughput Screen       Date:  2008-09       Impact factor: 1.339

6.  Exploiting physico-chemical properties in string kernels.

Authors:  Nora C Toussaint; Christian Widmer; Oliver Kohlbacher; Gunnar Rätsch
Journal:  BMC Bioinformatics       Date:  2010-10-26       Impact factor: 3.169

7.  Efficient use of unlabeled data for protein sequence classification: a comparative study.

Authors:  Pavel Kuksa; Pai-Hsi Huang; Vladimir Pavlovic
Journal:  BMC Bioinformatics       Date:  2009-04-29       Impact factor: 3.169

8.  LocTree2 predicts localization for all domains of life.

Authors:  Tatyana Goldberg; Tobias Hamp; Burkhard Rost
Journal:  Bioinformatics       Date:  2012-09-15       Impact factor: 6.937

9.  Accurate splice site prediction using support vector machines.

Authors:  Sören Sonnenburg; Gabriele Schweikert; Petra Philips; Jonas Behr; Gunnar Rätsch
Journal:  BMC Bioinformatics       Date:  2007       Impact factor: 3.169

10.  MiRTif: a support vector machine-based microRNA target interaction filter.

Authors:  Yuchen Yang; Yu-Ping Wang; Kuo-Bin Li
Journal:  BMC Bioinformatics       Date:  2008-12-12       Impact factor: 3.169
