Mismatch string kernels for discriminative protein classification.

Christina S Leslie, Eleazar Eskin, Adiel Cohen, Jason Weston, William Stafford Noble.

Abstract

MOTIVATION: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns.
RESULTS: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.
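
ILLUSTRATION: The following is a minimal sketch of the idea described in RESULTS, not the authors' implementation. It assumes the standard (k, m)-mismatch formulation, in which each length-k pattern in a sequence contributes a count to every length-k string within m mismatches of it, and the kernel value is the inner product of the two resulting count vectors. It enumerates neighbours naively rather than using the mismatch tree mentioned in the abstract, so it is practical only for small k and m. The names `neighbors`, `feature_map`, `mismatch_kernel`, and `AMINO_ACIDS` are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations, product

# Assumption: the 20 standard amino acids form the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def neighbors(kmer, m, alphabet=AMINO_ACIDS):
    """Yield each string within Hamming distance m of kmer exactly once."""
    k = len(kmer)
    for d in range(min(m, k) + 1):
        # Choose d positions to mutate, then substitute a different letter at each.
        for positions in combinations(range(k), d):
            choices = [[c for c in alphabet if c != kmer[p]] for p in positions]
            for subs in product(*choices):
                chars = list(kmer)
                for p, c in zip(positions, subs):
                    chars[p] = c
                yield "".join(chars)


def feature_map(seq, k, m, alphabet=AMINO_ACIDS):
    """Sparse mismatch feature vector: pattern -> number of k-mers in seq near it."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for nb in neighbors(seq[i:i + k], m, alphabet):
            feats[nb] += 1
    return feats


def mismatch_kernel(x, y, k, m, alphabet=AMINO_ACIDS):
    """Inner product of the two sparse mismatch feature vectors."""
    fx = feature_map(x, k, m, alphabet)
    fy = feature_map(y, k, m, alphabet)
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller vector
    return sum(count * fy[pattern] for pattern, count in fx.items())
```

A Gram matrix built from `mismatch_kernel` over a set of training sequences could then be passed to any SVM implementation that accepts a precomputed kernel. With m = 0 the sketch reduces to counting exactly shared k-mers.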

Year:  2004        PMID: 14990442     DOI: 10.1093/bioinformatics/btg431

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  112 in total

Review 1.  Methods for biological data integration: perspectives and challenges.

Authors:  Vladimir Gligorijević; Nataša Pržulj
Journal:  J R Soc Interface       Date:  2015-11-06       Impact factor: 4.118

2.  Framework for kernel regularization with application to protein clustering.

Authors:  Fan Lu; Sündüz Keles; Stephen J Wright; Grace Wahba
Journal:  Proc Natl Acad Sci U S A       Date:  2005-08-18       Impact factor: 11.205

3.  Functional census of mutation sequence spaces: the example of p53 cancer rescue mutants.

Authors:  Samuel A Danziger; S Joshua Swamidass; Jue Zeng; Lawrence R Dearth; Qiang Lu; Jonathan H Chen; Jianlin Cheng; Vinh P Hoang; Hiroto Saigo; Ray Luo; Pierre Baldi; Rainer K Brachmann; Richard H Lathrop
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2006 Apr-Jun       Impact factor: 3.710

4.  Improved prediction of malaria degradomes by supervised learning with SVM and profile kernel.

Authors:  Rui Kuang; Jianying Gu; Hong Cai; Yufeng Wang
Journal:  Genetica       Date:  2008-12-06       Impact factor: 1.082

5.  Classification of nucleotide sequences using support vector machines.

Authors:  Tae-Kun Seo
Journal:  J Mol Evol       Date:  2010-08-26       Impact factor: 2.395

6.  Predicting flexible length linear B-cell epitopes.

Authors:  Yasser El-Manzalawy; Drena Dobbs; Vasant Honavar
Journal:  Comput Syst Bioinformatics Conf       Date:  2008

Review 7.  Penalized feature selection and classification in bioinformatics.

Authors:  Shuangge Ma; Jian Huang
Journal:  Brief Bioinform       Date:  2008-06-18       Impact factor: 11.622

8.  Discriminative prediction of mammalian enhancers from DNA sequence.

Authors:  Dongwon Lee; Rachel Karchin; Michael A Beer
Journal:  Genome Res       Date:  2011-08-29       Impact factor: 9.043

9.  Computational chemogenomics: is it more than inductive transfer?

Authors:  J B Brown; Yasushi Okuno; Gilles Marcou; Alexandre Varnek; Dragos Horvath
Journal:  J Comput Aided Mol Des       Date:  2014-04-27       Impact factor: 3.686

10.  Protein-ligand interaction prediction: an improved chemogenomics approach.

Authors:  Laurent Jacob; Jean-Philippe Vert
Journal:  Bioinformatics       Date:  2008-08-01       Impact factor: 6.937
