Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure.

Darrin P Lewis, Tony Jebara, William Stafford Noble.

Abstract

MOTIVATION: Drawing inferences from large, heterogeneous sets of biological data requires a theoretical framework that is capable of representing, e.g. DNA and protein sequences, protein structures, microarray expression data, various types of interaction networks, etc. Recently, a class of algorithms known as kernel methods has emerged as a powerful framework for combining diverse types of data. The support vector machine (SVM) algorithm is the most popular kernel method, due to its theoretical underpinnings and strong empirical performance on a wide variety of classification tasks. Furthermore, several recently described extensions allow the SVM to assign relative weights to various datasets, depending upon their utilities in performing a given classification task.
RESULTS: In this work, we empirically investigate the performance of the SVM on the task of inferring gene functional annotations from a combination of protein sequence and structure data. Our results suggest that the SVM is quite robust to noise in the input datasets. Consequently, in the presence of only two types of data, an SVM trained from an unweighted combination of datasets performs as well as or better than a more sophisticated algorithm that assigns weights to individual data types. Indeed, for this simple case, we can demonstrate empirically that no solution is significantly better than the naive, unweighted average of the two datasets. On the other hand, when multiple noisy datasets are included in the experiment, the naive approach fares worse than the weighted approach. Our results suggest that for many applications, a naive unweighted sum of kernels may be sufficient.
AVAILABILITY: http://noble.gs.washington.edu/proj/seqstruct
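
The difference between the naive and the weighted combination described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy feature values, the choice of a linear kernel for the sequence-like data and a Gaussian kernel for the structure-like data, and the hand-picked weights. The abstract does not specify the kernels used, and in the weighted approach it describes the weights are assigned by the algorithm rather than fixed by hand.

```python
import math

def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of Gaussian (RBF) similarities between the rows of X."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
        for xi in X
    ]

def combine_kernels(kernels, weights=None):
    """Weighted sum of same-sized Gram matrices.

    With weights=None every kernel gets weight 1.0, which is the
    naive unweighted sum.
    """
    if weights is None:
        weights = [1.0] * len(kernels)
    if len(weights) != len(kernels):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy features for the same three proteins from two data sources.
seq_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
struct_features = [[0.5], [0.5], [2.0]]

K_seq = linear_kernel(seq_features)
K_struct = rbf_kernel(struct_features, gamma=0.5)

# Naive combination: plain sum of the two Gram matrices.
K_naive = combine_kernels([K_seq, K_struct])

# Weighted combination: illustrative fixed weights, not learned ones.
K_weighted = combine_kernels([K_seq, K_struct], weights=[0.8, 0.2])
```

Either combined matrix could then be handed to an SVM implementation that accepts a precomputed kernel. A sum of valid kernels with non-negative weights is itself a valid kernel, which is why both combinations remain usable by the SVM.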

Year: 2006    PMID: 16966363    DOI: 10.1093/bioinformatics/btl475

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


