
Maximum margin classifier working in a set of strings.

Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu.

Abstract

Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.
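The loss of information mentioned in the abstract can be illustrated with a small sketch. It assumes the k-spectrum (k-mer count) feature map as a representative string kernel; the abstract does not say which kernels the authors have in mind, and the names `spectrum` and `spectrum_kernel` are hypothetical helpers written for this illustration only.

```python
from collections import Counter


def spectrum(s: str, k: int = 2) -> Counter:
    """Count every length-k substring of s (an assumed k-spectrum feature map)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """Inner product of the two k-mer count vectors."""
    fs, ft = spectrum(s, k), spectrum(t, k)
    # Counter returns 0 for k-mers that are absent from t.
    return sum(count * ft[kmer] for kmer, count in fs.items())


# Two different strings that contain exactly the same 2-mers: ab (twice), ba, aa.
a, b = "abaab", "aabab"

print(a != b)                         # the strings differ
print(spectrum(a) == spectrum(b))     # but their feature vectors coincide
print(spectrum_kernel(a, b) == spectrum_kernel(a, a))
```

Because both strings map to the same count vector, any classifier that sees only these features cannot tell them apart, which is the kind of non-one-to-one conversion the abstract refers to.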

Keywords:  bioinformatics; machine learning; probability theory; statistical asymptotics; strings

Year: 2016; PMID: 27118908; PMCID: PMC4841474; DOI: 10.1098/rspa.2015.0551

Source DB: PubMed; Journal: Proc Math Phys Eng Sci; ISSN: 1364-5021; Impact factor: 2.704


