Abstraction Augmented Markov Models.

Cornelia Caragea, Adrian Silvescu, Doina Caragea, Vasant Honavar.

Abstract

High-accuracy sequence classification often requires higher-order Markov models (MMs). However, the number of MM parameters grows exponentially with the range of direct dependencies between sequence elements, which increases the risk of overfitting when the data set is small. We present abstraction augmented Markov models (AAMMs), which effectively reduce the number of numeric parameters of kth-order MMs by successively grouping strings of length k (i.e., k-grams) into abstraction hierarchies. We evaluate AAMMs on three protein subcellular localization prediction tasks. The results show that abstraction makes it possible to construct predictive models that use a significantly smaller number of features (by one to three orders of magnitude) than MMs. AAMMs are competitive with MMs and, in some cases, significantly outperform them. Moreover, AAMMs often perform significantly better than variable-order Markov models such as decomposed context tree weighting, prediction by partial match, and probabilistic suffix trees.
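The parameter growth and the grouping idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how k-grams are grouped, so the greedy bottom-up merge and the count-weighted Jensen-Shannon divergence used as the merge criterion are assumptions of this example, as are all function names. The sketch counts free parameters for a kth-order MM over an alphabet of a given size, then pools k-gram contexts whose next-symbol distributions are similar.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def mm_param_count(alphabet_size, k):
    """Free parameters of a kth-order MM: one next-symbol distribution
    (alphabet_size - 1 free values) for each of alphabet_size**k contexts."""
    return alphabet_size ** k * (alphabet_size - 1)


def aamm_param_count(alphabet_size, num_abstractions):
    """Same count when the contexts are pooled into num_abstractions groups."""
    return num_abstractions * (alphabet_size - 1)


def context_counts(sequences, k):
    """Map each k-gram to a Counter of the symbols that follow it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts


def js_divergence(c1, c2):
    """Count-weighted Jensen-Shannon divergence between two count vectors.
    Assumption: this criterion is chosen for illustration only."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    total = 0.0
    for sym in set(c1) | set(c2):
        p, q = c1[sym] / n1, c2[sym] / n2
        m = w1 * p + w2 * q
        if p > 0:
            total += w1 * p * log2(p / m)
        if q > 0:
            total += w2 * q * log2(q / m)
    return total


def build_abstractions(counts, num_abstractions):
    """Greedily merge the two closest groups until num_abstractions remain.
    Returns a dict: frozenset of k-grams -> pooled next-symbol counts."""
    groups = {frozenset([ctx]): Counter(c) for ctx, c in counts.items()}
    while len(groups) > num_abstractions:
        a, b = min(
            combinations(groups, 2),
            key=lambda pair: js_divergence(groups[pair[0]], groups[pair[1]]),
        )
        groups[a | b] = groups.pop(a) + groups.pop(b)
    return groups


# 20 amino acids, 3rd-order MM versus 100 abstractions (illustrative sizes).
full = mm_param_count(20, 3)        # 152000
pooled = aamm_param_count(20, 100)  # 1900

# Toy data: contexts "A" and "C" are always followed by "B", so they merge.
toy = build_abstractions(context_counts(["ABABABAB", "CBCBCBCB"], k=1), 2)
```

With these illustrative sizes the pooled model has 80 times fewer parameters than the full model. Recording the order of merges, rather than only the final groups, would give a hierarchy from which a coarser or finer set of abstractions can be chosen.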

Year:  2010        PMID: 22820557      PMCID: PMC3400679          DOI: 10.1109/ICDM.2010.158

Source DB:  PubMed          Journal:  Proc IEEE Int Conf Data Min        ISSN: 1550-4786


