Weighting hidden Markov models for maximum discrimination.

R Karchin, R Hughey.

Abstract

MOTIVATION: Hidden Markov models can efficiently and automatically build statistical representations of related sequences. Unfortunately, training sets are frequently biased toward one subgroup of sequences, leading to an insufficiently general model. This work evaluates sequence weighting methods based on the maximum-discrimination idea.
RESULTS: One good method scales sequence weights by an exponential that ranges between 0.1 for the best scoring sequence and 1.0 for the worst. Experiments with a curated data set show that while training with one or two sequences performed worse than single-sequence Probabilistic Smith-Waterman, training with five or ten sequences reduced errors by 20% and 51%, respectively. This new version of the SAM HMM suite outperforms HMMer (17% reduction over PSW for 10 training sequences), Meta-MEME (28% reduction), and unweighted SAM (31% reduction).
AVAILABILITY: A WWW server, as well as information on obtaining the Sequence Alignment and Modeling (SAM) software suite and additional data from this work, can be found at http://www.cse.ucse.edu/research/compbio/sam.html
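
To make the exponential scaling mentioned under RESULTS concrete, here is a minimal sketch in Python. The abstract gives only the endpoints (0.1 for the best scoring sequence, 1.0 for the worst), so the interpolation between them is an assumption: the scale factor is taken to decay exponentially with the sequence's position between the worst and best scores. The function name `exponential_weight_factors`, its parameters, and the convention that a higher score means a better fit are all hypothetical and are not taken from the SAM software.

```python
import math


def exponential_weight_factors(scores, w_best=0.1, w_worst=1.0):
    """Return one scale factor per training sequence.

    Assumptions (not from the abstract): a higher score means the
    sequence is already well modeled, and the factor is interpolated
    exponentially between w_worst and w_best.
    """
    best, worst = max(scores), min(scores)
    if best == worst:
        # All sequences score the same, so none is down-weighted.
        return [w_worst for _ in scores]

    log_ratio = math.log(w_best / w_worst)
    factors = []
    for s in scores:
        t = (s - worst) / (best - worst)  # 0.0 for the worst, 1.0 for the best
        factors.append(w_worst * math.exp(log_ratio * t))
    return factors


# Hypothetical scores for three training sequences.
factors = exponential_weight_factors([10.0, 5.0, 0.0])
```

Under these assumptions the best scoring sequence is scaled by 0.1, the worst by 1.0, and a sequence halfway between them by about 0.316, so poorly modeled sequences keep more influence in the next round of training.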

Year:  1998        PMID: 9918947     DOI: 10.1093/bioinformatics/14.9.772

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  The sperm outer dense fiber protein is the 10th member of the superfamily of mammalian small stress proteins.

Authors:  Jean-Marc Fontaine; Joshua S Rest; Michael J Welsh; Rainer Benndorf
Journal:  Cell Stress Chaperones       Date:  2003       Impact factor: 3.667

2.  A sequence sub-sampling algorithm increases the power to detect distant homologues.

Authors:  Catrióna R Johnston; Denis C Shields
Journal:  Nucleic Acids Res       Date:  2005-07-08       Impact factor: 16.971

3.  Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER.

Authors:  Markus Wistrand; Erik L L Sonnhammer
Journal:  BMC Bioinformatics       Date:  2005-04-15       Impact factor: 3.169

4.  Subfamily specific conservation profiles for proteins based on n-gram patterns.

Authors:  John K Vries; Xiong Liu
Journal:  BMC Bioinformatics       Date:  2008-01-30       Impact factor: 3.169

5.  Comparative genomics search for losses of long-established genes on the human lineage.

Authors:  Jingchun Zhu; J Zachary Sanborn; Mark Diekhans; Craig B Lowe; Tom H Pringle; David Haussler
Journal:  PLoS Comput Biol       Date:  2007-12       Impact factor: 4.475
