Weighting hidden Markov models for maximum discrimination.

Abstract

MOTIVATION: Hidden Markov models can efficiently and automatically build statistical representations of related sequences. Unfortunately, training sets are frequently biased toward one subgroup of sequences, leading to an insufficiently general model. This work evaluates sequence weighting methods based on the maximum-discrimination idea.
RESULTS: One good method scales sequence weights by an exponential that ranges between 0.1 for the best scoring sequence and 1.0 for the worst. Experiments with a curated data set show that while training with one or two sequences performed worse than single-sequence Probabilistic Smith-Waterman (PSW), training with five or ten sequences reduced errors by 20% and 51%, respectively. This new version of the SAM HMM suite outperforms HMMer (17% reduction over PSW for 10 training sequences), Meta-MEME (28% reduction), and unweighted SAM (31% reduction).
AVAILABILITY: A WWW server, as well as information on obtaining the Sequence Alignment and Modeling (SAM) software suite and additional data from this work, can be found at http://www.cse.ucsc.edu/research/compbio/sam.html
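
As an illustration only, the exponential weighting described under RESULTS could be sketched as follows. The abstract does not give the exact formula, so the functional form here (an exponential decay over min-max normalized scores), the function name `exponential_weights`, and the convention that a higher score means a better fit are all assumptions, not the SAM implementation.

```python
import math


def exponential_weights(scores, w_best=0.1, w_worst=1.0):
    """Map per-sequence scores to training weights (hypothetical sketch).

    Assumes a higher score means the model already fits the sequence well.
    The best-scoring sequence gets w_best, the worst gets w_worst, and the
    rest fall on an exponential curve between the two.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All sequences score the same: no basis for discrimination.
        return [w_worst] * len(scores)
    # Decay rate chosen so that the weight falls from w_worst to w_best
    # as the normalized score goes from 0 (worst) to 1 (best).
    rate = math.log(w_worst / w_best)
    return [w_worst * math.exp(-rate * (s - lo) / (hi - lo)) for s in scores]


# Three made-up scores: the worst (12.0) keeps full weight, the best (30.0)
# is down-weighted to 0.1, and the middle one lands in between.
weights = exponential_weights([12.0, 30.0, 21.0])
```

Down-weighting sequences the model already scores well shifts training effort toward the poorly represented ones, which is the intuition behind countering a training set biased toward one subgroup.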

Substances: Proteins

Year: 1998 PMID: 9918947 DOI: 10.1093/bioinformatics/14.9.772

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

5 in total

1. The sperm outer dense fiber protein is the 10th member of the superfamily of mammalian small stress proteins.

2. A sequence sub-sampling algorithm increases the power to detect distant homologues.

3. Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER.

4. Subfamily specific conservation profiles for proteins based on n-gram patterns.

5. Comparative genomics search for losses of long-established genes on the human lineage.