A comparison of profile hidden Markov model procedures for remote homology detection.

Martin Madera, Julian Gough.

Abstract

Profile hidden Markov models (HMMs) are amongst the most successful procedures for detecting remote homology between proteins. There are two popular profile HMM programs, HMMER and SAM. Little is known about their performance relative to each other and to the recently improved version of PSI-BLAST. Here we compare the two programs to each other and to non-HMM methods, to determine their relative performance and the features that are important for their success. The quality of the multiple sequence alignments used to build models was the most important factor affecting the overall performance of profile HMMs. The SAM T99 procedure is needed to produce high quality alignments automatically, and the lack of an equivalent component in HMMER makes it less complete as a package. Using the default options and parameters, as an inexpert user would, we found that from identical alignments SAM consistently produces better models than HMMER, and that the relative performance of the model-scoring components varies. On average, HMMER was between one and three times faster than SAM when searching databases larger than 2000 sequences, whereas SAM was faster on smaller ones. Both methods were shown to mask low complexity and repeat sequences effectively using their null models, and the accuracy of their E-values was comparable. We found that the SAM T99 iterative database search procedure performs better than the most recent version of PSI-BLAST, but that scoring of PSI-BLAST profiles is more than 30 times faster than scoring of SAM models.
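
The abstract refers to scoring sequences against a null model and to the accuracy of the resulting E-values. As a rough illustration of those two ideas only, here is a minimal Python sketch. It assumes an ungapped profile made of match states only, a uniform background distribution as the null model, and a Gumbel (extreme value) fit with made-up parameters `lam` and `mu`. The names `column`, `log_odds_score` and `evd_evalue` are hypothetical; this is not how HMMER or SAM is actually implemented, and it omits insert/delete states, transitions and the low complexity masking the abstract mentions.

```python
import math

# Toy amino-acid alphabet and an ASSUMED uniform background (null) model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column(preferred, p=0.81):
    """Hypothetical match-state emission column that favours one residue."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return {aa: (p if aa == preferred else rest) for aa in AMINO_ACIDS}


def log_odds_score(profile, seq, background=BACKGROUND):
    """Bit score of an ungapped alignment: sum of log2(P_model / P_null).

    Real profile HMMs also have insert/delete states and transitions;
    this sketch uses match states only.
    """
    if len(seq) != len(profile):
        raise ValueError("ungapped sketch: seq length must equal profile length")
    return sum(math.log2(col[res] / background[res]) for col, res in zip(profile, seq))


def evd_evalue(score, db_size, lam, mu):
    """E-value from a Gumbel (extreme value) fit: E = N * P(S >= score)."""
    # P(S >= x) = 1 - exp(-exp(-lam * (x - mu))); expm1 keeps small tails accurate.
    p_value = -math.expm1(-math.exp(-lam * (score - mu)))
    return db_size * p_value


profile = [column("W"), column("C"), column("H")]
for seq in ("WCH", "AAA"):
    s = log_odds_score(profile, seq)
    # lam=0.7 and mu=-5.0 are made-up parameters, not fitted values.
    print(seq, round(s, 2), evd_evalue(s, db_size=10_000, lam=0.7, mu=-5.0))
```

The sequence that matches the profile gets a positive log-odds score and a small E-value, while the mismatching one scores below zero and gets an E-value close to the database size. Because E grows linearly with the number of sequences searched, the same raw score is less significant in a larger database.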

Year:  2002        PMID: 12364612      PMCID: PMC140544          DOI: 10.1093/nar/gkf544

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


References (20 in total; first 10 shown)

1.  Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure.

Authors:  J Gough; K Karplus; R Hughey; C Chothia
Journal:  J Mol Biol       Date:  2001-11-02       Impact factor: 5.469

2.  Removing near-neighbour redundancy from large protein sequence collections.

Authors:  L Holm; C Sander
Journal:  Bioinformatics       Date:  1998-06       Impact factor: 6.937

3.  Hidden Markov models for sequence analysis: extension and analysis of the basic method.

Authors:  R Hughey; A Krogh
Journal:  Comput Appl Biosci       Date:  1996-04

4.  Hidden Markov models. (Review)

Authors:  S R Eddy
Journal:  Curr Opin Struct Biol       Date:  1996-06       Impact factor: 6.809

5.  Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.

Authors:  S E Brenner; C Chothia; T J Hubbard
Journal:  Proc Natl Acad Sci U S A       Date:  1998-05-26       Impact factor: 11.205

6.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. (Review)

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

7.  Determinants of a protein fold. Unique features of the globin amino acid sequences.

Authors:  D Bashford; C Chothia; A M Lesk
Journal:  J Mol Biol       Date:  1987-07-05       Impact factor: 5.469

8.  SCOP: a structural classification of proteins database for the investigation of sequences and structures.

Authors:  A G Murzin; S E Brenner; T Hubbard; C Chothia
Journal:  J Mol Biol       Date:  1995-04-07       Impact factor: 5.469

9.  Hidden Markov models in computational biology. Applications to protein modeling.

Authors:  A Krogh; M Brown; I S Mian; K Sjölander; D Haussler
Journal:  J Mol Biol       Date:  1994-02-04       Impact factor: 5.469

10.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors:  J D Thompson; D G Higgins; T J Gibson
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971

Cited by (66 in total; first 10 shown)

1.  topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association.

Authors:  Nathan O Stitziel; T Andrew Binkowski; Yan Yuan Tseng; Simon Kasif; Jie Liang
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  The SUPERFAMILY database in 2004: additions and improvements.

Authors:  Martin Madera; Christine Vogel; Sarah K Kummerfeld; Cyrus Chothia; Julian Gough
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  eShadow: a tool for comparing closely related sequences.

Authors:  Ivan Ovcharenko; Dario Boffelli; Gabriela G Loots
Journal:  Genome Res       Date:  2004-06       Impact factor: 9.043

4.  Alignment of protein sequences by their profiles.

Authors:  Marc A Marti-Renom; M S Madhusudhan; Andrej Sali
Journal:  Protein Sci       Date:  2004-04       Impact factor: 6.725

5.  Expression dynamics of metabolic and regulatory components across stages of panicle and seed development in indica rice.

Authors:  Rita Sharma; Pinky Agarwal; Swatismita Ray; Priyanka Deveshwar; Pooja Sharma; Niharika Sharma; Aashima Nijhawan; Mukesh Jain; Ashok Kumar Singh; Vijay Pal Singh; Jitendra Paul Khurana; Akhilesh Kumar Tyagi; Sanjay Kapoor
Journal:  Funct Integr Genomics       Date:  2012-03-31       Impact factor: 3.410

6.  Identification, phylogeny, and transcript profiling of ERF family genes during development and abiotic stress treatments in tomato.

Authors:  Manoj K Sharma; Rahul Kumar; Amolkumar U Solanke; Rita Sharma; Akhilesh K Tyagi; Arun K Sharma
Journal:  Mol Genet Genomics       Date:  2010-10-05       Impact factor: 3.291

7.  Divergent evolution within protein superfolds inferred from profile-based phylogenetics.

Authors:  Douglas L Theobald; Deborah S Wuttke
Journal:  J Mol Biol       Date:  2005-09-20       Impact factor: 5.469

8.  The limits of protein sequence comparison? (Review)

Authors:  William R Pearson; Michael L Sierk
Journal:  Curr Opin Struct Biol       Date:  2005-06       Impact factor: 6.809

9.  Assessing strategies for improved superfamily recognition.

Authors:  Ian Sillitoe; Mark Dibley; James Bray; Sarah Addou; Christine Orengo
Journal:  Protein Sci       Date:  2005-06-03       Impact factor: 6.725

10.  Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA.

Authors:  Eva K Freyhult; Jonathan P Bollback; Paul P Gardner
Journal:  Genome Res       Date:  2006-12-06       Impact factor: 9.043
