Hidden Markov models for detecting remote protein homologies.

K Karplus, C Barrett, R Hughey.

Abstract

MOTIVATION: A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and homologs found using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases.
METHODS: We evaluate the SAM-T98 method with four datasets. Three of the test sets are fold-recognition tests, where the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against WU-BLASTP and against DOUBLE-BLAST, a two-step method similar to ISS, but using BLAST instead of FASTA.
RESULTS: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP (Structural Classification of Proteins)-domains test, SAM-T98 got 880 true positives and 68 false positives, DOUBLE-BLAST got 533 true positives with 71 false positives, and WU-BLASTP got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies, and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model.
AVAILABILITY: A World Wide Web server, as well as information on obtaining the Sequence Alignment and Modeling (SAM) software suite, can be found at http://www.cse.ucsc.edu/research/compbio/
CONTACT: karplus@cse.ucsc.edu; http://www.cse.ucsc.edu/karplus
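
The reversed-model normalization mentioned in the results can be illustrated with a minimal sketch. This is not the SAM-T98 implementation: it assumes a toy gapless profile (one residue distribution per position) in place of a full HMM, and the names `log_prob`, `reverse_null_score`, `make_column`, and the probabilities 0.62 and 0.02 are hypothetical choices made only for illustration. The score is the log-probability under the model minus the log-probability under the same model with its positions reversed.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_column(favored, p_favored=0.62, p_other=0.02):
    """Toy emission distribution: one favored residue, 19 equally unlikely others.
    The values 0.62 + 19 * 0.02 = 1.0 are illustrative assumptions."""
    return {aa: (p_favored if aa == favored else p_other) for aa in AMINO_ACIDS}

def log_prob(profile, seq):
    """Log-probability of seq under a gapless profile of the same length."""
    if len(profile) != len(seq):
        raise ValueError("this toy model only scores sequences of model length")
    return sum(math.log(col[res]) for col, res in zip(profile, seq))

def reverse_null_score(profile, seq):
    """Log-odds of the model against the same model with positions reversed."""
    reversed_profile = profile[::-1]
    return log_prob(profile, seq) - log_prob(reversed_profile, seq)

# Hypothetical four-position model favoring the sequence ACDE.
profile = [make_column(aa) for aa in "ACDE"]

print(reverse_null_score(profile, "ACDE"))  # positive: matches the model order
print(reverse_null_score(profile, "EDCA"))  # negative: matches the reversed order
print(reverse_null_score(profile, "AAAA"))  # near zero: same composition either way
```

In this sketch the reversed model has the same length and residue composition as the forward model, so a sequence that scores well only because of its composition (such as `AAAA` here) receives a score near zero, whereas a uniform null model would not cancel that effect.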

Year: 1998    PMID: 9927713    DOI: 10.1093/bioinformatics/14.10.846

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


