Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure.

J Gough, K Karplus, R Hughey, C Chothia.

Abstract

Of the sequence comparison methods, profile-based methods perform with greater selectivity than those that use pairwise comparisons. Of the profile methods, hidden Markov models (HMMs) are apparently the best. The first part of this paper describes calculations that (i) improve the performance of HMMs and (ii) determine a good procedure for creating HMMs for sequences of proteins of known structure. For a family of related proteins, more homologues are detected using multiple models built from diverse single seed sequences than from one model built from a good alignment of those sequences. A new procedure is described for detecting and correcting those errors that arise at the model-building stage of the procedure. These two improvements greatly increase selectivity and coverage. The second part of the paper describes the construction of a library of HMMs, called SUPERFAMILY, that represent essentially all proteins of known structure. The sequences of the domains in proteins of known structure that have identities of less than 95% are used as seeds to build the models. Using the current data, this gives a library with 4894 models. The third part of the paper describes the use of the SUPERFAMILY model library to annotate the sequences of over 50 genomes. The models match twice as many target sequences as are matched by pairwise sequence comparison methods. For each genome, close to half of the sequences are matched in all or in part and, overall, the matches cover 35% of eukaryotic genomes and 45% of bacterial genomes. On average, roughly 15% of genome sequences are labelled as being hypothetical yet homologous to proteins of known structure. The annotations derived from these matches are available from a public web server at: http://stash.mrc-lmb.cam.ac.uk/SUPERFAMILY. This server also enables users to match their own sequences against the SUPERFAMILY model library. Copyright 2001 Academic Press.

Year: 2001   PMID: 11697912   DOI: 10.1006/jmbi.2001.5080

Source DB: PubMed   Journal: J Mol Biol   ISSN: 0022-2836   Impact factor: 5.469


  509 in total

1.  SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments.

Authors:  Julian Gough; Cyrus Chothia
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

2.  SCOP database in 2002: refinements accommodate structural genomics.

Authors:  Loredana Lo Conte; Steven E Brenner; Tim J P Hubbard; Cyrus Chothia; Alexey G Murzin
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

3.  Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database.

Authors:  Daniel W A Buchan; Adrian J Shepherd; David Lee; Frances M G Pearl; Stuart C G Rison; Janet M Thornton; Christine A Orengo
Journal:  Genome Res       Date:  2002-03       Impact factor: 9.043

4.  Structural characterization of the human proteome.

Authors:  Arne Müller; Robert M MacCallum; Michael J E Sternberg
Journal:  Genome Res       Date:  2002-11       Impact factor: 9.043

5.  Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination.

Authors:  Gordana Apic; Wolfgang Huber; Sarah A Teichmann
Journal:  J Struct Funct Genomics       Date:  2003

6.  Functional map and domain structure of MET, the product of the c-met protooncogene and receptor for hepatocyte growth factor/scatter factor.

Authors:  Ermanno Gherardi; Mark E Youles; Ricardo N Miguel; Tom L Blundell; Luisa Iamele; Julian Gough; Abhishek Bandyopadhyay; Guido Hartmann; P Jonathan G Butler
Journal:  Proc Natl Acad Sci U S A       Date:  2003-10-03       Impact factor: 11.205

7.  Structure of RdxA--an oxygen-insensitive nitroreductase essential for metronidazole activation in Helicobacter pylori.

Authors:  Marta Martínez-Júlvez; Adriana L Rojas; Igor Olekhnovich; Vladimir Espinosa Angarica; Paul S Hoffman; Javier Sancho
Journal:  FEBS J       Date:  2012-11-07       Impact factor: 5.542

8.  Crystal structure of YfeU protein from Haemophilus influenzae: a predicted etherase involved in peptidoglycan recycling.

Authors:  Y Kim; P Quartey; R Ng; T I Zarembinski; A Joachimiak
Journal:  J Struct Funct Genomics       Date:  2009-02-21

9.  Novel activator of mannose-specific phosphotransferase system permease expression in Listeria innocua, identified by screening for pediocin AcH resistance.

Authors:  Junfeng Xue; Ian Hunter; Tori Steinmetz; Adam Peters; Bibek Ray; Kurt W Miller
Journal:  Appl Environ Microbiol       Date:  2005-03       Impact factor: 4.792

10.  ccm2-like is required for cardiovascular development as a novel component of the Heg-CCM pathway.

Authors:  Jonathan N Rosen; Vanessa M Sogah; Lillian Y Ye; John D Mably
Journal:  Dev Biol       Date:  2013-01-15       Impact factor: 3.582
