Estimating the probability for a protein to have a new fold: A statistical computational model.

E Portugaly, M Linial.

Abstract

Structural genomics aims to solve a large number of protein structures that represent the protein space. Currently, an exhaustive solution for all structures seems prohibitively expensive, so the challenge is to define a relatively small set of proteins with new, currently unknown folds. This paper presents a method that assigns each protein a probability of having an unsolved fold. The method makes extensive use of protomap, a sequence-based classification, and scop, a structure-based classification. In protomap, the relationships among proteins are encoded as a graph whose vertices correspond to 13,354 protein clusters. A representative fold is assigned to each cluster containing at least one solved protein by superposing all scop (release 1.37) folds onto the protomap clusters. Distances within the protomap graph are then computed from each representative fold to its neighboring folds. The distribution of these distances is used to build a statistical model of the distances among folds that are already known and those that have yet to be discovered. The distance distributions for solved and unsolved proteins differ significantly, which makes it possible to use Bayes' rule to estimate, for any protein, the probability that it has an as-yet undetermined fold. Proteins with the highest probability of representing a new fold constitute the target list for structure determination. Our predicted probabilities for unsolved proteins correlate very well with the proportion of new folds among recently solved structures (new scop 1.39 records) that are disjoint from our original training set.
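The Bayes'-rule step can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes an unweighted cluster graph given as adjacency lists, scores each unsolved cluster by its hop count to the nearest cluster with a solved fold, and estimates the two class-conditional distance distributions, P(d | known) and P(d | new), from add-one-smoothed histograms of made-up training distances. The function names `hops_to_nearest_solved`, `smoothed_likelihood` and `prob_new_fold`, the prior `prior_new`, and the distance cap `max_dist` are all hypothetical. The posterior is P(new | d) = P(d | new)P(new) / [P(d | new)P(new) + P(d | known)P(known)].

```python
from collections import Counter, deque


def hops_to_nearest_solved(graph, start, solved):
    """Breadth-first search on an unweighted cluster graph (an assumption).

    Returns the number of edges from `start` to the closest *other* cluster
    in `solved`, or None if no solved cluster is reachable.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node != start and node in solved:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def _bin(d, max_dist):
    """Map a distance to a histogram bin in 1..max_dist; None (unreachable)
    and anything beyond max_dist are pooled into the last bin."""
    if d is None:
        return max_dist
    return max(1, min(d, max_dist))


def smoothed_likelihood(samples, max_dist):
    """Estimate P(d | class) from training distances with add-one smoothing."""
    counts = Counter(_bin(d, max_dist) for d in samples)
    total = len(samples) + max_dist
    return {b: (counts[b] + 1) / total for b in range(1, max_dist + 1)}


def prob_new_fold(d, like_new, like_known, prior_new, max_dist):
    """Bayes' rule: P(new|d) = P(d|new)P(new) / (P(d|new)P(new) + P(d|known)P(known))."""
    b = _bin(d, max_dist)
    num = like_new[b] * prior_new
    return num / (num + like_known[b] * (1.0 - prior_new))


if __name__ == "__main__":
    # Toy 5-cluster chain; clusters "a" and "e" have solved folds (hypothetical).
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
    solved = {"a", "e"}
    # Made-up training distances for illustration only, not data from the paper.
    like_known = smoothed_likelihood([1, 1, 1, 2, 2, 3], max_dist=4)
    like_new = smoothed_likelihood([2, 3, 3, 4, 4, 4], max_dist=4)
    scores = []
    for cluster in graph:
        if cluster in solved:
            continue
        d = hops_to_nearest_solved(graph, cluster, solved)
        scores.append((cluster, prob_new_fold(d, like_new, like_known, 0.5, 4)))
    # Highest posterior first: a toy analogue of the target list.
    for cluster, p in sorted(scores, key=lambda t: t[1], reverse=True):
        print(cluster, round(p, 3))
```

With these toy numbers, cluster c (two hops from any solved fold) gets a posterior of 0.4 and ranks above b and d (one hop each, 0.2). Add-one smoothing keeps every likelihood positive, so the posterior stays defined even for distances never seen in training.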

Year:  2000        PMID: 10792051      PMCID: PMC25799          DOI: 10.1073/pnas.090559497

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205



北京卡尤迪生物科技股份有限公司 © 2022-2023.