A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome.

A Drawid, M Gerstein.

Abstract

We develop a probabilistic system for predicting the subcellular localization of proteins and estimating the relative population of the various compartments in yeast. Our system employs a Bayesian approach, updating a protein's probability of being in a compartment based on a diverse range of 30 features. These range from specific motifs (e.g. signal sequences or the HDEL motif) to overall properties of a sequence (e.g. surface composition or isoelectric point) to whole-genome data (e.g. absolute mRNA expression levels or their fluctuations). The strength of our approach is the easy integration of many features, particularly the whole-genome expression data. We construct a training and testing set of approximately 1300 yeast proteins with experimentally known localization by merging, filtering, and standardizing the annotation in the MIPS, Swiss-Prot and YPD databases, and we achieve 75% accuracy on individual protein predictions using this dataset. Moreover, we are able to estimate the relative protein population of the various compartments without requiring a definite localization for every protein. This approach, which is based on an analogy to the formalism of quantum mechanics, gives better accuracy in determining relative compartment populations than simply tallying the localization predictions for individual proteins (on the yeast proteins with known localization, 92% versus 74%). Our training and testing also highlight which of the 30 features are informative and which are redundant (19 being particularly useful). After developing our system, we apply it to the 4700 yeast proteins with currently unknown localization and estimate the relative population of the various compartments in the entire yeast genome.
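The Bayesian update and the population estimate described above can be illustrated with a minimal naive Bayes sketch. Everything concrete here is an assumption for illustration: the three compartments, the two discretized features (`hdel_motif`, `expression`), and every probability are hypothetical, the features are assumed conditionally independent, and the helper names `posterior`, `population_by_sum`, and `population_by_tally` are not from the paper. Averaging posterior vectors is one reading of estimating populations "without requiring a definite localization for every protein"; the paper's quantum-mechanics-based formalism may differ in detail.

```python
from math import fsum

# Hypothetical compartments, prior, and likelihood tables for illustration only;
# the paper's actual features, bins, and probabilities are not reproduced here.
COMPARTMENTS = ("cytoplasm", "nucleus", "ER")
PRIOR = {"cytoplasm": 0.5, "nucleus": 0.3, "ER": 0.2}

# P(feature state | compartment) for two discretized, hypothetical features.
LIKELIHOODS = {
    "hdel_motif": {
        "present": {"cytoplasm": 0.01, "nucleus": 0.01, "ER": 0.3},
        "absent": {"cytoplasm": 0.99, "nucleus": 0.99, "ER": 0.7},
    },
    "expression": {
        "high": {"cytoplasm": 0.6, "nucleus": 0.3, "ER": 0.4},
        "low": {"cytoplasm": 0.4, "nucleus": 0.7, "ER": 0.6},
    },
}


def posterior(features, prior=PRIOR):
    """Multiply the prior by each observed feature's likelihood, then normalize.
    Assumes features are conditionally independent given the compartment."""
    scores = dict(prior)  # copy so the prior itself is never modified
    for name, state in features.items():
        table = LIKELIHOODS[name][state]
        for c in scores:
            scores[c] *= table[c]
    total = fsum(scores.values())
    return {c: s / total for c, s in scores.items()}


def population_by_sum(proteins):
    """Estimate compartment fractions by averaging posterior vectors,
    so no protein has to be assigned a single definite compartment."""
    posts = [posterior(p) for p in proteins]
    return {c: fsum(q[c] for q in posts) / len(posts) for c in COMPARTMENTS}


def population_by_tally(proteins):
    """Baseline: assign each protein to its most probable compartment and count."""
    counts = dict.fromkeys(COMPARTMENTS, 0)
    for p in proteins:
        q = posterior(p)
        counts[max(q, key=q.get)] += 1
    return {c: n / len(proteins) for c, n in counts.items()}


proteins = [
    {"hdel_motif": "present", "expression": "high"},
    {"hdel_motif": "absent", "expression": "low"},
]
print(population_by_sum(proteins))
print(population_by_tally(proteins))
```

With these made-up numbers, the second protein is ambiguous (about 0.42 nucleus versus 0.40 cytoplasm), so the tally assigns it entirely to the nucleus and reports no cytoplasmic proteins at all, whereas the averaged posteriors keep roughly a quarter of the population in the cytoplasm.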
An unbiased prior is essential to this extrapolated estimate; for this, we use the MIPS localization catalogue, and adapt recent results on the localization of yeast proteins obtained by Snyder and colleagues using a minitransposon system. Our final localizations for all approximately 6000 proteins in the yeast genome are available over the web at: http://bioinfo.mbb.yale.edu/genome/localize. Copyright 2000 Academic Press.
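Why the prior matters for the genome-wide estimate can be shown with a small, self-contained sketch. All numbers are hypothetical: the single-feature likelihood table and the two priors (`TRAINING_PRIOR`, `CATALOGUE_PRIOR`) are invented for illustration and are not taken from MIPS, the training set, or the minitransposon data. The point is only that identical feature observations yield different compartment populations under different priors, so a prior skewed toward some compartments (one plausible concern being that proteins with known localization are not a random sample of the genome) propagates into the extrapolated estimate.

```python
from math import fsum

# Hypothetical single-feature model: P(expression state | compartment).
COMPARTMENTS = ("cytoplasm", "nucleus", "ER")
LIKELIHOOD = {
    "high": {"cytoplasm": 0.6, "nucleus": 0.3, "ER": 0.4},
    "low": {"cytoplasm": 0.4, "nucleus": 0.7, "ER": 0.6},
}


def posterior(state, prior):
    """Bayes' rule for one observed feature state."""
    scores = {c: prior[c] * LIKELIHOOD[state][c] for c in COMPARTMENTS}
    total = fsum(scores.values())
    return {c: s / total for c, s in scores.items()}


def estimate_population(states, prior):
    """Average the posteriors over all proteins to get compartment fractions."""
    posts = [posterior(s, prior) for s in states]
    return {c: fsum(p[c] for p in posts) / len(posts) for c in COMPARTMENTS}


# Two hypothetical priors: one skewed toward the nucleus, one flatter.
TRAINING_PRIOR = {"cytoplasm": 0.2, "nucleus": 0.6, "ER": 0.2}
CATALOGUE_PRIOR = {"cytoplasm": 0.5, "nucleus": 0.3, "ER": 0.2}

# The same hypothetical observations for ten uncharacterized proteins.
unknown = ["low"] * 6 + ["high"] * 4

for label, prior in (("training", TRAINING_PRIOR), ("catalogue", CATALOGUE_PRIOR)):
    est = estimate_population(unknown, prior)
    print(label, {c: round(v, 3) for c, v in est.items()})
```

With these invented values, the estimated nuclear share is about 0.60 under the skewed prior and about 0.31 under the flatter one, even though the observed features are identical.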

Year:  2000        PMID: 10966805     DOI: 10.1006/jmbi.2000.3968

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


  32 in total (10 shown):

1.  Relating whole-genome expression data with protein-protein interactions.

Authors:  Ronald Jansen; Dov Greenbaum; Mark Gerstein
Journal:  Genome Res       Date:  2002-01       Impact factor: 9.043

2.  Annotation transfer for genomics: measuring functional divergence in multi-domain proteins.

Authors:  H Hegyi; M Gerstein
Journal:  Genome Res       Date:  2001-10       Impact factor: 9.043

3.  GeneCensus: genome comparisons in terms of metabolic pathway activity and protein family sharing.

Authors:  J Lin; J Qian; D Greenbaum; P Bertone; R Das; N Echols; A Senes; B Stenger; M Gerstein
Journal:  Nucleic Acids Res       Date:  2002-10-15       Impact factor: 16.971

4.  Sequence conserved for subcellular localization.

Authors:  Rajesh Nair; Burkhard Rost
Journal:  Protein Sci       Date:  2002-12       Impact factor: 6.725

5.  Integration of genomic datasets to predict protein complexes in yeast.

Authors:  Ronald Jansen; Ning Lan; Jiang Qian; Mark Gerstein
Journal:  J Struct Funct Genomics       Date:  2002

6.  Predicting protein cellular localization using a domain projection method.

Authors:  Richard Mott; Jörg Schultz; Peer Bork; Chris P Ponting
Journal:  Genome Res       Date:  2002-08       Impact factor: 9.043

7.  MITOPRED: a web server for the prediction of mitochondrial proteins.

Authors:  Chittibabu Guda; Purnima Guda; Eoin Fahy; Shankar Subramaniam
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

8.  Predicting subcellular localization via protein motif co-occurrence.

Authors:  Michelle S Scott; David Y Thomas; Michael T Hallett
Journal:  Genome Res       Date:  2004-10       Impact factor: 9.043

9.  Subcellular localization of Gram-negative bacterial proteins using sparse learning.

Authors:  Zhonglong Zheng; Jie Yang
Journal:  Protein J       Date:  2010-04       Impact factor: 2.371

10.  MitoP2, an integrated database on mitochondrial proteins in yeast and man.

Authors:  C Andreoli; H Prokisch; K Hörtnagel; J C Mueller; M Münsterkötter; C Scharfe; T Meitinger
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971
