The Yeast Resource Center Public Data Repository.

Michael Riffle, Lars Malmström, Trisha N Davis.

Abstract

The Yeast Resource Center Public Data Repository (YRC PDR) serves as a single point of access for the experimental data produced from many collaborations typically studying Saccharomyces cerevisiae (baker's yeast). The experimental data include large amounts of mass spectrometry results from protein co-purification experiments, yeast two-hybrid interaction experiments, fluorescence microscopy images and protein structure predictions. All of the data are accessible via searching by gene or protein name, and are available on the Web at http://www.yeastrc.org/pdr/.

Year: 2005 PMID: 15608220 PMCID: PMC540027 DOI: 10.1093/nar/gki073

Journal: Nucleic Acids Res ISSN: 0305-1048

INTRODUCTION

The Yeast Resource Center (YRC) is an NCRR Biomedical Technology Resource Center that provides expertise and otherwise costly research tools to scientists and students worldwide. It does so through collaborations and technology development projects, with 231 collaborations submitted since the beginning of 2002. The collaborations focus mainly on the study of Saccharomyces cerevisiae through the four primary areas of expertise provided by the YRC: mass spectrometry, yeast two-hybrid arrays, deconvolution fluorescence microscopy and protein structure prediction. The YRC investigators who have been responsible for fulfilling collaboration requests are Dr John Yates and Dr Ruedi Aebersold (mass spectrometry), Dr Stanley Fields (yeast two-hybrid), Dr Trisha Davis and Dr Eric Muller (fluorescence microscopy) and Dr David Baker (protein structure prediction). A collaborative project can involve multiple experiments in one or more of these four areas. All four areas can produce large amounts of data, not all of which the collaborator necessarily uses in a publication. Moreover, not every collaboration leads to a publication, yet the data it produces may still be valuable and useful. The YRC makes both the published and the unpublished data available to the community at large through the YRC Public Data Repository (PDR). Perhaps the most significant aspect of the YRC PDR is that it releases all of the data at a single point of access, bringing together the experimental data from many research projects in one consolidated, searchable database on the Web. Instead of visiting a separate website for each paper, one can search the experimental data for multiple papers at once and view the results in a single interface.
As more datasets from research collaborations with the YRC become public, the database will continue to grow and become an increasingly significant asset to the research community.

THE CONTENTS OF THE YRC PDR DATABASE

At the time of this writing, the YRC PDR includes data from six collaborative projects, including four publications (1–4). These data comprise mass spectrometry data from protein co-purification experiments, yeast two-hybrid protein interaction data, fluorescence microscopy images and protein structure prediction data. Protein structures are predicted for protein domains as parsed by the Ginzu algorithm (5). Ab initio structure predictions, generated with the Rosetta de novo structure prediction method (7–10), are available as Protein Data Bank (PDB) (6) formatted text. In addition, the database includes images of silver-stained polyacrylamide gels of samples from protein co-purification experiments, as well as links to descriptions of the purification protocols. The amount of each type of data currently in the database is summarized in Table 1.

Table 1.

A summary of the quantity of the different types of data currently available in the database

Mass spectrometry data
Total runs	119
Total unique proteins identified	3138
Total peptides identified	41 397
Total gel images	45
Yeast two-hybrid data
Total baits with significant hits	409
Total unique ORFs with significant hits	1373
Total unique significant interactions	2031
Fluorescence microscopy data
Total unique proteins localized	122
Total full-field images	767
Total selected region images	877
Protein structure prediction
Total ORFs with structure data	145
Total domains with structure data	255
Total ab initio structures	850 (for 86 domains from 63 proteins)

THE YRC PDR WEB INTERFACE

We have developed a simple-to-use web interface to the YRC PDR database. The primary way to interact with the data is to search by systematic open reading frame (ORF) name or by gene name. Gene names are mapped onto systematic ORF names using the publicly available Saccharomyces Genome Database (11,12). A search brings the user to a page summarizing all of the experimental data available for the given ORF; an example search result is shown in Figure 1. This ‘ORF Overview Page’ is divided into five sections, from which the user can view the Gene Ontology (13) description of the ORF and jump to data view pages for each of the four types of experimental data. Each data view page is tailored to a specific kind of data and has its own features, described below. All data are clearly labeled with the publication(s) for which they were produced, and data not used in any publication are clearly labeled as unpublished.
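
The search step can be sketched as follows: a query is accepted as-is when it already looks like a systematic ORF name, and is otherwise looked up as a gene name. This is a minimal Python sketch under stated assumptions, not the PDR's own code. The regular expression encodes the standard naming convention for S. cerevisiae nuclear ORFs (mitochondrial names such as Q0xxxx are not covered); `GENE_TO_ORF` and `resolve_orf` are hypothetical names, and the one-entry table, taken from the NSL1 example in Figure 1, stands in for a mapping that would in practice be loaded from SGD.

```python
import re
from typing import Optional

# Systematic S. cerevisiae nuclear ORF names: "Y", chromosome letter A-P,
# arm L or R, a three-digit number, strand W (Watson) or C (Crick),
# and an optional suffix such as "-A".
SYSTEMATIC_ORF = re.compile(r"^Y[A-P][LR]\d{3}[WC](-[A-Z])?$")

# Hypothetical gene-name -> ORF table; a real service would build this
# from SGD data rather than hard-coding it.
GENE_TO_ORF = {
    "NSL1": "YPL233W",
}


def resolve_orf(query: str) -> Optional[str]:
    """Return the systematic ORF name for a gene or ORF query, or None."""
    name = query.strip().upper()
    if SYSTEMATIC_ORF.match(name):
        return name  # already a systematic name
    return GENE_TO_ORF.get(name)  # gene name lookup; None if unknown
```

For example, `resolve_orf("nsl1")` and `resolve_orf("YPL233w")` both return `"YPL233W"`, while an unrecognized name returns `None`. Normalizing case first lets users type names in any case.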

Figure 1

A screen capture of the ‘ORF Overview Page’ for the S. cerevisiae gene NSL1, shown as the result of searching for NSL1 or YPL233W. The page is divided into five sections: Gene Ontology annotations, mass spectrometry, localization, yeast two-hybrid and protein structure prediction. Each section summarizes the experimental data relevant to NSL1 and links to the full data.

Mass spectrometry data

The mass spectrometry section of the ORF Overview Page presents several links for viewing the mass spectrometry data. View Protocol link: a text description of the protocol used for a particular protein purification, if the protocol is available. Bait ORF link: the name of the purified (bait) protein, linked to that protein's ORF Overview Page; throughout the website, every ORF name is linked to that ORF's overview page. View Gel link: present only if the protein sample was run on an SDS polyacrylamide gel, this link shows an image of the silver-stained gel. View Run link: the results of the mass spectrometry analysis of the purified sample, presented as a filtered and formatted listing produced by DTASelect (14). The listing gives the systematic ORF names of the proteins that co-purified with the bait protein, together with each protein's sequence coverage, number of peptides, spectrum count and molecular weight; a guide to interpreting these columns is provided on the page. For each listed ORF, a link shows the peptides used to make that identification. Both the ORF list and the peptide lists can be downloaded from the site as tab-delimited text files. An example of the mass spectrometry data page is shown in Figure 2.
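
Because the ORF and peptide lists are downloadable as tab-delimited text, they are easy to process programmatically. The following Python sketch reads a protein list and ranks the co-purifying proteins by spectrum count. The column headers (`ORF`, `Coverage`, `Peptides`, `SpectrumCount`, `MW`), the dalton unit for molecular weight and the names `ProteinHit` and `parse_run` are all assumptions for illustration; check the headers of an actual downloaded file before relying on them.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    orf: str              # systematic ORF name
    coverage: float       # percent sequence coverage
    peptides: int         # number of peptides
    spectrum_count: int   # number of spectra assigned to the protein
    mol_weight: float     # molecular weight (assumed to be in daltons)


def parse_run(text: str) -> List[ProteinHit]:
    """Parse a tab-delimited protein list that starts with a header row.

    The column names used below are hypothetical, not the PDR's headers.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    hits = [
        ProteinHit(
            orf=row["ORF"],
            coverage=float(row["Coverage"]),
            peptides=int(row["Peptides"]),
            spectrum_count=int(row["SpectrumCount"]),
            mol_weight=float(row["MW"]),
        )
        for row in reader
    ]
    # Order by spectrum count, highest first, as a rough abundance proxy.
    hits.sort(key=lambda h: h.spectrum_count, reverse=True)
    return hits
```

A downloaded file could be loaded with `parse_run(open(path).read())`, where `path` is wherever the file was saved. Spectrum count is used here only as a convenient ordering; the interpretation guide on the View Run page remains the reference for what each column means.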

Figure 2

A screen capture of the mass spectrometry data view page. Listed are the ORFs identified through mass spectrometry as having co-purified with the bait ORF, along with experimental data and links to peptide information.

Fluorescence microscopy (localization) data

The localization section of the ORF Overview Page lets the user view fluorescence microscopy images of the protein tagged with a fluorescent protein, such as green fluorescent protein. All localization experiments involving the ORF are listed here. The ‘View Images’ link shows all images from a localization experiment, the experimental parameters used to acquire them and the resulting localization, expressed as a Gene Ontology cellular component term.

Yeast two-hybrid data

The yeast two-hybrid section of the ORF Overview Page lets the user jump directly to the results of every yeast two-hybrid screen in which the ORF of interest was the bait or a prey. Screen results show each prey ORF and its number of hits. Interactions with more than one hit are considered significant; single hits are shown for completeness. The screen results can also be downloaded as a tab-delimited text file.
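
The significance rule above (more than one hit) amounts to a simple threshold on the downloaded screen results. A minimal Python sketch, assuming hypothetical `Bait`, `Prey` and `Hits` column headers and a hypothetical function name `significant_hits`:

```python
import csv
import io
from typing import List, Tuple


def significant_hits(text: str, min_hits: int = 2) -> List[Tuple[str, str, int]]:
    """Return (bait, prey, hits) for pairs seen in at least `min_hits` hits.

    The default of 2 mirrors the rule that more than one hit is significant.
    Column names "Bait", "Prey" and "Hits" are assumptions.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    kept = []
    for row in reader:
        hits = int(row["Hits"])
        if hits >= min_hits:
            kept.append((row["Bait"], row["Prey"], hits))
    return kept
```

Passing `min_hits=1` keeps the single hits as well, matching what the site displays for completeness.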

Protein structure prediction data

If structure prediction data are available for an ORF, the protein structure prediction section lists the computationally derived domains of that ORF, giving each domain's start and stop residues, the source of its structure in the database and a link to its structural information. The content behind these links depends on how the structure was derived. For domains whose structures were obtained through ab initio prediction, the links lead to the top ten predicted structures. These structures can be viewed on the site with the WebMol Java applet (15) and can also be downloaded as PDB text files.
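
The downloadable ab initio structures are plain PDB text, whose ATOM records use fixed column positions. The Python sketch below pulls the C-alpha trace out of such a file and computes the radius of gyration as a quick compactness measure. The function names are illustrative, and the parser assumes a single model with no alternate atom locations, which may not hold for every PDB file.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def ca_trace(pdb_text: str) -> List[Tuple[int, Coord]]:
    """Return (residue number, C-alpha coordinate) pairs from ATOM records."""
    trace = []
    for line in pdb_text.splitlines():
        # PDB fixed columns (1-based): record name 1-6, atom name 13-16,
        # residue number 23-26, x 31-38, y 39-46, z 47-54.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        res_seq = int(line[22:26])
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
        trace.append((res_seq, (x, y, z)))
    return trace


def radius_of_gyration(coords: List[Coord]) -> float:
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / n
    )
```

Applied to each of the top ten structures for a domain, `radius_of_gyration([c for _, c in ca_trace(text)])` gives one number per model, and the residue numbers in the trace can be compared with the domain's listed start and stop residues.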

AVAILABILITY

The contents of the YRC PDR are available on the Web at http://www.yeastrc.org/pdr/. From this URL, the contents of the database can be viewed as HTML pages and, where applicable, as tab-delimited text files. The complete published yeast two-hybrid and mass spectrometry run datasets are available as tab-delimited text files linked from the front page; the unpublished datasets are available upon request. These files can easily be imported into Microsoft Excel and other spreadsheet or data analysis software.

FUTURE DIRECTIONS

The YRC will likely expand its collaborations and technology development beyond the four current areas of expertise, and the types of experimental data in the YRC PDR will expand accordingly. Currently, the YRC PDR includes experimental data only for S. cerevisiae. However, the YRC has broadened its scope and has begun participating in collaborations involving other organisms, so the YRC PDR will come to contain data from protein experiments involving multiple organisms. Given these two developments, and the continuing addition of data from more and more collaborations, the interface will be extended with more sophisticated search tools, such as searching only published data, searching by species and searching by protein or gene sequence. User-controlled filters will be added to the mass spectrometry results to help users identify the more meaningful results. In addition, a probability-based algorithm that analyzes multiple mass spectrometry runs simultaneously will be added to the site, allowing users to discover probable protein complexes.

REFERENCES

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01

2. Rosetta in CASP4: progress in ab initio protein structure prediction.

Authors: R Bonneau; J Tsai; I Ruczinski; D Chivian; C Rohl; C E Strauss; D Baker
Journal: Proteins Date: 2001

3. Automated prediction of CASP-5 structures using the Robetta server.

Authors: Dylan Chivian; David E Kim; Lars Malmström; Philip Bradley; Timothy Robertson; Paul Murphy; Charles E M Strauss; Richard Bonneau; Carol A Rohl; David Baker
Journal: Proteins Date: 2003

4. DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics.

Authors: David L Tabb; W Hayes McDonald; John R Yates
Journal: J Proteome Res Date: 2002 Jan-Feb

5. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.

Authors: P Uetz; L Giot; G Cagney; T A Mansfield; R S Judson; J R Knight; D Lockshon; V Narayan; M Srinivasan; P Pochart; A Qureshi-Emili; Y Li; B Godwin; D Conover; T Kalbfleisch; G Vijayadamodar; M Yang; M Johnston; S Fields; J M Rothberg
Journal: Nature Date: 2000-02-10

6. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms.

Authors: Karen R Christie; Shuai Weng; Rama Balakrishnan; Maria C Costanzo; Kara Dolinski; Selina S Dwight; Stacia R Engel; Becket Feierbach; Dianna G Fisk; Jodi E Hirschman; Eurie L Hong; Laurie Issel-Tarver; Robert Nash; Anand Sethuraman; Barry Starr; Chandra L Theesfeld; Rey Andrada; Gail Binkley; Qing Dong; Christopher Lane; Mark Schroeder; David Botstein; J Michael Cherry
Journal: Nucleic Acids Res Date: 2004-01-01

7. Localization of proteins that are coordinately expressed with Cln2 during the cell cycle.

Authors: Bryan A Sundin; Chun-Hwei Chiu; Michael Riffle; Trisha N Davis; Eric G D Muller
Journal: Yeast Date: 2004-07-15

8. Saccharomyces genome database: underlying principles and organisation.

Authors: Selina S Dwight; Rama Balakrishnan; Karen R Christie; Maria C Costanzo; Kara Dolinski; Stacia R Engel; Becket Feierbach; Dianna G Fisk; Jodi Hirschman; Eurie L Hong; Laurie Issel-Tarver; Robert S Nash; Anand Sethuraman; Barry Starr; Chandra L Theesfeld; Rey Andrada; Gail Binkley; Qing Dong; Christopher Lane; Mark Schroeder; Shuai Weng; David Botstein; J Michael Cherry
Journal: Brief Bioinform Date: 2004-03

9. Assigning function to yeast proteins by integration of technologies.

Authors: Tony R Hazbun; Lars Malmström; Scott Anderson; Beth J Graczyk; Bethany Fox; Michael Riffle; Bryan A Sundin; J Derringer Aranda; W Hayes McDonald; Chun-Hwei Chiu; Brian E Snydsman; Phillip Bradley; Eric G D Muller; Stanley Fields; David Baker; John R Yates; Trisha N Davis
Journal: Mol Cell Date: 2003-12

10. A protein interaction map for cell polarity development.

Authors: B L Drees; B Sundin; E Brazeau; J P Caviston; G C Chen; W Guo; K G Kozminski; M W Lau; J J Moskow; A Tong; L R Schenkman; A McKenzie; P Brennwald; M Longtine; E Bi; C Chan; P Novick; C Boone; J R Pringle; T N Davis; S Fields; D G Drubin
Journal: J Cell Biol Date: 2001-08-06
