
Expanded protein information at SGD: new pages and proteome browser.

Robert Nash, Shuai Weng, Ben Hitz, Rama Balakrishnan, Karen R Christie, Maria C Costanzo, Selina S Dwight, Stacia R Engel, Dianna G Fisk, Jodi E Hirschman, Eurie L Hong, Michael S Livstone, Rose Oughtred, Julie Park, Marek Skrzypek, Chandra L Theesfeld, Gail Binkley, Qing Dong, Christopher Lane, Stuart Miyasato, Anand Sethuraman, Mark Schroeder, Kara Dolinski, David Botstein, J Michael Cherry.

Abstract

The recent explosion in protein data generated from both directed small-scale studies and large-scale proteomics efforts has greatly expanded the quantity of available protein information and has prompted the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) to enhance the depth and accessibility of protein annotations. In particular, we have expanded ongoing efforts to improve the integration of experimental information and sequence-based predictions and have redesigned the protein information web pages. A key feature of this redesign is the development of a GBrowse-derived interactive Proteome Browser customized to improve the visualization of sequence-based protein information. This Proteome Browser has enabled SGD to unify the display of hidden Markov model (HMM) domains, protein family HMMs, motifs, transmembrane regions, signal peptides, hydropathy plots and profile hits using several popular prediction algorithms. In addition, a physico-chemical properties page has been introduced to provide easy access to basic protein information. Improvements to the layout of the Protein Information page and integration of the Proteome Browser will facilitate the ongoing expansion of sequence-specific experimental information captured in SGD, including post-translational modifications and other user-defined annotations. Finally, SGD continues to improve upon the availability of genetic and physical interaction data in an ongoing collaboration with BioGRID by providing direct access to more than 82,000 manually-curated interactions.

Year: 2006 PMID: 17142221 PMCID: PMC1669759 DOI: 10.1093/nar/gkl931

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The Saccharomyces Genome Database (SGD) collects, organizes and presents biological information about the genes and proteins of the budding yeast Saccharomyces cerevisiae. In 2003, in response to the community's needs for additional sequence-based predictive protein information, SGD introduced the Protein Information page, the PDB Homologs page, and the eMOTIF resource for the display of shared protein motifs (1). Since that time, there has been a marked increase in the number of studies focused on protein function, regulation and pathway/process involvement. These include a number of studies aimed at mapping the complete interactome by investigating the composition of protein complexes and specific protein–protein interactions, as well as other studies focused on proteome-wide post-translational modifications (2–6). Much of this research has become possible due to the increased use of proteome chip technologies, such as protein microarrays, as well as technological advances in mass spectrometry-based proteomics that result in increased sensitivity and higher throughput (7–9). To meet the needs of both the traditional biochemist and the proteomics researcher, we have improved the integration and display of protein data at SGD by redesigning protein information pages, introducing a new sequence-based visualization tool and utilizing improved algorithms for the calculation of predictive information based on primary amino acid sequence.

ORGANIZATION OF THE PROTEIN INFORMATION PAGE(S)

The new Protein Information pages are accessible via the ‘Protein’ tab located at the top of all Locus Summary pages for protein-encoding genes. Each Protein Information page presents basic locus-specific protein information and provides access to detailed information regarding HMM domains, protein family HMMs, motifs and physico-chemical properties by clicking on a sub-tab. The types of information displayed on these pages are listed in Table 1 and discussed in more detail below.

Table 1

Summary of protein information available at SGD

Protein information page	Nomenclature fields
	Description fields
	Predicted protein sequence and basic information
	Proteome Browser
	Summary links (physical interactions, HMM domains and homologs)
Physico-chemical Properties page	Amino acid composition
	Atomic composition
	Extinction coefficient
	Aliphatic index
	Estimated half-life
	Instability Index
	Coding region translation calculations
Domains/Motifs page	InterPro-derived shared and unique domains
	TMHMM-predicted transmembrane domains
	SignalP-predicted signal peptides

The main Protein Information page has been redesigned to provide basic protein information clearly and concisely, with a familiar and readily navigable layout similar to that of the SGD Locus Summary page. The top section of the page, devoted to nomenclature, provides standard and systematic protein names plus any associated aliases. The nomenclature section is followed by several descriptive fields: Description, which provides a brief synopsis of the function and/or role of the gene product within the cell; Name Description, which contains the expanded form of the standard gene name acronym; and Gene Product, which describes the specific function of the protein when it is known. These fields have recently been reviewed and rewritten using a standard, consistent format so that they accurately reflect the current state of knowledge for each gene product. The references for this information are found at the bottom of the page. This section also gives the predicted length, molecular weight and isoelectric point of the protein.

The middle of the Protein Information page includes a ‘thumbnail’ of the GBrowse graphical proteome viewer (see below), as well as links that summarize and provide access to other types of protein-related data such as physical interactions. These physical interactions are part of the extensive collection of genetic and physical interactions available as the result of an ongoing collaboration with BioGRID and are mirrored by SGD. These data are manually curated from the primary literature using a standard format to describe the protein interactions and are updated monthly. As of September 2006, this collaboration documents 82 633 genetic and physical interactions (10,11).

Towards the bottom of the Protein Information page are the complete amino acid sequence and links to sequence records held at various external sequence databases. This is followed by a list of external databases that provide additional protein information and a list of references for all basic information presented on the page.
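As an illustration of how the predicted length and molecular weight of a protein can be derived from its primary sequence, the following is a minimal sketch. It is not SGD's code: the function name `basic_protein_info` is hypothetical, and the calculation assumes an unmodified polypeptide built from the 20 standard amino acids, using average residue masses plus one water molecule for the free termini.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def basic_protein_info(sequence):
    """Return length and average molecular weight (Da) of a protein sequence.

    Hypothetical helper: assumes one-letter codes for the 20 standard
    residues and no post-translational modifications.
    """
    seq = sequence.strip().upper().rstrip("*")  # drop a trailing stop symbol
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(unknown))
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return {"length": len(seq), "molecular_weight": mass}


if __name__ == "__main__":
    info = basic_protein_info("MKTAYIAKQR")  # arbitrary example peptide
    print(info["length"], round(info["molecular_weight"], 2))
```

The isoelectric point is omitted here because it depends on the choice of pKa values, which the text does not specify.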

GBROWSE GRAPHICAL PROTEOME VIEWER

To improve the visualization and navigability of sequence-based protein information, SGD has introduced an interactive Proteome Browser (Figure 1) based on GBrowse, a genome browser developed by the Generic Model Organism Database (GMOD) project (12). The Proteome Browser has been customized to view protein-centric information and consolidates the display of multiple types of information including HMM domains, protein family HMMs, motifs, signal peptides, profile hits, and hydropathy plots by populating individual tracks with distinct information types. Pop-up mouseover functionality provides easy access to InterPro-derived HMM domain details including source, name, description, database identifier and E-value or score of matches. This Proteome Browser also supports the interactive feature of GBrowse that allows the addition of user-provided annotations.

Figure 1

The SGD Proteome Browser. The Pdr5p primary protein sequence is shown in the SGD Proteome Browser, a GBrowse-derived tool used to display predicted features, including: InterProScan-derived domains and motifs color-coded by HMM type, hidden Markov model-derived transmembrane domains (TMHMM), the Kyte-Doolittle hydropathy plot and profile hits. The Proteome Browser can also be used for the display of experimentally-derived features. HMM Domains are color-coded by originating resource: orange/yellows = Pfam, reds = Superfamily, purples = Gene3D, greens = Panther, blues = TIGRFAM and browns = SMART.
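The Kyte-Doolittle hydropathy track mentioned in the caption can be approximated with a simple sliding-window average over per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the function name `hydropathy_profile` and the default window of 9 residues are assumptions made for illustration, since the window size used by the Proteome Browser is not stated here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return (center_position, mean_hydropathy) pairs for each window.

    Positions are 1-based. The window size is an illustrative default,
    not a documented SGD setting.
    """
    seq = sequence.strip().upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    try:
        scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    except KeyError as err:
        raise ValueError("unsupported residue code: %s" % err.args[0])
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        center = start + window // 2 + 1  # 1-based center of the window
        profile.append((center, mean))
    return profile


if __name__ == "__main__":
    # Arbitrary example: a hydrophobic stretch flanked by charged residues.
    for pos, value in hydropathy_profile("KKDEILLVVAIFLLGKRDE", window=7):
        print(pos, round(value, 2))
```

Sustained positive values in such a profile mark hydrophobic stretches, which is why the plot is usually read alongside transmembrane predictions.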

IMPROVED PREDICTIVE INFORMATION

In response to community feedback, SGD has updated the algorithms used for the prediction of protein functional HMM domains, transmembrane domains and signal peptides. Beginning in 2005, HMM domains, protein family HMMs and motifs in S. cerevisiae proteins were predicted by software and datasets assembled by the InterPro database, using InterProScan (13). This included several HMM packages such as Pfam, SMART, TIGRFAM, Panther, Gene3D and Superfamily (14). Pfam is a comprehensive collection of protein families and HMM domains represented by multiple sequence alignments and profile HMMs. SMART contains a smaller library of HMM domains found in signaling, extracellular and chromatin-associated proteins. TIGRFAM is a collection of manually curated protein families based on multiple sequence alignments, HMMs and additional annotations. Panther contains a large collection of protein families, subfamilies and domains, classified by experts based on function. Finally, Gene3D and Superfamily contain collections of HMMs based on structural classification in the CATH and SCOP databases, respectively. The InterPro-derived results have been expanded to include two profile-based methods: BlastProDom and ProfileScan (ProSite) (15,16). InterProScan results are updated quarterly.

Transmembrane domains are now predicted using TMHMM, which has been independently rated as superior for predicting transmembrane helices (17,18). Finally, the presence and location of signal peptides are predicted using SignalP (v. 3.0), a popular method that uses either a neural network or an HMM (19). SGD is using the HMM method of SignalP analysis. All predictions of functional domains, transmembrane domains, and signal peptides are included for display in the Proteome Browser (see above), providing an integrated view of multiple HMM and domain prediction packages.

On the Physico-chemical Properties page, we have included additional properties calculated by ExPASy's ProtParam tool, such as estimated protein half-life, instability index and extinction coefficient (20). A complete list of properties is available in Table 1.
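To make a few of these sequence-derived properties concrete, the sketch below computes amino acid composition, the aliphatic index and the extinction coefficient at 280 nm directly from a sequence. It is an independent illustration using the commonly published formulas, not the ProtParam or SGD implementation; the function names are hypothetical, and the `cystines` flag is an assumption that lets the caller choose between all cysteines reduced and all cysteines paired.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the mole percent of each residue present in the sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),

    where X is the mole percent of each residue.
    """
    comp = amino_acid_composition(sequence)
    return (comp.get("A", 0.0)
            + 2.9 * comp.get("V", 0.0)
            + 3.9 * (comp.get("I", 0.0) + comp.get("L", 0.0)))


def extinction_coefficient(sequence, cystines=False):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1.

    Uses 1490 per Tyr, 5500 per Trp and 125 per cystine. With
    cystines=False all Cys are treated as reduced (no contribution);
    with cystines=True every pair of Cys is counted as one cystine.
    """
    counts = Counter(sequence.strip().upper())
    n_cystine = counts["C"] // 2 if cystines else 0
    return counts["Y"] * 1490 + counts["W"] * 5500 + n_cystine * 125


if __name__ == "__main__":
    peptide = "MKWVTFISLLLLFSSAYSRGVCC"  # arbitrary example sequence
    print(round(aliphatic_index(peptide), 2))
    print(extinction_coefficient(peptide), extinction_coefficient(peptide, True))
```

The instability index and estimated half-life are left out because they rely on lookup tables (dipeptide weights and N-terminal residue rules) that are too large to reproduce in a short example.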

SUMMARY

The increase in protein information generated using both traditional approaches and technology-driven large-scale studies has necessitated the expansion of protein annotation at SGD. The redesign of the main Protein Information page and Domains/Motifs page and the addition of a Physico-chemical Properties page have improved the organization of protein information. Furthermore, a Proteome Browser has been developed, enhancing the ability to visualize both predictive and experimentally derived information. Finally, more robust algorithms have been employed to enhance the quality of sequence-based predictions at SGD. These changes will allow the integration of new data types in the future because the protein pages have been designed to be expandable. SGD is committed to increasing the ease of access to information about S. cerevisiae and welcomes all comments from the research community toward this end. Please send any suggestions about the Protein Information pages, the Proteome Browser or any other tool or resource at SGD to: yeast-curator@genome.stanford.edu.

19 in total

1. Improved prediction of signal peptides: SignalP 3.0.

Authors: Jannick Dyrløv Bendtsen; Henrik Nielsen; Gunnar von Heijne; Søren Brunak
Journal: J Mol Biol Date: 2004-07-16 Impact factor: 5.469

2. Global analysis of protein phosphorylation in yeast.

Authors: Jason Ptacek; Geeta Devgan; Gregory Michaud; Heng Zhu; Xiaowei Zhu; Joseph Fasolo; Hong Guo; Ghil Jona; Ashton Breitkreutz; Richelle Sopko; Rhonda R McCartney; Martin C Schmidt; Najma Rachidi; Soo-Jung Lee; Angie S Mah; Lihao Meng; Michael J R Stark; David F Stern; Claudio De Virgilio; Mike Tyers; Brenda Andrews; Mark Gerstein; Barry Schweitzer; Paul F Predki; Michael Snyder
Journal: Nature Date: 2005-12-01 Impact factor: 49.962

3. Proteome survey reveals modularity of the yeast cell machinery.

Authors: Anne-Claude Gavin; Patrick Aloy; Paola Grandi; Roland Krause; Markus Boesche; Martina Marzioch; Christina Rau; Lars Juhl Jensen; Sonja Bastuck; Birgit Dümpelfeld; Angela Edelmann; Marie-Anne Heurtier; Verena Hoffman; Christian Hoefert; Karin Klein; Manuela Hudak; Anne-Marie Michon; Malgorzata Schelder; Markus Schirle; Marita Remor; Tatjana Rudi; Sean Hooper; Andreas Bauer; Tewis Bouwmeester; Georg Casari; Gerard Drewes; Gitte Neubauer; Jens M Rick; Bernhard Kuster; Peer Bork; Robert B Russell; Giulio Superti-Furga
Journal: Nature Date: 2006-01-22 Impact factor: 49.962

4. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.

Authors: P Uetz; L Giot; G Cagney; T A Mansfield; R S Judson; J R Knight; D Lockshon; V Narayan; M Srinivasan; P Pochart; A Qureshi-Emili; Y Li; B Godwin; D Conover; T Kalbfleisch; G Vijayadamodar; M Yang; M Johnston; S Fields; J M Rothberg
Journal: Nature Date: 2000-02-10 Impact factor: 49.962

5. Profile analysis: detection of distantly related proteins.

Authors: M Gribskov; A D McLachlan; D Eisenberg
Journal: Proc Natl Acad Sci U S A Date: 1987-07 Impact factor: 11.205

6. A proteomics approach to understanding protein ubiquitination.

Authors: Junmin Peng; Daniel Schwartz; Joshua E Elias; Carson C Thoreen; Dongmei Cheng; Gerald Marsischky; Jeroen Roelofs; Daniel Finley; Steven P Gygi
Journal: Nat Biotechnol Date: 2003-07-20 Impact factor: 54.908

7. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.

Authors: Nevan J Krogan; Gerard Cagney; Haiyuan Yu; Gouqing Zhong; Xinghua Guo; Alexandr Ignatchenko; Joyce Li; Shuye Pu; Nira Datta; Aaron P Tikuisis; Thanuja Punna; José M Peregrín-Alvarez; Michael Shales; Xin Zhang; Michael Davey; Mark D Robinson; Alberto Paccanaro; James E Bray; Anthony Sheung; Bryan Beattie; Dawn P Richards; Veronica Canadien; Atanas Lalev; Frank Mena; Peter Wong; Andrei Starostine; Myra M Canete; James Vlasblom; Samuel Wu; Chris Orsi; Sean R Collins; Shamanta Chandran; Robin Haw; Jennifer J Rilstone; Kiran Gandi; Natalie J Thompson; Gabe Musso; Peter St Onge; Shaun Ghanny; Mandy H Y Lam; Gareth Butland; Amin M Altaf-Ul; Shigehiko Kanaya; Ali Shilatifard; Erin O'Shea; Jonathan S Weissman; C James Ingles; Timothy R Hughes; John Parkinson; Mark Gerstein; Shoshana J Wodak; Andrew Emili; Jack F Greenblatt
Journal: Nature Date: 2006-03-22 Impact factor: 49.962

8. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

Authors: Teresa Reguly; Ashton Breitkreutz; Lorrie Boucher; Bobby-Joe Breitkreutz; Nizar N Batada; Gary C Hon; Chad L Myers; Ainslie Parsons; Helena Friesen; Rose Oughtred; Amy Tong; Chris Stark; Yuen Ho; David Botstein; Brenda Andrews; Charles Boone; Olga G Troyanskya; Trey Ideker; Kara Dolinski; Mike Tyers
Journal: J Biol Date: 2006-06-08

9. InterPro, progress and status in 2005.

Authors: Nicola J Mulder; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Alex Bateman; David Binns; Paul Bradley; Peer Bork; Phillip Bucher; Lorenzo Cerutti; Richard Copley; Emmanuel Courcelle; Ujjwal Das; Richard Durbin; Wolfgang Fleischmann; Julian Gough; Daniel Haft; Nicola Harte; Nicolas Hulo; Daniel Kahn; Alexander Kanapin; Maria Krestyaninova; David Lonsdale; Rodrigo Lopez; Ivica Letunic; Martin Madera; John Maslen; Jennifer McDowall; Alex Mitchell; Anastasia N Nikolskaya; Sandra Orchard; Marco Pagni; Chris P Ponting; Emmanuel Quevillon; Jeremy Selengut; Christian J A Sigrist; Ville Silventoinen; David J Studholme; Robert Vaughan; Cathy H Wu
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. InterProScan: protein domains identifier.

Authors: E Quevillon; V Silventoinen; S Pillai; N Harte; N Mulder; R Apweiler; R Lopez
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971
