
The Molecule Pages database.

Brian Saunders, Stephen Lyon, Matthew Day, Brenda Riley, Emily Chenette, Shankar Subramaniam, Ilango Vadivelu.

Abstract

The UCSD-Nature Signaling Gateway Molecule Pages (http://www.signaling-gateway.org/molecule) provides essential information on more than 3800 mammalian proteins involved in cellular signaling. The Molecule Pages contain expert-authored and peer-reviewed information based on the published literature, complemented by regularly updated information derived from public data source references and sequence analysis. The expert-authored data includes both a full-text review about the molecule, with citations, and highly structured data for bioinformatics interrogation, including information on protein interactions and states, transitions between states and protein function. The expert-authored pages are anonymously peer reviewed by the Nature Publishing Group. The Molecule Pages data is present in an object-relational database format and is freely accessible to the authors, the reviewers and the public from a web browser that serves as a presentation layer. The Molecule Pages are supported by several applications that along with the database and the interfaces form a multi-tier architecture. The Molecule Pages and the Signaling Gateway are routinely accessed by a very large research community.

Year: 2007 PMID: 17965093 PMCID: PMC2238911 DOI: 10.1093/nar/gkm907

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The UCSD-Nature Signaling Gateway (http://www.signaling-gateway.org) is a collaboration between the University of California, San Diego and the Nature Publishing Group (NPG), designed to facilitate navigation of the complex world of research into cellular signaling. The Signaling Gateway is made up of three components: the Molecule Pages (described in this study), the Signaling Update and the Data Center. The Signaling Update is published weekly by the NPG to provide topical and timely information about progress in signal transduction research. The Signaling Gateway was formerly sponsored by the Alliance for Cellular Signaling (AfCS) (1,2), which performed comprehensive experimental analyses of selected signaling systems. The Data Center section of the web site contains all the data generated by the AfCS during the period when the Signaling Gateway was part of the AfCS.

The Signaling Gateway Molecule Pages (SGMP) database provides essential information on more than 3800 proteins involved in cellular signaling in mammals, with each protein having its own Molecule Page. Molecule Page information is presented in two categories: author-entered data and automated data. Author-entered data contain expert-authored and peer-reviewed information based on published literature, comprising both review-style free text with citations and highly structured data for bioinformatic interrogation. The information is linked to appropriate journal citations, and covers areas such as protein interactions and states, transitions between states and protein function. The automated data are a collection of links to public bioinformatic data sources and sequence analysis results, derived from the sequence and data record used to define the Molecule Page. The SGMP base organism is Mus musculus (because of the mouse-centric focus of the AfCS), though much of the information in the SGMP is derived from homologous proteins in other species, such as Homo sapiens.

Once the author-entered information has gone through a peer review process, the Molecule Page is published. Published Molecule Pages are citable. To date, NPG has published entries for 365 proteins, with nearly 130 submitted Molecule Pages currently in the review system and 350 Molecule Pages in author preparation. Newly published Molecule Pages are promoted through the Signaling Update website pages, e-alerts and linkouts from NPG content.

DATABASE CONTENT

The SGMP is a complicated online annotation and publishing system, containing three major subcomponents: (i) online pathway curation (author-entered data); (ii) online peer review and (iii) public repository data acquisition and display (automated data). The peer review information and the pre-published author-entered information for a given Molecule Page are visible only to the author, selected reviewers and the editorial staff. The automated data for each Molecule Page are visible to all users. Each Molecule Page is assigned a specific protein sequence, a name, a list of synonyms and a specific protein function category (based on ‘best fit’). That information is used to generate the properties of the sequence, such as molecular weight, and all the automated data associated with the sequence. A combination of database links and computational methods is used to find the related database records and the parameters of computational matches (e.g. a domain region). This information is displayed in the ‘Protein Overview’ section of the Molecule Page, which is the landing page for unpublished Molecule Pages.
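As a rough illustration of one such derived sequence property, the sketch below estimates the average molecular weight of a protein from its one-letter sequence by summing standard average residue masses and adding one water molecule. The function name `molecular_weight` is ours, and this is only an assumption about how such a value could be computed, not a description of the actual SGMP pipeline.

```python
# Standard average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Return the average molecular weight (Da) of an unmodified peptide chain.

    Hypothetical helper: only the 20 standard residues are supported, and
    covalent modifications are ignored.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    total = 0.0
    for residue in seq:
        if residue not in AVERAGE_RESIDUE_MASS:
            raise ValueError(f"unsupported residue: {residue!r}")
        total += AVERAGE_RESIDUE_MASS[residue]
    return total + WATER_MASS
```

Calling `molecular_weight("GG")` sums two glycine residues plus one water. A production system would also need to handle ambiguous residue codes and post-translational modifications.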

Author-entered data

To illustrate the depth of the author-entered data, we use Adenylyl cyclase type 5 (SGMP ID A000001) as an example. Because this is a published Molecule Page, the user first arrives at the ‘Abstract’ section. This section gives information on the author (Carmen W. Dessauer), summarizes the role of Adenylyl cyclase type 5, and lists the names and synonyms provided by the author and the editors. It also indicates that the A000001 molecule has 32 enzyme functions, exists in 33 states and has 96 transitions between these states, and it shows a miniature version of the network map of these transitions. The ‘Full Text’ section contains a textual description, with published references, of protein function, regulation, interactions, subcellular localization, expression, phenotypes, splice variants and antibodies. The ‘States’ section lists each defined functional state, with links to a constituent list and a transition graph (if applicable) that indicates all the transitions that lead to the state. A protein state is defined by the principal protein's interactions with other protein partners, covalent modifications on all protein components, association with small-molecule ligands and cellular location. The ‘Transitions’ section shows a list of the defined transitions, with a link to detailed information on each transition: initial and final state information, the change that occurred in the transition, process information, other comments and citations (Figure 1). A transition is defined as a biological process that causes the conversion of a protein from one state to another. The ‘Network Map’ gives a graphical representation of all the states, and the transitions between them, defined by the author for A000001 (Figure 2). The ‘Functions’ section shows that A000001 acts as an enzyme, catalyzing the conversion of Mg-ATP to cyclic AMP and pyrophosphate. Each state that catalyzes the reaction is listed, with a link to the detailed state information, and a link to detailed function information with reaction information, comments and citations (Figure 3). The ‘Protein Classes’ section shows classes defined by the author to aid in data entry and display; a class is defined as a group of three or more proteins that behave identically in a particular state.
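The definitions of states and transitions given above can be pictured with a small data model. The following is a minimal sketch under our own assumptions: the class names `State` and `Transition`, their fields, and the helper `transitions_into` are hypothetical and do not reflect the actual SGMP schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """One functional state of the principal protein (hypothetical model)."""
    name: str
    partners: frozenset = frozenset()       # interacting protein partners
    modifications: frozenset = frozenset()  # covalent modifications
    ligands: frozenset = frozenset()        # small-molecule ligands
    location: str = ""                      # cellular location


@dataclass(frozen=True)
class Transition:
    """A process that converts the protein from one state to another."""
    source: str   # name of the initial state
    target: str   # name of the final state
    process: str = ""


def transitions_into(transitions, state_name):
    """Return the transitions whose final state is `state_name`."""
    return [t for t in transitions if t.target == state_name]


# Toy data with placeholder names, not taken from any Molecule Page.
states = [
    State("free", location="membrane"),
    State("bound-A", partners=frozenset({"A"}), location="membrane"),
    State("bound-A-B", partners=frozenset({"A", "B"}), location="membrane"),
]
transitions = [
    Transition("free", "bound-A", "binding"),
    Transition("bound-A", "bound-A-B", "binding"),
    Transition("bound-A-B", "bound-A", "dissociation"),
]
```

With this toy data, `transitions_into(transitions, "bound-A")` returns the binding step from `free` and the dissociation step from `bound-A-B`, which is the kind of information a per-state transition graph would display.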

Figure 1.

Example state transition for Adenylyl cyclase type 5 (A000001).

Figure 2.

Network map for Adenylyl cyclase type 5 (A000001).

Figure 3.

Example function for Adenylyl cyclase type 5 (A000001).

Automated data

The ‘Protein Records’ section displays all the sequence database records related to a particular Molecule Page. A specific record, defined by an NCBI protein GI number (3), is assigned to the Molecule Page as a base sequence. All the other sequence records listed in the same Entrez Gene (4) record are displayed, as well as any UniProt (5) and Ensembl (6) records that refer to those sequence records. The records are grouped by their specific sequence. The ‘Gene Info’ page displays pertinent information related to the Molecule from the Entrez Gene record, including any related Ensembl gene records or the sequence records within it. The ‘Domains & Motifs’ section (Figure 4) contains domain information from Pfam (7) and SMART (8), and pattern/motif information from PRINTS (9) and InterPro (10) records related to the Molecule Page sequence. These records are produced using a combination of database record references and computational schemes [hmmpfam (11) and FingerPRINTScan (12)]. Matching sequence regions are given for the computational matches. The ‘Interactions’ page displays matching interaction database records from the BIND database and from the Entrez Gene database, including BIND (13), BioGRID (14) and HPRD (15) interaction records. Interactions involving likely orthologs of the Molecule Page base sequence are also displayed, to provide additional information. The ‘Orthologs’ section shows genes in other select organisms that are likely orthologs of the base gene. This list is constructed using a combination of a species-specific Blast against the NCBI protein database, and database analysis of HomoloGene (16) and Ensembl homology databases. The ‘Blast Data’ section contains a list of the top Blast hits against the entire NCBI protein database. The Protein ‘Structure’ section displays the PDB (17) records that are related to the Molecule Page, either through a database reference to one of the related protein sequence records, or by a sequence match with Blast.
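The grouping of sequence records by their specific sequence, mentioned above for the ‘Protein Records’ section, could be sketched as follows. The record layout (source database, accession, sequence) and the function name `group_by_sequence` are assumptions made for illustration; the accessions are placeholders rather than real identifiers.

```python
from collections import defaultdict


def group_by_sequence(records):
    """Group (database, accession, sequence) tuples by identical sequence.

    Sequences are compared case-insensitively after stripping whitespace;
    this normalization is our assumption, not a documented SGMP rule.
    """
    groups = defaultdict(list)
    for database, accession, sequence in records:
        key = sequence.strip().upper()
        groups[key].append((database, accession))
    return dict(groups)


# Placeholder records: two share one sequence, the third differs.
records = [
    ("NCBI", "rec1", "MKT"),
    ("UniProt", "rec2", "mkt"),
    ("Ensembl", "rec3", "MKV"),
]
groups = group_by_sequence(records)
```

Here `groups` has two keys, so records from different databases that carry the same sequence are listed together.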

Figure 4.

Domain/Motif map for Syk (A000040).

In addition to all the hyperlinks to the relevant bioinformatic databases, the SGMP base sequence also links directly to the SDSC Biology Workbench (http://workbench.sdsc.edu). The Biology Workbench (18) enables a user to carry out seamlessly a variety of sequence analysis operations. The link is located on the ‘Protein Overview’ page.

EDITORIAL PROCESS

NPG and UCSD are assisted by a scientific Advisory Board and an Editorial Board. The Advisory Board provides high-level guidance and advice concerning the development of the Molecule Pages database. The Editorial Board helps the editorial team with several aspects of publishing Molecule Pages, such as identifying relevant authors and reviewers and adjudicating during peer review. NPG manages a rigorous editorial process to ensure that expert-authored Molecule Pages are accurate and complete, and that structured data are recorded in a consistent manner. Authors either apply to contribute and are selected by NPG editors, or are commissioned by editors. After an initial editorial evaluation of an author's submission, the Molecule Page undergoes anonymous peer review by two or three experts in the relevant field. Following peer review, the author may be required to revise the submission in light of reviewer and editor comments. Following revision, the Molecule Page is critically assessed and a decision to publish is made. The Molecule Page is copy edited before publication, and a Digital Object Identifier (DOI) is assigned upon publication.

DATABASE IMPLEMENTATION

The SGMP is a multi-tier Enterprise Java web application. The database tier is an Oracle 10g database instance running on a Sun server. The middle tier contains business and web components and is deployed on an Oracle Containers for J2EE (OC4J) application server that also runs on a Sun server. The client tier consists of a web browser running on the user's machine. The business components consist of data access objects that encapsulate database access for the web layer. The web layer consists of Java servlets and server pages complying with the J2EE (Enterprise Java) 1.3 specifications. The compute and database servers are located at the San Diego Supercomputer Center (SDSC). SDSC provides hardware, network and system administrative support.

In addition to the web forms that the general public uses for viewing the data, there are specialized web forms for the authors, reviewers and editors to perform their tasks. A password-protected user access system controls access to the specialized forms and the unpublished data for a given Molecule Page, but the general public is able to access any published data and all automated data without having to register for an account. Registration to the Signaling Gateway is, and always will be, free.

The automated data are calculated monthly. Local copies of the constituent databases are stored in custom, relational forms on the Oracle system, with the computational methods being run on Sun systems. The results of the automated data are stored in the database tier, along with the archived results of automated analysis at the time of publication. Tab-delimited files relating the Molecule Pages to protein sequence database records (e.g. UniProt, RefSeq, GenBank) and gene database records (Entrez Gene and Ensembl) are provided via anonymous ftp.

The data are accessed via a browse function, a simple search engine and an advanced search engine. The simple search engine allows users to query the database using the Molecule Page ID, gene symbols, protein names and synonyms. The advanced search engine allows users to ask complex questions, such as ‘show me all functional states involving Molecule A and Molecule B.’ The advanced search engine uses the Lucene library (http://lucene.apache.org/), an open-source Java toolset from the Apache Software Foundation (http://www.apache.org/).
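To make the behaviour of the simple search concrete, the sketch below builds a case-insensitive lookup from Molecule Page ID, gene symbol, protein name and synonyms to page IDs. It is an illustrative assumption only: the dictionary layout and the functions `build_index` and `simple_search` are hypothetical, the symbols and synonyms are placeholders, and the real system is a Java application whose internals are not described here.

```python
def build_index(pages):
    """Map every searchable term (lower-cased) to the set of page IDs."""
    index = {}
    for page in pages:
        terms = [page["id"], page["symbol"], page["name"], *page["synonyms"]]
        for term in terms:
            index.setdefault(term.casefold(), set()).add(page["id"])
    return index


def simple_search(index, query):
    """Return the sorted page IDs whose ID, symbol, name or synonym equals the query."""
    return sorted(index.get(query.strip().casefold(), set()))


# The IDs and names come from the text; symbols and synonyms are illustrative.
pages = [
    {"id": "A000001", "symbol": "Adcy5",
     "name": "Adenylyl cyclase type 5", "synonyms": ["AC5"]},
    {"id": "A000040", "symbol": "Syk",
     "name": "Syk", "synonyms": []},
]
index = build_index(pages)
```

A query such as `simple_search(index, "adcy5")` then resolves to the page ID `A000001`, and an unknown term yields an empty list. Exact-match lookup is the simplest design; a full-text library such as Lucene would add tokenization and ranking on top of this.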

FUTURE DIRECTIONS

A wiki will be added to the web site, allowing for open comments on any given Molecule Page, as well as a living molecule summary that does not require the rigorous process of a published Molecule Page. Previously published Molecule Pages will be updated by the authors, and released as subsequent versions. We plan on adding exportable Molecule Page information, for published Molecule Pages, in XML form, as well as standard exchange formats such as BioPAX (http://www.biopax.org/). We will continue to add to the links provided in the automated data sections, addressing areas such as gene expression and phosphorylation.

18 in total

1. FingerPRINTScan: intelligent searching of the PRINTS motif database.

Authors: P Scordis; D R Flower; T K Attwood
Journal: Bioinformatics Date: 1999-10 Impact factor: 6.937

2. PRINTS and PRINTS-S shed light on protein ancestry.

Authors: T K Attwood; M J Blythe; D R Flower; A Gaulton; J E Mabey; N Maudling; L McGregor; A L Mitchell; G Moulton; K Paine; P Scordis
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

3. The Molecule Pages database.

Authors: Joshua Li; Yuhong Ning; Warren Hedley; Brian Saunders; Yongsheng Chen; Nicole Tindill; Timo Hannay; Shankar Subramaniam
Journal: Nature Date: 2002-12-12 Impact factor: 49.962

4. Overview of the Alliance for Cellular Signaling.

Authors: Alfred G Gilman; Melvin I Simon; Henry R Bourne; Bruce A Harris; Rochelle Long; Elliott M Ross; James T Stull; Ronald Taussig; Henry R Bourne; Adam P Arkin; Melanie H Cobb; Jason G Cyster; Peter N Devreotes; James E Ferrell; David Fruman; Michael Gold; Arthur Weiss; James T Stull; Michael J Berridge; Lewis C Cantley; William A Catterall; Shaun R Coughlin; Eric N Olson; Temple F Smith; Joan S Brugge; David Botstein; Jack E Dixon; Tony Hunter; Robert J Lefkowitz; Anthony J Pawson; Paul W Sternberg; Harold Varmus; Shankar Subramaniam; Robert S Sinkovits; Joshua Li; Dennis Mock; Yuhong Ning; Brian Saunders; Paul C Sternweis; Donald Hilgemann; Richard H Scheuermann; Dianne DeCamp; Robert Hsueh; Keng-Mean Lin; Yan Ni; William E Seaman; Paul C Simpson; Timothy D O'Connell; Tamara Roach; Melvin I Simon; Sangdun Choi; Pamela Eversole-Cire; Iain Fraser; Marc C Mumby; Yingming Zhao; Deirdre Brekken; Hongjun Shu; Tobias Meyer; Grischa Chandy; Won Do Heo; Jen Liou; Nancy O'Rourke; Mary Verghese; Susanne M Mumby; Heping Han; H Alex Brown; Jeffrey S Forrester; Pavlina Ivanova; Stephen B Milne; Patrick J Casey; T Kendall Harden; Adam P Arkin; John Doyle; Martha L Gray; Tobias Meyer; Stephen Michnick; Martin A Schmidt; Mehmet Toner; Roger Y Tsien; Madhusudan Natarajan; Rama Ranganathan; Gilberto R Sambrano
Journal: Nature Date: 2002-12-12 Impact factor: 49.962

5. BIND: the Biomolecular Interaction Network Database.

Authors: Gary D Bader; Doron Betel; Christopher W V Hogue
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

6. The Biology Workbench--a seamless database and analysis environment for the biologist.

Authors: S Subramaniam
Journal: Proteins Date: 1998-07-01

7. Pfam: multiple sequence alignments and HMM-profiles of protein domains.

Authors: E L Sonnhammer; S R Eddy; E Birney; A Bateman; R Durbin
Journal: Nucleic Acids Res Date: 1998-01-01 Impact factor: 16.971

8. Human protein reference database--2006 update.

Authors: Gopa R Mishra; M Suresh; K Kumaran; N Kannabiran; Shubha Suresh; P Bala; K Shivakumar; N Anuradha; Raghunath Reddy; T Madhan Raghavan; Shalini Menon; G Hanumanthu; Malvika Gupta; Sapna Upendran; Shweta Gupta; M Mahesh; Bincy Jacob; Pinky Mathew; Pritam Chatterjee; K S Arun; Salil Sharma; K N Chandrika; Nandan Deshpande; Kshitish Palvankar; R Raghavnath; R Krishnakanth; Hiren Karathia; B Rekha; Rashmi Nayak; G Vishnupriya; H G Mohan Kumar; M Nagini; G S Sameer Kumar; Rojan Jose; P Deepthi; S Sujatha Mohan; T K B Gandhi; H C Harsha; Krishna S Deshpande; Malabika Sarker; T S Keshava Prasad; Akhilesh Pandey
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. SMART 5: domains in the context of genomes and networks.

Authors: Ivica Letunic; Richard R Copley; Birgit Pils; Stefan Pinkert; Jörg Schultz; Peer Bork
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

10. Pfam: clans, web tools and services.

Authors: Robert D Finn; Jaina Mistry; Benjamin Schuster-Böckler; Sam Griffiths-Jones; Volker Hollich; Timo Lassmann; Simon Moxon; Mhairi Marshall; Ajay Khanna; Richard Durbin; Sean R Eddy; Erik L L Sonnhammer; Alex Bateman
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971
