
Human protein reference database--2006 update.

Gopa R Mishra, M Suresh, K Kumaran, N Kannabiran, Shubha Suresh, P Bala, K Shivakumar, N Anuradha, Raghunath Reddy, T Madhan Raghavan, Shalini Menon, G Hanumanthu, Malvika Gupta, Sapna Upendran, Shweta Gupta, M Mahesh, Bincy Jacob, Pinky Mathew, Pritam Chatterjee, K S Arun, Salil Sharma, K N Chandrika, Nandan Deshpande, Kshitish Palvankar, R Raghavnath, R Krishnakanth, Hiren Karathia, B Rekha, Rashmi Nayak, G Vishnupriya, H G Mohan Kumar, M Nagini, G S Sameer Kumar, Rojan Jose, P Deepthi, S Sujatha Mohan, T K B Gandhi, H C Harsha, Krishna S Deshpande, Malabika Sarker, T S Keshava Prasad, Akhilesh Pandey.

Abstract

The Human Protein Reference Database (HPRD) (http://www.hprd.org) was developed to serve as a comprehensive collection of protein features, post-translational modifications (PTMs) and protein-protein interactions. Since the original report, this database has grown to >20 000 protein entries and has become the largest database for literature-derived protein-protein interactions (>30 000) and PTMs (>8000) for human proteins. We have also introduced several new features in HPRD including: (i) protein isoforms, (ii) enhanced search options, (iii) linking of pathway annotations and (iv) integration of a novel browser, GenProt Viewer (http://www.genprot.org), developed by us, that allows integration of genomic and proteomic information. With the continued support and active participation of the biomedical community, we expect HPRD to become a unique source of curated information for the human proteome and to spur biomedical discoveries based on integration of genomic, transcriptomic and proteomic data.

Year:  2006        PMID: 16381900      PMCID: PMC1347503          DOI: 10.1093/nar/gkj141

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The Human Protein Reference Database (HPRD) is a protein information resource that provides extensive information pertaining to human proteins including domain architecture, protein functions, protein–protein interactions, post-translational modifications (PTMs), enzyme–substrate relationships, subcellular localization, tissue expression and disease association of genes (1–3). In order to make HPRD a more comprehensive resource, we have greatly expanded the number of protein entries, protein–protein interactions and PTMs. We have also incorporated additional query (e.g. BLAST) and browse options and provided explanatory pages for motifs found in the proteins cataloged in HPRD. Some of the new features include protein isoforms, links to signal transduction pathways and integration of GenProt Viewer, a novel browser that we have recently developed. HPRD currently contains over 20 000 protein entries including 1587 protein isoforms and has grown significantly in size over the last 3 years (Figure 1a).

Figure 1. Statistics pertaining to HPRD growth, experimental types for protein–protein interactions and a breakdown of PTMs. (a) Growth of HPRD over the last 3 years with respect to protein entries, protein–protein interactions and PTMs. (b) Distribution of protein–protein interactions in HPRD based on the type of experimental method. (c) Distribution of the various types of PTMs in HPRD. The percentage for a PTM type is indicated only when it is at least 2%.

Cataloging protein–protein interactions

A crucial aspect of any proteomic analysis is the elucidation of interacting proteins—the interactome (4). HPRD currently has 33 710 unique protein–protein interactions. The experimental evidence for the interactions is derived from in vivo experiments for 19 175 interactions, in vitro for 11 114 interactions and yeast two-hybrid for 1813 interactions. Figure 1b shows the distribution of the protein–protein interactions annotated in HPRD. Table 1 shows the overall statistics as of September 15, 2005.

Table 1. Total numbers of entries in the various fields in HPRD

Feature                                            Number
Total protein entries                              20 097
Isoforms included in the protein entries           1587
Total number of protein interactions               33 710
Total number of PTMs                               8409
Total number of domains and motifs                 478
Total number of enzyme–substrate relationships     3343

Enrichment of PTMs, subcellular localization and tissue expression data in HPRD

PTMs can alter both structure and function of proteins. In recent years, several large-scale studies have been carried out to characterize PTMs using proteomic methods. For instance, 2002 phosphorylation sites were identified using mass spectrometry from HeLa cell nuclear extract in a single experiment (5). A total of 5011 phosphorylation events and 1132 glycosylation events are among the 8409 recorded PTMs in HPRD (Figure 1c). Updated annotations involving subcellular localization include 489 nucleolar proteins (6,7) and 270 secreted proteins (8). Similarly, tissue expression data have been added to a number of entries including those encoded by KIAA cDNAs (9).
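
As a quick arithmetic check, the counts quoted in the text and in Table 1 can be expressed as shares of their totals. The sketch below is illustrative only: it uses the published figures, and the helper name `share` is an assumption rather than anything provided by HPRD. The three interaction evidence counts add up to 32 102 rather than 33 710, so they are reported as shares of the total and not treated as a complete partition.

```python
# Counts quoted in the text and in Table 1 (as of September 15, 2005).
TOTAL_INTERACTIONS = 33_710
TOTAL_PTMS = 8_409

interaction_evidence = {
    "in vivo": 19_175,
    "in vitro": 11_114,
    "yeast two-hybrid": 1_813,
}
ptm_types = {
    "phosphorylation": 5_011,
    "glycosylation": 1_132,
}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total (hypothetical helper)."""
    return 100.0 * count / total


evidence_shares = {k: share(v, TOTAL_INTERACTIONS) for k, v in interaction_evidence.items()}
ptm_shares = {k: share(v, TOTAL_PTMS) for k, v in ptm_types.items()}

for label, pct in {**evidence_shares, **ptm_shares}.items():
    print(f"{label}: {pct:.1f}%")
```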

Novel features added since initial release of HPRD

Protein isoforms

One of the important additions to HPRD is the inclusion of protein isoforms. Only RefSeq database (10) entries that have a different coding sequence (CDS) for the same gene qualify as isoforms. Thus, alternate splice forms are considered only when the splicing involves the coding region and not the 5′ or 3′ untranslated regions. By default, all annotations are displayed for all isoforms, except when isoform-specific data regarding subcellular localization, PTMs, domain architecture or tissue expression are available. Mainly due to lack of data, isoform-specific annotations for protein–protein interactions, substrates and disease involvement are not currently provided; these annotations are common to all isoforms.
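
The inclusion rule above can be illustrated with a minimal sketch, assuming transcripts are available as (gene, accession, CDS) tuples. This record layout, the function name `distinct_isoforms` and the toy identifiers are hypothetical and are not the HPRD curation pipeline; the point is only that transcripts sharing a CDS collapse into one entry, while a different CDS yields a separate isoform.

```python
from collections import defaultdict


def distinct_isoforms(transcripts):
    """Group transcripts by gene and keep one accession per distinct CDS.

    `transcripts` is an iterable of (gene, accession, cds) tuples; this
    layout is hypothetical and chosen only for illustration.
    """
    by_gene = defaultdict(dict)  # gene -> {cds: first accession seen}
    for gene, accession, cds in transcripts:
        by_gene[gene].setdefault(cds, accession)
    return {gene: list(cds_map.values()) for gene, cds_map in by_gene.items()}


# Toy records: T2 has the same CDS as T1 (it would differ only in the UTRs),
# whereas T3 has a different CDS and therefore counts as a separate isoform.
records = [
    ("GENE_A", "T1", "ATGAAATGA"),
    ("GENE_A", "T2", "ATGAAATGA"),
    ("GENE_A", "T3", "ATGAAAGGGTGA"),
    ("GENE_B", "T4", "ATGCCCTGA"),
]
isoforms = distinct_isoforms(records)
print(isoforms)
```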

Enhanced search options

HPRD can be queried through gene symbols or a variety of database accession numbers such as RefSeq (10), OMIM (11), Swiss-Prot (12), HPRD and Entrez Gene (13). A multiple search option is included in the updated query system that allows the database to be queried by simultaneously specifying several different parameters. Because accession numbers, gene symbols or protein names might still not yield the protein being searched for, we have now also included a BLAST option as a search tool.
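
The idea of querying by several parameters at once can be sketched as a conjunctive filter over annotated entries. The field names, the records and the function `multi_search` below are assumptions made for illustration; they do not describe the actual HPRD query interface.

```python
def multi_search(entries, **criteria):
    """Return entries whose fields match every supplied criterion.

    Matching is case-insensitive and exact; an entry lacking a requested
    field never matches. All field names here are hypothetical.
    """
    def matches(entry):
        return all(
            field in entry and str(entry[field]).lower() == str(value).lower()
            for field, value in criteria.items()
        )

    return [entry for entry in entries if matches(entry)]


# Toy entries with placeholder identifiers.
entries = [
    {"gene_symbol": "GENE_A", "localization": "nucleolus", "ptm": "phosphorylation"},
    {"gene_symbol": "GENE_B", "localization": "nucleolus", "ptm": "glycosylation"},
    {"gene_symbol": "GENE_C", "localization": "extracellular", "ptm": "glycosylation"},
]

hits = multi_search(entries, localization="Nucleolus", ptm="phosphorylation")
print([entry["gene_symbol"] for entry in hits])
```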

Links to pathways

In order to visualize and identify the potential function of a protein in the context of a large signaling network and its interaction partners, we have curated a number of pathways. These pathways have been integrated through the ‘Pathways’ tab. The pathway data are diagrammatically represented using GenMAPP (Gene Microarray Pathway Profiler) (14), a computer application designed for viewing and analyzing the data in the context of biological pathways (Figure 2). These pathways are a collection of literature-derived information usually downstream of ligand-receptor interactions. In addition to information about protein–protein interactions, pathways include reactions involving PTMs, shuttling of proteins between subcellular compartments, activation or inhibition of enzymatic activity and up or down regulation of mRNAs.

Figure 2. A screenshot of the molecule page of the EGF receptor in HPRD. The molecule page shows a graphical representation of the protein, with its protein domains as polygons, sites of phosphorylation as vertical straight lines with red circles at the ends and disulfide bonds as gray lines. A link to the GenProt Viewer, showing a number of SNPs and exons, appears in the lower left corner. Haploview, which shows population haplotype patterns, appears in the lower right corner. The popup on the top right shows the EGF signaling pathway.

Integration of GenProt Viewer

HPRD maps each protein onto the genome, along with transcript and SNP information, and makes these annotations available from the molecule page through GenProt Viewer (Figure 2). The GenProt Viewer, a browser (http://www.genprot.org) developed by our group, provides an integrated genomic, transcriptomic and proteomic view of the human genome. Genomic annotations include the mapping of single nucleotide polymorphisms and of homology blocks between the human and mouse genomes. Transcriptomic data include RefSeq annotations and the categorization of protein-coding transcripts into untranslated regions and open reading frames. The viewer also integrates ‘Haploview’ for investigating population haplotype patterns (15). Experimentally derived peptide sequences obtained by mass spectrometry that have been deposited in the PeptideAtlas (16) and PRIDE (17) repositories are also mapped onto the genomic sequence in GenProt Viewer. The peptides are linked to the sequence pages in these two repositories. Finally, the BLAST option allows users to query the genome using protein or nucleotide sequences.
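
The categorization of a protein-coding transcript into untranslated regions and an open reading frame can be illustrated with a short sketch. The coordinate convention (0-based, half-open), the function name `split_transcript` and the example numbers are assumptions for illustration and are not taken from GenProt Viewer.

```python
def split_transcript(length, cds_start, cds_end):
    """Split a transcript into 5' UTR, ORF and 3' UTR intervals.

    Coordinates are 0-based and half-open, so the ORF covers
    [cds_start, cds_end). This convention is an assumption.
    """
    if not 0 <= cds_start < cds_end <= length:
        raise ValueError("CDS coordinates must lie within the transcript")
    return {
        "5'UTR": (0, cds_start),
        "ORF": (cds_start, cds_end),
        "3'UTR": (cds_end, length),
    }


# Toy transcript of 1200 nt with a 900 nt coding region.
regions = split_transcript(length=1200, cds_start=150, cds_end=1050)
for name, (start, end) in regions.items():
    print(f"{name}: {start}-{end} ({end - start} nt)")
```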

Downloading HPRD data

HPRD data are available for download in XML as well as tab-delimited file formats. Regular updates of the full release of all the data are available in a compressed format through the ‘Download’ tab. Interaction datasets in PSI-MI (18) format are provided as individual files for each protein as well as a single combined file for the entire dataset. PSI-MI is an evolving data format that was originally released as level 1.0. We are currently formatting the protein–protein interaction data in HPRD to stay compliant with the latest version of this specification (PSI-MI level 2.5).
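
A tab-delimited interaction file of this kind could be loaded along the following lines. This is a minimal sketch: the three-column layout, the column names and the sample rows are hypothetical, and the actual HPRD download files may be organized differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-column layout; the real HPRD download may differ.
SAMPLE = (
    "interactor_a\tinteractor_b\texperiment_type\n"
    "GENE_A\tGENE_B\tin vivo\n"
    "GENE_A\tGENE_C\tyeast two-hybrid\n"
    "GENE_B\tGENE_C\tin vitro\n"
)


def load_interactions(handle):
    """Read tab-delimited interaction rows into an undirected adjacency map."""
    partners = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = row["interactor_a"], row["interactor_b"]
        partners[a].add(b)
        partners[b].add(a)
    return partners


# An in-memory string stands in for a downloaded file.
partners = load_interactions(io.StringIO(SAMPLE))
print(sorted(partners["GENE_A"]))
```

To read a real file, the same function could be given a handle opened with `open(path, newline="")`, after adjusting the column names to match the file's header.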

Future plans

We wish to develop a Protein Distributed Annotation System, which will enable laboratories throughout the world to annotate valuable proteomic information including PTMs, tissue expression, protein–protein interactions and enzyme–substrate relationships in the context of HPRD data. We hope to link any data obtained by mass spectrometry directly to such annotations in HPRD. We are also in the process of integrating transcriptomic data into HPRD, which will allow gene expression patterns to be visualized in normal and diseased states. Based on user input over the last 3 years, we also hope to include a list of genes regulated by the major transcription factors.

CONCLUSIONS

Our strategy of involving the biomedical community, by inviting feedback on individual entries through the ‘Comments’ button and by designating interested researchers as ‘Molecule Authority’ (listed under the ‘Credits’ tab), has already proved successful. To make the best use of HPRD and to understand the annotation procedure and philosophy, we strongly encourage all users to visit the ‘FAQs’ page. We hope that this community involvement will continue to intensify over the coming years. Our aim is to make HPRD a knowledgebase of human proteins that assists biomedical discoveries by serving as a complete resource of genomic, transcriptomic and proteomic information and by providing an integrated view of sequence, function and protein networks in health and disease.
  18 in total

Review 1.  Protein interaction maps for model organisms.

Authors:  A J Walhout; M Vidal
Journal:  Nat Rev Mol Cell Biol       Date:  2001-01       Impact factor: 94.444

2.  GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways.

Authors:  Kam D Dahlquist; Nathan Salomonis; Karen Vranizan; Steven C Lawlor; Bruce R Conklin
Journal:  Nat Genet       Date:  2002-05       Impact factor: 38.330

3.  The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.

Authors:  Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

4.  Large-scale characterization of HeLa cell nuclear phosphoproteins.

Authors:  Sean A Beausoleil; Mark Jedrychowski; Daniel Schwartz; Joshua E Elias; Judit Villén; Jiaxu Li; Martin A Cohn; Lewis C Cantley; Steven P Gygi
Journal:  Proc Natl Acad Sci U S A       Date:  2004-08-09       Impact factor: 11.205

5.  Human Plasma PeptideAtlas.

Authors:  Eric W Deutsch; Jimmy K Eng; Hui Zhang; Nichole L King; Alexey I Nesvizhskii; Biaoyang Lin; Hookeun Lee; Eugene C Yi; Reto Ossola; Ruedi Aebersold
Journal:  Proteomics       Date:  2005-08       Impact factor: 3.984

6.  Human protein reference database as a discovery resource for proteomics.

Authors:  Suraj Peri; J Daniel Navarro; Troels Z Kristiansen; Ramars Amanchy; Vineeth Surendranath; Babylakshmi Muthusamy; T K B Gandhi; K N Chandrika; Nandan Deshpande; Shubha Suresh; B P Rashmi; K Shanker; N Padma; Vidya Niranjan; H C Harsha; Naveen Talreja; B M Vrushabendra; M A Ramya; A J Yatish; Mary Joy; H N Shivashankar; M P Kavitha; Minal Menezes; Dipanwita Roy Choudhury; Neelanjana Ghosh; R Saravana; Sreenath Chandran; Sujatha Mohan; Chandra Kiran Jonnalagadda; C K Prasad; Chandan Kumar-Sinha; Krishna S Deshpande; Akhilesh Pandey
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

7.  HUGE: a database for human KIAA proteins, a 2004 update integrating HUGEppi and ROUGE.

Authors:  Reiko Kikuno; Takahiro Nagase; Manabu Nakayama; Hisashi Koga; Noriko Okazaki; Daisuke Nakajima; Osamu Ohara
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

8.  Development of human protein reference database as an initial platform for approaching systems biology in humans.

Authors:  Suraj Peri; J Daniel Navarro; Ramars Amanchy; Troels Z Kristiansen; Chandra Kiran Jonnalagadda; Vineeth Surendranath; Vidya Niranjan; Babylakshmi Muthusamy; T K B Gandhi; Mads Gronborg; Nieves Ibarrola; Nandan Deshpande; K Shanker; H N Shivashankar; B P Rashmi; M A Ramya; Zhixing Zhao; K N Chandrika; N Padma; H C Harsha; A J Yatish; M P Kavitha; Minal Menezes; Dipanwita Roy Choudhury; Shubha Suresh; Neelanjana Ghosh; R Saravana; Sreenath Chandran; Subhalakshmi Krishna; Mary Joy; Sanjeev K Anand; V Madavan; Ansamma Joseph; Guang W Wong; William P Schiemann; Stefan N Constantinescu; Lily Huang; Roya Khosravi-Far; Hanno Steen; Muneesh Tewari; Saghi Ghaffari; Gerard C Blobe; Chi V Dang; Joe G N Garcia; Jonathan Pevsner; Ole N Jensen; Peter Roepstorff; Krishna S Deshpande; Arul M Chinnaiyan; Ada Hamosh; Aravinda Chakravarti; Akhilesh Pandey
Journal:  Genome Res       Date:  2003-10       Impact factor: 9.043

9.  The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.

Authors:  Henning Hermjakob; Luisa Montecchi-Palazzi; Gary Bader; Jérôme Wojcik; Lukasz Salwinski; Arnaud Ceol; Susan Moore; Sandra Orchard; Ugis Sarkans; Christian von Mering; Bernd Roechert; Sylvain Poux; Eva Jung; Henning Mersch; Paul Kersey; Michael Lappe; Yixue Li; Rong Zeng; Debashis Rana; Macha Nikolski; Holger Husi; Christine Brun; K Shanker; Seth G N Grant; Chris Sander; Peer Bork; Weimin Zhu; Akhilesh Pandey; Alvis Brazma; Bernard Jacq; Marc Vidal; David Sherman; Pierre Legrain; Gianni Cesareni; Ioannis Xenarios; David Eisenberg; Boris Steipe; Chris Hogue; Rolf Apweiler
Journal:  Nat Biotechnol       Date:  2004-02       Impact factor: 54.908

10.  BioBuilder as a database development and functional annotation platform for proteins.

Authors:  J Daniel Navarro; Naveen Talreja; Suraj Peri; B M Vrushabendra; B P Rashmi; N Padma; Vineeth Surendranath; Chandra Kiran Jonnalagadda; P S Kousthub; Nandan Deshpande; K Shanker; Akhilesh Pandey
Journal:  BMC Bioinformatics       Date:  2004-04-20       Impact factor: 3.169
