
Human Protein Reference Database--2009 update.

T S Keshava Prasad1, Renu Goel, Kumaran Kandasamy, Shivakumar Keerthikumar, Sameer Kumar, Suresh Mathivanan, Deepthi Telikicherla, Rajesh Raju, Beema Shafreen, Abhilash Venugopal, Lavanya Balakrishnan, Arivusudar Marimuthu, Sutopa Banerjee, Devi S Somanathan, Aimy Sebastian, Sandhya Rani, Somak Ray, C J Harrys Kishore, Sashi Kanth, Mukhtar Ahmed, Manoj K Kashyap, Riaz Mohmood, Y L Ramachandra, V Krishna, B Abdul Rahiman, Sujatha Mohan, Prathibha Ranganathan, Subhashri Ramabadran, Raghothama Chaerkady, Akhilesh Pandey.   

Abstract

Human Protein Reference Database (HPRD; http://www.hprd.org/), initially described in 2003, is a database of curated proteomic information pertaining to human proteins. We have recently added a number of new features to HPRD. These include PhosphoMotif Finder, which allows users to search proteins of interest for over 320 experimentally verified phosphorylation motifs. Another new feature is a distributed annotation system for proteins, Human Proteinpedia (http://www.humanproteinpedia.org/), through which laboratories can submit their data, which are then mapped onto protein entries in HPRD. Over 75 laboratories involved in proteomics research have already participated in this effort by submitting data for over 15,000 human proteins. The submitted data include mass spectrometry and protein microarray-derived data, among other data types. Finally, HPRD is also linked to NetPath (http://www.netpath.org/), a compendium of human signaling pathways developed by our group, which currently contains annotations for several cancer and immune signaling pathways. Since the last update, more than 5500 new protein sequences have been added, making HPRD a comprehensive resource for studying the human proteome.

Year:  2008        PMID: 18988627      PMCID: PMC2686490          DOI: 10.1093/nar/gkn892

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Human Protein Reference Database (HPRD; http://www.hprd.org/) is a resource for experimentally derived information about the human proteome, including protein–protein interactions, post-translational modifications (PTMs) and tissue expression (1–4). The contents of several proteomic databases pertaining to human proteins, including HPRD, have recently been evaluated in terms of the number of nonredundant protein–protein interactions, the number of direct interactions per protein, the number of proteins with disease annotation and the number of linked citations (5). The curation and annotation process in HPRD involves entry of protein data through BioBuilder, a tool developed by our group for editing and managing data through a web browser (6). We have incorporated new features, such as PhosphoMotif Finder, links to a signaling pathway resource called NetPath, Human Proteinpedia for enhanced community participation and the use of BLAST for querying mRNA/protein data. Since the last update, we have added approximately 5500 new protein sequences and the corresponding information to HPRD, which now contains information on most human proteins, including their isoforms.

‘PhosphoMotif Finder’ searches experimentally derived phosphorylation-based substrate and binding motifs

PhosphoMotif Finder contains experimentally characterized phosphorylation-based substrate and binding motifs derived from the literature (7) and has been integrated with HPRD. PhosphoMotif Finder scans a user-submitted protein sequence for the presence of any of the 320 phosphorylation-based motifs listed in the compendium. Figure 1 shows the presence of 30 known tyrosine kinase phosphorylation sites in microtubule-associated serine/threonine kinase-like protein (MASTL), which is implicated in thrombocytopenia, a blood disorder. In addition to the mapped motifs, PhosphoMotif Finder also indicates potential enzymes (i.e. kinases or phosphatases) associated with these phosphorylation motifs. PhosphoMotif Finder should also be helpful in ascertaining the novelty of any motif that is described in the literature. Finally, it can be used in designing phosphorylation motif-specific antibodies and antibody-based arrays.
Figure 1.

Display of PhosphoMotif Finder integrated into HPRD. The screenshot shows the molecule page of MASTL, a hypothetical protein implicated in autosomal dominant thrombocytopenia. The ‘PhosphoMotif Finder’ tab on the HPRD page leads to the utility page where the sequence of MASTL is displayed. Users can select either serine/threonine or tyrosine motifs and submit the query by clicking the ‘Find Motifs’ button. The result page displays the experimentally derived motifs mapped to the sequence, along with their position, the actual sequence, the experimentally derived consensus phosphorylation motifs and links to the PubMed abstracts where these motifs have been described. The MASTL sequence is shown to contain 30 potential tyrosine phosphorylation sites, as seen in this figure.
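
The kind of sequence scan that PhosphoMotif Finder performs can be illustrated with a minimal sketch. It assumes that each motif can be written as a regular expression over one-letter amino acid codes. The three patterns, the function name `find_motifs` and the test sequence are illustrative placeholders, not entries from the PhosphoMotif Finder compendium and not the tool's actual implementation.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class MotifHit(NamedTuple):
    name: str      # label of the motif that matched
    start: int     # 1-based position of the first matched residue
    matched: str   # the matched stretch of sequence


# Toy patterns for illustration only; NOT taken from the compendium.
# "." stands for any residue, "[ST]" for serine or threonine.
TOY_MOTIFS = {
    "toy_basophilic_S/T": r"R..[ST]",
    "toy_proline_directed_S/T": r"[ST]P",
    "toy_acidic_Y": r"[DE]..Y",
}


def find_motifs(sequence: str, motifs: dict[str, str]) -> list[MotifHit]:
    """Report every occurrence of every motif, including overlapping ones."""
    hits = []
    for name, pattern in motifs.items():
        # A zero-width lookahead lets finditer report overlapping matches.
        for m in re.finditer(f"(?=({pattern}))", sequence):
            hits.append(MotifHit(name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: (h.start, h.name))


hits = find_motifs("MRRASPEDLYRKLSP", TOY_MOTIFS)
for hit in hits:
    print(hit.start, hit.name, hit.matched)
```

Reporting the position and the matched sequence for each hit mirrors the columns described in the Figure 1 result page; linking each motif to a kinase or to a PubMed abstract would require the curated compendium itself.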


‘NetPath’ pathway resource

We have incorporated a compendium of human signaling pathways called NetPath (http://www.netpath.org/) through the ‘Pathways’ tab in HPRD. NetPath contains information about protein interactions, catalytic reactions and protein translocation events that occur downstream of ligand–receptor interactions. Currently, the roles of 2732 and 1793 proteins are annotated in the context of cancer and immune signaling pathways, respectively. We have also cataloged genes that are upregulated or downregulated at the transcriptional level under the influence of these signaling pathways. Pathway data can be downloaded in standard international data exchange formats including BioPAX Level 2.0, PSI-MI version 2.5 and SBML version 2.1. The lists of transcriptionally upregulated and downregulated genes can be obtained as Excel sheets and tab-delimited text files. Integration of NetPath data in HPRD will assist users in visualizing the probable role of proteins in diverse signaling networks. For example, Janus Kinase 2 (JAK2) is involved in diverse pathways including the EGFR1, Kit receptor, Notch, IL-2, IL-3, IL-4, IL-5 and IL-6 signaling pathways. NetPath provides the list of physical interactions and catalysis events of JAK2 with various proteins under different signaling pathways. Each interaction or catalysis event is linked to the PubMed abstract of the original article (Figure 2).
Figure 2.

Linking to human signaling pathways from HPRD. The ‘Pathways’ button on the HPRD page of JAK2 is hyperlinked to its NetPath page, which lists the signaling pathways in which the protein is involved along with a description of its interactors in each pathway. Each interaction or catalysis event is linked to the PubMed abstract of the original article. The pathway name is linked to the specific signaling pathway annotated in NetPath.
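
The per-protein pathway view described above amounts to inverting a list of pathway memberships. The sketch below assumes a two-column, tab-delimited layout (pathway, protein); this layout is an assumption for illustration and is not the actual NetPath export format. The JAK2 rows follow the pathways named in the text, and `PROTEIN_X` is a made-up placeholder.

```python
import csv
import io
from collections import defaultdict

# Assumed layout: header line, then one (pathway, protein) pair per line.
# JAK2 memberships are those listed in the text; PROTEIN_X is hypothetical.
EXAMPLE_TSV = "\n".join(
    ["pathway\tprotein"]
    + [
        f"{pathway}\tJAK2"
        for pathway in (
            "EGFR1", "Kit receptor", "Notch",
            "IL-2", "IL-3", "IL-4", "IL-5", "IL-6",
        )
    ]
    + ["IL-2\tPROTEIN_X"]
)


def pathways_by_protein(tsv_text):
    """Map each protein to the sorted list of pathways it appears in."""
    index = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        index[row["protein"]].add(row["pathway"])
    return {protein: sorted(paths) for protein, paths in index.items()}


index = pathways_by_protein(EXAMPLE_TSV)
print(index["JAK2"])
```

Real pathway downloads in BioPAX, PSI-MI or SBML carry far more structure (reactions, evidence, citations) and would normally be read with a format-specific parser rather than flattened in this way.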


Annotation of proteomic information

Protein isoforms

We have included most of the human protein isoforms present in the RefSeq Database (8). Currently, 25 661 protein sequences encoded by 19 433 genes have been annotated in HPRD. Phosphodiesterase 9A, cAMP response element modulator, collagen type XIII alpha1 and dystrophin are examples of proteins with the highest number of isoforms, with 20, 20, 19 and 18 isoforms, respectively. However, only data pertaining to sequence, subcellular localization, mRNA/protein expression, biological motifs and domains are currently annotated as isoform specific, whereas protein–protein interactions and enzyme–substrate relationships are annotated as common to all isoforms. This is mainly due to the general lack of isoform-level experimental data for the latter.

Protein–protein interactions

Among users who download HPRD data, protein–protein interactions are one of the most requested components. We have added more than 5000 protein–protein interactions to HPRD since the previous update in 2006. Among the 38 167 protein–protein interactions documented in HPRD, 8958 interactions were based on yeast two-hybrid analysis alone, whereas 8827 interactions were based on in vitro and 7163 on in vivo methods. Detection of 2410 protein–protein interactions was confirmed by all three methods. Overall, 8710 proteins in HPRD are annotated with at least one protein–protein interaction, whereas 2015 and 774 proteins have more than 5 or 10 protein–protein interactions, respectively. The 14-3-3 gamma protein has the highest number, with 173 protein–protein interactions. In addition, 15 231 protein–protein interactions (Table 1) have been submitted to HPRD by the scientific community using Human Proteinpedia (9,10). Enzyme–substrate relationships determined through peptide/protein arrays are a new data type included in HPRD, as represented by the phosphorylation of Tyr 16 of RNA binding motif protein 10 by c-Src.
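
Summary figures of this kind (proteins with at least one interaction, with more than 5 or 10, and the protein with the most) can be derived from a list of binary interactions. The sketch below is one way to do so, assuming interactions are undirected pairs and that each distinct partner is counted once; the protein names are placeholders and the counting convention may differ from the one used in HPRD.

```python
from collections import defaultdict


def interaction_counts(pairs):
    """Count distinct interaction partners per protein from undirected pairs."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)  # (a, b) and (b, a) collapse to the same partner
    return {protein: len(found) for protein, found in partners.items()}


def summarize(counts, thresholds=(5, 10)):
    """Tally proteins by interaction count and find the best-connected one."""
    summary = {"at_least_1": sum(1 for c in counts.values() if c >= 1)}
    for t in thresholds:
        summary[f"more_than_{t}"] = sum(1 for c in counts.values() if c > t)
    hub = max(counts, key=counts.get)
    summary["max"] = (hub, counts[hub])
    return summary


# Toy data: HUB binds P1..P7; P1-P2 is listed twice in opposite orientation.
pairs = [("HUB", f"P{i}") for i in range(1, 8)] + [("P1", "P2"), ("P2", "P1")]
counts = interaction_counts(pairs)
summary = summarize(counts)
print(summary)
```

Storing partners in sets deduplicates interactions that are reported in both orientations or by more than one detection method, which matters when a dataset merges yeast two-hybrid, in vitro and in vivo evidence.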
Table 1.

Statistics of proteomic data annotated by HPRD team and submitted to Human Proteinpedia

Dataset                         Dataset annotated by HPRD team    Data submitted through Human Proteinpedia
Protein–protein interactions    38 167                            15 231
PTMs                            16 972                            17 410
Subcellular localization        19 670                            2906
mRNA/protein expression         65 536                            150 368

PTMs and subcellular localization

HPRD currently contains information for 16 972 PTMs belonging to various categories, with phosphorylation (10 858), dephosphorylation (3118) and glycosylation (1860) forming the majority of the annotated PTMs (Table 2). At least one enzyme responsible for the modification has been annotated for 8960 PTMs, which resulted in the documentation of 7253 enzyme–substrate relationships. Of these, 1277 PTMs have more than one enzyme annotated. Human Proteinpedia has contributed over 17 400 PTMs, which are mainly derived from mass spectrometry studies. At least one site of subcellular localization has been annotated for 8620 proteins in HPRD, with 586 of them being isoform specific. In addition, scientific investigators have contributed 2906 entries pertaining to subcellular localization through Human Proteinpedia.
Table 2.

Statistics of PTM data annotated among various PTM types

PTM type             Count
Phosphorylation      10 858
Dephosphorylation    3118
Glycosylation        1860
Sumoylation          305
Acetylation          259
Methylation          274
Palmitoylation       149
Myristoylation       43
Glutathionylation    11
ADP-ribosylation     7
Others               88
Total                16 972
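
As a quick arithmetic check on Table 2, the short sketch below sums the per-type counts and computes the share held by the three largest categories. The counts are copied from the table; the variable names are arbitrary.

```python
# Counts as listed in Table 2 (excluding the "Total" row).
PTM_COUNTS = {
    "Phosphorylation": 10858,
    "Dephosphorylation": 3118,
    "Glycosylation": 1860,
    "Sumoylation": 305,
    "Acetylation": 259,
    "Methylation": 274,
    "Palmitoylation": 149,
    "Myristoylation": 43,
    "Glutathionylation": 11,
    "ADP-ribosylation": 7,
    "Others": 88,
}

total = sum(PTM_COUNTS.values())

# The three most frequent PTM types and their combined share of the total.
top3 = sorted(PTM_COUNTS, key=PTM_COUNTS.get, reverse=True)[:3]
top3_share = sum(PTM_COUNTS[name] for name in top3) / total

print(total)
print(top3, f"{top3_share:.1%}")
```

The rows sum to the stated total of 16 972, and phosphorylation, dephosphorylation and glycosylation together account for about 93% of the annotated PTMs.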

Community participation through ‘Human Proteinpedia’

We have developed a distributed annotation system called Human Proteinpedia and incorporated it into HPRD (9,10). Using Human Proteinpedia, proteomic investigators can directly contribute to HPRD protein data derived from diverse platforms, including yeast two-hybrid, mass spectrometry, peptide/protein array, immunohistochemistry, Western blot, coimmunoprecipitation and fluorescence microscopy. The protein features that can be mapped to corresponding entries in HPRD include PTMs, mRNA/protein expression in tissues or cell lines, subcellular localization, enzyme–substrate relationships and protein–protein interactions. These annotations are made available for viewing in a separate box beneath the HPRD annotation (Figure 3). Each entry is also linked to experimental evidence, such as mass spectra, images of Western blots and fluorescence micrographs. Figure 3 shows five serine phosphorylation sites for Adducin 1 protein in HPRD, submitted through Human Proteinpedia. PTM sites are linked to the meta-annotation of the mass spectrometry data in the Human Proteinpedia database as submitted by the investigator. The corresponding MS/MS spectrum can also be viewed by following a link on the meta-annotation page.
Figure 3.

Display of PTM data in HPRD submitted through Human Proteinpedia. The Adducin 1 molecule page in HPRD shows five novel phosphorylation sites submitted through Human Proteinpedia. Phosphorylation sites are hyperlinked to the Human Proteinpedia page, which gives information on the investigator, the laboratory and the meta-annotation of the mass spectrometry experiment. The corresponding MS/MS spectrum for a peptide is also displayed using the spectrum viewer developed by PRIDE.

Investigators worldwide have already submitted 15 231 protein–protein interactions, 17 410 PTMs and 150 368 mRNA/protein expression annotations to HPRD through Human Proteinpedia. Human Proteinpedia has increased the quantity of HPRD data by 2-fold in a relatively short span of time (Table 1). By involving investigators and experimentalists in the annotation of proteomic data, Human Proteinpedia has transformed HPRD into a true community database.
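
One way to read the 2-fold figure is to compare the Table 1 totals with and without the Human Proteinpedia submissions. The sketch below does this arithmetic on the four rows of Table 1; pooling counts across different data types is a simplification made here for illustration and may not be how the authors derived the figure.

```python
# (annotated by HPRD team, submitted through Human Proteinpedia), from Table 1.
TABLE1 = {
    "Protein-protein interactions": (38167, 15231),
    "PTMs": (16972, 17410),
    "Subcellular localization": (19670, 2906),
    "mRNA/protein expression": (65536, 150368),
}

hprd_total = sum(hprd for hprd, _ in TABLE1.values())
proteinpedia_total = sum(submitted for _, submitted in TABLE1.values())

# Fold increase of the combined dataset relative to HPRD-team annotations alone.
fold = (hprd_total + proteinpedia_total) / hprd_total

print(hprd_total, proteinpedia_total, round(fold, 2))
```

Under this reading the combined dataset is roughly 2.3 times the size of the HPRD-team annotations, with most of the increase coming from mRNA/protein expression entries.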

Usage of HPRD data by the community

Over the years, the biomedical community has provided valuable suggestions by interacting with the HPRD team through the ‘Comments’ and ‘Help’ buttons provided on HPRD pages. More than 8000 gene comments, expert suggestions and help requests have been received, and nearly 100 scientists have been designated as ‘Molecule Authorities’ based on their expertise. We hope to further increase community participation by implementing a microattribution system, which provides citable credit to the investigators. Web resources that display or have made use of HPRD data include Entrez-Gene, VisANT (11), Genes2Networks (12), Cerebral (13), BioNetBuilder (14), COXPRESdb (15), STRING 7 (16) and UniHI (17). Molecular Signature Database (MSigDB) (18), used for Gene Set Enrichment Analysis of gene expression data, incorporates pathway gene sets curated from HPRD. Sequence analysis tools that use HPRD data include CompariMotif (19) and SLiMFinder (20). CutDB, a database of proteolytic events (21), PepBank, a database of peptides (22), and T1DBase, a database for type 1 diabetes research (23), are other resources that also incorporate curated proteomic data from HPRD.

CONCLUSIONS

With the inclusion of most human protein sequences, HPRD has grown into an integrated knowledgebase for genomic and proteomic investigators. Incorporation of PhosphoMotif Finder and signaling pathways will help users to generate novel hypotheses or to identify molecules likely to be involved in a biological process of interest. Further, the implementation of Human Proteinpedia has transformed HPRD into a community-driven database, and we hope that this trend will continue so that every entry is directly or indirectly verified by individual experimentalists.

FUNDING

Funding for open access charge: Institute of Bioinformatics.

Conflict of interest statement. None declared.
  23 in total

1.  Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets.

Authors:  T K B Gandhi; Jun Zhong; Suresh Mathivanan; L Karthick; K N Chandrika; S Sujatha Mohan; Salil Sharma; Stefan Pinkert; Shilpa Nagaraju; Balamurugan Periaswamy; Goparani Mishra; Kannabiran Nandakumar; Beiyi Shen; Nandan Deshpande; Rashmi Nayak; Malabika Sarker; Jef D Boeke; Giovanni Parmigiani; Jörg Schultz; Joel S Bader; Akhilesh Pandey
Journal:  Nat Genet       Date:  2006-03       Impact factor: 38.330

2.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

3.  Human protein reference database as a discovery resource for proteomics.

Authors:  Suraj Peri; J Daniel Navarro; Troels Z Kristiansen; Ramars Amanchy; Vineeth Surendranath; Babylakshmi Muthusamy; T K B Gandhi; K N Chandrika; Nandan Deshpande; Shubha Suresh; B P Rashmi; K Shanker; N Padma; Vidya Niranjan; H C Harsha; Naveen Talreja; B M Vrushabendra; M A Ramya; A J Yatish; Mary Joy; H N Shivashankar; M P Kavitha; Minal Menezes; Dipanwita Roy Choudhury; Neelanjana Ghosh; R Saravana; Sreenath Chandran; Sujatha Mohan; Chandra Kiran Jonnalagadda; C K Prasad; Chandan Kumar-Sinha; Krishna S Deshpande; Akhilesh Pandey
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

4.  Development of human protein reference database as an initial platform for approaching systems biology in humans.

Authors:  Suraj Peri; J Daniel Navarro; Ramars Amanchy; Troels Z Kristiansen; Chandra Kiran Jonnalagadda; Vineeth Surendranath; Vidya Niranjan; Babylakshmi Muthusamy; T K B Gandhi; Mads Gronborg; Nieves Ibarrola; Nandan Deshpande; K Shanker; H N Shivashankar; B P Rashmi; M A Ramya; Zhixing Zhao; K N Chandrika; N Padma; H C Harsha; A J Yatish; M P Kavitha; Minal Menezes; Dipanwita Roy Choudhury; Shubha Suresh; Neelanjana Ghosh; R Saravana; Sreenath Chandran; Subhalakshmi Krishna; Mary Joy; Sanjeev K Anand; V Madavan; Ansamma Joseph; Guang W Wong; William P Schiemann; Stefan N Constantinescu; Lily Huang; Roya Khosravi-Far; Hanno Steen; Muneesh Tewari; Saghi Ghaffari; Gerard C Blobe; Chi V Dang; Joe G N Garcia; Jonathan Pevsner; Ole N Jensen; Peter Roepstorff; Krishna S Deshpande; Arul M Chinnaiyan; Ada Hamosh; Aravinda Chakravarti; Akhilesh Pandey
Journal:  Genome Res       Date:  2003-10       Impact factor: 9.043

5.  Human protein reference database--2006 update.

Authors:  Gopa R Mishra; M Suresh; K Kumaran; N Kannabiran; Shubha Suresh; P Bala; K Shivakumar; N Anuradha; Raghunath Reddy; T Madhan Raghavan; Shalini Menon; G Hanumanthu; Malvika Gupta; Sapna Upendran; Shweta Gupta; M Mahesh; Bincy Jacob; Pinky Mathew; Pritam Chatterjee; K S Arun; Salil Sharma; K N Chandrika; Nandan Deshpande; Kshitish Palvankar; R Raghavnath; R Krishnakanth; Hiren Karathia; B Rekha; Rashmi Nayak; G Vishnupriya; H G Mohan Kumar; M Nagini; G S Sameer Kumar; Rojan Jose; P Deepthi; S Sujatha Mohan; T K B Gandhi; H C Harsha; Krishna S Deshpande; Malabika Sarker; T S Keshava Prasad; Akhilesh Pandey
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

6.  UniHI: an entry gate to the human protein interactome.

Authors:  Gautam Chaurasia; Yasir Iqbal; Christian Hänig; Hanspeter Herzel; Erich E Wanker; Matthias E Futschik
Journal:  Nucleic Acids Res       Date:  2006-12-07       Impact factor: 16.971

7.  T1DBase: integration and presentation of complex data for type 1 diabetes research.

Authors:  Erin M Hulbert; Luc J Smink; Ellen C Adlem; James E Allen; David B Burdick; Oliver S Burren; Victor M Cassen; Christopher C Cavnor; Geoffrey E Dolman; Daisy Flamez; Karen F Friery; Barry C Healy; Sarah A Killcoyne; Burak Kutlu; Helen Schuilenburg; Neil M Walker; Josyf Mychaleckyj; Decio L Eizirik; Linda S Wicker; John A Todd; Nathan Goodman
Journal:  Nucleic Acids Res       Date:  2006-12-14       Impact factor: 16.971

8.  CutDB: a proteolytic event database.

Authors:  Yoshinobu Igarashi; Alexey Eroshkin; Svetlana Gramatikova; Kosi Gramatikoff; Ying Zhang; Jeffrey W Smith; Andrei L Osterman; Adam Godzik
Journal:  Nucleic Acids Res       Date:  2006-11-16       Impact factor: 16.971

9.  STRING 7--recent developments in the integration and prediction of protein interactions.

Authors:  Christian von Mering; Lars J Jensen; Michael Kuhn; Samuel Chaffron; Tobias Doerks; Beate Krüger; Berend Snel; Peer Bork
Journal:  Nucleic Acids Res       Date:  2006-11-10       Impact factor: 16.971

10.  BioBuilder as a database development and functional annotation platform for proteins.

Authors:  J Daniel Navarro; Naveen Talreja; Suraj Peri; B M Vrushabendra; B P Rashmi; N Padma; Vineeth Surendranath; Chandra Kiran Jonnalagadda; P S Kousthub; Nandan Deshpande; K Shanker; Akhilesh Pandey
Journal:  BMC Bioinformatics       Date:  2004-04-20       Impact factor: 3.169
