
PepCyber:P~PEP: a database of human protein–protein interactions mediated by phosphoprotein-binding domains.

Wuming Gong, Dihan Zhou, Yongliang Ren, Yejun Wang, Zhixiang Zuo, Yanping Shen, Feifei Xiao, Qi Zhu, Ailing Hong, Xiaochuan Zhou, Xiaolian Gao, Tongbin Li.

Abstract

Phosphoprotein-binding domains (PPBDs) mediate many important cellular and molecular processes. Ten PPBDs are known to exist in the human proteome, namely, 14-3-3, BRCT, C2, FHA, MH2, PBD, PTB, SH2, WD-40 and WW. PepCyber:P∼PEP is a newly constructed database specialized in documenting human PPBD-containing proteins and PPBD-mediated interactions. Our motivation is to provide the research community with a rich information source emphasizing the reported, experimentally validated data for specific PPBD-PPEP interactions. This information is not only useful for designing, comparing and validating the relevant experiments, but it also serves as a knowledge base for computationally constructing systems signaling pathways and networks. PepCyber:P∼PEP is accessible through the URL http://www.pepcyber.org/PPEP/. The current release of the database contains 7044 PPBD-mediated interactions involving 337 PPBD-containing proteins and 1123 substrate proteins.

Year:  2007        PMID: 18160410      PMCID: PMC2238930          DOI: 10.1093/nar/gkm854

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Protein phosphorylation-mediated signal transduction is an important post-translational modification (PTM)-based regulatory mechanism, and is implicated in a broad spectrum of key cellular and molecular processes, including cell cycle, oncogenic transformation, immunological responsiveness, apoptosis and development (1–5). In these activities, ‘phosphoprotein-binding domains’ (PPBDs, denoting domains that have specific binding affinity to phosphorylated sites in proteins) play a pivotal role in connecting the kinases and the effector molecules, forming multi-protein complexes, and inducing specific protein–protein interactions responsible for changes in these proteins’ subcellular localization, folding state, binding specificity or activity (6). PPBDs achieve their binding specificity to their substrate proteins primarily through the recognition of a phosphopeptide (PPEP) region, a short peptide sequence (∼6–15 residues) containing phosphorylated residues (i.e. pS, pT or pY, where S is serine, T is threonine and Y is tyrosine) (1,7). Other factors, such as the tertiary structures and subcellular localization of the substrate proteins, as well as domain competition, are also known to influence PPBD–phosphoprotein interactions in vivo (7,8). Phosphorylation sites are frequently found in intrinsically disordered or unstructured regions of proteins (9–11), making these regions good candidate sites for PPBD binding. In the human proteome, 10 protein domains (14-3-3, BRCT, C2, FHA, MH2, PBD, PTB, SH2, WD-40 and WW) have been identified as PPBDs, i.e. they possess phosphoprotein or PPEP-binding activities (Table 1).
Table 1. Summary of the 10 human PPBD classes

| PPBD (class name) | Reported substrate specificity (a) | Refs. |
|---|---|---|
| 14-3-3 | R-S-X-pS-X-P, R-X-X-X-pS-X-P | (1,20) |
| BRCT | pS-X-X-F | (21,22) |
| C2 | (Y/F)-(S/A)-(V/I)-pY-(Q/R)-X-(Y/F)-X | (23) |
| FHA (Forkhead-associated) | pT-X-X-D | (24,25) |
| MH2 (MAD homology 2) | pS-X-pS | (26) |
| PBD (Polo-Box domain) | S-(pT/pS)-(P/X) | (27) |
| PTB (phosphotyrosine binding) | N-P-X-pY, N-P-X-Y | (28) |
| SH2 (Src homology 2) | pY-X-X-(I/P) (CRK); pY-(I/V/L)-X-(I/V/L) (PTPN11); pY-(M/I/L/V/E)-X-M (PIK3R1) | (29,30) (b) |
| WD-40 | D-pS-G-Φ-X-pS (BTRC); (I/L/P)-(I/L/P)-pT-P (FBXW7) | (31,32) |
| WW | (A/P)-P-P-(A/P)-Y; pS-P; pT-P | (33) |

(a) The listed information is representative of the vast amount of literature information.

(b) The information also includes our unpublished data on SH2-PPEP microarray data.
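
The consensus patterns in Table 1 can be read position by position: X stands for any residue, a parenthesized group such as (Y/F) lists the allowed alternatives, and a p prefix marks a phosphorylated residue. The following is a minimal sketch of how such a pattern could be scanned against a peptide. The function names, the dash-separated peptide encoding and the example peptide are assumptions made for illustration and are not part of PepCyber:P∼PEP; the Φ symbol used for WD-40 is not interpreted here and would be treated as a literal token.

```python
def parse_pattern(pattern):
    """Turn 'pS-X-X-F' or '(Y/F)-(S/A)-X' into a list of allowed-residue sets.

    None means 'any residue' (an X, or a group that contains X).
    """
    elements = []
    for part in pattern.split("-"):
        options = set(part.strip("()").split("/"))
        elements.append(None if "X" in options else options)
    return elements


def find_motif(pattern, peptide):
    """Return the 0-based start positions where the pattern matches.

    The peptide is a dash-separated residue string, e.g. 'A-pS-P-T-F-K',
    so that phosphorylated residues (pS, pT, pY) stay single tokens.
    """
    elements = parse_pattern(pattern)
    residues = peptide.split("-")
    hits = []
    for start in range(len(residues) - len(elements) + 1):
        window = residues[start:start + len(elements)]
        if all(allowed is None or residue in allowed
               for allowed, residue in zip(elements, window)):
            hits.append(start)
    return hits


# Hypothetical peptide, scanned with the BRCT pattern from Table 1.
print(find_motif("pS-X-X-F", "A-pS-P-T-F-K"))  # prints [1]
```

A match of this kind only reflects the sequence pattern; as noted above, structure, localization and domain competition also influence binding in vivo.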

The interactions between PPBDs and their PPEP substrates have been studied extensively using a variety of techniques, including structural determination (using X-ray crystallography or NMR spectroscopy), peptide assay (using phage display, synthetic peptide library or oriented peptide library), combinatorial screening, mass spectrometry analysis, mutagenesis (usually followed by GST pull-down or yeast two-hybrid assays) and computational sequence analysis (3). The rich information generated by these means is now partially captured by a few database resources, where the information about PPBDs, PPBD-containing proteins and their interactions with PPEP-containing substrate proteins can be obtained. These resources include general protein–protein interaction databases such as BIND (12), HPRD (13) and DOMINO (14), functional motif databases such as ELM (8) and Phospho.ELM (15), and a specialized prediction server, Scansite, which makes predictions about the PPBD–PPEP interactions based on results obtained from oriented peptide library experiments (16). Despite the availability of these existing resources, a database that offers integrated, comprehensive, detailed annotations regarding the proteomic interactions mediated by PPBDs is still lacking. PepCyber:P∼PEP was constructed with the intention of filling this gap. In PepCyber:P∼PEP, the information about PPBD-mediated protein interactions was carefully compiled through curation of peer-reviewed publications, and deposited into a relational database. For each interaction, specific information about the PPBD, the PPBD-containing protein, the specific PPEP substrate bound, the substrate protein, the evidence of the interaction and the citations was recorded. 
Moreover, information regarding the signaling pathways associated with the PPBD–PPEP-binding interactions (in particular, tumorigenesis-related signaling pathways and tumor types) is also documented and stored in the database. The data hosted in PepCyber:P∼PEP meet the Human Proteome Organization (HUPO), Proteomics Standards Initiative (PSI) standard (17), which is supported by most major protein–protein interaction databases.

UTILITY

Data content

We term an occurrence of a PPBD in a specific protein a ‘PPBD instance’, and the collection of similar PPBD instances with a high level of sequence homology and structural similarity a ‘PPBD class’. For example, the SH2 domain located close to the N-terminus of the protein PTPN11 (SHP2) is an ‘instance’ belonging to the SH2 ‘PPBD class’. Presently, there are 10 reported human PPBD classes (Table 1). The current PepCyber:P∼PEP release (V.1.0, released 31 July 2007) includes 7044 PPBD-mediated interactions involving 337 PPBD-containing proteins and 1123 substrate proteins. This rich information was obtained through the curation of 2446 peer-reviewed research articles published between 1975 and 2007. The largest number of interactions involves the SH2 PPBD class (4290 interactions), followed by the WW PPBD class (1389 interactions) and the 14-3-3 PPBD class (1086 interactions). These interactions were classified into three categories based on whether the interaction was known to be mediated by a concerned PPBD instance, and whether the substrate peptide had been identified: if the interaction was known to be mediated by a PPBD instance and the substrate peptide had been identified, the interaction was classified as ‘category A’; if the interaction was known to be mediated by a PPBD instance but the substrate peptide had not been identified, the interaction was classified as ‘category B’; if it was not known whether the interaction was mediated by a PPBD instance (though one of the interacting proteins was a PPBD protein), then the interaction was classified as ‘category C’. Among the 7044 interactions documented in the current release of PepCyber:P∼PEP, 5376 (76%) are category A interactions. The 14-3-3, PTB and WW PPBD classes are unique in that they are also capable of binding to non-PPEP substrates in certain instances. These non-PPEP interactions are also documented in PepCyber:P∼PEP. 
There are a total of 1068 non-PPEP interactions, accounting for 15% of the total collection of the current PepCyber:P∼PEP release. All non-PPEP interactions documented are category A interactions.
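
The three-category rule described above can be written as a small decision function. This is only a sketch of the rule as stated in the text: the function name and its two boolean parameters are assumptions chosen for illustration and do not describe how the curated database stores or derives the category. The last two lines recompute the reported percentages from the counts given in this section.

```python
def classify_interaction(ppbd_mediation_known, substrate_peptide_identified):
    """Return 'A', 'B' or 'C' following the category rule in the text.

    ppbd_mediation_known: True if the interaction is known to be mediated
        by a PPBD instance.
    substrate_peptide_identified: True if the substrate peptide is known.
    """
    if ppbd_mediation_known and substrate_peptide_identified:
        return "A"
    if ppbd_mediation_known:
        return "B"
    # Mediation by a PPBD instance is not known, although one of the
    # interacting proteins is a PPBD-containing protein.
    return "C"


# Shares reported for release V.1.0, recomputed from the stated counts.
category_a_share = round(100 * 5376 / 7044)   # 76
non_ppep_share = round(100 * 1068 / 7044)     # 15
```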

Web interface

The web interface of the PepCyber:P∼PEP database can be accessed through the URL http://www.pepcyber.org/PPEP/. Five tabs are located underneath the logo of the web site, namely, ‘PPBD Classes’, ‘PPBD Proteins’, ‘Interaction Search’, ‘Tutorial’ and ‘Glossary’. These five tabs are described below. The ‘PPBD Classes’ tab leads to the introduction pages of the 10 PPBD classes (Table 1), where information about the lengths, structures, representative instances and the reported binding specificity for each of the 10 PPBD classes is presented. The ‘PPBD Proteins’ tab leads to the PPBD-containing proteins browsing page, where the user can select a PPBD-containing protein to view its details, including the gene symbol, description, NCBI RefSeq and Swiss-Prot accessions and a graphical representation of all PPBD-mediated interactions involving this protein. The information regarding the interactions involving each PPBD instance of the protein is then listed separately. When the number of known interactions involving a domain is sufficiently large (≥10), the positional amino acid composition preference of the substrate sequences (17 amino acids long) is also available as a WebLogo image (18). The ‘Interaction Search’ tab leads to the page where custom search functions can be executed. The user has multiple search options for the PPBD–PPEP interactions: by the PPBD class; by the PPBD instance; by the name, NCBI RefSeq accession or Swiss-Prot accession of either the PPBD protein or the substrate protein; by the substrate peptide sequence; or by the pathway involved. The search can also be conducted using any combination of the above criteria. The search result is presented as a list of interactions, each with the names of the PPBD-containing protein and the substrate protein, the substrate sequence, index site, evidence type, category of the interaction and the number of records matching the interaction. 
The ‘index site’ refers to the locus of the phosphorylated residue on a PPEP substrate protein, or the locus of the central contact residue on a non-PPEP substrate protein. The ‘evidence type’ indicates the type of analysis conducted to support the presence of the interaction. Currently, four evidence types are defined: (i) structural determinations; (ii) peptide library experiments; (iii) mutagenesis and (iv) sequence analyses. By clicking the ‘Details’ link for each listed interaction, the user can view more detailed information about the interaction, including the names, NCBI RefSeq accessions, Swiss-Prot accessions of the PPBD-containing protein and the substrate protein, the sequence of the substrate protein, the evidence type and the references for the interactions reported. The user can choose to plot all interactions or a selected set of interactions as PPBD-mediated protein–protein interaction networks. The network graphs are rendered dynamically using the graph visualization software, GraphViz (19). In the network plot, each node represents one protein: PPBD-containing proteins are labeled in green, and other proteins are labeled in yellow. Each directed edge represents an interaction between a PPBD instance and substrate protein, with the index site displayed on the edge. The user can click on a node representing a PPBD-containing protein to reach the PPBD protein information page. The ‘Tutorial’ tab leads to the page where the utility of the database is demonstrated in a graphical manner. The ‘Glossary’ tab leads to the glossary page where terms and abbreviations used in the web site are explained.
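
The network plot described above (one node per protein, PPBD-containing proteins in green, other proteins in yellow, directed edges labeled with the index site) can be expressed in the DOT language that GraphViz renders. The sketch below only illustrates that mapping; it is not the code used by the web site. The dictionary keys, the edge direction (PPBD protein to substrate) and the example interaction, including the substrate name and index site, are assumptions made for illustration.

```python
def to_dot(interactions):
    """Build a GraphViz DOT string for a PPBD-mediated interaction network."""
    ppbd_proteins = {item["ppbd_protein"] for item in interactions}
    all_proteins = ppbd_proteins | {item["substrate"] for item in interactions}

    lines = ["digraph ppbd_network {", "  node [style=filled];"]
    for name in sorted(all_proteins):
        # Green for PPBD-containing proteins, yellow for all other proteins.
        color = "green" if name in ppbd_proteins else "yellow"
        lines.append(f'  "{name}" [fillcolor={color}];')
    for item in interactions:
        source = item["ppbd_protein"]
        target = item["substrate"]
        site = item["index_site"]
        # Assumed direction: from the PPBD protein to its substrate.
        lines.append(f'  "{source}" -> "{target}" [label="{site}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical record: substrate name and index site are placeholders.
example = [{"ppbd_protein": "PTPN11", "substrate": "SUBSTRATE_A", "index_site": "Y123"}]
print(to_dot(example))
```

The resulting text can be saved to a file and rendered with the GraphViz command-line tool, for example `dot -Tpng network.dot -o network.png`.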

Data access

PepCyber:P∼PEP is publicly accessible through the URL http://www.pepcyber.org/PPEP/ and the data sets are available, free of charge, to researchers from academic and non-profit institutions. Additional requests can be made by email to help@pepcyber.org.

Implementation

The PepCyber:P∼PEP database is a relational database implemented with MySQL on a Fedora Core 2 Linux system. The front-end web interface is implemented as a PHP project running under Apache 2.0.

COMPARISON WITH DATABASES RELEVANT TO PPBDS AND/OR PPEPS

PepCyber:P∼PEP is the first database specialized in documenting human PPBDs, PPBD-containing proteins and PPBD-mediated protein–protein interactions. However, the information about human PPBDs and PPBD-mediated interactions is also hosted in existing general protein–protein interaction databases such as BIND (12), HPRD (13) and DOMINO (14), and functional motif databases such as ELM (8) and Phospho.ELM (15). All these databases have different focuses, and as such the types of information stored vary among them (Table 2).
Table 2. A comparison between PepCyber:P∼PEP and similar databases in the type of information provided for PPBD-mediated protein–protein interactions (a)

| Database | URL | Institution/company | PPBD class | PPBD protein | PPBD-mediated networks | Substrate protein | Substrate sequence | Binding pattern | Index site | Evidence | Pathway |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BIND | http://bind.ca | Unleashed Informatics | No | Yes | No | Yes | Yes | No | No | Yes | No |
| HPRD | http://www.hprd.org/ | Institute of Bioinformatics, Bangalore, India and Johns Hopkins University | Yes | Yes | Yes | No | No | Yes | No | No | No |
| DOMINO | http://mint.bio.uniroma2.it/domino/ | University of Rome Tor Vergata, Italy | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No |
| ELM | http://elm.eu.org/ | Eukaryotic Linear Motif (ELM) consortium | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Yes |
| Phospho.ELM | http://phospho.elm.eu.org/ | Eukaryotic Linear Motif (ELM) consortium | Yes | No | No | Yes | Yes | No | Yes | No | No |
| **PepCyber:P∼PEP** | http://www.pepcyber.org/PPEP/ | University of Minnesota, University of Houston, LC Sciences | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

(a) The comparison was made based only on PPBDs and PPBD-mediated interactions. BIND, HPRD and DOMINO are general protein–protein interaction databases and host information about interactions of much broader scopes than PepCyber:P∼PEP. ELM is a general functional motif database and has much broader scope of coverage than the PPBD substrates that PepCyber:P∼PEP focuses on. Phospho.ELM primarily focuses on documenting information related to phosphorylation sites. Bold indicates that PepCyber:P∼PEP is the database we are presenting, and that it is superior to other databases in comparison.

PepCyber:P∼PEP hosts a substantially richer collection of data about PPBD-mediated interactions than any other database. Table 3 provides a quantitative comparison between PepCyber:P∼PEP and the existing databases in the number of PPBD instances and PPBD-mediated interactions. In addition to this advantageous depth and breadth of information, the PepCyber:P∼PEP data collection is also of notably high quality, attributed to the meticulous data curation procedure followed and the rigorous quality control (QC) process carried out before the data are deposited into the MySQL database. For example, special attention was given to supporting synonymous protein symbols in searches so that a user obtains consistent results, because different original studies may use different symbols to represent the same gene or protein. During data curation, each gene/protein symbol used in the original articles was checked against three databases (NCBI GenBank, Swiss-Prot and NCBI Entrez Gene) to ensure that different symbols (or synonyms) of the same gene/protein are represented by the same entity in the data set. During QC, the curated entries were checked against a local copy of the three public gene/protein databases. 
If any inconsistency was identified, the entry was returned to the curation process for re-checking. These procedures guarantee that only high-confidence data were deposited into the released PepCyber:P∼PEP database. Problems with gene/protein symbols that occur from time to time in other databases were minimized. As an example of such problems, three symbols—SHP2, SHP-2 and SHPTP2—were used in different entries in Phospho.ELM, without indication that they are all synonyms of the same gene PTPN11.
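
The synonym handling described above amounts to mapping every symbol found in the literature to one canonical entity and sending anything that cannot be resolved back for re-checking. The sketch below illustrates that idea with the PTPN11 example from the text. The dictionary, the function names and the case-insensitive matching are assumptions made for illustration; the actual curation checked symbols against NCBI GenBank, Swiss-Prot and NCBI Entrez Gene rather than a hand-written table.

```python
# Canonical gene symbol -> synonyms seen in the literature (example from the text).
SYNONYMS = {"PTPN11": ["SHP2", "SHP-2", "SHPTP2"]}


def build_lookup(synonyms):
    """Map every known symbol (upper-cased) to its canonical gene symbol."""
    lookup = {}
    for canonical, aliases in synonyms.items():
        for name in [canonical, *aliases]:
            lookup[name.upper()] = canonical
    return lookup


def canonical_symbol(symbol, lookup):
    """Return the canonical symbol, or None if the symbol is unknown.

    A None result corresponds to an entry that would be returned to
    curation for re-checking.
    """
    return lookup.get(symbol.strip().upper())


lookup = build_lookup(SYNONYMS)
print(canonical_symbol("SHP-2", lookup))    # prints PTPN11
print(canonical_symbol("shptp2", lookup))   # prints PTPN11
print(canonical_symbol("UNKNOWN1", lookup)) # prints None
```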
Table 3.

A quantitative comparison between PepCyber:P∼PEP and similar databases in the numbers of PPBD instances, PPBD-mediated interactions for all PPBD classes as well as for the four most popular PPBD classes: 14-3-3, PTB, SH2 and WW

Database#PPBD instances#PPBD-mediated interactions#14-3-3-mediated interactions#PTB-mediated interactions#SH2-mediated interactions#WW-mediated interactions
BIND81149419824
HPRD5710577821
DOMINO195102023485468157
ELM1468200471
Phospho.ELM472200112090
PepCyber:P∼PEP3377044108615842901389

Bold indicates that PepCyber:P∼PEP is the database we are presenting, and that it is superior to other databases in comparison.


FUTURE DIRECTIONS

PepCyber:P∼PEP is intended to provide comprehensive, up-to-date, dynamic information and tools to researchers who require information on PPBD-mediated protein–protein interactions as well as the sequence patterns and connecting maps of these interactions in the human proteome. The information reported herein represents an initial step toward our long-term goals. Our continuing effort will be in the following areas: (i) update and expand the content and functions of PepCyber:P∼PEP as published studies on the PPBDs and PPBD-mediated interactions continue to accumulate; (ii) develop and implement novel analysis methods of proteins and peptides to mine the rich data compilation stored in PepCyber:P∼PEP; (iii) develop strategies and methods to predict substrate specificity for one or more PPBD instances within a PPBD class and (iv) develop necessary tools for systems biology modeling using the PepCyber:P∼PEP data. These developments will complement experimental efforts, lead to savings in time and cost in experiments, and accelerate our understanding of the key processes in cellular regulation mechanisms. We envision that such an information source will have significant value not only for proteomic research but also for the discovery and development of drug candidates, drug targets and biomarkers. PepCyber:P∼PEP is merely the first significant component of the overall PepCyber, a valuable information source for important peptide-related biological and biomedical subject areas. The PepCyber effort will eventually result in a suite of database resources and computational tools assisting the development of peptide microarray-based proteomics profiling analysis. Future developments of PepCyber will include database resources with expanded scopes, e.g. non-PPBD-mediated protein–protein interactions, as well as non-database components such as peptide microarray design and data analysis tools.
REFERENCES

1.  The molecular basis of FHA domain:phosphopeptide binding specificity and implications for phospho-dependent signaling mechanisms.

Authors:  D Durocher; I A Taylor; D Sarbassova; L F Haire; S L Westcott; S P Jackson; S J Smerdon; M B Yaffe
Journal:  Mol Cell       Date:  2000-11       Impact factor: 17.970

2.  Phosphothreonine recognition comes into focus.

Authors:  M M Zhou
Journal:  Nat Struct Biol       Date:  2000-12

3.  Crystal structure of a phosphorylated Smad2. Recognition of phosphoserine by the MH2 domain and insights on Smad function in TGF-beta signaling.

Authors:  J W Wu; M Hu; J Chai; J Seoane; M Huse; C Li; D J Rigotti; S Kyin; T W Muir; R Fairman; J Massagué; Y Shi
Journal:  Mol Cell       Date:  2001-12       Impact factor: 17.970

4.  Intrinsic disorder and protein function.

Authors:  A Keith Dunker; Celeste J Brown; J David Lawson; Lilia M Iakoucheva; Zoran Obradović
Journal:  Biochemistry       Date:  2002-05-28       Impact factor: 3.162

Review 5.  The protein kinase complement of the human genome.

Authors:  G Manning; D B Whyte; R Martinez; T Hunter; S Sudarsanam
Journal:  Science       Date:  2002-12-06       Impact factor: 47.728

6.  BIND: the Biomolecular Interaction Network Database.

Authors:  Gary D Bader; Doron Betel; Christopher W V Hogue
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

7.  Local structural disorder imparts plasticity on linear motifs.

Authors:  Monika Fuxreiter; Peter Tompa; István Simon
Journal:  Bioinformatics       Date:  2007-03-25       Impact factor: 6.937

Review 8.  WD-repeat proteins: structure characteristics, biological function, and their involvement in human diseases.

Authors:  D Li; R Roberts
Journal:  Cell Mol Life Sci       Date:  2001-12       Impact factor: 9.261

Review 9.  Phosphotyrosine-binding domains in signal transduction.

Authors:  Michael B Yaffe
Journal:  Nat Rev Mol Cell Biol       Date:  2002-03       Impact factor: 94.444

10.  Proteomic screen finds pSer/pThr-binding domain localizing Plk1 to mitotic substrates.

Authors:  Andrew E H Elia; Lewis C Cantley; Michael B Yaffe
Journal:  Science       Date:  2003-02-21       Impact factor: 47.728
