qPhos: a database of protein phosphorylation dynamics in humans.

Kai Yu, Qingfeng Zhang, Zekun Liu, Qi Zhao, Xiaolong Zhang, Yan Wang, Zi-Xian Wang, Ying Jin, Xiaoxing Li, Ze-Xian Liu, Rui-Hua Xu.

Abstract

Temporal and spatial protein phosphorylation dynamically orchestrates a broad spectrum of biological processes and plays various physiological and pathological roles in diseases and cancers. Recent advances in high-throughput proteomics techniques have greatly promoted the profiling and quantification of the phosphoproteome. However, although several comprehensive databases have archived phosphorylated proteins and sites, a resource for phosphorylation quantification remains to be constructed. In this study, we developed the qPhos (http://qphos.cancerbio.info) database to integrate and host the data on phosphorylation dynamics. A total of 3 537 533 quantification events for 199 071 non-redundant phosphorylation sites on 18 402 proteins under 484 conditions were collected through exhaustive curation of published literature. The experimental details, including sample materials, conditions and methods, were recorded. Various annotations, such as protein sequence and structure properties, potential upstream kinases and their inhibitors, were systematically integrated and carefully organized to present details about the quantified phosphorylation sites. Various browse and search functions were implemented for the user-defined filtering of samples, conditions and proteins. Furthermore, the qKinAct service was developed to dissect the kinase activity profile from user-submitted quantitative phosphoproteome data by annotating the kinase activity-related phosphorylation sites. Taken together, the qPhos database provides a comprehensive resource for protein phosphorylation dynamics to facilitate related investigations.

Year:  2019        PMID: 30380102      PMCID: PMC6323974          DOI: 10.1093/nar/gky1052

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

As one of the most important post-translational modifications (PTMs), protein phosphorylation is involved in almost all biological processes and plays physiological and pathological roles in diseases and cancers (1–3). Phosphorylation is reversibly catalysed by kinases and phosphatases, which control phosphorylation dynamics according to the temporal and spatial context (4,5). In 1992, Edmond H. Fischer and Edwin G. Krebs shared the Nobel Prize in Physiology or Medicine for their discovery of reversible protein phosphorylation as a biological regulatory mechanism (6), while the prize in 2001 was awarded to Leland H. Hartwell, Tim Hunt and Paul M. Nurse for identifying key regulators, including cyclin-dependent kinases (CDKs) and cyclins, which accurately orchestrate the cell cycle through phosphorylation (7). In recent decades, many studies have dissected the molecular mechanisms and biological functions of phosphorylation dynamics, and kinases and phosphatases are popular research subjects for the development of targeted therapies (5,8). Recently, advances in high-throughput proteomics techniques have greatly promoted the profiling and quantification of the phosphoproteome in various cells and tissues under different conditions (9,10). For example, based on phosphoproteome quantification, Wojcechowskyj et al. dissected the molecular mechanism of cellular reprogramming during HIV-1 entry (11), and van den Biggelaar et al. unveiled the dynamic phosphorylation of thrombin signalling in human primary endothelial cells (12). Since dynamic phosphorylation events could provide helpful clues for kinase-mediated signalling, Ho et al. identified the activation of PI3K/AKT/mTORC1 signalling for cell survival by the ELABELA peptide in human embryonic stem cells during heart development (13), Bai et al. profiled the driver tyrosine kinases in sarcoma (14) and Casado et al. inferred the aberrant kinase activation in leukaemia cells through quantitative phosphoproteome-based computational analyses (15). Thus, quantitative phosphoproteomics data could greatly help in understanding phosphorylation-controlled biological processes.

Previously, several pioneering studies have contributed to constructing resources to host phosphorylation-related data, and the currently available databases contain a large amount of human protein phosphorylation data. UniProt (16) is the most important infrastructure for protein annotations and contains a large number of experimentally identified phosphorylation sites, and the annotations in the Human Protein Reference Database (HPRD) (17) also contain protein phosphorylation information. Databases including dbPTM (18), PhosphoSitePlus (19), SysPTM (20), PHOSIDA (21), dbPAF (22), Phospho.ELM (23) and PhosphoPep (24) curate and host experimentally identified phosphorylation sites from the published literature, and iPTMnet (25) contains information on the phosphorylation regulatory network and conservation. Furthermore, the PhosSNP (27) and ActiveDriverDB (28) databases analyse the genetic variations that influence phosphorylation, and the PTMcode (26) database provides the functional associations of phosphorylation sites. With the continuous improvement of high-throughput phosphoproteome techniques and the rapid increase of phosphoproteome datasets, ProteomeScout (29) and ProteomicsDB (30) were developed to store proteome and PTM proteome datasets and provide online analysis tools. However, although the ProteomeScout database contains protein quantification information, the quantification of phosphorylation events is still missing. Taken together, although databases for various aspects of protein phosphorylation are available, a database for the quantification of phosphorylation is still absent.

Since the quantification/stoichiometry/dynamics of phosphorylation are critical for the molecular mechanisms under different temporal and spatial contexts, a comprehensive resource for protein phosphorylation quantification could facilitate the reuse of published quantitative phosphoproteome datasets and provide great help for phosphorylation-related studies. In this study, we developed the qPhos database to host the quantitative phosphoproteome data generated in Homo sapiens. In total, 3 537 533 quantification events for 199 071 non-redundant phosphorylation sites on 18 402 proteins under 484 conditions were curated and integrated into the database, coupled with various annotations such as sequence and structure properties, potential upstream kinases and their inhibitors. For convenient usage, browse and search functions were implemented and the information in the database was carefully organized. Furthermore, based on curated activity-related phosphorylation sites, the qKinAct service was developed to dissect the kinase activity profile from phosphoproteome data. Thus, qPhos could serve as a one-stop database to investigate phosphorylation dynamics in H. sapiens.

CONSTRUCTION AND CONTENT

To establish the resource for protein phosphorylation dynamics, the published quantitative phosphoproteomics datasets were collected from the literature and annotations from various resources were integrated (Figure 1). Keywords including ‘phosphoproteomic’, ‘phosphoproteomics’ and ‘phosphoproteome’ were used to search PubMed to retrieve the phosphoproteome-related literature. Based on the criterion of high-throughput mass spectrometry-based site-level protein phosphorylation quantification, the published quantitative phosphoproteome datasets in H. sapiens were manually curated. Besides the quantified phosphorylation sites, details about the quantification, including the experimental condition, phosphopeptide enrichment method, mass spectrometry and raw peptides, were collected. All quantified phosphopeptides and modified residues were mapped to the H. sapiens reference proteome sequences downloaded from the UniProt database (Release 2018_04) (Figure 1) (16). Due to the differences among the reference proteome datasets used in these studies, about 4.31% of raw phosphopeptides could not be mapped to the UniProt reference human proteome, and these unmapped data were discarded. In addition, the annotations from databases such as UniProt (16), ExPASy (31) and DrugBank (Release 2018_07) (32) were integrated into the qPhos database to provide comprehensive information for phosphorylation events (Figure 1). The experimentally verified kinase–substrate relationships were integrated from Phospho.ELM (23), PhosphoSitePlus (19) and MusiteDeep (33), and potential relationships were predicted by the sequence-based predictor GPS (34) and the network-based predictor iGPS (35) with a high threshold (Figure 1). Furthermore, the activity-related phosphorylation sites in kinases were also curated from the literature. The human kinases and their names were retrieved from the EKPD database (36) and coupled with keywords including ‘phosphorylation’ and ‘activity or activate or activation’ to search PubMed for experimental evidence of activity-related phosphorylation (Figure 1).
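The mapping of quantified phosphopeptides onto reference protein sequences can be illustrated with a minimal sketch. It assumes a simple exact-substring match, which may differ from the procedure actually used for qPhos; the function name `map_phosphosite`, the accessions `TOY1` and `TOY2`, and the sequences are all made up for illustration.

```python
from typing import Dict, List, Tuple


def map_phosphosite(peptide: str, site_offset: int,
                    proteome: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (accession, 1-based protein position) for every exact match.

    site_offset is the 0-based index of the modified residue in the peptide.
    An empty list means the peptide could not be mapped.
    """
    hits = []
    for accession, sequence in proteome.items():
        start = sequence.find(peptide)
        while start != -1:
            hits.append((accession, start + site_offset + 1))
            start = sequence.find(peptide, start + 1)
    return hits


# Toy reference proteome (hypothetical accessions and sequences).
proteome = {
    "TOY1": "MKTAYSPEKLSPRTT",
    "TOY2": "MAAGSPEKLQ",
}

hits = map_phosphosite("SPEK", 0, proteome)      # phosphoserine at peptide index 0
unmapped = map_phosphosite("XXXX", 0, proteome)  # no match in any sequence
print(hits, unmapped)
```

Peptides that return an empty list correspond to the unmapped fraction mentioned above; a peptide matching several proteins yields one entry per match.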
Figure 1.

The schema for the construction of the qPhos database.

In total, 3 537 533 quantification events for 199 071 non-redundant phosphorylation sites on 18 402 proteins under 484 conditions were collected from 190 published studies. The primary references for the data were provided to ensure their quality and repeatability, and various annotations were integrated into the database.

The sequence and structure preferences of the quantified phosphorylation sites are presented in Figure 2A–D. The phosphorylation sites were distributed roughly evenly along the protein sequences but were slightly enriched in C-terminal regions (E-ratio = 1.06, P-value < 10^-17) (Figure 2A). Through computational annotation of secondary structure, surface accessibility and disordered regions by NetSurfP (37) and IUPred (38), it was observed that the phosphorylation sites were enriched in the coil region (E-ratio = 1.21, P-value < 10^-18) (Figure 2B), exposed region (E-ratio = 1.21, P-value < 10^-18) (Figure 2C) and disordered region (E-ratio = 2.09, P-value < 10^-18) (Figure 2D). As shown in Figure 2E, the distribution of phosphorylated serine, threonine and tyrosine residues was consistent with previous studies (19,22).

Furthermore, the potential upstream kinases were annotated for these phosphorylation sites from various resources. The experimentally identified kinase-site regulatory relations were retrieved from previous studies, including Phospho.ELM (23), PhosphoSitePlus (19) and MusiteDeep (33). Additionally, the potential site-specific kinase–substrate relationships were predicted by the sequence-based predictor GPS (34) and the network-based predictor iGPS (35) with a high threshold. The annotations of kinases are summarized at the family level in Figure 2F.
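The text does not define the E-ratio or the statistical test behind the P-values. One common reading, sketched here purely as an assumption, is the fraction of phosphorylation sites in a region divided by the fraction of all candidate residues in that region, with a one-sided hypergeometric test for significance. The counts below are toy numbers, not values from qPhos.

```python
from math import comb


def enrichment_ratio(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): share of sites in the region over the background share."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n residues from N, of which K lie in the region."""
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / total


# Toy counts (assumed): 100 candidate residues, 40 in the region,
# 20 phosphorylation sites, 12 of them in the region.
N, K, n, k = 100, 40, 20, 12
e_ratio = enrichment_ratio(k, n, K, N)
p_value = hypergeom_upper_tail(k, n, K, N)
print(f"E-ratio = {e_ratio:.2f}, P-value = {p_value:.3g}")
```

An E-ratio above 1 indicates over-representation in the region; with the very large site counts in the database, even a modest ratio such as 1.06 can give an extremely small P-value.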
Figure 2

Summary of the sequence and structure preferences and kinase families of the quantified phosphorylation sites, including the summary of the position along the protein sequence (A), secondary structure (B), surface accessibility (C), disorder region (D), serine/threonine/tyrosine (E) and regulator kinase family (F) for the quantified phosphorylation sites.


USAGE

The qPhos database was developed for scientists to quickly access the quantitative phosphoproteome data in a user-friendly manner. For convenient usage, qPhos provides browse and search functions to query the database. Three browse options, namely condition, sample and gene, allow the database to be browsed by selecting an item from the list (Figure 3A). The experimental conditions, samples (including cell lines and tissues) and gene symbols are sorted in alphabetical order, which enables users to quickly choose the data of interest (Figure 3A). Simple and complex search functions were implemented on the home (Figure 3B) and search (Figure 3C) pages, respectively, providing keyword-based queries against protein and gene names, protein functions and descriptions of conditions and samples. Furthermore, the phosphorylation sites retrieved by the browse or search functions can be further filtered by conditions, samples and methods on the results page (Figure 3D).
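For readers who work with exported records rather than the web interface, the same kind of filtering by condition, sample and method can be sketched offline. The record layout `QuantEvent`, the function `filter_events` and every value below are hypothetical and do not reflect the actual qPhos schema or data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class QuantEvent:
    """One quantification event (hypothetical field layout)."""
    gene: str
    position: int
    sample: str
    condition: str
    method: str
    log2_ratio: float


def filter_events(events: Iterable[QuantEvent],
                  condition: Optional[str] = None,
                  sample: Optional[str] = None,
                  method: Optional[str] = None) -> List[QuantEvent]:
    """Keep events matching every filter that is not None."""
    def keep(e: QuantEvent) -> bool:
        return ((condition is None or e.condition == condition)
                and (sample is None or e.sample == sample)
                and (method is None or e.method == method))
    return [e for e in events if keep(e)]


# Made-up records for illustration only.
events = [
    QuantEvent("GENE_A", 15, "HeLa", "EGF stimulation", "SILAC", 1.4),
    QuantEvent("GENE_A", 15, "HEK293", "EGF stimulation", "TMT", 0.2),
    QuantEvent("GENE_B", 88, "HeLa", "serum starvation", "SILAC", -0.9),
]

hela_silac = filter_events(events, sample="HeLa", method="SILAC")
print([(e.gene, e.position, e.log2_ratio) for e in hela_silac])
```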
Figure 3.

The detailed information in qPhos. (A) Browse function. (B) Simple search function. (C) Advanced search function. (D) The returned search results. (E) The information about the protein. (F) The information about the quantification of the phosphorylation site. (G) The information on potential kinases and their inhibitors for the quantified phosphorylation site. (H) The sequence and structure properties of the phosphorylation site. (I) The enlarged view of the sequence and structure properties.

In the results page, the information was organized by the quantification events in a tabular format with UniProt accession, gene name, position, sequence window, sample information, sample type, experimental condition, quantification method, log2-transformed ratio and P-value (Figure 3D). Users could click the plus button to view the detailed information about the protein and phosphorylation sites. The detailed information had four sections, namely ‘About experiment’, ‘Potential kinases and their inhibitors’, ‘Sequence and structure’ and ‘About protein’ (Figure 3E–I). The detailed description of the condition, raw quantifications, source reference, experimental method and instrument, and raw peptide was shown in the ‘About Experiment’ section (Figure 3E). ‘Potential kinases and their inhibitors’ showed the experimentally identified and predicted upstream kinases for the phosphorylation site (Figure 3F). Furthermore, the inhibitors annotated by DrugBank for the kinases were shown (Figure 3F). The sequence and structure properties of the protein were visualized in the ‘Sequence and Structure’ section, which included the quantified phosphorylation sites, activity-related phosphorylation sites if the protein was a kinase, disorder region, secondary structure and surface accessibility (Figure 3G). A magnifier was implemented to show the details by enlarging the selected region (Figure 3G). Furthermore, users could access descriptions about visualization by hovering over the content (Figure 3I). The ‘About Protein’ section presented the protein information including database accessions, protein/gene name/alias, functional descriptions, PTMs and sequences adopted from the UniProt database (Figure 3H).

Since autophosphorylation or phosphorylation of the specified segment could activate or inactivate the kinases (39), the identification and quantification of these activity-related phosphorylation sites could indicate the activation status of these kinases. Through exhaustive curation of literature, we collected the experimentally identified activity-related phosphorylation sites for the human kinome. In total, 829 activity-related phosphorylation sites were curated in 272 kinases (Figure 4A), which covered over half of the kinome. Among the activity-related phosphorylation sites, nearly half were autophosphorylation sites. Most autophosphorylation sites could activate the kinases, while seven autophosphorylation sites could inactivate the kinases. Furthermore, most activity-related phosphorylation sites were positively related to the activities of the kinase, while only a small fraction were negatively related (Figure 4A). The distributions of activity-related phosphorylation sites among kinases and kinase families are summarized in Figure 4B and C, respectively. The qKinAct service was developed to query the quantification events of activity-related phosphorylation sites to annotate the kinase activity profile from quantitative phosphoproteome data (Figure 4D). The activity-related phosphorylation sites could be annotated through straightforward submission of the identified phosphorylated peptides and their quantifications (Figure 4E).
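The idea behind annotating kinase activity from activity-related sites can be sketched as a table lookup. This is only an illustration under stated assumptions, not the qKinAct implementation: the lookup table, the function `annotate_kinase_activity`, the kinases `KINASE_X` and `KINASE_Y`, the log2-ratio threshold and the input values are hypothetical. The only entry taken from the article is RIPK1 S320, which it describes as inactivating.

```python
from typing import Dict, Iterable, Tuple

# Effect of phosphorylation on kinase activity: +1 activating, -1 inactivating.
ACTIVITY_SITES: Dict[Tuple[str, str], int] = {
    ("RIPK1", "S320"): -1,     # described in the article as inactivating
    ("KINASE_X", "T100"): +1,  # hypothetical entry
    ("KINASE_Y", "Y50"): -1,   # hypothetical entry
}


def annotate_kinase_activity(quantified: Iterable[Tuple[str, str, float]],
                             threshold: float = 1.0) -> Dict[Tuple[str, str], str]:
    """Map (kinase, site, log2 ratio) records to an inferred activity change.

    Sites missing from the lookup table or changing by less than the
    (assumed) threshold are skipped.
    """
    calls = {}
    for kinase, site, log2_ratio in quantified:
        effect = ACTIVITY_SITES.get((kinase, site))
        if effect is None or abs(log2_ratio) < threshold:
            continue
        # More phosphate on an activating site, or less on an inactivating
        # site, suggests higher activity; the reverse suggests lower activity.
        direction = effect if log2_ratio > 0 else -effect
        calls[(kinase, site)] = "activated" if direction > 0 else "inhibited"
    return calls


# Made-up user submission.
user_data = [
    ("RIPK1", "S320", 1.8),
    ("KINASE_X", "T100", -1.2),
    ("KINASE_Y", "Y50", -2.0),
    ("KINASE_X", "S7", 3.0),      # not an activity-related site: skipped
    ("KINASE_Y", "Y50", 0.3),     # below threshold: skipped
]
calls = annotate_kinase_activity(user_data)
print(calls)
```

Keeping the sign of the site's effect separate from the sign of the measured change makes the two sources of direction explicit, which matters for the minority of sites that are negatively related to activity.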
Figure 4

The qKinAct service for the analysis of kinase activities. (A) The distribution of different types of activity-related phosphorylation sites in kinases. ‘+’, ‘−’ and ‘auto’ represent positively related, negatively related and autophosphorylation sites. (B) The distribution of kinases with activity-related phosphorylation sites in kinases. (C) The distribution of kinase families with activity-related phosphorylation sites in kinases. (D) The example for submission of quantitative phosphoproteome data. (E) The returned results for the query of kinase activity-related phosphorylation sites. The kinase activity profile for phosphorylation dynamics in nicotine-treated pancreatic stellate cells (F) and TNFα-stimulated phosphorylation dynamics (G).

Here, we provide two examples to show the application of the qKinAct service for kinase activity analysis. Paulo et al. globally analysed the phosphorylation dynamics in nicotine-treated pancreatic stellate cells and discovered extensive dysregulation of protein phosphorylation in transcriptional events and nuclear function in response to nicotine (40). The kinase activity-related phosphorylation sites were analysed by qKinAct and are shown in Figure 4F. It was observed that the phosphorylation level of activity-related phosphorylation sites was aberrant for kinases such as BRAF, PRKCB (PKCβ), PRKAA2 (AMPK) and EGFR. Interestingly, previous studies have identified the aberrant activation of EGFR (41) and PRKAA2 (42,43) induced by cigarette smoke. Moreover, the qKinAct analysis of TNFα-stimulated phosphorylation dynamics quantified by Mohideen et al. (44) identified ten aberrant activity-related phosphorylation sites for kinases such as PDPK1 (PDK1), GSK3, DYRK1 and RIPK1 (Figure 4G). Among the results, the observed inactivation of RIPK1 by S320 phosphorylation was also validated by their experiments (44). Thus, the qKinAct service could quickly provide helpful kinase activity profiles to elucidate the dynamics of phosphorylation signalling pathways regulated by kinases.

DISCUSSION

Protein phosphorylation occurs in numerous biological processes, such as cell proliferation and differentiation, transcriptional activation and metabolic homeostasis (1–7). As one of the most important PTMs, phosphorylation greatly expands the complexity and diversity of the proteome (1), while aberrant phosphorylation disrupts signalling pathways and is closely correlated with diseases and cancers (2–5). The advancement of phosphoproteomic techniques has greatly facilitated the high-throughput profiling and quantification of phosphorylation events and the accumulation of massive phosphoproteome data (9,10). However, although a number of systematic studies have contributed to recording the phosphorylation sites (16–30), their quantitative dynamics remained scattered across the literature. qPhos is the first repository to curate phosphorylation dynamics data. Various annotations, including protein information, potential upstream kinases and their inhibitors, and sequence and structure properties, were integrated to annotate the phosphorylation sites. Furthermore, the qKinAct service was developed to dissect the kinase activity profile from user-submitted quantitative phosphoproteome data through direct annotation of activity-related phosphorylation sites for kinases.

The database also has several limitations. It focuses on human health-related phosphorylation dynamics, and currently only data from human tissues and cell lines were collected. However, several quantitative phosphoproteomics studies were based on mouse or other model organisms (45), which also provide helpful clues for medical investigations. Furthermore, previous studies showed that single nucleotide polymorphisms (SNPs) and somatic mutations at or around the phosphorylation sites could affect the phosphorylation dynamics (27), and qPhos has not yet linked the phosphorylation sites to the large quantity of SNP and mutation data.

Taken together, although improvements remain to be achieved, qPhos could serve as a comprehensive resource to enable researchers to systematically and conveniently access the phosphorylation dynamics data under different experimental conditions, and its qKinAct service could help users to easily analyse the kinase activity profile from their own quantitative phosphoproteome data. The qPhos database will be regularly updated to keep pace with progress in the quantification of phosphorylation dynamics.
  45 in total

1.  The origins of protein phosphorylation.

Authors:  Philip Cohen
Journal:  Nat Cell Biol       Date:  2002-05       Impact factor: 28.824

2.  The role of protein phosphorylation in human health and disease. The Sir Hans Krebs Medal Lecture.

Authors:  P Cohen
Journal:  Eur J Biochem       Date:  2001-10

Review 3.  Analysis of protein phosphorylation using mass spectrometry: deciphering the phosphoproteome.

Authors:  Matthias Mann; Shao En Ong; Mads Grønborg; Hanno Steen; Ole N Jensen; Akhilesh Pandey
Journal:  Trends Biotechnol       Date:  2002-06       Impact factor: 19.536

4.  Nobel prize in physiology or medicine. Cycling toward Stockholm.

Authors:  M Balter; G Vogel
Journal:  Science       Date:  2001-10-19       Impact factor: 47.728

5.  Nobel Prize given for work on protein phosphorylation.

Authors:  C Anderson
Journal:  Nature       Date:  1992-10-15       Impact factor: 49.962

Review 6.  Protein kinases--the major drug targets of the twenty-first century?

Authors:  Philip Cohen
Journal:  Nat Rev Drug Discov       Date:  2002-04       Impact factor: 84.694

Review 7.  Regulation of protein kinases; controlling activity through activation segment conformation.

Authors:  Brad Nolen; Susan Taylor; Gourisankar Ghosh
Journal:  Mol Cell       Date:  2004-09-10       Impact factor: 17.970

8.  IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.

Authors:  Zsuzsanna Dosztányi; Veronika Csizmok; Peter Tompa; István Simon
Journal:  Bioinformatics       Date:  2005-06-14       Impact factor: 6.937

Review 9.  Global and site-specific quantitative phosphoproteomics: principles and applications.

Authors:  Boris Macek; Matthias Mann; Jesper V Olsen
Journal:  Annu Rev Pharmacol Toxicol       Date:  2009       Impact factor: 13.820

10.  GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy.

Authors:  Yu Xue; Jian Ren; Xinjiao Gao; Changjiang Jin; Longping Wen; Xuebiao Yao
Journal:  Mol Cell Proteomics       Date:  2008-05-06       Impact factor: 5.911

