
Characterizing the microenvironment surrounding phosphorylated protein sites.

Shi Cai Fan, Xue Gong Zhang.

Abstract

Protein phosphorylation plays an important role in various cellular processes. Because of its high complexity, its mechanism is still not fully understood. In the last few years many methods have been proposed in this field, but almost all of them investigate the mechanism from the protein sequence around phosphorylation sites. In this study, we characterize the microenvironment surrounding phosphorylated protein sites with a modified shell model and identify significant properties with the rank-sum test, such as the lack of some classes of residues, atoms, and secondary structures. Furthermore, we find that the depletion of some properties affects protein phosphorylation remarkably. Our results suggest that exploring the mechanism of protein phosphorylation from the microenvironment is a meaningful direction, and we expect further findings as phosphorylation and protein structure datasets grow.

Year:  2005        PMID: 16689688      PMCID: PMC5172553          DOI: 10.1016/s1672-0229(05)03029-9

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

Protein phosphorylation is a ubiquitous posttranslational modification that occurs in either the cytosol or the nucleus of the cell and is involved in many fundamental cellular processes, such as metabolism, apoptosis, cell signaling, and cellular proliferation. It is estimated that about 30%–50% of eukaryotic proteins undergo phosphorylation. Investigating the mechanism of protein phosphorylation is therefore useful for understanding various protein functions and signal transduction pathways. Biochemically, protein phosphorylation involves the transfer of a phosphate moiety from adenosine triphosphate to the hydroxyl group of an acceptor residue, a reaction catalyzed by protein kinases. There are three main acceptor amino acids, namely serine (S), threonine (T), and tyrosine (Y), and many kinases recognize substrates at both S and T sites. Although the discovery of protein phosphorylation can be traced back to the 1950s, its mechanism still needs further study because of its high complexity. Early investigations relied on experimental methods, which were accurate but laborious and expensive. Several computational methods were later introduced, including neural networks, C4.5, and support vector machines (SVMs), all of which explore protein phosphorylation from the sequences around phosphorylated sites. However, protein phosphorylation is a process in which several molecules interact in three-dimensional space, and positional correlation in sequence may not reflect the spatial arrangement. For example, amino acids that are neighbors in space may be far apart in sequence. Consequently, conclusions drawn from protein sequences alone may not be completely reliable. To investigate the phosphorylation mechanism more directly, we propose to study the microenvironment around phosphorylated and non-phosphorylated sites. Because Altman's shell model (9, 10), which accumulates the property distribution of each shell around a site, was not applicable to our problem, we adopted a modified shell model that accumulates the spatial distribution of 80 biophysical and biochemical properties around a site over the distance range of 2–16 Å as a whole. Using the rank-sum test, we obtained significant properties in this region, such as the lack of some classes of residues, atoms, and secondary structures. Among these properties, some are consistent with findings based on protein sequences, some are new, and others are somewhat different. We suspect that the depletion of some properties around sites may be more important than enrichment. Our method thus provides a new direction for investigating protein phosphorylation, and we expect further findings as phosphorylation and protein structure datasets grow.

Results

After obtaining the structure data of positive and negative samples (sites and non-sites), we accumulated the spatial distribution of 80 biophysical and biochemical properties around S, T, and Y sites using our modified shell model, and compared the distributions with a standard nonparametric significance test (the Mann-Whitney rank-sum test). Table 1 lists, for S, T, and Y sites respectively, the ten properties whose distributions differ most significantly, that is, the ten properties with the lowest p-values (all with p-value < 0.05); we define these as our candidate properties.
Table 1

Significant Properties of Serine (S), Threonine (T), and Tyrosine (Y) Sites

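| Site | Property | p-value | Frequency in randomicity test | Frequency in sensitivity test | Significant status |
|---|---|---|---|---|---|
| Serine (S) | Residue-name-is-Ile* | 0.00058 | 34 | 999 | low |
| Serine (S) | Ring-system* | 0.00102 | 37 | 931 | low |
| Serine (S) | Mobility* | 0.00118 | 24 | 997 | low |
| Serine (S) | Residue-name-is-Phe* | 0.00129 | 41 | 944 | low |
| Serine (S) | Atom-name-is-C* | 0.00156 | 28 | 996 | low |
| Serine (S) | Residue-class2-is-basic* | 0.00239 | 46 | 921 | low |
| Serine (S) | Atom-type-is-CT | 0.00399 | 35 | 716 | low |
| Serine (S) | Atom-name-is-N | 0.00412 | 52 | 806 | low |
| Serine (S) | VDW-volume | 0.00417 | 34 | 777 | low |
| Serine (S) | Partial-charge | 0.00582 | 50 | 328 | high |
| Threonine (T) | Residue-class1-is-hydrophobic* | 0.00012 | 31 | 1,000 | low |
| Threonine (T) | Residue-name-is-Val* | 0.00013 | 38 | 994 | low |
| Threonine (T) | Residue-class2-is-nonpolar* | 0.00026 | 28 | 973 | low |
| Threonine (T) | Atom-name-is-C* | 0.00037 | 26 | 1,000 | low |
| Threonine (T) | Atom-type-is-CT* | 0.00040 | 33 | 998 | low |
| Threonine (T) | VDW-volume | 0.00060 | 54 | 920 | low |
| Threonine (T) | Atom-type-is-N | 0.00060 | 29 | 779 | low |
| Threonine (T) | Amide | 0.00070 | 55 | 819 | low |
| Threonine (T) | Atom-name-is-any | 0.00070 | 22 | 598 | low |
| Threonine (T) | Atom-type-is-O | 0.00070 | 30 | 410 | low |
| Tyrosine (Y) | Secondary-structure2-is-beta* | 0.012 | 41 | 1,000 | low |
| Tyrosine (Y) | Charge* | 0.013 | 34 | 990 | high |
| Tyrosine (Y) | Residue-name-is-Cys* | 0.017 | 44 | 977 | low |
| Tyrosine (Y) | Residue-name-is-Pro* | 0.017 | 46 | 1,000 | low |
| Tyrosine (Y) | Atom-type-is-N* | 0.019 | 37 | 1,000 | low |
| Tyrosine (Y) | Residue-class1-is-hydrophobic* | 0.023 | 43 | 967 | low |
| Tyrosine (Y) | Residue-name-is-Asp | 0.023 | 47 | 580 | high |
| Tyrosine (Y) | Residue-name-is-Leu | 0.025 | 52 | 828 | low |
| Tyrosine (Y) | Secondary-structure1-is-strand | 0.029 | 41 | 689 | low |
| Tyrosine (Y) | Atom-type-is-O | 0.030 | 47 | 310 | high |

*Significant properties.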
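
The rank-sum comparison behind the p-values in Table 1 can be illustrated with a minimal sketch. It assumes each sample is reduced to one accumulated value per property, and it uses the normal approximation with tie correction and no continuity correction; the paper does not state which variant was used, so the function names and these choices are illustrative assumptions rather than the authors' implementation.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(sites, non_sites):
    """Two-sided Mann-Whitney test (normal approximation, tie-corrected).

    Returns (U statistic of the sites, p-value, direction), where direction
    says whether the property tends to be "low" or "high" around sites.
    """
    n1, n2 = len(sites), len(non_sites)
    pooled = list(sites) + list(non_sites)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of equal values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0, "none"
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    direction = "low" if u1 < mean_u else "high" if u1 > mean_u else "none"
    return u1, p_value, direction
```

Calling `rank_sum_test(values_at_sites, values_at_non_sites)` once per property and sorting by p-value would give a ranking of the kind shown in Table 1; the "Significant status" column corresponds to the returned direction.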

To test whether these properties appeared by chance, we permuted the sample labels 1,000 times, applied the rank-sum test to each permutation, and counted how often each candidate property appeared (Table 1). Most of the candidate properties appeared fewer than 50 times, so they are unlikely to be random. We then tested whether these properties were sensitive to the negative sample size, because the non-sites far outnumbered the sites. We clustered the negative samples down to 4–5 times the number of positive samples and repeated the clustering 1,000 times, which gave 1,000 groups of samples; each group consisted of the positive samples and one of the 1,000 negative sample sets. We then applied the rank-sum test to each of the 1,000 groups and counted the frequency of each candidate property (Table 1). Some properties are very stable, and we discard the unstable ones. Considering both tests, we selected the properties that appeared fewer than 50 times in the randomicity test and more than 900 times in the sensitivity test to characterize the microenvironment of S, T, and Y sites, and marked them with an asterisk in Table 1. This gives 6, 5, and 6 significant properties for S, T, and Y sites, respectively.
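
The two-threshold selection rule can be written out as a short sketch. The serine rows below are transcribed from Table 1; the `Candidate` type and the `select_significant` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    p_value: float
    randomicity_count: int  # appearances in 1,000 permutations
    sensitivity_count: int  # appearances in 1,000 clustered negative sets

def select_significant(candidates, max_random=50, min_stable=900):
    """Keep properties seen fewer than max_random times in the randomicity
    test and more than min_stable times in the sensitivity test."""
    return [
        c.name
        for c in candidates
        if c.randomicity_count < max_random and c.sensitivity_count > min_stable
    ]

# Serine candidates, values transcribed from Table 1.
serine = [
    Candidate("Residue-name-is-Ile", 0.00058, 34, 999),
    Candidate("Ring-system", 0.00102, 37, 931),
    Candidate("Mobility", 0.00118, 24, 997),
    Candidate("Residue-name-is-Phe", 0.00129, 41, 944),
    Candidate("Atom-name-is-C", 0.00156, 28, 996),
    Candidate("Residue-class2-is-basic", 0.00239, 46, 921),
    Candidate("Atom-type-is-CT", 0.00399, 35, 716),
    Candidate("Atom-name-is-N", 0.00412, 52, 806),
    Candidate("VDW-volume", 0.00417, 34, 777),
    Candidate("Partial-charge", 0.00582, 50, 328),
]

significant_serine = select_significant(serine)
```

Applied to the serine rows, the rule keeps the six starred properties and drops the four that fail either threshold.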

Discussion

Based on these significant properties, we draw the following conclusions. (1) S sites are deficient in the residues Ile and Phe, which are nonpolar and hydrophobic; this is consistent with two other depleted properties, ring-system and residue-class2-is-basic. In addition, mobility and atom-name-is-C are also low around S sites. (2) Around T sites, the lack of the residue Val correlates with the depletion of the hydrophobic and nonpolar residue classes; atom-type-is-CT and atom-name-is-C are also underrepresented, and VDW-volume is much smaller. These results show that S and T sites share some features, in agreement with the report that many protein kinases recognize substrates at both S and T sites. (3) Around Y sites, charge is the only enriched property among the significant ones. Two residues, Cys and Pro, both neutral, are notably lacking. Furthermore, the hydrophobic residue class, the beta secondary structure, and atom-type-is-N are significantly depleted around Y sites. Among all the properties we investigated, some agree with findings based on the protein sequences around sites, such as the depletion of Ile, Phe, and Val, which are nonpolar and hydrophobic, the depletion of Cys and Pro, which are neutral, and the enrichment of charge; some are new findings, such as the lack of atom-name-is-C/N and atom-type-is-CT/N and the smaller VDW-volume; others are somewhat different. The main difference is that our results highlight the depletion of properties rather than the enrichment emphasized by sequence-based methods, since nearly all of the properties we extracted are significantly low by the rank-sum test. We therefore propose that the depletion of some properties around sites has a greater effect on protein phosphorylation.
In general, these biophysical and biochemical properties provide insight that may be functionally related to the mechanism of protein phosphorylation, and such analysis may be important for protein engineering applications. In addition, using these significant properties we performed site prediction with SVMs, and the cross-validated prediction accuracy for S, T, and Y sites exceeded 80% in each case. Although our accuracy is not higher than that of sequence-based methods, our sample size is much smaller. We expect that exploring the mechanism of protein phosphorylation from the microenvironment will become increasingly meaningful as phosphorylation and protein structure datasets grow.

Materials and Methods

Datasets

We obtained the dataset of phosphorylation sites from Phospho.ELM (version 3.0), which contains protein sequences, phosphorylated sites (S, T, and Y), and the corresponding protein kinases, all validated by experiment. As no verified negative samples were available, we took the S, T, and Y residues of the same proteins that are not annotated as phosphorylated sites in Phospho.ELM as our negative samples. In addition, we obtained protein structures from the Protein Data Bank (PDB). For a protein with more than one PDB record, we selected the one with the highest resolution. Only the sites reported in both databases were extracted as our samples. The sample sizes for S, T, and Y are shown in Table 2.
Table 2

Sample Data of S, T, and Y Reported in Both Phospho.ELM and PDB Databases

| Amino acid residue | Positive sample size | Negative sample size |
|---|---|---|
| Serine (S) | 42 | 433 |
| Threonine (T) | 19 | 232 |
| Tyrosine (Y) | 39 | 203 |

Properties

We used 80 biophysical and biochemical properties to characterize the microenvironment around S, T, and Y sites, grouped as follows. Atom-based properties: atom type, hydrophobicity, charge, and charge-with-His. Chemical group-based properties: hydroxyl, amide, amine, carbonyl, ring-system, and peptide. Residue-based properties: residue type and hydrophobicity classifications 1 and 2. Secondary structure-based properties: secondary structure classifications 1 and 2. Other properties: VDW-volume, B-factor, mobility, and solvent accessibility. These properties cover almost all the properties within a biomolecular structure and have been used to characterize the microenvironment of other sites (9, 16).

Model

Altman and his colleagues (9, 10) adopted a shell model to characterize the microenvironment surrounding calcium binding sites; its main idea is to accumulate the property distribution of each shell around a site (Figure 1A). In this study, we applied a modified shell model that accumulates the property distribution around a site over the distance range of 2–16 Å as a whole (Figure 1B). Our reasoning is as follows: when we applied the original shell model to our data, the significant properties extracted by the rank-sum test in each shell were sensitive to the negative sample size. We attribute this to the sites being affected not by the properties of each shell separately but by the overall properties of all the shells. We therefore modified the model and set the distance range to 2–16 Å, because most of the significant differences between sites and non-sites are concentrated in this region.
Fig. 1

A. Altman’s shell model that accumulates the property distribution of each shell around a site. B. The modified model that accumulates the property distribution around a site at a distance range of 2–16 Å as a whole.
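
The difference between the two models in Figure 1 can be sketched as follows. The sketch assumes each atom is given as a coordinate triple with one numeric property value, that shells are 1 Å thick, and that the range is half-open (2 Å included, 16 Å excluded); none of these details is specified in the text, and the function names are hypothetical.

```python
import math

def shell_profile(site, atoms, r_min=2.0, r_max=16.0, width=1.0):
    """Original-style model: one accumulated property value per shell."""
    n_shells = int(round((r_max - r_min) / width))
    totals = [0.0] * n_shells
    for xyz, value in atoms:
        d = math.dist(site, xyz)
        if r_min <= d < r_max:
            # Clamp guards against floating-point edge cases near r_max.
            index = min(int((d - r_min) // width), n_shells - 1)
            totals[index] += value
    return totals

def whole_range_value(site, atoms, r_min=2.0, r_max=16.0):
    """Modified model: a single value accumulated over the whole range."""
    return sum(
        value
        for xyz, value in atoms
        if r_min <= math.dist(site, xyz) < r_max
    )

# Toy example: (coordinates in Å, property value) around a site at the origin.
site = (0.0, 0.0, 0.0)
atoms = [
    ((1.0, 0.0, 0.0), 1.0),   # closer than 2 Å, ignored
    ((3.0, 0.0, 0.0), 1.0),   # falls in the 3-4 Å shell
    ((0.0, 3.5, 0.0), 2.0),   # falls in the 3-4 Å shell
    ((0.0, 0.0, 15.9), 1.0),  # falls in the 15-16 Å shell
    ((20.0, 0.0, 0.0), 1.0),  # beyond 16 Å, ignored
]
profile = shell_profile(site, atoms)
total = whole_range_value(site, atoms)
```

The modified model collapses the per-shell profile into its sum, so each site contributes one number per property to the rank-sum test instead of one number per shell.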

Randomicity test

To test whether these significant properties appeared by chance, we permuted the positive and negative samples 1,000 times. Each permutation proceeded as follows: given Np positive and Nn negative samples, we first pooled them (Np + Nn samples in all) and then randomly drew Np samples as positives and Nn samples as negatives. Repeating the permutation 1,000 times gave 1,000 groups of data.
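
A minimal sketch of this permutation procedure is given below. It assumes that a property "appears" in a permuted group when its test p-value falls below 0.05, which is one reading of the text; the function names, the `test` callable, and the fixed seed are illustrative assumptions.

```python
import random

def permuted_groups(positives, negatives, n_permutations=1000, seed=0):
    """Yield relabelled (positives, negatives) pairs of the original sizes."""
    rng = random.Random(seed)
    pooled = list(positives) + list(negatives)
    n_pos = len(positives)
    for _ in range(n_permutations):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        # The first n_pos pooled samples play the role of sites.
        yield shuffled[:n_pos], shuffled[n_pos:]

def count_significant(positives, negatives, test, alpha=0.05,
                      n_permutations=1000, seed=0):
    """Count permutations in which test(pos, neg) returns a p-value < alpha.

    `test` is any callable that takes two lists of property values and
    returns a p-value, for example a rank-sum test.
    """
    return sum(
        1
        for pos, neg in permuted_groups(positives, negatives,
                                        n_permutations, seed)
        if test(pos, neg) < alpha
    )
```

A count well below the chosen threshold (50 of 1,000 in the paper) would indicate that the property rarely reaches significance once the site labels are randomized.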

Sensitivity test

To test whether these significant properties were stable, we clustered the negative samples with the BLOSUM62 matrix, choosing a similarity cutoff that reduced their number to 4–5 times that of the positive samples, and selected one sample from each cluster to form the negative set. Because the choice of the first sequence to begin clustering is inevitably random, the result of each clustering may differ to some extent, so we repeated the process 1,000 times and obtained 1,000 sets of negative samples.
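
The text does not specify the clustering algorithm, so the following is only a sketch of one plausible reading: a greedy pass over the negative samples in random order, keeping a sample as a new representative when it is not similar enough to any representative already kept. The similarity function here is a plain identity fraction over equal-length sequence windows, used as a stand-in for a BLOSUM62-based score; all names are hypothetical.

```python
import random

def identity_fraction(a, b):
    """Stand-in similarity: fraction of matching positions.

    This is NOT a BLOSUM62 score; it only keeps the sketch self-contained.
    """
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b), 1)

def greedy_representatives(windows, cutoff, similarity=identity_fraction,
                           seed=0):
    """Pick one representative per cluster with a greedy, order-dependent pass.

    A window starts a new cluster when its similarity to every existing
    representative is below `cutoff`; otherwise it is absorbed and dropped.
    """
    rng = random.Random(seed)
    order = list(windows)
    rng.shuffle(order)  # a random starting sequence makes runs differ
    representatives = []
    for window in order:
        if all(similarity(window, rep) < cutoff for rep in representatives):
            representatives.append(window)
    return representatives

def negative_sets(windows, cutoff, n_repeats=1000):
    """Repeat the clustering with different seeds to obtain many negative sets."""
    return [
        greedy_representatives(windows, cutoff, seed=s)
        for s in range(n_repeats)
    ]
```

Under this reading, the cutoff would be tuned until the number of representatives is about 4–5 times the number of positive samples, and each of the resulting negative sets would be paired with the positive samples for one rank-sum test.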
References

1. H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne. The Protein Data Bank. Nucleic Acids Res, 2000-01-01.

2. N Blom; S Gammeltoft; S Brunak. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol, 1999-12-17.

3. Mike P Liang; D Rey Banatao; Teri E Klein; Douglas L Brutlag; Russ B Altman. WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic Acids Res, 2003-07-01.

4. S Henikoff; J G Henikoff. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A, 1992-11-15.

5. Jong Hun Kim; Juyoung Lee; Bermseok Oh; Kuchan Kimm; Insong Koh. Prediction of phosphorylation sites using SVMs. Bioinformatics, 2004-07-01.

6. Emily A Berry; Andrew R Dalby; Zheng Rong Yang. Reduced bio basis function neural network for identification of protein phosphorylation sites: comparison with pattern recognition algorithms. Comput Biol Chem, 2004-02.

7. L Wei; R B Altman. Recognizing protein binding sites using statistical descriptions of their 3D environments. Pac Symp Biocomput, 1998.

8. S C Bagley; R B Altman. Characterizing the microenvironment surrounding protein sites. Protein Sci, 1995-04.

9. H R Matthews. Protein kinases and phosphatases that act on histidine, lysine, or arginine residues in eukaryotic proteins: a possible regulator of the mitogen-activated protein kinase cascade. Pharmacol Ther, 1995.

10. Lilia M Iakoucheva; Predrag Radivojac; Celeste J Brown; Timothy R O'Connor; Jason G Sikes; Zoran Obradovic; A Keith Dunker. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res, 2004-02-11.
