Pawel Durek, Christian Schudoma, Wolfram Weckwerth, Joachim Selbig, Dirk Walther.
Abstract
BACKGROUND: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases, followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. Both the experimental identification and the computational prediction of phosphorylation sites (P-sites) have proved challenging. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites.
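Sequence-based approaches of this kind start from a fixed-length window around each candidate residue. A minimal sketch, assuming 6 flanking residues on either side (the flank size used in the RCP-plots of this study) and a hypothetical padding character `-` for positions beyond the protein termini:

```python
def extract_windows(sequence, flank=6, targets="STY", pad="-"):
    """Return (position, window) pairs for every S/T/Y in `sequence`.

    Each window holds `flank` residues on either side of the central
    residue; positions beyond the termini are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, aa in enumerate(sequence):
        if aa in targets:
            # sequence[i] sits at index i + flank in `padded`,
            # so this slice is centred on the candidate residue.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows
```

For example, `extract_windows("MKRRASVAGL", flank=3)` yields `[(5, "RRASVAG")]`, with the serine in the middle of the window.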
Year: 2009 PMID: 19383128 PMCID: PMC2683816 DOI: 10.1186/1471-2105-10-117
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1: Comparison of general structural properties associated with phosphorylated (pos.) vs. non-phosphorylated (neg.) residues. Serine: left column, threonine: middle column, tyrosine: right column. Annotations were taken from PDBFINDER [47]. (a) Side-chain accessibility to solvent relative to the largest possible accessibility for serine. (b) Re-scaled crystallographic B-factors, which describe the attenuation of x-ray scattering caused by thermal motion or quenched disorder and serve as a measure of local structural rigidity. PDBFINDER maps B-factors from PDB structures in the range [10, 40] to the range [0, 9], with 0 signifying rigid structures and 9 indicating unresolved, rather flexible structural regions. (c) DSSP secondary-structure assignment. B = residue in isolated beta-bridge; C = loop, irregular stretches; E = extended strand, participates in beta ladder; G = 3₁₀-helix; H = alpha helix; S = bend; T = hydrogen-bonded turn.
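Panel (b) states that B-factors in [10, 40] are mapped onto a 0 to 9 scale but not how the bins are drawn. A minimal sketch, assuming ten equal-width bins with out-of-range values clamped to the ends, and treating an unresolved residue (no B-factor) as the top, most flexible bin; this is an illustration, not PDBFINDER's own procedure:

```python
def rescale_b_factor(b, low=10.0, high=40.0, n_bins=10):
    """Map a crystallographic B-factor onto an integer scale 0..n_bins-1.

    Assumption: equal-width bins over [low, high], clamped at both ends.
    A missing value (unresolved residue) goes to the top bin.
    """
    if b is None:
        return n_bins - 1  # unresolved: treated as maximally flexible
    clamped = min(max(b, low), high)
    width = (high - low) / n_bins
    # The upper end `high` would land in bin n_bins, so cap it.
    return min(int((clamped - low) / width), n_bins - 1)
```

With these defaults each bin spans 3 B-factor units, so `rescale_b_factor(25)` returns 5 and anything at or above 40 returns 9.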
Structural features of phosphorylation sites
| Residue | Feature | pos | neg | p-value (test 1) | p-value (test 2) |
|---|---|---|---|---|---|
| Ser | Accessibility | 4.25 | 3.70 | 1.32E-03 | 1.93E-03 |
| Ser | B-Factor | 5.65 | 4.93 | 7.40E-04 | 2.79E-04 |
| Thr | Accessibility | 3.92 | 3.57 | 1.49E-01 | 2.07E-01 |
| Thr | B-Factor | 5.30 | 4.76 | 1.11E-01 | 8.26E-02 |
| Tyr | Accessibility | 2.98 | 2.36 | 7.33E-07 | 2.52E-06 |
| Tyr | B-Factor | 5.56 | 5.15 | 6.96E-02 | 2.68E-02 |
Statistical significance of the observed differences in solvent accessibility and crystallographic B-factor between phosphorylated (pos) and non-phosphorylated (neg) serine, threonine, and tyrosine sites.
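The source does not name the statistical tests behind the two p-value columns. Purely as an illustration of how a pos vs. neg difference can be tested, here is a two-sided permutation test on the difference of means; this is a swapped-in technique, not necessarily the one used in the study:

```python
import random
from statistics import mean


def permutation_test(pos, neg, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, p_value). Group labels are shuffled
    `n_permutations` times; p is the fraction of shuffles whose absolute
    mean difference is at least the observed one, with a +1 correction
    so that p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(pos) - mean(neg)
    pooled = list(pos) + list(neg)
    n_pos = len(pos)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_pos]) - mean(pooled[n_pos:])
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)
```

Applied to per-site accessibility or B-factor values, `pos` and `neg` would hold the scores of phosphorylated and non-phosphorylated residues of one type.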
Figure 2: Sequence logos and radial cumulative propensity plots (RCP-plots). The plots illustrate the enrichment and depletion of particular amino acid types in the local sequence (sequence logo); in the sequence-local spatial environment, including the 6 flanking amino acid residues on either side of the central serine/threonine/tyrosine (left RCP-plot); in the spatially local but not sequence-local environment, i.e., excluding residues in the flanking sequence (middle plot); and in the combined environment (right-most RCP-plot). For every amino acid type, the two sub-sectors correspond, in clockwise order, to the statistics obtained using the closest detected atom and the interaction center, respectively.
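The caption does not spell out how the radial cumulative propensities are computed. A minimal sketch, assuming each site is given as a list of neighbouring residues with their amino acid type, sequence offset from the central residue, and spatial distance (by either definition in the caption), and assuming the propensity is the log2 ratio of an amino acid's relative frequency among neighbours of phosphorylated versus non-phosphorylated sites within a cumulative radius. The function name, the `mode` values, and the pseudocount are hypothetical:

```python
import math
from collections import Counter


def cumulative_propensity(pos_sites, neg_sites, radius, mode="combined",
                          flank=6, pseudo=1e-3):
    """Log2 propensity of each amino acid within `radius` of the centre.

    Each site is a list of (amino_acid, sequence_offset, distance) tuples;
    spatial neighbours far away in sequence carry a large offset.
    mode: "sequence_local" keeps |offset| <= flank,
          "non_local" keeps |offset| > flank, "combined" keeps both.
    """
    def keep(offset):
        if offset == 0:
            return False  # skip the central residue itself
        if mode == "sequence_local":
            return abs(offset) <= flank
        if mode == "non_local":
            return abs(offset) > flank
        return True  # "combined"

    def frequencies(sites):
        counts = Counter(
            aa
            for site in sites
            for aa, offset, dist in site
            if dist <= radius and keep(offset)
        )
        total = sum(counts.values()) or 1
        return {aa: n / total for aa, n in counts.items()}

    f_pos = frequencies(pos_sites)
    f_neg = frequencies(neg_sites)
    # Positive values: enriched around phosphorylated sites; negative: depleted.
    return {
        aa: math.log2((f_pos.get(aa, 0.0) + pseudo) / (f_neg.get(aa, 0.0) + pseudo))
        for aa in set(f_pos) | set(f_neg)
    }
```

Evaluating the function over a series of increasing radii gives one radial profile per amino acid; the three `mode` values mirror the left, middle, and right plots.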
Figure 3: Phylogenetic tree of serine-kinase groups whose targets can be found in the Protein Data Bank (PDB), following the original Hanks and Hunter classification scheme [45], with associated sequence logos [28]. Kinases with high similarity tend to share similar targets. The major classes of kinase targets are characterized by (i) a proline and glutamate next to the central serine (CMGC groups I, II, and III, and ATM, respectively); (ii) preferentially negatively charged amino acid residues (CMGC IV and AGC IV); and (iii) an arginine or lysine at the second or third position relative to the central serine, a large group comprising the CaMK group and the AGC group except the AGC IV subfamily. PKA, PKC, CKII, and MAPK were the kinase families with the most targets of resolved structure, and these were used to build kinase-family-specific predictors in this study.
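The three broad target classes named in the caption can be illustrated with simple pattern checks on a sequence window. A rough sketch, assuming a 13-residue window with the central serine at index 6 and hypothetical, deliberately crude rules (proline at +1; at least two D/E among +1 to +3; R or K at -2 or -3). These rules are simplifications for illustration, not the kinase-specific predictors of the study:

```python
def motif_class(window):
    """Assign a 13-residue window (centre at index 6) to a crude motif class.

    Hypothetical rules, checked in this order:
      proline-directed: P at +1
      acidic:           at least two D/E among +1..+3
      basic:            R or K at -2 or -3
    """
    centre = len(window) // 2
    if window[centre + 1] == "P":
        return "proline-directed"
    if sum(aa in "DE" for aa in window[centre + 1:centre + 4]) >= 2:
        return "acidic"
    if window[centre - 2] in "RK" or window[centre - 3] in "RK":
        return "basic"
    return "unclassified"
```

Real kinase specificity is far less clear-cut; the check order here is an arbitrary choice that resolves windows matching several rules.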
Figure 4: Sequence logos and radial cumulative propensity plots (RCP-plots) of kinase-specific sequence motifs. The plots illustrate the enrichment and depletion of particular amino acid types in the local sequence (sequence logo); in the sequence-local spatial environment, including the 6 flanking amino acid residues on either side of the central serine/threonine/tyrosine (left RCP-plot); in the spatially local but not sequence-local environment, i.e., excluding residues in the flanking sequence (middle plot); and in the combined environment (right RCP-plot). For every amino acid type, the two sub-sectors correspond, in clockwise order, to the statistics obtained using the closest detected atom and the interaction center, respectively.
Prediction performance as measured by the AUC
| Ser kinases | 363 | 0.74 ± 0.02 | 0.69 ± 0.02 | 0.73 ± 0.05 | 0.63 ± 0.05 | ||
| PKA | 34 | 0.91 ± 0.04 | 0.91 ± 0.03 | ||||
| PKC | 31 | 0.83 ± 0.05 | 0.78 ± 0.05 | ||||
| MAPK | 12 | 0.89 ± 0.07 | 0.78 ± 0.09 | ||||
| CKII | 19 | 0.73 ± 0.07 | 0.76 ± 0.07 | ||||
| Thr kinases | 134 | 0.72 ± 0.03 | 0.66 ± 0.03 | 0.72 ± 0.06 | 0.66 ± 0.05 | ||
| Tyr kinases | / | 0.69 ± 0.02 | 0.65 ± 0.02 | 0.56 ± 0.06 | 0.54 ± 0.05 | ||
| SRC | 24 | 0.72 ± 0.07 | 0.62 ± 0.07 | ||||
| unspecific predictor | 750 | 0.71 ± 0.01 | 0.67 ± 0.01 | 0.68 ± 0.03 | 0.63 ± 0.03 | ||
Results from the cross-validation of the various prediction approaches. The sequence-only and spatial-information-enriched methods were developed as part of this study and compared with NetPhos 3.1b (which includes the kinase-specific predictor NetPhos/K), DisPhos 1.3, and KinasePhos 2.0. Because KinasePhos reports decision values only for positively predicted sites, its kinase-specific predictions could not be evaluated: score values are missing for sites not predicted to be phosphorylated. The kinase-unspecific evaluation was nevertheless feasible, as KinasePhos essentially reports every submitted site as phosphorylated by at least one kinase; for this evaluation, the highest decision value reported for each site was used. Best-performing methods are printed in bold face.
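The kinase-unspecific AUC described above can be sketched directly from per-site scores. A minimal version, assuming each site comes with a dictionary of kinase-specific decision values (kinases without a reported value are simply absent); following the caption, a site is scored by its highest decision value, and the AUC is computed as the probability that a random positive site outscores a random negative one, with ties counting one half:

```python
from itertools import product


def site_score(decision_values):
    """Score a site by its highest reported kinase-specific decision value."""
    return max(decision_values.values())


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The pairwise loop is quadratic; for large sets, a rank-based formulation of the same statistic is faster and gives the same value.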
Prediction Performance as measured by accuracy, sensitivity (sn), and specificity (sp)
| Ser kinases | / | 363 | 0.69 ± 0.01 | 0.64 ± 0.01 | 0.68 ± 0.01 | 0.50 ± 0.00 | |
| PKA | 34 | 0.83 ± 0.03 | 0.82 ± 0.02 | 0.71 ± 0.03 | |||
| PKC | 31 | 0.72 ± 0.02 | 0.64 ± 0.03 | ||||
| MAPK | 12 | 0.69 ± 0.02 | 0.61 ± 0.05 | ||||
| CKII | 19 | 0.70 ± 0.03 | 0.62 ± 0.03 | ||||
| Thr kinases | / | 134 | 0.68 ± 0.01 | 0.63 ± 0.01 | 0.66 ± 0.03 | 0.50 ± 0.00 | |
| Tyr kinases | / | 0.65 ± 0.01 | 0.62 ± 0.01 sn:0.54 ± 0.00 | 0.53 ± 0.02 | 0.50 ± 0.00 | ||
| SRC | 24 | 0.70 ± 0.03 | 0.57 ± 0.01 | 0.70 ± 0.04 | |||
| unspecific predictor | 750 | 0.66 ± 0.01 | 0.63 ± 0.01 | 0.62 ± 0.01 | 0.50 ± 0.00 | ||
Results from the cross-validation of the various prediction approaches. The sequence-only and spatial-information-enriched methods were developed as part of this study and compared with NetPhos 3.1b (which includes the kinase-specific predictor NetPhos/K), DisPhos 1.3, and KinasePhos 2.0. The negative set was adjusted to the size of the positive set, ensuring equally sized sets and comparability with the accuracies originally reported for the alternative prediction approaches. In the kinase-unspecific prediction, KinasePhos 2.0 predicted every site to be phosphorylated by at least one kinase. Best-performing methods are printed in bold face. Sn denotes sensitivity and sp specificity at the stated accuracy.
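With the negative set matched in size to the positive set, accuracy equals the mean of sensitivity and specificity. This would explain accuracies of exactly 0.50: a predictor that labels every site as phosphorylated, as KinasePhos 2.0 effectively does in the kinase-unspecific setting, reaches sn = 1 and sp = 0. A minimal sketch, assuming the negatives are balanced by random subsampling (the source does not state how the negative set was adjusted):

```python
import random


def balance_negatives(pos, neg, seed=0):
    """Randomly subsample the negatives to the size of the positive set.

    Assumes len(neg) >= len(pos); random.sample raises ValueError otherwise.
    """
    rng = random.Random(seed)
    return rng.sample(neg, len(pos))


def evaluate(predict, pos, neg):
    """Return (accuracy, sensitivity, specificity) for a 0/1 predictor."""
    tp = sum(predict(x) == 1 for x in pos)  # phosphorylated sites found
    tn = sum(predict(x) == 0 for x in neg)  # non-sites correctly rejected
    sn = tp / len(pos)
    sp = tn / len(neg)
    acc = (tp + tn) / (len(pos) + len(neg))
    return acc, sn, sp
```

For example, `evaluate(lambda x: 1, pos, balance_negatives(pos, neg))` returns `(0.5, 1.0, 0.0)` for any non-empty `pos`.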
Predicted ratios of sites in loop regions
| All | 86% | 66% (53%) | 53% (45%) |
| Ser | 88% | 68% (57%) | 55% (54%) |
| Thr | 93% | 72% (64%) | 53% (47%) |
| Tyr | 75% | 60% (41%) | 51% (34%) |
Predicted ratios of sites in loop regions, as judged by DisEMBL 1.5 [37]. Percentages of sites in loop regions according to the DSSP secondary-structure annotation (B, C, S, and T) are given in brackets.
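The bracketed figures count a site as lying in a loop region when DSSP assigns it B, C, S, or T. A minimal sketch of that tally, assuming a per-residue DSSP state string and 0-based site positions; mapping DSSP's blank (no assigned structure) to C is an assumption made here:

```python
LOOP_STATES = set("BCST")  # DSSP codes counted as loop in the caption


def loop_fraction(dssp, sites):
    """Fraction of `sites` (0-based indices) whose DSSP state is a loop state.

    A blank DSSP state (no assigned structure) is treated as C (loop).
    """
    states = [dssp[i] if dssp[i] != " " else "C" for i in sites]
    return sum(s in LOOP_STATES for s in states) / len(states)
```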