Thanh Hai Dang, Koenraad Van Leemput, Alain Verschoren, Kris Laukens.
Abstract
MOTIVATION: Phosphorylation is a crucial post-translational protein modification mechanism with important regulatory functions in biological systems. It is catalyzed by a group of enzymes called kinases, each of which recognizes certain target sites in its substrate proteins. Several authors have built computational models, trained on sets of experimentally validated phosphorylation sites, to predict these target sites for a given kinase. All of these models suffer from certain limitations; for example, they do not take into account, in a global fashion, the dependencies between amino acid motifs within protein sequences.
Year: 2008 PMID: 18940828 PMCID: PMC2639296 DOI: 10.1093/bioinformatics/btn546
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The size of positive and negative datasets for some common protein kinases, obtained from Phospho.ELM version 0707
| Protein kinase | Positive size | Negative size |
|---|---|---|
| Abl (Proto-oncogene tyrosine-protein kinase) | 45 | 1209 |
| ATM (Ataxia telangiectasia mutated) | 55 | 1882 |
| CaM-KII (Calcium/calmodulin-dependent protein kinases) | 50 | 1829 |
| CDK (Cyclin-dependent kinases) | 104 | 1990 |
| CK1 (Casein kinases 1) | 42 | 1051 |
| CK2 (Casein kinases 2) | 226 | 3875 |
| DNA-PK (DNA-dependent protein kinase catalytic subunit) | 20 | 632 |
| EGFR (Epidermal growth factor receptor) | 44 | 823 |
| Fyn (Proto-oncogene tyrosine-protein kinase) | 48 | 1409 |
| GSK-3 (Glycogen synthase kinases 3) | 32 | 866 |
| InsR (Insulin receptor) | 44 | 724 |
| Met (Hepatocyte growth factor receptor) | 13 | 132 |
| mTOR (FK506 binding protein 12-rapamycin associated protein 1) | 13 | 50 |
| PKA (cAMP-dependent protein kinase) | 310 | 8823 |
| PKB (Protein kinases B) | 79 | 3563 |
| PKC (Protein kinase C) | 227 | 4428 |
| Src (Proto-oncogene tyrosine-protein kinase) | 141 | 2681 |
| Syk (Tyrosine-protein kinase) | 45 | 680 |
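The table shows that negative sites heavily outnumber positive sites for every kinase. As a rough illustration, the ratio can be computed directly from the counts. The snippet below is a minimal sketch using a handful of rows copied from the table; the names `dataset_sizes` and `negatives_per_positive` are hypothetical and do not come from the paper.

```python
# Positive/negative counts copied from a few rows of the table above.
# The variable and function names are illustrative assumptions.
dataset_sizes = {
    "CK2": (226, 3875),
    "PKA": (310, 8823),
    "PKC": (227, 4428),
    "Src": (141, 2681),
    "mTOR": (13, 50),
}


def negatives_per_positive(positive, negative):
    """Return how many negative sites exist for each positive site."""
    return negative / positive


for kinase, (pos, neg) in dataset_sizes.items():
    ratio = negatives_per_positive(pos, neg)
    print(f"{kinase}: {ratio:.1f} negatives per positive")
```

An imbalance of this kind is one reason to report performance as sensitivity and specificity rather than as raw accuracy.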
The chemical classes to which the 20 amino acids belong, based on Wong et al. (2007)
| Group name | Amino Acids |
|---|---|
| Sulfur | C, M |
| Aliphatic 1 | A, G, P |
| Aliphatic 2 | I, L, V |
| Acid | D, E |
| Base | H, K, R |
| Aromatic | F, W, Y |
| Amide | N, Q |
| Small hydroxy | S, T |
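The grouping above amounts to a lookup from each one-letter amino acid code to its chemical class. The following is a minimal sketch of that lookup; how CRPhos actually uses the classes as features is not described in this excerpt, so the function `to_groups` and the `"Unknown"` fallback are illustrative assumptions only.

```python
# Chemical classes copied from the table above (after Wong et al., 2007).
GROUPS = {
    "Sulfur": "CM",
    "Aliphatic 1": "AGP",
    "Aliphatic 2": "ILV",
    "Acid": "DE",
    "Base": "HKR",
    "Aromatic": "FWY",
    "Amide": "NQ",
    "Small hydroxy": "ST",
}

# Invert the table: one-letter residue code -> class name.
RESIDUE_TO_GROUP = {
    residue: group for group, residues in GROUPS.items() for residue in residues
}


def to_groups(sequence):
    """Map a peptide string to a list of class names.

    Characters outside the 20 standard residues map to "Unknown"
    (an assumption made for this sketch).
    """
    return [RESIDUE_TO_GROUP.get(residue, "Unknown") for residue in sequence.upper()]


print(to_groups("RRASV"))
```

Collapsing 20 residues into 8 classes reduces the number of distinct symbols a model has to learn from, which can help when the positive sets are as small as those listed for some kinases.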
Fig. 1. Method for transforming an amino acid sequence to a data object of the central amino acid.
Fig. 2. Relation between expected and observed specificity values of the obtained predictor. All lines are generated using linear regression.
Fig. 3. ROC curves of our method (CRPhos) for some well-studied kinases, using 10-fold cross-validation. CRF* stands for the equivalent curve for a CRF model learned from both the positive and negative training datasets. For comparison, corresponding performance measures reported in the literature are shown: PPSP (Xue et al., 2006), Scansite (Obenauer et al., 2003), NetPhosK (Blom et al., 2004), KinasePhos 1.0 (Huang et al., 2005a), KinasePhos 2.0 (Wong et al., 2007), GPS (Zhou et al., 2004) and PredPhospho (Kim et al., 2004).
Fig. 4. Performance of CRPhos on the testing dataset created according to the scheme in Wan et al. (2008). The dataset remaining after removing this testing data from Phospho.ELM v.07 was used to train CRPhos. The performance measures of other existing methods, as reported by Wan et al. (2008), are shown for comparison.
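The figure captions report performance in terms of specificity and ROC curves. Each point on a ROC curve corresponds to the sensitivity and specificity obtained at one score threshold. The snippet below is a minimal sketch of that computation under standard definitions; the function name, the toy scores and the threshold are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Compute (sensitivity, specificity) for one score threshold.

    scores: predicted scores, higher means more likely phosphorylated.
    labels: 1 for a true phosphorylation site, 0 for a negative site.
    Assumes at least one positive and one negative label are present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of negative sites rejected
    return sensitivity, specificity


# Toy data for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0]
sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 − specificity yields the ROC curve.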