Hangyuan Yang, Minghui Wang, Xia Liu, Xing-Ming Zhao, Ao Li.
Abstract
MOTIVATION: Phosphorylation is one of the most studied post-translational modifications and plays a pivotal role in various cellular processes. Deep learning methods have recently achieved great success in predicting phosphorylation sites, but most of them are based on convolutional neural networks, which may not capture enough information about long-range dependencies between residues in a protein sequence. In addition, existing deep learning methods use only sequence information to predict phosphorylation sites. It is therefore highly desirable to develop a deep learning architecture that combines heterogeneous sequence and protein-protein interaction (PPI) information for more accurate phosphorylation site prediction.
Year: 2021 PMID: 34320631 PMCID: PMC8665744 DOI: 10.1093/bioinformatics/btab551
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The integrated deep learning architecture of PhosIDN
Fig. 2. ROC curves of PhosIDNSeq for different window sizes on S/T and Y sites
AUC values (%) of PhosIDN with sequence information for kinase-specific phosphorylation site prediction
| Kinase | DCCNN | PhosIDNSeq |
|---|---|---|
| Group | | |
| AGC | 87.0 | |
| Atypical | 82.3 | |
| CAMK | 89.4 | |
| CMGC | 89.8 | |
| TK | 80.4 | |
| Family | | |
| CDK | 94.1 | |
| CK2 | 92.4 | |
| MAPK | 94.5 | |
| PKC | 83.7 | |
| Src | 81.2 | |
Note: Best performance values are highlighted in bold.
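As a reminder of what the AUC values in these tables measure, the following is a minimal sketch of the rank-based (Mann-Whitney) formulation of ROC AUC: the probability that a randomly chosen positive site scores higher than a randomly chosen negative site. The labels and scores are hypothetical toy data, and the function name `roc_auc` is introduced here for illustration only; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = phosphorylation site, 0 = non-phosphorylation site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# Tables in this record report AUC as a percentage.
auc_percent = 100.0 * roc_auc(labels, scores)
print(f"AUC = {auc_percent:.1f}%")
```

The pairwise loop is quadratic in the number of sites and is chosen for readability; a sort-based implementation would be preferable on large test sets.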
AUC values (%) of PhosIDN with both sequence and PPI information for kinase-specific phosphorylation site prediction
| Kinase | Baseline | IFENet* | HFCNet* | PhosIDN |
|---|---|---|---|---|
| Group | | | | |
| AGC | 89.7 | 91.5 | 90.9 | |
| Atypical | 84.7 | 87.8 | 87.1 | |
| CAMK | 92.0 | 94.2 | 93.9 | |
| CMGC | 93.0 | 93.7 | 94.3 | |
| TK | 88.5 | 91.0 | 89.5 | |
| Family | | | | |
| CDK | 97.0 | 97.6 | 97.4 | |
| CK2 | 95.5 | 96.0 | 95.5 | |
| MAPK | 96.1 | 96.6 | 96.7 | |
| PKC | 88.7 | 90.6 | 89.1 | |
| Src | 86.9 | 88.8 | 89.0 | |
Baseline, direct concatenation of PPI embedding and the output of SFENet followed by one fully connected layer; IFENet*, direct concatenation of the outputs of SFENet and IFENet followed by one fully connected layer; HFCNet*, combination of PPI embedding and the output of SFENet via HFCNet; PhosIDN, our proposed integrated deep neural network. Best performance values are highlighted in bold.
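The Baseline variant described in this note (direct concatenation of two feature vectors followed by one fully connected layer) can be sketched as follows. This is only an illustration under stated assumptions: the vector sizes, the random untrained weights, the sigmoid output, and the names `fully_connected` and `baseline_predict` are all hypothetical, and the random vectors merely stand in for the SFENet output and the PPI embedding.

```python
import math
import random


def fully_connected(x, weights, bias):
    """One dense output unit; the sigmoid activation is an assumption."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def baseline_predict(seq_features, ppi_embedding, weights, bias):
    """Concatenate the two feature vectors directly, then apply one dense layer."""
    fused = list(seq_features) + list(ppi_embedding)
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the concatenated feature length")
    return fully_connected(fused, weights, bias)


rng = random.Random(0)
# Hypothetical stand-ins: 8 sequence features and a 4-dimensional PPI embedding.
seq_features = [rng.uniform(-1, 1) for _ in range(8)]
ppi_embedding = [rng.uniform(-1, 1) for _ in range(4)]
# Untrained weights, one per concatenated feature.
weights = [rng.uniform(-0.5, 0.5) for _ in range(12)]

score = baseline_predict(seq_features, ppi_embedding, weights, bias=0.0)
print(f"predicted score: {score:.3f}")
```

Direct concatenation treats the two information sources as one flat vector, which is the simplest way to combine them and serves as the reference point for the other variants in the table.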
The values (%) of Sn, Acc, MCC, Pre and F1 of PhosIDN for kinase-specific phosphorylation site prediction at medium and high stringency levels
| Kinase | Method | Sn (Sp = 90%) | Acc (Sp = 90%) | MCC (Sp = 90%) | Pre (Sp = 90%) | F1 (Sp = 90%) | Sn (Sp = 95%) | Acc (Sp = 95%) | MCC (Sp = 95%) | Pre (Sp = 95%) | F1 (Sp = 95%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Group AGC | Baseline | 68.1 | 78.6 | 59.2 | 88.3 | 76.9 | 58.4 | 75.7 | 56.6 | 92.7 | 71.6 |
| | IFENet* | 75.6 | 82.5 | 66.0 | 89.4 | 81.9 | 65.0 | 79.2 | 62.2 | 93.4 | 76.7 |
| | HFCNet* | 72.2 | 80.7 | 62.9 | 88.9 | 79.7 | 59.7 | 77.9 | 57.6 | 92.9 | 72.7 |
| | PhosIDN | | | | | | | | | | |
| Group Atypical | Baseline | 54.2 | 72.0 | 47.2 | 84.2 | 66.0 | 28.0 | 61.4 | 30.8 | 84.6 | 42.0 |
| | IFENet* | 57.6 | 73.7 | 50.1 | 85.0 | 68.7 | 45.8 | 70.3 | 46.7 | 90.0 | 60.7 |
| | HFCNet* | 54.2 | 72.0 | 47.2 | 84.2 | 66.0 | 40.7 | 67.8 | 42.4 | 88.9 | 55.8 |
| | PhosIDN | | | | | | | | | | |
| Group CAMK | Baseline | 71.7 | 80.1 | 62.1 | 89.9 | 79.7 | 62.4 | 77.2 | 59.5 | 93.9 | 75.0 |
| | IFENet* | 83.8 | 86.7 | 73.7 | 91.2 | 87.3 | 68.8 | 80.7 | 64.9 | 94.4 | 79.6 |
| | HFCNet* | 78.6 | 84.1 | 69.1 | 90.7 | 83.7 | 65.7 | 79.8 | 61.1 | 94.0 | 77.7 |
| | PhosIDN | | | | | | | | | | |
Note: Best performance values are highlighted in bold.
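The measures in this table are reported at a fixed specificity (Sp). A minimal sketch of that procedure, using the standard confusion-matrix definitions of Sn, Acc, MCC, Pre and F1, is given below. The toy scores, the function names, and the convention that a site is called positive when its score is strictly above the cutoff are assumptions made for illustration; the study's own thresholding details are not given in this record.

```python
import math


def cutoff_at_specificity(labels, scores, target_sp):
    """Pick a cutoff so that at least target_sp of negatives score at or below it."""
    negatives = sorted(s for y, s in zip(labels, scores) if y == 0)
    if not negatives:
        raise ValueError("need at least one negative example")
    # Small epsilon guards against floating-point error in target_sp * n.
    k = max(0, math.ceil(target_sp * len(negatives) - 1e-9) - 1)
    return negatives[k]


def site_metrics(labels, scores, cutoff):
    """Compute Sn, Sp, Acc, MCC, Pre and F1; a site is positive if score > cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s > cutoff
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / (tp + fp + tn + fn)
    pre = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * pre * sn / (pre + sn) if pre + sn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "Pre": pre, "F1": f1}


# Hypothetical toy data: ten negatives followed by ten positives.
neg_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80]
pos_scores = [0.30, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.85, 0.90, 0.95]
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

cutoff = cutoff_at_specificity(labels, scores, target_sp=0.90)
metrics = site_metrics(labels, scores, cutoff)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

On this toy data the cutoff is 0.45, giving one false positive and one false negative, so Sn, Sp, Acc, Pre and F1 are all 90.0% and MCC is 80.0%. Raising `target_sp` to 0.95 moves the cutoff up and trades sensitivity for precision, which is the pattern visible between the two halves of the table.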
AUC values (%) of different methods for kinase-specific phosphorylation site prediction
| Kinase | GPS | PPSP | MusiteDeep | DeepPhos | PhosphoPredict | PhosIDNSeq | PhosIDN |
|---|---|---|---|---|---|---|---|
| Group | | | | | | | |
| AGC | 56.5 | 78.0 | – | 88.4 | – | 89.1 | |
| Atypical | 76.5 | 64.4 | – | 83.2 | – | 84.2 | |
| CAMK | 70.6 | 71.3 | – | 90.9 | – | 91.6 | |
| CMGC | 83.2 | 82.1 | – | 91.9 | – | 92.6 | |
| TK | 60.4 | 70.6 | – | 82.0 | – | 83.6 | |
| Family | | | | | | | |
| CDK | 90.5 | 86.1 | 93.0 | 96.0 | 96.6 | 97.0 | |
| CK2 | 84.1 | 84.7 | 92.5 | 93.7 | 96.3 | 95.4 | |
| MAPK | 92.1 | 84.4 | 93.4 | 95.4 | 95.5 | 95.4 | |
| PKC | 66.2 | 76.1 | 80.5 | 84.2 | 90.2 | 86.4 | |
| Src | 70.2 | 68.8 | – | 83.0 | – | 83.6 | |
Note: Best performance values are highlighted in bold.
The values (%) of Sn, Acc, MCC, Pre and F1 of different methods for kinase-specific phosphorylation site prediction at high stringency level
| Kinase | Method | Sn | Acc | MCC | Pre | F1 |
|---|---|---|---|---|---|---|
| Group AGC | GPS | 5.9 | 48.3 | 1.8 | 56.1 | 10.7 |
| | PPSP | 32.9 | 62.4 | 34.9 | 87.7 | 47.9 |
| | DeepPhos | 50.9 | 71.8 | 50.4 | 91.7 | 65.5 |
| | PhosIDNSeq | 54.2 | 73.6 | 53.1 | 92.1 | 68.3 |
| | PhosIDN | | | | | |
| Group Atypical | GPS | 16.1 | 55.5 | 17.9 | 76.0 | 26.6 |
| | PPSP | 15.3 | 55.1 | 16.8 | 75.0 | 25.4 |
| | DeepPhos | 36.4 | 65.7 | 38.7 | 87.8 | 51.5 |
| | PhosIDNSeq | 39.8 | 67.4 | 41.6 | 88.7 | 55.0 |
| | PhosIDN | | | | | |
Note: Best performance values are highlighted in bold.
Fig. 3. Visualization of the original one-hot encoding features, the sequence features extracted by PhosIDNSeq and the combined features extracted by PhosIDN. Red dots represent phosphorylation sites with kinase annotation belonging to (a) group Atypical or (b) group CAMK; blue dots represent non-phosphorylation sites