Xia Liu, Minghui Wang, Ao Li.
Abstract
Human DNA sequencing has revealed numerous single nucleotide variants associated with complex diseases. Researchers have shown that these variants can affect protein function, for example by disrupting protein phosphorylation. Several computational methods for predicting phospho-variants have been developed based on conventional machine learning algorithms, but their performance still leaves considerable room for improvement. In recent years, deep learning has been successfully applied to biological sequence analysis because it learns sequence patterns efficiently, which makes it a powerful tool for improving phospho-variant prediction from protein sequence information. In this study, we present PhosVarDeep, a novel unified deep-learning framework for phospho-variant prediction. PhosVarDeep takes reference and variant sequences as inputs and adopts a Siamese-like CNN architecture containing two identical subnetworks and a prediction module. In each subnetwork, general phosphorylation sequence features are extracted by a pre-trained sequence feature encoding network and then fed into a CNN module that captures variant-aware phosphorylation sequence features. A prediction module then integrates the outputs of the two subnetworks and generates the phospho-variant prediction. Comprehensive experimental results on phospho-variant data demonstrate that our method significantly improves the prediction performance of phospho-variants and compares favorably with existing conventional machine learning methods. ©2022 Liu et al.
Keywords: Deep learning; Prediction; Sequential
Year: 2022 PMID: 35310161 PMCID: PMC8929166 DOI: 10.7717/peerj.12847
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
The sizes of positive and negative sets with respect to different phosphorylation site types.
| | S/T sites | Y sites |
|---|---|---|
| Positive set | 763 | 440 |
| Negative set1 | 5796 | 2372 |
| Negative set2 | 2285 | 715 |
| Negative set3 | 17204 | 9901 |
Figure 1. Illustration of the proposed PhosVarDeep framework.
Details of CNN module and prediction module.
| Module | Layer | Details |
|---|---|---|
| Multi-layer CNN | Convolutional layer (+ReLU) | 32 filters |
| | Convolutional layer (+ReLU) | 64 filters |
| | Convolutional layer (+ReLU) | 128 filters |
| | Dropout | P = 0.3 |
| | MaxPooling | pool_size = 2 |
| Multi-layer DNN | Fully connected layer (+ReLU) | 128 neurons |
| | Fully connected layer (+ReLU) | 64 neurons |
| | Fully connected layer (+ReLU) | 32 neurons |
| | Dropout | P = 0.3 |
| Output layer | Fully connected layer (+softmax) | 2 neurons |
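To make the layer listing concrete, the following is a minimal sketch that traces feature shapes and parameter counts through the CNN module and the prediction module. The filter counts, pool size, neuron counts, and output size come from the table. The window length, encoder output width, kernel size, "same" padding, and the choice to concatenate the two subnetwork outputs are assumptions made only for illustration; the record does not state them.

```python
# Layer sizes come from the table; everything marked "assumed" is hypothetical.
WINDOW_LEN = 33      # assumed: residues per sequence window
ENCODER_DIM = 32     # assumed: feature channels produced by the pre-trained encoder
KERNEL_SIZE = 3      # assumed: 1D convolution kernel width, with "same" padding

CONV_FILTERS = [32, 64, 128]   # from the table
POOL_SIZE = 2                  # from the table
DENSE_UNITS = [128, 64, 32]    # from the table
N_CLASSES = 2                  # from the table (softmax output)


def cnn_module_shape(length, channels):
    """Return (length, channels, n_params) after the multi-layer CNN."""
    n_params = 0
    for filters in CONV_FILTERS:
        # Each filter has kernel_size * in_channels weights plus one bias.
        n_params += filters * (KERNEL_SIZE * channels + 1)
        channels = filters      # "same" padding keeps the length unchanged
    length //= POOL_SIZE        # max pooling shrinks the length; dropout does not
    return length, channels, n_params


def prediction_module_params(in_features):
    """Count parameters of the fully connected stack plus the softmax layer."""
    n_params = 0
    for units in DENSE_UNITS + [N_CLASSES]:
        n_params += units * (in_features + 1)   # weights plus one bias per unit
        in_features = units
    return n_params


length, channels, cnn_params = cnn_module_shape(WINDOW_LEN, ENCODER_DIM)
subnet_features = length * channels     # flattened output of one subnetwork
merged_features = 2 * subnet_features   # assumed: reference and variant outputs are concatenated
dnn_params = prediction_module_params(merged_features)

print("one subnetwork output (length, channels):", (length, channels))
print("merged feature vector size:", merged_features)
print("CNN parameters per subnetwork:", cnn_params)
print("prediction module parameters:", dnn_params)
```

Changing the assumed constants at the top shows how strongly the first fully connected layer dominates the parameter count once the two subnetwork outputs are merged.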
Figure 2. ROC curves and AUC values of DeepPhos and MusiteDeep for general phosphorylation site prediction on S/T and Y sites.
AUC values of PhosVarDeep for phospho-variant prediction.
| Method | Test set1, S/T sites | Test set1, Y sites | Test set2, S/T sites | Test set2, Y sites | Test set3, S/T sites | Test set3, Y sites |
|---|---|---|---|---|---|---|
| PhosFEN* | 0.845 | 0.827 | 0.915 | 0.898 | 0.719 | 0.661 |
| PM* | 0.909 | 0.878 | 0.930 | 0.917 | 0.848 | 0.812 |
| PhosVarDeep | | | | | | |
Notes.
Best performance values are highlighted in bold.
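For readers who want to reproduce an AUC figure from raw prediction scores, here is a minimal sketch of the rank-based definition: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name and the toy data are illustrative and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs that are ordered correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, not taken from the paper: 8 of the 9 pairs are ordered correctly.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for test sets of a few hundred variants; a sort-based version would be preferable for larger sets.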
The values (%) of Sn, Acc, MCC, Pre and F1 of PhosVarDeep on S/T sites.
| Test set | Method | Sn (Sp = 90%) | Acc (Sp = 90%) | MCC (Sp = 90%) | Pre (Sp = 90%) | F1 (Sp = 90%) | Sn (Sp = 95%) | Acc (Sp = 95%) | MCC (Sp = 95%) | Pre (Sp = 95%) | F1 (Sp = 95%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Test set1 | PhosFEN* | 78.3 | 84.2 | 68.9 | 88.8 | 83.2 | 58.6 | 76.6 | 57.2 | 91.8 | 71.5 |
| Test set1 | PM* | 83.6 | 86.8 | 73.8 | 89.4 | 86.4 | 70.4 | 82.6 | 67.2 | 93.0 | 80.1 |
| Test set1 | PhosVarDeep | | | | | | | | | | |
| Test set2 | PhosFEN* | 81.6 | 85.9 | 72.0 | 89.2 | 85.2 | 78.3 | 86.5 | 74.0 | 93.7 | 85.3 |
| Test set2 | PM* | 88.8 | 89.5 | 79.0 | 90.0 | 89.4 | 84.2 | 89.5 | 79.4 | 94.1 | 88.9 |
| Test set2 | PhosVarDeep | | | | | | | | | | |
| Test set3 | PhosFEN* | 54.6 | 72.4 | 47.9 | 84.7 | 66.4 | 42.1 | 68.4 | 43.3 | 88.9 | 57.1 |
| Test set3 | PM* | 65.1 | 77.6 | 57.1 | 86.8 | 74.4 | 47.4 | 71.1 | 47.8 | 90.0 | 62.1 |
| Test set3 | PhosVarDeep | | | | | | | | | | |
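The table reports Sn, Acc, MCC, Pre and F1 at a fixed specificity (Sp). A minimal sketch of that procedure, under the assumption that the decision threshold is chosen as the most permissive score cut-off whose specificity still reaches the target, might look as follows. The function names and toy scores are illustrative; the paper's exact thresholding rule is not given in this record.

```python
import math


def confusion(labels, scores, threshold):
    """Count (tp, fp, tn, fn) when score >= threshold means 'predicted positive'."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_at_specificity(labels, scores, target_sp):
    """Pick the lowest threshold with specificity >= target_sp, then compute metrics."""
    if 0 not in labels or 1 not in labels:
        raise ValueError("need at least one positive and one negative example")
    # Thresholds go from most to least permissive; infinity predicts nothing positive.
    for threshold in sorted(set(scores)) + [float("inf")]:
        tp, fp, tn, fn = confusion(labels, scores, threshold)
        sp = tn / (tn + fp)
        if sp >= target_sp - 1e-12:
            break
    sn = tp / (tp + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    pre = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * pre * sn / (pre + sn) if pre + sn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"threshold": threshold, "Sp": sp, "Sn": sn, "Acc": acc,
            "MCC": mcc, "Pre": pre, "F1": f1}


# Toy data, not taken from the paper: 10 positives followed by 10 negatives.
toy_labels = [1] * 10 + [0] * 10
toy_scores = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.90, 0.95, 0.42,
              0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.85]
result = metrics_at_specificity(toy_labels, toy_scores, target_sp=0.90)
print(result)
```

Fixing Sp before comparing methods keeps the false-positive rate equal across models, so differences in Sn, Pre and MCC reflect ranking quality rather than a different choice of threshold.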
The values (%) of Sn, Acc, MCC, Pre and F1 of PhosVarDeep on Y sites.
| Test set | Method | Sn (Sp = 90%) | Acc (Sp = 90%) | MCC (Sp = 90%) | Pre (Sp = 90%) | F1 (Sp = 90%) | Sn (Sp = 95%) | Acc (Sp = 95%) | MCC (Sp = 95%) | Pre (Sp = 95%) | F1 (Sp = 95%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Test set1 | PhosFEN* | 60.2 | 75.0 | 52.3 | 85.5 | 70.7 | 56.8 | 76.1 | 56.7 | 92.6 | 70.4 |
| Test set1 | PM* | 70.5 | 80.1 | 61.4 | 87.3 | 78.0 | 67.0 | 81.3 | 65.2 | 93.7 | 78.1 |
| Test set1 | PhosVarDeep | | | | | | | | | | |
| Test set2 | PhosFEN* | 71.6 | 80.7 | 62.4 | 87.5 | 78.8 | 64.8 | 80.1 | 63.3 | 93.4 | 76.5 |
| Test set2 | PM* | 81.8 | 86.4 | 73.0 | 90.0 | 85.7 | 71.6 | 83.5 | 69.0 | 94.0 | 81.3 |
| Test set2 | PhosVarDeep | | | | | | | | | | |
| Test set3 | PhosFEN* | 52.3 | 71.0 | 45.4 | 83.6 | 64.3 | 15.9 | 55.7 | 18.8 | 77.8 | 26.4 |
| Test set3 | PM* | 58.0 | 73.9 | 50.3 | 85.0 | 68.9 | 27.3 | 61.4 | 31.1 | 85.7 | 41.4 |
| Test set3 | PhosVarDeep | | | | | | | | | | |
AUC values of different methods for phospho-variant prediction.
| Method | Test set1, S/T sites | Test set1, Y sites | Test set2, S/T sites | Test set2, Y sites | Test set3, S/T sites | Test set3, Y sites |
|---|---|---|---|---|---|---|
| MIMP | 0.797 | 0.725 | 0.830 | 0.737 | 0.739 | 0.611 |
| PhosphoPICK-SNP | 0.852 | 0.794 | 0.850 | 0.827 | 0.823 | 0.784 |
| PhosVarDeep | | | | | | |
Figure 3. The values of Sn, Acc, Pre and F1 of different methods at Sp = 90.0% and Sp = 95.0% on S/T sites.
Prediction scores of confirmed phospho-variants.
| Gene/Protein | Variant | Phos. site | PhosVarDeep | MIMP |
|---|---|---|---|---|
| TP53 | P47S | S46 | 0.943 | 0.743 |
| TP53 | R213Q | S215 | 0.983 | 0.867 |
| TP53 | R282W | T284 | 0.987 | 0.879 |
| BDNF | V66M | T62 | 0.911 | 0.839 |
| PER2 | S662G | S662 | 0.982 | <0.5 |
| MeCP2 | R306C | T308 | 0.983 | 0.884 |
| NKX3-1 | R52C | S48 | 0.988 | <0.5 |
| ABCB4 | T34M | T34 | 0.787 | <0.5 |
| GLUT1 | R223W | S226 | 0.976 | 0.979 |
| CLIP1 | E1012K | S1009 | 0.977 | 0.952 |
| CTNNB1 | S37C | S33 | 0.824 | 0.756 |
| CTNNB1 | G34R | S47 | 0.996 | <0.5 |
| Cyclin D1 | T286R | T286 | 0.947 | <0.5 |
| hOG1 | S326C | S326 | 0.991 | <0.5 |
| UBE3A | T485A | T485 | 0.038 | <0.5 |
| PLN | R14C | S16 | 0.993 | 0.925 |
| MAF | P59H | T58 | 0.988 | 0.986 |
| Gab1 | T387N | T387 | 0.977 | <0.5 |
| hERG1 | K897T | T897 | 0.985 | <0.5 |
| STAT1 | L706S | Y701 | 0.972 | <0.5 |
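Each row pairs a variant label such as P47S with a phosphorylation site such as S46, and PhosVarDeep takes reference and variant sequences as inputs. The sketch below shows one way such an input pair could be built: parse the labels, substitute the residue, and cut a window centred on the site from both sequences. The helper names, the flank size, the padding symbol, and the toy sequence are all hypothetical; the record does not state the window length the authors used.

```python
import re


def parse_label(label):
    """Split 'P47S' into ('P', 47, 'S') or 'S46' into ('S', 46, None)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z]?)", label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    first, pos, last = match.groups()
    return first, int(pos), last or None


def window(sequence, center, flank, pad="-"):
    """Residues center-flank .. center+flank (1-based), padded past the termini."""
    return "".join(
        sequence[pos - 1] if 1 <= pos <= len(sequence) else pad
        for pos in range(center - flank, center + flank + 1)
    )


def reference_and_variant_windows(sequence, variant, site, flank=7):
    """Return (reference window, variant window) centred on the phosphorylation site."""
    ref, var_pos, alt = parse_label(variant)
    site_res, site_pos, _ = parse_label(site)
    if alt is None:
        raise ValueError(f"variant label has no substituted residue: {variant!r}")
    for residue, pos in ((ref, var_pos), (site_res, site_pos)):
        if not 1 <= pos <= len(sequence) or sequence[pos - 1] != residue:
            raise ValueError(f"sequence does not have {residue} at position {pos}")
    mutated = sequence[:var_pos - 1] + alt + sequence[var_pos:]
    return window(sequence, site_pos, flank), window(mutated, site_pos, flank)


# Toy sequence, not a real protein: S at position 6, P at position 7.
toy_sequence = "MKTAYSPRLLGSE"
ref_win, var_win = reference_and_variant_windows(toy_sequence, "P7A", "S6", flank=3)
print(ref_win, var_win)
```

When the variant falls on the site itself, as in S662G or T286R, the centre residue of the variant window changes, which removes the phosphorylatable residue altogether.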
Figure 4. Visualization of original combined one-hot encoding features and combined features extracted by PhosVarDeep.
Red dots represent positive examples of phospho-variants on (A) S/T sites or (B) Y sites of test set3; blue dots represent negative examples.
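The caption refers to "combined one-hot encoding features" without defining them. One plausible reading, shown here only as a sketch, is that the reference window and the variant window are each one-hot encoded over the amino-acid alphabet and the two encodings are then concatenated. The alphabet ordering, the padding symbol, and the function names are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
PAD = "-"                              # assumed symbol for positions past a terminus
ALPHABET = AMINO_ACIDS + PAD
INDEX = {residue: i for i, residue in enumerate(ALPHABET)}


def one_hot(window):
    """Encode a sequence window as a list of 0/1 rows, one row per residue."""
    rows = []
    for residue in window:
        if residue not in INDEX:
            raise ValueError(f"unsupported residue: {residue!r}")
        row = [0] * len(ALPHABET)
        row[INDEX[residue]] = 1
        rows.append(row)
    return rows


def combined_one_hot(ref_window, var_window):
    """Flatten and concatenate the encodings of the reference and variant windows."""
    if len(ref_window) != len(var_window):
        raise ValueError("reference and variant windows must have the same length")
    flat = []
    for win in (ref_window, var_window):
        for row in one_hot(win):
            flat.extend(row)
    return flat


# Toy windows, not taken from the paper; they differ only at the fifth position.
features = combined_one_hot("TAYSPRL", "TAYSARL")
print(len(features), sum(features))
```

In such a representation the reference and variant halves differ in only two entries for a single substitution, which may be one reason raw one-hot features separate positive and negative phospho-variants poorly compared with learned features.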