Lingkuan Meng1,2, Wai-Sum Chan1, Lei Huang1, Linjing Liu1, Xingjian Chen1, Weitong Zhang1, Fuzhou Wang1, Ke Cheng2, Hongyan Sun2, Ka-Chun Wong1.
Abstract
Post-translational modifications (PTMs) are closely linked to numerous diseases and play a significant role in regulating protein structure, activity, and function. Identifying PTMs is therefore crucial for understanding the mechanisms of cell biology and for disease therapy. Compared to traditional machine learning methods, deep learning approaches for PTM prediction provide accurate and rapid screening, which lets downstream wet-lab experiments use the screened information for focused studies. In this paper, we reviewed recent deep learning work on identifying phosphorylation, acetylation, ubiquitination, and other PTM types. In addition, we summarized PTM databases and discussed future directions with critical insights.
Keywords: AAindex, Amino acid index; ATP, Adenosine triphosphate; AUC, Area under curve; Ac, Acetylation; BE, Binary encoding; BLOSUM, Blocks substitution matrix; Bi-LSTM, Bidirectional LSTM; CKSAAP, Composition of k-spaced amino acid Pairs; CNN, Convolutional neural network; CNNOH, CNN with the one-hot encoding; CNNWE, CNN with the word-embedding encoding; CNNrgb, CNN red green blue; CV, Cross-validation; DC-CNN, Densely connected convolutional neural network; DL, Deep learning; DNNs, Deep neural networks; Deep learning; E. coli, Escherichia coli; EBGW, Encoding based on grouped weight; EGAAC, Enhanced grouped amino acids content; IG, Information gain; K, Lysine; KNN, k nearest neighbor; LASSO, Least absolute shrinkage and selection operator; LSTM, Long short-term memory; LSTMWE, LSTM with the word-embedding encoding; M.musculus, Mus musculus; MDC, Modular densely connected convolutional networks; MDCAN, Multilane dense convolutional attention network; ML, Machine learning; MLP, Multilayer perceptron; MMI, Multivariate mutual information; Machine learning; Mass spectrometry; NMBroto, Normalized Moreau-Broto autocorrelation; P, Proline; PSP, PhosphoSitePlus; PSSM, Position-specific scoring matrix; PTM, Post-translational modifications; Ph, Phosphorylation; Post-translational modification; Prediction; PseAAC, Pseudo-amino acid composition; R, Arginine; RF, Random forest; RNN, Recurrent neural network; ROC, Receiver operating characteristic; S, Serine; S. typhimurium, Salmonella typhimurium; S.cerevisiae, Saccharomyces cerevisiae; SE, Squeeze and excitation; SEV, Split to Equal Validation; ST, Source and target; SUMO, Small ubiquitin-like modifier; SVM, Support vector machines; T, Threonine; Ub, Ubiquitination; Y, Tyrosine; ZSL, Zero-shot learning
Year: 2022 PMID: 35860402 PMCID: PMC9284371 DOI: 10.1016/j.csbj.2022.06.045
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1 Overview of deep learning approaches for PTM prediction [29].
Fig. 2 The statistics of published literature on machine/deep learning-based PTM prediction. (a) Number of articles published in different peer-reviewed journals. Note that the year 2022 only includes publications up to January 2022. Abbreviations: DL = deep learning, ML = machine learning, PTM = post-translational modification. (b) Word cloud based on the collective concordance ranking with the size of terms proportional to their frequency in the above articles.
Summary of PTM databases.
| Database | Year | Number of entries | Description |
| UniProt | 2005 | Varies according to the keyword search | Multiple-type PTM sites for multiple species |
| PLMD | 2017 | 284,780 | Protein lysine modification sites for multiple species |
| PhosphoSitePlus | 2012 | 598,976 | Multiple-type PTM sites for multiple species |
| Phospho.ELM | 2010 | 42,914 | Phosphorylation sites for eukaryotes |
| mUbiSiDa | 2014 | 110,976 | Ubiquitination sites mainly for human and mouse |
| DEPOD | 2015 | 1,215 | Dephosphorylation interactions |
Summary of recent deep learning tools for PTM site prediction.
| Tool | PTM type | Species | Core network model | Evaluation strategy | Dataset size | Link | Year |
| MusiteDeep | Multiple | Human | CNN | 5-fold CV | 997,687 | | 2017/2020 |
| PROSPECT | Phosphorylation | Escherichia coli | CNN | 10-fold CV and independent test | 1,664 | * | 2020 |
| DeepKinZero | Phosphorylation | Human | ZSL | holdout | 12,901 | * | 2020 |
| PhosTransfer | Phosphorylation | – | CNN | holdout | 43,785 | | 2020 |
| GPS-PBS | Phosphorylation | Multiple | seven-layer DNNs | 10-fold CV | 4,458 | – | 2020 |
| DeepPPSite | Phosphorylation | Mammals and Arabidopsis thaliana | LSTM | 10-fold CV | 41,436 | | 2021 |
| DeepIPs | Phosphorylation | Human | CNN + LSTM | 5-fold CV | 10,978 | | 2021 |
| PhosIDN | Phosphorylation | Human | Multi-layer DNNs | holdout | more than 160,000 | | 2021 |
| EMBER | Phosphorylation | Multiple | CNN + RNN | 5-fold CV | 8,389 | | 2022 |
| DNNAce | Acetylation | Multiple | DNN | 10-fold CV and independent test | 96,372 | | 2020 |
| Deep-PLA | Acetylation | Human and | DNN | 5- and 10-fold CV | 1,331 | | 2020 |
| MDC-Kace | Acetylation | Multiple | MDC | 10-fold CV and independent test | 11,583 | | 2020 |
| DeepTL-Ubi | Ubiquitination | Multiple | CNN | holdout | 94,518 | | 2020 |
| Wang et al.’s work | Ubiquitination | Multiple | CNN | 10-fold CV | 121,742 | * | 2020 |
| UbiComb | Ubiquitination | Multiple | LSTM | 10-fold CV | 121,742 | | 2021 |
| SSMFN | Methylation | Human and Mouse | CNN + LSTM | holdout | 6,754 | * | 2021 |
| Malebary et al.’s work | Methylation | Human | CNN | 10-fold CV and jackknife | 2,000 | https://github.com/s2018 | 2022 |
| RecSNO | S-Nitrosylation | – | Bi-LSTM | 5-fold CV | 4,762 | | 2021 |
| MDCAN-Lys | Succinylation | Human | MDCAN | 10-fold CV and independent test | 77,418 | – | 2021 |
| LSTMCNNsucc | Succinylation | Multiple | LSTM + CNN | holdout | 18,593 | | 2021 |
| DeepMal | Malonylation | Multiple | CNN + DNN | 10-fold CV and independent test | 17,288 | | 2020 |
| K_net | Malonylation | Human and Mouse | CNN | 10-fold CV and SEV | 85,204 | – | 2020 |
| DeepCSO | S-Sulphenylation | Homo sapiens and Arabidopsis thaliana | LSTM | 10-fold CV | 10,354 | * | 2020 |
| DeepSSPred | S-Sulphenylation | Homo sapiens | 2D-CNN | jackknife | 7,756 | * | 2021 |
| pKcr | Crotonylation | Papaya | CNN | 10-fold CV and independent test | 58,769 | * | 2020 |
| Deep-Kcr | Crotonylation | Human | CNN | 10-fold CV | 19,928 | | 2020 |
| DeepKcrot | Crotonylation | Multiple | CNN | 10-fold CV and independent test | 10,702/1,265/2,044/5,995 | * | 2021 |
| nhKcr | Crotonylation | Human | CNNrgb | 10-fold CV and independent test | 180,312 | | 2021 |
| DeepKhib | 2-Hydroxyisobutyrylation | Multiple | CNN | 10-fold CV and independent test | 18,946/15,444/12,756/19,330/2,098 | * | 2020 |
| DeepGlut | Glutarylation | Prokaryotes and Eukaryotes | CNN | 10-fold CV | 4,572 | * | 2020 |
| NPalmitoylDeep-PseAAC | N-Palmitoylation | Human | DNN | holdout | 4,364 | | 2021 |
| DTL-DephosSite | Dephosphorylation | Human | Bi-LSTM | 5-fold CV and independent test | 4,956 | | 2021 |
| PreCar_Deep | Carbonylation | Human and other Mammals | CNN + Bi-LSTM | 10-fold CV and independent test | 5,003 | | 2021 |
| He et al.'s work | SUMOylation, Ubiquitylation | – | CNN + DNN | 10-fold CV | 280,731 | | 2021 |
Note: *, Link is not working at the time of writing. Multiple, more than three species or PTM types. -, data not available.
Comparison of deep learning-based phosphorylation site predictors.
| Tool | Framework | Encoding | Window size | AUC |
| MusiteDeep | Keras/TensorFlow | One-hot | 33 | 0.880 |
| PROSPECT | PyTorch | One-hot, EGAAC, CKSAAGP | 27 | 0.770 |
| DeepKinZero | TensorFlow | Word embedding | 15 | – |
| PhosTransfer | TensorFlow | Word embedding | – | 0.898 |
| GPS-PBS | Keras/TensorFlow | BLOSUM62 | 21 | 0.832 |
| DeepPPSite | Keras/TensorFlow | BE, EBGW, CKSAAP, PSPM, IPCP | 21 | 0.872 |
| DeepIPs | Keras/TensorFlow | Word embedding | 15 | 0.909 |
| PhosIDN | Keras/TensorFlow | One-hot, PPI embedding | 21 | 0.939 |
| EMBER | PyTorch | One-hot | 15 | 0.928 |
Note: -, data not available. AUC: Area under the Curve of ROC.
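Several of the predictors compared above list one-hot encoding as their input representation. The following is a minimal sketch of how a fixed-length sequence window around a candidate site could be one-hot encoded. It is not taken from any of the listed tools: the function names, the `-` padding symbol, and the 20-letter alphabet order are illustrative assumptions, and the sketch assumes the numeric column in the table gives the window length in residues.

```python
# Illustrative sketch only; names and padding convention are assumptions,
# not the implementation of any tool listed in the table.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary fixed order
PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends


def extract_window(sequence, site_index, window_size):
    """Return the window of odd length `window_size` centered on `site_index`.

    Positions that fall outside the sequence are filled with PAD.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the site sits in the center")
    half = window_size // 2
    residues = []
    for pos in range(site_index - half, site_index + half + 1):
        residues.append(sequence[pos] if 0 <= pos < len(sequence) else PAD)
    return "".join(residues)


def one_hot_encode(window):
    """Encode each residue as a 20-dimensional 0/1 vector.

    Padding and non-standard residues become all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        matrix.append(row)
    return matrix


# Example: a window of 5 residues around the first residue of a toy sequence.
window = extract_window("MKSTY", site_index=0, window_size=5)
features = one_hot_encode(window)  # 5 rows x 20 columns
```

Under these assumptions, a window of length 33 would yield a 33 x 20 binary matrix per candidate site, which is the kind of input a CNN or LSTM layer can consume directly.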
AUC values of different ubiquitination prediction tools [106].
| 0.753 | 0.789 | 0.720 | 0.772 | 0.824 | 0.814 | ||
| 0.598 | 0.625 | 0.561 | 0.548 | 0.607 | 0.611 | ||
| 0.624 | 0.661 | 0.644 | 0.600 | 0.630 | 0.638 | ||
| 0.656 | 0.693 | 0.659 | 0.664 | 0.715 | 0.681 | ||
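As a reminder of what the AUC values in these comparisons measure, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy inputs are illustrative assumptions and do not reproduce any value in the table.

```python
# Illustrative sketch only; `roc_auc` is a hypothetical helper, not code from
# any of the reviewed tools.
def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for modified sites and 0 for unmodified sites;
    `scores` holds the predictor's output for each site.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of sites, so it is suited to small illustrations; for large benchmark sets a sorting-based implementation would be the more practical choice.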
Fig. 3 Sankey diagram depicting the distribution of PTM types, core network models, evaluation strategies, and published years.