Niraj Thapa¹, Meenal Chaudhari¹, Sean McManus¹, Kaushik Roy², Robert H Newman³, Hiroto Saigo⁴, Dukka B Kc⁵.
Abstract
BACKGROUND: Protein succinylation has recently emerged as an important and common post-translational modification (PTM) that occurs on lysine residues. Succinylation is notable both for its size (at 100 Da, it is one of the larger chemical PTMs) and for its ability to change the net charge of the modified lysine residue from +1 to -1 at physiological pH. The gross local changes that occur in proteins upon succinylation have been shown to correspond with changes in gene activity and to be perturbed by defects in the citric acid cycle. These observations, together with the fact that succinate is generated as a metabolic intermediate during cellular respiration, have led to suggestions that protein succinylation may play a role in the interaction between cellular metabolism and important cellular functions. For instance, succinylation likely represents an important aspect of genomic regulation and repair and may have important consequences in the etiology of a number of disease states. In this study, we developed DeepSuccinylSite, a novel prediction tool that uses deep learning together with an embedding layer to identify succinylation sites in proteins based on their primary structure.
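The 100 Da figure can be checked from elemental composition. The sketch below assumes the standard chemistry of lysine succinylation (a succinyl group attached to the side-chain amine, a net addition of C4H4O3) and standard monoisotopic element masses; acetylation (net C2H2O) is included only as a smaller PTM for comparison.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def formula_mass(formula: dict[str, int]) -> float:
    """Return the monoisotopic mass of an elemental formula given as atom counts."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Succinylation attaches -CO-CH2-CH2-COOH (C4H5O3) to the lysine epsilon-amine
# and removes one H from that amine, so the net change is C4H4O3.
succinyl_delta = formula_mass({"C": 4, "H": 4, "O": 3})
# Acetylation, for comparison, adds a net C2H2O.
acetyl_delta = formula_mass({"C": 2, "H": 2, "O": 1})

print(f"succinylation adds {succinyl_delta:.3f} Da")  # succinylation adds 100.016 Da
print(f"acetylation adds {acetyl_delta:.3f} Da")      # acetylation adds 42.011 Da
```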
Keywords: Convolutional neural network; Deep learning; Embedding; Long short-term memory; Recurrent neural network; Succinylation
Year: 2020 PMID: 32321437 PMCID: PMC7178942 DOI: 10.1186/s12859-020-3342-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Number of positive and negative sites in the training and independent test datasets
| Dataset | Positive | Negative |
|---|---|---|
| Training | 4750 | 4750 |
| Independent Test | 254 | 254 |
Fig. 1 (a) The input is a window of 33 residues in FASTA format. The residues are converted into integers, which are then encoded using either one-hot encoding or an embedding layer; the encoded window is the input to the CNN layers. (b) The output of either encoding is fed into the deep learning architecture. After the flattening and fully connected layers, the final output layer has two nodes, with outputs [0 1] for positive sites and [1 0] for negative sites.
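The input and output encoding in Fig. 1 can be sketched in plain Python. This is a minimal illustration, not the authors' code: the 21-symbol alphabet (20 standard residues plus '-' for positions that fall outside the protein), its integer ordering, and the helper names `extract_window`, `integer_encode`, `one_hot_encode` and `label_vector` are all assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = "-" + AMINO_ACIDS  # assumption: index 0 is a padding symbol
TO_INT = {aa: i for i, aa in enumerate(ALPHABET)}


def extract_window(sequence: str, center: int, size: int = 33) -> str:
    """Return `size` residues centred on the lysine at 0-based `center`,
    padding with '-' where the window runs past either terminus."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    if sequence[center] != "K":
        raise ValueError("centre residue must be lysine (K)")
    half = size // 2
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else "-"
        for i in range(center - half, center + half + 1)
    )


def integer_encode(window: str) -> list[int]:
    """Map each residue to its index in ALPHABET."""
    return [TO_INT[aa] for aa in window]


def one_hot_encode(window: str) -> list[list[int]]:
    """Each residue becomes a length-21 vector with a single 1."""
    return [[1 if j == TO_INT[aa] else 0 for j in range(len(ALPHABET))]
            for aa in window]


def label_vector(is_site: bool) -> list[int]:
    """Two-node target as in Fig. 1: [0, 1] positive, [1, 0] negative."""
    return [0, 1] if is_site else [1, 0]


window = extract_window("MKTAYIAKQR", 1, size=9)
print(window)  # ---MKTAYI
print(integer_encode(window))  # [0, 0, 0, 11, 9, 17, 1, 20, 8]
```

Non-standard residue letters (for example X or U) would raise a `KeyError` in this sketch; how the original tool handles them is not stated in this record.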
Parameters in DeepSuccinylSite
| Parameters | Settings |
|---|---|
| Embedding Output Dimension | 21 |
| Learning Rate | 0.001 |
| Batch Size | 256 |
| Epochs | 80 |
| Conv2d_1 number of filters | 64 |
| Conv2d_1 filter size | 17 × 3 (For window size 33) |
| Conv2d_1 padding | Disabled |
| Dropout | 0.6 |
| Conv2d_2 number of filters | 128 |
| Conv2d_2 filter size | 3 × 3 |
| Conv2d_2 padding | Enabled |
| Dropout | 0.6 |
| MaxPooling2d | 2 × 2 |
| Dense_1 | 768 |
| Dropout | 0.5 |
| Dense_2 | 256 |
| Dropout | 0.5 |
| Checkpointer | Best validation accuracy |
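Reading the parameter table as a layer stack lets us trace tensor shapes for a window of 33 residues with embedding dimension 21. This sketch assumes stride 1 for both convolutions, that "Disabled" and "Enabled" padding correspond to what Keras calls "valid" and "same" padding, that max pooling follows the second convolution, and that layers run in the order listed; dropout does not change shapes and is omitted.

```python
def conv2d_out(h: int, w: int, kh: int, kw: int, padding: str) -> tuple[int, int]:
    """Spatial output size of a stride-1 2-D convolution."""
    if padding == "valid":  # padding disabled: the kernel must fit entirely
        return h - kh + 1, w - kw + 1
    return h, w  # "same" padding preserves the size at stride 1


def pool2d_out(h: int, w: int, ph: int, pw: int) -> tuple[int, int]:
    """Output size of non-overlapping pooling (stride = pool size, no padding)."""
    return h // ph, w // pw


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


window, emb_dim = 33, 21
h, w = window, emb_dim                   # embedded window as a 33 x 21 x 1 image
h, w = conv2d_out(h, w, 17, 3, "valid")  # Conv2d_1, 64 filters  -> 17 x 19 x 64
h, w = conv2d_out(h, w, 3, 3, "same")    # Conv2d_2, 128 filters -> 17 x 19 x 128
h, w = pool2d_out(h, w, 2, 2)            # MaxPooling2d 2 x 2    -> 8 x 9 x 128
flat = h * w * 128                       # Flatten
print(h, w, flat)  # 8 9 9216
print(dense_params(flat, 768), dense_params(768, 256), dense_params(256, 2))
# 7078656 196864 514
```

Under these assumptions, the first dense layer holds most of the trainable weights, which is consistent with the heavy dropout (0.5 to 0.6) listed in the table.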
Performance metrics for different window sizes. The highest values in each category are highlighted in boldface. MCC: Matthews correlation coefficient
| Window Size | One-Hot Sensitivity | One-Hot Specificity | One-Hot MCC | Embedding (Dim. 21) Sensitivity | Embedding (Dim. 21) Specificity | Embedding (Dim. 21) MCC |
|---|---|---|---|---|---|---|
| 9 | 0.70 | 0.55 | 0.25 | 0.80 | 0.57 | 0.39 |
| 15 | 0.73 | 0.33 | 0.58 | 0.42 | ||
| 21 | 0.79 | 0.55 | 0.34 | 0.76 | 0.67 | 0.43 |
| 27 | 0.79 | 0.59 | 0.38 | 0.81 | 0.63 | 0.45 |
| 33 | 0.55 | 0.79 | ||||
| 39 | 0.81 | 0.53 | 0.36 | 0.75 | 0.63 | 0.40 |
| 45 | 0.81 | 0.55 | 0.38 | 0.76 | 0.67 | 0.43 |
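The three metrics in these tables follow their standard definitions from the confusion matrix. A minimal sketch follows; the counts in the example are invented toy numbers, not values from the study.

```python
import math


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Count TP, TN, FP, FN for binary labels (1 = succinylation site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # fraction of true sites recovered


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)  # fraction of non-sites correctly rejected


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 by convention if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical counts, not results from the paper).
tp, tn, fp, fn = 8, 6, 4, 2
print(sensitivity(tp, fn), specificity(tn, fp), round(mcc(tp, tn, fp, fn), 3))
# 0.8 0.6 0.408
```

Unlike accuracy, MCC uses all four confusion-matrix cells, which is why it is reported alongside sensitivity and specificity.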
Fig. 2 ROC curves for different window sizes with embedding encoding
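An ROC curve plots the true positive rate against the false positive rate as the decision threshold on the predicted score is swept; the area under it (AUC) equals the probability that a randomly chosen positive site is scored above a randomly chosen negative one. The sketch below assumes each site's score is the output of the positive node, and it uses a quadratic pairwise count, which is adequate for a test set of a few hundred sites.

```python
def roc_points(labels: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold over each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

Both routes give the same value on this toy input (8/9): the pairwise count and the trapezoidal area under the swept curve.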
Performance metrics for different embedding dimensions. The highest values in each category are shown in bold. MCC: Matthews correlation coefficient
| Embedding Dimension | Sensitivity | Specificity | MCC |
|---|---|---|---|
| 9 | 0.58 | 0.45 | |
| 15 | 0.73 | 0.44 | |
| 21 | 0.79 | 0.67 | |
| 27 | 0.75 | 0.66 | 0.41 |
| 33 | 0.77 | 0.68 | 0.45 |
Fig. 3 ROC curves for different embedding dimensions
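An embedding layer can be viewed as a trainable lookup table with one row per integer token. Multiplying a one-hot vector by that table returns the same row, so the practical difference from one-hot encoding is that the vectors are dense and learned, and their length (the embedding dimension compared in the table above) is a free hyperparameter. In this sketch the vocabulary size of 21, the random initialization range, the class name `Embedding` and the example window are assumptions; in a trained network the table would be learned by backpropagation.

```python
import random


class Embedding:
    """Lookup table holding one `dim`-length vector per integer token."""

    def __init__(self, vocab_size: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random values stand in for learned weights (assumption).
        self.table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, tokens: list[int]) -> list[list[float]]:
        # Each integer-encoded residue selects one row of the table.
        return [self.table[t] for t in tokens]


def one_hot_times_table(index: int, table: list[list[float]]) -> list[float]:
    """One-hot row vector times the table: equals the lookup table[index]."""
    one_hot = [1.0 if i == index else 0.0 for i in range(len(table))]
    return [sum(one_hot[i] * table[i][j] for i in range(len(table)))
            for j in range(len(table[0]))]


emb = Embedding(vocab_size=21, dim=21)     # dimension 21 as in the parameter table
window = [0, 0, 0, 11, 9, 17, 1, 20, 8]    # hypothetical integer-encoded window
vectors = emb(window)
print(len(vectors), len(vectors[0]))       # 9 21
```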
Comparison of DeepSuccinylSite with other deep learning architectures for window size 33. The highest value in each category is shown in bold. MCC: Matthews correlation coefficient; RNN: recurrent neural network; LSTM: long short-term memory model
| Models | Sensitivity | Specificity | MCC |
|---|---|---|---|
| RNN | 0.70 | 0.49 | 0.20 |
| LSTM-RNN | 0.66 | 0.57 | 0.23 |
| LSTM | 0.74 | 0.66 | 0.36 |
| DeepSuccinylSite-one_hot | 0.55 | 0.41 | |
| DeepSuccinylSite-Embedding | 0.79 |
Fig. 4 ROC curves for different deep learning architectures
Comparison of DeepSuccinylSite with existing predictors on the independent test dataset. The highest value in each category is shown in bold
| Prediction Schemes | Sensitivity | Specificity | MCC |
|---|---|---|---|
| iSuc-PseAAC | 0.12 | 0.01 | |
| iSuc-PseOpt | 0.30 | 0.76 | 0.04 |
| pSuc-Lys | 0.22 | 0.83 | 0.04 |
| SuccineSite | 0.37 | 0.88 | 0.20 |
| SuccineSite2.0 | 0.45 | 0.88 | 0.26 |
| GPSuc | 0.50 | 0.88 | 0.30 |
| PSuccE | 0.38 | 0.20 | |
| DeepSuccinylSite | 0.69 |