Favorisen Rosyking Lumbanraja, Bharuno Mahesworo, Tjeng Wawan Cenggoro, Digdo Sudigyo, Bens Pardamean.
Abstract
BACKGROUND: Conventional in vivo methods for post-translational modification site prediction, such as spectrophotometry, Western blotting, and chromatin immunoprecipitation, can be very expensive and time-consuming. Neural networks (NN) are one computational approach that can effectively predict post-translational modification sites. We developed a neural network model, the Sequential and Spatial Methylation Fusion Network (SSMFN), to predict possible methylation sites on protein sequences.
Keywords: CNN; Deep Learning; LSTM; Methylation; Prediction; Sequential; Spatial
Year: 2021 PMID: 34541311 PMCID: PMC8409337 DOI: 10.7717/peerj-cs.683
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Protein sequence dataset example.
| No | 1st | 2nd | 3rd | . | . | 8th | 9th | 11th | 12th | . | . | 17th | 18th | 19th |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | V | E | S | . | . | V | T | L | H | . | . | H | M | N |
| 2 | K | N | H | . | . | I | S | H | H | . | . | D | P | Q |
| 3 | H | P | P | . | . | R | L | G | I | . | . | W | D | H |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| n | R | S | I | . | . | A | C | I | R | . | . | K | W | Y |
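Each row above is a fixed-length window of 19 residues. The following is a minimal sketch of how such a window could be cut from a protein and turned into integer indices for an embedding layer. It rests on assumptions not stated in this record: the window is centered on the candidate site, the alphabet has 21 symbols (the 20 standard amino acids plus a hypothetical padding symbol `-`), and the names `extract_window` and `encode` are illustrative only.

```python
from typing import List

# Assumed alphabet: 20 standard amino acids plus '-' as a padding symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {residue: i for i, residue in enumerate(ALPHABET)}
WINDOW = 19  # window length, matching the 19 positions in the table above


def extract_window(protein: str, site: int, window: int = WINDOW) -> str:
    """Return `window` residues centered on the 0-based `site`, padded with '-'."""
    half = window // 2
    left = max(0, site - half)
    right = min(len(protein), site + half + 1)
    # Pad whichever side runs past the end of the protein.
    pad_left = "-" * (half - (site - left))
    pad_right = "-" * (half - (right - 1 - site))
    return pad_left + protein[left:right] + pad_right


def encode(window_seq: str) -> List[int]:
    """Map each residue of a window to its index in the assumed alphabet."""
    return [INDEX[residue] for residue in window_seq]


if __name__ == "__main__":
    w = extract_window("MKRSTVLA", site=2)
    print(w)
    print(encode(w))
```

Padding keeps every window the same length even when the candidate site lies near either end of the protein, so all encoded windows share one input shape.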
Amino acid sequence dataset list.
| Positive | Negative |
|---|---|
| 1,038 | 5,190 |
| 1,038 | 1,038 |
| 1,131 | 3,033 |
| 1,131 | 1,131 |
| 260 | 260 |
Figure 1. Research workflow.
The chart shows that the data used in this research were retrieved from Kumar et al. (2017) and then balanced. In the first experiment, we trained our model on the balanced training dataset, then validated and tested it on both the balanced and the imbalanced datasets. The second experiment followed the same workflow, except that we trained the model on the imbalanced training dataset.
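The balancing step can be illustrated with random undersampling of the larger class down to the size of the smaller one. This is only a sketch under an assumption: the text above does not say which balancing method was used, and the function name `balance_by_undersampling` and the fixed seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def balance_by_undersampling(
    positives: Sequence[str],
    negatives: Sequence[str],
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Randomly subsample both classes to the size of the smaller class.

    Assumption: plain random undersampling without replacement.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    n = min(len(positives), len(negatives))
    balanced_pos = rng.sample(list(positives), n)
    balanced_neg = rng.sample(list(negatives), n)
    return balanced_pos, balanced_neg


if __name__ == "__main__":
    # Placeholder sequences sized like one imbalanced split in the dataset list.
    pos = [f"pos_{i}" for i in range(1038)]
    neg = [f"neg_{i}" for i in range(5190)]
    bal_pos, bal_neg = balance_by_undersampling(pos, neg)
    print(len(bal_pos), len(bal_neg))
```

With 1,038 positive and 5,190 negative windows, this yields 1,038 of each, which matches the balanced counts in the dataset list.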
Figure 2. Proposed neural network architecture.
Hyperparameter settings.
|  |  |
|---|---|
|  | 0.001 |
|  | 500 |
|  | Adam |
|  | 21 |
|  | 21 × 19 = 399 |
|  | 2 |
|  |  |
|  | 64 |
|  | 0.5 |
|  | 32 |
|  |  |
|  | 64 |
|  | Rectified linear units |
|  | 0.5 |
|  | 32 |
Figure 3. The standard multi-layer perceptron architecture.
The first ablation study, trained on the balanced training dataset.
|  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|
|  |  |  |  |  |  |  |
| SSMFN CNN | 0.7891 | 0.7649 | 0.5745 | 0.9368 | 0.5649 | 0.8120 |
| SSMFN LSTM |  |  |  | 0.9354 |  | 0.8326 |
| SSMFN Merged | 0.8187 | 0.7943 | 0.6175 |  | 0.6143 |  |
|  |  |  |  |  |  |  |
| SSMFN CNN |  |  |  | 0.8149 |  | 0.8120 |
| SSMFN LSTM | 0.8302 | 0.3020 | 0.8195 | 0.8417 | 0.6609 | 0.8326 |
| SSMFN Merged | 0.8360 | 0.8358 | 0.8130 |  | 0.6738 |  |
|  |  |  |  |  |  |  |
| SSMFN CNN | 0.7962 | 0.7960 |  | 0.7831 | 0.5929 | 0.7962 |
| SSMFN LSTM | 0.7981 | 0.7980 | 0.8063 | 0.7903 | 0.5964 | 0.7981 |
| SSMFN Merged |  |  | 0.8000 |  |  |  |
Note.
The highest value of each parameter from each measurement experiment is shown in bold.
The second ablation study, trained on the imbalanced training dataset.
|  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|
|  |  |  |  |  |  |  |
| SSMFN CNN | 0.8939 | 0.8502 |  | 0.8834 | 0.7230 | 0.8179 |
| SSMFN LSTM |  |  | 0.9100 |  |  |  |
| SSMFN Merged | 0.9078 | 0.8774 | 0.8895 | 0.9133 | 0.7598 | 0.8596 |
|  |  |  |  |  |  |  |
| SSMFN CNN | 0.7529 | 0.7372 |  | 0.6698 | 0.5798 | 0.8179 |
| SSMFN LSTM | 0.8638 | 0.8624 | 0.9567 |  |  |  |
| SSMFN Merged |  |  | 0.9672 | 0.8003 | 0.7491 | 0.8596 |
|  |  |  |  |  |  |  |
| SSMFN CNN | 0.7404 | 0.7228 |  | 0.6598 | 0.5566 | 0.7404 |
| SSMFN LSTM | 0.8442 | 0.8418 | 0.9590 |  | 0.7110 | 0.8442 |
| SSMFN Merged |  |  | 0.9688 | 0.7744 |  |  |
Note.
The highest value of each parameter from each measurement experiment is shown in bold.
The first experiment, trained on the balanced training dataset.
|  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|
|  |  |  |  |  |  |  |
| DeepRMethylSite CNN | 0.7819 | 0.7557 | 0.5668 | 0.9259 | 0.5428 | 0.7990 |
| DeepRMethylSite LSTM | 0.7699 | 0.7479 | 0.5480 | 0.9394 | 0.5430 | 0.8024 |
| DeepRMethylSite Merged | 0.7743 | 0.7518 | 0.5474 | 0.9394 | 0.5481 | 0.8021 |
| SMLP | 0.7209 | 0.7018 | 0.4922 | 0.9281 | 0.4719 | 0.7649 |
| SSMFN Merged |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
| DeepRMethylSite CNN | 0.8090 | 0.8089 | 0.7944 | 0.8251 | 0.6188 | 0.7990 |
| DeepRMethylSite LSTM | 0.7993 | 0.7993 | 0.7618 | 0.8493 | 0.6048 | 0.8024 |
| DeepRMethylSite Merged | 0.8059 | 0.8051 | 0.7659 | 0.8504 | 0.6169 | 0.8021 |
| SMLP | 0.7073 | 0.7073 | 0.7041 | 0.7107 | 0.4147 | 0.7649 |
| SSMFN Merged |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
| MeMo* | 0.68 | na | 0.38 | 0.99 | 0.46 | na |
| MASA* | 0.65 | na | 0.31 | 0.99 | 0.41 | na |
| BPB-PPMS* | 0.56 | na | 0.12 |  | 0.25 | na |
| PMeS* | 0.58 | na | 0.43 | 0.73 | 0.16 | na |
| iMethyl-PseAAC* | 0.59 | na | 0.18 |  | 0.3 | na |
| PSSMe* | 0.72 | na | 0.6 | 0.83 | 0.44 | na |
| MePred-RF* | 0.69 | na | 0.41 | 0.97 | 0.46 | na |
| PRmePRed** |  | na |  | 0.8660 |  |  |
| DeepRMethylSite CNN | 0.7846 | 0.7846 | 0.7803 | 0.7891 | 0.5693 | 0.7846 |
| DeepRMethylSite LSTM | 0.8000 | 0.7989 | 0.7617 | 0.8514 | 0.6065 | 0.8000 |
| DeepRMethylSite Merged | 0.7942 | 0.7929 | 0.7508 | 0.8447 | 0.5959 | 0.7904 |
| SMLP | 0.8077 | 0.8076 | 0.8175 | 0.7985 | 0.6157 | 0.8077 |
| SSMFN Merged | 0.8115 |  | 0.8000 | 0.8240 | 0.6235 | 0.8115 |
Note.
The highest value of each parameter from each measurement experiment is shown in bold.
The second experiment, trained on the imbalanced training dataset.
|  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|
|  |  |  |  |  |  |  |
| DeepRMethylSite CNN | 0.8948 | 0.8550 | 0.9072 | 0.8916 | 0.7242 | 0.8283 |
| DeepRMethylSite LSTM | 0.9092 | 0.8782 | 0.9044 | 0.9106 | 0.7634 | 0.8576 |
| DeepRMethylSite Merged |  |  | 0.9047 | 0.9115 |  | 0.8589 |
| SMLP | 0.9071 | 0.8670 |  | 0.8873 | 0.7635 | 0.8295 |
| SSMFN Merged | 0.9078 | 0.8774 | 0.8895 |  | 0.7598 |  |
|  |  |  |  |  |  |  |
| DeepRMethylSite CNN | 0.8289 | 0.8249 | 0.9709 | 0.7527 | 0.6899 | 0.8283 |
| DeepRMethylSite LSTM | 0.8576 | 0.8557 | 0.9644 | 0.7908 | 0.7350 | 0.8576 |
| DeepRMethylSite Merged | 0.8585 | 0.8567 | 0.9645 | 0.7919 | 0.7365 | 0.8589 |
| SMLP | 0.7582 | 0.7432 |  | 0.6740 | 0.5899 | 0.8295 |
| SSMFN Merged |  |  | 0.9672 |  |  |  |
|  |  |  |  |  |  |  |
| DeepRMethylSite CNN | 0.7808 | 0.7727 | 0.9506 | 0.7039 | 0.6063 | 0.7808 |
| DeepRMethylSite LSTM | 0.8115 | 0.8070 | 0.9500 | 0.7382 | 0.6548 | 0.8115 |
| DeepRMethylSite Merged | 0.8135 | 0.8088 | 0.9553 | 0.7390 | 0.6598 | 0.8135 |
| SMLP | 0.7250 | 0.7025 |  | 0.6452 | 0.5388 | 0.7250 |
| SSMFN Merged |  |  | 0.9688 |  |  |  |
Note.
The highest value of each parameter from each measurement experiment is shown in bold.