Ang Li, Yingwei Deng, Yan Tan, Min Chen.
Abstract
Lysine propionylation is a newly discovered posttranslational modification (PTM) that plays a key role in cellular processes. Although proteomics techniques are capable of detecting propionylation, large-scale detection remains challenging. To bridge this gap, we present a transfer learning-based method for computationally predicting propionylation sites. A recurrent neural network-based deep learning model was first trained on malonylation data and then fine-tuned on propionylation data. The trained model served as a feature extractor that translated input protein sequences into numerical vectors, and a support vector machine was used as the final classifier. The proposed method reached a Matthews correlation coefficient (MCC) of 0.6615 in 10-fold cross-validation and 0.3174 on the independent test, outperforming state-of-the-art methods. Enrichment analysis indicated that propionylation was associated with the GO terms GO:0016620, GO:0051287, GO:0003735, GO:0006096, and GO:0005737, and with metabolism. We developed a user-friendly online tool for predicting propionylation sites, which is available at http://47.113.117.61/.
Keywords: deep learning; long short-term memory; malonylation; propionylation; recurrent neural network; support vector machine; transfer learning
Year: 2021 PMID: 33967828 PMCID: PMC8096918 DOI: 10.3389/fphys.2021.658633
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.566
FIGURE 1. The workflow of the proposed method.
FIGURE 2. Illustration of segmenting protein sequences. (A) Normal segment; (B) segment when the number of residues is less than 8.
FIGURE 3. The RNN-based deep learning model.
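The segmentation illustrated in Figure 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extract_windows`, the padding character `X`, and the default window size of 29 are assumptions made for the example. Each lysine (K) becomes the center of a fixed-length window, and positions that would fall outside the protein are filled with the padding character.

```python
def extract_windows(sequence, window_size=29, pad="X"):
    """Return (1-based position, window) pairs centered on each lysine.

    Hypothetical helper: the padding symbol and default window size are
    assumptions for illustration, not values confirmed by the source.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so that K sits in the center")
    half = window_size // 2
    # Pad both ends so windows near the termini keep the full length.
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in the raw sequence maps to i + half in the padded one,
            # so the slice starting at i is centered on this lysine.
            windows.append((i + 1, padded[i:i + window_size]))
    return windows


# Toy sequence with a small window so the padding is easy to see.
for position, window in extract_windows("MKAAK", window_size=5):
    print(position, window)
```

With the toy input, the lysine next to the N-terminus is padded on the left and the one at the C-terminus is padded on the right, mirroring the two cases in the figure.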
Performance of various window sizes in 10-fold cross-validation.
| Size | SN | SP | ACC | MCC |
| 21 | 0.6579 | 0.7862 | 0.7220 | 0.4478 |
| 23 | 0.7631 | 0.8421 | 0.8026 | 0.6072 |
| 25 | 0.7697 | 0.8553 | 0.8125 | 0.6273 |
| 27 | 0.7533 | 0.7763 | 0.7648 | 0.5297 |
| 29 | 0.8158 | | | |
| 31 | 0.7697 | 0.8059 | 0.7878 | 0.5760 |
| 33 | 0.7928 | 0.8240 | 0.6493 | |
| 35 | 0.7664 | 0.7796 | 0.7730 | 0.5461 |
| 37 | 0.7500 | 0.7697 | 0.7599 | 0.5198 |
| 39 | 0.7467 | 0.7336 | 0.7401 | 0.4803 |
| 41 | 0.7697 | 0.7434 | 0.7566 | 0.5133 |
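The four columns above (SN, SP, ACC, MCC) are standard functions of the confusion-matrix counts. The sketch below shows the usual definitions; the function name `binary_metrics` and the counts in the usage example are illustrative assumptions, not data taken from the source. The counts were chosen so that the rounded outputs agree with the window-size-21 row.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Illustrative counts only (assumed, not reported in the source).
metrics = binary_metrics(tp=200, fn=104, tn=239, fp=65)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because MCC uses all four counts, it stays informative when a classifier favors one class, which is why it is reported alongside accuracy.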
The best 15 combinations in the search space.
| C | Gamma | Kernel | Average accuracy |
| 1 | Scale | rbf | 0.8389 |
| 1 | Auto | rbf | 0.8389 |
| 0.5 | Scale | rbf | 0.8356 |
| 0.5 | Auto | rbf | 0.8356 |
| 1.5 | Scale | rbf | 0.8307 |
| 1.5 | Auto | rbf | 0.8307 |
| 2 | Scale | rbf | 0.8258 |
| 2 | Auto | rbf | 0.8241 |
| 2.5 | Scale | rbf | 0.8143 |
| 2.5 | Auto | rbf | 0.8143 |
| 3 | Auto | rbf | 0.8093 |
| 3 | Scale | rbf | 0.8093 |
| 0.5 | Auto | Sigmoid | 0.7960 |
| 0.5 | Scale | Sigmoid | 0.7960 |
| 1 | Auto | Sigmoid | 0.7664 |
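A ranking like the one above comes from scoring every hyperparameter combination and keeping the best ones. The sketch below shows that exhaustive search in plain Python, assuming the numeric column is the SVM regularization parameter C. The function `grid_search`, the candidate values, and the scoring function `toy_score` are hypothetical stand-ins; in practice the score would be the cross-validated accuracy of an SVM trained with each setting.

```python
from itertools import product


def grid_search(score_fn, c_values, gammas, kernels, top_k=15):
    """Score every (C, gamma, kernel) combination and return the top_k best."""
    results = []
    for c, gamma, kernel in product(c_values, gammas, kernels):
        results.append(
            {"C": c, "gamma": gamma, "kernel": kernel,
             "score": score_fn(c, gamma, kernel)}
        )
    # Highest score first; the sort is stable, so ties keep enumeration order.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]


def toy_score(c, gamma, kernel):
    """Placeholder score (assumed): peaks at C = 1 and slightly favors rbf."""
    return -abs(c - 1.0) - (0.0 if kernel == "rbf" else 0.05)


top = grid_search(
    toy_score,
    c_values=[0.5, 1.0, 1.5, 2.0, 2.5, 3.0],  # assumed candidate values
    gammas=["scale", "auto"],
    kernels=["rbf", "sigmoid"],
)
for row in top[:3]:
    print(row)
```

Libraries such as scikit-learn wrap this loop, together with the cross-validation itself, in `GridSearchCV`.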
Performance of the PropPred method.
| | SN | SP | ACC | MCC |
| 10-fold | 0.7928 | 0.7599 | 0.7763 | 0.5529 |
| Independent | 0.4904 | 0.6442 | 0.5673 | 0.1362 |
FIGURE 4. Receiver operating characteristic curves of (A) 10-fold cross-validation and (B) independent test.
Function groups of proteins.
| UNIPROT_ACCESSION | Gene name | Enrichment score |
| Q5SIR5 | Ribose-5-phosphate isomerase A (TTHA1299) | 3.8325 |
| Q5SIC8 | Fructose 1,6-bisphosphatase II (glpX) | |
| Q5SM35 | Transketolase (TTHA0108) | |
| Q5SHF7 | Fructose-1,6-bisphosphate aldolase (TTHA1773) | |
| Q5SM37 | Ribulose-phosphate 3-epimerase (TTHA0106) | |
| Q5SLJ4 | Glucokinase (TTHA0299) | |
| Q5SJM8 | Hypothetical protein (TTHA0980) | |
| | Histidyl-tRNA synthetase (hisS) | 3.2378 |
| | Leucyl-tRNA synthetase (leuS) | |
| Q5SJX7 | Seryl-tRNA synthetase (TTHA0875) | |
| P56881 | Threonyl-tRNA synthetase (thrS) | |
| P56206 | Glycyl-tRNA synthetase (TTHA0543) | |
| P56690 | Isoleucyl-tRNA synthetase (ileS) | 2.5835 |
| P23395 | Methionyl-tRNA synthetase (TTHA1298) | |
| | Histidyl-tRNA synthetase (hisS) | |
| | Leucyl-tRNA synthetase (leuS) | |
| Q5SJ45 | Valyl-tRNA synthetase (valS) | |
| Q5SIH0 | Tyrosyl-tRNA synthetase (TTHA1399) | |
| P80380 | 30S ribosomal protein S20 (rpsT) | 1.8414 |
| Q5SHQ2 | 30S ribosomal protein S8 (rpsH) | |
| Q5SHP6 | 50S ribosomal protein L29 (TTHA1684) | |
| Q5SHQ5 | 30S ribosomal protein S5 (rpsE) | |
| Q5SLP7 | 50S ribosomal protein L1 (rplA) | |
| Q5SHQ0 | 50S ribosomal protein L5 (rplE) | |
| P80377 | 30S ribosomal protein S13 (rpsM) | |
| Q5SHN3 | 30S ribosomal protein S12 (rpsL) | |
| P35871 | 50S ribosomal protein L33 (rpmG) | |
| Q8VVE2 | 50S ribosomal protein L7/L12 (rplL) | |
| Q5SLY1 | 30S ribosomal protein S1 (rpsA) | |
| P17291 | 30S ribosomal protein S7 (TTHA1696) | |
| Q9Z9H5 | 50S ribosomal protein L17 (rplQ) |
Significantly enriched GO terms.
| Category | Term | Count | |
| GOTERM_CC_DIRECT | GO:0005737 cytoplasm | 38 | 1.07E-05 |
| GOTERM_MF_DIRECT | GO:0016620 oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor | 5 | 1.91E-03 |
| GOTERM_BP_DIRECT | GO:0006096 glycolytic process | 6 | 3.07E-03 |
| GOTERM_MF_DIRECT | GO:0051287 NAD binding | 8 | 3.09E-03 |
| GOTERM_MF_DIRECT | GO:0003735 structural constituent of ribosome | 13 | 9.83E-03 |
Significant KEGG pathways.
| Term | Count | |
| ttj01200:Carbon metabolism | 35 | 2.18E-09 |
| ttj01120:Microbial metabolism in diverse environments | 44 | 1.49E-07 |
| ttj01130:Biosynthesis of antibiotics | 43 | 4.16E-06 |
| ttj00010:Glycolysis/gluconeogenesis | 15 | 3.92E-05 |
| ttj00020:Citrate cycle (TCA cycle) | 12 | 1.52E-04 |
| ttj00620:Pyruvate metabolism | 14 | 5.84E-04 |
| ttj00710:Carbon fixation in photosynthetic organisms | 8 | 5.95E-04 |
| ttj01110:Biosynthesis of secondary metabolites | 50 | 7.43E-04 |
| ttj01100:Metabolic pathways | 85 | 8.13E-04 |