Shun-Long Weng, Hui-Ju Kao, Chien-Hsun Huang, Tzong-Yi Lee.
Abstract
S-palmitoylation, the covalent attachment of 16-carbon palmitic acid to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo analysis indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by categorizing a large-scale dataset into subgroups with statistically significant conservation of amino acids. Single features such as amino acid composition (AAC), amino acid pair composition (AAPC), position specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA) were considered, along with the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Evaluation by five-fold cross-validation with a support vector machine (SVM) showed that a hybrid of AAC and PSSM performs best at discriminating between S-palmitoylation and non-S-palmitoylation sites. The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, specificity of 0.80, accuracy of 0.80, and Matthews Correlation Coefficient (MCC) value of 0.45.
Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/.
Year: 2017 PMID: 28662047 PMCID: PMC5491019 DOI: 10.1371/journal.pone.0179529
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data resources and statistics of the training and independent testing datasets.
| Dataset | Resource | Species | Number of S-palmitoylated sites | Number of non-S-palmitoylated sites |
|---|---|---|---|---|
| Training | dbPTM 3.0 | Human | 498 | 4,671 |
| Training | dbPTM 3.0 | Mouse | 107 | 1,822 |
| Training | dbPTM 3.0 | Rat | 43 | 603 |
| Training | dbPTM 3.0 | Others | 109 | 1,279 |
| Training | Forrester MT | Human | 66 | 1,036 |
| Training | Yang W | Human | 790 | 8,533 |
| Training | All | | 710 | 5,676 |
| Independent testing | Gould | Mouse | 613 | 5,412 |
Fig 1. Conceptual diagram of the construction of two-layered SVMs based on MDDLogo-identified substrate motifs.
Fig 2. Amino acid composition of the S-palmitoylation sites.
(A) Comparison of amino acid composition between positive data (710 S-palmitoylation sites) and negative data (5,676 non-S-palmitoylation sites). (B) Position-specific amino acid composition surrounding the S-palmitoylation sites, based on the frequency plot of WebLogo. (C) The compositional biases of amino acids around S-palmitoylation sites (upper panel) compared to the non-S-palmitoylation sites (lower panel), based on TwoSampleLogo (p-value < 0.01).
Five-fold cross-validation results for single SVM models trained with various features.
Sn, sensitivity; Sp, specificity; Acc, accuracy; MCC, Matthews Correlation Coefficient; AUC, area under the curve of ROC.
| Training features | Sn | Sp | Acc | MCC | AUC |
|---|---|---|---|---|---|
| 20D Binary code (AA) | 0.60 | 0.62 | 0.62 | 0.16 | 0.61 |
| BLOSUM62 (B62) | 0.62 | 0.63 | 0.63 | 0.18 | 0.62 |
| Amino Acid Composition (AAC) | 0.68 | 0.69 | 0.69 | 0.29 | 0.70 |
| Amino Acid Pair Composition (AAPC) | 0.67 | 0.68 | 0.68 | 0.26 | 0.68 |
| Accessible Surface Area (ASA) | 0.56 | 0.57 | 0.57 | 0.09 | 0.58 |
| Position Weight Matrix (PWM) | 0.65 | 0.66 | 0.66 | 0.22 | 0.66 |
| Position-specific scoring matrix (PSSM) | 0.67 | 0.68 | 0.68 | 0.26 | 0.68 |
| AAC + AA | 0.68 | 0.69 | 0.69 | 0.29 | 0.70 |
| AAC + B62 | 0.67 | 0.70 | 0.70 | 0.31 | 0.73 |
| AAC + AAPC | 0.71 | 0.71 | 0.71 | 0.34 | 0.76 |
| AAC + ASA | 0.66 | 0.68 | 0.68 | 0.24 | 0.67 |
| AAC + PWM | 0.70 | 0.70 | 0.70 | 0.32 | 0.75 |
| AAC + PSSM | 0.72 | 0.73 | 0.73 | 0.38 | 0.78 |
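As an illustration of the simplest of the features listed above, the following is a minimal sketch of amino acid composition (AAC) encoding: the fraction of each of the 20 standard amino acids within a sequence window centered on a candidate cysteine. The function name `aac_features`, the example window, and the choice to skip non-standard characters are assumptions made for illustration; the window length and preprocessing actually used by the authors are not given in this record.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(window: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a sequence window.

    Characters outside the 20 standard amino acids (for example padding
    symbols) are skipped. This handling is an assumption, not the authors'.
    """
    residues = [aa for aa in window.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical window with a central cysteine; not taken from the dataset.
    example_window = "MKLRGCCSGKA"
    for aa, freq in zip(AMINO_ACIDS, aac_features(example_window)):
        if freq > 0:
            print(f"{aa}: {freq:.3f}")
```

A hybrid feature such as AAC + PSSM would presumably be formed by joining this vector with the other encoding, but how the two were combined is not described here.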
Fig 3. ROC curves of the single SVM models trained using various features based on five-fold cross-validation.
Fig 4. Tree-like view of MDDLogo-identified motif signatures on 710 non-homologous S-palmitoylated sequences.
Fig 5. ROC curves of the SVM models trained from MDDLogo-identified motifs based on five-fold cross-validation.
Five-fold cross-validation performance for five SVM models trained from MDDLogo-identified motifs.
| Dataset | Number of positive data | Number of negative data | Sn | Sp | Acc | MCC | AUC |
|---|---|---|---|---|---|---|---|
| All data | 710 | 5,676 | 0.72 | 0.73 | 0.73 | 0.38 | 0.78 |
| Palm1 | 112 | 895 | 0.82 | 0.83 | 0.83 | 0.50 | 0.87 |
| Palm2 | 104 | 831 | 0.81 | 0.81 | 0.81 | 0.48 | 0.86 |
| Palm3 | 183 | 1,463 | 0.83 | 0.84 | 0.84 | 0.53 | 0.89 |
| Palm4 | 107 | 856 | 0.79 | 0.81 | 0.81 | 0.47 | 0.86 |
| Palm5 | 204 | 1,631 | 0.73 | 0.73 | 0.73 | 0.39 | 0.79 |
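The five subgroup rows can be related to the overall two-layered figures quoted in the abstract (sensitivity 0.79, specificity 0.80) with a short consistency check. The sketch below pools the per-subgroup sensitivity and specificity, weighting each by its number of positive or negative sites. Size-weighted pooling is an assumption made here for illustration; this record does not state how the authors combined the subgroup results.

```python
# Per-subgroup values copied from the table above:
# (positives, negatives, sensitivity, specificity)
SUBGROUPS = {
    "Palm1": (112, 895, 0.82, 0.83),
    "Palm2": (104, 831, 0.81, 0.81),
    "Palm3": (183, 1463, 0.83, 0.84),
    "Palm4": (107, 856, 0.79, 0.81),
    "Palm5": (204, 1631, 0.73, 0.73),
}


def pooled_sn_sp(subgroups: dict) -> tuple[float, float]:
    """Pool sensitivity and specificity across subgroups, weighted by size.

    Assumes the overall figure is a weighted mean of the subgroup results,
    which this record does not confirm.
    """
    rows = list(subgroups.values())
    total_pos = sum(pos for pos, _, _, _ in rows)
    total_neg = sum(neg for _, neg, _, _ in rows)
    pooled_sn = sum(pos * sn for pos, _, sn, _ in rows) / total_pos
    pooled_sp = sum(neg * sp for _, neg, _, sp in rows) / total_neg
    return pooled_sn, pooled_sp


if __name__ == "__main__":
    sn, sp = pooled_sn_sp(SUBGROUPS)
    print(f"pooled Sn = {sn:.2f}, pooled Sp = {sp:.2f}")
```

Under that assumption the pooled values round to the sensitivity and specificity reported for the two-layered model. The subgroup sizes also sum to the 710 positive and 5,676 negative sites of the "All data" row.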
Comparison of independent testing results between our methods and other S-palmitoylation prediction tools.
| Methods | TP | FN | TN | FP | Sn | Sp | Acc | MCC |
|---|---|---|---|---|---|---|---|---|
| Single SVM | 358 | 255 | 3801 | 1611 | 0.584 | 0.702 | 0.690 | 0.184 |
| Two-Layered SVM | 423 | 190 | 3755 | 1657 | 0.690 | 0.694 | 0.693 | 0.244 |
| SeqPalm | 22 | 591 | 5141 | 271 | 0.036 | 0.950 | 0.857 | -0.020 |
| CSKAAP-Palm | 43 | 570 | 5102 | 310 | 0.070 | 0.943 | 0.854 | 0.017 |
| CSS-Palm 4.0 | 209 | 404 | 4817 | 595 | 0.341 | 0.890 | 0.834 | 0.205 |
| NBA-Palm | 19 | 594 | 4673 | 739 | 0.031 | 0.863 | 0.778 | -0.096 |
| WAP-Palm | 102 | 511 | 4711 | 701 | 0.167 | 0.870 | 0.798 | 0.033 |
| PalmPred | 169 | 444 | 4474 | 938 | 0.276 | 0.827 | 0.771 | 0.080 |
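The Sn, Sp, Acc and MCC columns can be recomputed from the TP, FN, TN and FP counts. The sketch below assumes the standard definitions of these measures, which this record abbreviates but does not spell out; the function name `confusion_metrics` is illustrative.

```python
import math


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    Uses the standard textbook definitions (an assumption about the source).
    Requires at least one positive and one negative sample.
    """
    sn = tp / (tp + fn)                      # sensitivity: recall on positives
    sp = tn / (tn + fp)                      # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Counts from the "Two-Layered SVM" row of the table above.
    for name, value in confusion_metrics(tp=423, fn=190, tn=3755, fp=1657).items():
        print(f"{name}: {value:.3f}")
```

Because non-S-palmitoylated sites outnumber S-palmitoylated sites roughly nine to one in the independent test set, accuracy is dominated by specificity. This is why tools with very low sensitivity can still show accuracy above 0.8 while their MCC stays near zero.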
Fig 6. A case study of S-palmitoylation site prediction on human CD9 antigen (CD9_HUMAN).