Sheng-Bao Suo, Jian-Ding Qiu, Shao-Ping Shi, Xing-Yu Sun, Shu-Yun Huang, Xiang Chen, Ru-Ping Liang.
Abstract
Protein lysine acetylation is a type of reversible post-translational modification that plays a vital role in many cellular processes, such as transcriptional regulation, apoptosis and cytokine signaling. To fully decipher the molecular mechanisms of acetylation-related biological processes, an initial but crucial step is the recognition of acetylated substrates and the corresponding acetylation sites. In this study, we developed a position-specific method named PSKAcePred for lysine acetylation prediction based on support vector machines. The residues around the acetylation sites were selected or excluded based on their entropy values. We incorporated features of amino acid composition information, evolutionary similarity and physicochemical properties to predict lysine acetylation sites. The prediction model achieved an accuracy of 79.84% and a Matthews correlation coefficient of 59.72% using 10-fold cross-validation on balanced positive and negative samples. A feature analysis showed that all features applied in this method contributed to the acetylation process. A position-specific analysis showed that the features derived from the critical neighboring residues contributed profoundly to the acetylation site determination. The detailed analysis in this paper can help us to understand more of the acetylation mechanism and can provide guidance for the related experimental validation.
Year: 2012 PMID: 23173045 PMCID: PMC3500252 DOI: 10.1371/journal.pone.0049108
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Comparison of sequence information between acetylation sites and non-acetylation sites.
(A) Amino acid average composition of acetylation and non-acetylation sites. (B) A two-sample logo of the compositional biases around the acetylation sites compared to the non-acetylation sites.
Figure 2. Comparison of KNN scores between acetylation sites and non-acetylation sites.
(A) Box plots of KNN scores for acetylation sites and non-acetylation sites. The bottom and top of the box are the 25th and 75th percentiles, respectively. (B) Comparison of mean KNN scores between acetylation sites and non-acetylation sites.
Figure 3. The average accessible surface area (AASA) of residues around acetylation sites and non-acetylation sites.
Figure 4. The information gain values at different positions of residues in the sequence fragments.
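The per-position information gain shown in Figure 4 can be illustrated with the standard definition, IG = H(class) − H(class | residue at that position). The following is a minimal sketch under that assumption; the function names and the toy fragments are hypothetical and are not taken from the paper, which may use a different estimator.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(fragments, labels, position):
    """Information gain of the residue at `position` with respect to the labels.

    fragments: equal-length sequence fragments (hypothetical input format)
    labels:    1 for acetylation sites, 0 for non-acetylation sites
    position:  0-based index into each fragment
    """
    # Group the class labels by the residue observed at this position.
    groups = defaultdict(list)
    for fragment, label in zip(fragments, labels):
        groups[fragment[position]].append(label)

    # Conditional entropy H(class | residue), weighted by group size.
    conditional = sum(
        (len(group) / len(labels)) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


# Toy data (hypothetical): position 0 separates the classes, position 1 does not.
toy_fragments = ["AK", "AK", "GK", "GK"]
toy_labels = [1, 1, 0, 0]
ig_per_position = [information_gain(toy_fragments, toy_labels, p) for p in range(2)]
print(ig_per_position)
```

Positions with higher information gain are the ones a position-specific method would retain; positions where the residue carries no class information score near zero.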
The sizes and positions of the information gain (IG) windows.
| IG window size | Positions in original 21-mer acetylation sequence fragment |
| 9 | −8, −7, −6, −5, −4, −3, −2, −1, +1 |
| 11 | −8, −7, −6, −5, −4, −3, −2, −1, +1, +4, +6 |
| 13 | −8, −7, −6, −5, −4, −3, −2, −1, +1, +3, +4, +6, +7 |
| 15 | −8, −7, −6, −5, −4, −3, −2, −1, +1, +3, +4, +5, +6, +7, +9 |
| 17 | −8, −7, −6, −5, −4, −3, −2, −1, +1, +2, +3, +4, +5, +6, +7, +8, +9 |
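The table above can be read as a lookup from window size to the neighboring positions that are kept. A minimal sketch of extracting those residues from a 21-mer fragment follows; it assumes the central lysine sits at 0-based index 10, and the names `IG_WINDOWS` and `extract_ig_window` are hypothetical.

```python
# Positions relative to the central lysine, copied from the table above.
UPSTREAM = list(range(-8, 0))  # -8 ... -1, shared by every window size
IG_WINDOWS = {
    9: UPSTREAM + [1],
    11: UPSTREAM + [1, 4, 6],
    13: UPSTREAM + [1, 3, 4, 6, 7],
    15: UPSTREAM + [1, 3, 4, 5, 6, 7, 9],
    17: UPSTREAM + [1, 2, 3, 4, 5, 6, 7, 8, 9],
}


def extract_ig_window(fragment, size, center=10):
    """Return the residues of a 21-mer fragment selected by the IG window.

    fragment: 21-character string with the candidate lysine at index `center`
    size:     one of the IG window sizes listed in the table
    """
    if len(fragment) != 21:
        raise ValueError("expected a 21-mer fragment")
    return "".join(fragment[center + offset] for offset in IG_WINDOWS[size])


# Placeholder fragment (letters A..U, not a real peptide) so that each
# selected position is easy to trace; 'K' happens to fall at index 10.
example = "ABCDEFGHIJKLMNOPQRSTU"
print(extract_ig_window(example, 13))
```

Note that the window size counts only the retained neighbors; the central lysine itself is not part of the listed positions.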
The predictive performance of the models trained with various features with an IG window size of 13.
| Training features | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC (%) |
| BE | 68.00±0.17 | 63.94±0.28 | 72.06±0.26 | 36.12±0.33 |
| KNN | 74.98±0.39 | 72.66±0.67 | 77.31±0.30 | 50.02±0.78 |
| AASA | 65.28±0.13 | 62.66±0.45 | 67.90±0.41 | 30.61±0.26 |
| BE+KNN+AASA | 79.84±0.18 | 78.02±0.20 | 81.66±0.17 | 59.72±0.35 |
Abbreviations: BE, binary encoding; KNN, K nearest neighbors; AASA, average accessible surface area. The corresponding measurement was represented as the average value ± standard deviation.
Prediction performance of the models trained with different IG window sizes.
| Performance | 9 | 11 | 13 | 15 | 17 |
| Accuracy (%) | 76.69±0.14 | 77.56±0.23 | 79.84±0.18 | 76.97±0.10 | 76.76±0.21 |
| Sensitivity (%) | 75.00±0.35 | 75.60±0.14 | 78.02±0.20 | 75.47±0.19 | 74.17±0.24 |
| Specificity (%) | 78.38±0.35 | 79.51±0.46 | 81.66±0.17 | 78.46±0.12 | 79.36±0.35 |
| MCC (%) | 53.42±0.29 | 55.15±0.47 | 59.72±0.35 | 53.96±0.19 | 53.60±0.42 |
The corresponding measurement was represented as the average value ± standard deviation.
The predictive performance of the models trained with the optimal features using general (contiguous) window sizes.
| General window size | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC (%) |
| 9 (−4∼K∼+4) | 73.72±0.25 | 71.98±0.23 | 75.47±0.43 | 47.48±0.51 |
| 11 (−5∼K∼+5) | 75.42±0.19 | 73.53±0.31 | 77.31±0.14 | 50.87±0.38 |
| 13 (−6∼K∼+6) | 76.01±0.30 | 73.52±0.51 | 78.49±0.46 | 52.08±0.60 |
| 15 (−7∼K∼+7) | 76.85±0.19 | 74.90±0.28 | 78.80±0.41 | 53.74±0.39 |
| 17 (−8∼K∼+8) | 76.82±0.20 | 74.93±0.20 | 78.70±0.56 | 53.67±0.42 |
| 19 (−9∼K∼+9) | 76.83±0.13 | 74.55±0.31 | 79.11±0.18 | 53.71±0.25 |
| 21 (−10∼K∼+10) | 76.52±0.09 | 74.00±0.46 | 79.04±0.31 | 53.11±0.16 |
The corresponding measurement was represented as the average value ± standard deviation.
The comparison of predictive performance between our method and other prediction methods on independent test data sets.
| Prediction method | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC (%) |
| LysAcet | 54.43±0.99 | 56.55±0.00 | 52.30±1.97 | 8.87±1.97 |
| EnsemblePail | 57.17±0.93 | 76.12±0.00 | 38.22±1.86 | 15.49±1.88 |
| Phosida | 64.73±0.44 | 39.98±0.00 | 89.47±0.88 | 33.91±1.20 |
| PSKAcePred | 78.79±0.33 | 77.34±0.00 | 80.24±0.66 | 57.61±0.67 |
The corresponding measurement was represented as the average value ± standard deviation.