Ting Hou, Guangyong Zheng, Pingyu Zhang, Jia Jia, Jing Li, Lu Xie, Chaochun Wei, Yixue Li.
Abstract
BACKGROUND: Lysine acetylation is a crucial type of protein post-translational modification that is involved in many important cellular processes and serious diseases. However, identifying acetylated sites through traditional experimental methods is time-consuming and laborious, which makes those methods unsuitable for identifying a large number of acetylated sites quickly. Computational methods are therefore valuable for accelerating the discovery of lysine acetylated sites. RESULT: In this study, several biological characteristics of acetylated sites were investigated, including the amino acid sequence around the acetylated sites, the physicochemical properties of the amino acids, and the transition probability of adjacent amino acids. A logistic regression method was then used to integrate this information into a novel lysine acetylation prediction system named LAceP. Compared with existing methods, LAceP outperforms most state-of-the-art methods. In particular, LAceP has a more balanced prediction capability for positive and negative datasets.
Year: 2014 PMID: 24586884 PMCID: PMC3930742 DOI: 10.1371/journal.pone.0089575
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The data processing pipeline of LAceP.
The dataset was derived from SysPTM 2.0 (http://lifecenter.sgst.cn/SysPTM/) and PhosphoSitePlus (http://www.phosphosite.org/). Redundant entries were eliminated to obtain a set of non-redundant sites. An independent dataset was first selected at random from the positive and negative datasets. The remaining positive items were then combined with the same number of negative items, selected at random from the whole negative dataset, to construct a training dataset. This selection process was repeated 10 times. After the three types of features were encoded, the logistic regression algorithm was used to build the classifier. The best model was obtained after parameter optimization and performance evaluation. Finally, a LAceP web server was established so that biologists can use the prediction model.
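The sampling scheme described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_datasets`, the hold-out size `n_independent`, and the toy site identifiers are all hypothetical. Drawing training negatives only from negatives outside the independent set is also an assumption, made here to keep the two sets disjoint.

```python
import random


def build_datasets(positives, negatives, n_independent, n_rounds=10, seed=0):
    """Hold out an independent set, then draw balanced training sets.

    n_independent sites are held out from each class (an assumption; the
    hold-out size is not given in the text).
    """
    rng = random.Random(seed)
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Step 1: the independent set is picked at random from both classes.
    independent = {"pos": pos[:n_independent], "neg": neg[:n_independent]}
    train_pos = pos[n_independent:]
    neg_pool = neg[n_independent:]  # assumption: exclude held-out negatives

    # Step 2: every round pairs all remaining positives with an equally
    # sized random sample of negatives, giving a balanced training set.
    training_sets = []
    for _ in range(n_rounds):
        sampled_neg = rng.sample(neg_pool, len(train_pos))
        training_sets.append({"pos": list(train_pos), "neg": sampled_neg})
    return independent, training_sets


# Toy identifiers standing in for acetylated / non-acetylated sites.
positives = [f"P{i}" for i in range(50)]
negatives = [f"N{i}" for i in range(500)]
independent, training_sets = build_datasets(positives, negatives, n_independent=10)
print(len(training_sets), len(training_sets[0]["pos"]), len(training_sets[0]["neg"]))
```

Each of the 10 training sets shares the same positives but carries a different random subset of negatives, which is one way to use a much larger negative pool without training on an imbalanced set.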
Figure 2. Compositional distribution of amino acids in acetylated and non-acetylated peptides.
The amino acid composition of acetylated and non-acetylated peptides was displayed with the Two Logo software. At a given position, the amino acid composition differed widely between acetylated and non-acetylated peptides, especially at positions −7 to −1 and 1 to 7.
The impact of window sizes on the performance of LAceP.
| Window size | Sn (%) | Sp (%) | Acc (%) | MCC (%) |
| 13 | 66.15 | 69.10 | 67.62 | 35.26 |
| 15 | 67.06 | 69.57 | 68.31 | 36.64 |
| 17 | 67.29 | 69.86 | 68.57 | 37.17 |
| 19 | 67.93 | | 68.95 | 37.91 |
| 21 | | 69.95 | | |
| 23 | 67.96 | 69.91 | 68.93 | 37.88 |
| 25 | 67.97 | 69.81 | 68.89 | 37.78 |
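To make the notion of window size concrete, the sketch below cuts a fixed-length peptide around every lysine in a protein sequence. It is an illustration under stated assumptions: the function name `extract_windows`, the padding character `X` for windows that run past a terminus, and the example sequence are all hypothetical, since the text does not say how termini are handled.

```python
def extract_windows(sequence, window_size, pad="X"):
    """Return (position, peptide) pairs for every lysine (K) in `sequence`.

    Each peptide has `window_size` residues with the lysine at the centre.
    Positions are 1-based. Windows that run past either terminus are filled
    with `pad` (a hypothetical placeholder residue).
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the lysine is centred")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` corresponds to index i + half in `padded`,
            # so the slice below places the lysine in the middle.
            windows.append((i + 1, padded[i:i + window_size]))
    return windows


# Hypothetical sequence with lysines at positions 2 and 8.
print(extract_windows("MKTAYIAKQR", 5))
# [(2, 'XMKTA'), (8, 'IAKQR')]
```

A window size of 19, for example, would cover nine residues on each side of the candidate lysine.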
The performance of models trained with different types of features.
| Training feature | Sn (%) | Sp (%) | Acc (%) | MCC (%) |
| AAPP | 61.24 | 63.29 | 62.27 | 24.54 |
| TPM | 65.08 | 64.90 | 64.99 | 29.98 |
| PSSC | 65.29 | 67.44 | 67.44 | 34.91 |
| AAPP+TPM+PSSC | | | | |
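Assuming TPM in the table above refers to the transition probability of adjacent amino acids mentioned in the abstract, one simple way to estimate such probabilities is a first-order count over neighbouring residue pairs. The sketch below is an assumption about how the feature might be derived, not the encoding actually used by LAceP. The function name and the toy peptides are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(peptides):
    """Estimate P(next residue | current residue) from adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for peptide in peptides:
        # zip pairs each residue with its right-hand neighbour.
        for current, nxt in zip(peptide, peptide[1:]):
            pair_counts[current][nxt] += 1

    probs = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Toy peptides: A is always followed by K; K is followed by A or G equally.
probs = transition_probabilities(["AKA", "AKG"])
print(probs["A"])  # {'K': 1.0}
print(probs["K"])  # {'A': 0.5, 'G': 0.5}
```

Estimating one such table from acetylated peptides and another from non-acetylated peptides would give two sets of probabilities against which a candidate peptide could be scored.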
The comparison of performance between LAceP and existing methods.
| Method | Sn (%) | Sp (%) | Acc (%) | MCC (%) |
| EnsemblePail | 49.33 | 62.67 | 56.00 | 12.11 |
| PHOSIDA | 42.33 | | 67.33 | |
| PLMLA | | 44.29 | 61.64 | 24.76 |
| PSKAcePred | 72.24 | 49.66 | 60.97 | 22.49 |
| LAceP | 61.33 | 75.40 | | 37.88 |
Figure 3. The web interface of LAceP.