Qiqige Wuyun, Wei Zheng, Yanping Zhang, Jishou Ruan, Gang Hu.
Abstract
Lysine acetylation is a major post-translational modification. It plays a vital role in numerous essential biological processes, such as gene expression and metabolism, and is related to some human diseases. To fully understand the regulatory mechanism of acetylation, identifying acetylation sites is the first and most important step. However, experimental identification of protein acetylation sites is often time-consuming and expensive, so computational methods are a necessary alternative. Here, we developed a novel tool, KA-predictor, to predict species-specific lysine acetylation sites with a support vector machine (SVM) classifier. We incorporated different types of features and applied an efficient feature selection to each type to form the final optimal feature set for model learning. Compared with other methods, our predictor was highly competitive for the majority of species. Feature contribution analysis indicated that half-sphere exposure (HSE) features, introduced here for the first time for lysine acetylation prediction, significantly improved predictive performance. In addition, we constructed a high-accuracy structural dataset of H. sapiens from the PDB to analyze the structural properties around lysine acetylation sites. Our datasets and a user-friendly local version of KA-predictor are freely available at http://sourceforge.net/p/ka-predictor.
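HSE counts the neighbouring residues inside a sphere around a residue and splits them into an "up" and a "down" half using a direction vector; for HSE-β that vector is Cα→Cβ. The sketch below is a minimal illustration under stated assumptions (plain coordinate tuples, Cα atoms as the neighbours, a 13 Å radius chosen here as an assumed value). It is not KA-predictor's implementation, and `hse_beta` is a hypothetical name. Biopython's `Bio.PDB.HSExposure` module provides a full implementation for real structures.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def hse_beta(ca: Sequence[Vec], cb: Vec, index: int, radius: float = 13.0) -> Tuple[int, int]:
    """Return (HSE-up, HSE-down) for residue `index` (hypothetical helper).

    ca: C-alpha coordinates of every residue in the chain.
    cb: C-beta coordinate of the residue of interest.
    radius: sphere radius in angstroms (assumed value).
    """
    centre = ca[index]
    direction = _sub(cb, centre)  # the CA->CB vector defines the "up" half
    up = down = 0
    for j, other in enumerate(ca):
        if j == index:
            continue  # the residue does not count itself
        offset = _sub(other, centre)
        if math.sqrt(_dot(offset, offset)) > radius:
            continue  # neighbour lies outside the sphere
        if _dot(offset, direction) > 0:
            up += 1  # same side as CB
        else:
            down += 1  # opposite side (or exactly on the dividing plane)
    return up, down
```

For example, `hse_beta([(0, 0, 0), (0, 0, 5), (0, 0, -5), (0, 0, 20)], (0, 0, 1.5), 0)` counts one neighbour on each side and ignores the residue 20 Å away.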
Year: 2016 PMID: 27183223 PMCID: PMC4868276 DOI: 10.1371/journal.pone.0155370
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Statistics of acetylated proteins and sites in the training dataset and independent test set for four species: H. sapiens, M. musculus, E. coli, and S. typhimurium.
Non-acetylated sites were selected at a 1:1 ratio to acetylated sites. The statistics for S. cerevisiae and R. norvegicus were marked with an asterisk (*), meaning that these two species were removed from the analysis because they had too few samples for model training.
| Species | Training dataset: acetylated proteins | Training dataset: acetylated sites | Independent test set: acetylated proteins | Independent test set: acetylated sites |
|---|---|---|---|---|
| H. sapiens | 930 | 1885 | 190 | 477 |
| M. musculus | 341 | 744 | 84 | 188 |
| E. coli | 119 | 195 | 24 | 51 |
| S. typhimurium | 147 | 200 | 35 | 50 |
| Total | 1537 | 3024 | 333 | 766 |
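Selecting non-acetylated sites at a 1:1 ratio amounts to randomly undersampling the far more numerous unannotated lysines. A minimal per-protein sketch, assuming every lysine not annotated as acetylated is a candidate negative; the helper `balanced_lysine_sites` is hypothetical, and the original work may have sampled across the whole dataset rather than protein by protein.

```python
import random
from typing import List, Set, Tuple


def balanced_lysine_sites(sequence: str, acetylated: Set[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (positives, negatives) with one sampled negative per positive.

    sequence: protein sequence in one-letter code.
    acetylated: 0-based positions of experimentally acetylated lysines.
    """
    lysines = [i for i, aa in enumerate(sequence) if aa == "K"]
    positives = [i for i in lysines if i in acetylated]
    candidates = [i for i in lysines if i not in acetylated]  # assumed negatives
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    k = min(len(positives), len(candidates))  # cannot sample more than exist
    negatives = sorted(rng.sample(candidates, k))
    return positives, negatives
```

With `balanced_lysine_sites("MKAKLKGKK", {1, 5})`, the positives are lysines 1 and 5, and two of the remaining lysines (3, 7, 8) are drawn as negatives.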
Wilcoxon signed-rank test p-values for differences in HSE features between acetylation and non-acetylation sites for the four species: H. sapiens, M. musculus, E. coli, and S. typhimurium.
The tests were performed on the training dataset; p < 0.05 is taken as statistically significant. HSEAU/HSEAD denote HSE-α up/down, and HSEBU/HSEBD denote HSE-β up/down.
| Species | HSEAU | HSEAD | HSEBU | HSEBD |
|---|---|---|---|---|
| H. sapiens | 0.0050 | 2.1756e-18 | 0.0159 | 0.3466 |
| M. musculus | 0.0074 | 2.3397e-09 | 0.0012 | 8.3233e-08 |
| E. coli | 0.2886 | 0.1067 | 0.3876 | 0.8038 |
| S. typhimurium | 0.0436 | 0.0060 | 0.0096 | 0.0262 |
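The signed-rank test compares paired observations. A minimal sketch of the statistic with the large-sample normal approximation (no tie correction to the variance, no continuity correction), assuming the acetylated and non-acetylated HSE values are paired one-to-one, which the 1:1 sampling makes possible; how the original analysis paired the samples is not stated here. For real analyses, `scipy.stats.wilcoxon` is the standard implementation.

```python
import math
from typing import List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, two-sided p-value) via the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p
```

Five uniformly positive differences give W+ = 15 and a p-value of roughly 0.043 under this approximation.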
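Prediction performance of KA-predictor in 5-fold cross-validation on the training dataset (MCC, Matthews correlation coefficient; ACC, accuracy; SEN, sensitivity; SPE, specificity; PRE, precision; AUC, area under the ROC curve).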
| Species | MCC | ACC | SEN | SPE | PRE | AUC |
|---|---|---|---|---|---|---|
| H. sapiens | 0.351 | 0.676 | 0.679 | 0.673 | 0.675 | 0.737 |
| M. musculus | 0.342 | 0.671 | 0.685 | 0.657 | 0.667 | 0.723 |
| E. coli | 0.416 | 0.708 | 0.687 | 0.728 | 0.717 | 0.787 |
| S. typhimurium | 0.386 | 0.693 | 0.720 | 0.665 | 0.682 | 0.756 |
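In 5-fold cross-validation, the training sites are split into five folds, and each fold serves once as the held-out set while the other four train the model. A minimal stratified splitter, assuming 0/1 labels and round-robin assignment after shuffling each class so that every fold keeps the 1:1 class ratio; `stratified_kfold_indices` is hypothetical (scikit-learn's `StratifiedKFold` implements the same idea).

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold_indices(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs that preserve class ratios."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)  # randomize within the class only
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)  # deal the class round-robin into folds
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits
```

The reported cross-validation metrics would then be computed from the predictions pooled over (or averaged across) the five held-out folds.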
Performance comparison of KA-predictor with other existing methods on the independent test set.
Because we used the same independent test set as SSPKA, the results of the other existing methods are taken from SSPKA [22].
| Species | Method | MCC | ACC | SEN | SPE | PRE | AUC |
|---|---|---|---|---|---|---|---|
| H. sapiens | PLMLA | 0.296 | 0.648 | 0.633 | 0.663 | 0.667 | 0.689 |
| H. sapiens | Phosida | 0.136 | 0.568 | 0.553 | 0.583 | 0.585 | 0.597 |
| H. sapiens | LysAcet | 0.120 | 0.558 | 0.503 | 0.616 | 0.583 | 0.552 |
| H. sapiens | ensemblePail | 0.076 | 0.535 | 0.457 | 0.618 | 0.560 | 0.534 |
| H. sapiens | PSKAcePred | 0.111 | 0.556 | 0.553 | 0.558 | 0.571 | 0.556 |
| H. sapiens | BRABSB | 0.275 | 0.637 | 0.612 | 0.663 | 0.659 | 0.645 |
| H. sapiens | SSPKA | 0.214 | 0.600 | 0.482 | 0.725 | 0.652 | 0.606 |
| H. sapiens | KA-predictor (ours) | 0.257 | 0.629 | 0.696 | 0.558 | 0.626 | 0.657 |
| M. musculus | PLMLA | 0.182 | 0.590 | 0.521 | 0.659 | 0.609 | 0.604 |
| M. musculus | Phosida | 0.035 | 0.517 | 0.516 | 0.519 | 0.522 | 0.525 |
| M. musculus | LysAcet | 0.137 | 0.568 | 0.590 | 0.546 | 0.569 | 0.590 |
| M. musculus | ensemblePail | 0.104 | 0.550 | 0.431 | 0.670 | 0.570 | 0.555 |
| M. musculus | PSKAcePred | 0.282 | 0.635 | 0.511 | 0.762 | 0.686 | 0.652 |
| M. musculus | BRABSB | 0.172 | 0.584 | 0.511 | 0.659 | 0.604 | 0.592 |
| M. musculus | SSPKA | 0.222 | 0.611 | 0.638 | 0.584 | 0.609 | 0.661 |
| M. musculus | KA-predictor (ours) | 0.314 | 0.657 | 0.648 | 0.665 | 0.663 | 0.713 |
| E. coli | PLMLA | 0.255 | 0.627 | 0.608 | 0.647 | 0.633 | 0.675 |
| E. coli | Phosida | 0.258 | 0.627 | 0.706 | 0.549 | 0.610 | 0.662 |
| E. coli | LysAcet | 0.045 | 0.520 | 0.275 | 0.765 | 0.538 | 0.440 |
| E. coli | ensemblePail | -0.064 | 0.471 | 0.275 | 0.667 | 0.452 | 0.452 |
| E. coli | PSKAcePred | 0.020 | 0.510 | 0.412 | 0.608 | 0.512 | 0.492 |
| E. coli | BRABSB | 0.118 | 0.559 | 0.510 | 0.608 | 0.565 | 0.582 |
| E. coli | SSPKA | 0.321 | 0.657 | 0.549 | 0.765 | 0.700 | 0.687 |
| E. coli | KA-predictor (ours) | 0.375 | 0.686 | 0.745 | 0.627 | 0.667 | 0.734 |
| S. typhimurium | PLMLA | 0.101 | 0.550 | 0.600 | 0.500 | 0.545 | 0.520 |
| S. typhimurium | Phosida | 0.000 | 0.500 | 0.560 | 0.440 | 0.500 | 0.442 |
| S. typhimurium | LysAcet | 0.100 | 0.550 | 0.560 | 0.540 | 0.549 | 0.514 |
| S. typhimurium | ensemblePail | 0.000 | 0.500 | 0.280 | 0.720 | 0.500 | 0.491 |
| S. typhimurium | PSKAcePred | 0.120 | 0.560 | 0.560 | 0.560 | 0.560 | 0.504 |
| S. typhimurium | BRABSB | 0.042 | 0.520 | 0.360 | 0.680 | 0.529 | 0.495 |
| S. typhimurium | SSPKA | 0.222 | 0.610 | 0.540 | 0.680 | 0.628 | 0.581 |
| S. typhimurium | KA-predictor (ours) | 0.040 | 0.520 | 0.560 | 0.480 | 0.519 | 0.542 |
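The metrics in these tables follow from the confusion matrix of each test set. The sketch below computes them with their standard definitions, plus AUC as the probability that a randomly chosen positive scores above a randomly chosen negative (ties count as one half). As a consistency check, assuming 50 negatives to match the 50 positive S. typhimurium test sites, SEN 0.560 and SPE 0.480 correspond to TP = 28, FN = 22, TN = 24, FP = 26, which reproduce the KA-predictor row (MCC 0.040, ACC 0.520, PRE 0.519). The function names are illustrative.

```python
import math
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """MCC, ACC, SEN, SPE and PRE from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": _ratio(tp * tn - fp * fn, denom),
        "ACC": _ratio(tp + tn, tp + fn + tn + fp),
        "SEN": _ratio(tp, tp + fn),  # sensitivity: recall on acetylated sites
        "SPE": _ratio(tn, tn + fp),  # specificity: recall on non-acetylated sites
        "PRE": _ratio(tp, tp + fp),  # precision among predicted acetylated sites
    }


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return _ratio(wins, len(pos_scores) * len(neg_scores))
```

The same check works for the E. coli row: TP = 38, FN = 13, TN = 32, FP = 19 (51 sites per class) give MCC ≈ 0.375 and ACC ≈ 0.686.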
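Performance comparison of different feature selection methods on the independent test set (MRMD, maximum relevance maximum distance; mRMR, minimum redundancy maximum relevance; MI, mutual information; PCC, Pearson correlation coefficient).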
| Species | Selection method | MCC | ACC | SEN | SPE | PRE | AUC |
|---|---|---|---|---|---|---|---|
| H. sapiens | MRMD | 0.180 | 0.591 | 0.667 | 0.511 | 0.592 | 0.625 |
| H. sapiens | mRMR | 0.196 | 0.599 | 0.652 | 0.542 | 0.603 | 0.632 |
| H. sapiens | MI | 0.230 | 0.616 | 0.690 | 0.538 | 0.614 | 0.653 |
| H. sapiens | PCC | 0.257 | 0.629 | 0.696 | 0.558 | 0.626 | 0.657 |
| M. musculus | MRMD | 0.255 | 0.627 | 0.670 | 0.584 | 0.621 | 0.686 |
| M. musculus | mRMR | 0.303 | 0.651 | 0.660 | 0.643 | 0.653 | 0.705 |
| M. musculus | MI | 0.206 | 0.603 | 0.601 | 0.605 | 0.608 | 0.683 |
| M. musculus | PCC | 0.314 | 0.657 | 0.648 | 0.665 | 0.663 | 0.713 |
| E. coli | MRMD | 0.394 | 0.696 | 0.745 | 0.647 | 0.679 | 0.698 |
| E. coli | mRMR | 0.257 | 0.627 | 0.686 | 0.569 | 0.614 | 0.689 |
| E. coli | MI | 0.137 | 0.569 | 0.608 | 0.529 | 0.564 | 0.598 |
| E. coli | PCC | 0.375 | 0.686 | 0.745 | 0.627 | 0.667 | 0.734 |
| S. typhimurium | MRMD | 0.062 | 0.530 | 0.640 | 0.420 | 0.525 | 0.561 |
| S. typhimurium | mRMR | -0.201 | 0.490 | 0.620 | 0.360 | 0.492 | 0.503 |
| S. typhimurium | MI | 0.081 | 0.540 | 0.600 | 0.480 | 0.536 | 0.508 |
| S. typhimurium | PCC | 0.040 | 0.520 | 0.560 | 0.480 | 0.519 | 0.542 |
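A minimal sketch of PCC-based selection applied separately to each feature type, as the abstract describes, assuming "PCC" ranks each feature by the absolute Pearson correlation between its values and the 0/1 acetylation label and keeps the top k features in every group. The ranking criterion, the value of k, and the helper names `pearson` and `select_by_pcc` are assumptions for illustration, not the paper's code.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_by_pcc(rows: Sequence[Sequence[float]], labels: Sequence[int],
                  groups: Dict[str, List[int]], k: int) -> List[int]:
    """Keep the top-k columns of each feature group by |PCC| with the label.

    rows: one feature vector per lysine site.
    groups: feature-type name -> column indices belonging to that type.
    """
    selected: List[int] = []
    for name in sorted(groups):  # deterministic group order
        cols = groups[name]
        scores = {c: abs(pearson([r[c] for r in rows], labels)) for c in cols}
        best = sorted(cols, key=lambda c: scores[c], reverse=True)[:k]
        selected.extend(sorted(best))  # concatenate into the final feature set
    return selected
```

Selecting within each type keeps every feature family (HSE, sequence-based, and so on) represented in the final set, instead of letting one large family dominate a single global ranking.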
Performance comparison of the LibSVM and LibD3C classifiers on the independent test set.
| Species | Classifier | MCC | ACC | SEN | SPE | PRE |
|---|---|---|---|---|---|---|
| H. sapiens | LibD3C | 0.165 | 0.584 | 0.652 | 0.511 | 0.587 |
| H. sapiens | LibSVM | 0.257 | 0.629 | 0.696 | 0.558 | 0.626 |
| M. musculus | LibD3C | 0.314 | 0.657 | 0.691 | 0.622 | 0.650 |
| M. musculus | LibSVM | 0.314 | 0.657 | 0.648 | 0.665 | 0.663 |
| E. coli | LibD3C | 0.281 | 0.637 | 0.745 | 0.529 | 0.613 |
| E. coli | LibSVM | 0.375 | 0.686 | 0.745 | 0.627 | 0.667 |
| S. typhimurium | LibD3C | 0.000 | 0.500 | 0.560 | 0.440 | 0.500 |
| S. typhimurium | LibSVM | 0.040 | 0.520 | 0.560 | 0.480 | 0.519 |