Jianhua Jia1,2, Zi Liu3, Xuan Xiao1,2, Bingxiang Liu1, Kuo-Chen Chou2,4.
Abstract
Carbonylation is a posttranslational modification (PTM or PTLM) in which a carbonyl group is added to a lysine (K), proline (P), arginine (R), or threonine (T) residue of a protein molecule. Carbonylation plays an important role in orchestrating various biological processes, but it is also associated with many diseases, such as diabetes, chronic lung disease, Parkinson's disease, Alzheimer's disease, chronic renal failure, and sepsis. Therefore, from the angles of both basic research and drug development, we face a challenging problem: for an uncharacterized protein sequence containing many K, P, R, or T residues, which ones can be carbonylated, and which ones cannot? To address this problem, we have developed a predictor called iCar-PseCp by incorporating sequence-coupled information into the general pseudo amino acid composition and by balancing the skewed training dataset, using Monte Carlo sampling to expand the positive subset. Rigorous target cross-validations on the same set of carbonylation-known proteins indicated that the new predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iCar-PseCp has been established at http://www.jci-bioinfo.cn/iCar-PseCp, by which users can easily obtain their desired results without needing to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.
Keywords: Monte Carlo sampling; PseAAC; carbonylation; random forest algorithm; sequence-coupling model
Year: 2016 PMID: 27153555 PMCID: PMC5085176 DOI: 10.18632/oncotarget.9148
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. A semi-screenshot of the top page of the iCar-PseCp web-server at http://www.jci-bioinfo.cn/iCar-PseCp
A comparison of the proposed predictor with the existing methods based on 10-fold cross-validation on the same 250 carbonylated proteins. Columns K, P, R, and T give the type of carbonylation.
| Predictor | Metric | K | P | R | T |
|---|---|---|---|---|---|
| PTMPred | Acc (%) | 88.59 | 82.93 | 86.64 | 88.39 |
| CarSpred | Acc (%) | 87.22 | 82.93 | 86.22 | 86.61 |
| iCar-PseCp | Acc (%) | 84.43 | 86.79 | 84.23 | 86.17 |
| PTMPred | MCC | 0.1892 | 0.2573 | 0.1878 | 0.2186 |
| CarSpred | MCC | 0.2268 | 0.2331 | 0.2245 | 0.2040 |
| iCar-PseCp | MCC | 0.5906 | 0.6006 | 0.6076 | 0.6185 |
| PTMPred | Sn (%) | 23.45 | 21.43 | 20.02 | 22.38 |
| CarSpred | Sn (%) | 23.17 | 25.34 | 25.47 | 21.39 |
| iCar-PseCp | Sn (%) | 45.18 | 48.20 | 46.67 | 50.68 |
| PTMPred | Sp (%) | 92.99 | 93.20 | 90.99 | 91.36 |
| CarSpred | Sp (%) | 92.43 | 93.28 | 93.39 | 93.42 |
| iCar-PseCp | Sp (%) | 99.25 | 98.54 | 99.57 | 98.58 |
| PTMPred | AUC | 0.6858 | 0.6903 | 0.5981 | 0.6563 |
| CarSpred | AUC | 0.6849 | 0.7163 | 0.7158 | 0.7134 |
| iCar-PseCp | AUC | 0.8728 | 0.8484 | 0.8668 | 0.8603 |
PTMPred: the predictor developed in [33], where ξ = 13; i.e., the sample length is 2ξ + 1 = 27.
CarSpred: the predictor developed in [32], where the sample length was not fixed.
iCar-PseCp: the predictor proposed in this paper.
Acc, MCC, Sn, Sp: see Eq. 9 for the definitions of these metrics.
AUC: the area under the ROC curve in Figure 2; the greater the AUC value, the better the corresponding predictor [52, 53].
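Eq. 9 is not reproduced in this excerpt. The sketch below assumes it matches the standard confusion-matrix definitions of sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC). The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Acc, and MCC (as fractions) from confusion-matrix counts.

    Assumes the standard definitions; the paper's Eq. 9 is not shown here.
    """
    sn = tp / (tp + fn)                      # fraction of true sites recovered
    sp = tn / (tn + fp)                      # fraction of non-sites rejected
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for illustration only: 100 positives, 900 negatives.
example = binary_metrics(tp=25, fn=75, tn=880, fp=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, Acc stays above 0.9 even though only a quarter of the positive sites are found. This is one reason why MCC and Sn are reported alongside Acc when the negative samples far outnumber the positive ones.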
Figure 2. ROC curves showing the performance of PTMPred, CarSpred, and iCar-PseCp when the center residue is (A) K, (B) P, (C) R, and (D) T. See the main text for further explanation.
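How the ROC curves and AUC values were computed is not described in this excerpt. As a minimal sketch, the AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which is equivalent to the area under the ROC curve. The function name `auc_from_scores` and the scores are hypothetical.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking. This pairwise form is
    quadratic in the number of samples and is meant only as an illustration.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predictor scores, not data from the paper.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
auc = auc_from_scores(pos, neg)
print(f"AUC = {auc:.4f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.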
Summary of carbonylation site samples in the benchmark dataset
| Subset | ⊛ = K | ⊛ = P | ⊛ = R | ⊛ = T |
|---|---|---|---|---|
| Positive | 300 | 126 | 136 | 121 |
| Negative | 1,949 | 792 | 847 | 732 |
See Eq. 3 and the relevant text for further explanation.
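The counts above show how skewed the benchmark dataset is. The sketch below computes the negative-to-positive ratio for each residue type and the number of extra positive samples that would be needed to equalize the two subsets. That the Monte Carlo expansion targets exactly equal subset sizes is an assumption; the excerpt does not state the target size. The names `benchmark` and `imbalance_summary` are hypothetical.

```python
# (positive, negative) sample counts per center residue, from the table above.
benchmark = {
    "K": (300, 1949),
    "P": (126, 792),
    "R": (136, 847),
    "T": (121, 732),
}


def imbalance_summary(counts):
    """Return the negative/positive ratio and the positive shortfall per residue."""
    summary = {}
    for residue, (pos, neg) in counts.items():
        summary[residue] = {
            "ratio": neg / pos,
            # Assumption: balancing means matching the negative subset size.
            "extra_positives": neg - pos,
        }
    return summary


summary = imbalance_summary(benchmark)
for residue, row in summary.items():
    print(f"{residue}: negatives/positives = {row['ratio']:.2f}, "
          f"positives to add for balance = {row['extra_positives']}")
```

For every residue type the negative subset is roughly six times larger than the positive one, which is the skew the Monte Carlo sampling step is meant to offset.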