Wang-Ren Qiu, Bi-Qian Sun, Xuan Xiao, Zhao-Chun Xu, Kuo-Chen Chou.
Abstract
Protein hydroxylation is a posttranslational modification (PTM) in which a CH group in a proline (P) or lysine (K) residue is converted into a COH group or, more generally, a hydroxyl group (-OH) is introduced into an organic compound. Closely associated with cellular signaling, this type of PTM is also involved in major diseases such as stomach cancer and lung cancer. From the standpoint of both basic research and drug development, we therefore face a challenging problem: given an uncharacterized protein sequence containing many P or K residues, which of them can be hydroxylated and which cannot? With the explosive growth of protein sequences in the post-genomic age, this problem has become even more urgent. To address it, we have developed a predictor called iHyd-PseCp, which incorporates sequence-coupled information into the general pseudo amino acid composition (PseAAC) and uses the Random Forest algorithm to perform the classification. Rigorous jackknife tests indicated that the new predictor clearly outperformed the existing state-of-the-art method for the same purpose, iHyd-PseAAC, raising accuracy from 80.57% to 96.58% for hydroxyproline sites and from 83.56% to 97.08% for hydroxylysine sites. For the convenience of experimental scientists, a user-friendly web-server for iHyd-PseCp has been established at http://www.jci-bioinfo.cn/iHyd-PseCp, through which users can easily obtain the desired results without having to follow the complicated mathematical equations involved.
Keywords: PTMs; general PseAAC; hydroxylysine; hydroxyproline; sequence-coupling model
Year: 2016 PMID: 27322424 PMCID: PMC5190098 DOI: 10.18632/oncotarget.10027
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
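The abstract frames site prediction as a per-residue decision: for every P or K in a sequence, decide whether it is hydroxylated. A common first step in PTM site predictors is to cut a fixed-length peptide window centred on each candidate residue, and the sketch below shows only that step. The half-width `xi`, the padding character `X`, and the function name `candidate_windows` are illustrative assumptions; the sketch does not reproduce the sequence-coupled PseAAC encoding or the Random Forest classifier used by iHyd-PseCp.

```python
from typing import List, Tuple


def candidate_windows(
    seq: str, residues: str = "PK", xi: int = 3, pad: str = "X"
) -> List[Tuple[int, str, str]]:
    """Return (0-based position, residue, window) for every candidate residue.

    Hypothetical helper: each window holds xi residues on either side of the
    candidate; positions beyond the sequence ends are filled with `pad`.
    """
    padded = pad * xi + seq + pad * xi
    sites = []
    for i, aa in enumerate(seq):
        if aa in residues:
            # Residue i sits at index i + xi of `padded`, so its window
            # covers padded[i : i + 2*xi + 1] with the residue in the middle.
            sites.append((i, aa, padded[i : i + 2 * xi + 1]))
    return sites


if __name__ == "__main__":
    for pos, aa, window in candidate_windows("GPKAP", xi=2):
        print(pos, aa, window)
```

For the toy sequence `GPKAP` with `xi=2`, this prints one line per candidate: `1 P XGPKA`, `2 K GPKAP` and `4 P KAPXX`. Each window would then be converted into a feature vector before classification.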
Figure 1. A partial screenshot of the top page of the iHyd-PseCp web-server at http://www.jci-bioinfo.cn/iHyd-PseCp
Comparison of the proposed predictor with the state-of-the-art method for identifying hydroxyproline (HyP) sites in proteins
| Predictor | Acc (%) | MCC | Sn (%) | Sp (%) |
|---|---|---|---|---|
| iHyd-PseAAC | 80.57 | 0.51 | 80.66 | 80.54 |
| iHyd-PseCp | 96.58 | 0.89 | 86.35 | 99.12 |
Scores were obtained from rigorous jackknife tests on the 164 hydroxyproline proteins used by Xu et al. [10].
iHyd-PseAAC: the predictor developed by Xu et al. [10].
iHyd-PseCp: the predictor proposed in this paper.
Acc, accuracy; MCC, Matthews correlation coefficient; Sn, sensitivity; Sp, specificity. See Eq. 9 for the definitions of these metrics.
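The four scores in the table can be computed from confusion-matrix counts. This sketch uses the conventional definitions of accuracy, sensitivity, specificity and MCC, and assumes they agree with Eq. 9, which is not reproduced here. The function name `site_metrics` and the example counts are hypothetical, not data from the paper.

```python
import math


def site_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Conventional binary-classification scores from confusion-matrix counts.

    Assumed to match the paper's Eq. 9; Acc, Sn and Sp are returned as
    fractions (multiply by 100 for the percentages shown in the table).
    """
    sn = tp / (tp + fn)                    # sensitivity: true sites recovered
    sp = tn / (tn + fp)                    # specificity: non-sites rejected
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "MCC": mcc, "Sn": sn, "Sp": sp}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(site_metrics(tp=80, fn=20, tn=90, fp=10))
```

MCC is reported alongside accuracy because site datasets are usually imbalanced (far more non-sites than sites), so a high Acc or Sp can coexist with a modest Sn, as in the tables.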
Comparison of the proposed predictor with the state-of-the-art method for identifying hydroxylysine (HyL) sites in proteins
| Predictor | Acc (%) | MCC | Sn (%) | Sp (%) |
|---|---|---|---|---|
| iHyd-PseAAC | 83.56 | 0.50 | 87.85 | 83.01 |
| iHyd-PseCp | 97.08 | 0.86 | 78.77 | 99.80 |
Scores were obtained from rigorous jackknife tests on the 33 hydroxylysine proteins used by Xu et al. [10].
iHyd-PseAAC: the predictor developed by Xu et al. [10].
iHyd-PseCp: the predictor proposed in this paper.
Acc, accuracy; MCC, Matthews correlation coefficient; Sn, sensitivity; Sp, specificity. See Eq. 9 for the definitions of these metrics.
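A jackknife test is leave-one-out cross-validation: each sample is predicted by a model trained on all the other samples. The sketch below illustrates only that resampling loop. The paper's classifier is a Random Forest; to keep the example dependency-free, it swaps in a simple nearest-centroid classifier, and the toy feature vectors are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Stand-in classifier signature: (train_X, train_y, x) -> predicted label.
Classifier = Callable[[List[Vector], List[int], Vector], int]


def nearest_centroid(train_X: List[Vector], train_y: List[int], x: Vector) -> int:
    """Predict the label whose mean training vector is closest to x.

    A dependency-free stand-in for the paper's Random Forest, for illustration only.
    """
    sums: dict = {}
    counts: dict = {}
    for vec, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, v in enumerate(vec):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, total in sums.items():
        centroid = [v / counts[label] for v in total]
        d = math.dist(x, centroid)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def jackknife(X: List[Vector], y: List[int], classify: Classifier) -> List[int]:
    """Leave-one-out: predict each sample with a model trained on all the others."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(classify(train_X, train_y, X[i]))
    return preds


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors: label 1 = site, 0 = non-site.
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(jackknife(X, y, nearest_centroid))
```

Because every sample is held out exactly once and the split is deterministic, the jackknife gives a single reproducible result for a given dataset, at the cost of training one model per sample.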
Figure 2. ROC curves showing the performance of iHyd-PseAAC [10] and of iHyd-PseCp, proposed in this paper, for (A) HyP and (B) HyL sites.
See the main text for further explanation.
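An ROC curve like those in Figure 2 traces the true-positive rate against the false-positive rate as the decision threshold on a predictor's score is lowered. The sketch computes the curve points and the trapezoidal area under the curve (AUC) from scores and 0/1 labels. The function names and example scores are illustrative, and nothing here reflects the actual scores behind the figure.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold past each distinct score.

    labels use 1 for a true site and 0 for a non-site; both classes must occur.
    """
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together (one diagonal step).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical predictor scores and true labels.
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, auc(pts))
```

A curve that hugs the top-left corner (AUC close to 1) indicates a predictor that ranks true sites above non-sites at almost every threshold; the diagonal (AUC 0.5) corresponds to random ranking.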