Yan Xu, Xin Wen, Xiao-Jian Shao, Nai-Yang Deng, Kuo-Chen Chou.
Abstract
Post-translational modifications (PTMs) play crucial roles in various cell functions and biological processes. Protein hydroxylation is one type of PTM that usually occurs at proline and lysine residues. Given an uncharacterized protein sequence, which of its Pro (or Lys) sites can be hydroxylated and which cannot? This is a challenging problem, important not only for an in-depth understanding of the hydroxylation mechanism but also for drug development, because protein hydroxylation is closely associated with major diseases such as stomach and lung cancers. With the avalanche of protein sequences generated in the post-genomic age, computational methods that address this problem are highly desirable. In view of this, a new predictor called "iHyd-PseAAC" (identify hydroxylation by pseudo amino acid composition) was proposed by incorporating the dipeptide position-specific propensity into the general form of pseudo amino acid composition. Rigorous cross-validation tests on stringent benchmark datasets demonstrated that the new predictor is quite promising and may become a useful high-throughput tool in this area. A user-friendly web-server for iHyd-PseAAC is accessible at http://app.aporc.org/iHyd-PseAAC/. Furthermore, for the convenience of experimental scientists, a step-by-step guide on how to use the web-server is given. Users can obtain their desired results by following these steps without needing to understand the mathematical equations, which are presented in this paper only for completeness.
Year: 2014 PMID: 24857907 PMCID: PMC4057693 DOI: 10.3390/ijms15057594
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Schematic drawing to show protein hydroxylation occurring at (a) proline and (b) lysine to form hydroxyproline (HyP) and hydroxylysine (HyL), respectively.
Figure 2. An illustration to show Chou’s scheme for peptides with (2ξ + 1) residues and their centers being (a) proline and (b) lysine. Adapted from Chou [27,29] with permission.
Figure 3. Flowchart to show the process of how the iHyd-PseAAC (identify hydroxylation pseudo amino acid composition) predictor works in identifying the hydroxylated sites in proteins. PSDP, position-specific dipeptide propensity.
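The peptide scheme in Figure 2 can be illustrated with a short sketch: for every Pro or Lys in a sequence, take the ξ residues on each side so that the candidate site sits at the center of a (2ξ + 1)-residue window. The function name `extract_windows`, the padding character `X` used near the sequence termini, and the toy sequence are all assumptions made for illustration; they are not taken from the paper, and the value of ξ used by iHyd-PseAAC is not stated here.

```python
def extract_windows(sequence, xi, targets=("P", "K"), pad="X"):
    """Return (position, residue, window) for every target residue.

    Each window holds 2*xi + 1 residues with the target at its center.
    Padding the termini with `pad` is an assumption of this sketch.
    """
    padded = pad * xi + sequence + pad * xi
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Residue i sits at index i + xi in the padded string, so the
            # window centered on it is padded[i : i + 2*xi + 1].
            windows.append((i + 1, residue, padded[i:i + 2 * xi + 1]))
    return windows


# Hypothetical toy sequence; positions are reported 1-based.
for position, residue, window in extract_windows("MKAPLG", xi=2):
    print(position, residue, window)
```

Each window would then be the unit that a site predictor scores as hydroxylated or not; how the windows are converted into PSDP features is not covered by this sketch.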
The jackknife test results by the new predictor on the benchmark datasets in the Supplementary Information S1 and S2. HyP, hydroxyproline; HyL, hydroxylysine; Sn, sensitivity; Sp, specificity; Acc, accuracy; MCC, Matthews correlation coefficient.
| Benchmark dataset | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| | 80.66 | 80.54 | 80.57 | 0.51 |
| | 87.85 | 83.01 | 83.56 | 0.50 |
None of the sequences included has more than 80% pairwise sequence identity with any other.
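For readers unfamiliar with the four measures reported above, the following sketch computes Sn, Sp, Acc, and MCC from confusion-matrix counts. It uses the standard textbook definitions, which may differ in notation from the formulation given in the paper; the function name and the counts in the example are hypothetical and are not the paper's data.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative sample, so the
    Sn and Sp denominators are non-zero.
    """
    sn = tp / (tp + fn)                      # sensitivity: recall on true sites
    sp = tn / (tn + fp)                      # specificity: recall on non-sites
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts: 100 positive sites and 300 negative sites.
metrics = classification_metrics(tp=80, tn=240, fp=60, fn=20)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because negative sites usually outnumber positive ones in such benchmarks, Acc is dominated by Sp, which is one reason MCC is reported alongside it.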
The jackknife test results by the iHyd-PseAAC predictor on the benchmark datasets in Supplementary Information S3 and S4.
| Benchmark dataset | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| | 70.68 | 89.03 | 78.42 | 0.52 |
| | 79.04 | 86.37 | 83.12 | 0.51 |
None of the sequences included has more than 40% pairwise sequence identity with any other.
The overall success rates in identifying hydroxylated sites for the proteins retrieved from the Swiss-Prot database.
| Hydroxylated type | Sn (%) | Sp (%) | Acc (%) |
|---|---|---|---|
| Proline | 71.2 | 79.3 | 75.3 |
| Lysine | 72.7 | 80.6 | 76.8 |
Figure 4. The top page of the web-server, iHyd-PseAAC, at http://app.aporc.org/iHyd-PseAAC/.