Xiaowei Zhao, Xiaosa Zhao, Lingling Bao, Yonggang Zhang, Jiangyan Dai, Minghao Yin
Abstract
Glycation is a non-enzymatic process, occurring inside or outside the host body, in which a sugar molecule is attached to a protein or lipid molecule. It is an important form of post-translational modification (PTM) that impairs protein function and changes protein characteristics, so identifying glycation sites may provide useful guidance for understanding the biological functions of proteins. In this study, we proposed an accurate prediction tool, named Glypre, for lysine glycation. First, we used multiple informative features to encode the peptides: the position scoring function, secondary structure, AAindex, and the composition of k-spaced amino acid pairs. Second, we statistically analysed the distribution of these features in the residues surrounding glycation and non-glycation sites. Third, based on these distributions, we developed a new predictor that uses a different optimal window size for each property and a two-step feature selection method: the maximum relevance minimum redundancy (mRMR) method followed by a greedy feature selection procedure. Under 10-fold cross-validation, Glypre achieved a sensitivity of 57.47%, a specificity of 90.78%, an accuracy of 79.68%, an area under the receiver operating characteristic (ROC) curve (AUC) of 0.86, and a Matthews correlation coefficient (MCC) of 0.52. Detailed analysis showed that our predictor may complement existing methods for identifying protein lysine glycation sites. The source code and datasets of Glypre are available in the Supplementary File.
Keywords: feature analysis; glycation sites; support vector machine
Year: 2017 PMID: 29099805 PMCID: PMC6150326 DOI: 10.3390/molecules22111891
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
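The abstract lists the composition of k-spaced amino acid pairs (CKSAAP) among the peptide encodings. Below is a minimal sketch of the usual form of that encoding, not Glypre's source code. It assumes the 20 standard residues, a maximum gap `k_max` of 3 (a hypothetical default; the value Glypre uses is not stated here), and normalization by the number of k-spaced pairs in the window. Pairs containing any other character, such as a padding symbol, are skipped. The function name `cksaap` is illustrative.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: AA, AC, ..., YY.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide: str, k_max: int = 3) -> List[float]:
    """Return 400 * (k_max + 1) pair frequencies, one 400-block per gap k.

    For gap k, residues i and i + k + 1 form a pair; counts are divided by the
    number of such pairs in the peptide (len - k - 1). k_max=3 is an assumption.
    """
    features: List[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(peptide) - k - 1
        for i in range(max(total, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # skip pairs with padding or non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features
```

For example, `cksaap("AKAK", k_max=1)` yields 800 values; in the k = 0 block, the pair AK has frequency 2/3 and KA has 1/3.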
Figure 1. System architecture of the proposed method.
Figure 2. The position conservation M(l) values around the glycation and non-glycation sites.
Figure 3. The sequence logo of lysine glycation sites.
Figure 4. The distribution of coil structures of residues around glycation and non-glycation sites.
Figure 5. The distribution of helix structures of residues around glycation and non-glycation sites.
Figure 6. The distribution of sheet structures of residues around glycation and non-glycation sites.
The accession numbers of the 20 amino acid indices.
| Accession Number | |||
|---|---|---|---|
| QIAN880101 | FUKS010109 | RACS820107 | QIAN880118 |
| FUKS010102 | CEDJ970104 | NAKH920106 | GEIM800107 |
| FUKS010101 | KUMS000103 | KARP850103 | PARS000102 |
| QIAN880102 | CHAM830102 | FUKS010104 | FUKS010110 |
| PALJ810108 | RACS820104 | QIAN880104 | BURA740102 |
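Each AAindex entry maps the 20 residues to numeric property values, and the abstract notes that each property gets its own optimal window size. A minimal sketch of encoding one window with one index follows. Because the values of the 20 listed indices are not reproduced here, it uses the Kyte-Doolittle hydropathy scale (AAindex KYTJ820101) as a stand-in; that index is not one of the 20 above. Padding out-of-range positions and non-standard residues with 0.0 is an assumption, and `encode_window` and its parameters are hypothetical names.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy (AAindex KYTJ820101), used here only as an example index.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(sequence: str, center: int, half_width: int,
                  index: Dict[str, float], pad: float = 0.0) -> List[float]:
    """Encode residues center - half_width .. center + half_width (0-based).

    Positions outside the sequence, and residues missing from the index,
    receive `pad` (an assumed padding value).
    """
    values: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence) and sequence[pos] in index:
            values.append(index[sequence[pos]])
        else:
            values.append(pad)
    return values


print(encode_window("MKVL", center=1, half_width=2, index=KYTE_DOOLITTLE))
```

The sites in the tables here use 1-based positions, so a lysine at position p corresponds to `center=p - 1`. Passing a different `half_width` for each index is one way to realize per-property window sizes.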
Figure 7. The distribution of the six amino acid indices of residues around glycation and non-glycation sites.
Figure 8. The composition of the top-30 residue pairs identified by the IG method.
Figure 9. Two-sample logos of the position-specific residue composition surrounding the glycation and non-glycation sites.
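A two-sample logo contrasts position-specific residue composition between glycation and non-glycation windows. The sketch below computes only the raw quantity behind such a comparison: the per-position difference in residue frequency between two sets of equal-length windows. It omits the statistical test that a two-sample logo applies to decide which residues are significantly enriched or depleted, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def position_composition(windows: List[str]) -> List[Dict[str, float]]:
    """Per-position residue frequencies for a set of equal-length windows."""
    comps: List[Dict[str, float]] = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        comps.append({aa: c / len(windows) for aa, c in counts.items()})
    return comps


def composition_difference(positives: List[str],
                           negatives: List[str]) -> List[Dict[str, float]]:
    """Positive-minus-negative frequency of each residue at each position."""
    diffs: List[Dict[str, float]] = []
    for p, n in zip(position_composition(positives), position_composition(negatives)):
        # Residues seen in either set; a missing residue has frequency 0.
        diffs.append({aa: p.get(aa, 0.0) - n.get(aa, 0.0) for aa in set(p) | set(n)})
    return diffs
```

Positive values mark residues over-represented around glycation sites at that position, and negative values mark residues depleted there.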
Figure 10. The greedy feature selection (GFS) curves for glycation site prediction.
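The abstract describes a two-step selection: mRMR ranking followed by a greedy procedure. The sketch below is a simplified stand-in, not Glypre's code. It replaces the mutual information that mRMR normally uses with absolute Pearson correlation, scores each candidate as relevance minus mean redundancy with the features already ranked, and then walks that ranking, keeping a feature only if a caller-supplied `evaluate` function (for example, cross-validated accuracy of a classifier) improves. The per-step scores in `trace` give a performance-versus-step curve, which is one plausible reading of a GFS curve. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_rank(columns: List[Sequence[float]], labels: Sequence[float]) -> List[int]:
    """Order feature indices by relevance minus mean redundancy (correlation proxy)."""
    relevance = [abs(pearson(col, labels)) for col in columns]
    ranked: List[int] = []
    remaining = list(range(len(columns)))
    while remaining:
        def score(j: int) -> float:
            if not ranked:
                return relevance[j]
            redundancy = sum(abs(pearson(columns[j], columns[s])) for s in ranked)
            return relevance[j] - redundancy / len(ranked)
        best = max(remaining, key=score)  # ties go to the lower index
        ranked.append(best)
        remaining.remove(best)
    return ranked


def greedy_select(ranked: List[int],
                  evaluate: Callable[[List[int]], float]) -> Tuple[List[int], List[float]]:
    """Try features in ranked order; keep each one only if evaluate() improves."""
    selected: List[int] = []
    best = float("-inf")
    trace: List[float] = []
    for j in ranked:
        trial = selected + [j]
        s = evaluate(trial)
        trace.append(s)
        if s > best:
            selected, best = trial, s
    return selected, trace
```

Ranking first keeps the greedy pass linear in the number of features: each feature is evaluated once, in mRMR order, rather than searching over all subsets.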
Experimental results for our proposed predictor on Dataset 1. The results are mean values (standard deviation).
| Cross-Validation | Sen (%) | Spe (%) | Acc (%) | AUC | MCC |
|---|---|---|---|---|---|
| 10-fold | 57.47 (1.31) | 90.78 (0.56) | 79.68 (0.57) | 0.8629 (0.0035) | 0.5232 (0.0140) |
| 8-fold | 57.10 (1.41) | 90.95 (0.65) | 79.67 (0.71) | 0.8629 (0.0050) | 0.5227 (0.0175) |
| 6-fold | 56.30 (1.74) | 91.06 (0.74) | 79.47 (0.88) | 0.8600 (0.0059) | 0.5175 (0.0218) |
| Leave-one-out (LOO) | 57.62 | 90.24 | 79.37 | 0.8693 | 0.5162 |
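The Sen, Spe, Acc, and MCC columns follow the standard confusion-matrix definitions. A minimal sketch, assuming glycation sites are the positive class; the counts in the usage line are made up for illustration and are not from Dataset 1. The function returns fractions, so multiply by 100 to match the percentage columns.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sen = tp / (tp + fn)                    # recall on glycation sites
    spe = tn / (tn + fp)                    # recall on non-glycation sites
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 by convention if undefined
    return {"Sen": sen, "Spe": spe, "Acc": acc, "MCC": mcc}


print(binary_metrics(tp=8, tn=9, fp=1, fn=2))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when negatives outnumber positives, as the gap between Sen and Spe in this table suggests.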
Experimental results for Glypre and the existing methods Gly-PseAAC, GlyNN, and PreGly on Dataset 2.
| Predictor | Sen (%) | Spe (%) | Acc (%) | AUC | MCC |
|---|---|---|---|---|---|
| Glypre a | 85.11 | 93.06 | 89.77 | 0.9557 | 0.7884 |
| Glypre b | 80.96 | 91.55 | 87.16 | 0.9420 | 0.7344 |
| Gly-PseAAC | 56.06 | 80.17 | 68.12 | 0.7705 | 0.38 |
| PreGly a | 71.06 | 95.85 | 85.51 | - | 0.70 |
| GlyNN b | 78.65 | 80.15 | 79.50 | 0.77 | 0.58 |
a Obtained by 10-fold cross-validation.
b Obtained by three-fold cross-validation.
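The AUC column summarizes how well a predictor's scores rank glycation sites above non-glycation sites. One standard way to compute the area under the ROC curve is the Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive scores higher, counting ties as half. The sketch below is O(n·m) and meant only to show the definition; the scores in the usage line are invented.

```python
from typing import Sequence


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_mann_whitney([0.9, 0.8, 0.4], [0.5, 0.3]))  # 5 of 6 pairs correct
```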
Experimental results for Glypre and Gly-PseAAC on the independent test dataset. For each protein, the glycation sites are listed together with the posterior probability scores assigned by the two predictors; scores for successfully detected glycation sites were highlighted.
| Protein | Glycation | Glypre | Gly-PseAAC |
|---|---|---|---|
| P62760 | 7, 18 | 0.3278, 0.5980 | <0.35, 0.3835 |
| Q9Y5I3 | 677 | 0.8407 | 0.5878 |
| Q9Y6P5 | 55 | 0.2178 | <0.35 |
| A6NE02 | 302 | 0.0722 | <0.35 |
| Q9NPC3 | 119 | 0.7346 | 0.3831 |
| P29122 | 573 | 0.4627 | <0.35 |
| O96005 | 207, 209 | 0.3228, 0.3646 | 0.5519, 0.5515 |
| P47869 | 231, 247 | 0.7250, 0.2822 | <0.35, <0.35 |
| Q8TC59 | 770 | 0.2745 | <0.35 |
| Q8IUR6 | 216, 493 | 0.1153, 0.2307 | <0.35, <0.35 |
| Q9Y587 | 53 | 0.0524 | <0.35 |
| P28289 | 191, 214, 221, 228, 249, 255, 286, 297, 308, 314 | 0.4737, 0.1668, 0.3140, 0.4933, 0.1197, 0.4824, 0.1432, 0.1932, 0.0227, 0.2218 | <0.35, <0.35, 0.7305, 0.4924, <0.35, <0.35, <0.35, <0.35, 0.3583, <0.35 |
| O94919 | 252, 281, 300 | 0.0252, 0.4890, 0.6587 | <0.35, <0.35, 0.3520 |
| P01877 | 155 | 0.1260 | <0.35 |
| Q93034 | 137 | 0.2791 | <0.35 |
| Q13011 | 267, 276 | 0.6926, 0.8448 | <0.35, 0.4122 |
| Q6P6C2 | 274 | 0.0244 | <0.35 |
| Q8IZI9 | 70 | 0.5000 | <0.35 |
| Q15084 | 73, 245 | 0.6981, 0.0404 | <0.35, <0.35 |
| Q8IY21 | 1077 | 0.5758 | <0.35 |
‘<0.35’ indicates the posterior probability score is less than 0.35.
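The table above lends itself to a quick recount of how many of the 37 listed sites each predictor scores at or above a cut-off. The cut-off of 0.5 below is hypothetical; the threshold behind the highlighting in the original table is not stated here. Entries reported as ‘<0.35’ are stored as `None` and treated as below any cut-off of at least 0.35, which is why smaller cut-offs are rejected. The names `GLYPRE`, `GLY_PSEAAC`, and `count_at_or_above` are illustrative.

```python
from typing import List, Optional

# Posterior scores copied from the table, flattened in row order.
GLYPRE: List[float] = [
    0.3278, 0.5980, 0.8407, 0.2178, 0.0722, 0.7346, 0.4627, 0.3228, 0.3646,  # P62760 .. O96005
    0.7250, 0.2822, 0.2745, 0.1153, 0.2307, 0.0524,                          # P47869 .. Q9Y587
    0.4737, 0.1668, 0.3140, 0.4933, 0.1197, 0.4824, 0.1432, 0.1932, 0.0227, 0.2218,  # P28289
    0.0252, 0.4890, 0.6587, 0.1260, 0.2791, 0.6926, 0.8448, 0.0244, 0.5000,  # O94919 .. Q8IZI9
    0.6981, 0.0404, 0.5758,                                                  # Q15084, Q8IY21
]
# None marks entries reported only as '<0.35'.
GLY_PSEAAC: List[Optional[float]] = [
    None, 0.3835, 0.5878, None, None, 0.3831, None, 0.5519, 0.5515,  # P62760 .. O96005
    None, None, None, None, None, None,                              # P47869 .. Q9Y587
    None, None, 0.7305, 0.4924, None, None, None, None, 0.3583, None,  # P28289
    None, None, 0.3520, None, None, None, 0.4122, None, None,        # O94919 .. Q8IZI9
    None, None, None,                                                # Q15084, Q8IY21
]


def count_at_or_above(scores: List[Optional[float]], threshold: float) -> int:
    """Count sites whose score is >= threshold; '<0.35' entries never count."""
    if threshold < 0.35:
        raise ValueError("'<0.35' entries are ambiguous below a 0.35 cut-off")
    return sum(1 for s in scores if s is not None and s >= threshold)


print(count_at_or_above(GLYPRE, 0.5), count_at_or_above(GLY_PSEAAC, 0.5))
```

With the hypothetical cut-off of 0.5, this counts 10 of the 37 sites for Glypre and 4 for Gly-PseAAC.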