Tzong-Yi Lee, Yi-Ju Chen, Tsung-Cheng Lu, Hsien-Da Huang, Yu-Ju Chen.
Abstract
S-nitrosylation, the covalent attachment of a nitric oxide (NO) group to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein function and cell signaling, the substrate specificity of cysteine S-nitrosylation remains unknown. Based on a total of 586 experimentally identified S-nitrosylation sites from SNAP/L-cysteine-stimulated mouse endothelial cells, this work presents an informatics investigation of S-nitrosylation sites, including structural factors such as the flanking amino acid composition and the accessible surface area (ASA), and physicochemical properties, i.e., positive charge and the side chain interaction parameter. Because conserved motifs are difficult to obtain by conventional motif analysis, maximal dependence decomposition (MDD) was applied to obtain statistically significant conserved motifs. A support vector machine (SVM) was then used to generate a predictive model for each MDD-clustered motif. In five-fold cross-validation, the MDD-clustered SVMs achieved an accuracy of 0.902, and they showed promising performance on an independent test set. The effectiveness of the model was demonstrated by the correct identification of previously reported S-nitrosylation sites of Bos taurus dimethylarginine dimethylaminohydrolase 1 (DDAH1) and human hemoglobin subunit beta (HBB). Finally, the MDD-clustered model was adopted to construct an effective web-based tool, named SNOSite (http://csb.cse.yzu.edu.tw/SNOSite/), for identifying S-nitrosylation sites on uncharacterized protein sequences.
Year: 2011 PMID: 21789187 PMCID: PMC3137596 DOI: 10.1371/journal.pone.0021849
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Statistics of the experimentally verified S-nitrosylation sites in the training set and the independent test set.
| Data set | Species | Number of S-nitrosylated proteins | Number of S-nitrosylated cysteines | Number of non-S-nitrosylated cysteines |
| Training set | Mouse | 384 | 586 | 2,728 |
| Independent test set | Multiple | 327 | 479 | 2,501 |
Figure 1. The flowchart of MDD clustering.
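The flowchart itself is not reproduced here, so the following is only a simplified sketch of a single MDD-style split, not the authors' procedure. It assumes aligned windows of equal length, scores every non-central position by the summed Pearson chi-square statistic against all other non-central positions, and then partitions the windows on the most frequent residue at the highest-scoring position. Published MDD variants typically group amino acids by property, apply a significance threshold, and recurse until clusters become too small; none of that is modelled. The function names and the toy 5-mers are hypothetical.

```python
from collections import Counter


def chi_square(windows, i, j):
    """Pearson chi-square statistic for the residues seen at positions i and j."""
    n = len(windows)
    row = Counter(w[i] for w in windows)
    col = Counter(w[j] for w in windows)
    joint = Counter((w[i], w[j]) for w in windows)
    stat = 0.0
    for a, row_count in row.items():
        for b, col_count in col.items():
            expected = row_count * col_count / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def most_dependent_position(windows, centre):
    """Index whose summed chi-square against the other positions is largest."""
    length = len(windows[0])
    others = [k for k in range(length) if k != centre]
    totals = {
        i: sum(chi_square(windows, i, j) for j in others if j != i)
        for i in others
    }
    return max(totals, key=totals.get)


def mdd_split(windows, centre):
    """One split: (position, residue, windows with it, windows without it)."""
    pos = most_dependent_position(windows, centre)
    residue = Counter(w[pos] for w in windows).most_common(1)[0][0]
    with_residue = [w for w in windows if w[pos] == residue]
    without_residue = [w for w in windows if w[pos] != residue]
    return pos, residue, with_residue, without_residue


# Toy 5-mers centred on cysteine (index 2); not real S-nitrosylation data.
toy = ["KACAE", "KACAE", "KACAE", "RACAD", "RACAD"]
pos, residue, with_residue, without_residue = mdd_split(toy, centre=2)
print(pos, residue, len(with_residue), len(without_residue))
```

Applying `mdd_split` again to each returned subset would give a recursive clustering; a stopping rule (minimum subset size or a chi-square cutoff) would have to be chosen separately.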
Figure 2. Frequency plot of the sequence logo of S-nitrosylation sites with a 21-mer window length.
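A 21-mer window places the candidate cysteine at the centre with ten residues on each side (positions −10 to +10). The sketch below shows one way to cut such windows from a protein sequence. Padding the termini with `-` is an assumption made here so that every window has the same length; the source does not state how cysteines near the ends of a sequence were handled, and the function name is hypothetical.

```python
def extract_windows(sequence, flank=10, pad="-"):
    """Return (position, window) pairs for every cysteine in `sequence`.

    Positions are 1-based. Each window is 2 * flank + 1 residues long,
    and windows that run past either terminus are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "C":
            # Residue i sits at index i + flank in `padded`, so this slice
            # is centred on it.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


# Short made-up sequence with a small flank so the result is easy to read.
for position, window in extract_windows("MKCAAC", flank=2):
    print(position, window)
```

With the default `flank=10` the same call yields the 21-mers used throughout the tables and figures.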
Figure 3. The compositional biases of amino acids around S-nitrosylation sites compared to non-S-nitrosylation sites.
The amino acids that are significantly enriched or depleted (P-value<0.05) around S-nitrosylation sites are presented.
Figure 4. Comparison of the average percentage of ASA in the 21-mer window (−10 to +10) between S-nitrosylation and non-S-nitrosylation sites.
The cross-validation performance of the models trained with various features.
| Training features | Sn | Sp | Pre | Acc | BAcc | MCC |
| Amino Acid (AA_20D) | 0.556 | 0.574 | 0.199 | 0.572 | 0.566 | 0.097 |
| Amino Acid (AA_PWM) | 0.585 | 0.586 | 0.212 | 0.587 | 0.586 | 0.127 |
| Amino Acid Composition (AAC) | 0.579 | 0.605 | 0.218 | 0.602 | 0.593 | 0.137 |
| Accessible Surface Area (ASA) | 0.540 | 0.553 | 0.187 | 0.552 | 0.547 | 0.069 |
| 21 Motifs | 0.556 | 0.563 | 0.195 | 0.562 | 0.560 | 0.088 |
| AA_PWM+ASA | 0.561 | 0.583 | 0.204 | 0.580 | 0.573 | 0.108 |
| AA_PWM+21 Motifs | 0.561 | 0.572 | 0.199 | 0.570 | 0.567 | 0.098 |
| AA_PWM+AAC+ASA | 0.578 | 0.603 | 0.217 | 0.599 | 0.591 | 0.134 |
| AA_PWM+AAC+21 Motifs | 0.572 | 0.601 | 0.204 | 0.593 | 0.587 | 0.130 |
| AA_PWM+AAC+ASA+21 Motifs | 0.588 | 0.589 | 0.214 | 0.589 | 0.589 | 0.131 |
Abbreviations: AA_20D, amino acids coded as 20-dimensional vectors; AA_PWM, positional weighted matrix of flanking amino acids; ASA, accessible surface area; Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy; BAcc, balanced accuracy; MCC, Matthews correlation coefficient.
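To make two of the feature names concrete, the sketch below shows one plausible reading of AA_20D (each window position coded as a 20-dimensional indicator vector) and AAC (the fraction of each amino acid within the window). The residue ordering, the all-zero coding of non-standard symbols, and the function names are assumptions made for illustration; the source does not give these details.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def one_hot(window):
    """AA_20D-style coding: 20 indicator values per window position.

    Symbols outside the 20 standard residues (such as padding) are coded
    as all zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def composition(window):
    """AAC-style coding: fraction of each standard residue in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues)
    return [residues.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS]


window = "AKC-"  # made-up 4-mer with one padding symbol
print(len(one_hot(window)), sum(one_hot(window)))
print(composition(window))
```

Under this reading a 21-mer yields a 420-dimensional AA_20D vector and a 20-dimensional AAC vector, which can be concatenated for the combined feature sets listed in the table.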
Figure 5. The top twenty physicochemical properties of S-nitrosylation sites, ranked by the average F-score over the 21-mer window.
KRIW710101, side chain interaction parameter [48]; OOBM850105, optimized side chain interaction parameter [49]; FINA910104, helix termination parameter at position j+1 [50]; FAUJ880111, positive charge [51]; GUYH850104, apparent partition energies calculated from Janin index [52]; KIDA850101, hydrophobicity-related index [53]; GUYH850101, partition energy [52]; JANJ780101, average accessible surface area [54]; ROSM880102, side chain hydropathy [55]; CHOC760102, residue accessible surface area in folded protein [56]; FASG890101, hydrophobicity index [57]; RACS820103, average relative fractional occurrence in AL(i) [58]; KARP850103, flexibility parameter for two rigid neighbors [59]; OOBM770101, average non-bonded energy per atom [60]; LEVM760101, hydrophobic parameter [61]; MEIH800102, average reduced distance for side chain [62]; GUYH850105, apparent partition energies calculated from Chothia index [52]; KRIW790102, fraction of site occupied by water [63]; PUNT030101, knowledge-based membrane-propensity scale from 1D_Helix in MPtopo databases [64]; WEBA780101, RF value in high salt chromatography [65].
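The source does not spell out the F-score formula. The sketch below assumes the Fisher-style F-score commonly used for feature ranking with SVMs: the squared deviations of the positive and negative class means from the overall mean, divided by the sum of the two sample variances. It then averages that score over all window positions for one property scale. The toy scale values are hypothetical and are not taken from AAindex.

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """Fisher-style F-score of one feature (assumed definition)."""
    overall = mean(list(pos_values) + list(neg_values))
    between = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    within = variance(pos_values) + variance(neg_values)  # sample variances
    return between / within if within > 0 else 0.0


def average_window_f_score(pos_windows, neg_windows, scale):
    """Mean F-score over all window positions for one property scale.

    `scale` maps a residue to its property value; residues missing from
    the scale are given 0.0 here, which is an arbitrary choice.
    """
    length = len(pos_windows[0])
    scores = []
    for k in range(length):
        pos = [scale.get(w[k], 0.0) for w in pos_windows]
        neg = [scale.get(w[k], 0.0) for w in neg_windows]
        scores.append(f_score(pos, neg))
    return mean(scores)


# Hypothetical two-residue "charge" scale and made-up 2-mers.
toy_scale = {"K": 1.0, "A": 0.0}
positives = ["KC", "KC", "AC"]
negatives = ["AC", "AC", "KC"]
print(average_window_f_score(positives, negatives, toy_scale))
```

Ranking the AAindex scales would then amount to calling `average_window_f_score` once per scale and sorting the results in descending order.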
The cross-validation performance of the models trained individually with twenty physicochemical properties.
| AAindex ID | Description | Sn | Sp | Pre | Acc | BAcc | MCC |
| KRIW710101 | Side chain interaction parameter (Krigbaum-Rubin, 1971) | 0.554 | 0.572 | 0.197 | 0.569 | 0.563 | 0.093 |
| OOBM850105 | Optimized side chain interaction parameter (Oobatake et al., 1985) | 0.547 | 0.560 | 0.191 | 0.558 | 0.554 | 0.079 |
| FINA910104 | Helix termination parameter at position j+1 (Finkelstein et al., 1991) | 0.558 | 0.566 | 0.196 | 0.565 | 0.562 | 0.092 |
| GUYH850104 | Apparent partition energies calculated from Janin index (Guy, 1985) | 0.551 | 0.574 | 0.197 | 0.570 | 0.562 | 0.092 |
| KIDA850101 | Hydrophobicity-related index (Kidera et al., 1985) | 0.549 | 0.555 | 0.189 | 0.554 | 0.552 | 0.076 |
| GUYH850101 | Partition energy (Guy, 1985) | 0.545 | 0.570 | 0.194 | 0.566 | 0.558 | 0.085 |
| JANJ780101 | Average accessible surface area (Janin et al., 1978) | 0.563 | 0.575 | 0.201 | 0.573 | 0.569 | 0.102 |
| ROSM880102 | Side chain hydropathy, corrected for solvation (Roseman, 1988) | 0.540 | 0.545 | 0.183 | 0.544 | 0.542 | 0.062 |
| CHOC760102 | Residue accessible surface area in folded protein (Chothia, 1976) | 0.565 | 0.582 | 0.204 | 0.579 | 0.574 | 0.109 |
| FASG890101 | Hydrophobicity index (Fasman, 1989) | 0.535 | 0.560 | 0.187 | 0.556 | 0.547 | 0.069 |
| RACS820103 | Average relative fractional occurrence in AL(i) (Rackovsky-Scheraga, 1982) | 0.511 | 0.533 | 0.172 | 0.530 | 0.522 | 0.033 |
| KARP850103 | Flexibility parameter for two rigid neighbors (Karplus-Schulz, 1985) | 0.542 | 0.565 | 0.191 | 0.561 | 0.553 | 0.079 |
| OOBM770101 | Average non-bonded energy per atom (Oobatake-Ooi, 1977) | 0.545 | 0.563 | 0.191 | 0.560 | 0.554 | 0.080 |
| LEVM760101 | Hydrophobic parameter (Levitt, 1976) | 0.545 | 0.556 | 0.189 | 0.554 | 0.551 | 0.075 |
| MEIH800102 | Average reduced distance for side chain (Meirovitch et al., 1980) | 0.542 | 0.559 | 0.189 | 0.556 | 0.550 | 0.074 |
| GUYH850105 | Apparent partition energies calculated from Chothia index (Guy, 1985) | 0.556 | 0.563 | 0.194 | 0.562 | 0.560 | 0.088 |
| KRIW790102 | Fraction of site occupied by water (Krigbaum-Komoriya, 1979) | 0.531 | 0.554 | 0.184 | 0.550 | 0.542 | 0.062 |
| PUNT030101 | Knowledge-based membrane-propensity scale from 1D_Helix in MPtopo databases (Punta-Maritan, 2003) | 0.543 | 0.566 | 0.192 | 0.562 | 0.555 | 0.081 |
| WEBA780101 | RF value in high salt chromatography (Weber-Lacey, 1978) | 0.538 | 0.553 | 0.186 | 0.550 | 0.545 | 0.067 |
The physicochemical property that achieves the highest accuracy is highlighted in bold. Abbreviations: Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy; BAcc, balanced accuracy; MCC, Matthews correlation coefficient.
The predictive performance of MDD-clustered models using an independent test set (GPS-SNO).
| Species | Number of proteins | Number of positive data | Number of negative data | TP | TN | FP | FN | Sn | Sp | Pre | Acc | BAcc | MCC |
|  | 327 | 479 | 2501 | 386 | 1485 | 1016 | 93 | 0.805 | 0.593 | 0.275 | 0.627 | 0.699 | 0.294 |
|  | 117 | 211 | 1055 | 159 | 655 | 400 | 52 | 0.753 | 0.620 | 0.284 | 0.642 | 0.687 | 0.280 |
|  | 84 | 106 | 568 | 88 | 378 | 190 | 18 | 0.830 | 0.665 | 0.316 | 0.691 | 0.747 | 0.366 |
|  | 70 | 105 | 597 | 86 | 385 | 212 | 19 | 0.819 | 0.644 | 0.288 | 0.670 | 0.731 | 0.334 |
|  | 39 | 37 | 152 | 26 | 112 | 40 | 11 | 0.702 | 0.736 | 0.393 | 0.730 | 0.719 | 0.365 |
|  | 5 | 5 | 36 | 5 | 5 | 31 | 0 | 1 | 0.138 | 0.138 | 0.243 | 0.569 | 0.138 |
|  | 2 | 5 | 34 | 5 | 4 | 30 | 0 | 1 | 0.117 | 0.142 | 0.230 | 0.558 | 0.129 |
Abbreviations: TP, true positive; TN, true negative; FP, false positive; FN, false negative; Pre, precision; Sn, sensitivity; Sp, specificity; Acc, accuracy; BAcc, balanced accuracy; MCC, Matthews correlation coefficient; ARATH, Arabidopsis thaliana; BOVIN, Bos taurus.
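The six performance measures follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions, which are assumed to match the ones used in the table, and evaluates them on the counts from the 327-protein row (TP = 386, TN = 1485, FP = 1016, FN = 93). The function name is hypothetical.

```python
from math import sqrt


def confusion_metrics(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    pre = tp / (tp + fp)                   # precision
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    bacc = (sn + sp) / 2                   # balanced accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Pre": pre, "Acc": acc, "BAcc": bacc, "MCC": mcc}


# Counts from the 327-protein row of the independent test set.
for name, value in confusion_metrics(tp=386, tn=1485, fp=1016, fn=93).items():
    print(f"{name}: {value:.4f}")
```

The computed values agree with the tabulated ones to within about 0.001. Because negatives outnumber positives roughly five to one, accuracy alone is dominated by the negative class, which is why balanced accuracy and MCC are reported alongside it.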
Figure 6. A case study of Bos taurus dimethylarginine dimethylaminohydrolase 1 (DDAH1), which contains two S-nitrosylation sites, at positions 222 and 274.