Ahmet Sinan Yavuz, Namık Berk Sözer, Osman Uğur Sezerman.
Abstract
BACKGROUND: Neddylation is a reversible post-translational modification that plays a vital role in maintaining cellular machinery. It has been shown to affect the localization, binding partners and structure of target proteins. Disruption of protein neddylation has been observed in various diseases, such as Alzheimer's disease and cancer. Understanding the neddylation mechanism and determining neddylation targets is therefore important for further understanding cellular processes. This study is the first attempt to predict neddylated sites from protein sequences using several sequence and sequence-based structural features.
Year: 2015 PMID: 26679222 PMCID: PMC4682398 DOI: 10.1186/1471-2105-16-S18-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Optimisation of window size and feature selection. (a) Effect of window size on mean classification AUC. The mean AUC of 100 repeats of 5-fold stratified cross-validation is reported. Error bars show two standard errors. (b) Feature selection using mRMR and an incremental feature selection strategy. The number of features to retain was determined using the mean classification AUC of 100 runs of 5-fold stratified cross-validation as the main performance measure. The x-axis represents the number of features used in classification. Error bars show two standard errors. The maximum AUC was 0.95, at 49 features.
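The incremental feature selection step described in panel (b) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the features have already been ranked (for example by mRMR), and `score_fn` is a hypothetical callable standing in for the mean cross-validated AUC of a classifier trained on a given feature subset.

```python
def incremental_feature_selection(ranked_features, score_fn):
    """Pick the best-scoring prefix of a ranked feature list.

    ranked_features: features ordered from most to least relevant
                     (assumed to come from a ranking such as mRMR).
    score_fn:        hypothetical callable mapping a list of features
                     to a performance score (e.g. mean CV AUC).
    Returns (selected_features, best_score).
    """
    best_k, best_score = 0, float("-inf")
    # Grow the feature set one ranked feature at a time.
    for k in range(1, len(ranked_features) + 1):
        score = score_fn(ranked_features[:k])
        # Strict '>' keeps the smallest prefix when scores tie.
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score


# Toy usage: an artificial score that peaks at three features.
toy_ranked = ["f1", "f2", "f3", "f4", "f5"]
toy_selected, toy_score = incremental_feature_selection(
    toy_ranked, lambda feats: -abs(len(feats) - 3)
)
```

In the setting of the caption, the score for each prefix would be the mean AUC over the repeated stratified cross-validation runs, and the selected prefix would be the one at which that mean peaks.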
Figure 2. Sequence properties of neddylation sites in the complete dataset. Sequence logo of lysine-centred windows, showing enrichment and depletion of amino acids at particular positions. The logo was created using Two-Sample Logos [9] with default parameters.
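A lysine-centred window can be extracted from a protein sequence roughly as shown below. This is only a sketch under assumptions: the function name, the `half_width` parameter and the `X` padding character for positions beyond the termini are illustrative choices, not details taken from the study.

```python
def lysine_windows(sequence, half_width, pad="X"):
    """Yield (position, window) for every lysine (K) in `sequence`.

    position is 1-based. Each window contains `half_width` residues on
    either side of the lysine; positions beyond a terminus are filled
    with `pad` (an assumed convention).
    """
    padded = pad * half_width + sequence + pad * half_width
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` is index i + half_width in `padded`,
            # so the window starts at padded[i].
            yield i + 1, padded[i : i + 2 * half_width + 1]


# Toy usage on a made-up five-residue sequence.
windows = list(lysine_windows("MKAAK", half_width=2))
```

Each window has length `2 * half_width + 1`, with the candidate lysine at its centre, which is the layout a position-specific logo or position-specific feature needs.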
Top 10 of the selected features.
| Order | Feature | Position |
|---|---|---|
| 1 | M presence | -1 |
| 2 | PSSM score of K | -7 |
| 3 | I/V/L/M presence | +8 |
| 4 | Termini | - |
| 5 | D/E occurrence count | - |
| 6 | R presence | -3 |
| 7 | I presence | -5 |
| 8 | Hydrophobicity | -2 |
| 9 | A presence | -7 |
| 10 | V presence | -4 |
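To make the table concrete, a few of the listed features can be computed from a lysine-centred window as in the sketch below. The feature names, the dictionary layout, and the choice to count D/E over the whole window are assumptions made for illustration; the source does not specify how these were encoded.

```python
HYDROPHOBIC_GROUP = set("IVLM")  # the I/V/L/M group named in the table


def residue_at(window, offset):
    """Residue at `offset` relative to the central lysine (offset 0)."""
    centre = len(window) // 2
    index = centre + offset
    if not 0 <= index < len(window):
        raise ValueError("offset lies outside the window")
    return window[index]


def encode(window):
    """Encode a handful of the top-ranked features (illustrative only)."""
    return {
        "M_at_-1": int(residue_at(window, -1) == "M"),
        "IVLM_at_+8": int(residue_at(window, 8) in HYDROPHOBIC_GROUP),
        "R_at_-3": int(residue_at(window, -3) == "R"),
        # Assumption: D/E occurrences are counted over the whole window.
        "DE_count": sum(window.count(aa) for aa in "DE"),
    }


# Made-up 17-residue window (positions -8 to +8) with K at the centre.
example_features = encode("AAAAARAMKDEAAAAAL")
```

Binary presence features of this kind are position-specific, whereas the occurrence count is position-independent, which matches the "-" in the Position column for that row.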
Effect of different feature sets in selected features on prediction performance.
| Feature set | Acc | Sp | Sn | MCC | AUC |
|---|---|---|---|---|---|
| All selected features | 0.91 | 0.91 | 0.75 | 0.44 | 0.95 |
| without amino acid preferences | 0.88 | 0.89 | 0.63 | 0.33 | 0.88 |
| without amino acid preferences (grouped) | 0.89 | 0.90 | 0.65 | 0.35 | 0.91 |
| without disorder | 0.89 | 0.90 | 0.72 | 0.40 | 0.93 |
| without termini | 0.91 | 0.93 | 0.68 | 0.42 | 0.94 |
| without amino acid occurrence counts | 0.91 | 0.91 | 0.74 | 0.44 | 0.94 |
| without hydrophobicity features | 0.91 | 0.91 | 0.75 | 0.44 | 0.94 |
| without amino acid occurrence ratios | 0.91 | 0.92 | 0.72 | 0.43 | 0.94 |
| without PSSM features | 0.91 | 0.92 | 0.74 | 0.44 | 0.94 |
† Cross-validation (CV) results are reported as means of 100 repeats. Two standard errors never exceeded 0.01 and are therefore not reported.
Performance of classification model under different evaluation strategies.
| Evaluation Strategy | Acc | Sp | Sn | MCC | AUC |
|---|---|---|---|---|---|
| Self-consistency | 0.92 | 0.92 | 0.94 | 0.56 | |
| 5-fold stratified cross-validation† | 0.91 | 0.91 | 0.75 | 0.44 | |
| 10-fold stratified cross-validation† | 0.91 | 0.91 | 0.76 | 0.45 | |
| Validation set | 0.90 | 0.91 | 0.67 | 0.39 | |
| Holdout set | 0.90 | 0.91 | 0.64 | 0.35 | |
† Cross-validation (CV) results are reported as means of 100 repeats. Two standard errors never exceeded 0.01 and are therefore not reported.
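The columns Acc, Sp, Sn and MCC follow the standard definitions for binary classification, which can be computed from a confusion matrix as sketched below. The counts in the usage example are hypothetical and are not taken from the study; they only illustrate how accuracy can stay high while MCC remains moderate when negatives greatly outnumber positives.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, specificity, sensitivity and MCC from confusion counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sn = tp / (tp + fn)  # sensitivity: fraction of true sites recovered
    sp = tn / (tn + fp)  # specificity: fraction of non-sites rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sp": sp, "Sn": sn, "MCC": mcc}


# Hypothetical, imbalanced counts (4 positives, 100 negatives).
metrics = binary_metrics(tp=3, fp=10, tn=90, fn=1)
```

Because MCC uses all four cells of the confusion matrix, it is less flattered by class imbalance than accuracy, which is one reason to read the MCC column alongside Acc.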
Figure 3. ROC curves. ROC curves of the classification model under different evaluation strategies.
Comparison of SVM prediction performance to known or predicted motifs on the test set.
| Predictor | Acc | Sp | Sn | MCC |
|---|---|---|---|---|
| AAIV[RQ]IMKS¹ | 0.96 | 1.00 | 0.09 | 0.30 |
| IVRIMKS² | 0.96 | 1.00 | 0.09 | 0.30 |
| [IL][VIT][RQ][IS][MLV]K[MAS][RHE]³ | 0.95 | 1.00 | 0.09 | 0.20 |
| SVM | 0.90 | 0.91 | | |
¹ This motif was derived using MEME in normal mode. ² This motif was derived using MEME in discriminative mode. For the prediction of both motifs, the same training sites as in SVM training were used. ³ This motif was reviewed in [10].
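A motif-based predictor of the kind compared in this table can be approximated with a regular expression scan, as in the sketch below. It uses the third motif exactly as written in the table; the function name, the 1-based position convention and the use of a lookahead to allow overlapping matches are illustrative assumptions rather than the procedure used in the study.

```python
import re

# Third motif from the table; the lysine (K) is its sixth residue.
MOTIF = "[IL][VIT][RQ][IS][MLV]K[MAS][RHE]"
LYSINE_OFFSET = 5  # 0-based offset of K within a motif match


def motif_predicted_lysines(sequence, motif=MOTIF, k_offset=LYSINE_OFFSET):
    """Return 1-based positions of lysines that sit inside a motif match."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=(" + motif + "))")
    return [m.start() + k_offset + 1 for m in pattern.finditer(sequence)]


# Made-up sequence containing one motif instance.
predicted = motif_predicted_lysines("AAIVRIMKSEAA")
```

A strict pattern like this only fires on near-exact matches, which is consistent with the high specificity and low sensitivity reported for the motif rows.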
Figure 4. Venn diagrams of shared modification target sites between predicted neddylation sites and known sumoylation and ubiquitylation sites. Experimentally identified sumoylation and ubiquitylation sites were obtained from dbPTM [19]. Possible neddylation sites of these proteins were then predicted with the neddylation prediction method. The diagrams report the number of common sumoylation, ubiquitylation and neddylation sites using (a) medium and (b) high neddylation prediction thresholds.