Shun-Long Weng, Kai-Yao Huang, Fergie Joanda Kaunang, Chien-Hsun Huang, Hui-Ju Kao, Tzu-Hao Chang, Hsin-Yao Wang, Jang-Jih Lu, Tzong-Yi Lee.
Abstract
BACKGROUND: Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P). Because little is known about the substrate specificity of carbonylation, we set out to develop a systematic method for the comprehensive investigation of protein carbonylation sites.
Keywords: Amino acid composition; Physicochemical properties; Protein carbonylation; Reactive Oxygen Species (ROS)
Year: 2017 PMID: 28361707 PMCID: PMC5374553 DOI: 10.1186/s12859-017-1472-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Analytical flowchart of the identification of protein carbonylation sites
Data statistics of positive and negative sequences (with window size 21) in training and testing datasets
| Dataset | Residue | Number of proteins | Number of positive sequences | Number of negative sequences |
|---|---|---|---|---|
| Training dataset | K | 155 | 206 | 1166 |
| Training dataset | R | 90 | 101 | 504 |
| Training dataset | T | 81 | 96 | 488 |
| Training dataset | P | 77 | 94 | 412 |
| Independent testing dataset | K | 67 | 78 | 301 |
| Independent testing dataset | R | 65 | 67 | 276 |
| Independent testing dataset | T | 50 | 53 | 124 |
| Independent testing dataset | P | 71 | 82 | 304 |
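The table above counts sequences built with a window size of 21, that is, a candidate residue flanked by 10 residues on each side. A minimal sketch of how such windows could be cut from a protein sequence follows. The function names, the `-` padding character for residues near the termini, and the rule that every non-annotated occurrence of the same residue becomes a negative are assumptions made for illustration, not details stated in this record.

```python
def extract_windows(sequence, residue, window_size=21, pad="-"):
    """Return (position, window) pairs for every occurrence of `residue`.

    Positions are 1-based. Windows are centred on the residue and padded
    with `pad` (an assumed convention) when they run past either terminus.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the site is centred")
    flank = window_size // 2
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, aa in enumerate(sequence):
        if aa == residue:
            # Index i in `sequence` is index i + flank in `padded`,
            # so the slice starting at i is centred on the residue.
            windows.append((i + 1, padded[i:i + window_size]))
    return windows


def label_windows(sequence, residue, positive_positions, window_size=21):
    """Split windows into positives (annotated sites) and negatives (the rest)."""
    positives, negatives = [], []
    for pos, window in extract_windows(sequence, residue, window_size):
        if pos in positive_positions:
            positives.append(window)
        else:
            negatives.append(window)
    return positives, negatives


# Toy example with a made-up sequence and a short window for readability.
pos, neg = label_windows("MKTAYIAKQR", "K", positive_positions={2}, window_size=5)
print(pos, neg)
```

With real data, `positive_positions` would hold the experimentally verified carbonylation sites of one protein, and the call would be repeated per protein and per residue type (K, R, T, P).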
Fig. 2 Comparison of amino acid composition between carbonylated and non-carbonylated sites on K, R, T and P residues
Fig. 3 Entropy and frequency plots of position-specific amino acid composition of the four carbonylated residues
Fig. 4 TwoSampleLogo of the four carbonylated residues. a Two-Sample Logo of lysine (K). b Two-Sample Logo of arginine (R). c Two-Sample Logo of threonine (T). d Two-Sample Logo of proline (P)
Fig. 5 Frequency differences of the 20 × 20 amino acid pairs between carbonylated and non-carbonylated sites of lysine, arginine, threonine and proline. A red box marks a pair that is over-represented in carbonylated sites (positive data) compared with non-carbonylated sites (negative data); a green box marks an under-represented pair
Fig. 6 Comparison of the solvent-accessible surface area between carbonylated and non-carbonylated sites on K, R, T and P residues
Fig. 7 Top 10 physicochemical properties of carbonylated sites on lysine, ranked by the average F-score in the 21-mer window
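The 20 × 20 pair comparison described in the Fig. 5 caption can be illustrated with a short sketch. It assumes that a "pair" means two directly adjacent residues inside a window and that frequencies are normalised by the total number of valid pairs in each set; this record does not state the exact definition used by the authors, so both choices, along with all function names, are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pair_frequencies(windows):
    """Relative frequency of each of the 400 adjacent amino acid pairs."""
    counts = Counter()
    total = 0
    for window in windows:
        for a, b in zip(window, window[1:]):
            # Skip padding or non-standard symbols.
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[a + b] += 1
                total += 1
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


def pair_frequency_difference(positive_windows, negative_windows):
    """Positive minus negative frequency for every pair.

    Values above zero indicate over-representation in the positive set,
    values below zero indicate under-representation.
    """
    pos = pair_frequencies(positive_windows)
    neg = pair_frequencies(negative_windows)
    return {pair: pos[pair] - neg[pair] for pair in pos}


# Toy windows, not real data.
diff = pair_frequency_difference(["AKA"], ["AKK"])
print(diff["KA"], diff["KK"], diff["AK"])
```

The resulting dictionary has one entry per pair and could be reshaped into a 20 × 20 grid for plotting.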
Five-fold cross-validation results of the models trained with various features for classifying between 206 carbonylated and 1166 non-carbonylated lysine residues
| Classifier | Training features | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|
| SVM | AA | 0.680 | 0.643 | 0.649 | 0.235 |
| SVM | AAC | 0.728 | 0.686 | 0.692 | 0.305 |
| SVM | AAPC | 0.699 | 0.696 | 0.697 | 0.294 |
| SVM | PWM | | 0.715 | | |
| SVM | PSSM | 0.704 | 0.686 | 0.689 | 0.288 |
| SVM | ASA | 0.592 | 0.571 | 0.574 | 0.117 |
| SVM | AAindex | 0.709 | | 0.719 | 0.323 |
| J48 DT | AA | 0.534 | 0.557 | 0.554 | 0.066 |
| J48 DT | AAC | 0.655 | 0.678 | 0.674 | 0.246 |
| J48 DT | AAPC | 0.670 | 0.683 | 0.681 | 0.261 |
| J48 DT | PWM | 0.689 | 0.674 | 0.676 | 0.267 |
| J48 DT | PSSM | 0.621 | 0.660 | 0.655 | 0.207 |
| J48 DT | ASA | 0.515 | 0.563 | 0.555 | 0.055 |
| J48 DT | AAindex | 0.660 | 0.682 | 0.679 | 0.253 |
| RF | AA | 0.660 | 0.635 | 0.638 | 0.214 |
| RF | AAC | 0.704 | 0.686 | 0.689 | 0.288 |
| RF | AAPC | 0.709 | 0.703 | 0.704 | 0.307 |
| RF | PWM | 0.718 | 0.707 | 0.708 | 0.317 |
| RF | PSSM | 0.699 | 0.686 | 0.688 | 0.285 |
| RF | ASA | 0.583 | 0.583 | 0.583 | 0.119 |
| RF | AAindex | 0.709 | 0.717 | 0.716 | 0.319 |
Italicized numbers are the highest values for each of the four measurements
Five-fold cross-validation results of the models trained with various features for classifying between 101 carbonylated and 504 non-carbonylated arginine residues
| Classifier | Training features | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|
| SVM | AA | 0.614 | 0.603 | 0.605 | 0.163 |
| SVM | AAC | 0.653 | 0.683 | 0.678 | 0.259 |
| SVM | AAPC | 0.663 | 0.687 | 0.683 | 0.270 |
| SVM | PWM | | 0.718 | 0.717 | |
| SVM | PSSM | 0.624 | 0.685 | 0.674 | 0.239 |
| SVM | ASA | 0.594 | 0.599 | 0.598 | 0.145 |
| SVM | AAindex | 0.693 | | | 0.329 |
| J48 DT | AA | 0.554 | 0.603 | 0.595 | 0.119 |
| J48 DT | AAC | 0.594 | 0.683 | 0.668 | 0.214 |
| J48 DT | AAPC | 0.614 | 0.687 | 0.674 | 0.233 |
| J48 DT | PWM | 0.614 | 0.675 | 0.664 | 0.222 |
| J48 DT | PSSM | 0.554 | 0.665 | 0.646 | 0.169 |
| J48 DT | ASA | 0.535 | 0.599 | 0.588 | 0.101 |
| J48 DT | AAindex | 0.646 | 0.690 | 0.683 | 0.259 |
| RF | AA | 0.614 | 0.605 | 0.607 | 0.165 |
| RF | AAC | 0.634 | 0.683 | 0.674 | 0.244 |
| RF | AAPC | 0.653 | 0.683 | 0.678 | 0.259 |
| RF | PWM | 0.713 | 0.716 | 0.716 | 0.334 |
| RF | PSSM | 0.624 | 0.685 | 0.674 | 0.239 |
| RF | ASA | 0.594 | 0.599 | 0.598 | 0.145 |
| RF | AAindex | 0.693 | 0.724 | 0.719 | 0.327 |
Italicized numbers are the highest values for each of the four measurements
Five-fold cross-validation results of the models trained with various features for classifying between 96 carbonylated and 488 non-carbonylated threonine residues
| Classifier | Training features | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|
| SVM | AA | 0.625 | 0.615 | 0.616 | 0.180 |
| SVM | AAC | 0.667 | 0.656 | 0.658 | 0.244 |
| SVM | AAPC | 0.646 | 0.660 | 0.658 | 0.232 |
| SVM | PWM | | 0.672 | | |
| SVM | PSSM | 0.656 | 0.656 | 0.656 | 0.236 |
| SVM | ASA | 0.573 | 0.590 | 0.587 | 0.122 |
| SVM | AAindex | 0.667 | 0.654 | 0.656 | 0.242 |
| J48 DT | AA | 0.604 | 0.594 | 0.596 | 0.148 |
| J48 DT | AAC | 0.635 | 0.635 | 0.635 | 0.204 |
| J48 DT | AAPC | 0.635 | 0.641 | 0.640 | 0.209 |
| J48 DT | PWM | 0.625 | 0.637 | 0.635 | 0.198 |
| J48 DT | PSSM | 0.604 | 0.598 | 0.599 | 0.151 |
| J48 DT | ASA | 0.573 | 0.590 | 0.587 | 0.122 |
| J48 DT | AAindex | 0.646 | 0.641 | 0.642 | 0.217 |
| RF | AA | 0.625 | 0.617 | 0.618 | 0.181 |
| RF | AAC | 0.656 | 0.652 | 0.652 | 0.233 |
| RF | AAPC | 0.646 | 0.652 | 0.651 | 0.225 |
| RF | PWM | 0.677 | 0.668 | 0.670 | 0.262 |
| RF | PSSM | 0.656 | 0.656 | 0.656 | 0.236 |
| RF | ASA | 0.583 | 0.594 | 0.592 | 0.133 |
| RF | AAindex | 0.656 | | 0.673 | 0.254 |
Italicized numbers are the highest values for each of the four measurements
Five-fold cross-validation results of the models trained with various features for classifying between 94 carbonylated and 412 non-carbonylated proline residues
| Classifier | Training features | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|
| SVM | AA | 0.638 | 0.655 | 0.652 | 0.233 |
| SVM | AAC | 0.713 | 0.716 | 0.715 | 0.347 |
| SVM | AAPC | 0.646 | 0.728 | 0.713 | 0.309 |
| SVM | PWM | | 0.733 | 0.735 | 0.388 |
| SVM | PSSM | 0.670 | 0.709 | 0.702 | 0.307 |
| SVM | ASA | 0.585 | 0.607 | 0.603 | 0.151 |
| SVM | AAindex | 0.702 | | | 0.375 |
| J48 DT | AA | 0.617 | 0.607 | 0.609 | 0.176 |
| J48 DT | AAC | 0.638 | 0.631 | 0.632 | 0.212 |
| J48 DT | AAPC | 0.638 | 0.636 | 0.636 | 0.216 |
| J48 DT | PWM | 0.660 | 0.680 | 0.676 | 0.271 |
| J48 DT | PSSM | 0.670 | 0.709 | 0.702 | 0.307 |
| J48 DT | ASA | 0.574 | 0.583 | 0.581 | 0.123 |
| J48 DT | AAindex | 0.649 | 0.709 | 0.698 | 0.290 |
| RF | AA | 0.628 | 0.660 | 0.654 | 0.229 |
| RF | AAC | 0.723 | 0.716 | 0.717 | 0.355 |
| RF | AAPC | 0.646 | 0.728 | 0.713 | 0.309 |
| RF | PWM | 0.734 | 0.733 | 0.733 | 0.380 |
| RF | PSSM | 0.660 | 0.704 | 0.696 | 0.294 |
| RF | ASA | 0.585 | 0.607 | 0.603 | 0.151 |
| RF | AAindex | 0.734 | 0.743 | 0.741 | |
Italicized numbers are the highest values for each of the four measurements
Five-fold cross-validation results of the models trained with the combination of hybrid features obtaining best predictive performance in training datasets
| Residue | Classifier | Hybrid features | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|---|
| K | SVM | PWM + AAC + AAindex | 0.796 | 0.767 | 0.771 | 0.432 |
| R | SVM | PWM + AAindex + AAPC | 0.782 | 0.798 | 0.795 | 0.472 |
| T | SVM | PWM + AAindex | 0.750 | 0.795 | 0.788 | 0.443 |
| P | RF | PWM + AAC + AAindex | 0.787 | 0.777 | 0.779 | 0.467 |
Comparison of independent testing results between our method and an available prediction tool (CarSPred)
| Method | Residue | TP | FP | TN | FN | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|---|---|---|---|
| Our method | K | 50 | 101 | 200 | 28 | 0.641 | 0.664 | 0.659 | 0.252 |
| Our method | R | 45 | 75 | 201 | 22 | 0.672 | 0.725 | 0.714 | 0.329 |
| Our method | T | 40 | 49 | 75 | 13 | 0.755 | 0.605 | 0.650 | 0.329 |
| Our method | P | 62 | 105 | 199 | 20 | 0.756 | 0.658 | 0.679 | 0.342 |
| CarSPred | K | 44 | 112 | 189 | 34 | 0.564 | 0.631 | 0.617 | 0.161 |
| CarSPred | R | 40 | 80 | 196 | 27 | 0.597 | 0.706 | 0.685 | 0.252 |
| CarSPred | T | 43 | 74 | 50 | 10 | 0.811 | 0.403 | 0.525 | 0.208 |
| CarSPred | P | 56 | 134 | 170 | 26 | 0.683 | 0.559 | 0.585 | 0.198 |
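The four measures reported in the tables can be derived from the confusion-matrix counts (TP, FP, TN, FN) listed above. The sketch below uses the standard definitions of sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC); the function name and the choice to return 0.0 for MCC when its denominator is zero are illustrative assumptions, not something this record specifies.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from raw counts."""
    sensitivity = tp / (tp + fn)          # fraction of true sites recovered
    specificity = tn / (tn + fp)          # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when MCC is undefined.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Threonine row of "Our method" in the independent test table.
metrics = classification_metrics(tp=40, fp=49, tn=75, fn=13)
print({name: round(value, 3) for name, value in metrics.items()})
```

For the threonine row, the rounded results match the tabulated 0.755, 0.605, 0.650 and 0.329. MCC is the most informative of the four here because negatives outnumber positives in every dataset, which can inflate accuracy.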