Hui-Ju Kao, Shun-Long Weng, Kai-Yao Huang, Fergie Joanda Kaunang, Justin Bo-Kai Hsu, Chien-Hsun Huang, Tzong-Yi Lee.
Abstract
BACKGROUND: Carbonylation, which occurs when reactive oxygen species (ROS) oxidize specific residues, is an irreversible oxidative modification of proteins. Carbonylation has been reported to be associated with a number of metabolic or aging-related diseases, including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Because computational methods dedicated to exploring the motif signatures of protein carbonylation sites are lacking, we were motivated to use an iterative statistical method to characterize and identify carbonylated sites with motif signatures.
Keywords: Maximal dependence decomposition; Profile hidden Markov model; Protein carbonylation; Reactive oxygen species (ROS); Substrate motifs
Year: 2017 PMID: 29322938 PMCID: PMC5763492 DOI: 10.1186/s12918-017-0511-4
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Number of positive and negative training sequences on K, R, T, and P residues
| Residue | Number of carbonylated proteins | Dataset | Number of sequences | Total |
|---|---|---|---|---|
| K | 162 | Positive | 256 | 768 |
| | | Negative | 512 | |
| R | 96 | Positive | 115 | 345 |
| | | Negative | 230 | |
| T | 85 | Positive | 109 | 327 |
| | | Negative | 218 | |
| P | 82 | Positive | 109 | 327 |
| | | Negative | 218 | |
Fig. 1 Analytical flowchart of maximal dependence decomposition
Number of positive and negative testing sequences on K, R, T, and P residues
| Residue | Number of carbonylated proteins | Dataset | Number of sequences | Total |
|---|---|---|---|---|
| K | 80 | Positive | 85 | 255 |
| | | Negative | 170 | |
| R | 71 | Positive | 72 | 216 |
| | | Negative | 144 | |
| T | 62 | Positive | 63 | 189 |
| | | Negative | 126 | |
| P | 71 | Positive | 82 | 246 |
| | | Negative | 164 | |
Fig. 2 Amino acid composition of carbonylation and non-carbonylation sites on K, R, T and P residues
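As a rough illustration of what an amino acid composition (AAC) feature looks like, the sketch below turns a sequence window around a candidate site into a 20-dimensional vector of residue frequencies. The function name `aac_vector`, the example window, and the choice to ignore non-standard characters are assumptions made for illustration; the window length and any padding convention used in the study are not stated in this record.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_vector(window):
    """Return the frequency of each standard amino acid in `window`.

    Characters outside the 20 standard residues (for example a padding
    symbol) are ignored. This is an assumption, not the study's rule.
    """
    residues = [r for r in window.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / total for a in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical window centred on a lysine (K); not taken from the dataset.
    example_window = "AGLTEKRPSVD"
    for residue, freq in zip(AMINO_ACIDS, aac_vector(example_window)):
        if freq > 0:
            print(residue, round(freq, 3))
```

A vector of this kind can be passed to any classifier that accepts fixed-length numeric input, which is what makes AAC a convenient baseline feature.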
Five-fold cross-validation results of the SVM models trained with various features for discriminating between positive and negative training datasets
| Residue | Training features | Sn | Sp | Acc | MCC | AUC |
|---|---|---|---|---|---|---|
| K | Amino acid composition (AAC) | 0.70 | 0.69 | 0.69 | 0.37 | 0.78 |
| | Amino acid pairs composition (AAPC) | 0.66 | 0.65 | 0.65 | 0.29 | 0.71 |
| | Amino acid sequence (AA) | 0.68 | 0.64 | 0.65 | 0.23 | 0.67 |
| | Positional weighted matrix (PWM) | 0.74 | 0.67 | 0.69 | 0.37 | 0.78 |
| | Position specific scoring matrix (PSSM) | 0.63 | 0.61 | 0.62 | 0.16 | 0.61 |
| | BLOSUM62 (B62) | 0.63 | 0.60 | 0.61 | 0.15 | 0.59 |
| R | Amino acid composition (AAC) | 0.66 | 0.63 | 0.64 | 0.28 | 0.70 |
| | Amino acid pairs composition (AAPC) | 0.62 | 0.61 | 0.61 | 0.22 | 0.65 |
| | Amino acid sequence (AA) | 0.62 | 0.62 | 0.62 | 0.17 | 0.62 |
| | Positional weighted matrix (PWM) | 0.71 | 0.70 | 0.70 | 0.39 | 0.80 |
| | Position specific scoring matrix (PSSM) | 0.61 | 0.56 | 0.58 | 0.14 | 0.59 |
| | BLOSUM62 (B62) | 0.62 | 0.62 | 0.62 | 0.17 | 0.62 |
| T | Amino acid composition (AAC) | 0.74 | 0.70 | 0.72 | 0.41 | 0.82 |
| | Amino acid pairs composition (AAPC) | 0.69 | 0.68 | 0.69 | 0.35 | 0.75 |
| | Amino acid sequence (AA) | 0.63 | 0.62 | 0.62 | 0.18 | 0.63 |
| | Positional weighted matrix (PWM) | 0.69 | 0.67 | 0.68 | 0.32 | 0.73 |
| | Position specific scoring matrix (PSSM) | 0.65 | 0.65 | 0.65 | 0.29 | 0.70 |
| | BLOSUM62 (B62) | 0.58 | 0.50 | 0.53 | 0.08 | 0.53 |
| P | Amino acid composition (AAC) | 0.72 | 0.70 | 0.70 | 0.39 | 0.80 |
| | Amino acid pairs composition (AAPC) | 0.68 | 0.64 | 0.65 | 0.30 | 0.71 |
| | Amino acid sequence (AA) | 0.64 | 0.66 | 0.65 | 0.23 | 0.67 |
| | Positional weighted matrix (PWM) | 0.72 | 0.73 | 0.73 | 0.42 | 0.82 |
| | Position specific scoring matrix (PSSM) | 0.66 | 0.68 | 0.67 | 0.32 | 0.73 |
| | BLOSUM62 (B62) | 0.61 | 0.58 | 0.59 | 0.15 | 0.60 |
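The column headings in the table above are not defined in this record. Assuming they follow the usual binary-classification definitions (Sn = sensitivity, Sp = specificity, Acc = accuracy, MCC = Matthews correlation coefficient), a minimal sketch of how they are computed from confusion-matrix counts is shown below. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the study; AUC is left out because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    tp/fn: positive sites predicted as positive/negative.
    tn/fp: negative sites predicted as negative/positive.
    """
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts with a 1:2 positive-to-negative ratio.
    for name, value in binary_metrics(tp=80, fn=20, tn=150, fp=50).items():
        print(name, round(value, 2))
```

Because the negative sets in this record are twice the size of the positive sets, accuracy alone leans toward the negative class, which is one reason a balanced measure such as MCC is useful alongside it.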
Fig. 3 MDDLogo-identified substrate motifs of carbonylated (a) K, (b) R, (c) T, and (d) P sites
Comparison of independent testing results among various models in this work
| Residue | Model | Sn | Sp | Acc | MCC | AUC |
|---|---|---|---|---|---|---|
| K | Single SVM trained with AAC | 0.65 | 0.68 | 0.67 | 0.31 | 0.72 |
| | Single SVM trained with PWM | 0.67 | 0.68 | 0.68 | 0.33 | 0.73 |
| | Single profile HMM trained from all data | 0.69 | 0.68 | 0.69 | 0.35 | 0.74 |
| | Multiple profile HMMs trained from MDDLogo-clustered subgroups | 0.85 | 0.47 | 0.60 | 0.31 | 0.68 |
| | Single SVM trained from multiple profile HMMs (MDD-Carb) | 0.80 | 0.76 | 0.77 | 0.53 | 0.84 |
| R | Single SVM trained with AAC | 0.62 | 0.62 | 0.62 | 0.23 | 0.66 |
| | Single SVM trained with PWM | 0.65 | 0.65 | 0.65 | 0.29 | 0.70 |
| | Single profile HMM trained from all data | 0.68 | 0.66 | 0.67 | 0.33 | 0.72 |
| | Multiple profile HMMs trained from MDDLogo-clustered subgroups | 0.90 | 0.55 | 0.67 | 0.44 | 0.81 |
| | Single SVM trained from multiple profile HMMs (MDD-Carb) | 0.79 | 0.73 | 0.75 | 0.49 | 0.83 |
| T | Single SVM trained with AAC | 0.63 | 0.71 | 0.69 | 0.34 | 0.73 |
| | Single SVM trained with PWM | 0.67 | 0.71 | 0.70 | 0.36 | 0.74 |
| | Single profile HMM trained from all data | 0.67 | 0.71 | 0.70 | 0.36 | 0.74 |
| | Multiple profile HMMs trained from MDDLogo-clustered subgroups | 0.93 | 0.56 | 0.69 | 0.48 | 0.80 |
| | Single SVM trained from multiple profile HMMs (MDD-Carb) | 0.79 | 0.76 | 0.77 | 0.53 | 0.84 |
| P | Single SVM trained with AAC | 0.63 | 0.61 | 0.62 | 0.23 | 0.66 |
| | Single SVM trained with PWM | 0.69 | 0.67 | 0.68 | 0.34 | 0.74 |
| | Single profile HMM trained from all data | 0.69 | 0.67 | 0.68 | 0.34 | 0.74 |
| | Multiple profile HMMs trained from MDDLogo-clustered subgroups | 0.88 | 0.49 | 0.62 | 0.35 | 0.76 |
| | Single SVM trained from multiple profile HMMs (MDD-Carb) | 0.77 | 0.74 | 0.75 | 0.49 | 0.82 |
Fig. 4 Comparison of independent testing results between MDD-Carb and two existing prediction tools. (a) Independent testing results on K carbonylation sites, (b) Independent testing results on T carbonylation sites, (c) Independent testing results on R carbonylation sites, and (d) Independent testing results on P carbonylation sites
Fig. 5 A case study of carbonylation site prediction on Protein FRG2-like-1 (FRG2B)