Zhao Li, Jijun Tang, Fei Guo.
Abstract
The 14-3-3 proteins are a highly conserved family of homodimeric and heterodimeric molecules expressed in all eukaryotic cells. In human cells, this family consists of seven distinct but highly homologous 14-3-3 isoforms. 14-3-3σ is the only isoform directly linked to cancer in epithelial cells, and it is regulated by major tumor suppressor genes. For each 14-3-3 isoform, 1,000 peptide motifs with experimental binding affinity values are available. In this paper, we present a novel method for identifying peptide motifs that bind to the 14-3-3σ isoform. First, we propose a sampling criterion to build a predictor for each new peptide sequence. Then, we select nine physicochemical properties of amino acids to describe each peptide motif. We also use auto-cross covariance to extract correlated properties of amino acids at any two positions. Finally, we use the elastic net, which combines ridge regression and the least absolute shrinkage and selection operator (LASSO), to predict affinity values of peptide motifs. We test our method on the 1,000 known peptide motifs binding to the seven 14-3-3 isoforms. On the 14-3-3σ isoform, our method achieves overall Pearson product-moment correlation coefficient (PCC) and root mean squared error (RMSE) values of 0.84 and 252.31 for the N-terminal sublibrary, and 0.77 and 269.13 for the C-terminal sublibrary. We predict affinity values of 16,000 peptide sequences, and the relative binding ability across six permuted positions is similar to experimental values. We identify phosphopeptides that preferentially bind to 14-3-3σ over other isoforms. Several positions on the peptide motifs fall in the same amino acid category as the experimental substrate specificity of phosphopeptides binding to 14-3-3σ. Our method is fast and reliable, and it is a general computational method that can be used for peptide-protein binding identification in proteomics research.
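For reference, the elastic net penalty mentioned in the abstract blends the LASSO (L1) and ridge (L2) penalties. The abstract does not state which parameterization or hyperparameter values were used, so the form below is only one common convention (the glmnet-style objective). The symbols are assumed notation: $y_i$ is the measured affinity of peptide $i$, $x_i$ is its feature vector, $\lambda \ge 0$ is the overall penalty strength, and $\alpha \in [0, 1]$ is the mixing weight.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
    + \lambda \left( \alpha \, \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \, \lVert \beta \rVert_2^2 \right)
```

Under this convention, $\alpha = 1$ recovers the LASSO and $\alpha = 0$ recovers ridge regression.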
Year: 2016 PMID: 26828594 PMCID: PMC4734684 DOI: 10.1371/journal.pone.0147467
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The architecture of the computational approach to identifying the phosphopeptide-binding specificity of 14-3-3 proteins.
Five categories of 20 amino acids.
| Category | Amino Acids |
|---|---|
| Amino Acids with Positive Charged Side Chains | R, H, K |
| Amino Acids with Negative Charged Side Chains | D, E |
| Amino Acids with Polar Uncharged Side Chains | S, T, N, Q |
| Amino Acids with Hydrophobic Side Chains | A, I, L, M, F, W, Y, V |
| Special Cases | C, G, P |
a Standard abbreviations are used for all amino acids.
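The category table can be turned into a simple lookup, which is handy when checking whether a predicted residue and an experimentally preferred residue fall in the same category. This is a minimal sketch: the dictionary keys and function names are illustrative and do not come from the paper.

```python
# Five amino acid categories, transcribed from the table above.
# The short category names are illustrative labels, not the paper's.
CATEGORIES = {
    "positive": "RHK",
    "negative": "DE",
    "polar_uncharged": "STNQ",
    "hydrophobic": "AILMFWYV",
    "special": "CGP",
}

# Invert the table into a residue -> category lookup.
RESIDUE_CATEGORY = {
    aa: name for name, members in CATEGORIES.items() for aa in members
}


def same_category(a, b):
    """Return True when two one-letter residues share a category."""
    return RESIDUE_CATEGORY[a] == RESIDUE_CATEGORY[b]


def category_profile(peptide):
    """Map each residue of a peptide to its category name."""
    return [RESIDUE_CATEGORY[aa] for aa in peptide]
```

For example, `same_category("R", "K")` is true because both residues have positively charged side chains.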
Nine physicochemical properties for 20 amino acid types.
| Amino Acid | H1 | H2 | H3 | V | P1 | P2 | SASA | NCI | MASS |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.62 | -0.5 | 2 | 27.5 | 8.1 | 0.046 | 1.181 | 0.007187 | 71.0788 |
| C | 0.29 | -1 | 2 | 44.6 | 5.5 | 0.128 | 1.461 | -0.03661 | 103.1388 |
| D | -0.9 | 3 | 4 | 40 | 13 | 0.105 | 1.587 | -0.02382 | 115.0886 |
| E | -0.74 | 3 | 4 | 62 | 12.3 | 0.151 | 1.862 | 0.006802 | 129.1155 |
| F | 1.19 | -2.5 | 2 | 115.5 | 5.2 | 0.29 | 2.228 | 0.037552 | 147.1766 |
| G | 0.48 | 0 | 2 | 0 | 9 | 0 | 0.881 | 0.179052 | 57.0519 |
| H | -0.4 | -0.5 | 4 | 79 | 10.4 | 0.23 | 2.025 | -0.01069 | 137.1411 |
| I | 1.38 | -1.8 | 2 | 93.5 | 5.2 | 0.186 | 1.81 | 0.021631 | 113.1594 |
| K | -1.5 | 3 | 2 | 100 | 11.3 | 0.219 | 2.258 | 0.017708 | 128.1741 |
| L | 1.06 | -1.8 | 2 | 93.5 | 4.9 | 0.186 | 1.931 | 0.051672 | 113.1594 |
| M | 0.64 | -1.3 | 2 | 94.1 | 5.7 | 0.221 | 2.034 | 0.002683 | 131.1986 |
| N | -0.78 | 2 | 4 | 58.7 | 11.6 | 0.134 | 1.655 | 0.005392 | 114.1039 |
| P | 0.12 | 0 | 2 | 41.9 | 8 | 0.131 | 1.468 | 0.239531 | 97.1167 |
| Q | -0.85 | 0.2 | 4 | 80.7 | 10.5 | 0.18 | 1.932 | 0.049211 | 128.1307 |
| R | -2.53 | 3 | 4 | 105 | 10.5 | 0.18 | 1.932 | 0.049211 | 156.1875 |
| S | -0.18 | 0.3 | 4 | 29.3 | 9.2 | 0.062 | 1.298 | 0.004627 | 87.0782 |
| T | -0.05 | -0.4 | 4 | 51.3 | 8.6 | 0.108 | 1.525 | 0.003352 | 101.1051 |
| V | 1.08 | -1.5 | 2 | 71.5 | 5.9 | 0.14 | 1.645 | 0.057004 | 99.1326 |
| W | 0.81 | -3.4 | 3 | 145.5 | 5.4 | 0.409 | 2.663 | 0.037977 | 186.2132 |
| Y | 0.26 | -2.3 | 3 | 117.3 | 6.2 | 0.298 | 2.368 | 0.023599 | 163.1760 |
a H1, hydrophobicity; H2, hydrophilicity; H3, hydrogen bond; V, volume of side chains; P1, polarity; P2, polarizability; SASA, solvent-accessible surface area; NCI, net charge index of side chains; MASS, average mass of amino acid.
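To illustrate how a peptide might be described by these properties and then summarized with auto-cross covariance, the sketch below encodes a peptide with two of the nine properties (H1 and V, transcribed from the table above) and computes covariances at a given lag. The formula is the commonly used auto-cross covariance definition and is an assumption here, since this record does not give the paper's exact formula; any normalization of the property values is likewise omitted. Function names are illustrative.

```python
# residue: (H1 hydrophobicity, V side-chain volume), copied from the table.
PROPS = {
    "A": (0.62, 27.5), "C": (0.29, 44.6), "D": (-0.9, 40.0),
    "E": (-0.74, 62.0), "F": (1.19, 115.5), "G": (0.48, 0.0),
    "H": (-0.4, 79.0), "I": (1.38, 93.5), "K": (-1.5, 100.0),
    "L": (1.06, 93.5), "M": (0.64, 94.1), "N": (-0.78, 58.7),
    "P": (0.12, 41.9), "Q": (-0.85, 80.7), "R": (-2.53, 105.0),
    "S": (-0.18, 29.3), "T": (-0.05, 51.3), "V": (1.08, 71.5),
    "W": (0.81, 145.5), "Y": (0.26, 117.3),
}


def encode(peptide):
    """Return a len(peptide) x n_properties matrix of property values."""
    return [list(PROPS[aa]) for aa in peptide]


def acc(matrix, lag):
    """Auto-cross covariance at one lag (assumed standard definition).

    Returns {(j, k): value}. Entries with j == k are auto covariances of
    property j; entries with j != k are cross covariances between
    property j at position i and property k at position i + lag.
    """
    n = len(matrix)
    p = len(matrix[0])
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < peptide length")
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    out = {}
    for j in range(p):
        for k in range(p):
            total = sum(
                (matrix[i][j] - means[j]) * (matrix[i + lag][k] - means[k])
                for i in range(n - lag)
            )
            out[(j, k)] = total / (n - lag)
    return out
```

Concatenating the values returned for each lag gives a fixed-length feature vector regardless of which residues occupy the variable positions.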
Details on predicting peptide motifs binding to 14-3-3 isoforms.
| N-terminal PCC | N-terminal RMSE | C-terminal PCC | C-terminal RMSE | |
|---|---|---|---|---|
| 0.84 | 252.31 | 0.77 | 269.13 | |
| 0.72 | 229.12 | 0.63 | 245.10 | |
| 0.83 | 417.38 | 0.75 | 491.73 | |
| 0.81 | 230.83 | 0.71 | 252.94 | |
| 0.86 | 470.08 | 0.79 | 463.40 | |
| 0.78 | 637.67 | 0.72 | 678.95 | |
| 0.87 | 2087.20 | 0.81 | 2365.42 | |
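
The two metrics in this table can be computed from paired lists of predicted and experimental affinities. The sketch below uses the textbook definitions of the Pearson product-moment correlation coefficient and the root mean squared error; the function names are illustrative.

```python
import math


def pcc(pred, obs):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return cov / (sd_p * sd_o)


def rmse(pred, obs):
    """Root mean squared error, in the same units as the affinities."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))
```

PCC is scale-free while RMSE carries the units of the affinity values, which is why a row can pair a high PCC (0.87) with a large RMSE (2087.20).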
Fig 2. Position-specific scoring matrix of the top 50 motifs identified from 1,000 peptide sequences against individual 14-3-3 isoforms.
14-3-3 preferences determined with different methods on 1,000 peptide motifs.
| Position Relative to p(S/T) | ||||||
|---|---|---|---|---|---|---|
| H.S. Lu | ||||||
| Our Method | ||||||
Fig 3. Binding affinity of seven 14-3-3 isoforms across six positions for the top 50 peptides from both the N- and C-terminal sublibraries.
Prediction results of peptide motifs binding to 14-3-3 isoforms by different regression techniques.
| Elastic Net PCC | Elastic Net RMSE | Simple Linear Regression PCC | Simple Linear Regression RMSE | Support Vector Regression PCC | Support Vector Regression RMSE | Neural Network PCC | Neural Network RMSE | |
|---|---|---|---|---|---|---|---|---|
| N-terminal | ||||||||
| 0.84 | 252.31 | 0.82 | 261.69 | 0.79 | 283.16 | 0.60 | 368.39 | |
| 0.72 | 229.12 | 0.69 | 238.40 | 0.70 | 236.18 | 0.57 | 270.43 | |
| 0.83 | 417.38 | 0.82 | 498.71 | 0.80 | 529.34 | 0.64 | 675.74 | |
| 0.81 | 230.83 | 0.80 | 238.09 | 0.79 | 239.43 | 0.55 | 327.70 | |
| 0.86 | 470.08 | 0.86 | 474.16 | 0.83 | 506.56 | 0.59 | 745.79 | |
| 0.78 | 637.67 | 0.78 | 637.58 | 0.75 | 669.53 | 0.56 | 844.41 | |
| 0.87 | 2087.20 | 0.88 | 2042.67 | 0.84 | 2306.04 | 0.56 | 3526.35 | |
| C-terminal | ||||||||
| 0.77 | 269.13 | 0.76 | 273.19 | 0.74 | 279.54 | 0.64 | 321.78 | |
| 0.63 | 245.10 | 0.61 | 247.96 | 0.59 | 252.64 | 0.51 | 269.64 | |
| 0.75 | 491.73 | 0.74 | 479.30 | 0.73 | 483.90 | 0.63 | 550.81 | |
| 0.71 | 252.94 | 0.69 | 256.66 | 0.69 | 257.90 | 0.48 | 311.73 | |
| 0.79 | 463.40 | 0.79 | 459.40 | 0.80 | 454.01 | 0.68 | 558.68 | |
| 0.72 | 678.95 | 0.71 | 686.52 | 0.70 | 691.33 | 0.59 | 786.58 | |
| 0.81 | 2365.42 | 0.80 | 2352.32 | 0.79 | 2429.84 | 0.66 | 3012.30 | |
Fig 4. Position-specific scoring matrix of the top 500 motifs identified from 16,000 peptide sequences against individual 14-3-3 isoforms.
Fig 5. Binding affinity of seven 14-3-3 isoforms across six positions for the top 500 peptides from both the N- and C-terminal sublibraries.
14-3-3 preferences determined with different methods on 16,000 peptide sequences.
| Position Relative to p(S/T) | ||||||
|---|---|---|---|---|---|---|
| Yaffe | RKH | |||||
| Our Method | X | |||||
List of 51 consensus top binders from 1,000 peptide sequences against all seven 14-3-3 isoforms.
| No. | | N-terminal | No. | | N-terminal | No. | | C-terminal |
|---|---|---|---|---|---|---|---|---|
| 1 | | FFRpS/TXXX | 20 | | RLRpS/TXXX | 36 | | XXXpS/TAGF |
| 2 | | RAApS/TXXX | 21 | * | RPApS/TXXX | 37 | | XXXpS/TAGP |
| 3 | * | RAFpS/TXXX | 22 | * | RPKpS/TXXX | 38 | * | XXXpS/TAPF |
| 4 | * | RAKpS/TXXX | 23 | * | RPLpS/TXXX | 39 | * | XXXpS/TAPL |
| 5 | * | RALpS/TXXX | 24 | * | RPQpS/TXXX | 40 | * | XXXpS/TAPP |
| 6 | * | RAQpS/TXXX | 25 | * | RPRpS/TXXX | 41 | | XXXpS/TAPR |
| 7 | * | RARpS/TXXX | 26 | | RPVpS/TXXX | 42 | * | XXXpS/TFPF |
| 8 | * | RAVpS/TXXX | 27 | | RRApS/TXXX | 43 | * | XXXpS/TFPL |
| 9 | * | RFApS/TXXX | 28 | * | RRFpS/TXXX | 44 | | XXXpS/TFPP |
| 10 | * | RFFpS/TXXX | 29 | * | RRKpS/TXXX | 45 | | XXXpS/TLPF |
| 11 | * | RFKpS/TXXX | 30 | | RRLpS/TXXX | 46 | * | XXXpS/TLPL |
| 12 | * | RFRpS/TXXX | 31 | * | RRQpS/TXXX | 47 | | XXXpS/TLPP |
| 13 | | RGApS/TXXX | 32 | | RRRpS/TXXX | 48 | | XXXpS/TLPR |
| 14 | | RGKpS/TXXX | 33 | * | RVApS/TXXX | 49 | * | XXXpS/TVPF |
| 15 | | RGQpS/TXXX | 34 | * | RVKpS/TXXX | 50 | * | XXXpS/TVPL |
| 16 | | RGRpS/TXXX | 35 | * | RVRpS/TXXX | 51 | * | XXXpS/TVPP |
| 17 | | RGVpS/TXXX | | | | | | |
| 18 | | RLApS/TXXX | | | | | | |
| 19 | | RLKpS/TXXX | | | | | | |
a Motifs labeled * match the experimental binding sequences reported by H.S. Lu.
b X denotes any of the 20 amino acid types.
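The motif notation suggests how the 16,000 candidate sequences arise. Assuming each sublibrary varies three positions over the 20 amino acids (an inference from the motif patterns and the "six permuted positions" in the abstract, not a stated fact), each sublibrary holds 20³ = 8,000 sequences, and the two together give 16,000. A minimal sketch with illustrative function names:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def n_terminal_library():
    """Yield motifs with three variable residues before p(S/T)."""
    for triple in product(AMINO_ACIDS, repeat=3):
        yield "".join(triple) + "pS/TXXX"


def c_terminal_library():
    """Yield motifs with three variable residues after p(S/T)."""
    for triple in product(AMINO_ACIDS, repeat=3):
        yield "XXXpS/T" + "".join(triple)
```

Each generated string follows the notation used in the tables, with X left as a placeholder for the unconstrained positions.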
List of four preferable binders of 14-3-3σ from 1,000 peptide sequences.
| No. | N-terminal | No. | C-terminal |
|---|---|---|---|
| 1 | RAGpS/TXXX | 4 | XXXpS/TFGP |
| 2 | EAKpS/TXXX | ||
| 3 | RGGpS/TXXX |
List of six preferable binders of 14-3-3σ from 16,000 peptide sequences.
| No. | N-terminal | No. | C-terminal |
|---|---|---|---|
| 1 | HCDpS/TXXX | 3 | XXXpS/TMMG |
| 2 | ICPpS/TXXX | 4 | XXXpS/TMYH |
| | | 5 | XXXpS/TYYC |
| | | 6 | XXXpS/TYYK |