Xiaojian Shao, Chris S H Tan, Courtney Voss, Shawn S C Li, Naiyang Deng, Gary D Bader.
Abstract
MOTIVATION: Predicting protein interactions involving peptide recognition domains is essential for understanding the many important biological processes they mediate. It is important to consider the binding strength of these interactions to help us construct more biologically relevant protein interaction networks that consider cellular context and competition between potential binders.
Year: 2010 PMID: 21127034 PMCID: PMC3031032 DOI: 10.1093/bioinformatics/btq657
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of the quantitative prediction method. (A) Positive and negative PDZ domain–peptide pairs were previously determined by a combination of protein microarray and fluorescence polarization experiments. PDZ domain and peptide features calculated from primary sequence information were used to construct a quantitative binding predictor with our novel semi-quantitative support vector regression (SemiSVR) method, in which negative data are used to help regression learning. (B) Conceptual illustration of how SemiSVR works. Illustrative sample data were generated from the function y = x (black solid line) with added normally distributed noise. Quantitative (positive) data are shown as open black circles and qualitative (negative) data as filled red circles. The SemiSVR fit (red dash-dotted line), which uses both the quantitative and the qualitative data, recovers the generating function y = x better than the SVR fit (blue dashed line), which uses only the quantitative data (open circles). In this way, incorporating qualitative negative data with SemiSVR improves quantitative prediction.
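The panel (B) idea can be made concrete with a small objective. The sketch below is a hypothetical linear version, assuming that quantitative (positive) pairs are fitted with the usual epsilon-insensitive SVR loss and that each qualitative (negative) pair contributes only a one-sided hinge penalty when its predicted affinity rises above a threshold. The names `semi_svr_objective` and `fit_semi_svr`, the threshold of 0, and the plain subgradient-descent solver are illustrative assumptions, not the authors' formulation or solver.

```python
import numpy as np


def semi_svr_objective(w, b, X_pos, y_pos, X_neg, threshold, C=1.0, eps=0.1):
    """Objective of a hypothetical linear semi-quantitative SVR."""
    f_pos = X_pos @ w + b
    f_neg = X_neg @ w + b
    # Epsilon-insensitive loss on the quantitative (positive) examples.
    loss_pos = np.maximum(0.0, np.abs(y_pos - f_pos) - eps)
    # One-sided hinge: a negative is penalized only if predicted above the threshold.
    loss_neg = np.maximum(0.0, f_neg - threshold)
    return 0.5 * float(w @ w) + C * float(loss_pos.sum() + loss_neg.sum())


def fit_semi_svr(X_pos, y_pos, X_neg, threshold, C=1.0, eps=0.1, lr=1e-3, n_iter=5000):
    """Minimize semi_svr_objective by plain subgradient descent (illustrative solver)."""
    w = np.zeros(X_pos.shape[1])
    b = 0.0
    for _ in range(n_iter):
        r = y_pos - (X_pos @ w + b)
        # Subgradient w.r.t. each prediction: -sign(residual) outside the eps-tube, else 0.
        g_pos = np.where(np.abs(r) > eps, -np.sign(r), 0.0)
        # Subgradient w.r.t. each negative prediction: 1 when above the threshold, else 0.
        g_neg = ((X_neg @ w + b) > threshold).astype(float)
        w = w - lr * (w + C * (X_pos.T @ g_pos + X_neg.T @ g_neg))
        b = b - lr * C * (g_pos.sum() + g_neg.sum())
    return w, b


# Toy data in the spirit of Fig. 1B: positives follow y = x plus Gaussian noise;
# negatives are only known to lie below a (hypothetical) affinity threshold of 0.
rng = np.random.default_rng(0)
X_pos = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y_pos = X_pos.ravel() + rng.normal(0.0, 0.05, size=20)
X_neg = np.linspace(-1.0, -0.2, 10).reshape(-1, 1)

w, b = fit_semi_svr(X_pos, y_pos, X_neg, threshold=0.0)
print("slope:", w[0], "intercept:", b)
```

Because negatives enter only through a one-sided penalty, they constrain the fit without needing a measured affinity of their own, which is the property panel (B) illustrates.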
Performance comparison of single-domain SemiSVR, SVR and PWM models on 23 PDZ domains in leave-two-out cross-validation testing (two domain–peptide interactions held out at a time)
| PDZ domain | SemiSVR | SVR | PWM |
|---|---|---|---|
| CHAPSYN-110_2/3 | 0.57 | 0.71 | |
| CHAPSYN-110_3/3 | 0.60 | 0.79 | |
| GM1582_2/3 | 0.64 | 0.68 | |
| HTRA3_1/1 | 0.66 | 0.70 | |
| LIN7C_1/1 | 0.59 | 0.76 | |
| MAGI-2_2/6 | 0.55 | 0.73 | |
| MAGI-2_6/6 | 0.67 | 0.69 | |
| MAGI-3_1/5 | 0.49 | 0.64 | |
| MALS2_1/1 | 0.55 | 0.40 | 0.60 |
| OMP25_1/1 | 0.63 | 0.65 | |
| PDZK3_1/1 | 0.64 | 0.70 | |
| PDZ-RGS3_1/1 | 0.80 | 0.68 | |
| PSD95_2/3 | 0.37 | 0.65 | |
| PSD95_3/3 | 0.70 | 0.80 | |
| PTP-BL_2/5 | 0.60 | 0.77 | |
| SAP102_2/3 | 0.63 | 0.66 | |
| SAP97_1/3 | 0.57 | 0.69 | |
| SAP97_2/3 | 0.50 | 0.71 | |
| SCRB1_3/4 | 0.59 | 0.75 | |
| SHANK1_1/1 | 0.88 | 0.81 | |
| SHANK3_1/1 | 0.82 | 0.80 | |
| G1-SYNTROPHIN_1/1 | 0.58 | 0.79 | |
| ZO-1_1/3 | 0.51 | | |
| Average Performance | 0.61 | 0.72 | |
Numbers indicate the average fraction of correct predictions (0–1 scale). Bold numbers indicate the best performance.
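Leave-two-out scoring can be sketched as follows, assuming that a prediction counts as correct when the predicted ordering of the two held-out affinities matches the measured ordering (ties count as wrong). The helper names and the one-dimensional least-squares model standing in for SemiSVR, SVR or PWM are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pair_correct(true_a, true_b, pred_a, pred_b):
    """True if the predicted order of two affinities matches the measured order."""
    return (true_a - true_b) * (pred_a - pred_b) > 0


def fit_line(xs, ys):
    """Toy stand-in model: one-dimensional ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def leave_two_out_accuracy(xs, ys, fit=fit_line, predict=predict_line):
    """Fraction of held-out pairs whose affinity order is predicted correctly."""
    n = len(ys)
    correct = total = 0
    for i, j in combinations(range(n), 2):
        # Train on everything except the two held-out interactions.
        train = [k for k in range(n) if k not in (i, j)]
        model = fit([xs[k] for k in train], [ys[k] for k in train])
        correct += pair_correct(ys[i], ys[j], predict(model, xs[i]), predict(model, xs[j]))
        total += 1
    return correct / total


# Made-up feature values and affinities for one hypothetical domain.
xs = [0.1, 0.4, 0.5, 0.9, 1.3]
ys = [0.2, 0.9, 1.1, 1.7, 2.5]
print(leave_two_out_accuracy(xs, ys))
```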
Performance comparison of SemiSVR and SVR on 23 PDZ domains with associated peptides for multi-domain model testing
| Performance measure | SemiSVR | SVR |
|---|---|---|
| Spearman's correlation | 0.501 | |
| Pearson correlation | 0.574 | |
Performance comparison based on leave-one-PDZ-domain-out cross-validation. Both predictors used a pairwise polynomial kernel (P = 2) with the whole PDZ domain (118AA) and the whole peptide (10AA) as feature input. Bold numbers indicate the best performance.
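One common way to build a kernel on domain–peptide pairs is the tensor product of a domain kernel and a peptide kernel. The sketch below assumes one-hot encoded aligned sequences, reads "P = 2" as the polynomial degree, and treats gap or unknown characters as all-zero columns; whether this matches the exact pairwise kernel used for the table above is an assumption, and all names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flattened one-hot encoding of an aligned sequence; gaps/unknowns stay all-zero."""
    vec = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        if aa in AMINO_ACIDS:
            vec[pos, AMINO_ACIDS.index(aa)] = 1.0
    return vec.ravel()


def poly_kernel(u, v, degree=2):
    """Inhomogeneous polynomial kernel (u . v + 1) ** degree."""
    return (float(u @ v) + 1.0) ** degree


def pairwise_kernel(pair_a, pair_b, degree=2):
    """Tensor-product kernel on (domain, peptide) pairs: K_domain * K_peptide."""
    (dom_a, pep_a), (dom_b, pep_b) = pair_a, pair_b
    k_dom = poly_kernel(one_hot(dom_a), one_hot(dom_b), degree)
    k_pep = poly_kernel(one_hot(pep_a), one_hot(pep_b), degree)
    return k_dom * k_pep


# Toy 4-residue "domains" and peptides, far shorter than the real 118AA/10AA inputs.
print(pairwise_kernel(("ACDE", "ETSV"), ("ACDF", "ETTV")))
```

The product of two valid kernels is itself a valid kernel, which is why this construction is a natural choice for scoring pairs rather than single sequences.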
Performance comparison of different prediction algorithms (each cell: Spearman's correlation/Pearson correlation)
| PDZ domain | SemiSVR, whole PDZ (118AA) | SemiSVR, 38 pairs | Chen |
|---|---|---|---|
| CHAPSYN-110_2/3 | 0.94/ | 0.80/0.79 | |
| CHAPSYN-110_3/3 | 0.60/0.57 | 0.59/0.50 | |
| GM1582_2/3 | 0.41/0.35 | 0.36/0.19 | |
| HTRA3_1/1 | 0.24/0.36 | 0.20/0.13 | |
| LIN7C_1/1 | 0.47/0.56 | −0.37/−0.17 | |
| MAGI-2_2/6 | 0.63/ | 0.11/0.21 | |
| MAGI-2_6/6 | 0.63/0.52 | 0.28/0.17 | |
| MAGI-3_1/5 | 0.73/0.68 | 0.54/0.52 | |
| MALS2_1/1 | 0.33/0.37 | 0.17/0.15 | |
| OMP25_1/1 | 0.51/ | 0.32/0.37 | |
| PDZK3_1/1 | −0.20/ | − | −0.22/0.02 |
| PDZ-RGS3_1/1 | −0.002/−0.05 | −0.08/ | |
| PSD95_2/3 | 0.82/0.87 | 0.53/0.66 | |
| PSD95_3/3 | 0.597/0.68 | 0.22/0.17 | |
| PTP-BL_2/5 | 0.34/ | 0.18/0.16 | |
| SAP102_2/3 | 0.91/0.92 | 0.91/0.94 | |
| SAP97_1/3 | 0.34/ | −0.16/0.14 | |
| SAP97_2/3 | 0.91/0.92 | 0.77/0.85 | |
| SCRB1_3/4 | 0.48/0.69 | 0.37/0.47 | |
| SHANK1_1/1 | 0.51/0.44 | 0.95/0.96 | |
| SHANK3_1/1 | 0.36/0.51 | 0.69/0.70 | |
| G1-SYNTROPHIN_1/1 | 0.17/0.13 | 0.21/0.16 | |
| ZO-1_1/3 | 0.61/0.64 | 0.26/0.16 | |
| Average performance | 0.52/0.56 | 0.36/0.39 | |
Performance comparison based on leave-one-PDZ-domain-out cross-validation. Performance for each domain is measured by Spearman's and Pearson correlation coefficients. Columns two to four list SemiSVR using the whole PDZ sequence (118AA), SemiSVR using 38 contacting residue position pairs, and Chen's backfitting method. SemiSVR with the 38 contacting residue position pairs as feature input used a linear kernel. The Chen method was run using the published implementation. All methods used all 10 peptide positions. Bold numbers indicate the best performance for a given domain.
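Per-domain Spearman and Pearson coefficients, and an average over domains like the "Average performance" row, can be computed as below. Spearman's correlation is taken as the Pearson correlation of average ranks (tied values share the mean rank). Treating the average row as an unweighted mean over domains is an assumption, and the numbers in the usage lines are made up.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b):
    """Spearman's rho as Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def average_over_domains(per_domain):
    """Unweighted mean of per-domain correlation values (hypothetical helper)."""
    return float(np.mean(list(per_domain.values())))


# Made-up measured vs. predicted scaled affinities for one hypothetical domain.
measured = [0.9, 0.1, -0.3, 0.5]
predicted = [0.7, 0.2, -0.1, 0.6]
print(spearman(measured, predicted), pearson(measured, predicted))
```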
Fig. 2. Sequence similarity of a test PDZ domain to the training domains is an important performance determinant. PDZ domain similarity is defined as percent sequence identity, calculated between each test PDZ domain and its nearest neighbor in the training set of 81 other PDZ domains. Prediction performance of the corresponding SemiSVR model is shown as Spearman's correlation.
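The nearest-neighbor similarity described for Fig. 2 can be sketched as below, assuming the PDZ domains are already aligned to a common length and that identity is computed over columns where neither sequence has a gap. That denominator choice and the function names are assumptions; tools differ on how gaps are counted.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def nearest_neighbor(test_seq, training):
    """Return (name, identity) of the training domain most identical to test_seq."""
    return max(
        ((name, percent_identity(test_seq, seq)) for name, seq in training.items()),
        key=lambda item: item[1],
    )


# Toy aligned fragments; real inputs would be the aligned 118-position PDZ domains.
training = {"d1": "ACDEG", "d2": "AC-EF", "d3": "WWWWW"}
print(nearest_neighbor("ACDEF", training))
```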
Performance of our SemiSVR versus local information-based models using different PDZ domain similarity definitions
| Method | PDZ domain positions used | Spearman's correlation | Pearson correlation |
|---|---|---|---|
| SemiSVR | 118AA | 0.605 | 0.653 |
| Nearest neighbor SemiSVR | 118AA | 0.471 | 0.487 |
| Naïve PWM transfer (Identity) | 118AA | 0.303 | 0.323 |
| Naïve PWM transfer (Identity) | 16BSs | 0.305 | 0.319 |
| Naïve PWM transfer (Identity) | 10BS | 0.326 | 0.303 |
| Naïve PWM transfer (BLOSUM62) | 118AA | 0.305 | 0.311 |
| Naïve PWM transfer (BLOSUM62) | 16BSs | 0.296 | 0.274 |
| Naïve PWM transfer (BLOSUM62) | 10BS | 0.354 | 0.286 |
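A "naïve PWM transfer" baseline can be sketched as follows: build a position weight matrix from the known binders of the most similar training domain, then use it to score peptides for the test domain. This assumes a log-odds PWM against a uniform background with a pseudocount of 1. The nearest domain is passed in directly; in the table it would be chosen by identity or BLOSUM62 similarity over the whole domain or a binding-site subset. All names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0):
    """Log-odds PWM (uniform background) from equal-length binding peptides."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pwm


def score(pwm, peptide):
    """Sum of per-position log-odds; higher means more binder-like."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


def transfer_pwm(nearest_domain, binders_by_domain):
    """Naive transfer: reuse the PWM built from the nearest training domain's binders."""
    return build_pwm(binders_by_domain[nearest_domain])


# Toy 4-residue C-terminal peptides for two hypothetical training domains.
binders = {"d1": ["ETSV", "ETTV", "EKSV"], "d2": ["AAAA"]}
pwm = transfer_pwm("d1", binders)
print(score(pwm, "ETSV"), score(pwm, "AAAA"))
```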
Fig. 3. SemiSVR can predict changes in affinity resulting from point mutations introduced into known binding peptides of the α1syn PDZ domain. The three wild-type peptides are denoted by asterisks (*). Each mutant within a set is labeled with a different shape, and mutated residues are highlighted in red. One KIF1B mutant had no measurable binding and was excluded from the analysis. SemiSVR performance on the α1syn PDZ peptide mutants is very high (Spearman's correlation 0.921, P < 1e−16; Pearson correlation 0.922, P = 1.414e−07). All affinities are log10-transformed and then scaled to the range [−1, 1].
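The final normalization step can be sketched as a log10 transform followed by min-max scaling to [−1, 1]. Min-max scaling is an assumption about how that range was reached, and whether −1 corresponds to the strongest or weakest binder depends on whether the inputs are dissociation constants or another affinity measure. The function name is hypothetical.

```python
import numpy as np


def scale_log_affinities(values):
    """log10-transform positive affinity values, then min-max scale them to [-1, 1]."""
    logs = np.log10(np.asarray(values, dtype=float))
    lo, hi = logs.min(), logs.max()
    if hi == lo:
        raise ValueError("need at least two distinct affinity values to scale")
    return 2.0 * (logs - lo) / (hi - lo) - 1.0


# Made-up affinity values spanning two orders of magnitude.
print(scale_log_affinities([1.0, 10.0, 100.0]))
```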