Waqasuddin Khan, Fergal Duffy, Gianluca Pollastri, Denis C Shields, Catherine Mooney.
Abstract
Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short disordered regions involved in binding. We set out to determine whether docking such peptide regions to peptide-binding domains would assist in these predictions.
We assembled a redundancy-reduced dataset of short linear motif (SLiM)-containing proteins from the ELM database. We selected 84 sequences that had an associated PDB structure showing the SLiM bound to a protein receptor, where the SLiM was found within a 50-residue region of the protein sequence that was predicted to be disordered. First, we investigated the Vina docking scores of overlapping tripeptides from the 50-residue SLiM-containing disordered regions of the protein sequence to the corresponding PDB domain. We found only weak discrimination of docking scores between peptides involved in binding and adjacent non-binding peptides in this context (AUC 0.58).
Next, we trained a bidirectional recurrent neural network (BRNN) using as input the protein sequence, predicted secondary structure, Vina docking score and predicted disorder score. The results were very promising (AUC 0.72), showing that multiple sources of information can be combined to produce results that are clearly superior to any single source.
We conclude that the Vina docking score alone has only modest power to define the location of a peptide within a larger protein region known to contain it. However, combining this information with other knowledge (using machine learning methods) clearly improves the identification of peptide-binding regions within a protein sequence. This approach, combining docking with machine learning, is primarily a predictor of binding to peptide-binding sites, and is not intended as a predictor of specificity of binding to particular receptors.
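The tripeptide scan mentioned in the abstract can be illustrated with a minimal sketch. It assumes a window step of one residue (the abstract says only that the tripeptides overlap); the function name and the placeholder sequence are hypothetical, and the docking step itself is not shown.

```python
def overlapping_tripeptides(region, k=3):
    """Return (start, peptide) pairs for every length-k window of region.

    start is a 0-based offset into region. Stepping by one residue is an
    assumption; the source only states that the tripeptides overlap.
    """
    return [(i, region[i:i + k]) for i in range(len(region) - k + 1)]


# Hypothetical 50-residue placeholder region (not a sequence from the study).
region = "ACDEFGHIKLMNPQRSTVWY" * 2 + "ACDEFGHIKL"
windows = overlapping_tripeptides(region)
print(len(region), len(windows))
```

Under this assumption a 50-residue region yields 48 tripeptides, each of which would be docked and scored separately.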
Year: 2013 PMID: 24019881 PMCID: PMC3760854 DOI: 10.1371/journal.pone.0072838
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. ROC curves – Training set.
ROC curves plotting the true positive rate of peptide-binding residue identification against the false positive rate, with thresholds for a positive identification decreasing from 1 to 0, tested on the training set of 84 ELM-containing examples: (A) normalised Vina scores; (B) ten-fold cross-validation PepBindPred predictions.
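As a rough illustration of how such a curve and its AUC can be computed, the sketch below sweeps the threshold from the highest score downwards and integrates with the trapezoidal rule. It is a generic construction, not the authors' code; the function names and the toy labels and scores are hypothetical.

```python
def roc_curve_points(labels, scores):
    """Sweep the threshold from high to low and return (FPR, TPR) points.

    labels: 1 for a binding residue, 0 for a non-binding residue.
    scores: per-residue predictions, higher meaning more likely binding.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last residue sharing this score,
        # so tied scores are treated as a single threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data for illustration only (not values from the study).
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_auc = auc(roc_curve_points(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binding from non-binding residues.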
AUC calculated from ROC curves for BRNN trained with seven different input options.
| BRNN Input | AUC |
| Sequence, secondary structure and disorder | 0.63 |
| Sequence and secondary structure | 0.64 |
| Sequence and disorder | 0.64 |
| Sequence and Vina | 0.68 |
| Sequence, secondary structure and Vina | 0.71 |
| Sequence, disorder and Vina | 0.71 |
| Sequence, secondary structure, disorder and Vina | 0.72 |
Predictors trained with either secondary structure, disorder or Vina score along with the protein sequence, or a combination of two, or all three of these features.
Figure 2. ROC curves – independent test set.
ROC curves plotting the true positive rate of peptide-binding residue identification against the false positive rate, with thresholds for a positive identification decreasing from 1 to 0, tested on the independent test set of 21 examples: (A) Vina; (B) ANCHOR; (C) MoRFpred; (D) SLiMPred; (E) PepBindPred.
PepBindPred predictions, averaged over the nine motif residues, and evolutionary conservation p-value for the motif instance.
| UniProt AC | Motif | PepBindPred Score | Evolutionary conservation p-value |
| O75376* | LADHICQII | 0.715 | 0.048 |
| P05160 | LTFIIILII | 0.680 | 0.969 |
| Q9Y618* | LEAIIRKAL | 0.557 | 0.026 |
| Q9UBF1 | LLIIILSVI | 0.543 | 1 |
| Q96BZ9 | LIDIILLIL | 0.520 | 0.208 |
| Q8NI22 | LINIIDGVL | 0.486 | 0.214 |
| Q07325 | LLGIILLVL | 0.467 | 0.396 |
| Q5SVZ6 | LKLIIENIL | 0.434 | 0.329 |
| Q96AH8 | LKLIIVGAI | 0.425 | 0.75 |
| Q9HAU8 | LTFIISSIL | 0.401 | 0.436 |
| Q9NRU3 | LEDIIEEII | 0.384 | 0.393 |
| P53618 | LMTIIRFVL | 0.363 | 0.321 |
| O75376* | LEDIIRKAL | 0.362 | 0.044 |
| Q8IWF6 | LRTHIDAII | 0.350 | 0.197 |
| Q8NHV5 | LFFIIMGII | 0.341 | 1 |
| Q9UPM8 | LRLHIIEII | 0.338 | 0.3 |
| Q9Y618* | LAQHISEVI | 0.335 | 0.055 |
| Q96N64 | LDHIIEDAL | 0.333 | 0.562 |
| O95477 | LSRIIWKAL | 0.332 | 0.614 |
| O00273 | LASHILTAL | 0.327 | 0.546 |
| Q7Z3J2 | LQLIIKKVI | 0.325 | 0.096 |
| Q09161 | LNYHIVEVI | 0.306 | 0.799 |
| Q08AE8 | LGIIIYKAL | 0.294 | 0.164 |
| Q8TDJ6 | LNNHIHDIL | 0.287 | 0.115 |
| P07384 | LYQIILKAL | 0.283 | 0.518 |
| Q5MIZ7 | LYEIIRGIL | 0.282 | 0.295 |
| Q8TDR0 | LHDIITEVI | 0.282 | 0.43 |
| Q96PN6 | LKNIITVVI | 0.276 | 0.645 |
| Q8TCG5 | LGQHIEDAL | 0.274 | 0.294 |
| Q6ZMV5 | LYEIIKGIL | 0.272 | 0.321 |
| Q8IX04 | LQYIITNVL | 0.267 | 0.539 |
| Q93100 | LVIHIGWII | 0.267 | 0.81 |
| Q8TDL5 | LKNIITEII | 0.267 | 0.617 |
| Q9C093 | LVDIIVNAI | 0.266 | 0.086 |
| Q7RTX7 | LARIIRVIL | 0.262 | 0.345 |
| Q9UIA9 | LVYIIGAVI | 0.257 | 0.104 |
| Q8NEG5 | LCKHICWVL | 0.257 | 0.114 |
| Q9Y6X3 | LLGHIFYVL | 0.232 | 0.885 |
| Q8IZQ1 | LAQIILDAI | 0.221 | 0.708 |
| P35556 | LNNHIRYVI | 0.216 | 0.145 |
| Q14185 | LLSHILEVL | 0.209 | 0.067 |
| O95801 | LKAIIRGAL | 0.199 | 0.732 |
| A6NHC0 | LYQIIRKAL | 0.193 | 0.817 |
| P56192 | LGNIIGCVL | 0.189 | 0.542 |
| A6NES4 | LTSIIVAVI | 0.184 | 1 |
| O95714 | LCTHIGDIL | 0.183 | 0.034 |
| Q8NF50 | LVGIILDAL | 0.182 | 0.629 |
| O95450 | LGAHINVVL | 0.174 | 0.364 |
| Q8N485 | LRHIIAQVL | 0.174 | 0.898 |
| Q8TCG1 | LKMHIAKIL | 0.173 | 0.55 |
| Q6R327 | LDHIIQKAI | 0.162 | 0.067 |
| Q5T215 | LCGIIRGAL | 0.160 | 0.131 |
| Q8WZ26 | LSTHICVVL | 0.159 | 1 |
| P51124 | LTFHIKAAI | 0.158 | 0.658 |
| Q99698 | LNSIIDQAL | 0.156 | 0.672 |
| Q562E7 | LSDITYYVY | 0.156 | 0.94 |
| Q9UJ70 | LGRHIVAVL | 0.153 | 0.136 |
| Q0VDD8 | LDKHIKSAI | 0.152 | 1 |
| Q8N1T3 | LFGIIASVL | 0.151 | 0.326 |
| P17655 | LFKIIQKAL | 0.148 | 0.804 |
| P52743 | LHVIIDFIL | 0.147 | 1 |
| Q6PGP7 | LEDIIGFAL | 0.146 | 0.128 |
| Q13572 | LLNHIATVL | 0.134 | 0.551 |
| O15072 | LGVHINVVL | 0.128 | 0.362 |
| Q8WXS8 | LGVHINIAL | 0.110 | 0.585 |
| Q9UG01 | LVEHITAAL | 0.107 | 0.082 |
| P30307 | LGGHIQGAL | 0.061 | 0.072 |
PepBindPred predictions, averaged over the nine motif residues, and evolutionary conservation p-value for the motif instance, calculated using SLiMSearch [39], for the 67 CORNR box motif instances in the human proteome. Scores closer to 1 indicate that PepBindPred is more confident that the region is peptide binding, whereas SLiMSearch p-values closer to 0 indicate that the motif is more likely to be a true positive due to conservation. Two of the sequences, O75376 and Q9Y618, each have two instances of the motif. *True positives identified on the ELM server.
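The per-motif score in the table is an average of per-residue predictions over the nine motif residues. A minimal sketch of that averaging follows, assuming 1-based inclusive residue coordinates; the function name and the score values are hypothetical and are not PepBindPred output.

```python
def motif_mean_score(per_residue_scores, start, end):
    """Mean of per-residue scores over residues start..end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(per_residue_scores)):
        raise ValueError("motif coordinates fall outside the sequence")
    window = per_residue_scores[start - 1:end]
    return sum(window) / len(window)


# Made-up per-residue scores for a hypothetical 300-residue protein.
scores = [0.1] * 300
scores[228:237] = [0.5] * 9  # residues 229-237, a nine-residue motif
motif_score = motif_mean_score(scores, 229, 237)
print(round(motif_score, 3))
```

With inclusive coordinates, a motif spanning residues 229 to 237 covers 237 - 229 + 1 = 9 residues, matching the nine-residue CORNR box instances listed above.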
Figure 3. Histogram showing the distribution of PepBindPred scores.
The scores have been averaged over the 9 CORNR box motif residues, for each of the 67 instances.
Figure 4. PepBindPred output for Q9UBF1 (MAGC2_HUMAN).
Melanoma-associated antigen C2; Motif: LLIIILSVI, residues 229–237.