Heiko Horn, Niall Haslam, Lars Juhl Jensen.
Abstract
Many protein domains bind to short peptide sequences, called linear motifs. Data on their sequence specificities are sparse, which is why biologists usually resort to basic pattern searches to identify new putative binding sites for experimental follow-up. Most motifs have poor specificity, so prioritizing the matches is crucial when scanning a full proteome with a pattern. Here we present a generic method to prioritize motif occurrence predictions by using cellular contextual information. The method takes two inputs: the motif occurrences and one or more of the interacting domains. The potential hits are ranked by how strongly the context network associates them with a protein containing one of the specified domains, which increases predictive performance. The method is available through a web interface at doremi.jensenlab.org. We show that this approach leads to improved predictions of binding partners for PDZ domains and the SUMO binding domain. This is consistent with the earlier observation that coupling sequence motifs with network information improves kinase-specific substrate predictions.
Keywords: Linear motifs; Prediction method; Protein interaction network; Web server
Year: 2014 PMID: 24711967 PMCID: PMC3970808 DOI: 10.7717/peerj.315
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Detailed performance numbers for the benchmarking.
In each cell, the first number is the performance without contextual information and the second number is the performance with contextual information.
| Benchmark | Sensitivity | Specificity | AROC |
|---|---|---|---|
| SUMO—Split set—RegExp | 0.50/0.50 | 0.98/0.99 | −/0.46 |
| SUMO—Split set—PSSM | 0.59/0.62 | 0.96/0.96 | 0.82/0.84 |
| SUMO—Partitioning | 0.75/0.77 | 0.90/0.90 | 0.82/0.84 |
| PDZ—Partitioning | 0.43/0.49 | 0.90/0.90 | 0.70/0.72 |
Figure 1. Flowchart of the typical workflow of DoReMi.
(1) The user can provide the motif description in three different ways. Regular expressions define a motif using the basic syntax supported by most implementations: amino acids are given by their one-letter code, with “.” standing for any residue; alternative residues at a position are encoded with square brackets, e.g., “[DE]”; and repetition is expressed with the basic quantifiers “*”, “+”, “{2}” or “{3,4}”. To use a PSSM for the motif search, the user provides a set of known binding motifs; these are used to calculate the amino acid distribution at each position, correcting for the overall amino acid distribution of the proteome. As a last option, users can provide results from other tools such as SLiMsearch. (2) The second required input is the set of interacting domains, which can be chosen from the domains in Pfam-A. Proteins carrying any of the selected domains are defined as potential interaction partners. The highest-scoring interacting protein for each motif instance is selected as its potential binding partner. (3) The motif score and the network score are combined to rank each instance of the found motif.
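The two motif descriptions in step (1) can be illustrated with a minimal Python sketch. It is not DoReMi's implementation: the function names, the pseudocount, the toy peptides, and the uniform background are all assumptions made for illustration (the caption states that the real background is the amino acid distribution of the proteome). How the motif and network scores are combined in step (3) is not specified here, so it is not sketched.

```python
import math
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scan_regex(pattern, sequence):
    """Return (start, matched_text) for every motif match in a sequence.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are reported as well.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


def build_pssm(known_motifs, background, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length peptides.

    `background` maps each amino acid to its overall frequency.
    The pseudocount is an assumption of this sketch; it keeps
    unseen residues from producing log(0).
    """
    length = len(known_motifs[0])
    total = len(known_motifs) + pseudocount * len(background)
    pssm = []
    for pos in range(length):
        counts = Counter(peptide[pos] for peptide in known_motifs)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in background
        }
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores for one peptide window."""
    return sum(column[aa] for column, aa in zip(pssm, window))


# Toy data (hypothetical, not taken from the paper).
uniform_background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
toy_peptides = ["VKVE", "IKVE", "VKLE"]
toy_pssm = build_pssm(toy_peptides, uniform_background)

hits = scan_regex("[DE].{2}[ST]", "MDAASEKKT")
print(hits)  # [(1, 'DAAS'), (5, 'EKKT')]
print(score_window(toy_pssm, "VKVE") > score_window(toy_pssm, "AAAA"))  # True
```

Scanning with a lookahead rather than a plain search is a design choice of this sketch: it ensures that two motif instances sharing residues are both reported, so that each can be ranked separately.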
Figure 2. Web interface of DoReMi.
(A) The interface allows the user to search for Pfam domains by their accession, id or description. From the list of search results, the user can select the relevant domains. If necessary, multiple searches can be performed to select differently named domains. (B) The output page shows a brief summary of the analysis. This includes plots of score distributions for each score (motif, network and combined score) to aid in the selection of an appropriate score cut-off for downstream analysis of the results after downloading.