Roman Prytuliak, Michael Volkmer, Markus Meier, Bianca H Habermann.
Abstract
Short linear motifs (SLiMs) in proteins are self-sufficient functional sequences that specify interaction sites for other molecules and thus mediate a multitude of functions. Computational as well as experimental biological research would benefit significantly if SLiMs in proteins could be correctly predicted de novo with high sensitivity. However, de novo SLiM prediction is a difficult computational task, and the recall and precision of published methods indicate that challenges remain in SLiM discovery. We have developed HH-MOTiF, a web-based method for SLiM discovery in sets of mainly unrelated proteins. HH-MOTiF makes use of evolutionary information by creating a Hidden Markov Model (HMM) for each input sequence and its closely related orthologs. The HMMs are compared against each other to retrieve short stretches of homology that represent potential SLiMs. These are transformed into hierarchical structures, which we refer to as motif trees, for further processing and evaluation. Our approach allows us to identify degenerate SLiMs while still maintaining a reasonably high precision. When considering a balanced measure of recall and precision, HH-MOTiF performs better on test data than other SLiM discovery methods. HH-MOTiF is freely available as a web server at http://hh-motif.biochem.mpg.de.
Year: 2017 PMID: 28460141 PMCID: PMC5570144 DOI: 10.1093/nar/gkx341
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Workflow of HH-MOTiF. Starting from a set of input queries, HH-MOTiF first searches for closely related orthologs, builds HMMs and performs an all-against-all HH-comparison adapted to short linear motifs. Identified motif trees are further evaluated and trimmed prior to reporting (for details, see main text). (B) Motif tree assembly from HMM–HMM alignments. Overlapping alignment hits (red-shaded boxes) are joined into hierarchical motif trees. Each tree has a root (overlapping part of black-framed boxes) and leaves (corresponding aligned parts of light gray-framed boxes). Motif leaves are independent. Alignment hits that fail to show a sufficiently strong overlap (non-framed boxes) are ignored. (C) Motif tree evaluation. Shown is the iterative process of motif tree evaluation and trimming, using as an example a motif tree with 6 initial leaves in different proteins and assuming Nmin = 3. The score of each position in the whole tree is derived from the alignment sign in hhalign (‘+’ or ‘|’: 2 points; ‘.’: 1 point; ‘-’, ‘=’ or gap: 0 points). The score at a given position in a leaf cannot be higher than the respective overall position score in the whole tree. Leaves with a score <6 are removed, after which each position is re-evaluated for Nmin and, if necessary, removed from the motif tree. In the given example, discarding leaf 5 leads to removal of one position and re-assignment of the score for another position in the motif. Consequently, leaf 1 no longer fulfills the minimal score requirement and is eliminated from the motif tree. The motif is trimmed to the last conserved position at each of its borders.
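The evaluation loop described in panel C can be illustrated with a minimal Python sketch. It is not the HH-MOTiF implementation. The symbol-to-points mapping, the leaf threshold of 6 and Nmin = 3 come from the caption; everything else is an assumption made for illustration: gaps are encoded as spaces, a position counts as supported when at least Nmin leaves score above 0 there, the tree-level score of a supported position is the best score among the remaining leaves, and all failing leaves are removed together in each round. The function names and the example symbol strings are hypothetical and are not taken from the figure.

```python
from typing import Dict, List, Tuple

# Points per hhalign alignment sign, as listed in the caption.
# Assumption: a gap is encoded as a space; unknown characters score 0.
SYMBOL_POINTS = {"+": 2, "|": 2, ".": 1, "-": 0, "=": 0, " ": 0}
MIN_LEAF_SCORE = 6  # leaves scoring below this are removed (from the caption)


def tree_position_scores(leaves: Dict[str, str], n_min: int) -> List[int]:
    """Assumed rule: a position keeps the best leaf score if at least
    n_min leaves score above 0 there; otherwise it scores 0."""
    width = len(next(iter(leaves.values())))
    scores = []
    for i in range(width):
        pts = [SYMBOL_POINTS.get(s[i], 0) for s in leaves.values()]
        supported = sum(p > 0 for p in pts) >= n_min
        scores.append(max(pts) if supported else 0)
    return scores


def leaf_score(symbols: str, tree: List[int]) -> int:
    """Sum of per-position points, each capped by the tree-level score."""
    return sum(min(SYMBOL_POINTS.get(c, 0), t) for c, t in zip(symbols, tree))


def evaluate_tree(leaves: Dict[str, str], n_min: int = 3
                  ) -> Tuple[Dict[str, str], Tuple[int, int]]:
    """Iteratively drop weak leaves, re-score positions, then trim borders."""
    leaves = dict(leaves)  # work on a copy
    tree: List[int] = []
    while leaves:
        tree = tree_position_scores(leaves, n_min)
        weak = [n for n, s in leaves.items()
                if leaf_score(s, tree) < MIN_LEAF_SCORE]
        if not weak:
            break
        for name in weak:
            del leaves[name]
    if not leaves:
        return {}, (0, 0)
    kept = [i for i, t in enumerate(tree) if t > 0]
    start, end = kept[0], kept[-1] + 1  # trim to outermost scoring positions
    return {n: s[start:end] for n, s in leaves.items()}, (start, end)


# Hypothetical 6-leaf tree (one alignment-sign string per leaf).
example = {
    "leaf1": "|.||", "leaf2": "||+-", "leaf3": "+||-",
    "leaf4": "|+|-", "leaf5": "..-|", "leaf6": "|||.",
}
kept_leaves, span = evaluate_tree(example, n_min=3)
print(sorted(kept_leaves), span)
```

With these invented inputs the sketch behaves like the cascade in the caption: leaf5 is removed first, the last position then loses its Nmin support, leaf1 drops below the threshold as a result, and the remaining four leaves are trimmed to the first three positions.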
Figure 2. Output web page of HH-MOTiF. Identified SLiMs are reported in association with their input query and their position within the query. Motif trees are highlighted in red in the full-length sequences. Upon selection, a motif is connected to its tree. Next to the full-length sequences, the sequence logo and the pseudo-MSA of the selected motif are displayed. Results can also be downloaded in FASTA format.
Performance measures of de novo SLiM prediction methods. For details, see main text and Supplementary Tables S2–S6.
| Method | Site-based recall | Site-based precision | Site-based F1 | Residue-based recall | Residue-based precision | Residue-based F1 |
|---|---|---|---|---|---|---|
| HH-MOTiF | 0.236 | 0.564 | 0.333 | 0.210 | 0.420 | 0.280 |
| MEME | 0.249 | 0.099 | 0.142 | 0.219 | 0.061 | 0.095 |
| GLAM2 | 0.413 | 0.164 | 0.235 | 0.380 | 0.073 | 0.123 |
| SLiMFinder | 0.272 | 0.389 | 0.320 | 0.203 | 0.350 | 0.257 |
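The F1 columns in the table are consistent with the standard definition of F1 as the harmonic mean of recall and precision. The short Python sketch below recomputes F1 from the rounded recall and precision values in the table; the function and variable names are illustrative only. The published F1 values were presumably derived from unrounded inputs, so agreement is expected only to about the third decimal place.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision; 0.0 if both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F1) copied from the table above.
TABLE = {
    "HH-MOTiF":   {"site": (0.236, 0.564, 0.333), "residue": (0.210, 0.420, 0.280)},
    "MEME":       {"site": (0.249, 0.099, 0.142), "residue": (0.219, 0.061, 0.095)},
    "GLAM2":      {"site": (0.413, 0.164, 0.235), "residue": (0.380, 0.073, 0.123)},
    "SLiMFinder": {"site": (0.272, 0.389, 0.320), "residue": (0.203, 0.350, 0.257)},
}

for method, levels in TABLE.items():
    for level, (recall, precision, reported) in levels.items():
        computed = f1_score(recall, precision)
        print(f"{method:<10} {level:<8} computed={computed:.3f} reported={reported:.3f}")
```

The comparison shows why F1 is used as the balanced measure here: GLAM2 has the highest recall at both levels, but its low precision pulls its F1 below that of HH-MOTiF and SLiMFinder.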
Figure 3. Performance, as measured by F1, of HH-MOTiF and other tested de novo SLiM search methods. HH-MOTiF was compared against MEME, GLAM2 and SLiMFinder. Both site-based (blue) and residue-based (red) F1 values were calculated from the recall and precision of the software suites in discovering SLiMs from the SLiM collection of the ELM database. For details, see Supplementary Tables S2–S6.