Izabella Krystkowiak, Norman E Davey.
Abstract
The extensive intrinsically disordered regions of higher eukaryotic proteomes contain vast numbers of functional interaction modules known as short linear motifs (SLiMs). Here, we present SLiMSearch, a motif discovery tool that scans a motif consensus, representing the specificity determinants of a motif-binding domain, against a proteome to discover putative novel motif instances. SLiMSearch applies several distinct and complementary approaches exploiting the common properties of SLiMs to predict novel motifs. Consensus matches are annotated with overlapping sequence annotation, including feature information describing protein modular architecture, post-translational modification, structure, sequence variation and experimental characterisation of functional regions. Discriminatory motif attributes such as conservation and accessibility are also calculated. In addition, SLiMSearch provides functional enrichment and evolutionary analysis tools. The enrichment tool analyses GO terms, keywords and interacting partner enrichment to indicate possible motif function. The evolutionary tool evaluates motif taxonomic range and the conservation of motif sequence context. Consensus matches can be filtered based on motif attributes such as accessibility and taxonomic range; or by the localisation, interacting partners or ontology annotation of the peptide-containing protein. SLiMSearch supports a range of species of experimental and therapeutic relevance and is available online at http://slim.ucd.ie/slimsearch/.
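The core step described in the abstract, scanning a motif consensus against a proteome, can be illustrated with a plain regular-expression search. The following is a minimal sketch and not SLiMSearch's actual implementation: the function name `find_consensus_matches` and the toy sequences are hypothetical, and the consensus `[KR]xTQT` from Figure 1 is assumed to translate to the regular expression `[KR].TQT`, with `x` read as a wildcard position.

```python
import re


def find_consensus_matches(consensus, proteome):
    """Return (protein_id, start, end, peptide) for every consensus match.

    `proteome` maps a protein identifier to its amino-acid sequence.
    Positions are 1-based and inclusive. This helper is illustrative only.
    """
    # Wrapping the consensus in a lookahead reports overlapping matches too.
    pattern = re.compile(f"(?=({consensus}))")
    hits = []
    for protein_id, sequence in proteome.items():
        for match in pattern.finditer(sequence):
            peptide = match.group(1)
            start = match.start() + 1  # convert 0-based index to 1-based
            end = start + len(peptide) - 1
            hits.append((protein_id, start, end, peptide))
    return hits


# Toy sequences invented for this example; they are not real proteins.
toy_proteome = {
    "protA": "MSDKATQTEELLRSTQTPP",
    "protB": "MAAAGGGPLLE",
}

hits = find_consensus_matches("[KR].TQT", toy_proteome)
for hit in hits:
    print(hit)
```

A search like this only produces raw consensus matches; the annotation, conservation, accessibility and enrichment steps described in the abstract would each operate on the resulting list of hits.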
Year: 2017 PMID: 28387819 PMCID: PMC5570202 DOI: 10.1093/nar/gkx238
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Benchmarking of evolutionary annotation used in SLiMSearch. (A) Example alignment of a [KR]xTQT Dynein Light Chain binding motif across different species, showing the attributes of motif conservation measured by the relative local conservation and taxonomic range. (B) Motif consensus conservation of human motif instances across different species: the motif consensus taxonomic range of the validated human instances in the ELM resource compared to the non-validated instances (instances in the human proteome that match a motif consensus from the ELM database but are not annotated as a ‘true positive’ in the ELM database). (C) Relative local conservation (see Supplementary Material) for each residue in the defined, wildcard and flanking regions of a motif for validated instances from the ELM resource. (D) Relative local conservation for each residue in the defined, wildcard and flanking regions of a motif for consensus matches not annotated as validated instances in the ELM resource.
Figure 2. Benchmarking of the functional enrichment analysis approaches used by SLiMSearch. (A) Plot of the median GO term enrichment scores against the average number of disordered amino acids per protein for GO terms returned from the enrichment analysis of the random benchmarking set (see Supplementary Material). (B) Plot of the average P-value for a GO term against the percentage of GO terms with that P-value or less for the random benchmarking set. In this dataset, which should have no functional motif consensuses and therefore no enriched GO terms, the data points should fall along the diagonal. The classical hypergeometric test clearly diverges from the diagonal and falls below the line; as such, it strongly overpredicts the significance of each GO term. P-values are calculated using the classical hypergeometric test with Benjamini–Hochberg correction (classical hypergeometric); the hypergeometric test with Benjamini–Hochberg correction and motif search space correction (corrected hypergeometric); and the Mann–Whitney U rank test for enrichment analysis based on conservation (QFO) (conservation rank test). (C) The distribution of corrected hypergeometric and conservation rank test P-values of GO terms for consensus searches of ELM class regular expressions (split into extended GO terms annotated in the ELM resource as functionally related to an ELM class, and extended GO terms not annotated for the ELM class), reversed ELM class regular expressions and shuffled ELM class regular expressions. Enrichment analysis was performed with motif search space correction (corrected hypergeometric) and based on QFO conservation (conservation rank test). Both analyses used UniRef50 clustering of related proteins. The stars denote the mean value and the red plus symbols denote outliers.
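The "classical hypergeometric" baseline named in the Figure 2 caption can be sketched as an upper-tail hypergeometric test per GO term followed by Benjamini–Hochberg adjustment. This is a minimal standard-library sketch under stated assumptions: the function names, GO term labels and counts are hypothetical, and the motif search space correction and the conservation rank test are not reproduced here because the caption does not define them.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for a hypergeometric variable.

    N: proteins in the background, K: background proteins with the GO term,
    n: proteins containing a consensus match, k: of those, with the GO term.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (k, K) per GO term, with N = 1000 and n = 50.
toy_terms = {"GO:term_a": (12, 40), "GO:term_b": (3, 60), "GO:term_c": (1, 10)}
raw = {term: hypergeom_sf(k, 1000, K, 50) for term, (k, K) in toy_terms.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for term in toy_terms:
    print(term, raw[term], adjusted[term])
```

As the caption notes, this uncorrected form overpredicts significance on the random benchmarking set, which is the motivation given for the corrected hypergeometric and conservation rank tests.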