iELM--a web server to explore short linear motif-mediated interactions.

Robert J Weatheritt, Peter Jehl, Holger Dinkel, Toby J Gibson.

Abstract

The recent expansion in our knowledge of protein-protein interactions (PPIs) has allowed the annotation and prediction of hundreds of thousands of interactions. However, the function of many of these interactions remains elusive. The interactions of Eukaryotic Linear Motif (iELM) web server provides a resource for predicting the function and positional interface for a subset of interactions mediated by short linear motifs (SLiMs). The iELM prediction algorithm is based on the annotated SLiM classes from the Eukaryotic Linear Motif (ELM) resource and allows users to explore both annotated and user-generated PPI networks for SLiM-mediated interactions. By incorporating the annotated information from the ELM resource, iELM provides functional details of PPIs. This can be used in proteomic analysis, for example, to infer whether an interaction promotes complex formation or degradation. Furthermore, details of the molecular interface of the SLiM-mediated interactions are predicted. This information is displayed in a fully searchable table, as well as graphically with the modular architecture of the participating proteins extracted from the UniProt and Phospho.ELM resources. A network figure is also presented to aid the interpretation of results. The iELM server supports single protein queries as well as large-scale proteomic submissions and is freely available at http://i.elm.eu.org.

Year: 2012 PMID: 22638578 PMCID: PMC3394315 DOI: 10.1093/nar/gks444

Journal: Nucleic Acids Res ISSN: 0305-1048

INTRODUCTION

The interactions of Eukaryotic Linear Motif (iELM) web server facilitates the exploration of short linear motif (SLiM)-mediated interfaces within protein–protein interaction (PPI) networks (1). The importance of SLiMs in the regulatory and signalling mechanisms of the cell is becoming increasingly apparent, as highlighted by their use as molecular switches coordinating phase transitions in the cell (2) and their increasing association with disease (3–5). SLiMs are key components in a wide range of biological pathways: they act as sites for post-translational modifications such as phosphorylation or ubiquitination, as targeting signals for particular subcellular locations and as ligand-binding sites for protein recruitment (6,7). The majority of known motifs bind to the surface of globular domains and are specific for a particular subgroup of a domain family (1). SLiMs tend to be just 3–10 amino acids long, with only 2–5 residues responsible for most of the binding affinity and specificity (6). Consequently, distinguishing bioinformatically between a chance match and a biologically relevant motif instance is difficult (8). A number of resources annotate experimentally validated SLiM classes, the most notable being the Eukaryotic Linear Motif (ELM) (3), MiniMotif (9) and ScanSite (10) databases. These resources also allow protein sequences to be searched for novel instances of the annotated classes using regular expression patterns or position-specific scoring matrices. However, because motifs frequently occur by chance, pattern matching alone produces a large number of false-positive hits (6). Methods have therefore been developed that add filters based on the attributes of SLiMs, including sequence conservation (11–13), structural availability (14–16), biophysical feasibility (17) and biological keywords (18).
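
To make the false-positive problem concrete, the sketch below scans a sequence with a regular expression in the style of an ELM pattern and estimates how many matches would be expected by chance. It is a minimal illustration under stated assumptions: the simplified RxxL pattern (reminiscent of a destruction box, not the curated ELM definition) and the test sequence are hypothetical, and residues are assumed to be uniform and independent, whereas real background models use observed amino-acid frequencies.

```python
import re


def find_motif_matches(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for every match, overlaps included."""
    # Wrapping the pattern in a zero-width lookahead lets overlapping
    # instances be reported, one per start position.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


def expected_chance_matches(seq_len: int, motif_len: int,
                            fixed_positions: int, alphabet_size: int = 20) -> float:
    """Expected number of random matches, assuming uniform, independent residues.

    Each fixed position matches with probability 1/alphabet_size;
    wildcard positions ('.') always match.
    """
    per_site = (1 / alphabet_size) ** fixed_positions
    return (seq_len - motif_len + 1) * per_site


# Hypothetical D-box-like pattern: R, any two residues, L.
PATTERN = "R..L"
print(find_motif_matches("MARKSLAARPPLK", PATTERN))  # [(3, 'RKSL'), (9, 'RPPL')]
print(expected_chance_matches(500, motif_len=4, fixed_positions=2))
```

Even this two-residue core is expected to occur about 1.24 times in a random 500-residue protein, which is why the additional filters listed above are needed.
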
Recently, a number of de novo motif prediction tools capable of predicting new SLiM classes have also emerged (19–22). These tools, however, must contend with the experimental bias towards medically relevant proteins and with biases arising from evolutionary relationships (12). Several resources use PPI data to help predict the SLiM functional class associated with a particular protein-binding domain. Dilimot (21) and SLiMFinder (19) use the over-representation of sequence motifs in proteins known to interact with a particular globular domain to predict the regular expression of the binding SLiM; the ADAN database (23) uses high-resolution structures to predict SLiM-mediated interactions for well-known modular protein domains (SH3, SH2, WW, etc.). In contrast, NetworKIN (24) employs interaction data to predict which kinase is responsible for a particular phosphorylation site. However, to the best of our knowledge, the on-the-fly identification of SLiM-mediated interactions within PPI data has not been investigated. To address this, we introduce the iELM server, which uses the annotated ELM regular expressions, specially trained hidden Markov models (HMMs) based on the manual annotation of SLiM-binding domains, and PPI data to identify SLiM-mediated interactions. In addition, iELM takes into account many of the important attributes of SLiMs identified in the aforementioned studies, including the tendency of SLiMs to occur in regions of intrinsic disorder (25) and the propensity of functional motifs to be evolutionarily conserved (6,13). The iELM web server allows the identification of SLiM-mediated interactions associated with a protein of interest or within a user's PPI network.

THE iELM ALGORITHM

The iELM algorithm has been described and benchmarked previously (1) and can be summarized as follows. iELM assesses binary protein associations for SLiM-mediated interactions using the ELM-annotated regular expressions together with HMMs (26) trained on manually annotated SLiM-binding domains and their orthologs. The assessment of whether a binary interaction is SLiM-mediated proceeds in four parts. The first part uses the 3DID database (27) to check whether the two proteins interact via a domain–domain interaction. If they do not, the second and third parts run in parallel to assess each protein for SLiM and SLiM-binding domain matches. In the second part, SLiMs are identified using the regular expressions annotated by the ELM resource and scored with the SLiMSearch algorithm (12) according to the conservation of the motif in a multiple sequence alignment of the queried protein and its orthologs. The predicted SLiM and its surrounding residues are also assessed by the IUPred algorithm (14) for their propensity to lie in a region of intrinsic disorder. In the third part, SLiM-binding domains are identified with the HMMSearch program (26) using HMMs trained to recognize SLiM-binding domains. An option is also available to search with Pfam HMMs (28); however, these do not take into account the specificity of motifs for subgroups of a domain family. In the fourth part, if a complementary SLiM and SLiM-binding domain pair exists within the two associated proteins, the algorithm applies cut-offs derived from the benchmarking data sets (see Supplementary Figure S1) and from recommendations in the respective papers (1,12,14). The cut-offs are 0.3 for disorder scores, 0.6 for motif scores and 0.35 for domain scores; predictions scoring below these values are not returned by the web server.
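
The four-part decision described above can be sketched as follows. This is a simplified illustration, not the iELM code: the data classes, the idea of labelling each domain hit with the ELM class it binds, and the example class and domain names are hypothetical; only the three cut-offs come from the text. A score equal to a cut-off is treated as passing, since only scores below the cut-offs are discarded.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text; predictions scoring below them are discarded.
DISORDER_CUTOFF = 0.30
MOTIF_CUTOFF = 0.60
DOMAIN_CUTOFF = 0.35


@dataclass(frozen=True)
class MotifHit:
    elm_class: str         # ELM class matched by the regular expression
    start: int             # 1-based position in the protein
    motif_score: float     # conservation score (SLiMSearch in iELM)
    disorder_score: float  # disorder propensity (IUPred in iELM)


@dataclass(frozen=True)
class DomainHit:
    binds_class: str       # assumed: ELM class recognised by this domain
    name: str
    domain_score: float    # HMM score (HMMSearch in iELM)


def motif_domain_pairs(motifs, domains):
    """Part 4: complementary motif/domain pairs that pass every cut-off."""
    return [
        (m, d)
        for m in motifs
        if m.motif_score >= MOTIF_CUTOFF and m.disorder_score >= DISORDER_CUTOFF
        for d in domains
        if d.binds_class == m.elm_class and d.domain_score >= DOMAIN_CUTOFF
    ]


def assess_association(a, b, has_domain_domain_interaction: bool):
    """Classify one binary association; a and b are (motifs, domains) tuples."""
    if has_domain_domain_interaction:
        # Part 1: a known domain-domain interaction (3DID) explains the link.
        return []
    motifs_a, domains_a = a
    motifs_b, domains_b = b
    # Parts 2 and 3 supply the hits; check both directions of the pair.
    return motif_domain_pairs(motifs_a, domains_b) + motif_domain_pairs(motifs_b, domains_a)


# Hypothetical hits: a motif in protein A and a matching domain in protein B.
site = MotifHit("LIG_EXAMPLE_1", 120, motif_score=0.72, disorder_score=0.55)
domain = DomainHit("LIG_EXAMPLE_1", "ExampleDomain", domain_score=0.8)
print(len(assess_association(([site], []), ([], [domain]), False)))  # 1
```
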

Precalculated data

Because the iELM calculations are time-consuming, most of the data are precalculated so that the server returns results in a reasonable time. The HMMs for SLiM-binding domains (1) were used to scan the human UniProt database (29), and all hits above a predefined cut-off were recorded. Conservation scores were precalculated with the SLiMSearch algorithm from multiple sequence alignments of orthologous proteins, identified with the Gopher program (30) from a database of 70 complete Ensembl proteomes (Ensembl 59) (31). SLiMSearch was run with all the SLiM classes annotated in the ELM database. Disorder scores for each motif were calculated with IUPred. All protein–protein associations annotated in the STRING database (version 9.0; STRING score >0.6) (32) were assessed by iELM for SLiM-mediated interactions.
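
Selecting the STRING associations to precalculate amounts to a score filter. The sketch assumes a whitespace-separated STRING links file with a header line and the columns `protein1 protein2 combined_score`, with scores stored as integers scaled by 1000 (so the threshold of 0.6 corresponds to 600); the protein IDs in the example are placeholders. Each undirected pair is kept only once.

```python
import io
from typing import TextIO


def high_confidence_pairs(handle: TextIO, threshold: float = 0.6) -> set[tuple[str, str]]:
    """Return undirected protein pairs whose STRING score exceeds `threshold`."""
    next(handle, None)  # skip the header line
    pairs: set[tuple[str, str]] = set()
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        p1, p2, score = fields
        if int(score) / 1000 > threshold:
            # Associations may be listed in both directions; store each once.
            a, b = sorted((p1, p2))
            pairs.add((a, b))
    return pairs


example = io.StringIO(
    "protein1 protein2 combined_score\n"
    "PROT_A PROT_B 700\n"
    "PROT_B PROT_A 700\n"
    "PROT_A PROT_C 600\n"
    "PROT_C PROT_D 950\n"
)
print(sorted(high_confidence_pairs(example)))
# [('PROT_A', 'PROT_B'), ('PROT_C', 'PROT_D')]
```

The pair scored exactly 600 is dropped because the text specifies a strict "score >0.6".
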

Technical details of the web server

The web server is built with the Django web framework on an underlying PostgreSQL database and is written primarily in Python. The tables are produced with the jQuery library and the graphical displays with the JavaScript libraries Raphael and Dracula. The server is HTML 4.01 compliant and compatible with most commonly used web browsers.

USER INTERFACE

The iELM web server is freely available at http://i.elm.eu.org with no login required. The server aims to provide a user-friendly interface for exploring a protein or proteome of interest for SLiM-mediated interactions. It can be queried in two ways: ‘protein iELM’ searches the precalculated high-quality associations (score >0.6) from the STRING resource for SLiM-mediated interactions, whereas ‘proteomic iELM’ allows users to search their own protein–protein interactome for SLiM-mediated interactions. The server also provides a list of all 835 annotated linear motif-binding domains, which can be freely downloaded at http://i.elm.eu.org/domains.

Protein iELM

For a single query protein, the ‘protein iELM’ server searches a database of iELM results precalculated from the high-quality interactions in the STRING database (see Figure 1).

Figure 1.

An overview of the iELM server. The iELM server is divided into two sections: ‘protein iELM’ and ‘proteomic iELM’, each with different inputs. In the flowchart, the yellow coloured arrows are common to both processes whereas the orange arrows and the grey arrows are specific to ‘proteomic iELM’ and ‘protein iELM’, respectively. The processes run by the iELM server can be divided into three sections: the input section at the top is displayed with a light blue background, the processing section is displayed in blue and the output section is displayed at the bottom in dark blue. The scripting languages and packages used for each section are displayed to the right of the flowchart.

Input

A single protein ID is required as input; a drop-down menu specifies the type of sequence ID, which is then used to query the ID mapping service provided by UniProt. The user can also choose between the specially trained iELM HMMs and the Pfam HMMs. On submission, the precalculated data are searched, which ensures that results are returned promptly.
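
Because the expensive computation happens ahead of time, answering a ‘protein iELM’ query reduces to an indexed database lookup. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Django/PostgreSQL stack; the table layout, column names and placeholder accessions are hypothetical, and the UniProt ID mapping is assumed to have already been done.

```python
import sqlite3


def build_prediction_db() -> sqlite3.Connection:
    """Create an in-memory table of precalculated predictions (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE prediction (
               motif_protein TEXT, elm_class TEXT, motif_start INTEGER, peptide TEXT,
               motif_score REAL, disorder_score REAL,
               domain_protein TEXT, domain_name TEXT, domain_score REAL)"""
    )
    # Indexes on both partners keep single-protein lookups fast.
    conn.execute("CREATE INDEX idx_motif_protein ON prediction (motif_protein)")
    conn.execute("CREATE INDEX idx_domain_protein ON prediction (domain_protein)")
    return conn


def protein_query(conn: sqlite3.Connection, uniprot_id: str):
    """Return (rows with SLiMs in the query, rows with SLiM-binding domains in the query)."""
    motif_rows = conn.execute(
        "SELECT * FROM prediction WHERE motif_protein = ? ORDER BY motif_score DESC",
        (uniprot_id,),
    ).fetchall()
    domain_rows = conn.execute(
        "SELECT * FROM prediction WHERE domain_protein = ? ORDER BY domain_score DESC",
        (uniprot_id,),
    ).fetchall()
    return motif_rows, domain_rows


conn = build_prediction_db()
conn.execute(
    "INSERT INTO prediction VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("PROT_A", "LIG_EXAMPLE_1", 120, "RKSL", 0.72, 0.55, "PROT_B", "ExampleDomain", 0.80),
)
motifs, domains = protein_query(conn, "PROT_A")
print(len(motifs), len(domains))  # 1 0
```

The two result lists correspond to the two output tables: SLiMs found in the query protein and SLiM-binding domains found in it.
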

Output

The output is divided into a tabular and a graphical display. The tabular output (see Figure 2A) consists of two tables: the first (if applicable) lists SLiMs found within the query protein, and the second (if applicable) lists SLiM-binding domains found within the query protein. Both tables are divided into three parts. The left part contains the UniProt ID of the motif-containing protein, the motif type (ELM functional class), the location of the motif, its sequence and the associated scores. The central part shows the UniProt ID of the protein containing the motif-binding domain, the domain name (Pfam) and the domain score. The final part provides a link to Pepsite (17), via the ‘Structure’ button, for a structural prediction of the interaction and a biophysical feasibility assessment (if applicable). The table is fully searchable and can be copied to the clipboard, printed or downloaded as a comma-separated values (CSV) file.
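
The CSV download makes the results straightforward to post-process. A minimal sketch of writing rows in that layout with Python's csv module and reading them back; the column names are hypothetical and merely mirror the motif, domain and score fields listed above (the Pepsite link is not included).

```python
import csv
import io

# Hypothetical column names mirroring the fields described above.
COLUMNS = [
    "motif_protein", "elm_class", "motif_start", "motif_end", "peptide",
    "motif_score", "disorder_score", "domain_protein", "domain_name", "domain_score",
]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise prediction rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing fields are written as empty strings
    return buffer.getvalue()


def csv_to_rows(text: str) -> list[dict]:
    """Parse CSV text back into dictionaries (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


row = {
    "motif_protein": "PROT_A", "elm_class": "LIG_EXAMPLE_1", "motif_start": 120,
    "motif_end": 123, "peptide": "RKSL", "motif_score": 0.72, "disorder_score": 0.55,
    "domain_protein": "PROT_B", "domain_name": "ExampleDomain", "domain_score": 0.8,
}
text = rows_to_csv([row])
print(csv_to_rows(text)[0]["elm_class"])  # LIG_EXAMPLE_1
```
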

Figure 2.

Description of iELM outputs. (A) Screenshot of the output from ‘protein iELM’ with only the motif table shown. The web server also shows an identical domain table (if applicable) describing the interactions of the SLiM-binding domain(s) in the queried protein. The table is divided into two sections as displayed in the diagram. Also displayed in the figure, above the table, are the predicted motifs and SLiM-binding domains, as well as information from the UniProt and Phospho.ELM resources about the modular architecture of the queried protein. The predicted motifs and domains are fully clickable, resulting in sorting of the table, whereas the annotated domains and phosphorylation sites link out to their respective resources. (B) Screenshot of the network diagram displayed as an output for ‘proteomic iELM’. The colour of the edges designates the type of ELM class associated with the interaction. The colours of the nodes represent whether the protein contains a motif (yellow), a SLiM-binding domain (green) or both a SLiM-binding domain and a motif (diagonal partition with both colours). Network diagrams specific to each interaction can be produced by clicking the ‘Interaction’ button in the table displayed in the output section of ‘proteomic iELM’.

The graphical output (see Figure 2A) displays the predicted SLiMs and SLiM-binding domains along with the modular architecture of the query protein, extracted from the UniProt database, and the annotated phosphorylation sites from Phospho.ELM (33). The modules predicted by the iELM method are coloured according to SLiM functional type, as classified by ELM (Ligand, Targeting, Cleavage and Modification), with the annotated instances from ELM outlined in green. A key describing these types is fully clickable, enabling filtering of both the graphical and the tabular content. Integrated tool tips give immediate information on individual SLiMs, SLiM-binding domains and UniProt domains, and help the user interpret the output. The annotated domains are linked to the UniProt database, and the individual predicted motifs are also clickable, filtering the tabular content.
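
The colour key relies on the ELM functional types. ELM class identifiers begin with a prefix naming their type (for example, MOD_ for modification sites), so grouping and filtering predictions by type can be sketched as below. Mapping any other prefix to "Other", the dictionary-based hit records and the example class names are assumptions of this sketch.

```python
# ELM functional types named in the text, keyed by class-identifier prefix.
TYPE_BY_PREFIX = {
    "LIG": "Ligand",
    "TRG": "Targeting",
    "CLV": "Cleavage",
    "MOD": "Modification",
}


def functional_type(elm_class: str) -> str:
    """Map an ELM class identifier such as 'MOD_EXAMPLE_2' to its functional type."""
    prefix = elm_class.split("_", 1)[0]
    return TYPE_BY_PREFIX.get(prefix, "Other")  # unknown prefixes: assumption


def filter_by_type(hits: list[dict], selected: set[str]) -> list[dict]:
    """Keep only hits whose functional type is currently selected in the key."""
    return [hit for hit in hits if functional_type(hit["elm_class"]) in selected]


# Hypothetical predicted motifs in one query protein.
hits = [
    {"elm_class": "LIG_EXAMPLE_1", "start": 120},
    {"elm_class": "MOD_EXAMPLE_2", "start": 45},
    {"elm_class": "TRG_EXAMPLE_3", "start": 300},
]
print([h["start"] for h in filter_by_type(hits, {"Ligand", "Modification"})])  # [120, 45]
```
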

Proteomic iELM

This section allows users to submit their own PPI network, which is then searched using the iELM algorithm (see Figure 1). The user may submit either a tabulated list of interactions or a list of IDs that will be searched in an all-against-all manner. As before, drop-down menus specify the type of ID being submitted and the type of HMMs to use. Submissions are limited to 75 000 interactions for a tabulated list and 400 IDs for an all-against-all search. After submitting the job, the user is redirected to a waiting page while the results are calculated; the waiting time is normally less than 5 min. As with ‘protein iELM’, the output is divided into two sections. The tabular output has the same structure as described for ‘protein iELM’ (see Figure 2A), except that a single table contains all the interactions, the originally queried ID is displayed next to the converted UniProt ID and an additional ‘Interaction’ button is provided. Clicking this button produces a graphical representation of the PPIs linked to that interaction. If some associations are not predicted to be SLiM-mediated, they are listed in an additional table in the left-hand column for the user's inspection. In the same column, if any submitted IDs could not be converted, a link leads to a page listing these proteins. The graphical output contains the modular architecture as outlined above, as well as a network of all the connecting interactions within one connected cluster of up to 75 proteins (Figure 2B). When the results page is first produced, the network is displayed for the best-scoring SLiM-mediated interaction; it can be changed with the ‘Interaction’ button in the table. The edges of the network are coloured by interaction type (ELM type), and the nodes by whether they contain a SLiM, a SLiM-binding domain or both.
Clicking the ‘Interaction’ button also reveals the modular architecture of the interacting proteins of interest (as described in the ‘Protein iELM’ section).
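
Several of the behaviours above (the all-against-all search, the submission limits, the cluster of up to 75 proteins around the selected interaction and the node colouring) can be sketched with standard-library Python. This is an illustration under assumptions: expanding breadth-first from the two proteins of the selected interaction, the alphabetical neighbour order and the grey colour for nodes with neither feature are not specified in the text.

```python
from collections import defaultdict, deque
from itertools import combinations

MAX_INTERACTIONS = 75_000  # limit for a tabulated list (from the text)
MAX_IDS = 400              # limit for an all-against-all search (from the text)
MAX_CLUSTER = 75           # largest network cluster drawn (from the text)


def check_tabulated(pairs: list[tuple[str, str]]) -> None:
    """Reject tabulated submissions above the interaction limit."""
    if len(pairs) > MAX_INTERACTIONS:
        raise ValueError(f"at most {MAX_INTERACTIONS} interactions are accepted")


def all_against_all(ids: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of distinct submitted IDs."""
    unique = list(dict.fromkeys(ids))  # drop duplicates, keep order
    if len(unique) > MAX_IDS:
        raise ValueError(f"at most {MAX_IDS} IDs are accepted")
    return list(combinations(unique, 2))


def cluster_around(edges, seed, limit: int = MAX_CLUSTER) -> list[str]:
    """Breadth-first expansion from the seed interaction, capped at `limit` proteins."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = list(dict.fromkeys(seed))[:limit]
    seen = set(nodes)
    queue = deque(nodes)
    while queue and len(nodes) < limit:
        for neighbour in sorted(adjacency[queue.popleft()]):
            if neighbour not in seen and len(nodes) < limit:
                seen.add(neighbour)
                nodes.append(neighbour)
                queue.append(neighbour)
    return nodes


def node_colour(has_motif: bool, has_domain: bool) -> str:
    """Node colours as described for Figure 2B."""
    if has_motif and has_domain:
        return "yellow/green"  # drawn as a diagonally split node
    if has_motif:
        return "yellow"
    if has_domain:
        return "green"
    return "grey"  # assumption: not specified in the text


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
print(cluster_around(edges, ("A", "B")))  # ['A', 'B', 'C', 'D']
```
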

FUTURE WORK

Currently, only the human proteome is fully searchable; in the near future, we plan to include additional model organisms. We also intend to add a section that will allow users to search PPIs with their own regular expressions and SLiM-binding domains. We will update iELM regularly to ensure that newly annotated binding domains are incorporated into the precalculated data. To further facilitate our annotation process, the domains section includes a form that allows users to inform us of known linear motif-binding domains that are not yet annotated in iELM.

CONCLUSIONS

The iELM web server is, to the best of our knowledge, the first tool that facilitates the exploration and identification of SLiM-mediated interactions within PPI networks on the fly. Its user-friendly platform supports queries at the single-protein level as well as within large-scale proteomic studies. iELM can therefore be useful in guiding experimental studies and facilitating the analysis of pathways within PPI networks. To accommodate a wide range of users, the server accepts identifiers from multiple databases as input and allows results to be downloaded as an easily parsable CSV file. The web server is freely available at http://i.elm.eu.org.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Figure S1.

FUNDING

EMBL international PhD program fellowship to R.J.W. Funding for open access charge: EMBL. Conflict of interest statement. None declared.

REFERENCES

1. Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs.

Authors: John C Obenauer; Lewis C Cantley; Michael B Yaffe
Journal: Nucleic Acids Res Date: 2003-07-01

2. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.

Authors: Zsuzsanna Dosztányi; Veronika Csizmok; Peter Tompa; István Simon
Journal: Bioinformatics Date: 2005-06-14

3. Local structural disorder imparts plasticity on linear motifs.

Authors: Monika Fuxreiter; Peter Tompa; István Simon
Journal: Bioinformatics Date: 2007-03-25

4. A computational strategy for the prediction of functional linear peptide motifs in proteins.

Authors: Holger Dinkel; Heinrich Sticht
Journal: Bioinformatics Date: 2007-10-31

5. Profile hidden Markov models.

Authors: S R Eddy
Journal: Bioinformatics Date: 1998

6. Systematic discovery of new recognition peptides mediating protein interaction networks.

Authors: Victor Neduva; Rune Linding; Isabelle Su-Angrand; Alexander Stark; Federico de Masi; Toby J Gibson; Joe Lewis; Luis Serrano; Robert B Russell
Journal: PLoS Biol Date: 2005-11-15

7. NetworKIN: a resource for exploring cellular phosphorylation networks.

Authors: Rune Linding; Lars Juhl Jensen; Adrian Pasculescu; Marina Olhovsky; Karen Colwill; Peer Bork; Michael B Yaffe; Tony Pawson
Journal: Nucleic Acids Res Date: 2007-11-02

8. The SLiMDisc server: short, linear motif discovery in proteins.

Authors: Norman E Davey; Richard J Edwards; Denis C Shields
Journal: Nucleic Acids Res Date: 2007-06-18

9. SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins.

Authors: Richard J Edwards; Norman E Davey; Denis C Shields
Journal: PLoS One Date: 2007-10-03

10. A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences.

Authors: Claudia Chica; Alberto Labarga; Cathryn M Gould; Rodrigo López; Toby J Gibson
Journal: BMC Bioinformatics Date: 2008-05-06
