Cathryn M Gould, Francesca Diella, Allegra Via, Pål Puntervoll, Christine Gemünd, Sophie Chabanis-Davidson, Sushama Michael, Ahmed Sayadi, Jan Christian Bryne, Claudia Chica, Markus Seiler, Norman E Davey, Niall Haslam, Robert J Weatheritt, Aidan Budd, Tim Hughes, Jakub Pas, Leszek Rychlewski, Gilles Travé, Rein Aasland, Manuela Helmer-Citterich, Rune Linding, Toby J Gibson.
Abstract
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a 'Bar Code' format, which also displays known instances from homologous proteins through a novel 'Instance Mapper' protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
Year: 2009 PMID: 19920119 PMCID: PMC2808914 DOI: 10.1093/nar/gkp1016
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The ELM Resource hierarchy represented as a pyramid. ‘Functional Site’ provides a general description of the biology; for example, MAP kinases have a docking motif in their substrates. There is more than one class of MAPK docking motif, and ELM currently provides two ‘ELM Motif’ entries. These contain the motif regular expression and are annotated with more specific information as well as linking out to remote resources including PubMed, NCBI Taxonomy and GO. At the base of the pyramid are the ‘ELM Instances’ that belong to a given ‘ELM Motif’ entry. The instances are annotated with information about experimental methods and instance quality and link to external resources including UniProt, PDB and PubMed.
The four classes of LM in the ELM classification and some representative examples
| Class | Class description | ELM_ID | Regular expression | ELM description |
|---|---|---|---|---|
| LIG | Motifs acting as ligands to globular protein domains. | LIG_MAPK1_1 | [KR]{0,2}[KR].{0,2}[KR].{2,4}[ILVM].[ILVF] | MAPK interacting molecules (e.g. MAPKKs, substrates, phosphatases) carry docking motifs that help to regulate specific interactions in the MAPK signalling networks. The classic motif approximates (R/K)xxxx#x# where # is a hydrophobic residue. |
| | | LIG_APCC_Dbox_1 | .R..L..[LIVM]. | An RxxL-based motif that binds to the Cdh1 and Cdc20 components of APC/C thereby targeting the protein for destruction in a cell cycle dependent manner. |
| TRG | Motifs within proteins that are sufficient for recognition and targeting to subcellular compartments. | TRG_AP2beta_CARGO_1 | [DE].{1,2}F[^P][^P][FL][^P][^P][^P]R | AP-2 beta appendage platform subdomain (top surface) binding motif used in targeting cargo for internalization. |
| | | TRG_PEX_1 | W...[FY] | Specific ELM present in Pex5p and binding to Pex13p and Pex14p. Part of the peroxisomal matrix protein import system. |
| MOD | Sites of post-translational modification of proteins. | MOD_N-GLC_1 | .(N)[^P][ST].. | Generic motif for |
| | | MOD_ProDKin_1 | ...([ST])P.. | Proline-Directed Kinase (e.g. MAPK) phosphorylation site in higher eukaryotes. |
| CLV | Cleavage sites recognized by proteases for the processing of precursor proteins into biologically active products. | CLV_TASPASE1 | Q[MLVI]DG..[DE] | Taspase1 is a threonine aspartase which was first identified as the protease responsible for processing the trithorax (MLL) type of histone methyltransferases. |
| | | CLV_PCSK_FUR_1 | R.[RK]R. | Furin (PACE) cleavage site (Arg-Xaa-[Arg/Lys]-Arg-\|-Xaa) |

a Regular expression help is available at: http://elm.eu.org/help.html#regular_expressions.
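The regular expressions in the table can be tried directly against a sequence. The following is a minimal sketch using Python's standard `re` module and three of the patterns listed above; the function name `scan`, the 1-based coordinate convention and the toy sequence are assumptions made for illustration and are not part of the ELM server. It performs plain pattern matching only, with none of the ELM context filters applied.

```python
import re

# Regular expressions copied from three rows of the table above.
ELM_PATTERNS = {
    "LIG_MAPK1_1": r"[KR]{0,2}[KR].{0,2}[KR].{2,4}[ILVM].[ILVF]",
    "TRG_AP2beta_CARGO_1": r"[DE].{1,2}F[^P][^P][FL][^P][^P][^P]R",
    "CLV_PCSK_FUR_1": r"R.[RK]R.",
}


def scan(sequence, patterns=ELM_PATTERNS):
    """Return (elm_id, start, end, matched_text) tuples.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    hits = []
    for elm_id, regex in patterns.items():
        # Wrapping the pattern in a zero-width lookahead lets the scan
        # report matches that overlap each other.
        for m in re.finditer(f"(?=({regex}))", sequence):
            hits.append((elm_id, m.start(1) + 1, m.end(1), m.group(1)))
    return hits


# Made-up toy sequence for illustration; not a real protein.
toy = "MSGRAKRSLLDEAFAAFAAAR"
for hit in scan(toy):
    print(hit)
```

Short patterns such as `R.[RK]R.` match by chance in many sequences, which is why raw matches like these are only candidates until context filtering and experimental evidence are considered.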
Summary of the data stored in the ELM RDB
| | Functional site entries | ELM motifs | Instances | Links to PDB structure entries | GO terms | PubMed links |
|---|---|---|---|---|---|---|
| Totals | 110 | 146 | 1327 | 100 | 308 | 1125 |
| By category | | LIG: 89 | Human: 828 | | Biological process: 152 | From ELM motif: 704 |
| | | MOD: 30 | Mouse: 104 | | Cell compartment: 69 | From instance: 683 |
| | | TRG: 19 | Rat: 65 | | Molecular function: 87 | |
| | | CLV: 8 | Fly: 47 | | | |
| | | | Yeast: 88 | | | |
| | | | Other: 195 | | | |
Figure 2. Details from browse pages for the entry LIG_CAP-Gly_1 (http://elm.eu.org/elmPages/LIG_CAP-Gly_1.html). The upper window shows the description and the regular expression for the motif. Scrolling down past the references and the GO terms (not shown) leads to the table of known instances (middle window). Key information in the table includes whether an instance is a true positive, a link to the UniProt sequence entry and, if available, links to PDB structure entries (49). Clicking on the linked sequence for the instance in the EB1 protein (MARE1_HUMAN) opens a new page summarizing the annotated experimental evidence for the given instance. In this case, the motif has been exhaustively analysed and the supporting evidence is solid.
Figure 3. Graphic from the output page of the ELM server queried with the Epsin-1 sequence from the UniProt entry EPN1_HUMAN. The key indicates the content of the various coloured bars, e.g. the three connected by dotted arrows. Thirteen true LM instances are annotated either in this sequence or in an orthologue from another species (magenta and red bar codes, respectively). Mouseover provides panels with different information depending on context, three examples of which are shown. One indicates an ENTH domain retrieved from SMART. A second points at an annotated DPW motif. The third mouseover provides the most detail: a structure for the ENTH domain (PDB entry d1h0) was used by the SF (41) to report that a cyclin motif candidate is too buried to be significant. Clicking on any object in the graphic will link to further details.
Web Service interfaces for the ELM tool suite
| Resource module | Purpose of resource module | Links to WSDLs |
|---|---|---|
| ELM Database | Retrieve data stored by ELM | |
| ELMMatcher | Map ELM Motifs to query sequence | |
| ELM CS Filter | Evaluate conservation of LM matches in reference sequence | |
| ELM SF | Evaluate accessibility and structure context of LM matches in query sequence given a reference structure | |
| GlobPlot | Evaluate disorder propensity in query sequence | |
| Phospho.ELM | Retrieve phosphorylation data stored by Phospho.ELM | |
Figure 4. Representative results from the CS web interface, displayed with the annotated sequence alignment editor JalView (86). The alignment shows the set of sequences obtained by the CS filter with the human Epsin1 query sequence at top: the sequences belong to several paralogous families of Epsins. Four motif matches are highlighted in the reference sequence (magenta, annotated in this sequence; red, assigned by the instance mapper; blue, unannotated match) and in other sequences that align to the reference motif (green). The left-most match is a known instance of TRG_AP2beta_CARGO_1 and gives a top score of 1.00 despite only being present in sequences belonging to two of the Epsin paralogues. This is because most sequences that lack the motif have gaps aligned to it that do not affect the CS score. The second motif is a candidate instance for MOD_PKA_2 but is poorly conserved, scoring 0.05. This candidate would probably not be worth investigating unless there was prior evidence of phosphorylation at the site. The remaining two motifs are known instances of LIG_Clathr_ClatBox_1 and LIG_EH_1, which obtain the maximum CS score since they are conserved in all Epsin paralogues.
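The point that gapped sequences leave the score untouched can be illustrated with a deliberately simplified stand-in. The sketch below is not the CS filter's algorithm, which is not described in this section; it only assumes a score defined as the fraction of aligned sequences with residues in the motif columns that still match the regular expression. The function name and the toy alignment segments are hypothetical.

```python
import re


def gap_tolerant_conservation(regex, aligned_segments):
    """Fraction of non-gap aligned segments that match the motif regex.

    Simplified illustration only (NOT the ELM CS score): segments made up
    entirely of gap characters are skipped, so they neither raise nor
    lower the value.
    """
    pattern = re.compile(regex)
    considered = 0
    matched = 0
    for segment in aligned_segments:
        residues = segment.replace("-", "")
        if not residues:
            # All-gap segment: the sequence lacks this region entirely.
            continue
        considered += 1
        if pattern.fullmatch(residues):
            matched += 1
    return matched / considered if considered else 0.0


# TRG_AP2beta_CARGO_1 pattern from the ELM classification table.
ap2 = r"[DE].{1,2}F[^P][^P][FL][^P][^P][^P]R"

# Hypothetical motif columns cut from an alignment: two sequences carry
# the motif, two have only gaps in this region.
segments = ["DPFAAFAAAR", "DPFAAFAAAR", "----------", "----------"]
print(gap_tolerant_conservation(ap2, segments))
```

Under this assumption the two gapped sequences are simply ignored, whereas a sequence with residues that break the pattern would lower the value.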
Figure 5. Flow scheme for the ELM Instance Mapper. For each predicted LM from an ELM database search, a PHI-BLAST search is performed against a database containing all sequences with known instances of the predicted LM. Input to PHI-BLAST is the query sequence and the ELM Regular Expression (which is adapted for use with PHI-BLAST). Each of the aligned motifs, between query and ELM instance sequence, is evaluated and scored (see main text). If the motif in the ELM instance sequence is a known instance, and the calculated score meets the threshold (S ≥ 0.3), it is reported as a mapped instance. Both the ELM instance mapper and the underlying PHI-BLAST results are returned to the ELM server, for the user to inspect.
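The final reporting step of this flow scheme can be sketched as a simple filter. The scoring function itself is described in the main text and is not reproduced here, so the sketch takes the score S as a given input; the class name, field names and example accessions are hypothetical, and only the two conditions (known instance, S ≥ 0.3) come from the caption.

```python
from dataclasses import dataclass

SCORE_THRESHOLD = 0.3  # threshold stated in the caption (S >= 0.3)


@dataclass
class AlignedMotif:
    # Hypothetical record for one query/instance motif alignment.
    instance_accession: str
    is_known_instance: bool
    score: float  # S, computed elsewhere; the formula is not shown here


def mapped_instances(aligned, threshold=SCORE_THRESHOLD):
    """Keep alignments to known instances whose score meets the threshold."""
    return [
        a for a in aligned
        if a.is_known_instance and a.score >= threshold
    ]


# Made-up example values for illustration.
candidates = [
    AlignedMotif("SEQ_A", True, 0.85),
    AlignedMotif("SEQ_B", True, 0.10),   # known instance, score too low
    AlignedMotif("SEQ_C", False, 0.90),  # high score, not a known instance
]
for hit in mapped_instances(candidates):
    print(hit.instance_accession, hit.score)
```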
Figure 6. Workflow diagram illustrating how a user might explore LM candidates with ELM. The pipeline proceeds through three main phases utilizing the ELM resource (beige background), ELM-associated tools (green) and more general bioinformatics resources (pink). Candidate LMs can be rejected by ELM filters if they occur in unsuitable contexts. Sequence conservation and enrichment in interaction data using DiLiMot or SLiMFinder can provide additional scores to rank motifs. In the final phase, any potentially relevant bioinformatics resources should be examined to provide further context to motif candidates. If promising candidates survive this process, the end point of the bioinformatics pipeline has been reached and laboratory validation is now required.
The main experimental methods used in motif validation, as recorded in ELM
| Experimental method | PSI-MI ID | Number of occurrences |
|---|---|---|
| Mutation analysis | MI:0074 | 305 |
| Pull down assay | MI:0096 | 200 |
| Yeast 2 hybrid assay | MI:0018 | 115 |
| Co-immunoprecipitation | MI:0019 | 98 |
| X-ray crystallography | MI:0114 | 75 |
| Motif Deletion | MI:0573 | 53 |
| Competitive binding assay | MI:0405 | 39 |
| Protein overlay assay | MI:0049 | 38 |
| Colocalization by immunostaining | MI:0022 | 37 |
| Nuclear magnetic resonance | MI:0077 | 30 |
| Isothermal titration calorimetry (ITC) | MI:0065 | 29 |
| Protein truncation mutants | | 28 |
| Immunological detection and localization | MI:0422 | 27 |
| Mass spectrometry | MI:0427 | 24 |
| Motif transplantation | | 20 |
| Western blot | MI:0113 | 19 |
| Radiolabelling/pulse chase | MI:0517 | 19 |
| Surface plasmon resonance | MI:0107 | 15 |
a Identifier for the HUPO PSI-MI exchange standard entry that either defines or encompasses the listed experiment (92).