Victor Neduva, Robert B Russell.
Abstract
Discovery of protein functional motifs is critical in modern biology. Small segments of 3–10 residues play key roles in protein interactions, post-translational modifications and trafficking. DILIMOT (DIscovery of LInear MOTifs) is a server for the prediction of these short linear motifs within a set of proteins. Given a set of sequences sharing a common functional feature (e.g. interaction partner or localization), the method finds statistically over-represented motifs likely to be responsible for it. The input sequences are first passed through a set of filters to remove regions unlikely to contain instances of linear motifs. Motifs are then found in the remaining sequence and ranked according to a statistic that measures over-representation and conservation across homologues in related species. The results are displayed via a visual interface for easy perusal. The server is available at http://dilimot.embl.de.
Year: 2006 PMID: 16845024 PMCID: PMC1538856 DOI: 10.1093/nar/gkl159
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The server process and output. (A) Schematic showing how submitted sequences are filtered, motifs found and arranged into a ranked list sorted by P (left). When the species is provided, sequences are assigned to orthologous groups, species-specific probabilities for over-represented motifs are calculated (coloured box) and the list is re-sorted by SCONS (right). (B) Example of server output. A list of putative motifs is reported in an interactive table (left), which gives general details for each of them. Clicking on each motif launches an additional page (right) showing the sequences containing the motif, where the motif is found in them and the degree to which the motif is conserved in related species. Motif locations (red bars) and other features found in the sequences, such as domains, are shown graphically and detailed below each image.
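The ranking step in panel A can be illustrated with a small sketch. It is not the server's actual statistic, which this record does not spell out. It assumes that each candidate motif comes with a known background probability of occurring in a random sequence, and it scores over-representation with a plain binomial tail. The names `binomial_tail`, `rank_motifs` and the `p_bg` values are hypothetical.

```python
import re
from math import comb


def binomial_tail(n_hits, n_seqs, p_bg):
    """P(X >= n_hits) for X ~ Binomial(n_seqs, p_bg)."""
    return sum(
        comb(n_seqs, k) * p_bg ** k * (1 - p_bg) ** (n_seqs - k)
        for k in range(n_hits, n_seqs + 1)
    )


def rank_motifs(sequences, candidates):
    """Rank candidate motifs by an over-representation P-value.

    candidates maps a regular expression to an ASSUMED background
    probability that a random sequence contains it (illustrative only).
    Returns (pattern, n_hits, p_value) tuples, most significant first.
    """
    ranked = []
    for pattern, p_bg in candidates.items():
        # Count sequences (not occurrences) that contain the motif.
        n_hits = sum(1 for seq in sequences if re.search(pattern, seq))
        p_value = binomial_tail(n_hits, len(sequences), p_bg)
        ranked.append((pattern, n_hits, p_value))
    return sorted(ranked, key=lambda row: row[2])
```

With toy input such as `rank_motifs(["AASKIPAA", "GGSRIPGG", "TTSQIPTT", "AAAAAA"], {"S.IP": 0.05, "AA": 0.9})`, the rare pattern found in three of four sequences ranks above the common one. The conservation-based re-sorting by SCONS is not modelled here.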
Identification of linear motifs in various protein sets
| Domain→motif | Source | 1st correct motif (rank) | SCONS | N/M |
|---|---|---|---|---|
| EB1→IP | Manual | SxIP(1) | N/A* | 7/9 |
| Nuclear localization | LifeDB | KxxKxK(1) | 9.4 × 10⁻³⁴ | 9/27 |
| PKB→RxRxx(ST) | Phospho.ELM | RxRxxS(1) | 1.7 × 10⁻⁶⁴ | 17/28 |
| CDK→(ST)Px(KR) | Phospho.ELM | SPxR(2) | 8.5 × 10⁻³¹ | 13/42 |
| PKA→(KR)(KR)x(ST) | Phospho.ELM | RRxS(1) | 0.0 | 36/77 |
| CK-2→(ST)xxE | Phospho.ELM | SDxE(4) | 1.5 × 10⁻⁶⁵ | 19/70 |
Source indicates where the set of proteins came from (see text). N is the number of proteins in the set containing the motif, M is the number of sequences in the set.
*SCONS could not be calculated for EB1 as the proteins came from different species. The corresponding P-value is 2.1 × 10⁻¹⁰.
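The motif notation used in the table (`x` for any residue, parentheses for alternatives) maps directly onto regular expressions, and the N/M column can be reproduced by counting matching sequences. The following is a minimal sketch under that reading of the notation; `motif_to_regex` and `coverage` are hypothetical helper names and are not part of the server.

```python
import re


def motif_to_regex(motif):
    """Translate table notation to a regex: 'x' -> '.', '(ST)' -> '[ST]'."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            # A parenthesised group lists the residues allowed at one position.
            j = motif.index(")", i)
            parts.append("[" + motif[i + 1:j] + "]")
            i = j + 1
        elif ch == "x":
            # Lower-case x is the wildcard position.
            parts.append(".")
            i += 1
        else:
            parts.append(ch)
            i += 1
    return "".join(parts)


def coverage(motif, sequences):
    """Return (N, M): sequences containing the motif, and set size."""
    pattern = re.compile(motif_to_regex(motif))
    n = sum(1 for seq in sequences if pattern.search(seq))
    return n, len(sequences)
```

For example, `motif_to_regex("RxRxx(ST)")` gives `R.R..[ST]`, and `coverage("SxIP", ["MASKIPLL", "GGGG", "TSRIPQ"])` gives `(2, 3)` for these made-up sequences.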
Figure 2. The EB1 motif SxIP detected by the server. (A) A sequence logo (27) for the EB1 binding motif, generated using all instances of the motif in the input set. (B) Examples of EB1 binding proteins from the input set (represented as boxes) and multiple alignments of putative motif-containing regions. Dark blue regions in the boxes denote those removed by the domain and redundancy filters. A known EB1 binding region (in APC) lies at the C-terminus of a Pfam domain. To avoid its removal, we simply cut the sequence down to this region alone (switching the Pfam filter off will have a similar effect). Sequences for the motif-containing region are shown aligned to the best homologues in closely related species. Amino acids in the alignments are coloured according to residue type: blue, positive; red, negative; light-blue, small; yellow, hydrophobic; green, aromatic; magenta, polar; orange, proline. Positions within the predicted motif are denoted by red triangles. Species abbreviations: Hsa, H.sapiens; Mmu, M.musculus; Rno, R.norvegicus; Gga, G.gallus; Fru, F.rubripes; Cgi, Candida glabrata; Kla, Kluyveromyces lactis; Kwa, Kluyveromyces waltii; Ego, Eremothecium gossypii; Sce, Saccharomyces cerevisiae; Dha, Debaryomyces hansenii.
Figure 3. Features of known linear motifs. (A) Distributions of length (red), number of specified (i.e. non-‘x’; green) and invariant (i.e. a single specific residue; blue) positions for 120 known linear motifs extracted from the ELM database (7). Note that four motifs with lengths of 13–18 are not shown in the first (red) plot for clarity. (B) Degree to which residues are over-represented in known motifs. Numbers show the ratio of the abundance of the residue within the 120 motifs from ELM to the abundance in globular domains as computed from the Protein Data Bank [PDB; (28)]. ‘ALL’ includes all 120, ‘LIG’ are the 66 ligand binding, ‘TRG’ the 16 targeting and ‘MOD’ the 30 modification site motifs. For 7 of 40 residues in the latter two categories there were too few counts to obtain a confident measurement (i.e. <5); these are denoted by an asterisk. Note that we have not included a fourth ELM category CLV, which includes protein cleavage sites, as there were too few examples to compute meaningful numbers. Colour scheme: red, strongly favoured in linear motifs compared to globular proteins; orange, moderately favoured; light-blue, moderately disfavoured; blue, strongly disfavoured.
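The ratio plotted in panel B can be sketched as a relative frequency in motif positions divided by a relative frequency in globular domains. The snippet below is an illustration only. It assumes that the "<5" cut-off applies to the raw count of a residue among motif positions, and it returns `None` where the figure would show an asterisk. `residue_enrichment` and its arguments are hypothetical names.

```python
from collections import Counter


def residue_enrichment(motif_residues, globular_residues, min_count=5):
    """Ratio of residue frequency in motifs to frequency in globular domains.

    Both inputs are strings (or iterables) of one-letter residue codes.
    Residues seen fewer than min_count times in the motif set get None,
    since too few counts give no confident measurement (assumed rule).
    """
    motif_counts = Counter(motif_residues)
    glob_counts = Counter(globular_residues)
    n_motif = sum(motif_counts.values())
    n_glob = sum(glob_counts.values())

    ratios = {}
    for aa, g in glob_counts.items():
        m = motif_counts.get(aa, 0)
        if m < min_count:
            ratios[aa] = None
        else:
            ratios[aa] = (m / n_motif) / (g / n_glob)
    return ratios
```

A ratio above 1 corresponds to a residue favoured in linear motifs (red or orange in the figure), and a ratio below 1 to a disfavoured one (light-blue or blue).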