Toby J Gibson, Holger Dinkel, Kim Van Roey, Francesca Diella.
Abstract
It has become clear in outline though not yet in detail how cellular regulatory and signalling systems are constructed. The essential machines are protein complexes that effect regulatory decisions by undergoing internal changes of state. Subcomponents of these cellular complexes are assembled into molecular switches. Many of these switches employ one or more short peptide motifs as toggles that can move between one or more sites within the switch system, the simplest being on-off switches. Paradoxically, these motif modules (termed short linear motifs or SLiMs) are both hugely abundant and difficult to research. So despite the many successes in identifying short regulatory protein motifs, it is thought that only the "tip of the iceberg" has been exposed. Experimental and bioinformatic motif discovery remain challenging and error prone. The advice presented in this article is aimed at helping researchers to uncover genuine protein motifs, whilst avoiding the pitfalls that lead to reports of false discovery.
Year: 2015 PMID: 26581338 PMCID: PMC4652402 DOI: 10.1186/s12964-015-0121-y
Source DB: PubMed Journal: Cell Commun Signal ISSN: 1478-811X Impact factor: 5.712
Fig. 1 Linear motifs in T cell signalling complex assembly. Four structures of SLiM-domain complexes are combined to show the involvement of motifs in assembly of the T cell receptor signalling complex around the adaptor molecule Linker for activation of T-cells family member 1 (LAT). A phosphorylated SH2 domain-binding motif (YxN) in LAT (189-REYVNV-194, shown in dark blue with the phosphorylated Y191 in red) recruits GRB2-related adapter protein 2 (GADS) via its SH2 domain (grey) (bottom left) (PDB:1R1Q) [79], while the C-terminal SH3 domain of GADS (grey) binds an SH3 domain-binding motif in Lymphocyte cytosolic protein 2 (SLP-76) (233-PSIDRSTKP-241, shown in green) (bottom right) (PDB:2D0N) [80]. Further components are recruited to the complex through other motifs in SLP-76, including an SH3 domain-binding motif (185-QPPVPPQRPM-194, shown in green) that interacts with the SH3 domain of 1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase gamma-1 (PLCG1) (purple) (top right) (PDB:1YWO) [81], and an SH2 domain-binding motif (143-ADYEPP-148, shown in green with the phosphorylated Y145 in red) binding to the SH2 domain of Tyrosine-protein kinase ITK/TSK (ITK) (light blue) (top left) (PDB:2ETZ) [82]
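As an illustration of how a motif pattern such as YxN can be located in a sequence, the following is a minimal sketch that scans a peptide with a regular expression and reports residue numbers. The function name `find_motif` and its `offset` parameter are hypothetical and not taken from any of the resources discussed here; the only data used is the LAT peptide 189-REYVNV-194 given in the caption. A pattern match alone says nothing about phosphorylation state, conservation, or accessibility.

```python
import re


def find_motif(sequence, pattern, offset=1):
    """Return (residue_number, matched_text) for every match of `pattern`.

    `pattern` is a Python regular expression, e.g. "Y.N" for YxN.
    `offset` is the residue number of the first character of `sequence`.
    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + offset, m.group(1)) for m in regex.finditer(sequence)]


# LAT peptide from the caption: residues 189-194
hits = find_motif("REYVNV", "Y.N", offset=189)
print(hits)  # [(191, 'YVN')]
```

The reported position 191 agrees with the phosphorylated Y191 named in the caption. Such a scan only yields candidates, which still need the bioinformatic and experimental evaluation the article describes.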
Fig. 2 Linear motifs in T cell receptor signalling pathway hsa04660. “T cell receptor signaling pathway” obtained from KEGG [83] and redrawn using Cytoscape [84] and KEGGScape [85]. Colour coding illustrates the use of linear motifs according to instances annotated in ELM [26] as follows: docking motifs in blue; degradation motifs (degrons) in yellow; ligand-binding motifs in green; sites for post-translational modification in pink; and targeting/trafficking motifs in orange. Note that only motif interactions annotated in the ELM resource have been considered for colouring: other functionality is not coloured
Fig. 3 Example of a protein containing multiple linear motifs. Depicted is the output of an ELM [26] query using the p21Cip1 Cyclin-dependent kinase inhibitor 1 (Uniprot-Acc:P38936). Upper rows contain annotations/predictions from phospho.ELM [86], SMART [52]/PFAM [51] domain content, and GlobPlot [87]/IUPred [54] disorder predictors. Each subsequent line represents a linear motif class as annotated by ELM, with the name on the left side and the instances found depicted graphically on the right side. The already known motifs are annotated (coloured in dark red); the remaining matches (coloured in shades of blue) are candidates of varying likelihood to be real, with one measure being how conserved they are in proteins from other species
Bioinformatics tools useful for motif discovery. Each resource is listed with its name, weblink, main reference, and short description
| Category | Resource | Description |
|---|---|---|
| Motif Resources/Predictors | ELM | To explore candidate functional sites in proteins and to learn about known motifs |
| Motif Resources/Predictors | MiniMotif Miner | To analyse protein queries for the presence of short contiguous peptide motifs that have a known function in at least one other protein |
| Motif Resources/Predictors | Scansite | To identify short protein sequence motifs that are recognized by modular signalling domains, phosphorylated by protein Ser/Thr- or Tyr-kinases or mediate specific interactions with proteins or phospholipids |
| Motif Resources/Predictors | PePSite | To predict binding of a given peptide to a protein structure |
| Motif Discovery | DILIMOT | To find short, over-represented peptide patterns/linear motifs in a set of proteins |
| Motif Discovery | SLiMFinder | To find novel, significantly over-represented, short protein motifs |
| Sequence Retrieval/Analysis | BLAST | To identify regions of local similarity between nucleotide or protein sequences, which can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families |
| Sequence Retrieval/Analysis | BioMART | Provides free software and data services to foster scientific collaboration and facilitate the scientific discovery process; the project adheres to the open source philosophy that promotes collaboration and code reuse |
| Alignment | Clustal | General purpose DNA or protein multiple sequence alignment program |
| Alignment | MAFFT | Multiple alignment program for amino acid or nucleotide sequences |
| Alignment | Jalview | Lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment |
| Phylogenetic Tree/Orthology | TreeFam | Database composed of phylogenetic trees inferred from animal genomes, providing orthology/paralogy predictions as well as the evolutionary history of genes |
| Phylogenetic Tree/Orthology | EggNog | Database of orthologous groups of genes annotated with functional categories derived from COG/KOG categories |
| Phylogenetic Tree/Orthology | COG | Database providing phylogenetic classification of proteins encoded in complete genomes |
| Motif Conservation | Conscore | Linear motif conservation filter |
| Motif Conservation | Consurf | To identify functional regions in proteins |
| Motif Conservation | SLiMPrints | |
| Protein Domains | SMART | To identify and annotate genetically mobile domains and to analyse domain architectures |
| Protein Domains | PFAM | Database providing a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models |
| Protein Domains | InterPro | To classify sequences into protein families and to predict the presence of important domains and sites |
| Structure/Disorder | PDB | Single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids |
| Structure/Disorder | PDBsum | Pictorial database providing an at-a-glance overview of the contents of each 3D structure deposited in PDB |
| Structure/Disorder | IUPred | To predict intrinsically unstructured regions in proteins |
| Structure/Disorder | D2P2 | Community resource providing pre-computed disorder predictions on a large library of proteins from completely sequenced genomes |
| Structure/Disorder | MobiDB | Centralized resource for annotations of intrinsic protein disorder |
| Structure/Disorder | DISPROT | Database providing information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part |
| Protein-Protein Interactions | BioGRID | Online interaction repository with data compiled through comprehensive curation efforts |
| Protein-Protein Interactions | STRING | Provides known and predicted protein-protein interactions |
| Protein-Protein Interactions | IntAct | Freely available, open source database system and analysis tools for molecular interaction data; all interactions are derived from literature curation or direct user submissions |
| Protein-Protein Interactions | PiSITE | Web-based database of protein interaction sites, providing information on interaction sites of a protein from multiple PDB entries |
| Protein-Protein Interactions | DOMINO | Database of domain-peptide interactions |
| Protein-Protein Interactions | ComPPI | Cellular compartment-specific database for protein-protein interaction network analysis |
| Protein-Protein Interactions | iELM | Web server to explore short linear motif-mediated interactions |
| Protein-Protein Interactions | KEGG | Database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies |
| Protein-Protein Interactions | CORUM | Collection of experimentally verified mammalian protein complexes |
| Subcellular Localization | CELLO2GO | Web server for protein subcellular localization prediction with functional gene ontology annotation |
| Subcellular Localization | LocDB | Database that collects experimental annotations for the subcellular localization of proteins in Homo sapiens and Arabidopsis thaliana |
| Subcellular Localization | GeneOntology | Collaborative effort to address the need for consistent descriptions of gene products across databases |
| Subcellular Localization | Compartments | Database of protein subcellular localization data manually curated from the literature or obtained from high-throughput microscopy-based screens |
| Subcellular Localization | LOCATE | Curated database providing data that describe the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set |
| Tissue Expression | Protein Atlas | Publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 44 different normal human tissues and 20 different cancer types, as well as 46 different human cell lines |
| Tissue Expression | TISSUES | Resource integrating evidence on tissue expression from manually curated literature, proteomics and transcriptomics screens, and automatic text mining |
| Generic Resources | UniProt | Manually annotated, non-redundant protein sequence and sequence isoform database; related information about the biological function of proteins is curated from the scientific literature |
| Generic Resources | Antibodypedia | Open-access database of publicly available antibodies against human protein targets; contains data on antibody efficacy in a range of biochemical and cell biological techniques |
| Generic Resources | IUPAC | Serves to advance the worldwide aspects of the chemical sciences and to contribute to the application of chemistry in science |
Fig. 4 Multiple sequence alignment detail for the C-termini of LAT proteins. The three most conserved regions are the critical YxN motifs that bind the GRB2/GADS SH2 domains (see Fig. 1) to assemble the signalling complex. The residue colours are Clustal defaults with less conserved positions faded. LAT protein sequences from representative species were aligned with Clustal Omega [49]. Figure prepared with Jalview [48]
Fig. 5 Pipeline for SLiM discovery. Once a candidate sequence location has been identified in a protein, it is evaluated by applying available bioinformatics resources. If the sequence is conserved and accessible for interaction, and other information is compatible with the motif function, it may pass to experimentation. Both in vitro and in-cell experiments should be undertaken (see Fig. 6 for expanded experimental options). Given a positive outcome, the research may then be published. On occasion, it may also be of value to publish a negative outcome
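The bioinformatic gate described in this caption can be written down as a small decision function. This is only a sketch under stated assumptions: the class `CandidateMotif`, its three boolean fields, and the returned labels are hypothetical names introduced for illustration, and real evaluation of conservation, accessibility and context is a matter of judgement rather than a yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class CandidateMotif:
    # Hypothetical summary flags for one candidate site
    conserved: bool            # conserved in related sequences
    accessible: bool           # accessible for interaction
    context_compatible: bool   # other information fits the motif function


def next_step(candidate: CandidateMotif) -> str:
    """Return 'experiment' only if all three criteria from the caption hold."""
    if candidate.conserved and candidate.accessible and candidate.context_compatible:
        return "experiment"
    return "re-evaluate"


print(next_step(CandidateMotif(True, True, True)))   # experiment
print(next_step(CandidateMotif(True, False, True)))  # re-evaluate
```

Passing this gate means only that a candidate may go forward to the in vitro and in-cell experiments; it is not evidence of function in itself.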
Fig. 6 Key experimental approaches to investigate linear motifs. Best-practice experiments to study short linear motifs can be classified into “general” and “motif type-specific”. We highlight a core set of experiments that have proven useful for investigating short linear motif functionality. See Additional file 1: Table S1 for the list of experiments used in motif discovery, as extracted from the ELM annotation. PSI-MI terms have been used throughout this diagram wherever possible [78]
Fig. 7 Example of a discovery process mapped onto the pipeline in Fig. 5. Novel motifs were discovered in KANSL1 and KANSL2, binding to different surface locations of the WDR5 protein [69]. Prior knowledge of the NSL protein complex obviated the use of some of the bioinformatics pipeline: these parts are blurred
Rule of thumb quality scoring scheme
| Score | Evidence |
|---|---|
| −1 | Contradictory evidence |
| 0 | No evidence |
| 1 | Indirect supporting evidence |
| 2 | Direct supporting evidence for binding but not for in-cell function |
| 2 | Evidence in-cell that proteins associate, but direct supporting evidence for motif binding in vitro is lacking |
| 3 | Direct supporting evidence for both binding and in-cell function |
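The scoring scheme can be expressed as a short function, which makes explicit that the two distinct situations scoring 2 are treated alike. This is a sketch under assumptions: the function name and its four boolean parameters are hypothetical, in-cell evidence is reduced to a single flag, and letting contradictory evidence take precedence over everything else is an interpretation that the table itself does not state.

```python
def motif_quality_score(contradictory=False, indirect=False,
                        direct_binding=False, in_cell=False):
    """Map evidence flags to the rule-of-thumb score (-1 to 3).

    Assumption: contradictory evidence overrides any supporting evidence.
    """
    if contradictory:
        return -1
    if direct_binding and in_cell:
        return 3          # binding and in-cell evidence both present
    if direct_binding or in_cell:
        return 2          # only one of the two direct lines of evidence
    if indirect:
        return 1          # indirect support only
    return 0              # no evidence


print(motif_quality_score(direct_binding=True))                # 2
print(motif_quality_score(direct_binding=True, in_cell=True))  # 3
```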