Norman E Davey, Martha S Cyert, Alan M Moses.
Abstract
Short sequence motifs are ubiquitous across the three major types of biomolecules: hundreds of classes and thousands of instances of DNA regulatory elements, RNA motifs and protein short linear motifs (SLiMs) have been characterised. The increase in complexity of transcriptional, post-transcriptional and post-translational regulation in higher eukaryotes has coincided with a significant expansion of motif use. But how did the eukaryotic cell acquire such a vast repertoire of motifs? In this review, we curate the available literature on protein motif evolution and discuss the evidence that suggests SLiMs can be acquired by mutations, insertions and deletions in disordered regions. We propose a mechanism of ex nihilo SLiM evolution - the evolution of a novel SLiM from "nothing" - adding a functional module to a previously non-functional region of protein sequence. In our model, hundreds of motif-binding domains in higher eukaryotic proteins connect simple motif specificities with useful functions to create a large functional motif space. Accessible peptides that match the specificity of these motif-binding domains are continuously created and destroyed by mutations in rapidly evolving disordered regions, creating a dynamic supply of new interactions that may have advantageous phenotypic novelty. This provides a reservoir of diversity to modify existing interaction networks. Evolutionary pressures will act on these motifs to retain beneficial instances. However, most will be lost on an evolutionary timescale as negative selection and genetic drift act on deleterious and neutral motifs respectively. In light of the parallels between the presented model and the evolution of motifs in the regulatory segments of genes and (pre-)mRNAs, we suggest our understanding of regulatory networks would benefit from the creation of a shared model describing the evolution of transcriptional, post-transcriptional and post-translational regulation.
Year: 2015 PMID: 26589632 PMCID: PMC4654906 DOI: 10.1186/s12964-015-0120-z
Source DB: PubMed Journal: Cell Commun Signal ISSN: 1478-811X Impact factor: 5.712
Fig. 1 Conservation of functionally important motifs and the proliferation of motifs through ex nihilo motif acquisition. a Alignment of the PCNA-binding PIP box motif of Flap endonuclease 1 (FEN1) showing the motif conservation spanning over 3 billion years of evolution across all Eukaryotes and Archaea (representative species - Thermococcus kodakaraensis) [24, 25, 108]. b An alignment of a representative selection of PxIxIT motif instances: Nuclear factor of activated T-cells, cytoplasmic 1 (NFATC1) [109], A-kinase anchor protein 5 (AKAP5) [110] and Potassium channel subfamily K member 18 (KCNK18) [111] from human; Phosphatidylinositol 4,5-bisphosphate-binding protein SLM1 (Slm1) [112], Protein HPH1 (Hph1) [113] and Transcriptional regulator CRZ1 (Crz1) from yeast [114]; and Ankyrin repeat domain-containing protein A238L from African swine fever virus (ASFV) [115]. Each motif instance occurs in a non-homologous protein (see panel c) and the most likely mode of acquisition for these functional modules is by ex nihilo evolution through random mutation. The alignment shows a clear preference for specific residues at a given position in the peptide, with each position allowing a different level of degeneracy. These preferences reflect the preferences of the Calcineurin PxIxIT binding pocket (see panel d). c The modular architecture of the proteins from panel b showing the distinct organisation of the non-homologous proteins. Domains (grey), transmembrane regions (green) and PxIxITs (blue) are shown. Proteins are aligned around the PxIxIT instances. d Structure of the PxIxIT binding pocket of the human calcineurin catalytic A subunit bound to the PxIxIT of African swine fever virus A238L (PDB ID: 4F0Z) [115]. The peptide binds by beta-augmentation and the defined residues at P1, P3 and P5 sit in a conserved hydrophobic pocket, explaining the strong preferences at these positions in known PxIxIT instances (light blue surface on the domain denotes hydrophobic residues) [109, 110, 116].
Fig. 2 Examples of ex nihilo motif gain and motif loss. a The N-terminus of SHOC2 contains an S2->G mutation in multiple Noonan-like syndrome patients that “knocks in” an N-myristoylation motif [26]. Blue bold residues signify the specificity determining residues of the motif. b A PxIxIT calcineurin-docking motif in S. cerevisiae Serine/threonine-protein kinase ELM1 (Elm1) likely evolved in the common ancestor of S. cerevisiae and S. paradoxus [27]. c A human-centric phylogeny of E3 ubiquitin-protein ligase Mdm2 (Mdm2). An RxL Cyclin docking motif was gained in the rodent Mdm2 proteins as a result of a four amino acid deletion (grey region) [117]. Green bold residues signify the position of the residues corresponding to the specificity determining residues of the motif before the SDSI deletion. d Example of motif loss contributing to functional divergence post-duplication. S. cerevisiae ohnologues Ace2 and Swi5 were both retained after the whole genome duplication (WGD) but have functionally diverged post-duplication, in part, by the loss of a serine/threonine-protein kinase Cbk1 docking site and two Cbk1 phosphosites in the Swi5 lineage. A representative example of a single pre-WGD homologue in Lachancea waltii shows the modular architecture of the Ace2/Swi5 ancestor [36]. e Example of motif gain contributing to functional divergence post-duplication. The Cyclin A and Cyclin B regulatory subunits of the CDK family protein kinases share a common ancestor that contained a D box motif to recruit the APC/C E3 ubiquitin ligase, promoting Cyclin destruction during mitosis. Post-duplication, the Cyclin A lineage gained an ABBA motif allowing Cyclin A to be destroyed earlier than Cyclin B during prometaphase [40]. f The accumulation of the Nx[TS] glycosylation motifs in hemagglutinin of Influenza H3N2 over the last 40 years. The number of glycosylation motifs has increased from two to seven, tuning the trade-off between host receptor binding and immune evasion [118].
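The gain and loss events in this figure amount to a short consensus starting or ceasing to match a sequence. The sketch below shows one way to scan for such matches. The function name `find_motif` and both toy sequences are hypothetical and introduced here for illustration only; they are not real hemagglutinin or Mdm2 sequences. The consensus strings Nx[TS] and RxL are taken from the caption, with a lowercase `x` treated as any residue.

```python
import re

def find_motif(sequence, consensus):
    """Return 0-based start positions of all matches, including overlapping ones.

    A lowercase 'x' in the consensus is treated as a wildcard position.
    """
    pattern = consensus.replace("x", ".")
    # A lookahead is used so that overlapping matches are all reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]

# Hypothetical toy peptide with three Nx[TS] sites (not a real sequence).
toy_glyco = "MKNGTALNKSPQNATW"
print(find_motif(toy_glyco, "Nx[TS]"))   # [2, 7, 12]

# Hypothetical toy peptide: removing four residues brings R and L into
# RxL register, mimicking motif gain by deletion.
before = "AKRQSDSILQ"
after = before.replace("SDSI", "", 1)    # "AKRQLQ"
print(find_motif(before, "RxL"))         # []
print(find_motif(after, "RxL"))          # [2]
```

A consensus match only marks a candidate: as the review argues, a functional instance also needs to be accessible, typically in a disordered region, and to sit in a suitable context.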
Table of characterised examples of motif gain and loss modulating protein function
| Species protein (Gene) | Motif | Sequence^a | Evolution | Function |
|---|---|---|---|---|
| Ex nihilo motif acquisition | | | | |
| | N-myristoylation motif | 1 | Allele with a single | N-myristoylation of SHOC2 |
| | ABL1 SH3 domain binding motif | 69PPV | | Recruitment of ABL1 to CRK |
| | SCF Cdc4 degrons | 364QVP | | Degradation of Cdc6 by the SCF E3 Ub ligase |
| | Cyclin binding motif | 179KKRR | Acquisition in the rodent lineage via a four amino acid deletion […] | Recruitment of and phosphorylation by CDK2 |
| | Groucho interacting motif | 198QAS | | Recruitment of groucho |
| | Calcineurin docking motif | 465KVT | | Recruitment of and dephosphorylation by calcineurin |
| | N-glycosylation motifs | Five NxT sites | Strains spanning the last 40 years have shown gradual acquisition of five novel N-glycosylation sites […] | Increased immune system evasion and decreased infectivity |
| Motif gain/loss post duplication | | | | |
| | APC/C Cdc20 binding KEN box | 27ETQ | Lost in the Bub1 functional homologues after Mad3-like/Bub1-like duplications […] | Loss of APC/C inhibitory function |
| | Cbk1 docking motif | 280NGG | Lost in Swi5 after Swi5/Ace2 duplication […] | Loss of Cbk1 regulated localisation |
| | APC/C CDC20 binding ABBA motif | 96QPA | | Early degradation during an active spindle assembly checkpoint |
| Tuning of motif specificity/affinity | | | | |
| | Sho1 SH3 domain binding motif | 90IVN | Only binds the SH3 domain of yeast Sho1 but can be recognised by multiple non-yeast SH3 domains […] | Specific interaction with Sho1 |
| | Pex14 SH3 domain binding motif | 84AMP | Promiscuous | Promiscuous in vitro interactions |
| Ex nihilo co-operative/competitive interface evolution | | | | |
| | SCF Cdc4 degron | 90TGT | | Degradation of Eco1 by the SCF E3 Ub ligase |
| | Mck1 modification site | 91GTI | | |
| | Cdc7 modification site | 95PLN | | |
| | Cdk1 modification site | 96LNS | | |
| | Cdk1 modification sites | 758SKR | | Regulation of nucleocytoplasmic shuttling of Mcm3 |
| | | 762PQK | | |
| | MAPK D-site | 97HSL | Acquisition of MAPK D-site followed by | Competitive recognition of substrate by kinase and phosphatase |
| | Calcineurin PxIxIT | 103RVP | | |
| | Cyclin A docking site | 873 | Co-evolution of PP1 and Cyclin A recognition motifs […] | Competitive recognition of substrate by kinase and phosphatase |
| | PP1 binding RVxF | 873K | | |
| Motif gain/loss post de novo gene birth | | | | |
| Human immunodeficiency virus type 1 (HIV-1) Protein Vpu (vpu) | SCF β-TrCP degron | 48RAE | | Hijacking of the host SCF-β-TrCP E3 Ub ligase |

^a Sequence overlapping the motif; the major specificity- and affinity-determining residues of the motif are underlined and in bold
Fig. 3 The relationship between compact degenerate motifs, occurrence likelihoods and ex nihilo evolution. a The homeodomain of Drosophila Segmentation polarity homeobox protein engrailed (en) bound to a TAATTA subsite [119]. b The RRM of Transformer-2 protein homolog beta (TRA2B) bound to an AGAA exonic splicing enhancer (ESE) motif [120]. c The SH3 domain of Adapter molecule crk (CRK) bound to a PxxP motif from Rap guanine nucleotide exchange factor 1 (RAPGEF1) [121]. d The number of nucleotides or residues expected between instances of a motif occurring by chance in a sequence. A non-degenerate x-mer nucleotide motif instance would be expected to occur once every 4^x nucleotides (e.g. a 6-mer every 4^6 or 4,096 nucleotides) and a non-degenerate x-mer protein motif would be expected to occur once every 20^x amino acids (e.g. a 3-mer peptide motif every 20^3 or 8,000 amino acids). The disparity in the length of the regions that contain these motifs (DNA, (pre-)mRNA and proteins) means that the number of random instances will vary by several fold across the three classes of biomolecule. Ranges are illustrative and therefore approximate; they are based on over-predictive consensuses (see motifs below) and use equal nucleotide (1/4) and amino acid (1/20) frequencies. Protein SLiMs: proline-directed phosphosite ([ST]P) [29]; D box degron (RxxLxx[ILMVK]) [69]; PxIxIT Calcineurin docking motif (Px[IVLF]x[IVLF][TSHEDQNKR]) [27]; SH3 domain-binding motif (PxxPx[KR]) [32]; PTAP late domain motif (P[TS]AP) [122]; and Fbw7 SCF degron ([ILMVP]TPxx[ST]) [123]. RNA motifs: a single RRM binding site (4 nucleotides) [124]; a single Zinc Finger recognition site (3 nucleotides) [125]; and a miRNA seed region (6–8 nucleotides) [126]. DNA motifs: a single Zinc Finger recognition site (3 nucleotides) [127]; Homeobox domain (TAAT[GT][GT]) [128]; CAAT box ([TC]GATTGG[TC][TC][AG]) [129]; and P53 regulatory element (C[AT][AT]GNNNNNNC[AT][AT]G) [130]. e Simple model for motif acquisition by DNA, RNA and proteins (see text for details of model). f Potential mechanism of ex nihilo motif evolution illustrated using a hypothetical LxCxE pRB-binding motif (see text for details of model).
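The chance-occurrence estimates in panel d can be reproduced with simple arithmetic. The sketch below is illustrative only. The function name `expected_spacing` and its parameters are assumptions introduced here. It uses the equal-frequency simplification stated in the caption, treats positions as independent, and ignores sequence ends and overlapping matches.

```python
import re
from fractions import Fraction

def expected_spacing(consensus, alphabet_size, wildcard):
    """Expected number of positions between chance matches of a consensus.

    Assumes equal letter frequencies (1/4 for nucleotides, 1/20 for amino
    acids). `wildcard` is the any-letter symbol: 'x' in the protein
    consensuses and 'N' in the DNA consensuses quoted in the caption.
    """
    # Split the consensus into bracketed classes and single letters.
    tokens = re.findall(r"\[[A-Z]+\]|[A-Za-z]", consensus)
    probability = Fraction(1)
    for token in tokens:
        if token == wildcard:
            continue  # a wildcard position matches anything
        allowed = len(token) - 2 if token.startswith("[") else 1
        probability *= Fraction(allowed, alphabet_size)
    return 1 / probability

print(expected_spacing("TAATTA", 4, "N"))        # 4096
print(expected_spacing("KEN", 20, "x"))          # 8000
print(expected_spacing("[ST]P", 20, "x"))        # 200
print(expected_spacing("PxxPx[KR]", 20, "x"))    # 4000
print(expected_spacing("TAAT[GT][GT]", 4, "N"))  # 1024
```

The wildcard is passed explicitly because `N` means any nucleotide in the DNA consensuses but asparagine in the protein ones. Degenerate positions raise the match probability, which is why a six-residue consensus such as PxxPx[KR] is expected more often by chance than a fully defined three-residue peptide.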
Fig. 4 Examples of motif-binding pocket evolution. a Representative selection of motif-binding pockets in the WD40 repeat fold demonstrating the simplicity of motif-binding pocket birth. Each pocket has evolved independently and subsequently multiple proteins (representative examples listed) have acquired the motifs necessary to recruit the various WD40 repeat-containing proteins. The figure includes: an ABBA motif (dark blue – consensus [ILV][FHY]x[DE]), a D box degron motif (red – consensus RxxLxx[ILVK]) and a KEN box degron motif (yellow – consensus KEN) from APC/C-CDH1 modulator 1 (Acm1) bound to the WD40 domain of the APC/C activator protein CDH1 (Cdh1) [69]; an Fbw7 degron motif (orange – consensus pTPxxpS) from Cyclin E bound to the WD40 domain of the F-box/WD repeat-containing protein 7 (FBW7) [123]; a β-TrCP1 degron motif (light blue – consensus DpSGxxpS) from β-Catenin bound to the WD40 domain of the F-box/WD repeat-containing protein 1A (BTRC) [131]; and an EH1 motif (green – consensus [FHY]x[IVM]xx[ILM][ILMV]) bound to the WD40 domain of the Transducin-like enhancer protein 1 (TLE) [132]. See the ELM resource for more details and examples [9]. b Example of specificity divergence after motif-binding domain duplication. Homologous pockets on the protein phosphatase 1 (PP1) and calcineurin holoenzymes bind RVxF and PxIxIT motifs, respectively. The structure shows the canonical PP1 binding sequence RVxF motif (light blue) of myosin phosphatase targeting subunit (MYPT1) bound to PP1 (grey). The PxIxIT of African swine fever virus A238L (A238L) (orange) is superimposed, showing the shared but diverged binding pocket [115]. The valine and phenylalanine of the RVxF motif sit in the hydrophobic P1 and P3 regions occupied by the proline and first isoleucine of the PxIxIT binding pocket (see Fig. 1d), but the additional specificity/affinity determinants of the two motifs utilise different surfaces of the domain and do not overlap [50, 133].
Table of several classical SLiM-binding domain families, and representative DNA and RNA motif-binding domain families^a
| Domain type | Domain | Ath | Ddi | Sce | Dme | Hsa |
|---|---|---|---|---|---|---|
| SLiM-binding | SH3 domain | 5 | 29 | 23 | 59 | 204 |
| | PDZ domain | 17 | 1 | 2 | 67 | 145 |
| | SH2 domain | 2 | 13 | 0 | 32 | 110 |
| | WW domain | 9 | 5 | 6 | 21 | 41 |
| | Kinase domain | 1066 | 309 | 132 | 289 | 523 |
| DNA motif-binding | C2H2/C2HC zinc finger | 22 | 6 | 34 | 197 | 659 |
| RNA motif-binding | RRM domain | 268 | 96 | 58 | 137 | 265 |

^a The number of instances of each family in Arabidopsis thaliana (Ath), Dictyostelium discoideum (Ddi), Saccharomyces cerevisiae (Sce), Drosophila melanogaster (Dme) and Homo sapiens (Hsa). Data from Vogel et al. [95]