Kim Van Roey, Norman E Davey.
Abstract
A substantial portion of the regulatory interactions in the higher eukaryotic cell are mediated by simple sequence motifs in the regulatory segments of genes and (pre-)mRNAs, and in the intrinsically disordered regions of proteins. Although these regulatory modules are physicochemically distinct, they share an evolutionary plasticity that has facilitated a rapid growth of their use and resulted in their ubiquity in complex organisms. The ease of motif acquisition simplifies access to basal housekeeping functions, facilitates the co-regulation of multiple biomolecules allowing them to respond in a coordinated manner to changes in the cell state, and supports the integration of multiple signals for combinatorial decision-making. Consequently, motifs are indispensable for temporal, spatial, conditional and basal regulation at the transcriptional, post-transcriptional and post-translational level. In this review, we highlight that many of the key regulatory pathways of the cell are recruited by motifs and that the ease of motif acquisition has resulted in large networks of co-regulated biomolecules. We discuss how co-operativity allows simple static motifs to perform the conditional regulation that underlies decision-making in higher eukaryotic biological systems. We observe that each gene and its products have a unique set of DNA, RNA or protein motifs that encode a regulatory program to define the logical circuitry that guides the life cycle of these biomolecules, from transcription to degradation. Finally, we contrast the regulatory properties of protein motifs and the regulatory elements of DNA and (pre-)mRNAs, advocating that co-regulation, co-operativity, and motif-driven regulatory programs are common mechanisms that emerge from the use of simple, evolutionarily plastic regulatory modules.
Year: 2015 PMID: 26626130 PMCID: PMC4666095 DOI: 10.1186/s12964-015-0123-9
Source DB: PubMed Journal: Cell Commun Signal ISSN: 1478-811X Impact factor: 5.712
Table 1. Representative examples of protein, RNA and DNA motifs
| Motif type | Example motif | Consensus sequenceᵃ | Function |
|---|---|---|---|
| Protein short linear motifs | | | |
| Ligand - promote complex formation | SH3 ligand | PxxPx[KR] | Complex formation with SH3 domains |
| | Nuclear receptor box | LxxLL | Complex formation with Nuclear receptors |
| | LD motif | [LV][DE]x[LM][LM]xxL | Complex formation with FAT domains |
| | LxCxE motif | [IL]xCxE | Complex formation with Rb |
| | RGD motif | RGD | Complex formation with Integrin family members |
| Localisation - recruit targeting and transport pathways to control protein localisation | Nuclear Export Signal (NES) | ΦxxΦxxxΦxxΦxΦ | Translocation from the nucleus to the cytoplasm |
| | KDEL ER retrieval signal | [KH]DEL-COOH | Translocation from the Golgi to the endoplasmic reticulum (ER) |
| | Ciliary targeting signal | RVxP | Transport to the plasma membrane of the cilia |
| | Peroxisomal targeting signal | [KRH]xxΦ$ or [KRH]Φ$ | Import into the peroxisomal lumen |
| | Tyrosine endocytic signal | YxxΦ | Directs endocytosis of membrane proteins |
| Enzyme recruitment - recruit enzymes to the protein/complex to modify/demodify a site distinct from the bound motif | Cyclin docking motif | [RK]xLx{0,1}[LF] | Recruitment of the Cyclin-Cdk holoenzyme |
| | PP1 docking motif | RVxF | Recruitment of the PP1 phosphatase holoenzyme |
| | Tankyrase docking motif | Rxx[PGAV][DEIP]G | Recruitment of the Tankyrase poly-(ADP-ribose) polymerase |
| | USP7 docking motif | PxxS | Recruitment of the USP7 deubiquitylating enzyme |
| | NEDD4 docking motif | PPxY | Recruitment of the NEDD4 ubiquitylating enzyme |
| Stability - recruit E3 ubiquitin ligases and promote substrate polyubiquitylation to control protein stability | APC/C D box degron | RxxLxxΦ | Recruitment of the APC/C E3 ubiquitin ligase |
| | PIP degron | Φ[ST]D[FY][FY]xxx[KR] | Recruitment of the Cdt2 CRL4 E3 ubiquitin ligase |
| | Fbw7 degron | pTPxxp[ST] | Recruitment of the Fbw7 SCF E3 ubiquitin ligase |
| | Oxygen dependent VHL degron | [IL]AoPx{6,8}ΦxΦ | Recruitment of von Hippel-Lindau protein (pVHL) containing E3 ubiquitin ligase |
| | MDM2 degron | FxxxWxxΦ | Recruitment of the MDM2 ubiquitin ligase |
| Modification - act as sites of moiety attachment/removal, isomerisation or cleavage | PIKK phosphorylation site | ([ST])Q | Phosphorylation by PIKK family kinases |
| | Pin1 isomerisation site | p[ST](P) | Isomerisation by the Pin1 phosphorylation-dependent prolyl isomerase |
| | N-Glycosylation site | Nx([ST]) | Glycosylation by Oligosaccharyltransferase |
| | Caspase-3 and −7 cleavage motif | [DE]xxD\|[AGS] | Cleavage by Caspase family proteases |
| | Myristoylation site | NH2-M(G)xxx[AGSTCN] | Myristoylation by Myristoyl-CoA:protein N-myristoyltransferase |
| RNA motifs | | | |
| Stability | Adenosine and uridine (AU)-rich elements (ARE) | AUUUA | Recruits positive and negative regulators of mRNA stability |
| Splicing | 5′ splice junction | AG/GURAGU | Recruits splice site recognising U1 snRNA component of the spliceosome |
| Modification | Polyadenylation signal | AUUAAA | Recruits cleavage and polyadenylation specificity factor (CPSF) to cleave and polyadenylate 3′-UTRs |
| Localisation | Muscleblind binding motifs | YGCUKY | Targets mRNAs to membranes |
| miRNA recruitment | miR-125b miRNA response element | CUCAGGG | Regulates expression of multiple proteins |
| DNA regulatory elements | | | |
| Basal machinery recruitment | TATA box | TATAAAA | Recruitment of the basal transcription machinery to the core gene promoter required for initiation of transcription |
| Promoters/Enhancers | CCAAT/enhancer binding protein (C/EBP) site | CCAAT | Promotion of gene expression |
| Silencers/Insulators | CCCTC-binding factor (CTCF) binding site | CCGCGNGGNGGCAG | Diverse functions including acting as a transcriptional repressor and insulator |
| Endonucleases | EcoRI restriction site | G\|AATTC | Sequence specific cleavage of DNA |
ᵃ Patterns are representative and roughly define the specificity of the motif binding partner. Pattern syntax for proteins: letters denote a specific amino acid; “x” denotes any amino acid; square brackets denote a subset of allowed amino acids; curly brackets denote length variability; round brackets indicate a position targeted for post-translational modification after motif recognition; “p” denotes a phosphorylation site required for binding; “o” denotes a hydroxylation site required for binding; “|” denotes a cleavage site; “Φ” (phi) denotes an aliphatic residue; “NH2-” indicates the amino-terminus of the protein; “-COOH” indicates the carboxyl-terminus of the protein. Pattern syntax for DNA and RNA: “/” denotes a splice site; “K” denotes a guanine or a uracil; “Y” denotes a cytosine or a uracil; “R” denotes an adenine or a guanine; “N” denotes any base; “|” denotes a cleavage site.
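The protein pattern syntax described in the footnote maps closely onto regular expressions. The following is a minimal sketch of that mapping, not a published tool: the function names `motif_to_regex` and `find_motif` are hypothetical, and expanding “Φ” to the residues A, I, L and V is an assumption about what "aliphatic" covers. Modification markers (“p”, “o”, round brackets) and the cleavage marker are dropped, so the regex matches the unmodified residues. The test fragment is p21 residues 142–159, assembled from the overlapping NLS and C-terminal Cyclin docking entries in the p21 table.

```python
import re

# Assumption: "Φ" expands to the classic aliphatic residues A, I, L and V.
ALIPHATIC = "[AILV]"


def motif_to_regex(pattern):
    """Translate one protein pattern in the footnote's syntax into a regex string."""
    # Terminus markers become regex anchors.
    pattern = pattern.replace("NH2-", "^").replace("-COOH", "$")
    out = []
    depth = 0  # > 0 while inside [...] or {...}, which regex already understands
    for ch in pattern:
        if ch in "[{":
            depth += 1
            out.append(ch)
        elif ch in "]}":
            depth -= 1
            out.append(ch)
        elif depth:
            out.append(ch)         # copy bracket/brace contents verbatim
        elif ch == "x":
            out.append(".")        # any amino acid
        elif ch == "Φ":
            out.append(ALIPHATIC)
        elif ch in "()po|":
            continue               # modification and cleavage markers: not matched
        else:
            out.append(ch)         # literal residue, "^" or "$"
    return "".join(out)


def find_motif(pattern, sequence, first_residue=1):
    """Return (start, end, match) tuples using 1-based residue numbering.

    Matches are non-overlapping, as reported by re.finditer.
    """
    regex = re.compile(motif_to_regex(pattern))
    return [
        (m.start() + first_residue, m.end() + first_residue - 1, m.group())
        for m in regex.finditer(sequence)
    ]


fragment = "RRQTSMTDFYHSKRRLIF"  # p21 residues 142-159
print(motif_to_regex("[RK]xLx{0,1}[LF]"))                       # [RK].L.{0,1}[LF]
print(find_motif("[RK]xLx{0,1}[LF]", fragment, first_residue=142))
# [(155, 159, 'RRLIF')]
```

The sketch handles one pattern at a time, and the “^” and “$” anchors are only meaningful when the full-length protein sequence is scanned. A regex hit only shows that a sequence fits the consensus; it does not establish that the site is functional.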
Fig. 1 Motif-dependent co-regulation of proteins. (a) Schema showing the expansion of a regulatory network. The original ancestral network will likely contain a limited number of targets. Proteins can be added to the network as they acquire the necessary motifs through ex nihilo evolution of novel motifs. Different species will have different regulatory networks [26, 28–30, 122, 123]. (b) Representative motif used to perform basal functionality. Importin-alpha bound to a nuclear localisation signal (NLS)-containing peptide from Myc [124] and representative examples of NLS motifs [125–130], showing the shared residues complementary to the binding pocket (side chains shown in structure) that result in the consensus sequence. (c) Representative motif involved in conditional transmission of cell state information to the motif-containing protein. Cyclin-A2 bound to a Cyclin docking motif in Cellular tumor antigen p53 [131] and representative examples of Cyclin docking motifs [131–135]. (d) Representative motif involved in conditional transmission of cell state information to the motif-containing protein. PKB beta bound to a PKB phosphorylation site peptide from Glycogen synthase kinase-3 beta [136] and representative examples of PKB phosphorylation sites [137–141]. The modified residue is shown in orange. (e) Representative motif used to recruit variable components to an invariant complex core. The PIP box-binding pocket of PCNA bound to a PIP box from p21 [142] and representative examples of PIP boxes [142–147]. (f) Examples of conditional motif-driven regulatory networks in which motifs underlie the co-regulation of multiple biomolecules in a coordinated manner to respond to changes in Ca²⁺ levels. Increased Ca²⁺ levels can result in motif-dependent phosphorylation (p+), dephosphorylation (p-) or competitive binding events (calcium/calmodulin-dependent protein kinase (CaMK) recognises Rxx[ST] [64], Calcineurin (CN) phosphatase recruits substrates through PxIxIT or LxVP docking motifs [65], and Calmodulin (CaM) recognises hydrophobic helical IQ motifs [66]).
Fig. 2 Examples of co-operative interactions mediated by DNA, RNA and protein motifs. (a) DNA motif specificity through multivalent interactions with motif-binding domains in multimeric complexes. Structure of Retinoic acid receptor alpha (RARA) (green) and Retinoic acid receptor RXR-alpha (RXRA) (red) heterodimer bound to a retinoic acid response element (5′-AGGTCAAAGGTCA-3′) (blue) [107]. Each protein binds to a 6-mer “half-site” (5′-AGGTCA-3′) giving the complex specificity for a 12-mer motif. (b) RNA motif specificity through multivalent interactions with tandem arrays of motif-binding domains. Structure of the tandem Zinc Fingers of Zinc finger protein 36, C3H1 type-like 2 (ZFP36L2) (green) bound to an RNA class II AU-rich element (ARE) (5′-UUAUUUAUU-3′) (blue). Each Zinc Finger recognises 4 nucleotides of RNA, allowing the tandem domains to recognise an 8-mer motif [78]. (c) Protein motif specificity through multivalency. Structure of yeast APC/C-Cdh1 modulator 1 (Acm1) (blue) bound to APC/C activator protein Cdh1 (green) showing the 3 binding pockets for the D box (RxxLxxL), KEN box (KEN) and ABBA motif (FxLYxE) on the WD40 repeat of Cdh1 [80]. (d) Example of competitive motif-mediated binding involving two motifs. Binding of a single biomolecule/complex to a motif is sufficient to perform the biological function; however, when a second biomolecule is present, the function facilitated by the first site is inhibited [19, 87, 148–150]. (e) Schematic example of co-operative motif-mediated interactions involving two motifs. In the example, binding of a single interface is insufficient to elicit the functional outcome of binding. Once the second motif-binding interface associates, the trimeric complex can bind with sufficient affinity/avidity to elicit the biological outcome. (f) Modification on or near a regulatory motif can modulate the motif either positively [89, 151–154] or negatively [18, 19, 94]. (g) Motif accessibility is required for binding partner recruitment and, consequently, is often utilised as a step of regulation [18, 19, 99, 100, 155].
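The gain in specificity described for panel a can be illustrated with simple counting. The sketch below locates the two half-sites in the response element quoted in the caption and compares how often a 6-mer and a 12-mer would be expected to occur by chance. The function names are hypothetical, and the chance estimate assumes uniform, independent bases on a single strand, which real genomes do not satisfy; it is meant only to convey the order of magnitude.

```python
import re


def half_site_positions(element, half_site):
    """0-based start positions of every (possibly overlapping) half-site copy."""
    # A lookahead matches at each position without consuming characters.
    return [m.start() for m in re.finditer(f"(?={re.escape(half_site)})", element)]


def expected_chance_hits(k, sequence_length):
    """Expected exact matches of one k-mer on one strand of a random sequence.

    Assumption: the four bases are equally likely and independent.
    """
    return (sequence_length - k + 1) / 4 ** k


element = "AGGTCAAAGGTCA"   # retinoic acid response element from the caption
half_site = "AGGTCA"

starts = half_site_positions(element, half_site)
spacer = starts[1] - (starts[0] + len(half_site))
print(starts, spacer)        # [0, 7] 1

# One 6-mer is expected once per 4**6 positions, a 12-mer once per 4**12.
print(4 ** 6, 4 ** 12)       # 4096 16777216
```

Under these assumptions, requiring two half-sites at a fixed spacing makes a chance match 4096 times less likely than a single half-site, which is the sense in which dimerisation sharpens specificity.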
Fig. 3 Distinct regulatory programs and protein modularity. (a) The higher eukaryotic cell has a large repertoire of protein modules, represented here by different shapes with different colours, that are reused by evolution to encode many aspects of protein functionality, including its subcellular localisation (pentagons), stability (triangles), modification state (circles) and interactome (rectangles). The ex nihilo acquisition of a targeting SLiM can result in protein relocalisation. For instance, while a protein without an NLS motif (top) is expressed ubiquitously throughout the cytoplasm (blue zone), acquisition of an NLS motif (bottom, red pentagon) results in specific localisation of the protein in the nucleus (blue zone). (b) The ex nihilo acquisition of a degradation motif can result in changes to the temporal, spatial or conditional local abundance of a protein. For instance, while the abundance of a protein without a cell cycle-specific degron (top) is independent of the different phases of the cell cycle, acquisition of a cell cycle-specific degron (bottom, green triangle), for example a D box motif, allows the abundance of the protein to be adjusted for a specific phase of the cell cycle. (c) Example of co-regulation of a protein by the same motif (boxed blue pentagon). The three different proteins will be regulated in a similar manner under specific conditions through recruitment of the same binding partner by the shared motif, for instance cell cycle-dependent degradation of cell cycle regulators such as Acm1 [156], Cyclin A [157] and Securin [158], which are targeted to the APC/C for ubiquitylation through their D box motifs. (d) Proteins with instances of the same globular domain (boxed brown rectangle) can have hugely different life cycles depending on the set of motifs present in the protein. While the proteins have a similar activity due to the shared globular domain, their distinct motif content subjects them to specific regulatory programs and diversely controls their life cycle, as is the case for the different members of the CDC25 family of phosphatases [117] and the Cyclin-dependent kinase inhibitor family [118].
Table 2. Representative examples of motifs modulating the abundance and function of Cyclin-dependent kinase inhibitor 1 (p21)
| Motif | Motif sequence | Binding domain/partner | Function |
|---|---|---|---|
| Protein short linear motifs | | | |
| Cyclin docking motif | 19RRLF22 | Cyclin fold of G1/S-specific cyclin-E1 | Inhibition of Cyclin E-Cdk2 catalytic activity and substrate recruitment |
| Cyclin docking motif | 155RRLIF159 | Cyclin fold of G1/S-specific cyclin-E1 | Docking to the Cyclin E subunit of the Cyclin E-Cdk2 kinase complex, which results in phosphorylation of p21 at S130 by Cdk2 and subsequent destabilisation of p21 |
| PCNA-binding PIP box | 144QTSMTDFYHS153 | Proliferating cell nuclear antigen | Inhibition of the DNA polymerase delta processivity factor PCNA, resulting in G1 and G2 cell cycle arrest |
| Nuclear localisation signal (NLS) | 142RRQTSMTDFYHSKRRLI158 | Armadillo domain of Importin-alpha | Translocation of p21 from the cytosol to the nucleus where it exerts its effects on cell proliferation |
| APC/C-binding D Box degron | 86RDELGGGR93 | WD40 repeat of Cell division cycle protein 20 homolog | Ubiquitylation of p21, thereby targeting the protein for proteasomal degradation during prometaphase |
| PIP degron motif | 145TSMTDFYHSKRRL157 | WD40 repeat of Denticleless protein homolog | PCNA- and ubiquitin-dependent proteasomal degradation of p21 in S phase and after UV irradiation |
| Cdk2 phosphosite | 130(S)P131 | Kinase domain of Cyclin-dependent kinase 2 | Targets p21 for ubiquitylation and subsequent proteasomal degradation |
| PKB phosphosite | 140RKRRQ(T)145 | Kinase domain of Protein kinase B (PKB) | Results in cytoplasmic localisation of p21, prevents complex formation with PCNA, and decreases the inhibitory effect on Cyclin-Cdk complexes |
| NDR phosphosite | 141KRRQT(S)146 | Kinase domain of nuclear-Dbf2-related (NDR) kinases | Destabilisation of p21 protein to control G1/S progression |
| RNA motifs | | | |
| miRNA | miRNA seed region (AAAGUGC) complementary sites within the 3′-UTR | miRNAs miR-17, 20a, 20b, 93, 106a and 106b | Down-regulation of p21 expression |
| HuD binding site | 688UUGUCUU695 | RRM domain of ELAV-like protein 4 | Increased stability of p21 mRNA |
| HuR binding site | AU-rich elements within nt 751–850 | RRM domain of ELAV-like protein 1 | Increased stability of p21 mRNA |
| RNPC1 binding site | AU-rich elements within nt 621–750 | RRM domain of RNA-binding protein 38 | Increased stability of p21 mRNA |
| Msi-1-binding site | 1819GUAGU1823 (on a loop portion of a stem–loop–stem structure) | RRM domain of RNA-binding protein Musashi homolog 1 | Inhibition of p21 mRNA translation to regulate progenitor maintenance |
| GC-rich sequence | within nt 37–59 | RRM domain of CUGBP Elav-like family member 1 | Increased translation of p21 mRNA |
| GC-rich stem–loop structure | within nt 37–59 | Calreticulin | Blocks translation of p21 mRNA via stabilisation of a stem–loop structure within the 5′ region |
| CU-rich sequence | CCANNCC within the 3′-UTR | KH domain of Heterogeneous nuclear ribonucleoprotein K | Repression of p21 mRNA translation |
| DNA regulatory elements | | | |
| p53-responsive element | GAACATGTCCCAACATGTT at −2233 and GAAGAAGACTGGGCATGTCT at −1351 | Cellular tumor antigen p53 | p53-mediated up-regulation of p21 gene transcription in response to stress signals such as DNA damage |
| E-box motif | CAGCTG at −420, −163, −20 and −5 | Helix-Loop-Helix of Transcription factor AP-4 | AP-4-dependent repression of p21 gene transcription in response to mitogenic signals |
| Retinoid X response element (RXRE) | AGGTCAGGGGTGT at −1198 and GAGGCAAAGGTGA at −1221 | zf-C4 zinc finger of Retinoic acid receptor RXR-alpha | RXR ligand-dependent induction of p21 gene expression by RXR-alpha |
| Retinoic acid response element (RARE) | AGGTGAAGTCCAGGGGA at −1212 | zf-C4 zinc finger of Retinoic acid receptor alpha (RAR-alpha) | Retinoic acid-dependent induction of p21 gene expression by RAR-alpha |
| Vitamin D response element (VDRE) | AGGGAGATTGGTTCA at −770 | zf-C4 zinc finger of Vitamin D3 receptor | 1,25-dihydroxyvitamin D3-dependent induction of p21 gene expression by Vitamin D3 receptor |
| CDX binding site | Three TTTAT within −471 to −434 | Homeobox domain of Homeobox protein CDX-2 | Activation of p21 gene transcription by CDX-2 |
| T-element | AGGTGTGA close to the transcription start site (TSS) | T-box of T-box transcription factor TBX2 | Repression of the p21 gene promoter by TBX2 |
| STAT binding element | TTCCCGGAA at −647, TTCTGAGAAA at −2541 and CTTCTTGGAAAT at −4183 | STAT fold of Signal transducer and activator of transcription (STAT) proteins STAT1/STAT3/STAT5 | STAT-dependent activation of p21 gene expression in response to several cytokines |
| NF-IL6 site | GTACTTAAGAAATATTGAA at approximately −1900 | bZIP domain of CCAAT/enhancer-binding protein beta | Induction of p21 gene expression by CCAAT/enhancer-binding protein beta |
| Sp1 binding site | 6 GC-rich Sp1-binding sites between −120 and TSS | C2H2 zinc finger of Transcription factor Sp1/Sp3 | Sp1/Sp3-dependent induction of p21 gene expression |
| AP2 binding site | GCGGTGGGC at −103 | Transcription factor AP-2-alpha | Induction of p21 transcription and growth arrest by AP-2-alpha |
| E2F binding site | CTCCGCGC at −155 and CGCGC at −103, −89 and −36 | Winged-Helix of Transcription factor E2F1 | Activation of the p21 gene at the G1/S boundary by E2F1 |
| Forkhead binding site | TGTGTGC at +200 3′ of TSS | Forkhead domain of Forkhead box protein P3 | Induction of p21 transcription by Forkhead box protein P3 |
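Several of the p21 protein motifs listed above occupy overlapping stretches of the C-terminus, which is the arrangement that makes competitive binding or modification-dependent switching possible. As a rough illustration, the sketch below encodes the residue ranges from the table and lists every pair of motifs that share at least one residue. The `Motif` type and `overlapping_pairs` function are hypothetical names introduced here, and sequence overlap only flags candidates for mutually exclusive use; it does not by itself show that two partners compete.

```python
from itertools import combinations
from typing import NamedTuple


class Motif(NamedTuple):
    name: str
    start: int  # first residue (1-based, inclusive)
    end: int    # last residue (inclusive)


# Residue ranges copied from the protein rows of the p21 table.
P21_MOTIFS = [
    Motif("Cyclin docking motif (N-terminal)", 19, 22),
    Motif("APC/C D box degron", 86, 93),
    Motif("Cdk2 phosphosite", 130, 131),
    Motif("PKB phosphosite", 140, 145),
    Motif("NDR phosphosite", 141, 146),
    Motif("NLS", 142, 158),
    Motif("PIP box", 144, 153),
    Motif("PIP degron", 145, 157),
    Motif("Cyclin docking motif (C-terminal)", 155, 159),
]


def overlapping_pairs(motifs):
    """Return name pairs for motifs whose residue ranges share a residue."""
    return [
        (a.name, b.name)
        for a, b in combinations(motifs, 2)
        if a.start <= b.end and b.start <= a.end
    ]


pairs = overlapping_pairs(P21_MOTIFS)
print(len(pairs))  # 12
for first, second in pairs:
    print(f"{first} <-> {second}")
```

All twelve overlapping pairs fall within residues 140–159, while the N-terminal Cyclin docking motif, the D box and the Cdk2 phosphosite stand alone.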
Fig. 4 Modular architecture of p21 gene, pre-mRNA and protein, showing known functional modules (see Table 2). (a) The p21 gene contains: two p53-responsive elements [159, 160]; four E-box motifs for binding Transcription factor AP-4 [161]; retinoid X response [162], retinoic acid response [163] and Vitamin D response [164] elements; three STAT-binding elements that recruit STAT1, STAT3 and STAT5 dimers [165, 166]; three CDX-binding sites that bind homeobox protein CDX-2 [167]; a T-element that binds the T-box transcription factor TBX2 [168]; a binding site for CCAAT/enhancer-binding protein beta [169]; six Sp1-binding sites [170–173]; a site for binding Transcription factor AP-2-alpha [174]; sites for Transcription factor E2F1 [175]; a Forkhead-binding site for Forkhead box protein P3 [176]. (b) The p21 (pre-)mRNA contains: AU-rich elements in the 3′-UTR for binding ELAV-like protein 4 [177], ELAV-like protein 1 [178], and RNA-binding protein 38 [179]; a binding site for RNA-binding protein Musashi homolog 1 [180]; GC-rich sequence binding CUGBP Elav-like family member 1 and calreticulin (CRT) [148]; CU-rich sequence in the 3′-UTR for binding heterogeneous nuclear ribonucleoprotein K [181]; splice donor and acceptor sites for recruitment of the spliceosome machinery for intron removal. ORF: open reading frame. (c) The p21 protein contains: the intrinsically disordered Cyclin-dependent Kinase Inhibitor (CKI) region [182]; a PIP degron recruiting Denticleless protein homolog [183, 184]; a D box for docking to the Cell division cycle protein 20 homolog subunit of the APC/C [185]; a PIP box for docking to the DNA polymerase delta processivity factor PCNA [142, 186]; one N-terminal and one C-terminal RxL Cyclin docking motif for binding to the Cyclin E subunit of the Cyclin E-Cdk2 kinase complex [187, 188]; an NLS for recruitment to the nuclear import machinery [189]; a modification motif for phosphorylation at T145 by PKB [190, 191]; a modification motif for phosphorylation at S146 by nuclear-Dbf2-related (NDR) kinases [192]; a modification motif for phosphorylation at S130 by Cyclin E-Cdk2 kinase complex [193, 194].