Gergely Nagy, Laszlo Nagy.
Abstract
Collaboration of transcription factors (TFs) and their recognition motifs in DNA is the result of coevolution and forms the basis of gene regulation. However, how these short genomic sequences contribute to setting the level of gene products is not understood in sufficient detail. The biological problem to be solved by the cell is complex, because each gene requires a unique regulatory network in each cellular condition using the same genome. Thus far, only some components of these networks have been uncovered. In this review, we compiled the features and principles of the motif grammar, which dictates the characteristics and thus the likelihood of the interactions of the binding TFs and their coregulators. We present how sequence features provide specificity using, as examples, two major TF superfamilies, the bZIP proteins and nuclear receptors. We also discuss the phenomenon of "weak" (low-affinity) binding sites, which appear to be components of several important genomic regulatory regions, but paradoxically are barely detectable by currently used approaches. Assembling the complete set of regulatory regions composed of both weak and strong binding sites will yield more comprehensive lists of the factors that play roles in gene regulation, thus enabling a deeper understanding of regulatory networks.
Keywords: Basic leucine zipper; Motif grammar; Nuclear receptor; Transcription factor; Weak motif
Year: 2020; PMID: 32802274; PMCID: PMC7406977; DOI: 10.1016/j.csbj.2020.07.007
Source DB: PubMed; Journal: Comput Struct Biotechnol J; ISSN: 2001-0370; Impact factor: 7.271
Fig. 1. Classification of transcription factor binding sites. Monomer binding sites, which can be bound by single TFs, often add up to larger units. Two essentially identical half-sites can be bound by homodimers formed by identical TFs (blue circles, right) or heterodimers formed by related TF partners (blue and green circle). Composite elements are built up from at least two monomer binding sites specific for different TFs (blue and dark purple circles). Boxes colored according to TFs represent monomer binding sites. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Features of motif and enhancer (promoter) grammar.
| Features | Motif grammar: monomer binding site | Motif grammar: dimer binding site | Motif grammar: composite element | Enhancer grammar: cluster of elements |
|---|---|---|---|---|
| Gene regulation | ? | ? | ? | + |
| Size (bp) | 3–15 | 6–20 | 10–25 | 6–hundreds |
| Number of binding sites | 1 | 2 | 2–3 | 1–tens |
| Type | − | −/+ | + | + |
| Order | − | −/+ | + | + |
| Orientation | − | + | + | + |
| Spacing | − | + | + | + |
| Strength | + | + | + | + |
| Shape | + | + | + | + |
Sequence features that contribute (+) or do not contribute (−) to motif/enhancer complexity (specificity) are indicated. Homodimer binding sites, for instance, contain basically identical monomer binding sites (half-sites), so their orientation, spacing, strength, and shape can vary (+), but their type and order are self-evident (−), while in other dimer binding sites the type and order can also be determinate features (+). Unlike in the case of enhancers (promoters), the effect of individual elements on gene expression is uncertain (‘?’).
Summary of representative transcription factors and their binding sites.
| Name of transcription factors (element) | Citation(s) |
|---|---|
| MEIS1/MEIS1 | Jolma et al. 2015 |
| MEIS1/DLX3 | Jolma et al. 2015 |
| OCT4/SOX2 | Rodda et al. 2005 |
| IRF/IRF (ISRE) | Fujii et al. 1999 |
| PU.1/IRF4/8 (EICE/EIRE) | Meraro et al. 2002 |
| IRF4/8/PU.1 (IECS) | Tamura et al. 2005 |
| GCM1/ELK1 | Jolma et al. 2015 |
| STAT/STAT (GAS) | Pearse et al. 1993 |
| STAT6/STAT6 | Li et al. 2016 |
| JUN/FOS (TRE) | Deppmann et al. 2006 |
| CREB, ATF, JUN dimers (CRE) | Deppmann et al. 2006 |
| sMAF/CNC (MARE) | Inamdar et al. 1996 |
| lMAF/lMAF (MARE) | Kataoka et al. 1994 |
| C/EBP/C/EBP | Cohen et al. 2018 |
| C/EBP/ATF4 (CARE) | Cohen et al. 2018 |
| JUNB/BATF/IRF4/8 (AICE) | Glasmacher et al. 2012 |
| NR3C dimers (IR3) | Mangelsdorf et al. 1995 |
| ER/ER (ERE) | Mangelsdorf et al. 1995 |
| RAR/RXR (RARE, DR0-2,5,8, IR0) | Rastinejad et al. 1995 |
| PPAR/RXR (PPRE) | IJpenberg et al. 1997 |
| REV-ERB/REV-ERB (Rev-DR2) | IJpenberg et al. 1997 |
| RXR/LXR (LXRE) | Feldmann et al. 2013 |
| RXR/VDR (VDRE) | Rastinejad et al. 1995 |
| RXR/THR (THRE) | Rastinejad et al. 1995 |
| ROR (RORE) | IJpenberg et al. 1997 |
| NR3B | Johnston et al. 1997 |
| NR4A | Wilson et al. 1992 |
| NR5A | Lala et al. 1992 |
MEIS1, myeloid ecotropic viral integration site 1; DLX3, distal-less homeobox 3; OCT4, octamer-binding transcription factor 4; SOX2, sex determining region Y (SRY)-box 2; IRF, interferon regulatory factor; ISRE, interferon-stimulated response element; PU.1, purine-rich nucleic acid binding protein 1; ETS, erythroblast transformation-specific; EICE/EIRE, ETS:IRF composite/response element; IECS, IRF:ETS composite sequence; GCM1, glial cells missing transcription factor 1; ELK1, ETS-like 1; STAT, signal transducer and activator of transcription; GAS, interferon γ-activated sequence; RARE, retinoic acid response element; further abbreviations for bZIP and NR proteins are listed in the legends of Figs. 2 and 3.
Fig. 2. Schematic representation of major bZIP motifs. TRE/CRE (5′-TGA(C/G)-3′) and MARE (5′-TGCTGA(C/G)-3′) half-sites are marked by blue arrows, C/EBP half-sites are marked by green arrows, and spacer nucleotides are marked by black dots. Schematic motif logos and motif/protein names are indicated (TRE, TPA response element; CRE, cAMP response element; CREB, CRE binding protein; ATF, activating transcription factor; MAF, musculoaponeurotic fibrosarcoma protein; CNC, cap′n′collar-type bZIP protein; C/EBP, CCAAT/enhancer-binding protein; (s)MARE, (small) MAF response element; CARE, C/EBP:ATF response element). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Schematic representation of nuclear receptor motifs. The general (5′-(A/G)GGTCA-3′) and the NR3C steroid hormone receptor-specific (5′-AGAACA-3′) half-sites are marked by blue or green arrows, respectively. 5′ extensions are marked by cyan arrows, and spacer nucleotides are marked by black dots. Schematic motif logos and motif/protein names are indicated (IR, inverted repeat; DR, direct repeat; ROR, retinoic acid receptor (RAR)-related orphan receptor; PPAR, peroxisome proliferator (PP)-activated receptor; RXR, retinoid X receptor; VDR, vitamin D receptor; THR, thyroid hormone receptor; LXR, liver X receptor; ERE, estrogen response element; RORE, ROR response element; PPRE, PP response element; VDRE, vitamin D response element; THRE, thyroid hormone response element; LXRE, LXR response element). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
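
The orientation and spacing rules in the Fig. 3 legend (direct repeats, DR, and inverted repeats, IR, of a half-site separated by a fixed number of spacer nucleotides) can be illustrated with a short Python sketch. It is not taken from the review: the function names are hypothetical, only the two half-site sequences come from the legend, and the scan reports exact matches on one strand, so it would by design miss the weak (low-affinity) sites discussed in the abstract.

```python
import re

# Half-sites as given in the Fig. 3 legend, written as regex fragments.
GENERAL_HALF_SITE = "[AG]GGTCA"  # 5'-(A/G)GGTCA-3'
NR3C_HALF_SITE = "AGAACA"        # 5'-AGAACA-3'

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_pattern(pattern):
    """Reverse-complement a pattern built from bases and [..] base classes."""
    tokens = re.findall(r"\[[ACGT]+\]|[ACGT]", pattern)
    out = []
    for tok in reversed(tokens):
        if tok.startswith("["):
            bases = sorted(COMPLEMENT[b] for b in tok[1:-1])
            out.append("[" + "".join(bases) + "]")
        else:
            out.append(COMPLEMENT[tok])
    return "".join(out)


def repeat_pattern(half_site, spacing, kind):
    """Regex for two half-sites separated by `spacing` spacer bases.

    kind="DR": both half-sites in the same orientation (direct repeat).
    kind="IR": second half-site is the reverse complement (inverted repeat).
    """
    if kind == "DR":
        second = half_site
    elif kind == "IR":
        second = reverse_complement_pattern(half_site)
    else:
        raise ValueError("kind must be 'DR' or 'IR'")
    return f"{half_site}[ACGT]{{{spacing}}}{second}"


def find_repeats(sequence, half_site, spacing, kind):
    """Return (start, matched_text) for every exact match on the given strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile(f"(?=({repeat_pattern(half_site, spacing, kind)}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# Illustrative (made-up) sequence containing one IR3 arrangement.
print(find_repeats("TTAGGTCACAGTGACCTAA", GENERAL_HALF_SITE, 3, "IR"))
```

Changing `spacing` or `kind` switches between arrangements such as DR1 or IR3. A realistic analysis would score sites with position weight matrices on both strands rather than rely on exact consensus matching.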