
Helix-loop-helix proteins and the advent of cellular diversity: 30 years of discovery.

Abstract

Helix-loop-helix (HLH) proteins are dimeric transcription factors that control lineage- and developmental-specific gene programs. Genes encoding for HLH proteins arose in unicellular organisms >600 million years ago and then duplicated and diversified from ancestral genes across the metazoan and plant kingdoms to establish multicellularity. Hundreds of HLH proteins have been identified with diverse functions in a wide variety of cell types. HLH proteins orchestrate lineage specification, commitment, self-renewal, proliferation, differentiation, and homing. HLH proteins also regulate circadian clocks, protect against hypoxic stress, promote antigen receptor locus assembly, and program transdifferentiation. HLH proteins deposit or erase epigenetic marks, activate noncoding transcription, and sequester chromatin remodelers across the chromatin landscape to dictate enhancer-promoter communication and somatic recombination. Here the evolution of HLH genes, the structures of HLH domains, and the elaborate activities of HLH proteins in multicellular life are discussed.

Keywords: E proteins; HLH; Id proteins; hematopoiesis; myogenesis; neurogenesis; phylogeny; programming differentiation; somitogenesis; structure

Year: 2019 PMID： 30602438 PMCID： PMC6317319 DOI： 10.1101/gad.320663.118

Source DB: PubMed Journal: Genes Dev ISSN： 0890-9369 Impact factor: 12.890

More than 30 years ago, a group of DNA elements sharing conserved sequences was identified within the enhancers of the immunoglobulin heavy and light chain loci (Church et al. 1985; Ephrussi et al. 1985). In a series of elegant experiments, these distinct DNA segments, named E-box sites, were found to be protected from in vivo methylation in B-lineage cells. The E-box sites contained a signature core of six nucleotides: CANNTG. Simultaneously, MyoD was identified. MyoD was an eye-opener with the remarkable ability to convert fibroblasts into myoblasts. MyoD was the first example of programmed transdifferentiation. MyoD induced the expression of hundreds of genes associated with skeletal muscle cell identity (Davis et al. 1987). Analysis of its primary amino acid sequence revealed that MyoD shared a region of similarity with the myc family of proteins. The shared feature was called the c-myc homology region (Tapscott et al. 1988). In a parallel study, two proteins (E12 and E47) were identified. Both proteins bound to an E2-box site located in the intronic immunoglobulin light chain gene enhancer but differed from each other in the c-myc homology region (Murre et al. 1989a). Further investigation revealed that E12 and E47 were generated by differential splicing of exons encoding for the DNA-binding domain (Murre et al. 1989a; Yamazaki et al. 2018). This variation contributed to their unique affinities: E12 bound weakly to DNA, whereas E47 bound DNA with high affinity (Murre et al. 1989a; Sun and Baltimore 1991). Helical wheel analysis of both DNA-binding domains revealed two highly conserved amphipathic α helices separated by a loop domain. The conservation of the helices led to the hypothesis that the helix–loop–helix (HLH) motif functioned as a dimerization domain (Murre et al. 1989a). Indeed, simple mixing and binding experiments showed that the HLH region mediated homodimerization and heterodimerization (Murre et al. 1989a,b). 
Further analysis uncovered that, adjacent to the HLH region, a cluster of conserved basic amino acids mediated DNA binding (Davis et al. 1990). Hence, the name basic HLH (bHLH) proteins was established.
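The CANNTG consensus described above lends itself to a simple sequence scan. The following is a minimal sketch, assuming a plain DNA string as input; the function name `find_eboxes` and the toy sequence are illustrative and do not come from the cited studies. Because the fixed positions of CANNTG read the same on the reverse complement, scanning a single strand is sufficient for the core motif.

```python
import re

# Consensus E-box core from the text: CANNTG (N = any nucleotide).
# The lookahead lets overlapping sites be reported as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(seq):
    """Return (0-based start, matched hexamer) for every E-box core in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]

# Hypothetical toy sequence, not a real enhancer.
toy = "ttCAGCTGaaCACGTGccCATATGgg"
for pos, site in find_eboxes(toy):
    print(pos, site)
```

A match only indicates a candidate site; as discussed later in the text, E-box sites are widespread and sequence alone does not establish occupancy.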

Characterization of HLH proteins

Thirty years have passed since those initial findings, and the field has witnessed the identification of an ever-increasing number of HLH proteins. Based on expression patterns, dimerization selectivities, and DNA-binding specificities, the HLH proteins were segregated into distinct classes (Murre et al. 1989b). Class I HLH proteins, now categorized as E proteins, include E12, E47, E2-2, HEB, daughterless (Drosophila melanogaster), and HLH-2 (Caenorhabditis elegans). E proteins are abundantly expressed in many lineages and bind DNA as either homodimers or heterodimers. Class II HLH proteins include MyoD, myogenin, SCL, NeuroD1, NeuroD2, and members of the achaete–scute complex. This subset of HLH proteins binds DNA as either homodimers or heterodimers with E proteins and is lineage-restricted. Class III proteins include c-myc, TFE3, sterol regulatory element-binding protein 1a (SREBP-1a), and Mi. In addition to the HLH domain, they contain a leucine zipper dimerization domain and function as either transcriptional activators or repressors (Carroll et al. 2018). The majority of yeast and plant HLH proteins belong to this class of HLH proteins. Class IV HLH proteins include Mad, Max, and Mxi. They form heterodimers with c-myc and among each other to bind a distinct set of E-box sites (CACGTG or CATGTTG) (Carroll et al. 2018). Class V HLH proteins contain an HLH domain but lack a basic region. Upon interaction with bHLH proteins, they antagonize the DNA-binding activity of their bHLH partners (Benezra et al. 1990). Prominent members are the Id proteins Id1–4 and the D. melanogaster gene product Extramacrochaete (Emc). Class VI HLH proteins are characterized by the presence of a proline residue in their basic region. They include Hes and enhancer of split proteins. Class VI HLH proteins recognize a unique DNA sequence, named the N box (CACGCG or CACGAG), and function predominantly as transcriptional repressors by binding to Groucho (Paroush et al. 1994). 
Finally, class VII HLH proteins display as their defining feature multiple PAS domains that function as signaling sensors to light and oxygen. They include members of the HLH proteins that control the circadian clock (BMAL and CLOCK) and a response pathway to hypoxia (hypoxia-inducible factor α [HIF-α] and aryl hydrocarbon nuclear translocator [ARNT]) (Wang et al. 1995; Lowrey and Takahashi 2004; Huang et al. 2012). This class binds predominantly ACGTG or GCGTG core sequences and functions as transcriptional repressors.
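The class-associated recognition sequences quoted above can be collected into a small lookup for annotating a candidate site. This is a sketch under stated assumptions: the dictionary labels and the function name `motif_labels` are illustrative, only the sequences quoted in the text are encoded (CACGTG for class IV, the N box for class VI, the ACGTG/GCGTG cores for class VII), and only the given strand is searched. Motifs overlap (CACGTG also contains the ACGTG core), so a site can carry several labels, and a label does not imply binding by any particular factor.

```python
import re

# Motifs as quoted in the text; the keys are illustrative labels only.
MOTIFS = {
    "E-box (CANNTG)": r"CA[ACGT]{2}TG",
    "class IV E-box (CACGTG)": r"CACGTG",
    "class VI N box (CACGCG/CACGAG)": r"CACG[CA]G",
    "class VII core (ACGTG/GCGTG)": r"[AG]CGTG",
}

def motif_labels(site):
    """Return the labels of all quoted motifs found in site (given strand only)."""
    site = site.upper()
    return [name for name, pattern in MOTIFS.items() if re.search(pattern, site)]

print(motif_labels("CACGTG"))
print(motif_labels("CACGCG"))
print(motif_labels("CAGCTG"))
```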

The structure of the HLH domain

The modeling suggested that the E-protein DNA-binding domain was folded into an HLH configuration. Detailed structural analysis soon followed. It validated the modeling. Crystal structures of E47 and MyoD HLH homodimers revealed two α helices connected by a loop domain as well as a basic region that contacted residues in the major groove (Fig. 1; Ellenberger et al. 1994; Ma et al. 1994). More specifically, E47 folded as a parallel four-helix bundle that intertwined, positioning the basic region in contact with each half of the E-box site (CAC or CAG). Two highly conserved amino acids—a glutamate and an arginine residue in the E47 DNA-binding region—mediated DNA-binding specificity. The glutamate residue directly contacted CA nucleotides, while arginine residues stabilized DNA binding. Dimerization united the two amphipathic helices common among all HLH proteins. The highly conserved hydrophobic interactions were stabilized through numerous van der Waals interactions, which facilitated dimerization.

Figure 1.

Representative structures for an ensemble of HLH domains. Note that for HIF:ARNT dimers, only the HLH domains are shown in color. Other domains are labeled in gray. The two chains are distinguished by dark versus light gray.

The crystal structures of class III and class IV HLH domains and leucine zippers have also been elucidated. These classes bind DNA as quasisymmetric heterodimers, stabilized by hydrophobic and polar interactions that involve helix I and helix II as well as residues continuous with helix II located in the leucine zipper. The crystal structure of the Myc–Max heterodimer was particularly informative (Nair and Burley 2003). Myc and Max fold as two heterodimers that are oriented in a head-to-tail configuration, resulting in an anti-parallel four-helix bundle (Fig. 1; Nair and Burley 2003). The SREBP-1a, which modulates cholesterol metabolism, also contains an HLH domain that is continuous with a leucine zipper region (Fig. 1). The SREBP-1a domain is unusual, as it interacts with an asymmetric binding site. The asymmetric binding site recognition is instructed by a tyrosine rather than an arginine residue in the basic region, leading to a loss of polar interactions between the basic region and the major groove (Parraga et al. 1998).

Of particular interest are class VII HLH proteins that contain PAS domains. Prominent among them are CLOCK and BMAL1, which contain ACGTG or GCGTG core sequences in their binding sites. The crystal structure of the CLOCK/BMAL heterodimer revealed an unusual asymmetric heterodimer in which the HLH and PAS domains are intertwined to promote heterodimerization (Fig. 1; Huang et al. 2012). The HIFα–ARNT factors also form HLH heterodimers and contain PAS domains. Recent crystal studies revealed that helix I and helix II of HIF-α and ARNT are extended and oriented such that the basic region interacts with residues in the major groove to permit interactions between the HIF-α PAS-A domain and the minor groove (Fig. 1; Wu et al. 2015). In the absence of DNA, contiguous interactions formed between the HIF-α HLH and PAS domains, whereas these domains were spatially segregated in ARNT (Fig. 1). The architecture of ARNT is distinct from CLOCK/BMAL heterodimers and may reflect a requirement to form stable HIF/ARNT heterodimers in both the absence and presence of DNA binding. The HIF/ARNT structure also uncovered five potential binding pockets for small ligands in the PAS domains. Notably, one such ligand, proflavine, bound to one of the pockets, consistent with its ability to interfere with HIF-α–ARNT heterodimerization. These findings revealed how HLH proteins can alter their structures in response to environmental signals.

Despite their intricate differences in structure, the overall architecture of HLH proteins displays deep conservation. Their shared features raise questions about how HLH domains associated with multicellular life relate to unicellular HLH proteins. Pho4 is an HLH protein in yeast that regulates the expression of genes encoding for enzymes involved in the biosynthesis of phospholipids and amino acids. The crystal structure of the Pho4 HLH domain showed a four-helix bundle that binds the E-box site as a homodimer (Fig. 1; Shimizu et al. 1997). Except for a short helical region in the loop domain, the general structure of Pho4 appeared remarkably similar to mammalian HLH domains (Fig. 1). Thus, the HLH domain is an ancient DNA-binding domain that arose in unicellular organisms, and, although it expanded and diversified across the animal and plant kingdoms, its structural features essentially remained the same.

HLH genes in the light of evolution

In 1964, Theodosius Dobzhansky noted “Nothing in biology makes sense except in the light of evolution” (Dobzhansky 1964). HLH proteins are no exception. Whereas prokaryotes lack HLH genes, HLH genes have been identified in all eukaryotes, including yeast, fungi, metazoans, and plants. In Saccharomyces cerevisiae, eight HLH gene products regulate the expression of genes involved in glycolysis and in phospholipid, phosphate, and amino acid biosynthesis (Robinson and Lopes 2000). In multicellular organisms, the number of HLH genes greatly expanded and diversified (Fig. 2). Detailed sequencing analysis of genomes representing the main metazoan and plant lineages has helped to define when and how HLH families originated and to what extent they are related. These studies showed that an initial diversification of the HLH genes dates to pre-Cambrian times (600 million years ago), prior to emergence of the metazoans (Degnan et al. 2009). This initial diversification was followed by a second massive expansion of HLH genes early in metazoan evolution, culminating in the diversification of bilaterians (animals with symmetric body plans) and cnidarians (animals associated with radially symmetric body plans).

Figure 2.

Schematic diagram of HLH genes encoded within the genomes of multicellular organisms. (A) Diagram depicting the number of HLH genes associated with holozoans; i.e., animals and closest unicellular relatives but excluding fungi (Simionato et al. 2007). (B) Diagram indicating the number of HLH genes associated with the plant kingdom (Pires and Dolan 2010a). Phylogenetic relationships are indicated. Note that the lengths of the connectors in the diagram do not correspond to differences in geological time.

Deep phylogenetic analysis of metazoan HLH genes revealed a large ensemble of genes that could be segregated into 44 different families (Fig. 2; Simionato et al. 2007). The vast majority of these genes originated in protostomes (animals with bilateral symmetry and three germ layers that undergo spiral cleavage during cell division) and deuterostomes (animals with bilateral symmetry and three germ layers that undergo radial cleavage during cell division). These genes may also have been present in urbilateria (the hypothetical last common ancestor of the bilaterians living in pre-Cambrian times). Several of these genes are members of the E2A, MyoD, Twist, and achaete–scute families, indicating that these HLH proteins all radiated from ancestors that emerged prior to the diversification of bilaterians and nonbilaterians. During the evolution of bilaterians, the HLH gene pool essentially remained the same.

To obtain mechanistic insight into how HLH genes were retained, gained, or lost during early animal evolution, HLH genes in the genomes of Lophotrochozoans (a highly diverse group of organisms that belong to the bilaterians) were analyzed (Bao et al. 2017). The Lophotrochozoans include the annelids, brachiopods, and the molluscs, a heterogeneous phylum of at least 100,000 species. Analysis revealed that gene duplication played a key role in generating HLH gene diversity. HLH genes that acquired new functions (paralogs) were frequently linked as clusters, some of which represented the “remains” of HLH genes dating back >540 million years ago, prior to the separation of the annelid and mollusc lineages. Others more recently duplicated from ancestral HLH genes and acquired related yet new activities. Notably, HLH genes in the molluscs generally appeared to undergo very few changes at the family level but displayed substantial diversification within families as a result of gene duplication (Bao et al. 2017).

In plants, HLH diversification dates back to 400 million years ago (Fig. 2). A mere four HLH proteins were identified in the genomes of chlorophytes and red algae, whereas a staggering 100–170 HLH genes composing 26 single branches (clades) are present in land plants (Pires and Dolan 2010a). Twenty clades were found in ancestors that preceded extant mosses and vascular plants (Fig. 2; Pires and Dolan 2010b). The establishment of HLH clades early in plant evolution suggests rapid diversification of ancestral species, possibly associated with the movement into new habitats. Although HLH gene gains and losses occurred throughout subsequent plant evolution, the majority of these genes remained conserved, giving rise to HLH genes associated with the genomes of modern plants.

At first glance, the dramatic expansion and diversification of HLH proteins appear to correlate with the generation of cellular diversity. The initial increase in the number of metazoan HLH genes occurred in parallel with the evolution of multicellularity (Fig. 2; Simionato et al. 2007). Analysis of HLH genes in model organisms such as humans, mice, and rats has revealed massive expansion in the number of HLH genes that occurred during bilaterian evolution, which is closely associated with increasing tissue and lineage complexity. The diversification of HLH genes in metazoans and plants may be directly related to the acquisition of multicellular life forms. 
Locus duplication from an ancestral gene allows genes to be regulated by novel ensembles of regulatory and architectural elements to establish new patterns of gene expression. Combined with the remarkable ability of HLH genes to program and reprogram patterns of gene expression, gene duplication and diversification from ancestral genes may underpin the mechanism by which, at least in part, cellular diversity was generated.

E proteins

E12 and E47 are members of a subset of HLH proteins, branded as E proteins. E proteins also include HEB, E2-2, daughterless, and HLH-2. E proteins have been characterized predominantly in the context of immune cell development (Bain and Murre 1998). However, they are also involved in other developmental pathways, including D. melanogaster sex determination and neurogenesis and gonadogenesis in C. elegans (Caudy et al. 1988; Sallee et al. 2017). During embryogenesis, E47 promotes cortical plate neural development and, in adult neural precursor cells, represses an astrocyte-specific gene program. In contrast, the E-protein antagonist Id3 orchestrates BMP2-induced astrocyte differentiation (Pfurr et al. 2017). Upon cortical brain injury, Id3 is induced by BMP2 to neutralize E47 activity and promote astrocyte differentiation (Bohrer et al. 2015). The E2A proteins also function as tumor suppressors and are associated with a wide variety of chromosomal translocations associated with childhood leukemias (Bain et al. 1997; Aspland et al. 2001). Notably, HEB has the ability to functionally replace E2A in supporting B-cell development (Zhuang et al. 1998). These observations support the notion that gene duplication from ancestral genes may have occurred in part to permit genes being placed under novel regulatory elements rather than providing de novo biochemical activities. E-proteins bind as homodimers or heterodimers to DNA. They are not lineage-restricted, but their mRNA levels vary among different cell types (Murre et al. 1989a). E-protein levels are regulated by multiple mechanisms, such as dimerization-induced degradation (Sallee and Greenwald 2015). E proteins contain three highly conserved regions, named AD1, AD2, and AD3 (Aronheim et al. 1993; Massari et al. 1996; Zhang et al. 2004; Chen et al. 2013). The AD1 domain is present in all E proteins, folds into a helical structure, and contains a four-amino-acid motif named LDFS (Massari et al. 1996). 
The LDFS motif is also present in HLH proteins, including Rtg3, a nutrient sensor in S. cerevisiae. The AD1 domain regulates E-protein function by recruiting the histone acetyltransferase (HAT) P300/CBP to acetylate lysine residues across the histone tails (Bradney et al. 2003; Zhang et al. 2004). Acetylated lysine residues at histone tails recruit BRD4, a member of the bromodomain and extraterminal (BET) chromatin reader proteins (Nguyen et al. 2014). H3K27ac-marked histones interact with chromatin remodelers such as BRG1 to evict nucleosomes across E-protein-bound enhancer elements and provide DNA accessibility to cooperating transcription factors (Bossen et al. 2015). The AD1 domain also has the ability to repress downstream target gene expression by recruiting members of the ETO family (Zhang et al. 2004). Three closely related ETO members (ETO, ETO-2, and MTGR1) interact with histone deacetylases (HDACs), including HDAC1 and HDAC3. Finally, as mentioned above, E proteins selectively recruit CBP/P300 and/or ETO members (HATs vs. HDACs) to enhancer regions to acetylate and/or deacetylate lysine residues. E-protein occupancy is also closely associated with demethylation of CpG residues across the enhancer landscape (Benner et al. 2015; Lio et al. 2016; Orlanski et al. 2016). In fact, compelling studies have demonstrated that the E2A proteins promote demethylation by direct recruitment of the ten-eleven translocation (TET) proteins (Lio et al. 2016). How does E-protein occupancy at the mechanistic level activate lineage-specific programs of gene expression? Recent studies suggest that E proteins orchestrate enhancer–promoter communication at multiple levels, including geometric confinement (loop domains) and phase separation. E-protein-induced geometric confinement may involve noncoding transcription-induced loop extrusion (Isoda et al. 2017). 
E2A protein occupancy is closely associated with the activation of noncoding transcription, which in turn leads to recruitment of cohesin (Lin et al. 2012; Isoda et al. 2017). Once loaded, the cohesin complex activates the loop extrusion process until two CTCF sites are reached, sequestering superenhancers and promoter regions into a single loop domain (Isoda et al. 2017). E proteins may also act to orchestrate phase separation across superenhancers. It is well established that the E proteins recruit P300/CBP at both the biochemical and genome-wide levels (Bradney et al. 2003; Zhang et al. 2004; Lin et al. 2012). P300 acetylates lysine residues at H3 and H4 histone tails. The bromodomain-containing protein BRD4 binds multiple acetylated lysine residues on H3 and H4 tails (Nguyen et al. 2014). BRD4 also contains large intrinsically disordered regions (Sabari et al. 2018). These two properties may allow BRD4 to promote phase separation across loop domains associated with E-protein and P300 occupancy. Finally, E proteins themselves may self-organize into phase-separated droplets, as demonstrated recently for other HLH proteins such as c-Myc (Boija et al. 2018). Indeed, they are present in at least two distinct physical states and accumulate into droplets during developmental progression (Quong et al. 1999). E proteins interact with a wide spectrum of transcriptional regulators, coactivators, chromatin remodelers, TET proteins, and components of the splicing machinery (Teachenor et al. 2012; Lio et al. 2016). How such a large ensemble of factors interacts with the E2A proteins is an enigma but likely involves a spectrum of weak electrostatic and hydrophobic interactions that link disordered domains in a seemingly promiscuous fashion that collectively promotes a gel-like state (Boija et al. 2018). 
Thus, E proteins may act as a scaffold to recruit BRD4 to clusters of enhancers, concentrating an onset of biochemical reactions and ultimately activating lineage-specific gene programs. In sum, the key roles for E proteins in orchestrating gene expression involve (1) the activation of noncoding transcription and (2) the establishment of a phase-separated state across loop domains to facilitate long-range enhancer–promoter communication.

Id proteins

The DNA-binding activities of E proteins are counteracted by the Id gene products (Benezra et al. 1990). Four Id proteins, named Id1–4, have been identified in mammalian genomes. The Id proteins contain an HLH domain but lack a basic region and, upon heterodimerization, neutralize the DNA-binding activities of bHLH proteins. That Id proteins are involved primarily in targeting E proteins was revealed in studies showing that a decline in E-protein activity overcomes the need for Id expression to promote developmental progression (Yan et al. 1997; Rivera et al. 2000; Boos et al. 2007; Miyazaki et al. 2011, 2017; Zook et al. 2018). The Id proteins serve in a wide spectrum of developmental pathways in both health and disease (Lasorella et al. 2014). They function predominantly by modulating cell cycle progression, developmental progression, and tumor suppression (Lyden et al. 1999; Yokota et al. 1999; Schmitz et al. 2012; Lasorella et al. 2014; Miyazaki et al. 2015). Many genes regulated by the E–Id module relate to cellular expansion and cell growth, including p15, p16, p27, p57, and cdk6 as well as von Hippel-Lindau (VHL) and c-myc (Prabhu et al. 1997; Pagliuca et al. 2000; Schwartz et al. 2006; Lasorella et al. 2014; Lee et al. 2016). In immune cell development, the E–Id protein axis regulates the expression of an armamentarium of genes that differ between cell types and developmental stages. They include genes encoding for transcription factors, components of signaling pathways, antigen receptors, chemokine receptors, DNA repair factors, enzymes involved in somatic recombination, and so on. Id transcription is regulated by a wide ensemble of receptors, including the T-cell receptor (TCR), B-cell receptor (BCR), TGFβ, BMP, FGF, and cytokine receptors (Bain et al. 2001; Kee et al. 2001; Ying et al. 2003; Yang et al. 2011). 
In embryonic stem cells, Id1 expression is activated by BMP involving the SMAD pathway to promote self-renewal and in the vasculature (Ying et al. 2003; Yang et al. 2014). The regulation of Id2 and Id3 expression in developing T cells is dynamic and complex. During thymocyte selection, Id3 expression is activated by pre-TCR and TCR signaling involving the ERK–MAPK–EGR1 pathway (Bain et al. 2001). In CD8 T cells, upon stimulation of the TCR, Id2 and Id3 expression rapidly declines, while E2A levels increase. Id2 levels are elevated again at the height of infection in short-lived memory and effector CD8 T cells to orchestrate survival and terminal differentiation (Omilusik et al. 2018). Id3 abundance, on the other hand, gradually increases as cells differentiate toward the long-lived memory CD8 T-cell pool (Yang et al. 2011; Omilusik et al. 2013). Regulatory T (Treg) cells express high levels of Id3 that rapidly decline upon activation, whereas Id2 levels rise (Miyazaki et al. 2014). Likewise, in naïve B cells, Id3 levels decline upon triggering of the BCR to initiate a germinal center-specific program of gene expression (Chen et al. 2016; Gloury et al. 2016). Thus, a common feature that arises is that, in adaptive immune cells, antigen receptor-mediated signaling activates or suppresses Id expression to modulate E-protein DNA-binding activity to orchestrate developmental progression beyond the checkpoint. The Id proteins are small proteins that are highly conserved and remarkably similar. Interfering with the expression of two or more Id genes leads to embryonic lethality (Lyden et al. 1999). They display similar if not identical dimerization specificities (Prabhu et al. 1997). Id proteins are short-lived, with half-lives <20 min. Their protein levels are regulated by the ubiquitin–proteasome pathway. Both E3 ligases and deubiquitylases dictate Id abundance. 
Id1, Id2, and Id4 all carry a D box in their C-terminal domain that is targeted by the anaphase-promoting complex/cyclosome (APC/C–CDH1) E3 ubiquitin ligase (Lasorella et al. 2006). The APC/C–CDH1 complex targets these Id proteins for degradation to promote cell cycle arrest, whereas the ubiquitin-specific protease 1 (USP1) modulates Id protein abundance to control cell growth (Williams et al. 2011). The stability of Id proteins is also critically dependent on partner choice. The Drosophila Id homolog Emc has a short half-life unless it forms heterodimers with daughterless (Li and Baker 2018). Pairing daughterless with proneural gene products rather than with Emc leads to rapid depletion of the Emc pool. Although Emc transcript levels are uniform across the progenitor cell population, Emc protein levels are dictated by competition for heterodimerization with the proneural bHLH proteins. Thus, Id expression is regulated at multiple levels, involving both transcriptional and post-transcriptional inputs to modulate cell growth and developmental progression.

The Myc network

The Myc proteins were among the first HLH proteins to be characterized in molecular detail. The founding member of this family is v-Myc, originally identified in retroviruses associated with the development of animal cancers and acquired from a cellular locus named c-Myc (Sheiness et al. 1978; Roussel et al. 1979; Sheiness and Bishop 1979). c-Myc controls cell growth, differentiation, metabolism, and death but, when aberrantly expressed, readily promotes the development of a wide spectrum of malignancies (Carroll et al. 2018). Myc binds DNA as a heterodimer with its partners, Max and Mxd (Blackwood and Eisenman 1991; Ayer et al. 1993). Myc interacts with a coactivator complex named TRRAP to recruit HATs and chromatin remodelers to activate gene expression, whereas the Mxd–Max heterodimer recruits mSin3, which serves as a scaffold for HDACs (HDAC1 and HDAC2) to silence transcription (Ayer et al. 1995; McMahon et al. 1998). The main function of the Myc–Max heterodimers is to promote cell proliferation, metabolism, and size. Myc orchestrates proliferation by activating the expression of genes encoding for cell cycle regulators. Myc regulates cellular metabolism by regulating the expression of an armamentarium of metabolic enzymes and transporters affecting glutamine and glucose metabolism to support lipid, amino acid, and nucleotide biosynthesis (Zhang et al. 2007; Wang et al. 2011). Indirectly, c-Myc influences chromatin topology at a global scale by regulating ATP levels to modulate loop extrusion (Kieffer-Kwon et al. 2017). Although most of the genes regulated by c-Myc are transcribed by RNA polymerase II, c-Myc also activates the expression of genes transcribed by RNA polymerases I and III. Prominent among these are the transfer RNA (tRNA) and 5S RNA genes (Gomez-Roman et al. 2003) and the genes transcribed by RNA polymerase I (genes encoding ribosomal RNA) (Arabi et al. 2005; Grewal et al. 2005). 
Most recently, elegant studies demonstrated that the c-myc proteins have the ability to form phase-separated droplets in conjunction with the MED1 subunit of the Mediator complex (Boija et al. 2018). How gelation of the Myc proteins and related proteins modulates cell growth and transformation will be an area of intense investigation for years to come.

Myogenic HLH proteins

MyoD was the first protein identified that programs transdifferentiation (Davis et al. 1987). Since its initial discovery, three MyoD-related HLH proteins have been identified, including Myf5, Mrf4, and myogenin. MyoD, Myf5, and Mrf4 expression acts coordinately to specify muscle cell fate, whereas myogenin regulates the terminal differentiation of myoblasts (Hasty et al. 1993; Nabeshima et al. 1993; Rudnicki et al. 1993; Kassar-Duchossoy et al. 2004). Interestingly, unlike MyoD/Myf-5 double-deficient mice, the D. melanogaster MyoD homolog nautilus orchestrates the development of only a subset of muscle progenitor cells (Enriquez et al. 2012). Thus, while the structures, expression patterns, and functions of many HLH proteins have been remarkably well conserved throughout evolution, the role of some bHLH proteins in cell type specification has changed. E-box sites are widespread. There are literally thousands of E-box sites that span the genome. This raises the question of how subsets of E-box-binding sites are selected. For MyoD and NeuroD2, functional specificity is dictated by differences in binding site specificities. Both factors share common binding sites but also are associated with unique binding site specificities, and it is these sites that are more closely associated with lineage-specific transcription signatures (Fong et al. 2012). This subset of MyoD-binding sites was associated with composite binding sites for the homeodomain-containing proteins PBX and MEIS (Fong et al. 2015). Notably, it was shown that MyoD can be converted to a neuronal differentiation factor by preventing its ability to interact with PBX1, demonstrating that binding site selection can dictate functional specificity (Fong et al. 2015). Once bound to DNA, MyoD recruits P300/CBP to promote (akin to E proteins) acetylation of lysine residues at the tails of H3 and H4 (Puri et al. 1997). 
MyoD-induced H3K27ac in turn recruits the SWI/SNF chromatin remodeling machinery to orchestrate nucleosome depletion and promote chromatin accessibility (Forcales et al. 2012). MyoD also has the ability to bind E-box sites associated with a closed chromatin environment but only upon interacting with the homeobox-containing protein PBX (Berkes et al. 2004). MyoD and myogenin exhibit unique regulatory roles at similar ensembles of target genes (Cao et al. 2006). At immediate targets, MyoD activity is sufficient to activate a full program of gene expression (Cao et al. 2006). However, at later genes, MyoD expression is not sufficient for activating downstream target gene transcription. Rather, it requires myogenin for full activation (Cao et al. 2006). These data demonstrate that closely related bHLH proteins perform unique roles, and not all of them are associated with pioneering activities. Distinctive motifs outside but adjacent to the bHLH domain seem responsible for this feature. Likewise, Myf5 and MyoD bind overlapping sites yet have distinct functional features. Whereas Myf5 induces histone acetylation at H4 in the absence of RNA polymerase II recruitment, MyoD promotes histone acetylation at H4 and recruitment of RNA polymerase II (Conerly et al. 2016). Thus, MyoD and Myf5 are associated with the same DNA-binding site preference but have diverged at overlapping binding sites by segregating distinct steps in gene activation. Collectively, these as well as other observations indicate that Myf5 is predominantly a chromatin remodeler, myogenin is primarily a transcriptional activator, and MyoD performs both functions. Consequently, Myf5 might act, at least in part, by enabling subsequent activation of a myogenic gene program by the combined activities of myogenin and MyoD (Conerly et al. 2016).
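
The scale of the E-box selection problem discussed in this section can be made concrete with a short sketch. The code below scans a DNA string for the CANNTG consensus and estimates how often the motif would be expected by chance. It is only an illustration and not an analysis from the cited studies: the function names, the example sequences, and the assumption of equal base frequencies are hypothetical.

```python
import re

# CANNTG is its own reverse complement, so scanning one strand finds every site.
# The lookahead makes overlapping sites (e.g. CACATGTG) visible as separate hits.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(seq):
    """Return (0-based position, site) for every CANNTG match in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]

def expected_sites(length):
    """Expected number of CANNTG matches in a random sequence of this length.

    Assumes independent, equally frequent bases (an idealization).
    Four positions are fixed (C, A, T, G), so each start matches with p = 1/4**4.
    """
    return max(length - 5, 0) / 4 ** 4

if __name__ == "__main__":
    print(find_eboxes("ttCAGCTGaaCACGTGcc"))
    print(expected_sites(10_000))
```

Under these assumptions a chance match arises roughly once every 256 bp, which illustrates why the presence of the core motif alone cannot account for which sites a given bHLH factor actually occupies.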

The neuronal HLH proteins

Prominent among the bHLH proteins that promote neurogenesis and specify neural cell fate are the proneural proteins (Cubas et al. 1991). In Drosophila, the proneural HLH proteins are first activated in a group of cells named the proneural cluster. Amid the proneural cluster, the cell with the highest abundance of proneural bHLH protein establishes neural identity (Doe and Goodman 1985; Cabrera 1990; Hartenstein and Posakony 1990). Paradoxically, upon reaching its maximum abundance, proneural gene expression is silenced. The decline in proneural protein abundance is a key step, since persistent abundance of these factors severely perturbs terminal differentiation (White and Jarman 2000). How is proneural abundance regulated upon reaching its highest levels of expression? An evolutionarily conserved post-translational mechanism that readily switches proneural activity on and off in progenitor cells may be key. At peak levels, proneural activity is switched off by phosphorylation of a single amino acid (serine or threonine) residue located in the atonal, scute, and Neurogenin-2 HLH domains (Quan et al. 2016). Phosphorylation of residues located in the proneural bHLH domains suppresses the activity of the proneural bHLH proteins to permit terminal differentiation. In a process named lateral inhibition, a single cell will differentiate, while neighboring cells are prevented from also adopting the neural cell fate. Lateral inhibition is mediated by the Notch signaling pathway, which, upon ligand engagement, activates the expression of bHLH genes located in the Enhancer of split complex (Bailey and Posakony 1995; Lai et al. 2000). Enhancer of split complex gene products in turn divert cells to adopt the epidermal cell fate by suppressing a neural-specific transcription signature. These well-designed studies show how proneural bHLH proteins dictate the epidermal versus neuronal cell fate choice among neighboring cells. 
In murine neural progenitors, the proneural proteins also play essential roles. Hes1 and Hes5 maintain the neural stem cell pool by repressing proneural gene expression (Nakamura et al. 2000). The proneural bHLH proteins Ascl1 and Neurogenin-2 activate a neuronal cell-specific transcription signature and antagonize the expression of an astrocytic-specific gene program, whereas Olig2 specifies the oligodendrocyte cell fate (Sun et al. 2001; Lu et al. 2002). All three genes are expressed in neural progenitors, raising the question of how distinct cell fates emerge amid competing pathways. Recent studies have been revealing. The self-renewal activities of Ascl1, neurogenin, and Olig2 were closely associated with oscillatory patterns of gene expression (Imayoshi et al. 2013). However, upon differentiation into neurons, Ascl1 expression was sustained rather than oscillatory, and the expression of neurogenin and Olig2 was suppressed. Olig2 and Ascl1 expression was persistent under conditions that promote oligodendrocyte or astrocytic cell fate. Furthermore, optogenetic modulation of Ascl1 expression revealed that oscillatory Ascl1 expression was essential to promote self-renewal, whereas persistent Ascl1 expression orchestrated a neuronal gene program (Imayoshi et al. 2013). These elegant studies show how the dynamics of bHLH gene expression, rather than simple abundance, dictate developmental choice.
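
The logic of lateral inhibition described for the Drosophila proneural cluster can be sketched with a toy two-cell model. In the sketch below, Delta on one cell activates Notch in its neighbor, and Notch activity (standing in for Enhancer of split-mediated repression of proneural genes) lowers Delta in the same cell. This is a generic abstraction in the spirit of classical lateral-inhibition models, not a model taken from the studies cited here; the Hill functions, all parameter values, and the function names are assumptions chosen only so that a small initial bias is amplified.

```python
def notch_activation(delta_neighbor, a=0.01, k=2):
    """Notch activity driven by the neighbor's Delta (increasing Hill function)."""
    return delta_neighbor ** k / (a + delta_neighbor ** k)

def delta_production(notch, b=100.0, h=2):
    """Delta production repressed by the cell's own Notch activity."""
    return 1.0 / (1.0 + b * notch ** h)

def lateral_inhibition(delta0=(0.51, 0.50), t_end=100.0, dt=0.01):
    """Euler integration of two coupled cells; returns final (notch, delta) lists."""
    notch = [0.0, 0.0]
    delta = list(delta0)
    for _ in range(int(round(t_end / dt))):
        d_notch = [notch_activation(delta[1]) - notch[0],
                   notch_activation(delta[0]) - notch[1]]
        d_delta = [delta_production(notch[0]) - delta[0],
                   delta_production(notch[1]) - delta[1]]
        notch = [notch[i] + dt * d_notch[i] for i in range(2)]
        delta = [delta[i] + dt * d_delta[i] for i in range(2)]
    return notch, delta

if __name__ == "__main__":
    notch, delta = lateral_inhibition()
    print("cell 0 (sender):   Notch=%.2f Delta=%.2f" % (notch[0], delta[0]))
    print("cell 1 (receiver): Notch=%.2f Delta=%.2f" % (notch[1], delta[1]))
```

With these illustrative parameters, the cell that starts with slightly more Delta ends with high Delta and low Notch activity (the prospective neural cell in this abstraction), whereas its neighbor ends with high Notch activity and low Delta.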

HLH gene expression in morphogenesis

The key morphogenetic events in the early embryo, including gastrulation, germ layer formation, and somitogenesis, are all executed by HLH proteins that act coordinately with other transcriptional regulators. For example, in D. melanogaster, Twist activates a mesodermal-specific program of gene expression, whereas the zinc finger-containing protein Snail suppresses the transcription of genes associated with ectodermal cell fate. In vertebrates, Twist proteins promote neural tube closure by modulating cell migration and differentiation of neural crest and mesenchymal progenitor cells. Intriguing studies have demonstrated that the Twist proteins may impact metastasis. Twist expression is particularly abundant in cells that are metastasizing, and loss of Twist expression was shown to suppress the ability of tumor cells to intravasate into the bloodstream (Yang et al. 2004). More recent studies provided a new twist to Twist function. A series of elegant experiments showed that Twist1 activates a cancer stem cell-specific program of gene expression in both skin and mammary cells that is independent of its role in orchestrating epithelial–mesenchymal transition and tumor infiltration (Beck et al. 2015). The Twist proteins Hand1 and Hand2 also orchestrate cardiac myocyte development (Conway et al. 2010). Hand proteins are typical HLH proteins that form heterodimers with the Twist proteins, are partially complementarily expressed in the developing heart, and are coexpressed in the cardiac outflow tract. Mice deficient for both Hand1 and Hand2 display severe cardiac defects (George and Firulli 2018). How the Twist proteins modulate seemingly unrelated pathways such as the epithelial–mesenchymal transition, tumor stemness, and cardiomyocyte identity deserves further scrutiny. Somitogenesis is also controlled and enforced by the bHLH proteins. 
Segmentation occurs very early during embryogenesis through the formation of somites that give rise to skeletal muscle and the vertebrae. During somitogenesis, the unsegmented presomitic or paraxial mesoderm progressively generates epithelial somites in an anterior-to-posterior direction. The process of somite segmentation in mice is repeated every 2 h and is again dictated by HLH proteins (Bessho et al. 2001). Hes7 expression is initiated from the posterior and then progresses into the anterior regions of the presomitic mesoderm. In presomitic cells, Hes7 transcription oscillates, with each cycle of expression giving rise to a pair of somites. In the absence of Hes7, somites, vertebrae, and ribs fail to segregate (Bessho et al. 2001). Likewise, sustained Hes7 expression readily leads to fused somites, consistent with the idea that oscillating patterns of Hes7 expression serve to orchestrate somitogenesis (Fig. 3). The oscillating pattern of Hes7 expression is controlled by a feedback mechanism. Interference with the feedback mechanism causes sustained Hes7 expression and fusion of the somites. A remarkable series of experiments has explored how neighboring cells in the somites coordinate oscillating patterns of Hes7 gene expression (Fig. 3; Shimojo et al. 2016). Briefly, oscillation was transmitted between neighboring cells by Notch–Dll1-mediated signaling. Dampening the expression of the Notch ligand Dll1 interfered with Hes7 periodic transcription and blocked somitogenesis. Thus, the Notch signaling module coordinates oscillating patterns of bHLH gene expression between sender and recipient cells to generate a segmented body plan (Fig. 3). 
These studies are provocative, raising the possibility that similar mechanisms operate in other tissues, such as the lymphoid organs, where a wide ensemble of cell types interacts to transmit information between sender and recipient cells to orchestrate developmental progression and generate an effective immune response.
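
How a feedback mechanism of the kind described for Hes7 can produce oscillating rather than steady expression can be illustrated with a minimal sketch. The code below assumes that the repressor inhibits its own production after a fixed time delay, a common modeling idealization that is not stated in the text above. The single-variable equation, the Hill repression term, the dimensionless parameter values, and the function names are all assumptions for illustration; the time units are arbitrary and are not calibrated to the 2-h period of mouse somitogenesis.

```python
def simulate_feedback(delay, t_end=100.0, dt=0.01,
                      k=2.0, K=1.0, n=6, gamma=1.0, x0=0.1):
    """Euler integration of dx/dt = k / (1 + (x(t - delay)/K)**n) - gamma * x.

    x is the abundance of a self-repressing factor (hypothetical units).
    Returns the trace of x from t = 0 to t = t_end.
    """
    lag = int(round(delay / dt))
    xs = [x0] * (lag + 1)              # constant history before t = 0
    for _ in range(int(round(t_end / dt))):
        x_now = xs[-1]
        x_past = xs[-1 - lag]          # abundance one delay earlier
        production = k / (1.0 + (x_past / K) ** n)
        xs.append(x_now + dt * (production - gamma * x_now))
    return xs[lag:]

def late_amplitude(trace):
    """Peak-to-trough range over the second half of the trace."""
    late = trace[len(trace) // 2:]
    return max(late) - min(late)

if __name__ == "__main__":
    print("with delay:   ", round(late_amplitude(simulate_feedback(2.0)), 3))
    print("without delay:", round(late_amplitude(simulate_feedback(0.0)), 6))
```

In this sketch, the delayed loop settles into sustained oscillations, whereas the same loop without a delay relaxes to a constant level. The model does not address coupling between neighboring cells through Notch–Dll1 signaling.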

Figure 3.

A molecular oscillator instructs somitogenesis. Oscillating patterns of Hes1/7 gene expression are transmitted between sender and recipient cells by the Notch–Dll1 pathway. Upon binding of Dll1, the Notch intracellular domain (NICD) is released to activate Hes1/7 expression. Hes1/7 expression is regulated by a negative feedback circuitry that results in an oscillating pattern of gene expression. Hes1/7 expression in turn suppresses Dll1 expression to orchestrate a bidirectional oscillating pattern of Dll1 expression between sender and recipient cells (Kageyama et al. 2018).

HLH–PAS proteins

The dominant players in the circadian clock are the HLH and PAS domain-containing proteins CLOCK and BMAL1 (Bargiello and Young 1984; Reddy et al. 1984; Zehring et al. 1984). The HLH and PAS domains of circadian clock proteins are intertwined to promote heterodimerization (Fig. 1; Huang et al. 2012). Their binding sites contain ACGTG or GCGTG core sequences. CLOCK and BMAL1 form heterodimers to induce the expression of their transcriptional repressors: Per1, Per2, Cry1, and Cry2 (Fig. 4). During the early afternoon and late evening, CLOCK/BMAL1 heterodimers bind in a rhythmic fashion to E-box target sites, where they evict nucleosomes to activate Per1 and Cry1 transcription (Allada et al. 1998; Darlington et al. 1998; Rutila et al. 1998). PER and CRY proteins accumulate in the cytoplasm, where they form heterodimers that then enter the nucleus (Fig. 4; Vosshall et al. 1994). During late night and early morning, PER/CRY heterodimers form a higher-order complex with CLOCK/BMAL1 heterodimers to suppress downstream target gene expression. Next, PER undergoes a series of phosphorylation modifications that promote its degradation and release from the CLOCK/BMAL1 activator (Fig. 4). The CLOCK/BMAL1 heterodimer then binds its cognate E-box-binding sites to repeat the process of evicting nucleosomes at its targets. Thus, the core transcriptional HLH activators CLOCK and BMAL1 drive the expression of their own suppressors (Per1–3 and Cry1–2) to generate a remarkably well-conserved negative feedback loop (Reppert and Weaver 2001).

Figure 4.

Role of HLH proteins in orchestrating circadian gene expression. (A) The D. melanogaster HLH proteins CLOCK and CYCLE activate the expression of PER and TIM. PER and TIM transcripts are exported to the cytoplasm for translation and heterodimer formation. They enter the nucleus to suppress CLOCK:CYCLE-mediated transactivation, generating a negative feedback loop with oscillating patterns of gene expression. (B) The mammalian HLH proteins CLOCK and BMAL activate the expression of PER and CRY. PER and CRY transcripts are exported to the cytoplasm for translation and heterodimer formation. They enter the nucleus to suppress CLOCK:BMAL-mediated transactivation, generating a negative feedback loop with oscillating patterns of gene expression (Lowrey and Takahashi 2004).

HLH proteins that contain PAS domains also serve as oxygen sensors. Specifically, the key players in the cellular response to hypoxia are the HIF proteins (Wang et al. 1995). There are three HIF proteins (named HIF-1α, HIF-2α, and HIF-3α) that readily form heterodimers with the ARNT (HIF-1β) proteins to bind to a distinct class of E-box-binding sites (Semenza 2012). HIF-1α and HIF-2α serve as oxygen sensors, whereas the role of HIF-3α is less well understood (Keith et al. 2012). Under normal oxygen levels, two proline residues in the HIF-1α and HIF-2α oxygen-dependent degradation domain are hydroxylated by prolyl hydroxylase domain (PHD)-containing proteins (Bruick and McKnight 2001). The activity of PHD proteins requires access to both oxygen and α-ketoglutarate. Once proline residues are hydroxylated, the HIF-1α and HIF-2α proteins are targeted to the ubiquitin–proteasome pathway for degradation mediated by VHL tumor suppressor protein-dependent ubiquitination (Zhang et al. 2007; Kaelin and Ratcliffe 2008). Specifically, VHL recruits an E3 ubiquitin–protein ligase complex that catalyzes the covalent attachment of ubiquitin to HIF-1α residues, serving as a signal for degradation. 
However, under hypoxic conditions, the HIF proteins avoid proteolysis and readily dimerize with the ARNT proteins, activating programs of gene expression linked with angiogenesis and erythropoiesis (Semenza 2012). HIF proteins also activate the expression of genes associated with glucose transport and glycolysis to provide metabolic needs for cells growing in hypoxic conditions (Girgis et al. 2012).
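
The oxygen-sensing logic described above (hydroxylation by PHD proteins requires both oxygen and α-ketoglutarate and marks HIF-α for VHL-dependent degradation) can be summarized in a steady-state sketch. The rate law below is an assumption made only for illustration: PHD-dependent degradation is taken to saturate with each cofactor in Michaelis–Menten fashion, and the function name, parameter names, and all numerical values are hypothetical and in arbitrary units.

```python
def hif_alpha_steady_state(oxygen, akg=1.0, synthesis=1.0, basal_decay=0.1,
                           phd_max=5.0, km_o2=1.0, km_akg=1.0):
    """Steady-state HIF-alpha level when synthesis balances degradation.

    Degradation = basal turnover + PHD/VHL-dependent turnover, where the
    PHD term needs both oxygen and alpha-ketoglutarate (assumed saturable).
    """
    phd_rate = phd_max * (oxygen / (km_o2 + oxygen)) * (akg / (km_akg + akg))
    return synthesis / (basal_decay + phd_rate)

if __name__ == "__main__":
    for o2 in (1.0, 0.2, 0.05, 0.0):
        print("oxygen=%.2f -> HIF-alpha=%.2f" % (o2, hif_alpha_steady_state(o2)))
```

In this sketch HIF-α accumulates as oxygen falls, and removing α-ketoglutarate stabilizes HIF-α even at normal oxygen, reflecting the dual cofactor requirement of the PHD proteins.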

HLH proteins in hematopoiesis

Together with other transcriptional regulators, HLH proteins specify the fates of virtually all immune cell types. HLH protein activity begins in hematopoietic stem cells (HSCs), which give rise to all other blood cells. At least four HLH proteins (SCL, LYL1, E2A, and Id1) are involved at this stage. SCL specifies the HSC fate, whereas SCL, LYL1, E2A, and Id1 maintain the HSC compartment (Fig. 5; Shivdasani et al. 1995; Semerad et al. 2009; Souroullas et al. 2009; Singh et al. 2018). Prominent in orchestrating early erythropoiesis is a transcription factor complex containing the HLH proteins E2A and SCL as well as an adaptor protein named LMO2 that interacts with SCL (Fig. 5; Soler et al. 2010). Recently, the structure of the E2A:SCL:LMO2 ternary complex was resolved (Omari et al. 2013). It was an eye-opener. It showed how adaptor proteins have the ability to increase DNA-binding site specificity of HLH heterodimers that, by themselves, display little DNA-binding selectivity. Specifically, upon interacting with SCL, LMO2 induced new hydrogen bonds in the SCL:E47 heterodimer that strengthened heterodimerization but also induced rotation in E47 (Omari et al. 2013). The rotation in E47 altered the binding site preference for the ternary complex to such a degree that much of the complex site preference was contributed by another factor, named GATA-1 (Omari et al. 2013; Hewitt et al. 2016). This is a cardinal finding, since it shows at the atomic level how HLH proteins have the ability to select their target sites with great precision.

Figure 5.

HLH proteins and the generation of immune cell diversity. The role of HLH proteins in hematopoiesis and immune cell development is indicated. Dashed lines indicate developmental transitions likely involving multiple intermediates and transitions between primary and secondary lymphoid compartments.

HLH proteins also promote the development of intermediate progenitors affecting virtually the entire spectrum of early hematopoiesis. Specifically, LYL1 promotes the development of T-cell progenitors, whereas the E2A proteins, together with Ikaros and PU.1, help direct the developmental progression of the lymphoid-primed multipotent progenitors (LMPPs) and their differentiated progeny, which include macrophage–dendritic progenitor cells (MDPs), granulocyte–macrophage progenitors (GMPs), and common lymphoid progenitors (CLPs) (Dias et al. 2008; Semerad et al. 2009; Zohren et al. 2012). MDPs give rise to macrophages and two distinct types of dendritic cells: antigen-presenting classical dendritic cells (cDCs) and interferon-producing plasmacytoid dendritic cells (pDCs). pDC development is controlled by E2-2, whereas cDC maturation is orchestrated in part by Id2 (Fig. 5; Cisse et al. 2008). In cDCs, Id2 neutralizes E2-2 activity and may also modulate dendritic cell development via other pathways. E2-2 directly activates a dendritic-specific program of gene expression by recruiting P300/CBP and, in pDCs, acts in concert with the transcriptional corepressor ETO (MTG16) to suppress the cDC-specific gene program. Id2 expression in pDCs is directly suppressed by the combined activities of E2-2 and MTG16 (Fig. 5; Grajkowska et al. 2017). CLPs give rise to innate lymphoid cells (ILCs), B-lineage cells, and T-lineage cells. ILC development does not require E2A, E2-2, or HEB activity, although E-protein expression in the thymus is essential to suppress ILC2 development (Fig. 5; Miyazaki et al. 2017; Wang et al. 2017). 
Id2 defines ILCs and is essential for ILC lineage progression, suppression of a stem cell gene program, and maintenance of long-term identity (Huang et al. 2017). Mature natural killer (NK) cells, a subset of ILCs, readily developed in the absence of Id2 but activated the expression of genes associated with a naïve CD8 gene program and consequently failed to mature into cytotoxic effector cells (Zook et al. 2018). Notably, the depletion of Id2 abundance was compensated for by the induction of Id3 expression, acting to neutralize E-protein activity (Zook et al. 2018). These data as well as other observations showed how an intricate circuitry involving carefully balanced E-protein and Id-protein abundance dictates the progression of maturing immune cells (Boos et al. 2007; Miyazaki et al. 2011, 2017; Zook et al. 2018).
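
The idea that balanced E-protein and Id-protein abundance sets E-protein activity can be expressed as a simple binding equilibrium. The sketch below assumes that Id proteins neutralize E proteins by forming 1:1 heterodimers that do not contribute to E-protein activity, and it ignores E-protein homodimerization and all other partners. The function name, the dissociation constant, and the concentrations (arbitrary units) are hypothetical and serve only to show how free E protein responds to changes in Id abundance.

```python
import math

def free_e_protein(e_total, id_total, kd):
    """Free E protein at equilibrium for E + Id <-> E:Id.

    Mass balance plus kd = [E][Id]/[E:Id] gives
    free**2 + (id_total - e_total + kd) * free - kd * e_total = 0;
    the positive root is returned.
    """
    b = id_total - e_total + kd
    return (-b + math.sqrt(b * b + 4.0 * kd * e_total)) / 2.0

if __name__ == "__main__":
    for id_total in (0.0, 0.5, 1.0, 2.0):
        free = free_e_protein(e_total=1.0, id_total=id_total, kd=0.01)
        print("Id total=%.1f -> free E=%.3f" % (id_total, free))
```

With tight binding (small `kd`), free E protein falls steeply once Id abundance approaches or exceeds total E-protein abundance, so modest changes in Id2 or Id3 levels near that point can switch E-protein activity between high and low states in this idealized picture.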

HLH proteins and B-cell diversity

Early B-cell development relies on tightly regulated degrees of E-protein activity. In a subset of CLPs, E47 abundance is elevated (through mechanisms yet to be determined) to initiate development toward the B-cell fate (Fig. 5; Bain et al. 1994; Zhuang et al. 1994, 1996; Jaleco et al. 1999; Beck et al. 2009). Next, E2A and HEB activate the expression of FOXO1, which together induce EBF1 expression (Inlay et al. 2009; Mercer et al. 2011; Welinder et al. 2011; Miyai et al. 2018). EBF1 and FOXO1 then act in a feed-forward loop to orchestrate a B-lineage-specific gene program and suppress the expression of genes associated with alternative cell lineages (Mansson et al. 2012; Li et al. 2018). E2A proteins directly regulate critical B-cell targets, including genes encoding for components of the pre-BCR such as λ5, VpreB1-3, CD79a, and CD79b and the recombination-activating genes Rag1 and Rag2 (Kwon et al. 2008; Lin et al. 2010; Mercer et al. 2011). E proteins also shape the topology of the immunoglobulin heavy and light chain loci of early B cells. E proteins bind to a spectrum of sites across the Igh and Igκ loci, which then sequester CBP/p300 and BRG1 to remodel the local chromatin landscape (Bradney et al. 2003; Bossen et al. 2015). These partnerships ultimately induce elaborate genomic interactions across E2A-bound sites, plausibly mediated by BRD4 by interacting with P300-induced acetylation of lysine residues at H3 and H4 tails (Lin et al. 2012). Upon pre-BCR expression, E47 abundance temporarily declines to promote cellular expansion but rises again in small pre-B cells to establish a network of cross-links that spans the entire Igκ V region cluster, ultimately activating Igk VJ rearrangement (Fig. 5; Romanow et al. 2000; Goebel et al. 2001; Inlay et al. 2004; Quong et al. 2004; Lin et al. 2012; Bossen et al. 2015). Later, in immature B cells, the E2A proteins enforce another important developmental checkpoint. 
In the presence of autoreactivity, E2A abundance remains high to permit continued Igκ and Igλ rearrangement (Quong et al. 2004; Beck et al. 2009). However, upon completion of an innocuous immunoglobulin receptor, E2A abundance drops, preventing continued rearrangement (Fig. 5). Collectively, these data reveal a carefully tuned developmental process in which E2A proteins play a central role: High E2A abundance establishes a network of cross-links across the Igκ locus that not only initiates VJ rearrangement but also permits continued receptor editing until an innocuous receptor has been generated (Beck et al. 2009). E proteins have equally important and nuanced roles in mature B cells. In the spleen and lymph nodes, follicular B cells express high levels of E2A. In contrast, marginal zone B cells require high levels of Id3 for their development (Quong et al. 2004). In naïve mature B cells, E2A abundance is low, but, upon activation, E2A readily accumulates to promote germinal center B-cell development, induce AID expression, and promote class switch recombination (Fig. 5; Quong et al. 1999; Sayegh et al. 2003; Chen et al. 2016; Gloury et al. 2016; Wöhner et al. 2016). Likewise, the Id proteins modulate the germinal center reaction. In activated B cells, Id3 abundance declines, releasing E proteins from their inhibitors to induce the expression of genes associated with BCR and cytokine receptor-mediated signaling (Chen et al. 2016; Gloury et al. 2016). In maturing B cells, E2A as well as E2-2 continue their activities to promote plasma cell differentiation, whereas the HLH protein ABF1 instructs germinal center cells to adopt the memory B-cell fate (Fig. 5; Massari et al. 1998; Chiu et al. 2014; Gloury et al. 2016; Wöhner et al. 2016). Finally, two additional HLH proteins, Bhlhe40 (Dec1) and Bhlhe41 (Dec2), orchestrate the self-renewal activity and development of B1 cells (Fig. 5; Kreslavsky et al. 2017). 
The remarkable ability of E proteins to orchestrate B-cell development is a consequence of their ability to activate or silence hundreds of target genes that differ from each other at each developmental step, a daunting task that must involve many other factors such as EBF1, PAX5, FOXO1, IRF4, IRF8, PU.1, and others yet to be identified.

HLH proteins and the generation of T-cell diversity

E proteins also specify T-cell identity. E2A and HEB in early T-cell progenitors activate the expression of Notch1 (Fig. 5; Bain and Murre 1998; Schotte et al. 2010; Miyazaki et al. 2017). Notch signaling then coordinately acts with E proteins to specify T-cell fate. How precisely Notch signaling and E proteins establish T-cell identity remains to be determined, but recent work points to a role for noncoding transcription. Specifically, E-protein-binding sites have been identified across the Bcl11b intergenic region, which contains a noncoding transcript named ThymoD (Longabaugh et al. 2017). Activation of ThymoD transcription repositions the Bcl11b superenhancer complex from the lamina to the nuclear interior and juxtaposes the Bcl11b enhancer and promoter regions into a single loop domain (Isoda et al. 2017; Hu et al. 2018). Bcl11b expression orchestrates a T-lineage-specific gene program and suppresses the expression of genes associated with alternative cell lineages in part by silencing Id2 expression (Ikawa et al. 2010; Li et al. 2010; Longabaugh et al. 2017). In a parallel pathway, the E proteins activate the expression of Rag1 and Rag2 and orchestrate accessibility and recombination across the TCRβ, TCRγ, and TCRδ loci (Agata et al. 2007; Miyazaki et al. 2017). Upon assembly of a pre-TCR complex, a signaling cascade involving the ERK–MAPK pathway lowers E47 levels but increases Id3 abundance to promote cellular expansion and developmental progression (Fig. 5; Engel and Murre 2004). Interestingly, γδ TCR-mediated signaling elevates Id3 abundance more than pre-TCR signaling. The difference in Id3 abundance may instruct cells to adopt either β or γδ T-lineage identity (Lauritsen et al. 2009). Id3 expression not only dictates the β-lineage and γδ-lineage fates but also orchestrates γδ effector, γδ NKT, αβ NKT, and invariant innate follicular helper (TFH) cellular expansion and cell fate (Ueda-Hayakawa et al. 2009; Verykokakis et al. 2010; D'Cruz et al. 2014; Miyazaki et al. 2015). 
How does the E–Id protein axis modulate innate γδ, NKT, and invariant TFH NKT cell fates? Recent studies demonstrated that the E–Id protein module intersects with yet another regulatory circuitry, the PI3K–FOXO–mTOR pathway, to control expansion, self-renewal, and differentiation of innate-like cells (Miyazaki et al. 2015). It is tempting to speculate that these modules not only regulate the self-renewal, expansion, and differentiation of invariant innate TFH cells but also act on other immune cells. Prior to thymocyte selection, HEB levels accumulate to orchestrate TCRα locus rearrangement (Jones and Zhuang 2007; D'Cruz et al. 2010). Instructed by the expressed TCR, the thymus selects useful clones and destroys harmful or useless ones (von Boehmer et al. 1989). This selection process is at least in part enforced by the E–Id protein axis (Fig. 5; Bain et al. 1999; Rivera et al. 2000; Jones and Zhuang 2007; Jones-Mason et al. 2012; Miyazaki et al. 2015). High E2A abundance interferes with positive selection, whereas high Id3 abundance instructed by TCR-mediated ERK–MAPK signaling promotes developmental progression beyond the TCR checkpoint to promote differentiation into either CD4 or CD8 expressors and exit from the thymus (Fig. 5; Bain et al. 1997). More recent studies extended these observations involving not only Id3 but also Id2 (Miyazaki et al. 2015). Specifically, Id proteins orchestrate positive selection in two steps. The first involves the activation of Id3 expression by TCR signaling. The second step involves the activation of Id2 expression in DP thymocytes that already have received a TCR signal. In the absence of both Id2 and Id3 expression, thymocytes fail to mature into CD4 or CD8 SP cells, with the exception of a slowly expanding population of innate TFH-like cells. 
These data suggest that differences in the strength and/or timing of Id expression, instructed by different signals involving distinct sender cells (epithelial vs. DP cells), dictate developing thymocytes to adopt either the adaptive or innate immune cell fate (Miyazaki et al. 2015). While these observations indicate heavy involvement of the E–Id protein axis in seemingly unrelated pathways, there is a common pattern: E and Id proteins orchestrate assembly of the TCR loci and enforce the pre-TCR and γδ and αβ TCR checkpoints to ensure that only cells that have generated a productive antigen receptor progress beyond the barriers.

HLH proteins and homing

Upon entry into the peripheral lymphoid organs, the vast majority of T-lineage cells remain in a naïve cell state enforced in part by Id3 (Miyazaki et al. 2011). Once exposed to invading pathogens, T cells initiate a multistep transcriptional program that instructs their differentiation from naïve to effector and/or memory-like states. Similar to B cells, the T-cell differentiation program hinges on the E–Id protein axis. While, in naïve T cells, high levels of Id3 enforce the naïve state, upon triggering the TCR, Id3 abundance declines to permit E-protein occupancy across the enhancer landscape to activate a TFH gene program (Fig. 5; Miyazaki et al. 2011). TFH cells not only down-regulate Id3 transcription but also need to silence Id2 expression. This is achieved by a transcriptional regulator named Bcl6 that is closely associated with TFH cell development (Shaw et al. 2016). Id2 in other peripheral immune cell types acts to silence a TFH gene program. For example, TH1 cells express high levels of Id2 to suppress a TFH-specific transcription signature (Shaw et al. 2016). Different patterns of Id2 versus Id3 expression have been observed in CD8 effector and memory cells. While Id3 levels initially decline during a viral infection, Id2 abundance is elevated to neutralize E-protein DNA binding to promote and maintain a terminal differentiation state for effector CD8 cells (Omilusik et al. 2018). In contrast to Id2, Id3 levels decline when a viral infection reaches maximum levels but rise again in the long-lived memory compartment to generate a functional memory compartment (Fig. 5; Yang et al. 2011). The rise and fall of Id gene expression is intriguing and indicative of a complex circuitry built to respond rapidly to cues generated by a continuously changing viral environment. Intriguing studies have recently revealed a prominent role for E and Id proteins in yet another CD8 cell type: so-called follicular cytotoxic T cells (TFC). 
TFC cells home to the B-cell follicles to eradicate virally infected TFH cells as well as B cells (Im et al. 2016). TFC cells share a transcription signature with TFH cells that is again orchestrated by the E and Id proteins (Im et al. 2016; Leong et al. 2016). Finally, the E–Id module also controls the homing of follicular Treg cells (TFR). In Treg cells, Id2 and Id3 expression enforces the TFR checkpoint to prevent premature maturation by silencing a TFH-like gene program (Fig. 5; Miyazaki et al. 2014). In later stages, declining gradients of Id2 and Id3 abundance promote the developmental progression toward a more mature TFR phenotype by activating the expression of the chemokine receptor CXCR5. Thus, a common regulatory gene network with heavy involvement of the E and Id proteins has evolved to orchestrate the homing of TFH, TFC, and TFR cells to the B-cell follicles during a viral infection.

HLH proteins and programmed transdifferentiation

Illuminating studies by Lassar and Weintraub (Davis et al. 1987) revealed that the expression of a single bHLH protein, MyoD, was sufficient to reprogram nonmuscle cells into skeletal muscle. Demonstration of MyoD-driven reprogramming in a wide ensemble of cell types soon followed. Cell types reprogrammed by MyoD include chondrocytes, adipocytes, and retinal epithelial cells (Weintraub et al. 1989). HLH proteins also have the ability to genetically transdifferentiate nonneuronal cells into neurons. Specifically, forced Ascl1 and Neurogenin-2 expression reprograms astrocytes into terminally differentiated neurons (Berninger et al. 2007; Heinrich et al. 2010). Human cortical neurons are most efficiently generated from human embryonic stem cells by forced expression of Neurogenin-2. By combining the expression of three factors (Ascl1, Brn2, and Myt1l), mouse embryonic fibroblasts were readily converted into functional neurons (Vierbuchen et al. 2010). The bHLH region of Ascl1 is rather small and does not interact preferentially with the two central E-box nucleotides, indicative of occupancy of a single side of the DNA surface. Consequently, Ascl1 is a pioneer factor; i.e., it occupies the majority of its binding sites in fibroblasts without help from either Brn2 or Myt1l, allowing it to activate silent genes that are sequestered in closed chromatin. Most recently, human adult peripheral T cells were genetically programmed into neuronal-like cells that displayed all of the key features associated with neurons (Tanabe et al. 2018). Finally, forced expression of the bHLH protein Neurogenin-3 in combination with PDX1 and MAFA efficiently transdifferentiated pancreatic acinar cells into β-like cells in vivo (Zhou et al. 2008). Remarkably, the programmed β-like cells were maintained for >1 year and reversed diabetes (Zhou and Melton 2018). 
Despite these striking results, translating transdifferentiation strategies into the clinic requires additional achievements to (1) improve the efficiency of reprogramming and (2) develop stable and functional transplantation procedures. Recent studies have shed light on how to optimize in vivo programming efficiency. The ability of HLH proteins to genetically program transdifferentiation varies greatly among recipient cells. While MyoD readily converts nonmuscle cell lines into skeletal muscle cells, forced expression of MyoD in C. elegans failed to efficiently program transdifferentiation (Fukushige and Krause 2005). Thus, the ability of MyoD to genetically program transdifferentiation is limited by cellular context. Recent elegant experiments have provided mechanistic insight into how cellular context dictates the ability to genetically program transdifferentiation (Fong et al. 2012). Two recipient cell types—embryonic fibroblasts and embryonic carcinoma cell lines (P19)—were examined for their ability to transdifferentiate into muscle or neurons upon MyoD or NeuroD2 expression. Embryonic fibroblasts readily converted into muscle upon forced MyoD expression, while NeuroD2 expression did not promote transdifferentiation. Conversely, P19 cells efficiently transdifferentiated into neurons when NeuroD2 was overexpressed, while forced MyoD expression displayed only a limited ability to orchestrate a skeletal muscle-specific program of gene expression. Interestingly, NeuroD2 and MyoD share thousands of binding sites in cells overexpressing each factor. However, a subset of binding sites was specific for each factor, and this subset was associated with the induction of a myogenic versus neuronal-specific gene program. Additionally, for each cell type, NeuroD2 or MyoD occupancy was highly enriched for E-box sites located in accessible chromatin (Fong et al. 2015). 
Although the induction of lineage-specific gene programs by forced expression of lineage-specific bHLH proteins is dictated by the chromatin landscape, chromatin accessibility is not the complete story. Successful programming also depends on the presence of other factors that either prevent or permit reprogramming, and many of the details remain to be uncovered.

Conclusion

The HLH domain is an ancient DNA-binding domain that, in unicellular organisms, became involved in regulating the biosynthesis of phospholipids and amino acids. In multicellular organisms, the structure of the HLH domain closely resembles that found in unicellular organisms. However, HLH proteins evolved to greater complexity by acquiring additional domains, such as the leucine zipper, the PAS domains, and highly conserved transactivation/repression domains. Despite these permutations in structure, the ability of HLH proteins to activate lineage-specific programs of gene expression across species is universal. The most remarkable example of this deep conservation of function involves HLH proteins that control sexual dimorphism. Specifically, the human ortholog (E2A) can replace HLH-2 to orchestrate C. elegans gonadogenesis (Sallee and Greenwald 2015). Thirty years on, many questions remain. For example, what mechanisms control the ability of HLH proteins to activate lineage-specific gene programs, reprogram differentiated cell types, and faithfully orchestrate antigen receptor assembly? Numerous studies involving all classes of HLH proteins indicate that HLH proteins predominantly bind enhancer elements, recruiting HATs, chromatin remodelers, and coactivators (McMahon et al. 1998; Bradney et al. 2003; Zhang et al. 2004; Forcales et al. 2012; Lin et al. 2012; Teachenor et al. 2012; Bossen et al. 2015; Fong et al. 2015; Grajkowska et al. 2017). Prominent among the factors recruited by HLH proteins is the bromodomain protein BRD4. BRD4 binds acetylated H3 and H4 lysine residues and is associated with large intrinsically disordered domains that may establish a phase-separated state across superenhancers (Hnisz et al. 2017; Sabari et al. 2018). An HLH protein-induced phase-separated state may serve to compartmentalize enhancers and promoters or antigen receptor variable, diversity, and joining elements during the somatic recombination process.
These observations lead to the cardinal conclusion that bHLH proteins primarily serve to facilitate the compartmentalization of transcriptional components at lineage-specific genes. Genes encoding HLH proteins arose in unicellular organisms >600 million years ago, then duplicated and diversified from ancestral genes, and their expansion is closely associated with the establishment of multicellularity. That association is perhaps not surprising. Duplication from an ancestral locus allows HLH genes to be placed under the control of novel enhancers and insulators. These reconfigurations facilitate new spatial and temporal programs of gene expression, generating ever-increasing cellular diversity.
