
An overview of the basic helix-loop-helix proteins.

Susan Jones

Abstract

The basic helix-loop-helix proteins are dimeric transcription factors that are found in almost all eukaryotes. In animals, they are important regulators of embryonic development, particularly in neurogenesis, myogenesis, heart development and hematopoiesis.

Year: 2004 | PMID: 15186484 | PMCID: PMC463060 | DOI: 10.1186/gb-2004-5-6-226

Source DB: PubMed | Journal: Genome Biol | ISSN: 1474-7596 | Impact factor: 13.583


The basic helix-loop-helix (bHLH) proteins form a large superfamily of transcriptional regulators that are found in organisms from yeast to humans and function in critical developmental processes, including sex determination and the development of the nervous system and muscles. Because of their functional diversity and importance, this superfamily has been the subject of several recent reviews covering many species [1,2], as well as of reviews specific to individual species, including Saccharomyces cerevisiae [3], Drosophila [4,5], human [6] and Arabidopsis [7-9]. The main emphasis in the recent literature has been on phylogenetic sequence analysis of bHLH families. This article gives an overview of how bHLH proteins are classified by sequence and summarizes their structures and functions.

Classifications of bHLH proteins by sequence

Members of the bHLH superfamily have two highly conserved and functionally distinct domains, which together make up a region of approximately 60 amino-acid residues. At the amino-terminal end of this region is the basic domain, which binds the transcription factor to DNA at a consensus hexanucleotide sequence known as the E box. Different families of bHLH proteins recognize different E-box consensus sequences. At the carboxy-terminal end of the region is the HLH domain, which mediates interactions with other protein subunits to form homo- and heterodimeric complexes. Many different dimer combinations are possible, each with its own binding affinity between the two monomers. The differences in the E-box sequences recognized, and in the dimers formed, by different bHLH proteins determine how they control diverse developmental functions through transcriptional regulation [10]. The bHLH motif was first observed by Murre and colleagues [11] in two murine transcription factors known as E12 and E47. As many other bHLH proteins were subsequently identified, a classification was formulated on the basis of their tissue distributions, DNA-binding specificities and dimerization potential [12]. This classification, which divides the superfamily into six classes, was initially based on a small number of HLH proteins but has since been applied to larger sets of eukaryotic proteins [1]. More recently, evolutionary relationships have been used to classify bHLH proteins into four major groups (A-D) [13], taking into account E-box binding, conservation of residues in other parts of the motif, and the presence or absence of additional domains. The sequencing of new genomes has led to the identification of additional bHLH families, and this evolutionary classification has now been extended to include two additional groups (E and F; Table 1) [6].
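
To make the E-box idea concrete, the minimal Python sketch below scans a DNA string for hexamers matching the canonical E-box consensus CANNTG (N = any base). The consensus pattern is the commonly cited one rather than a sequence given in this article, and the helper names `find_e_boxes` and `reverse_complement` are hypothetical; real binding-site prediction also has to weigh flanking bases and protein-specific preferences.

```python
import re

# Watson-Crick complements for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Canonical E box C-A-N-N-T-G (assumed consensus). The zero-width lookahead
# lets overlapping sites all be reported.
_E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_e_boxes(dna: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: report each CANNTG as (0-based start, top, bottom).

    The reverse complement of CANNTG is again of the form CANNTG, so scanning
    the top strand alone finds every site; the bottom-strand reading shows
    whether both strands spell the same hexamer.
    """
    dna = dna.upper()
    return [(m.start(), m.group(1), reverse_complement(m.group(1)))
            for m in _E_BOX.finditer(dna)]


if __name__ == "__main__":
    for start, top, bottom in find_e_boxes("GGCAGCTGAACACCTGTT"):
        print(start, top, bottom)
```

Both hexamers in the example match CANNTG: CAGCTG reads the same on both strands, whereas CACCTG reads CAGGTG on the bottom strand.
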
Parsimony analysis by Atchley and Fitch [13] of a phylogenetic tree derived from 122 sequences suggested that an ancestral HLH sequence most probably came from group B, and group B proteins are indeed the most prevalent type of bHLH protein in animals. The situation is similar in the Arabidopsis genome, in which the G-box-binding bHLH proteins (part of group B) are the most abundant group [7].
Table 1

Classification of bHLH proteins by sequence

Phylogenetic group | Description | Classification according to Murre et al. [12] | Examples of classified proteins (family names)
A | Bind to CAGCTG or CACCTG | I, II | MyoD, Twist, Net
B | Bind to CACGTG or CATGTTG | III, IV | Mad, Max, Myc
C | Bind to ACGTG or GCGTG; contain a PAS domain | - | Single-minded, aryl hydrocarbon receptor nuclear translocator (Arnt), hypoxia-inducible factor (HIF), Clock
D | Lack a basic domain and hence do not bind DNA, but form protein-protein dimers that function as antagonists of group A proteins | V | ID
E | Bind preferentially to N-box sequences CACGCG or CACGAG; contain an orange domain and a WRPW peptide | VI | Hairy
F | Contain an additional COE domain, involved in dimerization and DNA binding | - | Coe (Col/Olf-1/EBF)

This classification of bHLH proteins is based on sequence comparisons, E-box binding, conservation of residues in parts of the protein other than the bHLH region and the presence or absence of additional domains. It was first adopted by Atchley and Fitch [13] and extended by Ledent and coworkers [6]. The older classification based on tissue distributions, DNA-binding specificities and dimerization, proposed by Murre and coworkers for a much smaller set of sequences [12], is shown for comparison.
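
Table 1 can be read as a rough decision procedure. The minimal Python sketch below turns its columns into hand-written rules. It is a simplification assumed for illustration, not the sequence-based phylogenetic method of Atchley and Fitch [13] or Ledent and coworkers [6], and the `BHLHFeatures` record and `assign_group` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

# Preferred core sites for groups A and B, copied from Table 1.
GROUP_A_SITES = {"CAGCTG", "CACCTG"}
GROUP_B_SITES = {"CACGTG", "CATGTTG"}


@dataclass
class BHLHFeatures:
    """Hypothetical summary of one bHLH protein's features."""
    has_basic_domain: bool = True
    extra_domains: set[str] = field(default_factory=set)  # e.g. {"PAS"}
    preferred_site: Optional[str] = None  # core DNA site, if known


def assign_group(p: BHLHFeatures) -> Optional[str]:
    """Hypothetical rule set: map features to a Table 1 group letter.

    Returns None when the rules cannot decide. Domain rules are checked
    before site rules because groups C, E and F are defined by an extra
    domain and group D by a missing basic domain.
    """
    if not p.has_basic_domain:
        return "D"  # cannot bind DNA; acts as a dimerization antagonist
    if "COE" in p.extra_domains:
        return "F"
    if "PAS" in p.extra_domains:
        return "C"
    if "orange" in p.extra_domains:
        return "E"
    site = (p.preferred_site or "").upper()
    if site in GROUP_A_SITES:
        return "A"
    if site in GROUP_B_SITES:
        return "B"
    return None  # no decisive feature: would need sequence comparison


if __name__ == "__main__":
    print(assign_group(BHLHFeatures(preferred_site="CAGCTG")))  # group A site
    print(assign_group(BHLHFeatures(extra_domains={"leucine zipper"},
                                    preferred_site="CACGTG")))  # group B site
    print(assign_group(BHLHFeatures(extra_domains={"PAS"})))
```

A leucine zipper does not change the result, because Table 1 does not use it to define a group; a binding site on its own only separates group A from group B.
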

One basis for the evolutionary classification shown in Table 1 is the presence or absence of additional domains, of which the most common are the PAS, orange and leucine-zipper domains. PAS domains, located carboxy-terminal to the bHLH region, are 260-310 residues long and function as dimerization motifs [14]. They allow binding with other PAS proteins, non-PAS proteins, and small molecules such as dioxin. The PAS domain is named after three proteins containing it: Drosophila Period (Per), the human aryl hydrocarbon receptor nuclear translocator (Arnt) and Drosophila Single-minded (Sim) [15]. The domain is itself made up of two repeats of approximately 50 amino-acid residues (known as PAS A and PAS B) separated by about 150 residues that are poorly conserved [16]. PAS-domain-containing bHLH proteins (bHLH-PAS proteins) form phylogenetic group C. A distinct additional domain, the orange domain, is a 30-residue sequence that is also located carboxy-terminal to the bHLH region, from which it is separated by a short, variable length of sequence. Transcription factors with this additional domain, designated bHLH-O and forming part of phylogenetic group E, include the hairy-related proteins, called HEY1, HEY2 and HEYL in mouse and humans [17]. The molecular function of the orange domain is still unclear; it has been proposed that it mediates specificity and transcriptional repression [18], but there is also evidence that it can play a role in dimerization [17]. A number of bHLH protein families, mostly in phylogenetic group B, have a leucine-zipper domain contiguous with the second helix of the HLH domain; like the HLH domain, this mediates dimerization. Proteins that have only a leucine-zipper domain coupled with a basic domain (denoted bZIP) and no HLH domain are a separate family of DNA-binding proteins in their own right (reviewed in [19]). 
The sequence of the zipper consists of a repeating heptad, with hydrophobic (apolar) residues at the first and fourth positions and polar or charged residues at the remaining positions. Leucine predominates at the fourth position and gives the zipper motif its name. One bHLH protein that has a leucine-zipper domain (and is therefore denoted a bHLHZ protein) is Max, which forms the hub of a network of bHLH transcription factors. Max is known to form homodimers, and to form heterodimers with the group B proteins Myc, Mad, Mnt and Mga; each of these complexes has sequence-specific DNA-binding and transcriptional functions [20]. The additional domains in bHLH proteins, such as the leucine zipper, are always carboxy-terminal to the bHLH region. The position of the bHLH region and its additional domains within the complete protein sequence varies widely between families, however. This variable pattern of domain positioning has led to the proposal that bHLH proteins have undergone modular evolution by domain shuffling, a process that involves domain insertion and rearrangement [21].
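
The heptad rule can be illustrated with a short Python sketch that labels the seven positions a to g in the usual coiled-coil convention, so the "first and fourth positions" above are a and d. This is a sketch under stated assumptions: the caller must supply the heptad register (real zippers need register prediction, which is not attempted here), the function names are hypothetical, and the example sequence is an idealized, made-up zipper rather than a real protein.

```python
HEPTAD = "abcdefg"  # conventional labels for the seven heptad positions


def heptad_register(seq: str, first_position: str = "a") -> list[tuple[str, str]]:
    """Hypothetical helper: pair each residue with its heptad label.

    `first_position` is the (assumed known) label of residue 1.
    """
    offset = HEPTAD.index(first_position)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(seq.upper())]


def core_residues(seq: str, first_position: str = "a") -> dict[str, str]:
    """Collect residues at positions a and d, which pack into the zipper core."""
    register = heptad_register(seq, first_position)
    return {pos: "".join(r for r, p in register if p == pos) for pos in "ad"}


def leucine_fraction_at_d(seq: str, first_position: str = "a") -> float:
    """Fraction of d-position residues that are leucine (0.0 if there are none)."""
    d = core_residues(seq, first_position)["d"]
    return d.count("L") / len(d) if d else 0.0


if __name__ == "__main__":
    # Hypothetical, idealized zipper: four repeats of V-K-E-L-E-E-K (a..g).
    zipper = "VKELEEK" * 4
    print(core_residues(zipper))          # {'a': 'VVVV', 'd': 'LLLL'}
    print(leucine_fraction_at_d(zipper))  # 1.0
```

In the made-up repeat, valine occupies every a position and leucine every d position, while the b, c, e, f and g positions carry charged lysine and glutamate, matching the pattern described above.
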

Structures of bHLH proteins

In comparison with the volume of sequence data, structural data for the bHLH superfamily of transcriptional regulators are still relatively sparse. Just nine bHLH protein structures have been deposited to date in the Protein Data Bank (PDB; see Table 2) [22]. The CATH [23] and SCOP [24] protein-structure classifications place eight of these structures in one superfamily (Table 2; SREBP-2 has not been classified). Several of the structures (PDB codes 1an2, 1hlo, 1nlw, 1nkp and 1am9) include an additional leucine-zipper domain carboxy-terminal to the HLH region. Two of the structures are heterodimers: a Max-Myc complex (PDB code 1nkp) and a Max-Mad complex (PDB code 1nlw). The remaining complexes are homodimeric, and all but one include the bound DNA double helix, giving insights into binding specificity at the E box. Representatives of these bHLH structures are shown in Figure 1.
Table 2

The bHLH protein structures available in the Protein Data Bank (PDB)

PDB code | Protein chains | Protein name | Species | Group | SCOP superfamily | CATH homologous superfamily | CATH sequence family
1mdy* | ABCD | MyoD bHLH | Mouse | A | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.1
1an4 | AB | USF bHLH | Human | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.2
1an2 | AC | Max bHLHZ | Mouse | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.2
1hlo | AB | Max bHLHZ | Human | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.2
1nlw* | BE | Max bHLHZ | Human | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.2
1nkp* | BE | Max bHLHZ | Human | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.2
1nkp* | AD | Myc proto-oncogene bHLHZ | Human | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.2
1am9* | ABCD | SREBP-1a bHLHZ | Human | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.3
1ukl | CDEF | SREBP-2 HLHZ | Human | B | NC | NC | NC
1a0a* | AB | Pho4 bHLH | S. cerevisiae | B | Helix-loop-helix DNA-binding domain | MyoD basic-helix-loop-helix domain, subunit B | 4.10.280.10.4
1nlw* | AD | Mad bHLHZ | Human | B | Helix-loop-helix DNA-binding domain | NC | NC

The PDB codes and protein names of the nine bHLH protein structures deposited in the PDB are shown with their superfamily names from the CATH [23] and SCOP [24] classifications and their sequence family numbers from CATH. Max has more than one structure solved, including two heterodimeric complexes (1nkp, Max-Myc; 1nlw, Max-Mad). *Structures shown in Figure 1; NC, the protein is not yet included in the CATH or SCOP protein-structure classifications. SREBP, sterol regulatory element binding protein. 'Protein chains' gives the chain identifiers assigned to individual subunits in the PDB files.

Figure 1

Representative structures of bHLH proteins from the Protein Data Bank [22]. In each diagram, the protein is shown as a secondary-structure cartoon and the DNA double helix in stick representation. (a) MyoD bHLH-domain homodimer (PDB code 1mdy). (b) Pho4 bHLH-domain homodimer (1a0a). (c) SREBP-1a bHLH-domain homodimer (1am9). (d) Max-Mad heterodimer (1nlw). (e) Max-Myc heterodimer (1nkp). (f) Max-Myc heterotetramer (1nkp). In (d-f) the Max HLH monomer is shown in dark gray. The scales are not comparable between different structures.

The structure of MyoD (Figure 1a) is typical of many bHLH proteins, comprising two long α helices connected by a short loop, which in MyoD is 8 residues long. The first helix (H1) includes the basic domain, which contacts the major groove of the DNA. MyoD is a homodimer in which the two monomers make identical contacts with the DNA. Comparison of this structure with that of Max (which includes an additional leucine-zipper domain; Figure 1d-f) shows that the presence or absence of this domain does not significantly affect the structure of the bHLH segment [25]. Two interesting features revealed by the three-dimensional structure of the Pho4 bHLH domain (Figure 1b) are a short stretch of α helix in the loop that links helix H1 to helix H2, and the recognition of DNA bases outside the E-box sequence [26]. The Pho4 protein binds DNA as a homodimer, and its two subunits form a parallel four-helix bundle (Figure 1b). The short α-helical region in the loop lacks the stabilizing hydrogen-bonding network observed in other bHLH proteins. In the Pho4 structure, each half-site of the symmetrical E box is recognized by a triad of residues, but bases beyond the E box, including a GG sequence at the 3' end, are also recognized [26]. Base recognition outside the E box is also observed for MyoD, but there it occurs at the 5' end of the E box [25]. Sterol regulatory element binding protein 1a (SREBP-1a; Figure 1c) is an example of a bHLH structure that includes one of the additional domains, the leucine zipper. SREBPs are bHLHZ transcriptional activators that bind to a DNA target site as homodimers and are essential for cholesterol metabolism [27]. Unlike other bHLH proteins, which recognize a symmetrical E box, SREBP-1a recognizes an asymmetrical sterol regulatory element. This asymmetric recognition is possible because of the presence of a tyrosine residue in the basic domain.
The tyrosine replaces the arginine found in other bHLH proteins such as Max, and this change results in the loss of polar interactions with the DNA [27]. Recently, a crystal structure of another SREBP, SREBP-2, has been solved [28], in which SREBP-2 is bound in a complex with importin-β, a transport receptor that carries proteins into the nucleus; the structure reveals that SREBP-2 is imported into the nucleus as a homodimer. Two of the most interesting structures solved to date are those of the Max-Mad (Figure 1d) and Max-Myc (Figure 1e) heterodimers bound to double-stranded DNA [29]. In each monomer, the amino-terminal α helix is a continuous secondary-structure element that includes the basic region and helix H1, and the carboxy-terminal α helix is made up of two contiguous α-helical segments, helix H2 and the leucine-zipper region. The Myc-Max and Mad-Max complexes are quasi-symmetric heterodimers whose interfaces are made up of hydrophobic and polar interactions involving residues in helices H1 and H2 and in the leucine zipper. Mutation studies suggest that dimer specificity in the Myc-Max heterodimer is controlled by Gln91 and Asn92 (Max numbering), and that Glu125 controls Mad-Max heterodimer formation [29]. One interesting feature of the Myc-Max crystal structure (Figure 1e,f) is the presence of two heterodimers in the asymmetric unit of the crystal. Together they form a heterotetramer in which the head-to-tail assembly of leucine zippers from different heterodimers produces an antiparallel four-helix bundle (Figure 1f). Myc-Max heterodimers have previously been shown to form higher-order multimers [30], and there is evidence to suggest that the tetramer observed in the crystal also exists under physiological conditions [29].
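
The contrast drawn above between symmetrical E boxes, whose two half-sites can be contacted identically by the two monomers of a homodimer, and asymmetric targets such as the sterol regulatory element can be checked mechanically. The sketch below uses hypothetical helper names. Because the text does not give the sterol regulatory element sequence, it uses sites from Table 1 instead, with CACCTG serving as an example of an asymmetric hexamer.

```python
# Watson-Crick complements for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def half_sites(site: str) -> tuple[str, str]:
    """Hypothetical helper: read each half of a site 5'->3' on the strand it faces.

    The left half is read on the top strand and the right half on the bottom
    strand, i.e. as seen by each monomer of a dimer straddling the centre.
    For odd-length sites the central base is left out.
    """
    site = site.upper()
    mid = len(site) // 2
    return site[:mid], reverse_complement(site[len(site) - mid:])


def is_symmetric_site(site: str) -> bool:
    """True when the site is its own reverse complement (a palindromic site)."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    for s in ("CACGTG", "CAGCTG", "CACCTG"):
        print(s, half_sites(s), is_symmetric_site(s))
```

For CACGTG and CAGCTG the two half-sites are identical, so a homodimer can make the same contacts on both sides; for CACCTG they differ (CAC versus CAG), which is the kind of asymmetry a homodimer must accommodate by other means.
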

Functions of bHLH proteins

The heterogeneity of the DNA sequences recognized, and of the dimers formed, by bHLH proteins enables them to function as a diverse set of regulatory factors. The bHLH proteins can be divided into those that are cell-type specific and those that are widely expressed. The cell-type-specific members of the superfamily are involved in cell-fate determination in many different cell lineages and form an integral part of many processes, including neurogenesis, cardiogenesis, myogenesis and hematopoiesis (Table 3). The bHLH proteins involved in neurogenesis include Drosophila Atonal and other 'proneural' proteins [31]. In vertebrates, Mash-1, Math-1 and the neurogenins are important in the initial determination of neurons, whereas NeuroD, NeuroD2, MATH-2 and others act as differentiation factors [32]. The bHLH transcription factors dHAND and eHAND are important in cardiac development in vertebrates [33]. The myogenic regulatory factors, including MyoD, MRF-4, Myf-5 and myogenin, together regulate both the establishment and the differentiation of the myogenic lineage [34]. The stem cell leukemia (SCL) protein is a bHLH transcription factor that is essential for hematopoiesis and is associated with acute T-cell leukemia [35].
Table 3

Functional classes of bHLH proteins

Phylogenetic class | bHLH family | Example mammalian protein | Function
A | MyoD | Myf4 | Myogenic: initiates myogenic programme in many cell types
A | NeuroD | Neurogenic differentiation factor NDF2 | Neurogenic: involved in terminal neurone differentiation
A | SCL | Tal1 | Hematopoietic: essential for primitive hematopoiesis and invokes enhanced proliferation and differentiation during erythroid development
A | Hand | dHand | Cardiogenic: regulates the morphogenetic events of asymmetric heart development
B | Myc | c-Myc | Cell proliferation and differentiation; oncogenic
B | Mad | Mad1 | Regulation of cell proliferation
B | SREBP | SREBP-2 | Cholesterol metabolism
C | Sim | Single-minded 1 (Sim1) | Neurogenic: regulation of midline cell lineage in the central nervous system
D | Emc | Id1 | Myogenic and neurogenic: negative inhibition of DNA binding
E | Hairy | Hes1 | Neurogenic: restricts differentiation of neurons from neural precursor cells
F | Coe | Early B-cell factor (EBF1) | Hematopoietic: essential for B-cell development

Examples of mammalian proteins and their diverse functions are shown for each phylogenetic group of bHLH proteins. The phylogenetic groups are those indicated in Table 1 and discussed in the text.

One family of bHLH proteins that is widely expressed in many different cell types is the Myc family. The Myc genes are among the most frequently affected genes in human tumors [36]. Myc proteins are known to regulate translation initiation [37], and they also function as transcriptional activators when they form heterodimers with Max proteins (also members of group B) [38]. There is some evidence, however, that these dimers may also act as negative regulators of transcription (reviewed in [39]). Max also forms homodimers and heterodimerizes with other bHLH proteins, including Mad [38]. This Myc/Max/Mad dimerization network has a large number of target genes involved in the cell cycle, and the network has been considered to function as a transcription module [20].

In summary, the bHLH superfamily constitutes a large and diverse class of proteins, with over 125 different proteins identified in humans and 145 in Arabidopsis. The discovery of their diverse functions in the cell cycle, cell-lineage development and tumorigenesis has raised interest in them over the 15 years since they were first identified by Murre and co-workers [11]. So what do the coming years hold in store for this superfamily? With the sequencing of more genomes, further superfamily members and new sequence families are expected to be identified. As structural-genomics consortia target and solve more proteins, the structural data available for this superfamily will also grow. The knowledge gained from new sequences and novel high-resolution structures will offer further insights into the mechanisms by which these proteins control such diverse processes, and may make them good targets for new drug therapies for conditions including heart disease and cancer.
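
The Myc/Max/Mad dimerization network, with Max at its hub as noted earlier, can be represented as a small graph. This is a minimal sketch under stated assumptions: only the Max pairings named in this article are entered, so a missing edge means "not mentioned here" rather than "does not form", and `REPORTED_DIMERS`, `partner_map`, `hubs` and `possible_dimers` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Pairings named in this article: the Max homodimer and Max with Myc, Mad,
# Mnt and Mga. A self-pair such as ("Max", "Max") records a homodimer.
REPORTED_DIMERS = [
    ("Max", "Max"),
    ("Max", "Myc"),
    ("Max", "Mad"),
    ("Max", "Mnt"),
    ("Max", "Mga"),
]


def partner_map(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected partner map; a homodimer puts a protein in its own set."""
    partners: dict[str, set[str]] = defaultdict(set)
    for x, y in pairs:
        partners[x].add(y)
        partners[y].add(x)
    return dict(partners)


def hubs(partners: dict[str, set[str]]) -> list[str]:
    """Return the protein(s) with the most distinct partners, sorted by name."""
    most = max(len(s) for s in partners.values())
    return sorted(p for p, s in partners.items() if len(s) == most)


def possible_dimers(n: int) -> int:
    """Distinct dimers from n monomers: n homodimers plus n*(n-1)/2 heterodimers."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    net = partner_map(REPORTED_DIMERS)
    print(hubs(net))  # ['Max']
    print(len(REPORTED_DIMERS), "of", possible_dimers(len(net)),
          "possible pairings listed")
```

With five proteins there are 15 possible homo- and heterodimers, of which the text names five, all involving Max; the count illustrates how combinatorial dimerization could expand a network, not which additional dimers actually form.
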
References (10 of 37 shown)

1. The Protein Data Bank. H M Berman, J Westbrook, Z Feng, G Gilliland, T N Bhat, H Weissig, I N Shindyalov, P E Bourne. Nucleic Acids Res. 2000-01-01.

2. Control of cell lineage-specific development and transcription by bHLH-PAS proteins (review). S T Crews. Genes Dev. 1998-03-01.

3. Basic helix-loop-helix genes in neural development (review). J E Lee. Curr Opin Neurobiol. 1997-02.

4. Crystal structure of PHO4 bHLH domain-DNA complex: flanking base recognition. T Shimizu, A Toumoto, K Ihara, M Shimizu, Y Kyogoku, N Ogawa, Y Oshima, T Hakoshima. EMBO J. 1997-08-01.

5. Chromosomal translocation in a human leukemic stem-cell line disrupts the T-cell antigen receptor delta-chain diversity region and results in a previously unreported fusion transcript. C G Begley, P D Aplan, M P Davey, K Nakahara, K Tchorz, J Kurtzberg, M S Hershfield, B F Haynes, D I Cohen, T A Waldmann. Proc Natl Acad Sci U S A. 1989-03.

6. HLH proteins, fly neurogenesis, and vertebrate myogenesis (review). Y N Jan, L Y Jan. Cell. 1993-12-03.

7. Structure and function of helix-loop-helix proteins (review). C Murre, G Bain, M A van Dijk, I Engel, B A Furnari, M E Massari, J R Matthews, M W Quong, R R Rivera, M H Stuiver. Biochim Biophys Acta. 1994-06-21.

8. A genomewide survey of basic helix-loop-helix factors in Drosophila. A W Moore, S Barbel, L Y Jan, Y N Jan. Proc Natl Acad Sci U S A. 2000-09-12.

9. Crystal structure of MyoD bHLH domain-DNA complex: perspectives on DNA recognition and implications for transcriptional activation. P C Ma, M A Rould, H Weintraub, C O Pabo. Cell. 1994-05-06.

10. Co-crystal structure of sterol regulatory element binding protein 1a at 2.3 Å resolution. A Párraga, L Bellsolell, A R Ferré-D'Amaré, S K Burley. Structure. 1998-05-15.