Prevalence of the EH1 Groucho interaction motif in the metazoan Fox family of transcriptional regulators.

Sergey Yaklichkin, Alexander Vekker, Steven Stayrook, Mitchell Lewis, Daniel S Kessler.

Abstract

BACKGROUND: The Fox gene family comprises a large and functionally diverse group of forkhead-related transcriptional regulators, many of which are essential for metazoan embryogenesis and physiology. Defining conserved functional domains that mediate the transcriptional activity of Fox proteins will contribute to a comprehensive understanding of the biological function of Fox family genes.
RESULTS: Systematic analysis of 458 protein sequences of the metazoan Fox family was performed to identify the presence of the engrailed homology-1 motif (eh1), a motif known to mediate physical interaction with transcriptional corepressors of the TLE/Groucho family. Greater than 50% of Fox proteins contain sequences with high similarity to the eh1 motif, including ten of the nineteen Fox subclasses (A, B, C, D, E, G, H, I, L, and Q) and Fox proteins of early divergent species such as marine sponge. The eh1 motif is not detected in Fox proteins of the F, J, K, M, N, O, P, R and S subclasses, or in yeast Fox proteins. The eh1-like motifs are positioned C-terminal to the winged helix DNA-binding domain in all subclasses except for FoxG proteins, which have an N-terminal motif. Two similar eh1-like motifs are found in the zebrafish FoxQ1 and in FoxG proteins of sea urchin and amphioxus. The identification of eh1-like motifs by manual sequence alignment was validated by statistical analyses of the Swiss protein database, confirming a high frequency of occurrence of eh1-like sequences in Fox family proteins. Structural predictions suggest that the majority of identified eh1-like motifs are short alpha-helices, and wheel modeling revealed an amphipathicity that supports this secondary structure prediction.
CONCLUSION: A search for eh1 Groucho interaction motifs in the Fox gene family has identified eh1-like sequences in greater than 50% of Fox proteins. The results predict a physical and functional interaction of TLE/Groucho corepressors with many members of the Fox family of transcriptional regulators. Given the functional importance of the eh1 motif in transcriptional regulation, our annotation of this motif in the Fox gene family will facilitate further study of the diverse transcriptional and regulatory roles of Fox family proteins.

Year:  2007        PMID: 17598915      PMCID: PMC1939712          DOI: 10.1186/1471-2164-8-201

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

DNA-binding transcriptional regulatory proteins have a modular structure and are composed of a sequence-specific DNA-binding domain and trans-regulatory domains. Multiple studies have shown that short conserved peptide regions mediate the biological functions of trans-regulatory domains. In the case of transcriptional repressors, such short protein regions can autonomously mediate repression when fused to a heterologous DNA-binding domain [1,2]. It appears that these conserved regions form either α-helices or binding pockets to provide specific interacting surfaces for transcriptional corepressors. For instance, the Sin3 interaction motif of NRSF/REST adopts a short amphipathic α-helix that mediates specific physical interactions with the Sin3 transcriptional corepressor [3]. In the present study, we focus on identifying and analyzing the Engrailed homology region-1 (eh1) transcriptional repression motif in the Fox gene family of forkhead-related transcriptional regulators. This motif is known to mediate specific physical interactions of a number of protein families with transcriptional corepressors of the TLE/Groucho protein family [4-7]. The eh1 motif is composed of eight amino acid residues with the sequence pattern FS(I/V)XXΦΦX, with X representing any non-polar or charged residue and Φ representing branched hydrophobic residues. The eh1 motif was originally identified as a conserved N-terminal sequence shared between the Drosophila Engrailed protein and its vertebrate orthologs [6]. Functional analysis of the Engrailed protein has shown that the eh1 motif is required for active transcriptional repression in vivo, as well as for the physical interaction with Groucho corepressors [7,8]. An eh1-like motif was also identified in eight classes of the homeodomain protein superfamily (Emx, Dlx, Gsc, Hex, Msh, Six, Oct and Vnd) [5,9,10]. 
Further in vivo and in vitro studies have shown that the eh1-like motif of Gsc, Nkx, Hex and Six is required for repression function in vivo by recruiting the TLE/Groucho corepressors [5,9,11]. Eh1-like motifs have also been found in several members of the Fox family of forkhead-related transcriptional regulators [12]. Fox proteins are essential transcriptional regulators of embryogenesis, homeostasis, metabolism, and aging in metazoan organisms [13]. The highly conserved DNA-binding domain of Fox family proteins is characterized by the formation of three α-helices, three β-strands and two loops resembling wings [14], hence the winged helix DNA-binding domain (WHD) designation. The WHD is flanked by N- and C-terminal regions that share low similarity among the Fox protein subclasses. The initial classification of Fox proteins based on sequence-relatedness within the WHD established fifteen subclasses of the Fox gene family [15], and four additional Fox subclasses were subsequently identified [16,17]. An updated list of Fox gene family members is available online [18]. Sequence analysis of several Fox proteins revealed that a short conserved C-terminal region of FoxA proteins (conserved region II or CII) was similar to the eh1 motif [12]. Further biochemical studies showed that FoxA2 physically interacts with TLE1, a mammalian Groucho protein, via the CII region [19]. These data suggest that the CII region resembles the eh1 motif not only in sequence, but also in the ability to directly bind Groucho/TLE corepressors. In addition, the Drosophila FoxG ortholog, Slp1, physically interacts with Groucho via an N-terminal eh1-like motif [20]. Furthermore, our recent studies in Xenopus have shown that FoxD3 can associate with the Xenopus Groucho ortholog, Grg4, via an eh1-like motif. The FoxD3 eh1 motif is essential for a functional interaction with Grg4 and for transcriptional repression in vivo [21].
These observations suggest an interaction of Groucho corepressors with multiple Fox family proteins, and prompted us to systematically examine all subclasses of the Fox gene family for the presence of eh1-like motifs. Given the functional importance of the eh1 motif in transcriptional regulation, annotation of the presence, pattern of distribution, and structural characteristics of this motif in the Fox gene family will facilitate further study of the diverse transcriptional and regulatory roles of Fox family proteins. Here, we present a complete systematic analysis of the presence of eh1-like motifs in metazoan Fox proteins. Eh1-like motifs are identified in more than 50% of Fox proteins representing ten Fox family subclasses (A, B, C, D, G, E, H, I, L and Q) and statistical analyses of the Swiss protein database confirm a frequent occurrence of the motif in the Fox family. Secondary structure analysis of these Fox proteins predicts that the eh1-like motifs adopt a short amphipathic α-helical structure. Taken together, the results point to a functional interaction of TLE/Groucho corepressors with many members of the Fox family and identify structural features of the eh1 motifs that will facilitate further study of the physical interaction of Fox proteins with TLE/Groucho corepressors.

Results

Identification of eh1-like motifs in ten subclasses of the Fox gene family

We performed a systematic analysis of 458 yeast and metazoan protein sequences belonging to nineteen subclasses of the Fox family of transcription factors for the presence of eh1-like motifs. An initial manual search was conducted for sequences of eight amino acids with a highly conserved hydrophobic core matching the eh1 motif pattern FSΦXXΦΦX (X, non-polar or charged residue; Φ, branched hydrophobic residue). Conserved regions of aligned orthologous Fox protein sequences were examined for homology to the eh1 consensus sequence. Eh1-like motifs were identified in Fox protein sequences of 10 subclasses (A, B, C, D, E, G, H, I, L and Q), but not in Fox proteins of the F, J, K, M, N, O, P, R and S subclasses (Table 1). Fox proteins containing an eh1-like motif were found across multiple animal phyla, and included chordates, hemichordates, and a variety of invertebrates, but not yeast (Tables 2 and 3). The identified motifs exhibit high similarity to the Drosophila eh1 motif, in the range of 50–87%. To summarize the results, a phylogenetic tree for the Fox gene family was constructed in which the presence of an eh1-like motif within individual Fox proteins is indicated [see Additional files 1 and 2].
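As an illustration of the pattern-based part of this search, the following minimal sketch scans a sequence for the FSΦXXΦΦX pattern with a regular expression. It is not the authors' procedure, which was manual and similarity-based. The residue classes are assumptions: Φ is approximated by I, L, V and M, X is relaxed to any residue, and the demo sequence is hypothetical apart from the embedded FSIENIIG motif.

```python
import re

# Assumed residue classes (the text does not enumerate them exactly):
# PHI approximates "branched hydrophobic" with I, L, V and M;
# X is relaxed to any residue letter.
PHI = "ILVM"
# A lookahead is used so that overlapping candidate windows are all reported.
EH1_PATTERN = re.compile(rf"(?=(FS[{PHI}][A-Z]{{2}}[{PHI}][{PHI}][A-Z]))")


def find_eh1_like(sequence):
    """Return (start, end, motif) tuples with 1-based inclusive positions."""
    hits = []
    for match in EH1_PATTERN.finditer(sequence.upper()):
        start = match.start(1) + 1
        hits.append((start, start + 7, match.group(1)))
    return hits


# Hypothetical sequence: arbitrary flanks around the motif FSIENIIG.
demo = "MKTAAPQ" + "FSIENIIG" + "RRDEK"
print(find_eh1_like(demo))
```

A strict pattern of this kind misses variants such as FAIENIIA or FTIESLIT that appear in Tables 2 and 3, which is consistent with the similarity-based criterion used for the tables.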
Table 1

Occurrence of eh1-like motifs in the Fox subclasses.

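Fox subclass | Total number of proteins (a) | Number of eh1-positive proteins (b) | Number of eh1-negative proteins
A | 39 | 37 | 2
B | 40 | 40 | 0
C | 27 | 24 | 3
D | 74 | 55 | 19
E | 22 | 15 | 7
F | 19 | 0 | 19
G | 21 | 21 | 0
H | 14 | 11 | 3
I | 25 | 5 | 20
J | 29 | 0 | 29
K | 15 | 0 | 15
L | 21 | 6 | 15
M | 9 | 0 | 9
N | 26 | 0 | 26
O | 8 | 0 | 8
P | 25 | 0 | 25
Q | 26 | 26 | 0
R | 13 | 0 | 13
S | 5 | 0 | 5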

a Number of proteins from each Fox subclass analyzed for the presence of an eh1-like motif.

b Proteins containing a sequence with at least 50% similarity to the eh1 motif of the Drosophila engrailed homeodomain protein [7].
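The "greater than 50%" figure quoted in the abstract can be checked directly against Table 1. The short sketch below simply re-adds the table's counts; the variable names are illustrative.

```python
# (subclass, total proteins, eh1-positive proteins), transcribed from Table 1.
TABLE1 = [
    ("A", 39, 37), ("B", 40, 40), ("C", 27, 24), ("D", 74, 55),
    ("E", 22, 15), ("F", 19, 0), ("G", 21, 21), ("H", 14, 11),
    ("I", 25, 5), ("J", 29, 0), ("K", 15, 0), ("L", 21, 6),
    ("M", 9, 0), ("N", 26, 0), ("O", 8, 0), ("P", 25, 0),
    ("Q", 26, 26), ("R", 13, 0), ("S", 5, 0),
]

total = sum(n for _, n, _ in TABLE1)
positive = sum(p for _, _, p in TABLE1)
positive_subclasses = [name for name, _, p in TABLE1 if p > 0]
percent_positive = 100.0 * positive / total

print(total, positive, round(percent_positive, 1), positive_subclasses)
```

The totals come to 240 eh1-positive proteins out of 458 (about 52.4%), spread over ten subclasses.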

Table 2

List of the identified eh1-like motifs in eight subclasses of invertebrate Fox proteins.

Subclass | Protein | Motif (a) | Homology to eh1 motif (b) | Position (c) | Protein length | Species | Accession number
A | FoxA | FAIKNIIA | 62.5% | 243–250 | 321 | H. vulgaris | AAO92606
A | FoxA | FAIKNIIA | 62.5% | 215–222 | 286 | N. vectensis | 42374841
A | FoxA | FSIDRIMH | 50% | 412–419 | 485 | D. japonica | 9309317
A | FoxA | FSITRLLP | 75% | 302–309 | 350 | H. armigera | 57791692
A | FoxA | FSITNLMS | 62.5% | 375–382 | 435 | P. vulgata | 22859616
A | FoxA | FSITRLLP | 62.5% | 300–307 | 349 | B. mori | 112983681
A | FoxA | FSITRLLP | 62.5% | 372–379 | 435 | A. aegypti | 108881332
A | FoxA | FSITRLLP | 62.5% | 374–381 | 437 | A. gambiae | 55233684
A | FoxA | FSINRLLP | 62.5% | 452–459 | 510 | D. melanogaster | 7301684
A | FoxA | FSINRLLP | 62.5% | 370–377 | 431 | T. castaneum | 86515352
A | FoxA | FSITRLLP | 75% | 460–467 | 570 | A. mellifera | 110759792
A | FoxA | FSINSIIP | 62.5% | 377–384 | 440 | S. purpuratus | 91983614
A | FoxA5 | FSISSLMN | 62.5% | 452–459 | 587 | C. intestinalis | AAB61227
A | FoxA5 | FSISNLMS | 87.5% | 342–349 | 403 | B. floridae | CAA65368
A | FoxA5 | FSISSLMN | 62.5% | 441–448 | 567 | M. oculata | AAB69278
B | FoxB | FAIENLIG | 62.5% | 151–158 | 262 | N. vectensis | ABA03229
B | FoxB | FSIESILS | 75% | 229–236 | 237 | C. elegans | AAA28104.1
B | FoxB | FTIESLIT | 75% | 222–229 | 372 | D. melanogaster | 17977684
B | FoxB | FTIESLIT | 75% | 172–179 | 241 | T. castaneum | 91082601
B | FoxB | FTIESLIT | 75% | 189–196 | 198 | A. gambiae | EAA07672
B | FoxB | FTIENIIA | 75% | 313–320 | 365 | A. mellifera | 110759134
B | FoxB | FTIENIIS | 87.5% | 187–194 | 360 | S. purpuratus | NP999797
B | FoxB | FSIENIIS | 87.5% | 305–312 | 475 | C. intestinalis | CAD58964
B | FoxB | FNIENIIA | 62.5% | 181–188 | 289 | B. floridae | CAD44627
C | FoxC | FTVDSLMN | 50% | 260–267 | 508 | D. melanogaster | 17975538
C | FoxC | FTVDSLMN | 50% | 266–273 | 496 | A. gambiae | EAA11069
C | FoxC | FTVDSLMN | 50% | 251–258 | 412 | A. aegypti | 108876322
C | FoxC | FSVDALMN | 50% | 304–311 | 495 | A. mellifera | 110758357
C | FoxC | YTVDSLMA | 50% | 258–265 | 479 | S. purpuratus | 72007114
C | FoxC | FSVDNIMT | 75% | 233–300 | 497 | B. floridae | 57337372
D | FoxD | FMISNLLK | 75% | 434–441 | 444 | S. domuncula | CAE51209
D | FoxD | FSMESILS | 62.5% | 3–10 | 333 | C. elegans | 17536629
D | FoxD | FSISHIIS | 87.5% | 393–400 | 455 | D. japonica | BAC10918
D | FoxD | FRIETLIG | 50% | 435–442 | 456 | D. melanogaster | 17647421
D | FoxD | FSIENLIG | 75% | 491–498 | 504 | A. aegypti | 10886922
D | FoxD | FSIDALIG | 62.5% | 313–320 | 354 | A. mellifera | 110759337
D | FoxD | FTIDSLLN | 62.5% | 308–315 | 401 | S. purpuratus | 115953031
D | FoxD | FSIESLIG | 62.5% | 377–384 | 506 | C. savignyi | BAB68347
D | FoxD | FSIENIIG | 75% | 311–318 | 402 | B. floridae | AF512537
E | FoxE | FSIENIIG | 75% | 207–214 | 393 | C. intestinalis | BAC57420
E | FoxE4 | FSIDNIIA | 75% | 227–234 | 381 | B. floridae | 18653452
G | FoxG | FSIENILK | 75% | 12–19 | 318 | M. leidyi | AAN17798
G | FoxG | FSIRQMLD | 50% | 16–23 | 260 | D. japonica | BAC10917
G | FoxG | FSILDLCP | 37.5% | 4–11 | 270 | C. elegans | 17569837
G | FoxG | FSINSILP | 50% | 18–25 | 424 | A. gambiae | EAA43390
G | FoxG | FGMDRLLG | 37.5% | 284–291 | 424 | A. gambiae | EAA43390
G | FoxG | FSISSILP | 75% | 156–163 | 444 | T. castaneum | 91080905
G | FoxG | FNMERLLA | 37.5% | 381–388 | 444 | T. castaneum | 91080905
G | FoxG1 | FSIRSILP | 62.5% | 51–58 | 451 | A. mellifera | 110756018
G | FoxG1 | FSMERLLQ | 37.5% | 328–335 | 451 | A. mellifera | 110756018
G | FoxG1 | FSIDAILA | 62.5% | 12–19 | 322 | D. melanogaster | CAA46890
G | FoxG2 | FSIDAILP | 62.5% | 62–69 | 445 | D. melanogaster | CAA46891.1
G | FoxG | FSVESMLS | 62.5% | 34–41 | 507 | S. purpuratus | 72179617
G | FoxG | FSVERLLS | 75% | 396–403 | 507 | S. purpuratus | 72179617
G | FoxG1 | FSIRRMLS | 62.5% | 20–27 | 402 | B. floridae | AF067203
G | FoxG1 | FSVERLLS | 75% | 286–293 | 402 | B. floridae | AF067203
L | FoxL1 | FTIDNIIG | 75% | 356–363 | 365 | D. melanogaster | Q02360
L | FoxL1 | FSIDNILA | 75% | 299–306 | 521 | S. purpuratus | 72009133
Q | FoxQ1 | FSIDSILG | 62.5% | 251–258 | 408 | S. purpuratus | 82706210
Q | FoxQ1 | FSIESILS | 75% | 268–275 | 385 | C. intestinalis | 70569660
Q | FoxQ1 | FSIDAILS | 75% | 226–233 | 324 | B. floridae | CAH55831
Q | FoxQ2b | FDVESLLR | 50% | 282–289 | 380 | C. hemisphaerica | 108796163
Q | FoxQ2a | FSIENILG | 75% | 325–332 | 387 | C. hemisphaerica | 108796161
Q | FoxQ2 | FTIEAILE | 62.5% | 221–228 | 230 | C. elegans | 17505695
Q | FoxQ2 | FDVASLLA | 50% | 348–355 | 599 | D. melanogaster | 66571262
Q | FoxQ2 | FDVASLLA | 50% | 233–240 | 299 | T. castaneum | 91076112
Q | FoxQ2 | FDVESLLR | 50% | 232–239 | 307 | A. gambiae | XP566358
Q | FoxQ2 | FSIENLAQ | 62.5% | 4–11 | 329 | S. purpuratus | ABB89473
Q | FoxQ2 | FSIDRLVG | 62.5% | 4–11 | 271 | B. floridae | AY163864
Orphans | Fox1 | FRIEFLLK | 50% | 276–283 | 285 | N. vectensis | ABA03228
Orphans | Fox1 | FSISKLIL | 75% | 211–218 | 218 | S. domuncula | CAE51213

a The highly conserved core of the eh1-like motifs is indicated in bold.

b The percent similarity between the identified Fox eh1-like motifs and the eh1 motif (FSISNILS) of the Drosophila engrailed homeodomain protein [7].

c The location of the motifs within the amino acid sequence of the individual Fox proteins.
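Footnote b compares each motif with the Drosophila Engrailed eh1 motif FSISNILS. The sketch below computes plain percent identity against that reference. This is an assumption-laden simplification: the tables report percent similarity, which appears to credit some conservative substitutions by a rule the text does not spell out, so identity should be read as a lower bound and will not always reproduce the tabulated values.

```python
REFERENCE = "FSISNILS"  # Drosophila Engrailed eh1 motif (footnote b)


def percent_identity(motif, reference=REFERENCE):
    """Percentage of positions at which motif and reference are identical."""
    if len(motif) != len(reference):
        raise ValueError("motif and reference must have the same length")
    matches = sum(a == b for a, b in zip(motif, reference))
    return 100.0 * matches / len(reference)


# Motifs taken from the tables; printed values are identities, not similarities.
for motif in ("FSIENIIG", "FSISHIIS", "FAIKNIIA"):
    print(motif, percent_identity(motif))
```

For example, FSIENIIG shares five of eight residues with the reference (62.5% identity), whereas the tables list 75% similarity for this motif.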

Table 3

List of the identified eh1-like motifs in ten subclasses of chordate Fox proteins.

Subclass | Protein | Motif (a) | Homology to eh1 motif (b) | Position (c) | Protein length | Species | Accession number
A | FoxA1 | FSINNLMS | 75% | 359–366 | 427 | D. rerio | AAH65668
A | FoxA1a | FSINNLMS | 75% | 356–363 | 428 | X. laevis | AAN76331
A | FoxA1b | FSINNLMS | 75% | 355–362 | 427 | X. laevis | AAA17050
A | FoxA1 | FSINNLMS | 75% | 394–401 | 466 | R. norvegicus | 6981034
A | FoxA1 | FSINNLMS | 75% | 396–403 | 468 | M. musculus | P35582
A | FoxA1 | FSINNLMS | 75% | 400–407 | 472 | H. sapiens | 24497501
A | FoxA2 | FSINNLMS | 75% | 342–349 | 409 | D. rerio | 18858687
A | FoxA2a | FSINNLMS | 75% | 351–358 | 434 | X. laevis | 45361699
A | FoxA2 | FSINNLMS | 75% | 354–361 | 438 | G. gallus | NP990101
A | FoxA2 | FSINNLMS | 75% | 377–384 | 459 | M. musculus | 6753898
A | FoxA2 | FSINNLMS | 75% | 376–383 | 458 | R. norvegicus | NP036875
A | FoxA2 | FSINNLMS | 75% | 376–383 | 457 | H. sapiens | 24497504
A | FoxA3 | FSITNLMS | 87.5% | 376–383 | 441 | D. rerio | 18858689
A | FoxA3 | FSITNLMS | 87.5% | 259–266 | 324 | S. salar | AAC16333
A | FoxA3 | FSINNLMS | 75% | 307–314 | 353 | M. musculus | 22477526
A | FoxA3 | FSINNLMS | 75% | 394–401 | 466 | R. norvegicus | CAA39418.1
A | FoxA3 | FSINNLMS | 75% | 304–311 | 350 | H. sapiens | 24497506
A | FoxA4 | FSITNLMS | 87.5% | 345–352 | 417 | A. mexicanum | AAC60128
A | FoxA4a | FSITQLMS | 75% | 328–335 | 399 | X. laevis | CAA46290
A | FoxA4b | FSITQLMS | 75% | 328–335 | 400 | X. laevis | AAB22027
A | FoxA5 | FSISSLMN | 62.5% | 452–459 | 587 | C. intestinalis | AAB61227
A | FoxA5 | FSISNLMS | 87.5% | 342–349 | 403 | B. floridae | CAA65368
A | FoxA5 | FSISSLMN | 62.5% | 441–448 | 567 | M. oculata | AAB69278
B | FoxB | FSIENIIS | 87.5% | 305–312 | 475 | C. intestinalis | CAD58964
B | FoxB | FNIENIIA | 62.5% | 181–188 | 289 | B. floridae | CAD44627
B | FoxB1 | FAIENIIA | 62.5% | 164–171 | 297 | D. rerio | AAH56754
B | FoxB1 | FAIESIIA | 62.5% | 171–178 | 289 | T. nigroviridis | 47209343
B | FoxB1 | FAIENIIA | 62.5% | 167–174 | 319 | X. laevis | AAC62623
B | FoxB1a | FAIENIIA | 62.5% | 170–177 | 325 | M. musculus | Q64732
B | FoxB1b | FAIENIIA | 62.5% | 169–176 | 324 | M. musculus | X92592
B | FoxB1 | FAIENIIA | 62.5% | 170–178 | 324 | H. sapiens | Q99853
B | FoxB2 | FAIENIIG | 62.5% | 176–183 | 317 | X. laevis | CAD31848
B | FoxB2 | FAIENIIG | 62.5% | 267–274 | 428 | M. musculus | NP032049
B | FoxB2 | FAIENIIG | 62.5% | 266–273 | 425 | R. norvegicus | 109459945
B | FoxB2 | FAIENIIG | 62.5% | 270–277 | 432 | H. sapiens | 61966923
C | FoxC | FSVDNIMT | 75% | 233–300 | 497 | B. floridae | 57337372
C | FoxC1.1 | FSVDNIMT | 62.5% | 277–284 | 476 | D. rerio | AF219949
C | FoxC1.2 | FSMDTIMT | 75% | 254–261 | 433 | D. rerio | AF219950
C | FoxC1 | FSMDTIMT | 75% | 275–282 | 470 | T. nigroviridis | 47220394
C | FoxC1 | FSVDNIMT | 75% | 298–305 | 495 | X. laevis | 80478512
C | FoxC1 | FSVDNIMT | 75% | 275–282 | 528 | G. gallus | CAA76851
C | FoxC1 | FSVDNIMT | 75% | 308–315 | 553 | M. musculus | AAH52011
C | FoxC1 | FSVDNIMT | 75% | 307–314 | 502 | B. taurus | 76639995
C | FoxC1a | FSVDNIMT | 75% | 308–315 | 553 | H. sapiens | Q12948
C | FoxC1b | FSVDNIMT | 75% | 308–315 | 553 | H. sapiens | AAC72915
C | FoxC2 | FSVENIMT | 75% | 258–265 | 463 | X. laevis | 47497986
C | FoxC2 | FSVENIMT | 75% | 244–251 | 445 | G. gallus | AAC60065
C | FoxC2 | FSVETIMT | 75% | 269–276 | 494 | M. musculus | Q61850
C | FoxC2 | FSVENIMT | 75% | 270–277 | 501 | H. sapiens | Q99958
D | FoxD | FSIESLIG | 62.5% | 377–384 | 506 | C. savignyi | BAB68347
D | FoxD | FSIENIIG | 75% | 311–318 | 402 | B. floridae | AF512537
D | FoxD1 | FSIDNIIG | 75% | 295–302 | 363 | D. rerio | AAH75922
D | FoxD1.1 | FSIDSIIG | 62.5% | 277–284 | 343 | D. rerio | 45501117
D | FoxD1 | FSIESIIG | 62.5% | 294–301 | 345 | X. laevis | 3892202
D | FoxD1 | FSIESIIG | 62.5% | 377–384 | 440 | G. gallus | AAB08467
D | FoxD1 | FSIESLIG | 62.5% | 364–371 | 455 | R. norvegicus | XP001057782
D | FoxD1 | FSIESLIG | 62.5% | 365–372 | 456 | M. musculus | AAC42042
D | FoxD1 | FSIESIIG | 62.5% | 362–369 | 465 | H. sapiens | Q16676
D | FoxD2 | FSIDNIIG | 75% | 276–283 | 346 | X. laevis | CAC69867
D | FoxD2 | FSIDNIIG | 75% | 365–372 | 443 | G. gallus | AAC60064
D | FoxD2 | FSIDHIMG | 62.5% | 409–416 | 492 | M. musculus | NP032619
D | FoxD2 | FSIDHIMG | 62.5% | 412–419 | 495 | H. sapiens | 55956928
D | FoxD3 | FSIENIIG | 75% | 297–304 | 371 | D. rerio | AAC06366
D | FoxD3a | FSIENIIG | 75% | 297–304 | 371 | X. laevis | CAC12963
D | FoxD3b | FSIENIIG | 75% | 297–304 | 371 | X. laevis | CAC12895
D | FoxD3 | FSIENIIG | 75% | 319–326 | 394 | G. gallus | AAC60066
D | FoxD3 | FSIENIIG | 75% | 366–373 | 469 | M. musculus | NM010425
D | FoxD3 | FSIENIIG | 75% | 378–385 | 478 | H. sapiens | NP036315
D | FoxD4 | FSIESIMQ | 62.5% | 324–331 | 408 | H. sapiens | 18959276
D | FoxD4 | FTIESIMQ | 62.5% | 320–327 | 444 | M. musculus | 6679841
D | FoxD5 | FSIDSIMA | 62.5% | 254–261 | 321 | D. rerio | NP571345
D | FoxD5a | FSIENIMR | 62.5% | 285–292 | 352 | X. laevis | AAD47811
D | FoxD5b | FSIENIMK | 62.5% | 285–292 | 353 | X. laevis | CAB44729
D | FoxD5c | FSIENIMG | 62.5% | 281–288 | 342 | X. laevis | CAB44730
E | FoxE | FSIENIIG | 75% | 207–214 | 393 | C. intestinalis | BAC57420
E | FoxE1 | FRINSLIG | 62.5% | 202–209 | 354 | D. rerio | XP696065
E | FoxE1 | FRINNLIG | 62.5% | 206–213 | 363 | T. nigroviridis | 47214250
E | FoxE1 | FSINTLIG | 62.5% | 231–238 | 379 | X. laevis | 46198238
E | FoxE3 | FSIDNIIS | 87.5% | 269–276 | 422 | D. rerio | 118918391
E | FoxE3 | FSIDSLIN | 62.5% | 215–222 | 365 | X. laevis | 6642989
E | FoxE3 | FSIDSLIS | 62.5% | 239–246 | 383 | G. gallus | 118094619
E | FoxE3 | FRLDSLLG | 50% | 195–202 | 288 | M. musculus | 7657098
E | FoxE3 | FSVDSLVP | 50% | 179–186 | 385 | C. familiaris | 73977761
E | FoxE3 | FSVDSLVN | 50% | 217–224 | 319 | H. sapiens | CAI14973
E | FoxE3 | FRLDSLLG | 50% | 193–200 | 286 | R. norvegicus | XP233428
E | FoxE4 | FSIDNIIA | 75% | 227–234 | 381 | B. floridae | 18653452
G | FoxG1 | FSIRRMLS | 62.5% | 20–27 | 402 | B. floridae | AF067203
G | FoxG1 | FSVERLLS | 75% | 286–293 | 402 | B. floridae | AF067203
G | FoxG1 | FSINSLVP | 62.5% | 18–25 | 420 | D. rerio | 18858707
G | FoxG1 | FSINSLMP | 62.5% | 18–25 | 436 | X. laevis | AAC79501
G | FoxG1 | FSINSLVP | 62.5% | 18–25 | 451 | G. gallus | U47275
G | FoxG1 | FSINSLVP | 62.5% | 18–25 | 481 | M. musculus | AAB42158
G | FoxG1 | FSINSLVP | 62.5% | 18–25 | 480 | R. norvegicus | 6978845
G | FoxG1a | FSINSLVP | 62.5% | 18–25 | 469 | H. sapiens | CAA55038
G | FoxG1b | FSINSLVP | 62.5% | 18–25 | 477 | H. sapiens | X74142
H | FoxH1 | FAIDSLLH | 50% | 250–257 | 472 | D. rerio | 18858709
H | FoxH1 | FAIDSLLH | 50% | 278–285 | 285 | T. nigroviridis | 47223489
H | FoxH1 | FMIDSLLH | 50% | 271–278 | 518 | X. laevis | P70056
H | FoxH1 | FSIKSLLG | 62.5% | 198–205 | 401 | R. norvegicus | XP235454
H | FoxH1 | FSIKSLLG | 62.5% | 167–174 | 310 | B. taurus | CAD58794
H | FoxH1 | FSIKSLLG | 62.5% | 198–205 | 401 | M. musculus | 6679845
H | FoxH1 | FSIKSLLG | 62.5% | 194–201 | 612 | H. sapiens | 41107639
I | FoxI1 | FSVNNLIY | 75% | 405–412 | 419 | D. rerio | AAO63568
I | FoxI1c | FSVNSLIY | 62.5% | 367–374 | 381 | X. laevis | CAD31849
I | FoxI1c | FTVNSLIY | 62.5% | 345–352 | 359 | G. gallus | 50747424
I | FoxI2 | FSVNSLIY | 62.5% | 369–376 | 383 | D. rerio | AAP92808
Q | FoxQ1 | FSIESILS | 75% | 268–275 | 385 | C. intestinalis | 70569660
Q | FoxQ1 | FSIDAILS | 75% | 226–233 | 324 | B. floridae | CAH55831
Q | FoxQ1 | FAIDSILS | 62.5% | 177–184 | 383 | D. rerio | AAH67139
Q | FoxQ1 | FRIDSLLS | 62.5% | 276–283 | 383 | D. rerio | AAH67139
Q | FoxQ1 | FTIDSILS | 75% | 196–203 | 272 | T. nigroviridis | 47220396
Q | FoxQ1 | FAIDSILS | 62.5% | 224–231 | 381 | X. laevis | 76152394
Q | FoxQ1 | FAIDSILS | 62.5% | 268–275 | 400 | M. musculus | 31560693
Q | FoxQ1 | FAIDSILS | 62.5% | 252–259 | 439 | R. norvegicus | 12408312
Q | FoxQ1 | FAIDSILR | 50% | 270–277 | 402 | H. sapiens | 8489093
Q | FoxQ2 | FTIDYLLY | 62.5% | 17–24 | 244 | D. rerio | XP694156
Q | FoxQ2 | FTIDYLLF | 62.5% | 20–27 | 210 | T. nigroviridis | 47209212
Q | FoxQ2 | FSIDRLVG | 62.5% | 4–11 | 271 | B. floridae | AY163864
L | FoxL1 | FSIDSILS | 75% | 284–291 | 363 | D. rerio | 41055835
L | FoxL1 | FSIDSILA | 62.5% | 255–262 | 336 | M. musculus | NP032050
L | FoxL1 | FSIDSILA | 62.5% | 259–266 | 389 | R. norvegicus | 109508994
L | FoxL1 | FSIDSILA | 62.5% | 262–269 | 346 | B. taurus | 61823329
L | FoxL1 | FSIDSILA | 62.5% | 272–279 | 356 | C. familiaris | 73956953
L | FoxL1 | FSIDSILA | 62.5% | 261–268 | 245 | H. sapiens | 22779860

a The highly conserved core of the eh1-like motifs is indicated in bold.

b The percent similarity between the identified Fox eh1-like motifs and the eh1 motif (FSISNILS) of the Drosophila engrailed homeodomain protein [7].

c The location of the motifs within the amino acid sequence of the individual Fox proteins.
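Because each motif is eight residues long, the Position and Protein length columns allow a simple consistency check. The sketch below applies two such checks to a few rows of Table 3. The function name and the choice of rows are illustrative; a flag only marks an entry worth comparing against its accession record and says nothing about the underlying biology.

```python
# (protein, species, start, end, protein length), copied from Table 3 as printed.
ROWS = [
    ("FoxA1", "D. rerio", 359, 366, 427),
    ("FoxC", "B. floridae", 233, 300, 497),
    ("FoxB1", "H. sapiens", 170, 178, 324),
    ("FoxL1", "H. sapiens", 261, 268, 245),
]


def check_row(start, end, length, motif_len=8):
    """Return a list of inconsistencies between position and length columns."""
    problems = []
    if end - start + 1 != motif_len:
        problems.append(f"span covers {end - start + 1} residues, expected {motif_len}")
    if end > length:
        problems.append("motif ends beyond the stated protein length")
    return problems


for protein, species, start, end, length in ROWS:
    issues = check_row(start, end, length)
    print(protein, species, issues if issues else "consistent")
```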

To validate the results of the manual search for eh1-like motifs, we used the expectation-maximization algorithm in the MEME program [22]. We initially examined 18 FoxD3-related protein sequences, which contain a conserved and functional eh1 motif [21]. As predicted, the analysis identified eh1-like motifs (E-value of 10^-75) at 18 sites corresponding to the previously described eh1 motif of FoxD3. When this approach was extended to the entire Fox family of 458 proteins, eh1-like motifs were identified at 213 sites in ten Fox subclasses (E-value of <10^-16). The eh1-like motifs identified using the expectation-maximization algorithm corresponded to motifs identified in the manual sequence analysis, as well as to motifs previously identified in the Fox family [12,23].
To confirm the statistical significance of the match between identified eh1-like sequences and the eh1 consensus, a hidden Markov model (HMM) was constructed [24] for the eh1 motif of FoxD3 (eh1 FD3). This model of the eh1 motif was used to search the SWISS protein database, and a summary of the results of the eh1 FD3 HMM analysis is shown in Table 4. A total of 49,363 matches with the eh1 motif were identified, of which 647 were to proteins that are members of transcription factor families. The mean log-odds score for all transcriptional proteins was 9.07, whereas non-transcriptional proteins scored 6.87. Among transcriptional proteins, Fox family proteins produced the strongest matches with the eh1 motif, with a mean log-odds score of 14.34. The motifs were identified in 9 subclasses of the Fox protein family: A, B, C, D, E, G, H, L and Q (the FoxI subclass is not represented in the current SWISS protein database). The search also identified a significant number of high-scoring matches (mean log-odds score of 11.61) for homeodomain-containing proteins of the para-Hox cluster [25], but the score for other non-Fox, non-para-Hox transcriptional proteins was low (7.72). The results of the HMM analysis strongly support the conclusion that eh1-like motifs are present within proteins of the Fox family at high frequency when compared with most transcriptional protein families and non-transcriptional proteins.
Table 4

Descriptive statistics of the Meta-MEME search of SWISS protein database (a) using a hidden Markov model of the FoxD3 eh1-like motif.

Protein class (b) | Log-Odds (c) Mean (SD) | Log-Odds Minimum | Log-Odds Maximum | Hits (d)
Non-Transcription | 6.87 (2.24) | 1.49 | 24.43 | 48716
Fox | 14.34 (5.65) | 5.97 | 29.23 | 54
Para-Hox | 11.61 (4.83) | 3.64 | 23.45 | 155
Other Transcription | 7.72 (2.67) | 3.48 | 17.42 | 318
All Transcription | 9.07 (4.28) | 3.48 | 29.23 | 647

a SWISS protein database integrated in the Meta-MEME software package (version 3.2).

b Protein classes were defined by the presence of a conserved DNA-binding domain for transcriptional proteins, or by the absence of a DNA-binding domain for non-transcriptional proteins. Non-Transcription, proteins that are not members of defined families of transcriptional proteins; Fox, Fox family proteins; Para-Hox, para-Hox class of homeodomain proteins; Other Transcription, transcriptional proteins excluding Fox and para-Hox proteins; All Transcription, all transcriptional proteins.

c Log-odds score is the ratio of a sequence score with respect to the foreground model versus the sequence score with respect to the background model. The log-odds score is the logarithm of an odds score in base 2. SD, standard deviation.

d Hits are positions in the background sequence that align with a motif model.
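To make the log-odds definition in footnote c concrete, the sketch below scores a sequence against a simple position-specific model in base 2. It is a deliberately simplified stand-in, not the hidden Markov model built with Meta-MEME. The uniform background of 1/20, the pseudocount of 1, and the small training set (FoxD motifs taken from Table 3) are all assumptions made for illustration, so the scores are not comparable with those in Table 4.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# FoxD-subclass motifs listed in Table 3 (illustrative training set).
FOXD_MOTIFS = ["FSIENIIG", "FSIDNIIG", "FSIESIIG", "FSIESLIG", "FSIDHIMG"]


def build_model(motifs, pseudocount=1.0):
    """Per-position residue probabilities with additive smoothing."""
    total = len(motifs) + pseudocount * len(ALPHABET)
    model = []
    for pos in range(len(motifs[0])):
        counts = Counter(m[pos] for m in motifs)
        model.append({aa: (counts[aa] + pseudocount) / total for aa in ALPHABET})
    return model


def log_odds(sequence, model, background=1.0 / len(ALPHABET)):
    """Sum over positions of log2(foreground probability / background)."""
    return sum(math.log2(model[i][aa] / background) for i, aa in enumerate(sequence))


model = build_model(FOXD_MOTIFS)
print(round(log_odds("FSIENIIG", model), 2))
print(round(log_odds("AAAAAAAA", model), 2))
```

A positive score means the sequence is more probable under the motif model than under the background, which is the sense in which higher log-odds scores indicate stronger matches.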

To evaluate the statistical significance of the eh1-like motif identification results obtained by HMM, logistic regression analysis was performed. Analysis of the log-odds scores for the transcriptional protein and non-transcriptional protein classes indicated that the association of eh1-like motifs with transcriptional proteins had high statistical significance (p < 2 × 10^-9). Furthermore, when the log-odds scores for the Fox family transcriptional proteins and other transcriptional protein classes were analyzed, the association of higher log-odds scores with Fox proteins was also found to have high statistical significance (p < 2 × 10^-9). The results strongly support the conclusion that eh1 motifs are present in members of the Fox family at high frequency, and suggest that the eh1 motif contributes to the transcriptional function of many Fox family proteins.
For most of the Fox proteins analyzed, a single eh1-like motif was located C-terminal to the WHD (Fox subclasses A, B, C, D, E, H, I, L and Q). Two similar eh1-like motifs are present in the zebrafish FoxQ1 protein, with both C-terminal to the WHD. Interestingly, the C. elegans FoxD and sea urchin, amphioxus and zebrafish FoxQ2 proteins contain N-terminal eh1-like motifs, whereas a C-terminal motif location is found for the other FoxD and FoxQ orthologs. All FoxG proteins contain an eh1-like motif N-terminal to the WHD, and in sea urchin and amphioxus FoxG proteins a second eh1-like motif is located C-terminal to the WHD. The vertebrate FoxG proteins contain a C-terminal sequence that appears to be a remnant of an eh1 motif that lacks the conserved phenylalanine. Eh1-like motifs were identified in Fox proteins in several early divergent species. These included sponge (phylum Porifera) FoxD, hydra and sea anemone (phylum Cnidaria) FoxA, and comb jelly (phylum Ctenophora) FoxG. The presence of eh1 motifs in Fox proteins of these phyla suggests an ancient appearance of this motif in the Fox gene family and therefore, a functional interaction with Groucho-related corepressors early in the evolution of the Fox gene family.

Loss of eh1-like motifs within Fox gene subclasses

Our sequence analysis indicates incomplete distribution of the motif within certain Fox subclasses, suggesting the loss of the motif in a subset of Fox proteins. A striking example of the loss of the eh1-like motif is observed within the FoxE subclass for FoxE1 proteins. Sequence analysis of FoxE subclass proteins did not identify a recognizable eh1 motif in seven mammalian FoxE1 proteins, whereas FoxE1 proteins of fish and amphibia, and nine other FoxE proteins contained the motif. To assess the inheritance and loss of the eh1 motif during the evolution of FoxE proteins, a phylogenetic tree for the FoxE subclass and the FoxC and FoxD outgroups was constructed using a neighbor-joining method (Figure 1). The topology of the phylogenetic tree (bootstrap value 91%) indicates a close relatedness of the fish, amphibian, and mammalian FoxE1 proteins, which suggests a common ancestry. Therefore it is reasonable to infer that the ancestral FoxE1 protein contained the motif, and the loss of the eh1 motif occurred in the mammalian lineage or ancestors of the mammalian phyla in the course of evolution. All other members of the FoxE subclass, including the amphioxus and tunicate proteins, as well as mammalian FoxE3 proteins, contained the motif. This suggests that most likely an ancestral FoxE protein contained the motif before the separation and expansion of the FoxE subclass, and this idea is supported by the presence of the motif in nearly all members of the FoxC and FoxD outgroups.
Figure 1

A phylogenetic tree for proteins of the FoxE subclass and the FoxC and FoxD outgroups. A neighbor-joining method was used to construct the tree topology and bootstrapping values are shown at each branch point (percentage of 1000 bootstrap samples) using the MEGA 3.1 software. Gaps were deleted in pairwise comparisons. The distance scale below the tree represents the number of substitutions per site. The C and D families are collapsed for better illustration. Protein sequences that lack a recognizable eh1-like motif are represented by blue triangles. Proteins and subclasses that contain an eh1-like motif are represented by red circles.

It should be noted that a cnidarian FoxE-related protein lacks the eh1 motif, and this may be viewed as inconsistent with the presence of the eh1 motif in the ancestral FoxE protein. However, phylogenetic analysis indicates a distant relatedness of this cnidarian protein to the FoxE subclass, arguing for different origins. Similarly, the motif is not detected in the N. vectensis FoxD- and FoxC-related proteins, which also appear to have undergone significant sequence divergence. The motif is present in cnidarian FoxA and FoxB proteins, as well as the FoxC- and FoxD-related (Fox1) proteins of the sponge S. domuncula [see Additional files 1 and 2], suggesting that ancestral precursors for these subclasses contained the motif, whereas the motif was likely lost in a subset of more divergent cnidarian Fox proteins. No eh1-like motif is detected in the tunicate FoxH-like proteins, whereas nearly all vertebrate FoxH proteins contain the motif. The absence of the eh1 motif in the tunicate FoxH proteins suggests a divergence and loss of this motif in the urochordate lineage. However, it is also possible that the ancestral FoxH protein did not contain an eh1 motif and that the motif was recruited in the vertebrate lineage.
Interestingly, a Xenopus FoxH1 paralog, FoxH3, also lacks the eh1 motif present in other vertebrate FoxH orthologs, again suggesting a loss of the motif, perhaps due to functional specialization [see Additional files 1 and 2].

Characteristics of eh1-like motifs in Fox family proteins

For the eh1-like motifs identified, the amino acid frequency at each position of the motif was determined to better define the characteristics of the motif in invertebrate and vertebrate members of the Fox gene family (Figure 2). For this frequency analysis, the positions in the motif are numbered 0 to 7 in N-terminal to C-terminal order. Although this analysis includes Fox proteins of evolutionarily distant organisms, similar residue usage is observed at most positions. Overall, the identified motifs are characterized by the predominance of hydrophobic residues. The aromatic residue phenylalanine is absolutely conserved (100%) at position 0 of the identified motifs in vertebrates and in nearly all invertebrates. The hydrophobic core of the motif (positions 2, 5 and 6) is characterized by the frequent presence of branched hydrophobic residues such as isoleucine, leucine, methionine, and, less frequently, valine. In both vertebrates and invertebrates, isoleucine is highly represented at position 2 (75%), and leucine and isoleucine appear at similar frequencies (40–60%) at positions 5 and 6. Serine is highly represented at position 1 (75%) in vertebrate Fox proteins, whereas serine (55%) and threonine (30%) predominate at this position in invertebrates. Although positions 3 and 4 are variable, there is a strong bias for negatively charged residues at position 3 and for the uncharged polar residues serine and asparagine at position 4. Position 7 of the eh1-like motifs is the most variable, with glycine, alanine and serine residues often present. It should be noted that within individual Fox subclasses, residue identity at each position is more highly conserved, reflecting the evolutionary relatedness of the proteins in each subclass, as well as the conservation of subclass-specific functional and structural properties of the motifs [see Additional files 2 and 3].
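The per-position frequency analysis described above can be sketched as follows, using positions 0 to 7 as in the text. The input is a small illustrative subset, one vertebrate motif per subclass taken from Table 3, so the resulting frequencies are not the full-dataset values quoted in the paragraph.

```python
from collections import Counter

# One vertebrate motif per subclass from Table 3 (illustrative subset only).
MOTIFS = [
    "FSINNLMS",  # FoxA1
    "FAIENIIA",  # FoxB1
    "FSVDNIMT",  # FoxC1
    "FSIENIIG",  # FoxD3
    "FSIDNIIS",  # FoxE3
    "FSINSLVP",  # FoxG1
    "FSIKSLLG",  # FoxH1
    "FSVNNLIY",  # FoxI1
    "FAIDSILS",  # FoxQ1
    "FSIDSILA",  # FoxL1
]


def position_frequencies(motifs):
    """Return one {residue: frequency} dict per motif position (0-based)."""
    length = len(motifs[0])
    freqs = []
    for pos in range(length):
        counts = Counter(m[pos] for m in motifs)
        freqs.append({aa: n / len(motifs) for aa, n in counts.items()})
    return freqs


freqs = position_frequencies(MOTIFS)
for pos, table in enumerate(freqs):
    top = sorted(table.items(), key=lambda kv: -kv[1])[:3]
    print(pos, top)
```

Even in this small subset, phenylalanine is invariant at position 0 and isoleucine dominates position 2, in line with the trends reported for the full dataset.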
Figure 2

The diagrams summarize the amino acid compositions of the eh1-like motifs identified in Fox proteins. The amino acid usage frequency of eh1-like motifs identified in invertebrate (A) and vertebrate (B) Fox proteins. The diagrams were generated with the WebLogo program [44].

The conservation of multiple hydrophobic residues in the eh1 motif is favorable for the formation of α-helices, and suggests that the eh1-like motifs identified in Fox family proteins have the potential to adopt a hydrophobic α-helical structure. To predict structural characteristics of the motifs, several algorithms (DSC, PHD, MLRC) were used to calculate the propensity of secondary structure formation [26-28]. For several Fox proteins of each subclass, regions containing the eh1-like motif were analyzed for predicted secondary structure. The results obtained using multiple algorithms predict a high likelihood of α-helical structure in the region of the eh1-like motif for the majority of Fox proteins examined. The highest scores for α-helical propensity were obtained for the eh1-like motifs present in FoxB, FoxE and FoxQ proteins, and α-helical structure was also predicted for FoxD, FoxA, FoxC and FoxL proteins, albeit with lower propensity scores [see Additional file 4 and data not shown]. In BLAST searches, the eh1-like motifs of several Fox proteins, including FoxB and FoxE proteins, show similarity to the hydrophobic regions of several membrane proteins, including the α-helical regions of the Chlorobium tepidum segregation and condensation protein B (CHPfCT, AAM71720), Pseudomonas aeruginosa probable transcriptional regulator Pa0477 (2ESND), and Drosophila ultraspiracle ligand-binding domain (ULBD, 1HG4F) (Figure 3A and data not shown). A BLAST search for sequences related to the N. vectensis Fox1 eh1-like motif identified the α-helical region of Hepatitis C RNA Polymerase (1YVZA) as the only related sequence (Figure 3B). The ability of eh1-like sequences in proteins unrelated to the Fox family to form α-helical structure supports the prediction of α-helical structure for the eh1-like motifs identified in Fox proteins.
Figure 3

(A) Multiple sequence alignment of the α-helical region of an ultraspiracle ligand-binding domain from Drosophila (ULBD), the α-helix of a conserved hypothetical protein from C. tepidum (CHPfCT), and the eh1 motifs of human FoxB1, murine FoxB2 and amphioxus FoxE4 proteins, which have a high likelihood of α-helix formation. (B) Sequence alignment of the α-helical region of the Hepatitis C Virus RNA Polymerase Genotype 2a (HCVRPG) and the eh1 motif of the cnidarian Fox1 protein. Defined α-helices are represented as red solid boxes and predicted α-helices are shown as red dotted boxes. Amino acid similarities are shown in yellow. hum, human; mus, mouse; amp, amphioxus; nem, sea anemone. Wheel models of the eh1-like motifs of Xenopus FoxB1 (C) and amphioxus FoxE4 (D) form an amphipathic α-helical structure. Hydrophobic residues on the wheel are shown in red, hydrophilic residues in blue, and non-charged residues in gray.

Helical wheel analysis of the predicted α-helical regions of the eh1-like motifs revealed an amphipathicity for a majority of the identified motifs. As an example of this analysis, the helical wheel models of the eh1-like motifs of FoxB1 and FoxE4 (Figure 3C,D) display a predicted amphipathicity of the α-helical structure. For both eh1-like motifs, a hydrophobic surface is formed by isoleucine residues at positions 2, 5 and 6 of the predicted α-helix. The eh1-like motifs of a subset of FoxB1, FoxB2, FoxH1 and FoxQ1 proteins contain an additional hydrophobic residue (alanine or methionine) at position 1 that extends the hydrophobic surface of the predicted α-helix (Figure 3C and data not shown). Opposite the hydrophobic surface of the predicted α-helix is a surface consisting predominantly of hydrophilic and non-charged residues (Figure 3C,D and data not shown). Thus, the majority of the eh1-like motifs identified in Fox proteins have a predicted amphipathic α-helical structure. The validity of the predicted eh1 structure is strongly supported by a recent crystallographic study showing that the Goosecoid eh1 motif forms a short amphipathic α-helix when bound to the WD domain of TLE1 [29].
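The geometry behind a helical wheel can be sketched in a few lines. The example below assumes an ideal α-helix with 100° of rotation per residue and uses the Kyte-Doolittle hydropathy scale together with the Eisenberg hydrophobic moment as a simple amphipathicity measure. These are standard choices substituted here for illustration; the text does not state which scale or scoring the authors used. The sequence `FSIDNILG` is a hypothetical motif matching the residue usage described above.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def wheel_angles(sequence, step=100.0):
    """Angular position (degrees) of each residue on an ideal helical wheel."""
    return [(i * step) % 360.0 for i in range(len(sequence))]


def hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE, step=100.0):
    """Mean hydrophobic moment per residue (Eisenberg-style vector sum)."""
    sum_x = sum_y = 0.0
    for i, residue in enumerate(sequence):
        angle = math.radians(i * step)
        sum_x += scale[residue] * math.cos(angle)
        sum_y += scale[residue] * math.sin(angle)
    return math.hypot(sum_x, sum_y) / len(sequence)


if __name__ == "__main__":
    motif = "FSIDNILG"  # hypothetical eh1-like motif, for illustration only
    for residue, angle in zip(motif, wheel_angles(motif)):
        print(f"{residue}: {angle:5.1f} deg")
    print("hydrophobic moment:", round(hydrophobic_moment(motif), 3))
```

With a 100° step, positions 2, 5 and 6 fall at 200°, 140° and 240°, that is, within a 100° arc of the wheel, which is consistent with these residues forming a single hydrophobic face.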

Positional distribution of C-terminal eh1-like motifs

The eh1-like motifs identified in the Fox family were further analyzed for motif position within individual Fox proteins. Given that nearly all of the eh1-like motifs identified in the Fox family are positioned C-terminal to the WHD, we limited the analysis to C-terminal motifs. To assess the variation in motif position within the C-terminus of Fox proteins, the positional distribution of the eh1-like motifs relative to the WHD was examined. A substantial variation in the relative positions of the C-terminal eh1-like motifs and the WHDs was found, with an interval ranging from 30–180 residues (Figure 4). A detailed analysis of the positional distribution of these domains in 89 Fox protein sequences revealed two groups, C-proximal and C-distal, defined by maximum interval occurrence between the two domains. For the C-proximal eh1 motifs the maximum interval occurrence is 45–60 residues with a median value of 58 residues (Figure 4A). For the C-distal motifs the maximum interval occurrence is 100–140 residues with a median value of 120 residues (Figure 4B).
Figure 4

The positional distribution of the C-terminal eh1-like motifs in Fox proteins of the B, E, H and Q subclasses (A) and the A, C, D and I subclasses (B). Size of polylinker represents the distance between the first residue of the eh1 motif and the conserved C-terminal residue of the winged helix DNA-binding domain.

Positional variation of the C-terminal eh1-like motifs was also examined within Fox ortholog and paralog groups for eight subclasses. This analysis was limited to chordate Fox proteins as non-chordates lack many Fox subclasses. Proteins of Fox subclasses B, E, H and Q contain C-proximal motifs, whereas C-distal motifs are present in Fox subclasses A, C, D and I. The positional distribution of the motifs in the ortholog groups is shown in Figure 5. The analysis indicates that the position of eh1-like motifs is conserved within individual Fox protein subclasses across species, but not across subclasses within individual species. This conservation of motif position within each subclass is consistent with the existence of a common ancestral gene for the Fox genes comprising an individual subclass [17], but may also reflect a functional constraint that maintains the position of the eh1 motif. Exceptions to the conservation of motif position are observed for the FoxD and FoxQ subclasses, and for orthologs of FoxA3, FoxC1, and FoxH1. For the FoxD subclass, a shift of motif position towards the C-terminus is observed for chick, mouse and human proteins, when compared to amphioxus, zebrafish and Xenopus (Figure 5A). A C-terminal shift is also observed for the eh1 motifs of Xenopus, mouse and human FoxQ proteins, compared to amphioxus and zebrafish (Figure 5B). Similarly, for FoxC1 proteins, the eh1 motif of the chick and mammalian orthologs is shifted C-terminally in comparison to the zebrafish and Xenopus orthologs. In contrast, the eh1 motif of mammalian FoxH1 proteins is shifted N-terminally, closer to the WHD, in comparison to the zebrafish and Xenopus proteins.
Figure 5

Positional fluctuations of eh1-like motifs in the ortholog and paralog groups of vertebrate Fox proteins. (A) Positional fluctuations of the eh1-like motifs of the ortholog and paralog groups of the A, C and D subclasses. (B) Positional fluctuations of the eh1-like motifs of the ortholog and paralog groups of the B, E, H and Q subclasses. Polylinker represents the distance between the first residue of the eh1-like motif and the conserved C-terminal residue of the winged helix DNA-binding domain. The paralog groups within a Fox subclass are indicated on the x-axis.

For each case where eh1 motif position is not conserved, the shift in motif position correlates with changes in the size of the coding region C-terminal to the WHD. For example, sequence alignment of FoxD subclass proteins reveals the presence of polyalanine, polyglycine and polyproline repeats in the mammalian proteins that are absent in FoxD proteins of lower vertebrates (data not shown). On the other hand, mammalian FoxH1 proteins lack sequences C-terminal to the WHD that are present in the Xenopus and zebrafish orthologs (data not shown). Thus, insertion or deletion of sequences within the C-terminal domain of these mammalian Fox proteins is likely responsible for the shift of eh1 motif position.

Discussion

In this study, we have identified the presence of eh1-like Groucho interaction motifs in ten subclasses of the Fox family of transcriptional regulators by systematically analyzing 458 protein sequences of nineteen Fox subclasses. The analysis shows a widespread distribution of eh1-like motifs within the Fox protein family. The presence of the motif was identified in Fox subclasses A, B, C, D, E, G, H, I, L and Q, and no eh1-like motif was detected in proteins of the F, J, K, M, N, O, P, R and S subclasses. The majority of the eh1-like motifs identified were located C-terminal to the WHD, including proteins of nine Fox subclasses (A, B, C, D, E, H, I, L and Q). Only the FoxG subclass proteins contained eh1-like motifs N-terminal to the WHD. For Fox proteins containing C-terminal eh1-like motifs, the position of the motif relative to the WHD defined a C-proximal group with motifs 45–60 residues from the WHD (Fox subclasses B, E, H and Q) and a C-distal group with motifs 100–140 residues from the WHD (Fox subclasses A, C, D and I). The presence of eh1 motifs in more than 50% of Fox family proteins was in marked contrast to other protein families, including both transcriptional and non-transcriptional proteins (Table 4 and data not shown).

The prevalence of eh1-like motifs in the Fox family suggests that Groucho corepressors directly interact with many Fox proteins to mediate transcriptional repression activity or to inhibit the activation function of other regulatory domains. In a number of cases the functional importance of the identified eh1-like motifs is confirmed by the presence of the motifs within defined transcriptional repression domains and by the ability to mediate direct binding to Groucho proteins. The eh1 motifs are present in the C-terminal repression domains of mouse and chick FoxD3 [30,31], and Xenopus FoxD5 [32], as well as the C-terminal transcriptional inhibitory domain of mouse FoxC1 [33]. Furthermore, the eh1 motifs mediate a functional and direct interaction with Groucho corepressors in mouse FoxA2 [19], Drosophila FoxG/sloppy-paired-1 [20], mouse FoxG1 [34], and Xenopus FoxD3 [21] and FoxH1 (SY and DSK, unpublished). These results confirm the importance of eh1 motifs in Fox family proteins, and suggest that the eh1-like motifs identified in this study may mediate a previously unappreciated interaction of Groucho corepressors with many Fox proteins.

Secondary structure analysis of the eh1-like motifs indicates that a majority of the identified motifs are highly likely to form an α-helical structure. In support of this secondary structure prediction, a number of the eh1-like motifs exhibit sequence similarity to regions of unrelated proteins with known α-helical structure. In addition, the eh1-like motifs exhibit amphipathicity, which argues in favor of α-helix formation by the motifs. Structural studies of a number of transcriptional regulators have demonstrated the importance of amphipathic α-helices in binding to transcriptional coregulators. The p53 tumor suppressor binds to its negative regulator, MDM2, via a 13 amino acid motif. Structural studies have shown that the MDM2 interaction motif of p53 forms an amphipathic α-helix that binds to MDM2 through hydrophobic interactions [35]. In addition, NRSF/REST binds to the Sin3 corepressor via several short amphipathic or hydrophobic α-helices [3]. Therefore, the predicted amphipathic α-helical structure of the eh1 motifs is likely an essential feature for direct, high-affinity binding of Fox proteins to Groucho corepressors. This conclusion is strongly corroborated by recent structural studies showing that the eh1 motif present in the human Goosecoid protein forms a short amphipathic α-helix when bound to the WD domain of the Groucho family protein TLE1 [29]. In general, these observations support the idea that diverse families of transcriptional regulators utilize distinct conserved motifs, which adopt a common amphipathic α-helical structure, as adaptors for the physical interaction with transcriptional coregulators.

Eh1-like motifs were identified in Fox proteins of the most evolutionarily ancient organisms, including marine sponge (porifera), comb jelly (ctenophora) and sea anemone (cnidaria). The presence of the eh1-like motif in Fox proteins of these organisms likely reflects the presence of the eh1-Groucho interaction functional module early in evolutionary history. Eh1-like motifs are also present in other transcriptional regulators of the sponge, including the Barx/Bsh1 (AAQ24371) and a paraHox-related homeodomain protein (CAD37941). Consistent with the presence of eh1-like motifs in transcriptional regulatory proteins of early divergent species, a Groucho gene (CN626783) has been identified in the cnidarian Hydra. These data suggest an ancient origin for eh1 motif-dependent recruitment of Groucho corepressors, a protein interaction that may have been established as early as the porifera.

An intriguing question raised by these analyses is the origin of the eh1 motifs in the Fox gene family. The motifs identified in all Fox subclasses, except for the FoxG subclass, are positioned C-terminal to the WHD. The occurrence of the eh1-like motif N-terminal to the WHD in the FoxG subclass and FoxQ2 suggests that the N-terminal motif may have arisen independently of the C-terminal motif. In addition, two eh1-like motifs, positioned N-terminal and C-terminal to the WHD, were identified in the sea urchin and amphioxus FoxG1 proteins. The presence of two motifs in distinct regions of a subset of FoxG1 orthologs is consistent with independent origins for the C-terminal and N-terminal eh1 motifs. Given the small size of the eh1 motif (8 residues), it is possible that the motif arose multiple times in the Fox family. Therefore, the formation of new eh1-like motifs through the accumulation of missense mutations offers a convergent mechanism for multiple independent appearances of the motif in the Fox family. Alternatively, the Fox genes may have acquired the motif via a non-homologous recombination event that introduced a repression module containing an eh1-like motif. Such a scenario could involve the incorporation of a new exon encoding the repression module. However, since a majority of the Fox family genes lack introns, this mechanism would require intron loss subsequent to incorporation of the eh1-encoding exon.

An apparent loss of eh1 motifs was observed in a subset of FoxD, FoxE, and FoxH proteins. Our analysis indicates that the loss of the motif occurred in a subset of mammalian Fox proteins and we speculate that the motif loss provided a new functional modification for these proteins that was evolutionarily beneficial. Since the presence of an eh1 motif likely mediates a functional interaction with Groucho corepressors, the loss of the motif may represent an alteration of both transcriptional activity and regulatory function for individual Fox proteins. For example, while FoxH1 proteins can function as transcriptional activators or repressors by recruitment of Smad coactivators or Groucho corepressors [36,37] (SY and DSK, unpublished), it is predicted that FoxH3 functions exclusively as an activator in association with Smad coactivators [38]. Thus, the eh1 motif may play an important role in the evolution of the Fox gene family by providing a basis for the evolutionary modification of Fox protein function.

Conclusion

The identification of eh1-like motifs in many members of the Fox gene family provides an important insight into the potential transcriptional activity of Fox family proteins, and provides a foundation for the study of eh1 motif function in the Fox family. Biochemical and transcriptional studies will now be necessary to determine if the identified eh1-like motifs mediate a direct physical interaction with Groucho corepressors to confer transcriptional repression activity. Building on our motif analyses, ongoing functional studies should yield a more comprehensive understanding of the evolution, domain organization, and transcriptional activity of the Fox gene family.

Methods

Manual sequence analysis

The Fox gene family is subdivided into nineteen subclasses on the basis of homology within the winged helix DNA-binding domain [15], and at the time of this study the nineteen subclasses comprised 458 sequences. To identify eh1-like motifs, we used the eh1 consensus sequence $\mathrm{F}_{0}\,[\mathrm{S/A}]_{+1}\,\Phi_{+2}\,\mathrm{X}_{+3}\,\mathrm{X}_{+4}\,\Phi_{+5}\,\Phi_{+6}\,\mathrm{X}_{+7}$ (Φ, branched hydrophobic residues; X, non-polar or charged residues), which was derived from published data. Yeast and metazoan Fox protein sequences present in the SWISS-PROT and NCBI databases were analyzed. To identify the presence of an eh1-like motif in protein sequences of the nineteen subclasses, we performed PSI-BLAST searches of the non-redundant databases with an inclusion threshold (E-value) of 0.01 using members of each Fox subclass as a query. In parallel, the sequences of all subclasses were retrieved from the NCBI database and multiple protein alignments were constructed for each subclass using the CLUSTAL W algorithm in the software package MacVector 7.2.2. Regions that were conserved within either the N-terminal or C-terminal regions of at least two species were examined for a minimum of 50% similarity to the eh1 consensus. Taken together, these searches allowed for the identification of conserved sequences matching the eh1 consensus in ten Fox subclasses.
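As a rough illustration of checking a sequence window against the eh1 consensus, the sketch below scores each 8-residue window by the fraction of constrained consensus positions it satisfies. Several choices here are assumptions rather than the authors' procedure: Φ is taken to be I, L, V or M; X positions are treated as unconstrained; "50% similarity" is interpreted as matching at least half of the five constrained positions; and the test sequence is hypothetical. The manual analysis in the paper additionally required conservation across species, which this sketch does not model.

```python
# Assumed residue sets for the constrained consensus positions (0-based).
PHI = set("ILVM")  # assumption: residues counted as "branched hydrophobic"
CONSENSUS = {0: set("F"), 1: set("SA"), 2: PHI, 5: PHI, 6: PHI}
MOTIF_LENGTH = 8


def consensus_score(window):
    """Fraction of constrained consensus positions matched by an 8-mer."""
    if len(window) != MOTIF_LENGTH:
        raise ValueError("window must be exactly 8 residues long")
    hits = sum(1 for pos, allowed in CONSENSUS.items() if window[pos] in allowed)
    return hits / len(CONSENSUS)


def scan(sequence, threshold=0.5):
    """Return (start, window, score) for every window scoring >= threshold."""
    matches = []
    for start in range(len(sequence) - MOTIF_LENGTH + 1):
        window = sequence[start:start + MOTIF_LENGTH]
        score = consensus_score(window)
        if score >= threshold:
            matches.append((start, window, score))
    return matches


if __name__ == "__main__":
    # Hypothetical sequence with one perfect consensus match embedded.
    for hit in scan("GGGGFSIDNILGGGGG", threshold=1.0):
        print(hit)
```

A threshold this permissive will flag many chance matches in real proteins, which is one reason the study combined consensus matching with alignment-based conservation and statistical validation.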

Expectation-maximization and hidden Markov model analyses

The expectation-maximization algorithm of the MEME program (Multiple Em for Motif Elicitation, version 3.5.4) [22,39] was used to analyze 458 proteins of the Fox family for the presence of eh1-like motifs. The search parameters used were 20–30 motifs per run and a motif size of 8–10 amino acid residues. An eh1 motif position-specific probability matrix was generated for a set of FoxD3 protein sequences using MEME, and this matrix was used to construct a hidden Markov model for eh1-like motifs using the Meta-MEME program (Motif-based hidden Markov modeling of biological sequences, version 3.2) [24,40]. The SWISS protein database was searched with the FoxD3 eh1-like motif model using an E-value threshold of <104 for reported sequences. Logistic regression analysis was performed to determine whether there was a statistically significant correlation between the results of the hidden Markov model analysis (log-odds scores) and all transcriptional proteins or Fox family proteins specifically. The dependent variable in the logistic regression analysis is the dummy variable (y), which is equal to 1 when a transcriptional protein is present and 0 otherwise. The independent variable is the score (x). The estimated logistic regression equation is $\hat{p} = \dfrac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$, where x is the score, $b_0$ and $b_1$ are the estimated regression coefficients, and $\hat{p}$ is an estimate of the probability that y = 1, that is, that the transcription factor is present given the score.
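The logistic model described above maps a log-odds score x to an estimated probability that a hit is a transcriptional (or Fox family) protein. A minimal sketch follows; the coefficient names `b0` and `b1` and the values used in the example are hypothetical, since the fitted coefficients are not reported in this section.

```python
import math


def logistic_probability(x, b0, b1):
    """Estimated P(y = 1 | x) for a one-variable logistic regression.

    Computes exp(b0 + b1*x) / (1 + exp(b0 + b1*x)) in a form that avoids
    overflow for large positive or negative arguments.
    """
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the shape of the curve.
    b0, b1 = -2.0, 0.5
    for score in (0.0, 4.0, 8.0):
        print(score, round(logistic_probability(score, b0, b1), 3))
```

A positive `b1` means that higher motif scores increase the estimated probability; the score at which the probability crosses 0.5 is `-b0 / b1`.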

Phylogenetic analysis of Fox proteins

A phylogenetic tree for the FoxE subclass was generated based on the winged-helix DNA-binding domain sequences (100 residues) for FoxC, FoxD and FoxE subclass proteins. Multiple sequence alignments were constructed using Clustal W [41] and these sequences were converted into a cladogram using MEGA 3.1 [42]. Distances were calculated with Poisson correction, and a neighbor-joining method was used to construct the tree topology with bootstrap analysis of 1000 samples.
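For readers unfamiliar with the Poisson correction, it converts the observed proportion of differing sites p between two aligned protein sequences into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below shows that calculation only; it is not MEGA's implementation, the handling of gaps (skipping any column with a gap) is an assumption, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

For example, two aligned sequences that differ at one of four sites have p = 0.25 and a corrected distance of about 0.288, slightly larger than p because the correction accounts for multiple substitutions at the same site.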

Secondary structure analysis

For secondary structure predictions, the C-terminal or N-terminal domain of selected Fox proteins of each subclass was subjected to analysis using algorithms that predict secondary structure with an accuracy in the range of 0.67–0.7. The prediction algorithms are available at the Network Protein Sequence Analysis website [43]. The source code of the combiner can be obtained on request for academic use. In addition, software written by M.L. (unpublished) was used to predict the secondary structure of Fox protein sequences. This helix prediction algorithm is based on all high-resolution structures available, with the scoring function comparing homology of the sequences to known helical structures.

Authors' contributions

SY initiated these studies and was involved in all aspects of the design, execution and interpretation of these studies, as well as the writing of the manuscript. AV participated in the motif search and statistical analyses, and contributed to the writing of the manuscript. SS and ML contributed to the secondary structure analysis and amphipathic modeling. DSK contributed to the design and interpretation of these studies, data presentation and writing of the manuscript. All authors read and approved the final manuscript.

Additional file 1

Phylogenetic Tree of the Fox Gene Family Indicating the Occurrence of eh1 Motifs. A phylogenetic tree of the entire Fox gene family indicating which individual proteins contain an eh1-like motif.

Additional file 2

Legends for Additional Files 1 and 3. Description of data presented in Additional Files 1 and 3.

Additional file 3

The amino acid composition of eh1-like motifs identified in individual Fox protein subclasses. Diagrams representing the amino acid composition of the eh1-like motifs identified in each Fox family subclass of invertebrate and vertebrate organisms.

Additional file 4

Propensity for α-helix formation for eh1-like motifs in selected Fox proteins. An analysis of the propensity for α-helix formation at the position of individual residues within the eh1-like motifs of selected Fox family proteins.
References (10 of 40 listed)

1.  Cloning and sequence comparison of the mouse, human, and chicken engrailed genes reveal potential functional domains and regulatory regions.

Authors:  C Logan; M C Hanks; S Noble-Topham; D Nallainathan; N J Provart; A L Joyner
Journal:  Dev Genet       Date:  1992

2.  MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment.

Authors:  Sudhir Kumar; Koichiro Tamura; Masatoshi Nei
Journal:  Brief Bioinform       Date:  2004-06

3.  Using CLUSTAL for multiple sequence alignments.

Authors:  D G Higgins; J D Thompson; T J Gibson
Journal:  Methods Enzymol       Date:  1996

4.  Fitting a mixture model by expectation maximization to discover motifs in biopolymers.

Authors:  T L Bailey; C Elkan
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1994

5.  PHD--an automatic mail server for protein secondary structure prediction.

Authors:  B Rost; C Sander; R Schneider
Journal:  Comput Appl Biosci       Date:  1994-02

6.  Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5.

Authors:  K L Clark; E D Halay; E Lai; S K Burley
Journal:  Nature       Date:  1993-07-29

7.  Human FOX gene family (Review).

Authors:  Masuko Katoh; Masaru Katoh
Journal:  Int J Oncol       Date:  2004-11

8.  Phylogenetic relationships of the Fox (Forkhead) gene family in the Bilateria.

Authors:  Françoise Mazet; Jr Kai Yu; David A Liberles; Linda Z Holland; Sebastian M Shimeld
Journal:  Gene       Date:  2003-10-16

9.  New roles for FoxH1 in patterning the early embryo.

Authors:  Matt Kofron; Helbert Puck; Henrietta Standley; Chris Wylie; Robert Old; Malcolm Whitman; Janet Heasman
Journal:  Development       Date:  2004-10

10.  Six3 and Six6 activity is modulated by members of the groucho family.

Authors:  Javier López-Ríos; Kristin Tessmar; Felix Loosli; Joachim Wittbrodt; Paola Bovolenta
Journal:  Development       Date:  2003-01
