Federico De Masi, Christian A Grove, Anastasia Vedenko, Andreu Alibés, Stephen S Gisselbrecht, Luis Serrano, Martha L Bulyk, Albertha J M Walhout.
Abstract
Numerous efforts are underway to determine gene regulatory networks that describe physical relationships between transcription factors (TFs) and their target DNA sequences. Members of paralogous TF families typically recognize similar DNA sequences. Knowledge of the molecular determinants of protein-DNA recognition by paralogous TFs is of central importance for understanding how small differences in DNA specificities can dictate target gene selection. Previously, we determined the in vitro DNA binding specificities of 19 Caenorhabditis elegans basic helix-loop-helix (bHLH) dimers using protein binding microarrays. These TFs bind E-box (CANNTG) and E-box-like sequences. Here, we combine these data with logics, bHLH-DNA co-crystal structures and computational modeling to infer which bHLH monomer can interact with which CAN E-box half-site and we identify a critical residue in the protein that dictates this specificity. Validation experiments using mutant bHLH proteins provide support for our inferences. Our study provides insights into the mechanisms of DNA recognition by bHLH dimers as well as a blueprint for system-level studies of the DNA binding determinants of other TF families in different model organisms and humans.
Year: 2011 PMID: 21335608 PMCID: PMC3113581 DOI: 10.1093/nar/gkr070
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 4. Prediction and experimental validation of HLH-1 Leu13 mutant E-box half-site specificities. (a) Classification of the three HLH-1 mutant proteins based on the E-box half-site specificity diagram in Figure 3b. (b) Predictions of the half-site specificities for each of the HLH-1 Leu13 mutants. (c) Summary of the statistical analysis of the observed half-site specificities determined from PBM data for each of the HLH-1 mutants. (d) Boxplot representation of the PBM-derived E-box specificities of HLH-1 wild-type and mutant proteins. Significantly bound E-boxes are colored in black (see ‘Materials and Methods’ section). Eight-mers that do not contain an E-box are marked as ‘other’. For each box, the central horizontal bar shows the median of the distribution, the box’s edges mark the 25th and 75th percentiles and the whiskers extend to the most extreme points of the distribution that were not classified as outliers. The horizontal line at an ES value of 0.4 marks our ES significance threshold, as previously determined (7).
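The boxplot elements described in panel (d) can be computed as in the following minimal sketch. The caption does not say how outliers were defined, so the conventional 1.5 × IQR fence used here is an assumption, as are the function name `box_summary` and the example scores, which are made up for illustration.

```python
from statistics import quantiles


def box_summary(scores, fence=1.5):
    """Summarise one boxplot group: quartiles, whiskers and outliers.

    ASSUMPTION: outliers are points beyond `fence` * IQR from the box edges
    (the common Tukey convention); the source does not state its rule.
    """
    data = sorted(scores)
    # 25th, 50th and 75th percentiles (box edges and central bar).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - fence * iqr
    high_fence = q3 + fence * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical scores for the 8-mers of one E-box class.
summary = box_summary([-0.05, 0.0, 0.05, 0.1, 0.15, 0.48])
print(summary)
```

With these made-up values, the point at 0.48 falls outside the upper fence and is reported as an outlier rather than as the end of the whisker.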
Figure 1. Deduction of interactions between bHLH monomers and CAN E-box half-sites. HLH-2/HLH-3 heterodimers are shown as an example.
Deduced half-site preferences for 19 C. elegans bHLH proteins
| bHLH monomer | Deduced half-site preference | Dimerization partner | Predicted binding sites of bHLH dimer | Observed binding sites of bHLH dimer |
|---|---|---|---|---|
| HLH-2 | CAG, CAC, CAT | Several | See below | See below |
| HLH-3 | CAG | HLH-2 | CAGCTG, CACCTG, CATCTG | CAGCTG (0.96), CACCTG (0.98), CATCTG (0.91) |
| HLH-4 | CAG | HLH-2 | CAGCTG, CACCTG, | CAGCTG (0.98), CACCTG (0.99) |
| HLH-10 | CAG, CAC, CAT | HLH-2 | CAGCTG, CACCTG, CATCTG, CACGTG, | CAGCTG (1.00), CACCTG (1.00), CATCTG (0.95), CACGTG (0.93), CATATG (0.94) |
| HLH-15 | CAG | HLH-2 | CAGCTG, CACCTG, | CAGCTG (0.96), CACCTG (0.95) |
| HLH-8 | CAT | HLH-2 | CAGATG, CACATG, CATATG | CATCTG (0.86), CACATG (0.88), CATATG (0.90) |
| LIN-32 | CAG, CAT | HLH-2 | CAGCTG, CACCTG, CATCTG, | CAGCTG (0.98), CACCTG (0.94), CATCTG (0.93), CATATG (0.88) |
| HLH-14 | CAG, CAT | HLH-2 | CAGCTG, CACCTG, CATCTG, | CAGCTG (0.93), CACCTG (0.94), CATCTG (0.87), CATATG (0.88) |
| HLH-19 | CAG, CAT | HLH-2 | CAGCTG, CACCTG, CATCTG, | CAGCTG (0.99), CACCTG (0.94), CATCTG (0.89), CATATG (0.95) |
| CND-1 | CAG, CAT | HLH-2 | CAGCTG, | CAGCTG (0.86), CATCTG (0.91), CATATG (0.98) |
| HLH-11 | CAG, CAT | Self | CAGCTG, | CAGCTG (0.93), CATATG (0.96) |
| HLH-1 | CAG | Self | CAGCTG | CAGCTG (0.95), |
| REF-1 | CAC | Self | CACGTG | CACGTG (0.99) |
| HLH-25 | CAC | Self | CACGTG | CACGTG (0.98), |
| HLH-26 | CAC | Self | CACGTG | CACGTG (0.86) |
| MXL-3 | CAC | Self | CACGTG | CACGTG (1.00), |
| MDL-1 | CAC | MXL-1 | CACGTG | CACGTG (1.00), |
| MXL-1 | CAC | MDL-1 | Same as MDL-1 | Same as MDL-1 |
| HLH-30 | CAC | Self | CACGTG | CACGTG (0.98) |
The table indicates the bHLH TF, its (deduced) preferred E-box half-sites and dimerization partner(s). Additionally shown is a comparison of E-boxes predicted to be bound based on these deduced half-site preferences (E-boxes predicted but not observed are indicated in bold text) versus E-boxes observed to be bound based on actual experimentation (E-boxes observed but not predicted are indicated in bold text). Previously published AUC values (7) for each observed E-box are indicated in parentheses to illustrate the differences in the relative binding affinities of each bHLH to the different E-boxes bound. Note that the predictions are based only on observed palindromic E-box binding.
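The way dimer binding sites follow from monomer half-site preferences can be illustrated with a short sketch: each full E-box is one monomer's CAN half-site joined to the reverse complement of the partner's half-site. This is an illustration of the combinatorial idea only, not the authors' procedure, which also restricts predictions based on observed palindromic E-box binding. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_eboxes(half_sites_a, half_sites_b):
    """Enumerate candidate E-boxes for a dimer of monomers A and B.

    Each site is A's half-site followed by the reverse complement of B's
    half-site. A site and its reverse complement describe the same
    double-stranded E-box, so only the first orientation seen is kept.
    """
    sites = []
    for a in half_sites_a:
        for b in half_sites_b:
            site = a + reverse_complement(b)
            if site not in sites and reverse_complement(site) not in sites:
                sites.append(site)
    return sites


# HLH-2 (CAG, CAC, CAT) paired with HLH-3 (CAG), as listed in the table.
print(predict_eboxes(["CAG", "CAC", "CAT"], ["CAG"]))
# A homodimer preferring CAC, such as REF-1.
print(predict_eboxes(["CAC"], ["CAC"]))
```

For the HLH-2/HLH-3 pair this enumeration yields CAGCTG, CACCTG and CATCTG, the same three sites listed as predicted in the table. Note that CAGATG and CATCTG are reverse complements of each other and therefore denote the same site.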
Figure 2. Analysis of bHLH residue/DNA-contact frequency. (a) Canonical structure of a bHLH monomer contacting the CANNTG E-box. The E-box nucleotides on each DNA strand are indicated in blue, the bHLH basic region is shown in green and bHLH helices 1 and 2 are indicated in red and cyan, respectively. Residues 1, 15 and 50 indicate the start of the basic region, helices 1 and 2, respectively. (b) Plot indicating the frequency of contacts between bHLH residues and the bases in or flanking the E-box based on seven co-crystal structures. (c) As in (b) but indicating contacts between bHLH residues and the DNA backbone. P, phosphate; dR, deoxyribose. bHLH residue consensus numbering scheme based on Atchley et al. (33).
Figure 3. bHLH domain amino acids correlate with E-box half-site recognition. (a) Amino acid residues at each position in C. elegans bHLH proteins that are involved in DNA contacts. The final column shows the list of recognized half-sites for each bHLH (7). ‘A’ and ‘B’ (e.g. HLH-26A and HLH-26B) refer to the first and second bHLH domain of two within the same protein. (b) Diagram illustrating the distribution of bHLH proteins according to their half-site specification. (c) Schematic representation of the rules inferred from (a) and illustrated in (b). Filled circles represent residues at position 13; purple rectangles represent half-sites and open circles represent specific sequence determinants for linking residue 13 to a particular half-site. (d) Graphical representation of the average ΔΔGint per mutation for the seven bHLH structures analyzed. Mutations for which ΔΔGint is larger than 1.0 kcal/mol (indicated by the red line) may have considerably diminished, or even abolished, the binding capabilities of the mutant. Position 13 is split between the average ΔΔGint for the scaffold of bHLH proteins that normally do or do not have an arginine in that position.
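To give a rough sense of scale for the 1.0 kcal/mol cutoff in panel (d), the sketch below converts an energy difference into a fold change in dissociation constant using the standard relation ΔΔG = RT ln(Kd,mutant / Kd,wild-type). Treating the computed ΔΔGint as a thermodynamic binding free energy change, and the temperature of 298.15 K, are assumptions made for illustration. The function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.98720425864083e-3


def fold_change_in_kd(ddg_kcal_per_mol, temperature_k=298.15):
    """Fold change in Kd implied by a binding free energy change.

    ASSUMPTION: the energy difference behaves like a thermodynamic
    ddG of binding, so Kd_mut / Kd_wt = exp(ddG / (R * T)).
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# A 1.0 kcal/mol penalty corresponds to roughly a 5-fold weaker Kd at 25 °C.
print(round(fold_change_in_kd(1.0), 2))
```

Under these assumptions, a mutation at the threshold would weaken binding about fivefold, and the effect grows exponentially with larger penalties.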