Trevor W Siggers, Barry Honig.
Abstract
Predicting the binding specificity of transcription factors is a critical step in the characterization and computational identification of cis-regulatory elements in genomic sequences. Here we use protein-DNA structures to predict binding specificity and consider the possibility of predicting position weight matrices (PWMs) for an entire protein family based on the structures of just a few family members. A particular focus is the sensitivity of prediction accuracy to the docking geometry of the structure used. We investigate this issue with the goal of determining how similar two docking geometries must be for binding-specificity predictions to be accurate. Docking similarity is quantified using our recently described interface alignment score (IAS). Using a molecular-mechanics force field, we predict high-affinity nucleotide sequences that bind to the second zinc-finger (ZF) domain of the Zif268 protein, using different C2H2 ZF domains as structural templates. We identify a strong relationship between IAS values and prediction accuracy, and define a range of IAS values within which accurate structure-based predictions of binding specificity are to be expected. The implications of our results for large-scale, structure-based prediction of PWMs are discussed.
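The abstract frames specificity prediction as the prediction of PWMs from structure-based binding energies. A minimal sketch of one common way to make that conversion, assumed here and not necessarily the procedure the authors used, is Boltzmann weighting: each triplet gets weight exp(−ΔG) with ΔG already in kT, and each PWM column is the normalized weight of the triplets carrying each base. The energies below are the hrZif268 predictions from the highest-affinity-sequence table in this entry; because that table lists only the ten best of the 64 possible triplets, the resulting matrix is a truncated approximation. The names `boltzmann_pwm` and `PREDICTED_DG` are hypothetical.

```python
import math

# Predicted relative binding free energies (kT) for the NNN triplet of
# GCGNNNGCG (hrZif268 template). Only the ten best triplets are listed in
# the source table, so this ensemble is truncated (assumption of the sketch).
PREDICTED_DG = {
    "TGG": 0.0, "GGG": 3.6, "TAG": 4.0, "CGG": 6.6, "GAG": 8.4,
    "TGA": 8.6, "TTG": 9.7, "AGG": 10.5, "GGA": 12.3, "CAG": 12.8,
}

BASES = "ACGT"


def boltzmann_pwm(rel_dg: dict[str, float]) -> list[dict[str, float]]:
    """Per-position base probabilities from relative energies given in kT.

    Each sequence s receives weight exp(-dG(s)); since dG is in kT, no
    explicit temperature factor is needed. Column probabilities are the
    normalized summed weights of the sequences carrying each base.
    """
    weights = {seq: math.exp(-dg) for seq, dg in rel_dg.items()}
    z = sum(weights.values())  # partition function over the listed triplets
    length = len(next(iter(rel_dg)))
    pwm = []
    for pos in range(length):
        column = {b: 0.0 for b in BASES}
        for seq, w in weights.items():
            column[seq[pos]] += w / z
        pwm.append(column)
    return pwm


if __name__ == "__main__":
    for i, col in enumerate(boltzmann_pwm(PREDICTED_DG), start=1):
        print(i, {b: round(p, 3) for b, p in col.items()})
```

With these values the third position is almost entirely G and the first is dominated by T, reflecting the large energy gap between TGG and every other listed triplet.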
Year: 2007 PMID: 17264128 PMCID: PMC1851644 DOI: 10.1093/nar/gkl1155
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
C2H2 ZF-DNA PDB files
| PDB code | Chains | Description | Res (Å) | Topology |
|---|---|---|---|---|
| 1llm | C,D | Zif268-GCN4 (dimer) | 1.5 | 2_3:3_2 |
| 1aay | A | Zif268 | 1.6 | 3_2_1 |
| 1a1f, 1a1g, 1a1h, 1a1i, 1a1j, 1a1k, 1a1l | A | Zif268 (Fn1 mutants) | 1.6 | 3_2_1 |
| 1jk1, 1jk2 | A | Zif268 (Fn1 D20A) | 1.9 | 3_2_1 |
| 1zaa | C | Zif268 | 2.1 | 3_2_1 |
| 1mey | C,F | Designed | 2.2 | 3_2_1 |
| 1g2d, 1g2f | C,F | Designed | 2.2 | 3_2_1 |
| 1p47 | A,B | Zif268 tandem | 2.2 | 3_2_1 3_2_1 |
| 1f2i | G,H,I,J,K,L | Zif268-extension (dimer) | 2.4 | 2_1:1_2 |
| 1ubd | C | YY1 (Yin Yang 1) | 2.5 | 4_3_2_1 |
| 2gli | A | GLI (glioblastoma) | 2.6 | 5_4_3_2_1 |
| 2drp | A,D | Tramtrack | 2.8 | 2_1 |
| 1tf6 | A,D | TFIIIA | n/a (NMR) | 6_5_4_3_2_1 |
(a) Topology of the individual ZF domains: 3_2_1 indicates a polydactyl ZF protein with three ZF domains, where 1 refers to the N-terminal ZF domain. Dimerization interfaces between chains are indicated with a colon.
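The topology notation in the table above can be parsed mechanically. The footnote defines only `_` (domain separator) and `:` (dimerization interface); treating whitespace-separated entries such as "3_2_1 3_2_1" (1p47) as separate copies is an assumption of this sketch. `parse_topology` and `domain_count` are hypothetical helper names.

```python
def parse_topology(topology: str) -> list[list[list[int]]]:
    """Parse a ZF topology string such as '3_2_1', '2_3:3_2' or '3_2_1 3_2_1'.

    Returns one entry per whitespace-separated unit (assumed to be separate
    copies). Each unit is a list of chains joined by a dimerization
    interface (':'); each chain is the list of ZF domain numbers in written
    order, where 1 denotes the N-terminal domain.
    """
    units = []
    for unit in topology.split():
        chains = [[int(d) for d in chain.split("_")] for chain in unit.split(":")]
        units.append(chains)
    return units


def domain_count(topology: str) -> int:
    """Total number of ZF domains listed in a topology string."""
    return sum(len(chain) for unit in parse_topology(topology) for chain in unit)


if __name__ == "__main__":
    for topo in ["3_2_1", "2_3:3_2", "6_5_4_3_2_1", "3_2_1 3_2_1"]:
        print(topo, parse_topology(topo), domain_count(topo))
```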
Figure 2. Canonical binding schema for Zif268 bound to its cognate DNA sequence. Arrows indicate hydrogen-bond interactions. Base identities appear as in the PDB file 1aay. Amino acids are numbered according to a canonical ZF numbering scheme (54). Solid shaded regions indicate the amino acids and bases used to calculate the docking-geometry IAS values; the solid plus hashed region indicates the interfacial residues that define the ‘binding interface’ used for modeling Zif268 ZF2 binding specificity. * IAS scores were computed using all amino acid positions −1 through 6.
Figure 1. Pair-wise docking comparison of 93 ZF domains. Pair-wise IAS values are shown as a symmetric comparison matrix. Scores below 5.0 are in white (not shown). IAS values between domains with a wild-type docking geometry (see text) in each of the three Zif268 groups are delineated with a heavy dashed line. Several comparisons between individual ZF domains, or groups of domains, are highlighted: wild-type docking domains of the Zif268 ZF1 cluster versus those of the ZF3 cluster (region A), and 1MEY ZF3 versus wild-type docking domains of the Zif268 ZF1 (region B) and ZF3 (region C) clusters. Supplementary Figure S1 contains enlarged versions of the regions along the diagonal and lists the PDB identifiers and interfacial residue identities for each ZF complex.
IAS and RMSD measures of docking similarity between hrZif268 ZF2 and five ZF domains
| PDB code | IAS | RMSD ZF helix (Å) | RMSD DNA (Å) |
|---|---|---|---|
| 1zaa_C2 | 9.2 | 0.2 | 0.3 |
| 1llm_C2 | 8.6 | 1.2 | 0.6 |
| 1f2i_J2 | 7.8 | 1.8 | 0.7 |
| 1f2i_K2 | 5.4 | 2.4 | 0.9 |
| 1g2d_C1 | 1.6 | 1.9 | 1.1 |
(a) PDB identifier, chain ID and ZF domain number as in Table 1.
(b) IAS values from an alignment with 1aay, chain A, ZF2 domain (1aay_A2).
(c) RMSD of ZF helix atoms after a DNA-based superimposition of each complex onto 1aay_A2.
(d) RMSD of the DNA sugar-phosphate backbone atoms used to perform the DNA-based structural superimposition.
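Footnotes (c) and (d) describe a two-step measurement: superimpose each complex onto 1aay_A2 using only DNA backbone atoms, then report the RMSD of the ZF helix atoms without refitting them, so the helix RMSD reflects a difference in docking rather than in helix shape. The sketch below assumes matched atom lists and uses the Kabsch algorithm, a standard least-squares superposition; the source does not say which fitting routine was used. `kabsch`, `rmsd` and `docking_rmsds` are hypothetical names. With PDB coordinates the RMSDs are in Å.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares rotation r and translation t such that
    mobile @ r.T + t best matches target (both N x 3, matched atom order)."""
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)   # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # -1 would indicate a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_t - mu_m @ r.T
    return r, t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two matched N x 3 coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def docking_rmsds(dna_ref, dna_mob, helix_ref, helix_mob):
    """Fit on DNA backbone atoms only, then measure helix RMSD without refitting."""
    r, t = kabsch(dna_mob, dna_ref)
    dna_rmsd = rmsd(dna_mob @ r.T + t, dna_ref)
    helix_rmsd = rmsd(helix_mob @ r.T + t, helix_ref)
    return dna_rmsd, helix_rmsd


if __name__ == "__main__":
    # Synthetic example: the mobile complex is a rigid copy of the reference
    # whose helix has been translated by 1 unit along x before the copy.
    rng = np.random.default_rng(0)
    dna, helix = rng.normal(size=(20, 3)) * 5, rng.normal(size=(11, 3)) * 3
    c, s = np.cos(0.7), np.sin(0.7)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    off = np.array([3.0, -2.0, 1.0])
    shifted_helix = helix + np.array([1.0, 0.0, 0.0])
    print(docking_rmsds(dna, dna @ rot.T + off, helix, shifted_helix @ rot.T + off))
```

In the synthetic example the DNA RMSD is essentially zero and the helix RMSD is 1.0, i.e. the fit removes the rigid-body motion but preserves the docking displacement.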
Figure 3. DNA-based structural superimposition of four ZF helices. Worm representations of ZF helix residues (canonical numbering 1–11) are shown for four ZF2 domains: two wild-type Zif268 proteins, 1aay/hrZif268 (gray) and 1zaa (red); and two modified Zif268 proteins, 1llm chain C (yellow) and 1f2i chain K (blue). Colors correspond to the IAS color scale used in Figure 1. DNA sugar-phosphate heavy atoms used for the structural superimposition are shown in stick form. IAS values for the comparisons of hrZif268 ZF2 (gray) with each of the three other ZF domains are shown and colored accordingly.
Highest affinity GCGNNNGCG sequences bound by Zif268
| Experiment | Exp. rel. ΔG (kT) | Prediction | Calc. rel. ΔG (kT) |
|---|---|---|---|
| TGG | 0.0 | TGG | 0.0 |
| TAG | 0.5 | GGG | 3.6 |
| GGG | 1.3 | TAG | 4.0 |
| CGG | 1.5 | CGG | 6.6 |
| AGG | 1.7 | GAG | 8.4 |
| TTG | 1.9 | TGA | 8.6 |
| GAG | 1.9 | TTG | 9.7 |
| | | AGG | 10.5 |
| | | GGA | 12.3 |
| | | CAG | 12.8 |
(a) NNN base-triplet identities of the seven highest-affinity GCGNNNGCG sequences identified by Bulyk et al. (51).
(b) Measured relative binding free energies (kT) of each sequence for Zif268 (51).
(c) Highest-affinity predicted sequences, listed in order of calculated relative binding energy. hrZif268 was used as the template structure for all predictions.
(d) Calculated relative binding free energies (kT) for each predicted sequence.
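One way to quantify how well the calculated energies in the table above order the experimental sequences is a Spearman rank correlation over the triplets that appear in both columns. This is an illustrative check, not a statistic reported by the authors, and it is biased by the truncation of both lists to their top entries. Ties (TTG and GAG, both 1.9 kT) receive averaged ranks. `average_ranks` and `spearman` are hypothetical helpers written in plain Python.

```python
# Relative binding free energies (kT) copied from the table above.
EXPERIMENTAL = {"TGG": 0.0, "TAG": 0.5, "GGG": 1.3, "CGG": 1.5,
                "AGG": 1.7, "TTG": 1.9, "GAG": 1.9}
PREDICTED = {"TGG": 0.0, "GGG": 3.6, "TAG": 4.0, "CGG": 6.6, "GAG": 8.4,
             "TGA": 8.6, "TTG": 9.7, "AGG": 10.5, "GGA": 12.3, "CAG": 12.8}


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


shared = [s for s in EXPERIMENTAL if s in PREDICTED]
rho = spearman([EXPERIMENTAL[s] for s in shared], [PREDICTED[s] for s in shared])

if __name__ == "__main__":
    print(len(shared), f"{rho:.2f}")  # 7 0.85
```

All seven experimental triplets appear among the ten listed predictions, and their relative order is largely preserved.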
Figure 4. Native and predicted His(3) side-chain conformations. Side-chain conformations for the His residue at canonical ZF position 3 are shown from hrZif268 ZF2 (1aay His149; white), and from the complexes modeled with the TGG sequence (red) and TTG sequence (brown) using hrZif268 ZF2 as a template. DNA bases are shown in CPK coloring and correspond to the modeled TG (TGG) and TT (TTG) bases. Residue numbering as in Figure 1.
Dependence of Zif268 ZF2 binding-specificity predictions on template docking geometry
| PDB code | IAS_Zif268 | Top seq | TGG rank | Top 3 | Top 6 | Top 7 |
|---|---|---|---|---|---|---|
| 1aay_A2 (hrZif268) | 10.0 | TGG | 1 | 3 | 7 | 8 |
| 1jk1_A2 | 10.0 | TGG | 1 | 3 | 7 | 59 |
| 1p47_A2 | 10.0 | TGG | 1 | 3 | 8 | 55 |
| 1a1k_A2 | 9.9 | TGG | 1 | 4 | 7 | 9 |
| 1a1l_A2 | 9.9 | TGG | 1 | 3 | 7 | 12 |
| 1a1f_A2 | 9.5 | TGG | 1 | 3 | 9 | 13 |
| 1a1i_A2 | 9.5 | TGG | 1 | 6 | 7 | 16 |
| 1zaa_C2 | 9.2 | TGG | 1 | 4 | 8 | 9 |
| 1p47_B2 | 8.6 | AAG | 3 | 8 | 9 | 19 |
| 1llm_C2 | 8.6 | AGG | 2 | 6 | 15 | 18 |
| 1f2i_J2 | 7.8 | GGG | 4 | 7 | 7 | 12 |
| 1f2i_H2 | 7.3 | GGG | 10 | 21 | 14 | 21 |
| 1f2i_L2 | 6.8 | GGG | 9 | 16 | 16 | 21 |
| 1f2i_G2 | 5.5 | AGG | 4 | 7 | 7 | 11 |
| 1f2i_K2 | 5.4 | GGG | 4 | 6 | 6 | 16 |
| 1f2i_K1 | 4.5 | GAT | 18 | 18 | 32 | 52 |
| 1jk1_A1 | 4.0 | GGA | 8 | 11 | 11 | 57 |
| 1p47_B1 | 3.3 | GAG | 25 | 25 | 33 | 40 |
| 1aay_A3 | 3.0 | AAG | 4 | 9 | 9 | 16 |
| 1aay_A1 | 2.7 | GAT | 8 | 8 | 22 | 27 |
| 1g2f_F2 | 2.1 | GGG | 14 | 31 | 31 | 57 |
| 1g2f_C2 | 1.9 | TGG | 1 | 33 | 32 | 33 |
| 1g2d_C1 | 1.6 | GGG | 13 | 30 | 47 | 57 |
(a) PDB identifier, chain ID and ZF domain number as in Table 1.
(b) IAS value between each template and 1aay_A2 (hrZif268 ZF2).
(c) Top-ranked predicted triplet NNN (i.e. of GCGNNNGCG).
(d) Predicted rank of the consensus TGG (i.e. GCGTGGGCG) sequence.
(e) These columns indicate how far down the ranked list of predicted sequences one must go to include the top N binding sequences determined by Bulyk et al. (51). For example, a 7 in the Top 6 column indicates that the six highest-affinity experimentally determined sequences are all present within the top seven predicted sequences.
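The "TGG rank" and "Top N" measures defined in footnotes (d) and (e) can be computed directly from two ranked lists. A minimal sketch, assuming both lists are ordered from highest to lowest affinity; `top_n_depth` and `consensus_rank` are hypothetical names. As input it uses the hrZif268 predictions and the Bulyk et al. ordering from the highest-affinity-sequence table in this entry, which lists only the top ten predictions, so any experimental sequence missing from that list yields `None`.

```python
from typing import Optional


def top_n_depth(predicted_ranked: list[str], experimental_ranked: list[str],
                n: int) -> Optional[int]:
    """Smallest predicted-list depth that contains the n highest-affinity
    experimental sequences, or None if any of them is absent."""
    position = {seq: i + 1 for i, seq in enumerate(predicted_ranked)}
    targets = experimental_ranked[:n]
    if any(seq not in position for seq in targets):
        return None
    return max(position[seq] for seq in targets)


def consensus_rank(predicted_ranked: list[str], consensus: str = "TGG") -> int:
    """1-based predicted rank of the consensus triplet (ValueError if absent)."""
    return predicted_ranked.index(consensus) + 1


# hrZif268 predictions and experimental order (TTG and GAG are tied at 1.9 kT).
PREDICTED_RANKED = ["TGG", "GGG", "TAG", "CGG", "GAG",
                    "TGA", "TTG", "AGG", "GGA", "CAG"]
EXPERIMENTAL_RANKED = ["TGG", "TAG", "GGG", "CGG", "AGG", "TTG", "GAG"]

if __name__ == "__main__":
    print(consensus_rank(PREDICTED_RANKED))                          # 1
    print(top_n_depth(PREDICTED_RANKED, EXPERIMENTAL_RANKED, 3))     # 3
    print(top_n_depth(PREDICTED_RANKED, EXPERIMENTAL_RANKED, 7))     # 8
```

These values agree with the TGG rank, Top 3 and Top 7 entries listed for 1aay_A2 (hrZif268). Taking the maximum position, rather than counting hits, is what makes the measure a "depth": it is the point at which the last required sequence first appears.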
Figure 5. Predicted and experimental binding-specificity logos for the three-base-pair sequence recognized by the Zif268 ZF2 domain. (A) Logo generated from the three highest-affinity experimental sequences (51). (B) to (F) Logos generated for the five template groups, each using the three highest-affinity predicted sequences from every template within the group. The range of template IAS_Zif268 scores for each group is shown. Logos were generated using WebLogo (64).
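A sequence logo such as panel (A) stacks, at each position, letters whose total height is the information content of that column. The sketch below computes base frequencies and information content in bits for the three highest-affinity experimental triplets (TGG, TAG and GGG, from the highest-affinity-sequence table in this entry), assuming a uniform background (2 bits maximum). It omits any small-sample correction, so heights for only three sequences are not directly comparable to WebLogo output. `logo_columns` is a hypothetical name.

```python
import math
from collections import Counter

BASES = "ACGT"


def logo_columns(seqs: list[str]) -> list[tuple[dict[str, float], float]]:
    """Per-position base frequencies and information content in bits,
    assuming a uniform background and no small-sample correction."""
    columns = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        freqs = {b: counts[b] / len(seqs) for b in BASES}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        columns.append((freqs, 2.0 - entropy))
    return columns


if __name__ == "__main__":
    for i, (freqs, info) in enumerate(logo_columns(["TGG", "TAG", "GGG"]), start=1):
        # Letter height in a logo = base frequency * column information content.
        heights = {b: round(f * info, 3) for b, f in freqs.items() if f > 0}
        print(i, round(info, 3), heights)
```

For these three sequences the third position is fully conserved (2 bits, all G), while the first and second positions each carry about 1.08 bits.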