Sahand Jamal Rahi, Peter Virnau, Leonid A Mirny, Mehran Kardar.
Abstract
The binding of a transcription factor (TF) to a DNA operator site can initiate or repress the expression of a gene. Computational prediction of sites recognized by a TF has traditionally relied upon knowledge of several cognate sites, rather than an ab initio approach. Here, we examine the possibility of using structure-based energy calculations that require no knowledge of bound sites but rather start with the structure of a protein-DNA complex. We study the PurR Escherichia coli TF, and explore to what extent atomistic models of protein-DNA complexes can be used to distinguish between cognate and noncognate DNA sites. Particular emphasis is placed on systematic evaluation of this approach by comparing its performance with bioinformatic methods and by testing it against random decoys and sites of homologous TFs. We also examine a set of experimental mutations in both DNA and the protein. Using our explicit estimates of energy, we show that the specificity for PurR is dominated by direct protein-DNA interactions, and weakly influenced by bending of DNA.
Year: 2008 PMID: 18829719 PMCID: PMC2577325 DOI: 10.1093/nar/gkn589
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. PurR protein headpiece bound to its consensus sequence DNA. This structure (15) serves as the basis of our study. The DNA base pairs or the protein amino acids in this structure are mutated computationally, and the effect on the binding energy is evaluated. Blue and red: protein chains; orange and gray: DNA.
Figure 2. Bioinformatics score versus energy. All binding energies are shown relative to the binding energy of the consensus sequence seqc (blue circle), which is set to 0 kcal/mol. Black circles: 21 binding sequences selected by Mironov et al. (14); PurA sequences are unfilled. Red circles: random noncognate sequences selected from the E. coli genome. Green, indigo and orange triangles: FruR, GalR/GalS and MalI operator sites. The solid red lines indicate the average energy or average bioinformatics score for the random sequences; the dashed lines mark one SD from these averages. The solid black line goes through the data point for the third worst cognate sequence (a black circle). The two cognate sequences with even worse binding energies (hollow black circles) are controversial binding sites. The linear correlation coefficient is −0.6 for the random sequences and −0.8 for all sequences displayed.
Table 1. Position-specific energy matrices based on (a) direct interaction energies and (b) interaction energies plus bending corrections
| (a) ΔE interaction | | | | (b) Δ(E interaction + E DNA deform) | | | |
|---|---|---|---|---|---|---|---|
| A | C | G | T | A | C | G | T |
| 0.0 | −0.5 | −0.7 | −0.7 | 0.0 | 0.3 | −0.6 | 0.2 |
| 1.3 | 0.0 | 0.4 | −0.6 | 1.9 | 0.0 | 0.0 | 0.4 |
| 4.7 | 14.3 | 0.0 | 8.4 | 7.6 | 19.6 | 0.0 | 15.1 |
| −1.0 | 0.0 | 2.3 | 1.2 | 2.3 | 0.0 | 2.4 | 0.1 |
| 0.0 | 2.1 | 3.7 | 3.4 | 0.0 | 1.6 | 2.2 | 4.2 |
| 0.0 | 3.5 | 3.4 | 3.8 | 0.0 | 4.4 | 4.5 | 5.7 |
| 0.0 | 2.0 | 0.6 | 2.1 | 0.0 | 0.5 | 2.8 | 0.4 |
| 6.2 | 0.0 | 0.1 | 5.2 | 2.3 | 0.0 | −0.6 | 3.3 |
| 5.3 | 0.1 | 0.0 | 6.0 | 3.8 | −0.8 | 0.0 | 1.4 |
| 2.2 | 0.6 | 2.1 | 0.0 | 0.4 | 2.7 | 0.3 | 0.0 |
| 3.9 | 3.4 | 3.6 | 0.0 | 8.4 | 4.8 | 6.7 | 0.0 |
| 3.4 | 3.8 | 2.1 | 0.0 | 3.3 | 2.4 | 0.3 | 0.0 |
| 0.9 | 2.3 | 0.0 | −0.9 | −0.2 | 2.1 | 0.0 | 2.2 |
| 8.4 | 0.0 | 14.4 | 4.6 | 14.9 | 0.0 | 19.3 | 7.8 |
| −0.6 | 0.6 | 0.0 | 1.4 | −0.1 | 0.2 | 0.0 | 1.8 |
| −0.7 | −0.6 | −0.6 | 0.0 | −0.2 | −0.5 | −0.2 | 0.0 |
The energies are normalized to the consensus sequence, which accordingly has zero binding energy; this is why a ‘Δ’ appears in front of the energies. The contribution of each base pair (including bending) is treated as independent of the other base pairs. Energies are given in kilocalories per mole (kcal/mol) and rounded to one decimal place.
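Because each base pair is treated as independent, the relative binding energy of any 16-bp site is simply the sum of one matrix entry per position. The following is a minimal sketch of that additive lookup, not the authors' code. It uses the left-hand (direct interaction) half of the table above and assumes that the table rows correspond to site positions 1 to 16 in order; the names `INTERACTION`, `consensus` and `delta_energy` are illustrative.

```python
# Direct-interaction energies (kcal/mol) copied from the left half of the table.
# Columns are A, C, G, T. Assumption: rows are site positions 1..16 in order.
BASES = "ACGT"
INTERACTION = [
    (0.0, -0.5, -0.7, -0.7),
    (1.3, 0.0, 0.4, -0.6),
    (4.7, 14.3, 0.0, 8.4),
    (-1.0, 0.0, 2.3, 1.2),
    (0.0, 2.1, 3.7, 3.4),
    (0.0, 3.5, 3.4, 3.8),
    (0.0, 2.0, 0.6, 2.1),
    (6.2, 0.0, 0.1, 5.2),
    (5.3, 0.1, 0.0, 6.0),
    (2.2, 0.6, 2.1, 0.0),
    (3.9, 3.4, 3.6, 0.0),
    (3.4, 3.8, 2.1, 0.0),
    (0.9, 2.3, 0.0, -0.9),
    (8.4, 0.0, 14.4, 4.6),
    (-0.6, 0.6, 0.0, 1.4),
    (-0.7, -0.6, -0.6, 0.0),
]


def consensus(matrix):
    """Read off the reference base at each position (the entry normalized to 0.0)."""
    return "".join(BASES[row.index(0.0)] for row in matrix)


def delta_energy(seq, matrix=INTERACTION):
    """Sum independent per-position contributions for a site of matching length."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must equal the number of matrix rows")
    return sum(matrix[i][BASES.index(base)] for i, base in enumerate(seq.upper()))


if __name__ == "__main__":
    ref = consensus(INTERACTION)
    print(ref, delta_energy(ref))
    # Single substitution at position 3 (G to C) of the reference site.
    print(delta_energy(ref[:2] + "C" + ref[3:]))
```

Note that a few entries in the matrix are negative, so under this additive model the sequence of minimum energy need not coincide with the reference sequence to which the energies are normalized.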
Figure 3. Consensus sequence logos. (a) Bioinformatics logo from Ref. (14), based on the sequences of 21 experimentally known binding sites. (b) E interaction-based logo, obtained from the Boltzmann probabilities of residues from site-specific interaction energies listed in Table 1a. (c) (E interaction + E DNA deform)-based logo, obtained from the Boltzmann probabilities of residues from site-specific interaction energies listed in Table 1. This includes an estimate of the bending energy of the DNA as described in the text.
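The step from site-specific energies to logo heights in panels (b) and (c) can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure used for the figure: the thermal energy `KT` is taken as roughly 0.593 kcal/mol (about 298 K), which is not stated in this excerpt, and the information content uses the standard uncorrected formula of 2 bits minus the Shannon entropy. The function names are illustrative.

```python
import math

KT = 0.593  # kcal/mol at about 298 K; an assumed value, not taken from the source


def boltzmann_probabilities(row, kt=KT):
    """Convert one row of energies (A, C, G, T) into Boltzmann probabilities."""
    e_min = min(row)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in row]
    z = sum(weights)
    return [w / z for w in weights]


def information_bits(probs):
    """Logo column height: 2 bits minus the Shannon entropy of the column."""
    return 2.0 + sum(p * math.log2(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    # Third row of the direct-interaction matrix (A, C, G, T), in kcal/mol.
    row = (4.7, 14.3, 0.0, 8.4)
    probs = boltzmann_probabilities(row)
    print([round(p, 4) for p in probs], round(information_bits(probs), 3))
```

With energy gaps of several kcal/mol and a thermal energy below 1 kcal/mol, the lowest-energy base takes nearly all of the probability, so such positions approach the 2-bit maximum in the logo.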
Calculated changes in binding energies of DNA and amino acid point mutations compared with experiments (15,30)
| DNA sequence | Δ binding energy, calculated | Δ binding energy, experimental |
|---|---|---|
| Binding to wild-type PurR | | |
| seqc | 0 | 0.0 |
| seq3 | 0.16 | 0.8 |
| seq1 | 2.02 | 1.6 |
| seq2 | 6.78 | 3.2 |
| Binding to the K55A mutant | | |
| seq1 | −7.53 | −0.06 |
| seq2 | −4.47 | −0.46 |
| seqc | 0 | 0 |
| seq3 | 1.13 | 0.5 |

| Mutant | Δ binding energy, calculated | Δ binding energy, experimental |
|---|---|---|
| PurR mutants bound to the consensus sequence | | |
| WT | 0 | 0 |
| L54M | 5.79 | 0.38 |
| L54S | 16 | larger, not measured |
| L54T | 10.05 | larger, not measured |
| L54V | 6.15 | larger, not measured |
| K55A | 12.55 | 3.48 |
When only the DNA is mutated, the binding order is correct (top panel). When both the DNA and the protein are mutated (middle panel), two DNA mutants are lower in binding energy and one is higher than the original sequence. This is correctly identified by our method, but the binding preference between seq1 and seq2 is reversed. When only the protein is mutated, the binding preferences of the DNA for the mutants are correctly captured (bottom panel). Energies are given in kilocalories per mole and measured relative to the respective consensus protein–DNA complex.
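The ordering statements in this legend can be checked directly against the DNA-mutation rows of the table. The sketch below assumes that the first numeric column holds the calculated values and the second the experimental ones, an inference from the "larger, not measured" entries rather than something stated in this excerpt. The helper names `ranking` and `sign` are illustrative.

```python
# Values in kcal/mol from the table. Assumption: first numeric column is
# calculated, second is experimental.
WT_CALC = {"seqc": 0.0, "seq3": 0.16, "seq1": 2.02, "seq2": 6.78}
WT_EXP = {"seqc": 0.0, "seq3": 0.8, "seq1": 1.6, "seq2": 3.2}
K55A_CALC = {"seq1": -7.53, "seq2": -4.47, "seqc": 0.0, "seq3": 1.13}
K55A_EXP = {"seq1": -0.06, "seq2": -0.46, "seqc": 0.0, "seq3": 0.5}


def ranking(energies):
    """Return sequence names ordered from lowest to highest energy."""
    return sorted(energies, key=energies.get)


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


if __name__ == "__main__":
    # Top panel: calculated and experimental orderings coincide.
    print(ranking(WT_CALC) == ranking(WT_EXP))
    # Middle panel: the signs agree for every sequence ...
    print(all(sign(K55A_CALC[s]) == sign(K55A_EXP[s]) for s in K55A_CALC))
    # ... but the two orderings differ in the placement of seq1 and seq2.
    print(ranking(K55A_CALC), ranking(K55A_EXP))
```

Comparing rank order and sign rather than raw magnitudes reflects how the legend frames the result: the calculated energies are considerably larger in magnitude than the experimental ones, so only the qualitative ordering is being tested.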