Harianto Tjong, Huan-Xiang Zhou.
Abstract
Structural and physical properties of DNA provide important constraints on the binding sites formed on the surfaces of DNA-targeting proteins. Characteristics of such binding sites may form the basis for predicting DNA-binding sites from protein structures alone. Such an approach has been successfully developed for predicting protein–protein interfaces. Here this approach is adapted for predicting DNA-binding sites. We used a representative set of 264 protein–DNA complexes from the Protein Data Bank to analyze characteristics and to train and test a neural network predictor of DNA-binding sites. The input to the predictor consisted of PSI-BLAST sequence profiles and solvent accessibilities of each surface residue and its 14 closest neighboring residues. Predicted DNA-contacting residues cover 60% of actual DNA-contacting residues and have an accuracy of 76%. This method significantly outperforms previous attempts at DNA-binding site prediction. Its application to the prion protein yielded a DNA-binding site that is consistent with recent NMR chemical shift perturbation data, suggesting that it can complement experimental techniques in characterizing protein–DNA interfaces.
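The input encoding described in the abstract (each surface residue plus its 14 closest neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes 20 PSI-BLAST profile values and one relative solvent accessibility value per residue, and measures "closest" by Euclidean distance between one representative coordinate per residue. The names `window_features`, `coords`, `profile` and `rsa` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def window_features(
    target: int,
    coords: Dict[int, Coord],
    profile: Dict[int, Sequence[float]],
    rsa: Dict[int, float],
    n_neighbors: int = 14,
) -> List[float]:
    """Feature vector for one surface residue and its n_neighbors closest surface residues.

    Hypothetical inputs, keyed by residue index:
      coords  - one representative coordinate per surface residue (e.g. C-alpha)
      profile - 20 PSI-BLAST profile values per residue (assumed encoding)
      rsa     - relative solvent accessibility per residue (assumed encoding)
    """
    # Rank the other surface residues by distance to the target residue.
    others = sorted(
        (r for r in coords if r != target),
        key=lambda r: math.dist(coords[target], coords[r]),
    )
    window = [target] + others[:n_neighbors]
    features: List[float] = []
    for r in window:
        features.extend(profile[r])  # sequence-profile part
        features.append(rsa[r])      # accessibility part
    return features
```

With 20 profile values and one accessibility value per residue, each window yields 15 × 21 = 315 network inputs; the per-residue encoding actually used for the predictor may differ.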
Year: 2007 PMID: 17284455 PMCID: PMC1865077 DOI: 10.1093/nar/gkm008
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Prediction results for the two-tier cross-training set
| PDBᵃ | Unbound PDB (Cα RMSD, Å)ᵇ | Surface residues | DNA-contacting residues | Predicted residues | True positivesᶜ | Coverage (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1brnL | 1a2pC (0.4) | 81 | 18 | 24 | 11 (3) | 44 | 46 |
| 1cl8A,B | 1qc9A (1.6) | 320 | 53 | 47 | 39 (8) | 58 | 83 |
| 1cqtA,I | – | 132 | 48 | 33 | 30 (9) | 44 | 91 |
| 1d5yA,B | – | 405 | 33 | 64 | 43 (23) | 61 | 67 |
| 1dh3A,C | – | 104 | 27 | 42 | 42 (15) | 100 | 100 |
| 1f5eP | 2alcA (5.8) | 54 | 21 | 32 | 29 (10) | 90 | 91 |
| 1gd2E,F | – | 119 | 32 | 44 | 43 (12) | 97 | 98 |
| 1gm5A | – | 525 | 32 | 41 | 40 (14) | 81 | 98 |
| 1gxpA,B | 1qqiA (1.7–1.8) | 152 | 44 | 50 | 41 (8) | 75 | 82 |
| 1imhC,D | – | 393 | 32 | 44 | 26 (13) | 41 | 59 |
| 1l1mA,B | 1lqc (1.8–2.2) | 110 | 57 | 79 | 73 (19) | 95 | 92 |
| 1leiA,B | 1iknA,C (16.3) | 413 | 39 | 48 | 34 (16) | 46 | 71 |
| 1m3qA | 1ko9A (0.9) | 199 | 25 | 15 | 14 (2) | 48 | 93 |
| 1mowA | – | 176 | 71 | 49 | 47 (12) | 49 | 96 |
| 1ornA | 2abk (2.4) | 149 | 27 | 27 | 26 (5) | 78 | 96 |
| 1r7mA | – | 150 | 53 | 79 | 69 (22) | 89 | 87 |
| 1rfiB | 1qzqA (0.3) | 241 | 14 | 33 | 14 (6) | 57 | 42 |
| 1s40A | – | 133 | 33 | 15 | 12 (2) | 30 | 80 |
| 1sfuA,B | – | 115 | 17 | 18 | 17 (7) | 59 | 94 |
| 1u1qA | 1l3kA (2.0) | 137 | 48 | 31 | 30 (9) | 44 | 97 |
| 1xyiA | 1xx8A (2.4) | 56 | 20 | 13 | 13 (1) | 60 | 100 |
| 1zrfA,B | 1g6nA,B (2.0) | 292 | 43 | 44 | 41 (8) | 77 | 93 |
| 1ztwA | 1mml (1.6) | 181 | 8 | 6 | 4 (1) | 38 | 67 |
| 1zziA | 1zzkA (1.3) | 67 | 19 | 15 | 12 (5) | 37 | 80 |
| 2aq4A | – | 300 | 56 | 62 | 54 (18) | 64 | 87 |
| All | – | 5004 | 870 | 955 | 804 (248) | 63.9 | 84.2 |
ᵃ For each entry, the PDB code is followed by the chains that make up the DNA-binding protein multimer.
ᵇ Cα RMSDs were obtained with the Dali server (http://www.ebi.ac.uk/DaliLite/). In three cases (1cl8, 1gxp and 1l1m) the bound structures are homodimers, but the unbound structures (1qc9, 1qqi and 1lqc) have only one chain; for these, the RMSDs of the unbound monomer against both subunits of the bound homodimer are listed. In reporting predictions using the unbound structures for these three proteins, both true and false positives were multiplied by two to allow a fair comparison with predictions using bound structures. The sequence identity between 1orn and 2abk is only 45%; in all other cases the aligned sequences of the bound proteins and their unbound counterparts are identical or nearly identical.
ᶜ The number in parentheses is the number of predictions counted as true positives only because they are among the four nearest neighbors of actual DNA-contacting residues, i.e. true positives that are not themselves DNA-contacting residues.
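The coverage and accuracy columns can be reproduced from the counts: every row is consistent with coverage = (true positives − parenthesized count) / DNA-contacting residues and accuracy = true positives / predicted residues. These relations are inferred from the table's numbers rather than stated in this section, and the neighbor lists passed to `count_true_positives` are hypothetical inputs; the sketch below is illustrative, not the DISPLAR implementation.

```python
from typing import Dict, Iterable, Set, Tuple


def count_true_positives(
    predicted: Iterable[str],
    contacts: Set[str],
    neighbors: Dict[str, Iterable[str]],
) -> Tuple[int, int]:
    """Return (n_tp, n_nn) for one protein.

    n_tp counts predictions that are actual contacts or nearest neighbors of one;
    n_nn counts only the latter (the number in parentheses in the table).
    `neighbors` is a hypothetical map from each residue to its four nearest neighbors.
    """
    # Residues that are among the nearest neighbors of at least one actual contact.
    near_contacts = {n for c in contacts for n in neighbors.get(c, ())}
    n_tp = n_nn = 0
    for residue in predicted:
        if residue in contacts:
            n_tp += 1
        elif residue in near_contacts:
            n_tp += 1
            n_nn += 1
    return n_tp, n_nn


def coverage_accuracy(n_contacts: int, n_predicted: int, n_tp: int, n_nn: int) -> Tuple[float, float]:
    """Percent coverage (strict hits / actual contacts) and accuracy (all true positives / predictions)."""
    coverage = 100.0 * (n_tp - n_nn) / n_contacts
    accuracy = 100.0 * n_tp / n_predicted
    return coverage, accuracy


# The "All" row: 870 contacts, 955 predictions, 804 true positives (248 via neighbors).
cov, acc = coverage_accuracy(870, 955, 804, 248)
print(f"coverage {cov:.1f}%, accuracy {acc:.1f}%")  # coverage 63.9%, accuracy 84.2%
```

Crediting nearest neighbors raises accuracy but not coverage, which is why the parenthesized counts are subtracted only in the coverage numerator.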
Figure 1. Comparison between DNA-contacting surface residues and non-contacting surface residues. (A) Percentages of the 20 types of amino acids in the interface and non-interface groups. The abscissa is in descending order of the difference between the two groups. (B) Conservation scores in the interface and non-interface groups for the 20 types of amino acids, in descending order of the difference. (C) Solvent accessibilities in the interface and non-interface groups for the 20 types of amino acids, in descending order of the difference. Results were obtained from analysis of 56 093 surface residues in the data set of 264 representative DNA-binding proteins.
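The ordering used in Figure 1A (amino-acid types sorted by the interface minus non-interface difference) can be sketched as below. This assumes residues are supplied as one-letter codes and the function name is hypothetical; it illustrates the sorting, not the authors' analysis script. Panels B and C would sort by the difference in mean conservation score or mean accessibility instead of percentage.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def composition_difference(
    interface: Iterable[str], non_interface: Iterable[str]
) -> List[Tuple[str, float, float]]:
    """Percent of each amino-acid type in both groups, sorted by (interface - non-interface), descending."""
    a, b = Counter(interface), Counter(non_interface)
    na, nb = sum(a.values()), sum(b.values())
    rows = []
    for aa in set(a) | set(b):
        pa = 100.0 * a[aa] / na if na else 0.0  # percentage in the interface group
        pb = 100.0 * b[aa] / nb if nb else 0.0  # percentage in the non-interface group
        rows.append((aa, pa, pb))
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows
```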
Figure 2. Predicted DNA-contacting residues shown on the protein–DNA complexes. Predictions are shown in three colors: actual DNA-contacting residues in blue, their nearest neighbors in cyan and incorrect predictions in green. The rest of the protein surface is in yellow; the bound DNA is shown as red lines. (A) 1brn. (B) 1gd2. (C) 1s40. (D) 1u1q. In the last panel, two protein chains related by a 2-fold rotation appear on the left and on the right. Within the left chain, the C- and N-terminal RNA recognition motifs are at the top and bottom, respectively. The pictures here and those in Figures 4 and 6 were generated with PyMOL (http://www.pymol.org).
Figure 4. Comparison of prion protein (PDB 1b10) residues (A) implicated by NMR chemical shift perturbation and (B) predicted by DISPLAR for DNA binding. Putative DNA-contacting residues are shown in red or blue.
Figure 6. Predicted nucleic acid-contacting residues shown on the protein–nucleic acid complexes. Predicted residues are shown as spheres, with blue indicating actual DNA-contacting residues, cyan their nearest neighbors and green incorrect predictions. The rest of the protein surface is in semi-transparent gray; the backbone trace of the bound DNA is displayed as red lines. (A) RNA polymerase II elongation complex (PDB 1i6h). A cylinder indicates the downstream DNA; predicted residues in its binding site are shown in magenta. (B) RecBCD–DNA complex (PDB 1w36). An arrow indicates the 3′ exit; predicted residues along the exit are shown in magenta. (C) Ribosome (PDB 1vqp). In (A) and (B), residues shown in magenta were not used in computing prediction accuracy, because the DNA at these sites was not resolved.
Figure 3. Two types of gross conformational change upon DNA binding. (A) Global distortion from the unbound (PDB 2alc; yellow) to the bound (PDB 1f5e; green) structure. (B) Domain rearrangement from the unbound (PDB 1ikn) to the bound (PDB 1lei) structure. The N- and C-terminal domains of chain A in 1ikn are shown in orange and yellow; the C-terminal domain of chain C in 1ikn is shown in magenta. The N- and C-terminal domains of chain A in 1lei are shown in dark and light green; the N- and C-terminal domains of chain B in 1lei are shown in dark and light blue. When the dark green domain of 1lei and the orange domain of 1ikn are superimposed, the light green and dark blue domains in 1lei are rotated by ∼180° relative to the corresponding yellow and magenta domains in 1ikn. The counterpart of the light blue domain of 1lei is missing in 1ikn. Bound DNA is shown as red lines in both panels. The pictures were generated with VMD (http://www.ks.uiuc.edu/Research/vmd/).
Figure 5. Distributions of the average number of neighboring predictions for DNA-binding and non-DNA-binding proteins.
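The caption does not define how neighboring predictions are counted. Assuming, hypothetically, that the quantity is the number of a predicted residue's neighbors that are also predicted, averaged over all predicted residues of a protein, it could be computed as below. The neighbor lists and the function name are illustrative assumptions; a higher average means the predictions form a more compact patch rather than scattered residues.

```python
from typing import Dict, Iterable, Set


def mean_neighboring_predictions(
    predicted: Set[str],
    neighbors: Dict[str, Iterable[str]],
) -> float:
    """Average, over predicted residues, of how many of their neighbors are also predicted.

    `neighbors` is a hypothetical map from each residue to its neighboring surface residues.
    """
    if not predicted:
        return 0.0  # no predictions: define the average as zero
    total = sum(
        sum(1 for n in neighbors.get(r, ()) if n in predicted)
        for r in predicted
    )
    return total / len(predicted)
```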