Shandar Ahmad, Ozlem Keskin, Akinori Sarai, Ruth Nussinov.
Abstract
Amino acid residues, which play important roles in protein function, are often conserved. Here, we analyze thermodynamic and structural data of protein-DNA interactions to explore a relationship between free energy, sequence conservation and structural cooperativity. We observe that the most stabilizing residues or putative hotspots are those which occur as clusters of conserved residues. The higher packing density of the clusters and available experimental thermodynamic data of mutations suggest cooperativity between conserved residues in the clusters. Conserved singlets contribute to the stability of protein-DNA complexes to a lesser extent. We also analyze structural features of conserved residues and their clusters and examine their role in identifying DNA-binding sites. We show that about half of the observed conserved residue clusters are in the interface with the DNA, which could be identified from their amino acid composition; whereas the remaining clusters are at the protein-protein or protein-ligand interface, or embedded in the structural scaffolds. In protein-protein interfaces, conserved residues are highly correlated with experimental residue hotspots, contributing dominantly and often cooperatively to the stability of protein-protein complexes. Overall, the conservation patterns of the stabilizing residues in DNA-binding proteins also highlight the significance of clustering as compared to single residue conservation.
Year: 2008 PMID: 18801847 PMCID: PMC2566867 DOI: 10.1093/nar/gkn573
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) The number of conserved residues as a function of the number of its conserved neighbors. As an example, the first point on the graph indicates that there are about five residues per protein in the data set, which have no other conserved residue in their structural neighborhood. The values are computed for each protein and the standard deviations are plotted in vertical error bars. The x-axis shows whether a conserved residue occurs as a singlet (first point) or a cluster with conserved neighbors. NCN is the number of conserved neighbors. (B) NCNs of a residue expected by chance. Random distribution was constructed by reassigning conservation scores randomly along the sequence. (C) Histogram of cluster size in unique clusters of conserved residues in observed protein–DNA complexes, compared to randomly distributed conservation scores.
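The NCN count and its chance expectation described in this caption can be sketched as follows. This is only an illustrative sketch: the 6.0 Å distance cutoff, the one-coordinate-per-residue representation, and the function names `count_conserved_neighbors` and `shuffled_ncn` are assumptions made for the example, not details given in the source. The 0.8 conservation cutoff is the one quoted elsewhere in this record.

```python
import math
import random

def count_conserved_neighbors(coords, scores, score_cutoff=0.8, dist_cutoff=6.0):
    """Return {residue index: NCN} for every conserved residue.

    coords: one (x, y, z) point per residue (assumed representation).
    dist_cutoff: hypothetical neighborhood radius; the source does not state it.
    """
    conserved = [i for i, s in enumerate(scores) if s >= score_cutoff]
    ncn = {}
    for i in conserved:
        ncn[i] = sum(
            1
            for j in conserved
            if j != i and math.dist(coords[i], coords[j]) <= dist_cutoff
        )
    return ncn

def shuffled_ncn(coords, scores, rng, **kwargs):
    """NCN after reassigning conservation scores randomly along the sequence."""
    permuted = list(scores)
    rng.shuffle(permuted)
    return count_conserved_neighbors(coords, permuted, **kwargs)

# Toy example: four residues on a line, three of them conserved.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
scores = [0.9, 0.85, 0.3, 0.95]
observed = count_conserved_neighbors(coords, scores)
null = shuffled_ncn(coords, scores, random.Random(0))
```

In the toy data, residues 0 and 1 each have one conserved neighbor and residue 3 is a singlet (NCN = 0). Repeating `shuffled_ncn` many times would give a chance distribution to compare against the observed one.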
Figure 2. Interface residue packing density histogram. Conserved-clustered residues are more tightly packed than the rest of the residues in the protein.
The relationship between conservation scores (C) and the number of conserved neighbors (NCN) of the most destabilizing mutant positions in the ΔΔG data
| N | Data | <ΔΔG> | σ(ΔΔG) | <C> | σ(C) | <NCN> | σ(NCN) | P-value (C) | P-value (NCN) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | Target | 4.837 | 1.000 | 0.779 | 0.154 | 3.200 | 2.044 | 0.154 | 0.138 |
| | Control | −0.057 | 1.369 | 0.679 | 0.220 | 2.236 | 2.034 | | |
| 20 | Target | 3.841 | 1.259 | 0.768 | 0.154 | 3.250 | 2.221 | 0.072 | 0.027 |
| | Control | −0.132 | 1.273 | 0.678 | 0.221 | 2.222 | 2.032 | | |
| 30 | Target | 3.311 | 1.274 | 0.769 | 0.146 | 2.867 | 2.113 | 0.022 | 0.086 |
| | Control | −0.186 | 1.230 | 0.675 | 0.221 | 2.210 | 2.027 | | |
| 40 | Target | 2.974 | 1.248 | 0.730 | 0.164 | 2.450 | 1.987 | 0.137 | 0.510 |
| | Control | −0.234 | 1.197 | 0.676 | 0.223 | 2.229 | 2.040 | | |
| 50 | Target | 2.735 | 1.213 | 0.741 | 0.160 | 2.540 | 2.002 | 0.044 | 0.304 |
| | Control | −0.280 | 1.168 | 0.675 | 0.224 | 2.228 | 2.046 | | |
The top N mutations in the smddg data with the highest values of ΔΔG (the most destabilizing mutant positions) form the target data, and the rest form the control. N top-ranked mutations were selected for each pair of rows, with N varied from 10 to 50 (about 2–10% of the smddg data set).
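The target/control split described in this footnote can be sketched as below. The record layout (dictionaries with `ddg`, `C` and `ncn` keys), the toy values, and the helper names are assumptions made for illustration; they are not the actual smddg data or the authors' code.

```python
from statistics import mean

def split_top_n(records, n):
    """Rank mutations by ΔΔG (highest first); top n are target, rest control."""
    ranked = sorted(records, key=lambda r: r["ddg"], reverse=True)
    return ranked[:n], ranked[n:]

def group_mean(group, key):
    """Average of one field (e.g. conservation score C or NCN) over a group."""
    return mean(r[key] for r in group)

# Toy records (hypothetical values).
records = [
    {"ddg": 4.5, "C": 0.9, "ncn": 4},
    {"ddg": -0.2, "C": 0.6, "ncn": 1},
    {"ddg": 3.1, "C": 0.8, "ncn": 3},
    {"ddg": 0.4, "C": 0.7, "ncn": 2},
]
target, control = split_top_n(records, 2)
target_ncn = group_mean(target, "ncn")
control_ncn = group_mean(control, "ncn")
```

Comparing `group_mean` values for `C` and `ncn` between the two groups corresponds to the paired Target and Control rows of the table, one pair per value of N.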
Difference between average ΔΔG for mutations at nonconserved, conserved singlets and conserved-clustered positions and its statistical significance
| Quantity | Category | Value |
|---|---|---|
| Average ΔΔG | Nonconserved (NC) | −0.131 (345) |
| | All conserved (AC) | 0.264 (167) |
| | Conserved singlets (CS) | −1.035 (18) |
| | Conserved clustered (CC) | 0.421 (149) |
| P-value | | 0.00459 |
| | | 0.01009 |
| | | 0.00012 |
| | | 9.0E−05 |
The conservation score cutoff is 0.8. All conserved residues with at least one conserved neighbor are treated as conserved-clustered (CC), whereas conserved residues with no conserved neighbor are treated as conserved singlets (CS). Values in brackets are the actual number of observations in the given category.
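The category rule in this footnote is simple enough to write down directly. The sketch below assumes that "conserved" means a score of at least 0.8 and that each mutation is supplied as a hypothetical `(score, ncn, ddg)` tuple; the toy numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def classify(score, ncn, cutoff=0.8):
    """NC: below cutoff; CS: conserved with no conserved neighbor; CC: otherwise."""
    if score < cutoff:
        return "NC"
    return "CC" if ncn >= 1 else "CS"

def average_ddg_by_class(mutations):
    """Return {class: (mean ΔΔG, number of observations)}."""
    groups = defaultdict(list)
    for score, ncn, ddg in mutations:
        groups[classify(score, ncn)].append(ddg)
    return {cls: (mean(vals), len(vals)) for cls, vals in groups.items()}

# Toy mutations: (conservation score, NCN, ΔΔG); values are hypothetical.
mutations = [
    (0.95, 3, 2.0),
    (0.85, 1, 1.0),
    (0.90, 0, -1.0),
    (0.40, 2, 0.5),
    (0.10, 0, -0.5),
]
summary = average_ddg_by_class(mutations)
```

The "all conserved" (AC) row of the table corresponds to pooling the CS and CC groups.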
Multiple experimental hot spots in the same protein
| PDB Code | Mutations [ΔΔG] | CCR positions |
|---|---|---|
| 1lmb | Q44S/Q44Y (av = 3.7); Q33S (4.5); A49V (4.6) | Q44; Q33; |
| 1aay | R18A (2.7); R24A (3.5) | R18; R24 |
| 1b3t | Y518A (2.6); R522A (4.4); R469A (3.4) | Y518; R522 |
| 1run | D138A/D138V/D138L/D138T (av = 4.1); T127L (2.8) | D138 (T127 at C = 0.6 cutoff) |
| 1mse | K128M (2.4); V103L (2.2) | K128 (V103 at C = 0.8 cutoff) |
Some positions have more than one mutant residue; the ΔΔG data have been averaged in such cases. CCR stands for clustered-conserved positions, i.e. conserved residues occurring as part of a cluster.
Statistical significance tests between the fractional numbers of DNA-binding residues of each type in the three defined regions
| Residue | No. NCR (% binding) | No. CRS (% binding) | No. CCR (% binding) | P (NCR/CRS) | P (NCR/CCR) | P (CRS/CCR) |
|---|---|---|---|---|---|---|
| Ala | 1771 (4.8) | 36 (3.4) | 385 (4.4) | 0.641 | 0.810 | 0.780 |
| Cys | 285 (2.0) | 6 (0.0) | 114 (0.0) | 0.477 | 0.070 | – |
| Asp | 1130 (5.4) | 32 (0.0) | 253 (15.8) | 0.326 | 0.023 | 0.026 |
| Glu | 1777 (4.1) | 33 (0.0) | 318 (13.1) | 0.060 | 0.001 | 0.018 |
| Phe | 810 (6.4) | 34 (0.0) | 283 (15.7) | 0.095 | 0.072 | 0.120 |
| Gly | 1122 (15.7) | 82 (5.8) | 458 (25.8) | 0.046 | 0.044 | 0.001 |
| His | 548 (20.0) | 16 (8.3) | 160 (35.5) | 0.323 | 0.048 | 0.098 |
| Ile | 1293 (8.1) | 14 (0.0) | 267 (11.0) | 0.232 | 0.469 | 0.293 |
| Lys | 1683 (27.1) | 39 (17.6) | 372 (74.9) | 0.299 | 3.1E−06 | 0.031 |
| Leu | 2120 (2.2) | 86 (0.0) | 614 (3.5) | 0.006 | 0.200 | 0.014 |
| Met | 475 (8.5) | 4 (0.0) | 64 (4.2) | 0.669 | 0.508 | 0.706 |
| Asn | 908 (25.5) | 10 (0.0) | 187 (51.7) | 0.271 | 0.017 | 0.165 |
| Pro | 940 (7.0) | 65 (4.3) | 282 (9.9) | 0.397 | 0.402 | 0.246 |
| Gln | 1032 (16.8) | 12 (0.0) | 163 (28.3) | 0.242 | 0.073 | 0.132 |
| Arg | 1484 (45.5) | 81 (12.5) | 490 (100) | 0.002 | 5.8E−06 | 4.6E−05 |
| Ser | 1334 (19.2) | 12 (0.0) | 203 (48.1) | 0.036 | 0.002 | 0.096 |
| Thr | 1098 (17.9) | 11 (0.0) | 268 (30.0) | 0.128 | 0.070 | 0.238 |
| Val | 1413 (3.8) | 31 (4.5) | 317 (11.4) | 0.841 | 0.016 | 0.330 |
| Trp | 298 (15.9) | 18 (0.0) | 97 (24.7) | 0.092 | 0.262 | 0.043 |
| Tyr | 655 (16.3) | 37 (0.0) | 257 (32.8) | 0.024 | 0.044 | 0.022 |
Nonconserved residues (NCR), conserved-residue singlets (CRS) and clustered conserved residues (CCR). P-values for (X/Y) are obtained using a two-tailed Student's t-test on the protein-wise distribution of X and Y in the data and indicate the probability that the two types of regions are similar. Some larger P-values, showing low statistical confidence, are due to a small number of binding residues of that type in one or both of the regions compared. No. refers to the total number of residues in a given category. Overall, there are 659 singlets and 5552 (about 8.4×) clustered-conserved residues, and for the overall data all three pairs show a statistically significant difference.
Figure 3. The relative frequency of DNA-binding residues in three identified regions: nonconserved residues; conserved residue singlets with conservation score at least 0.8 and no conserved neighbors; and clustered-conserved regions with conservation score at least 0.8 and at least one conserved neighbor, in different ranges of ASA.
Figure 4. Clustering patterns of conserved residues. (A) A typical enzyme (PDB code 1qai, chain B, reverse transcriptase). Several small clusters of conserved residues are observed in most enzymes. (B) Another DNA-binding enzyme, TAQ MUTS protein (PDB code 1ewqA). One large cluster of conserved residues is observed in the oligomerization domain forming a protein–protein interface. Several other small clusters occur in the recognition domain and scaffold. (C) Nucleosome core particle (PDB code 1eqzG) protein is a typical example of histone-like proteins with highly conserved residues throughout their structure. Usually a single large cluster is observed as most residues are conserved. (D) Phosphate region transcription regulatory protein (PDB code: 1gxp chain B) is a typical HTH protein with a few small clusters, usually one in the recognition helix, one in the linker region and the other in the stabilizing helix. (E) A typical zinc-coordinating protein, tandem zinc finger (Zif 268; PDB code 1p47A). Pairs of Cys residues (shown in blue), sometimes accompanied by other residues, form small clusters of conserved residues away from the DNA interface and coordinate zinc ions (shown in red). A large cluster is observed in the interface in contact with the DNA major groove. Sometimes, this cluster extends to include conserved His residues from the C2H2 motif.
The number of interface and noninterface clusters falling in the specified hydrophobic, hydrophilic, negatively charged and positively charged residues composition ranges
Number of clusters (%), by residue class and DNA contact (Bind or NB):

| Composition (%) | Hydrophobic (Bind) | Hydrophobic (NB) | Hydrophilic (Bind) | Hydrophilic (NB) | Negatively charged (Bind) | Negatively charged (NB) | Positively charged (Bind) | Positively charged (NB) |
|---|---|---|---|---|---|---|---|---|
| 0–10 | 19.7 | 12.3 | 17.7 | 52.5 | 67.9 | 64.9 | 28.9 | 71.7 |
| 10–20 | 3.6 | 0.7 | 14.1 | 5.1 | 20.9 | 8.0 | 28.9 | 6.1 |
| 20–30 | 6.4 | 4.3 | 23.7 | 8.3 | 4.0 | 5.1 | 13.3 | 4.7 |
| 30–40 | 15.7 | 7.6 | 17.3 | 8.7 | 4.4 | 5.4 | 11.2 | 6.5 |
| 40–50 | 26.5 | 32.6 | 16.5 | 22.5 | 2.8 | 14.1 | 13.7 | 10.1 |
| >50 | 28.1 | 42.4 | 10.8 | 2.9 | 0.0 | 2.5 | 4.0 | 0.7 |
| P-value (Bind vs NB) | 1.17E−13 | | 2.5E−08 | | 5.96E−04 | | 6.01E−13 | |
Clusters in the DNA interface (Bind) and those with no DNA contact (NB) differ significantly in their compositions. For example, 71.7% of noninterface clusters have <10% positively charged residues, whereas just 28.9% of DNA-interface clusters have such a low proportion of positively charged residues. This difference in composition leads to a statistically significant difference between DNA-interface and noninterface clusters (see P-values), which could be used for prediction. A linear predictor using just four parameters of a cluster can identify DNA-interface clusters with high confidence. Residue classification: hydrophobic (Ala, Cys, Phe, Ile, Leu, Met, Pro, Val, Trp, Tyr), hydrophilic (Gly, His, Asn, Gln, Ser, Thr), negatively charged (Asp, Glu), positively charged (Lys, Arg). Mean prediction scores on 10-fold validation: sensitivity (true positive/actual positive), 87.3%; specificity (true negative/actual negative), 67.3%.
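The four composition parameters and the two evaluation scores defined in this footnote can be computed as in the sketch below. It uses the residue classification exactly as listed above, but the function names and toy inputs are assumptions, and the weights of the linear predictor itself are not given in the source, so no predictor is implemented here.

```python
# Residue classes as listed in the table footnote.
CLASSES = {
    "hydrophobic": "Ala Cys Phe Ile Leu Met Pro Val Trp Tyr",
    "hydrophilic": "Gly His Asn Gln Ser Thr",
    "negative": "Asp Glu",
    "positive": "Lys Arg",
}
RESIDUE_CLASS = {
    name: cls for cls, names in CLASSES.items() for name in names.split()
}

def composition(cluster):
    """Percentage of each residue class in a cluster of three-letter codes."""
    counts = dict.fromkeys(CLASSES, 0)
    for residue in cluster:
        counts[RESIDUE_CLASS[residue]] += 1
    return {cls: 100.0 * n / len(cluster) for cls, n in counts.items()}

def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP / actual positives; specificity = TN / actual negatives."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    positives = sum(1 for a in actual if a)
    negatives = len(actual) - positives
    return tp / positives, tn / negatives

# Toy cluster and toy predictions (hypothetical).
features = composition(["Arg", "Lys", "Ser", "Leu"])
sens, spec = sensitivity_specificity(
    predicted=[True, True, False, False, True],
    actual=[True, False, False, True, True],
)
```

Each cluster's `composition` dictionary supplies the four parameters that a linear predictor would take as input, and `sensitivity_specificity` matches the definitions given in brackets in the footnote.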