Tim Gruene, George M Sheldrick.
Abstract
Medium- to high-resolution X-ray structures of DNA and RNA molecules were investigated to find geometric properties useful for automated model building in crystallographic electron-density maps. We describe a simple method, starting from a list of electron-density 'blobs', for identifying backbone phosphates and nucleic acid bases based on properties of the local electron-density distribution. This knowledge should be useful for the automated building of nucleic acid models into electron-density maps. We show that the distances and angles involving C1′ and the P atoms, using the pseudo-torsion angles η′ and θ′ that describe the ...P-C1′-P-C1′... chain, provide a promising basis for building the nucleic acid polymer. These quantities show reasonably narrow distributions with asymmetry that should allow the direction of the phosphate backbone to be established.
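As an illustration of the pseudo-torsion angles mentioned in the abstract, the following minimal sketch computes a torsion angle from four points and applies it to the ...P-C1′-P-C1′... chain. It assumes the commonly used convention η′ = C1′(i-1)-P(i)-C1′(i)-P(i+1) and θ′ = P(i)-C1′(i)-P(i+1)-C1′(i+1); the abstract itself does not spell out the atom assignment, and the function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 axis, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def pseudo_torsions(c1_prev, p_i, c1_i, p_next, c1_next):
    """Return (eta', theta') for residue i.

    Assumed convention (not stated in the abstract):
      eta'   = C1'(i-1) - P(i)   - C1'(i)  - P(i+1)
      theta' = P(i)     - C1'(i) - P(i+1)  - C1'(i+1)
    """
    eta_prime = torsion_deg(c1_prev, p_i, c1_i, p_next)
    theta_prime = torsion_deg(p_i, c1_i, p_next, c1_next)
    return eta_prime, theta_prime
```

Each position is an (x, y, z) tuple in Å; only C1′ and P coordinates are needed, which is why these angles are attractive when the rest of the backbone has not yet been built.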
Year: 2010 PMID: 21173468 PMCID: PMC3006036 DOI: 10.1107/S0108767310039140
Source DB: PubMed Journal: Acta Crystallogr A ISSN: 0108-7673 Impact factor: 2.290
Details of the test data sets used for assessing the quality of phosphate and base search algorithms. Phases were calculated from coordinates and experimental data except for octan, for which bromouracil SAD data were available.
| ID | Reference | Description | Resolution (Å) | No. of bases |
|---|---|---|---|---|
| egli1 | Bancroft | | 1.01 | 12 |
| egli2 | Tereshko | | 1.12 | 24 |
| octan | Shakked | | 1.70 | 16 |
| udo | Heinemann & Hahn (1992) | | 1.76 | 20 |
| 419D | Shi | DNA/RNA hybrid | 2.20 | 32 |
| 2HOJ | Edwards & Ferré-D’Amaré (2006) | RNA | 2.50 | 78 |
| 3AFA | Tachiwana | DNA/protein | 2.50 | 292 |
Figure 1. Progress of fitting a ‘blob’ to the centre of the base. (a) ‘Blob’ selected after sorting the map voxels together with the 2.5 Å sphere used to find the centre of the base. (b) The sphere after convergence of the ‘centre of density’ together with the eigenvectors of the matrix . The length of each ‘vector’ corresponds to the value of its eigenvalue. (c) Search base (here a purine) aligned to the eigenvectors within the base plane. (d) Top three purines after simplex fitting. The CC values used for sorting are 0.992 (blue), 0.989 (cyan) and 0.988 (red). (e) The best fitting purine before (blue) and after (red) the gradient fitting. (f) The selected base with its Watson–Crick partner and the hydrogen bonding used for scoring.
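The convergence of the ‘centre of density’ in panels (a) and (b) can be pictured with the following minimal sketch. It assumes the centre is the density-weighted centroid of the voxels inside the 2.5 Å sphere, recomputed until the sphere stops moving; the caption does not give the exact update rule, so the function name, the tolerance and the handling of negative density are all assumptions.

```python
import math

def centre_of_density(voxels, start, radius=2.5, tol=1e-4, max_iter=100):
    """Iterate a density-weighted centroid inside a sphere until it converges.

    voxels : iterable of (x, y, z, rho) in Angstrom / arbitrary density units
    start  : (x, y, z) initial sphere centre, e.g. the position of the 'blob'
    radius : sphere radius in Angstrom (2.5 as in the figure caption)
    """
    centre = tuple(float(c) for c in start)
    for _ in range(max_iter):
        total = 0.0
        acc = [0.0, 0.0, 0.0]
        for x, y, z, rho in voxels:
            # Assumption: only positive density inside the sphere contributes.
            if rho > 0 and math.dist((x, y, z), centre) <= radius:
                total += rho
                acc[0] += rho * x
                acc[1] += rho * y
                acc[2] += rho * z
        if total == 0.0:
            raise ValueError("no positive density inside the sphere")
        new_centre = (acc[0] / total, acc[1] / total, acc[2] / total)
        if math.dist(new_centre, centre) < tol:
            return new_centre
        centre = new_centre
    return centre
```

The eigenvector step of panel (b) is omitted because the matrix symbol did not survive in the caption.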
Evaluation of the base search algorithm
#B is the total number of bases and #bp the number of Watson–Crick base pairs in the reference structure. The average r.m.s.d. (for true positives only) refers to the atoms (C1′, C2, C4, C5, C6, N1, N3, O2) for pyrimidines and to (C1′, C2, C4, C5, C6, C8, N1, N3, N7, N9) for purines. #tp and #fp are the numbers of true and false positive pairs. The ‘Total’ column counts the true positives from both the paired bases and unpaired bases in the clusters (see §2.2.2), with the percentage of the total number of bases in parentheses.
| ID | Resolution (Å) | #B | #bp | #tp | #fp | R.m.s.d. | Total (%) |
|---|---|---|---|---|---|---|---|
| egli1 | 1.01 | 12 | 6 | 6 | 3 | 0.05 | 12 (100%) |
| egli2 | 1.12 | 24 | 12 | 12 | 1 | 0.04 | 24 (100%) |
| octan | 1.70 | 16 | 8 | 8 | 5 | 0.10 | 16 (100%) |
| udo | 1.76 | 20 | 10 | 10 | 3 | 0.20 | 20 (100%) |
| 419D | 2.20 | 32 | 12 | 12 | 3 | 0.17 | 31 (97%) |
| 2HOJ | 2.50 | 78 | 31 | 16 | 9 | 0.26 | 65 (83%) |
| 3AFA | 2.50 | 292 | 146 | 24 | 2 | 0.35 | 114 (39%) |
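A minimal sketch of the r.m.s.d. used in the table above, restricted to the atom lists given in the caption. It assumes that the fitted base and the reference base are already in the same coordinate frame (both refer to the same map), so no superposition is performed; the function and constant names are hypothetical.

```python
import math

# Atom lists taken from the table caption.
PYRIMIDINE_ATOMS = ("C1'", "C2", "C4", "C5", "C6", "N1", "N3", "O2")
PURINE_ATOMS = ("C1'", "C2", "C4", "C5", "C6", "C8", "N1", "N3", "N7", "N9")

def base_rmsd(model, reference, atom_names):
    """R.m.s.d. in Angstrom over the named atoms.

    model, reference : dicts mapping atom name -> (x, y, z)
    atom_names       : PYRIMIDINE_ATOMS or PURINE_ATOMS
    No superposition is applied (assumption: shared coordinate frame).
    """
    squared = 0.0
    for name in atom_names:
        squared += math.dist(model[name], reference[name]) ** 2
    return math.sqrt(squared / len(atom_names))
```

For example, `base_rmsd(found_base, reference_base, PURINE_ATOMS)` would be evaluated for each true positive and then averaged per data set.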
Evaluation of the phosphate search algorithm
The p% column states at what position in the list of putative phosphates p% of the total number of phosphates were found, e.g. the entry 8 in the 80% column of egli1 means that in order to cover eight phosphates (80% of 10), the first eight peaks of the sorted list of phosphate candidates are required. See §2.2.3 for an explanation of the quality indicator r.n. and the definition of the last four column labels.
| ID | #P | 80% | r.n. | 90% | r.n. | 100% | r.n. | ||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| egli1 | 10 | 8 | 1.00 | 9 | 1.00 | 10 | 1.00 | 1.19 | 0.55 | 1.06 | n/a |
| egli2 | 22 | 18 | 1.00 | 20 | 1.00 | 22 | 1.00 | 1.31 | 0.34 | 0.70 | n/a |
| octan | 14 | 12 | 1.00 | 13 | 1.00 | 14 | 1.00 | 1.37 | 0.42 | 1.18 | n/a |
| udo | 18 | 15 | 1.00 | 17 | 1.00 | 18 | 1.00 | 1.38 | 0.71 | 1.15 | n/a |
| 419D | 28 | 23 | 1.00 | 26 | 1.00 | 76 | 2.71 | 1.39 | 0.43 | 0.92 | 0.22 |
| 2HOJ | 78 | 67 | 1.06 | 78 | 1.10 | 183 | 2.35 | 1.51 | 0.82 | 0.94 | 0.26 |
| 3AFA | 290 | 306 | 1.32 | 423 | 1.62 | 825 | 2.84 | 1.45 | 1.11 | 0.77 | 0.30 |
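The r.n. columns can be reproduced with the short sketch below. The definition of r.n. is given in §2.2.3, which is not part of this excerpt; the sketch assumes that r.n. is the list position divided by the number of phosphates to be covered (p% of #P, rounded up), an assumption that matches the tabulated values. The function names are hypothetical.

```python
def phosphates_needed(percent, n_phosphates):
    """Number of phosphates that make up `percent` % of the total, rounded up.

    Integer arithmetic avoids floating-point surprises in the ceiling.
    """
    return -(-percent * n_phosphates // 100)

def r_n(list_position, percent, n_phosphates):
    """Assumed quality indicator: list position / phosphates to be covered.

    A value of 1.00 means every peak up to that position is a true phosphate.
    """
    return list_position / phosphates_needed(percent, n_phosphates)

# Rows taken from the table: 419D at 100 % and 3AFA at 80 %.
print(round(r_n(76, 100, 28), 2))   # 2.71
print(round(r_n(306, 80, 290), 2))  # 1.32
```

Under this reading, a value such as 2.71 for 419D means that 76 candidate peaks had to be inspected to recover all 28 phosphates.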
Average angles and distances between and , , respectively
Values were determined as explained in §3. The direction is always .
| C1′—P | P—C1′ | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. of data | No. of data | No. of data | |||||||||||
| 737 | 4.77 | 0.05 | 26.7 | 5.34 | 0.08 | 24.5 | 65.7 | 0.993 | 73.2 | 0.991 | |||
| 1171 | 4.29 | 0.19 | 56.9 | 5.14 | 0.28 | 59.1 | 64.6 | 0.983 | 90.6 | 0.987 | |||
| 238 | 4.52 | 0.09 | 6.4 | 4.99 | 0.10 | 5.7 | 151 | 76.1 | 0.999 | 155 | 74.4 | 0.999 | |
| 4.98 | 0.08 | 4.5 | 5.62 | 0.07 | 5.9 | 87 | 86.4 | 0.999 | 83 | 86.7 | 0.998 | ||
| RNA (all) | 1318 | 4.78 | 0.05 | 19.1 | 5.32 | 0.09 | 19.1 | 67.9 | 0.985 | 71.5 | 0.993 | ||
Figure 2. Histograms of the C1′3′P (left) and 5′PC1′ (right) distance distributions for RNA and DNA and the fitted normal distribution corresponding to Table 4. See §3 for more details. 1J5E is the PDB code of the 30S ribosomal subunit determined at 3.05 Å resolution and shown for comparison with the RNA histograms.
Figure 3. The scatter plots of the distances versus (blue) show a clear asymmetry allowing determination of the direction of the backbone chain. The brown points show the distances versus where is N9 for purines and N1 for pyrimidines. The asymmetry is absent for these -based data points.