Abstract
The fitting of high-resolution structures into low-resolution densities obtained from techniques such as electron microscopy or small-angle X-ray scattering can yield powerful new insights. While several algorithms for achieving optimal fits have recently been developed, relatively little effort has been devoted to developing objective measures for judging the quality of the resulting fits, in particular with regard to the danger of overfitting. Here, a general method is presented for obtaining confidence intervals for atomic coordinates resulting from the fitting of atomic-resolution domain structures into low-resolution densities using well-established statistical tools. It is demonstrated that the resulting confidence intervals are sufficiently accurate to allow meaningful statistical tests and to provide tools for detecting potential overfitting.
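The abstract leaves the statistical machinery implicit. One well-established tool for correlation coefficients is Fisher's z-transformation, z = artanh(CC), under which a sample correlation is approximately normally distributed; a confidence interval is then built in z-space and mapped back with tanh. The sketch below assumes that the "z-transformed correlation coefficients" used later in this paper are Fisher z values and that the standard deviation σ(z) is already known (for the test maps summarised in Table 2 it is on the order of 0.01). The paper's own estimate of σ(z), its equation (5), is not reproduced here; the textbook value 1/√(n − 3) is included only for comparison and holds only for n independent samples, not for correlated voxels.

```python
import math
from statistics import NormalDist


def fisher_z(cc: float) -> float:
    """Fisher transformation z = artanh(CC); makes a correlation coefficient roughly normal."""
    return math.atanh(cc)


def cc_confidence_interval(cc: float, sigma_z: float, level: float = 0.95) -> tuple[float, float]:
    """Two-sided confidence interval for a correlation coefficient.

    The interval is symmetric in z-space (z +/- q * sigma_z) and is mapped back to
    the CC scale with tanh, so it is slightly asymmetric around cc itself.
    sigma_z must be supplied by the caller; it is not estimated here.
    """
    z = fisher_z(cc)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95 % interval
    return math.tanh(z - q * sigma_z), math.tanh(z + q * sigma_z)


def textbook_sigma_z(n_independent: int) -> float:
    """Classical sigma(z) = 1/sqrt(n - 3), valid only for n independent samples."""
    return 1.0 / math.sqrt(n_independent - 3)
```

For example, `cc_confidence_interval(0.9419, 0.01219)` gives the 95 % interval implied by one row of Table 2, under the Fisher-z assumption stated above.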
Year: 2009 PMID: 19564688 PMCID: PMC2703574 DOI: 10.1107/S0907444909012876
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
Figure 1. Schematic representation of the statistical underpinnings of fitting coordinates into density data. Because the data carry measurement errors, many different data sets (Data) can be realised. As a consequence, different correlation coefficients (CC) arise if the score is calculated between the coordinates and the various density maps. Knowledge of this CC distribution would allow the calculation of its statistical properties, including a confidence interval (CI) for the CC.
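Figure 1 can be made concrete with a brute-force simulation: draw many noisy realisations of the data, score one fixed model against each, and read a confidence interval off the resulting CC distribution. In practice only one data set is measured, which is why an analytical estimate of the spread is needed; this sketch only illustrates the idea. It assumes independent Gaussian voxel noise and a toy Gaussian-blob map, both chosen purely for illustration; the function names are hypothetical.

```python
import numpy as np


def correlation_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two maps, treating every voxel as one sample."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


def simulate_cc_distribution(model_map, true_map, noise_sd, n_realisations=500, seed=0):
    """Score model_map against many noisy realisations of the data (the idea of Figure 1).

    Assumes independent Gaussian noise in every voxel, which real maps do not obey.
    """
    rng = np.random.default_rng(seed)
    ccs = np.empty(n_realisations)
    for i in range(n_realisations):
        data = true_map + rng.normal(0.0, noise_sd, size=true_map.shape)
        ccs[i] = correlation_coefficient(model_map, data)
    return ccs


def empirical_ci(ccs, level=0.95):
    """Percentile confidence interval from the simulated CC distribution."""
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(ccs, [tail, 100.0 - tail])
    return float(lo), float(hi)


# Toy example: a Gaussian blob on a 16^3 grid stands in for both the model and the truth.
grid = np.indices((16, 16, 16)) - 7.5
blob = np.exp(-(grid ** 2).sum(axis=0) / 20.0)
ccs = simulate_cc_distribution(blob, blob, noise_sd=0.05)
print(empirical_ci(ccs))
```

Even a perfect model scores below 1 against noisy data, and the width of the printed interval reflects the noise level, which is the point the schematic makes.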
Table 1. Docking summary
For each entry, the data appearing in the columns are the PDB identifier of the target structure, the PDB identifier of the search structure, the resolution of the target map, the number of domains used as modules, the number of residues, the sequence identity between target and search structures, and the root-mean-square deviation between Cα atoms of the search and target structures after least-squares fitting of the modules, after modular rigid-body docking of the same modules and after using flexible-fitting protocols. The values for the last column were taken from Trabuco et al. (2008), Jolley et al. (2008), Topf et al. (2008), Topf et al. (2008) and this study, respectively. The target map resolution used in Jolley et al. (2008) was 14 Å and that used in Topf et al. (2008) was 10 Å. ρEM denotes experimental density extracted from an electron-microscopy reconstruction of human rhinovirus with bound Fab fragment (Smith et al., 1993).
| Target | Search | Resolution (Å) | Modules | Residues | Identity (%) | R.m.s.d.lsq (Å) | R.m.s.d.mod (Å) | R.m.s.d.flex (Å) |
|---|---|---|---|---|---|---|---|---|
| | | 15 | 3 | 729 | 100 | 0.92 | 1.11 | 2.01 |
| | | 15 | 3 | 691 | 100 | 0.94 | 0.98 | 1.89 |
| | | 15 | 2 | 491 | 28 | 2.58 | 2.98 | 4.90 |
| | | 10 | 2 | 172 | 35 | 1.57 | 2.03 | 11.50 |
| ρEM | | 28 | 1 | 229 | 100 | 0.97 | 2.25 | 3.60 |
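The R.m.s.d.lsq column reports Cα deviations after least-squares superposition. A standard way to compute such an optimal superposition is the Kabsch algorithm; the sketch below applies it to a single rigid body. The caption does not name the algorithm used, and for entries with several modules each module would presumably be superposed separately before the deviations are pooled. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal least-squares superposition.

    Kabsch algorithm: centre both sets, obtain the rotation that minimises the
    summed squared deviation from the SVD of their covariance matrix, then measure
    the residual deviation.
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matching shape")
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                          # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would mean a reflection; flip it away
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rotation.T                 # apply the rotation to every row (atom)
    return float(np.sqrt(((p_rot - q_c) ** 2).sum(axis=1).mean()))
```

The determinant check restricts the fit to proper rotations, so a mirror image of a structure is not reported as a perfect match.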
Table 2. Summary of test data and quality of σ(z) estimation
For each entry, the data appearing in the columns are the Protein Data Bank identifier of the target structure, the resolution of the target map, the standard deviation of the actual z-statistics, the estimate of the standard deviation of the z-statistics calculated according to equation (5), an accuracy measure for this estimate, the mean correlation coefficient between target map and search structure, the mixing factor between Gaussian noise and Laplacian impulse noise (1.0 corresponds to Gaussian only, 0.0 to Laplacian only) and a measure of the extent of voxel correlation (the relative weight for neighboring voxels with respect to the central voxel).
| PDB code | Resolution (Å) | σ(z) actual | σ(z) estimated | Accuracy | CC | Mix | Weight |
|---|---|---|---|---|---|---|---|
| | 6 | 0.00753 | 0.00771 | 0.0922 | 0.8693 | 0.5 | 0.08 |
| | 15 | 0.01087 | 0.01219 | 0.4097 | 0.9419 | 0.8 | 0.12 |
| | 25 | 0.02498 | 0.01644 | 1.1337 | 0.8649 | 0.7 | 0.17 |
| | 15 | 0.00959 | 0.00964 | 0.0207 | 0.9883 | 0.5 | 0.08 |
| | 20 | 0.01239 | 0.01112 | 0.3889 | 0.9435 | 0.8 | 0.13 |
| | 20 | 0.01066 | 0.01112 | 0.1620 | 0.9108 | 0.8 | 0.13 |
| | 10 | 0.00978 | 0.01261 | 0.7970 | 0.9390 | 0.5 | 0.20 |
| | 15 | 0.01071 | 0.01489 | 0.9653 | 0.8243 | 0.9 | 0.14 |
| | 20 | 0.02124 | 0.01735 | 0.6655 | 0.6450 | 0.6 | 0.07 |
| | 10 | 0.01567 | 0.01967 | 0.7307 | 0.6239 | 0.6 | 0.25 |
| | 12 | 0.02191 | 0.02155 | 0.0652 | 0.5619 | 0.3 | 0.08 |
| | 15 | 0.03031 | 0.02409 | 0.7366 | 0.4842 | 0.8 | 0.13 |
Rigid-body docking with the entire unmodified search structure was used.
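Table 2 characterises the synthetic noise by a Gaussian/Laplacian mixing factor and a neighbour weight, but the exact generator is not described in this section. The following hypothetical generator is one way to produce noise with those two knobs: it blends Gaussian and Laplacian white noise linearly, smooths it with a kernel whose centre weight is 1 and whose six face neighbours carry `weight`, and rescales the result to the requested standard deviation. The linear blend, the six-neighbour kernel and the periodic boundaries are all assumptions, not the authors' procedure.

```python
import numpy as np


def correlated_mixed_noise(shape, sigma, mix, weight, seed=0):
    """Hypothetical noise generator in the spirit of Table 2 (not the authors' code).

    mix:    1.0 -> Gaussian only, 0.0 -> Laplacian only (linear blend, an assumption).
    weight: relative weight of the six face neighbours versus the central voxel
            when smoothing white noise into correlated noise (also an assumption).
    """
    rng = np.random.default_rng(seed)
    gauss = rng.normal(0.0, 1.0, size=shape)
    # A Laplace distribution with scale b has variance 2*b**2, so b = 1/sqrt(2) gives unit variance.
    laplace = rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=shape)
    white = mix * gauss + (1.0 - mix) * laplace

    # Correlate neighbouring voxels: centre weight 1, each face neighbour `weight`.
    # np.roll wraps around, i.e. the map is treated as periodic.
    smoothed = white.copy()
    for axis in range(len(shape)):
        for shift in (-1, 1):
            smoothed += weight * np.roll(white, shift, axis=axis)

    # Rescale so the final noise has zero mean and the requested standard deviation.
    smoothed -= smoothed.mean()
    return sigma * smoothed / smoothed.std()
```

With `weight=0` the voxels stay independent; increasing it raises the correlation between adjacent voxels, which in turn inflates the spread of the CC relative to the independent-sample case.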
Figure 2. Normal probability plots of z-transformed correlation coefficients. For normally distributed variables, the data points lie approximately on the identity line. The insets show central slices through representative densities used to calculate the underlying correlation coefficients. The noise parameters used to generate the maps are listed in Table 2. (a) Maps were calculated at 6 Å resolution from PDB entry 1oao chain C. (b) Maps were calculated at 15 Å resolution from PDB entry 1lfh. (c) Maps were calculated at 12 Å resolution from PDB entry 1blb. (d) Maps were calculated at 20 Å resolution from PDB entry 1hwz.
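Figure 2 checks whether z-transformed correlation coefficients are normally distributed. The coordinates of such a normal probability plot can be computed with the Python standard library alone. This sketch standardises the observed values and pairs them with standard-normal quantiles at the plotting positions (i − 0.5)/n, which is one common convention and an assumption here, since the caption does not say which convention was used. The function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_probability_points(values):
    """Return (theoretical, observed) pairs for a normal probability plot.

    Observed values are standardised to zero mean and unit standard deviation,
    so for normally distributed data the points fall near the identity line y = x.
    Plotting positions use (i - 0.5) / n, one common convention.
    """
    n = len(values)
    m, s = mean(values), stdev(values)
    observed = sorted((v - m) / s for v in values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))
```

Fed with `math.atanh(cc)` for each simulated correlation coefficient, systematic curvature of the points away from y = x would indicate that the z-statistics are not normal.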
Figure 3. Docking of the Fab fragment into the equivalent density segment derived from an experimental electron-microscopy reconstruction. (a) The correct structure is shown in red (Fab fragment) and blue (virion). A representation of the ensemble of fitted structures whose correlation coefficients fall within the confidence interval (the solution set) is shown in white. The asterisk indicates the Fab-fragment loop that changes conformation locally upon binding to the virus. (b) Root-mean-square deviation within the solution set, mapped onto the structure by thickness and color: thin, blue segments correspond to small deviations and thick, red segments to large deviations. The 28 Å resolution density map used for the docking experiment is shown as black chicken wire.
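Figure 3 defines the solution set as the fitted structures whose correlation coefficients fall within the confidence interval, and panel (b) maps the spread within that set onto the structure. A minimal sketch, assuming that the fits are already expressed in the map frame (so no superposition is needed), that the interval is taken around the best-scoring fit in Fisher-z space with a known σ(z), and that the per-residue deviation is measured about the ensemble mean. The function names and these choices are illustrative, not the authors' procedure.

```python
import numpy as np


def solution_set(ccs, coords, sigma_z, q=1.96):
    """Keep fits whose CC lies inside the confidence interval of the best fit.

    ccs:    (n_fits,) correlation coefficients of the fitted structures
    coords: (n_fits, n_residues, 3) C-alpha coordinates, already in the map frame
    A fit is kept if its Fisher z is at least z_best - q * sigma_z (an assumption
    about how the interval is applied; q = 1.96 corresponds to 95 %).
    """
    z = np.arctanh(np.asarray(ccs, dtype=float))
    keep = z >= z.max() - q * sigma_z
    return np.asarray(coords)[keep]


def per_residue_rmsd(ensemble):
    """RMS deviation of each residue from its mean position across the ensemble."""
    mean_pos = ensemble.mean(axis=0)
    sq = ((ensemble - mean_pos) ** 2).sum(axis=2)  # (n_fits, n_residues)
    return np.sqrt(sq.mean(axis=0))
```

Residues with large values in the returned array are the ones the density cannot place precisely; mapped onto the structure they would correspond to the thick, red segments in panel (b).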