Thomas C Terwilliger, Paul D Adams, Randy J Read, Airlie J McCoy, Nigel W Moriarty, Ralf W Grosse-Kunstleve, Pavel V Afonine, Peter H Zwart, Li Wei Hung.
Abstract
Estimates of the quality of experimental maps are important in many stages of structure determination of macromolecules. Map quality is defined here as the correlation between a map and the corresponding map obtained using phases from the final refined model. Here, ten different measures of experimental map quality were examined using a set of 1359 maps calculated by re-analysis of 246 solved MAD, SAD and MIR data sets. A simple Bayesian approach to estimation of map quality from one or more measures is presented. It was found that a Bayesian estimator based on the skewness of the density values in an electron-density map is the most accurate of the ten individual Bayesian estimators of map quality examined, with a correlation between estimated and actual map quality of 0.90. A combination of the skewness of electron density with the local correlation of r.m.s. density gives a further improvement in estimating map quality, with an overall correlation coefficient of 0.92. The PHENIX AutoSol wizard carries out automated structure solution based on any combination of SAD, MAD, SIR or MIR data sets. The wizard is based on tools from the PHENIX package and uses the Bayesian estimates of map quality described here to choose the highest quality solutions after experimental phasing.
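The "simple Bayesian approach" in the abstract can be illustrated with a minimal sketch. It assumes that the joint distribution of one quality measure and true map quality is represented by binning a training set of maps from solved structures, and it returns the mean true quality of the training maps whose measure falls in the same bin as the query (the posterior mean under that empirical distribution). The equal-width binning, the bin count, the fallback to the overall mean and the names `bin_index` and `estimate_quality` are illustrative assumptions; this is not the paper's equation (7a).

```python
def bin_index(x, lo, hi, n_bins):
    """Map x onto one of n_bins equal-width bins spanning [lo, hi], clamping outliers."""
    if hi <= lo:
        return 0
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def estimate_quality(measure, training, n_bins=10):
    """Posterior-mean estimate of map quality from a single quality measure.

    training: (measure_value, true_quality) pairs from maps of solved structures.
    The empirical joint histogram of (measure bin, quality) is conditioned on the
    bin that contains `measure`, and the mean true quality in that bin is returned.
    """
    lo = min(m for m, _ in training)
    hi = max(m for m, _ in training)
    k = bin_index(measure, lo, hi, n_bins)
    in_bin = [q for m, q in training if bin_index(m, lo, hi, n_bins) == k]
    if not in_bin:
        # Empty bin: fall back to the prior mean (average quality over all maps).
        in_bin = [q for _, q in training]
    return sum(in_bin) / len(in_bin)


# Hypothetical training data: (skewness, true map quality) for five maps.
training = [(0.0, 0.1), (0.05, 0.15), (0.3, 0.5), (0.35, 0.6), (0.6, 0.8)]
print(estimate_quality(0.32, training, n_bins=3))  # about 0.55
```

With three bins, a skewness of 0.32 falls in the middle bin together with the training maps of quality 0.5 and 0.6, so the estimate is their mean. With realistic data one would use many more maps and finer or smoothed bins.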
Year: 2009 PMID: 19465773 PMCID: PMC2685735 DOI: 10.1107/S0907444909012098
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
Real-space measures of map quality tested in this work
| Method | Symbol | Basis | Expected for a perfect map | Expected for a random map |
|---|---|---|---|---|
| Skewness of electron density | skew | High positive density and no negative density in a good map | Positive skewness | Near-zero skewness |
| Contrast of electron density | | Solvent and macromolecule have different r.m.s. variation in densities | High contrast | Low contrast |
| Correlation of local r.m.s. density | | Solvent region is contiguous, so local r.m.s. density is correlated with neighboring local r.m.s. density | High correlation | Low correlation |
| Flatness of electron density | | Solvent region has nearly flat electron density | High flatness | Low flatness |
| Number of regions enclosing high density | | Chains of a macromolecule can be represented by a few connected regions of density | Few (but extended) connected regions | Many short connected regions |
| Overlap of NCS-related density | | If NCS is present, NCS-related density is similar | High overlap | Low overlap |
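As an illustration of the first entry in the table, the skewness of a map's density values can be taken as the standard third standardized moment, ⟨(ρ − ρ̄)³⟩/σ³, over all grid points. Whether the paper weights or restricts grid points is not stated here, so treating every point equally is an assumption, and `skewness` is a hypothetical name. A flat background with a few strong positive peaks gives a positive value, while symmetrically distributed noise gives a value near zero.

```python
def skewness(density):
    """Third standardized moment of map density values: <(rho - mean)^3> / sigma^3."""
    n = len(density)
    mean = sum(density) / n
    var = sum((x - mean) ** 2 for x in density) / n
    if var == 0:
        return 0.0  # a completely flat map has no defined skewness; report zero
    third = sum((x - mean) ** 3 for x in density) / n
    return third / var ** 1.5


# A flat background with one strong positive peak is positively skewed,
# while values distributed symmetrically about the mean give zero.
print(skewness([0, 0, 0, 0, 4]))  # approximately 1.5
print(skewness([-1, 0, 1]))       # 0.0
```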
Reciprocal-space measures of map quality tested in this work
| Method | Symbol | Basis | Expected for a perfect map | Expected for a random map |
|---|---|---|---|---|
| Phase correlation from statistical density modification | | Phases from the first cycle of density modification are unbiased and are correlated with experimental phases | High | Low |
| R factor from statistical density modification | | Amplitudes for a reflection can be calculated from the phases and amplitudes of all other reflections and the expected features of the map | Low | High |
| Density truncation | | Much of the information in a map of a macromolecule consists of the density at points in the map near atomic positions | High | Low |
| Mean figure of merit of phasing | ⟨m⟩ | Estimates of the accuracy of experimental phases are an approximate upper bound on the quality of the map | High | Low |
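The mean figure of merit ⟨m⟩ in the last row is conventionally the average over reflections of m = |Σ P(φ) exp(iφ)| / Σ P(φ), the length of the centroid of each reflection's phase probability distribution. The sketch below assumes that each distribution is supplied as probabilities at evenly spaced phases on [0, 2π); the function names are hypothetical and this is not the PHENIX implementation.

```python
import cmath
import math


def figure_of_merit(phase_probs):
    """m = |sum_k P_k exp(i phi_k)| / sum_k P_k for phases phi_k evenly spaced on [0, 2*pi)."""
    n = len(phase_probs)
    centroid = sum(p * cmath.exp(2j * math.pi * k / n) for k, p in enumerate(phase_probs))
    return abs(centroid) / sum(phase_probs)


def mean_figure_of_merit(reflections):
    """<m>: average figure of merit over all reflections."""
    return sum(figure_of_merit(p) for p in reflections) / len(reflections)


sharp = [1.0, 0.0, 0.0, 0.0]  # phase known exactly: m = 1
flat = [1.0, 1.0, 1.0, 1.0]   # no phase information: m is essentially 0
print(mean_figure_of_merit([sharp, flat]))  # approximately 0.5
```

Because m only reflects the internal consistency of the phasing model, it can be high even when the phases are systematically wrong, which is consistent with the table describing it as an approximate upper bound.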
Figure 1. Measures of the quality of electron-density maps and structure factors. Measures of quality were calculated as described in the text for 1359 sets of structure factors and associated maps. Each measure is plotted against the correlation of the density in the map with that of a map calculated from the final model ($r^2_{\rm MODEL}$), which is the abscissa. Measures based on structures determined at resolutions of 2 Å or better are shown as black diamonds and those at resolutions lower than 2 Å are shown as purple squares. All measures of quality and the correlation with model density ($r^2_{\rm MODEL}$) were calculated at a resolution of 2.5 Å or the nominal resolution of the data, whichever is lower. (a) Skewness of electron density. (b) Contrast of electron density. (c) Correlation of local r.m.s. density. (d) Flatness of solvent region. (e) Number of regions enclosing high density. (f) Overlap of NCS-related density. (g) Phase correlation from statistical density modification. (h) R factor from statistical density modification. (i) Density truncation. (j) Figure of merit of phasing.
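Map quality throughout is the correlation between an experimental map and the map calculated with phases from the final model. Assuming both maps have already been computed on the same grid at the same resolution (2.5 Å or the nominal resolution, whichever is lower), that correlation reduces to a Pearson correlation over grid points. The sketch below shows only this last step; map calculation and any masking are left out, and `map_correlation` is a hypothetical name.

```python
import math


def map_correlation(map_a, map_b):
    """Pearson correlation between two maps sampled on the same grid (flattened to lists)."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must be sampled on the same grid")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / math.sqrt(var_a * var_b)


experimental = [0.1, 0.9, -0.2, 1.4, 0.0]  # hypothetical density values
model = [0.0, 1.0, -0.1, 1.2, 0.1]
print(round(map_correlation(experimental, model), 3))
```

Because the correlation is unchanged when either map is multiplied by a positive constant or offset, it compares the shape of the density rather than its absolute scale.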
Figure 2. Comparisons of cross-validated estimates of map quality with actual map quality. Measures of map quality as shown in Fig. 1 were used in equations (7a) and (8a) to estimate overall map quality. The calculations were carried out one data set at a time. For each data set, joint probability distributions of each measure of quality and true quality [e.g. p(skew, $r^2_{\rm MODEL}$)] were calculated excluding data from all solutions for that structure. These cross-validated joint probability distributions were then used in (7a) and (8a) to estimate map quality from the measures of quality for each map associated with that data set. In each case, the true map quality ($r^2_{\rm MODEL}$) is plotted as a function of the Bayesian estimate of map quality. (a) Estimates of map quality using the skewness of electron density in (7a). (b) Estimates using the correlation of local r.m.s. density in (7a). (c) Estimates using the skewness and correlation of local r.m.s. density in (8a).
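The cross-validation described in this caption can be sketched as a leave-one-structure-out loop: the maps of each structure are scored using only data from the other structures. As a stand-in for equation (7a), the sketch uses a Gaussian-kernel (Nadaraya–Watson) weighted mean of true quality as the estimator; the kernel, the bandwidth, the record layout and the function names are all assumptions made for illustration.

```python
import math


def kernel_estimate(measure, training, bandwidth=0.05):
    """Gaussian-kernel (Nadaraya-Watson) weighted mean of true quality near `measure`."""
    weights = [math.exp(-0.5 * ((m - measure) / bandwidth) ** 2) for m, _ in training]
    total = sum(weights)
    if total == 0.0:
        # No training map is close enough to carry weight: use the overall mean.
        return sum(q for _, q in training) / len(training)
    return sum(w * q for w, (_, q) in zip(weights, training)) / total


def cross_validated_estimates(records, bandwidth=0.05):
    """records: (structure_id, measure, true_quality) for every map.

    Each structure's maps are scored using only maps from *other* structures,
    so no solution contributes to the distribution used to evaluate itself.
    Needs at least two distinct structures. Returns (estimate, truth) pairs.
    """
    results = []
    for sid in sorted({s for s, _, _ in records}):
        training = [(m, q) for s, m, q in records if s != sid]
        for s, m, q in records:
            if s == sid:
                results.append((kernel_estimate(m, training, bandwidth), q))
    return results


records = [("a", 0.1, 0.2), ("a", 0.3, 0.6), ("b", 0.1, 0.25), ("b", 0.3, 0.55)]
for estimate, truth in cross_validated_estimates(records):
    print(f"estimate {estimate:.3f}  actual {truth:.2f}")
```

Excluding every solution from the same structure, rather than just the map being scored, matters because different solutions for one data set share the same protein and data and would otherwise leak information into their own estimates.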
Cross-validated prediction correlation
| Quality measure(s) | Prediction correlation coefficient | R.m.s. prediction error |
|---|---|---|
| skew | 0.90 | 0.10 |
| | 0.78 | 0.15 |
| | 0.85 | 0.12 |
| | 0.80 | 0.14 |
| | 0.42 | 0.20 |
| | 0.80 | 0.10 |
| | 0.77 | 0.14 |
| | 0.48 | 0.21 |
| ⟨m⟩ | 0.42 | 0.21 |
| skew and correlation of local r.m.s. density | 0.92 | 0.09 |
Correlation of prediction errors
Values of $r^2_{\rm MODEL}$ were estimated for each measure of map quality using (7a), as in Fig. 2. The true values of $r^2_{\rm MODEL}$ were then subtracted, yielding a prediction error for each map and each measure of map quality. The table lists the correlation coefficients ($r^2$) of these prediction errors among the various measures of map quality.
| | skew | | | | | | | | ⟨m⟩ |
|---|---|---|---|---|---|---|---|---|---|
| skew | 1 | | | | | | | | |
| | 0.69 | 1 | | | | | | | |
| | 0.60 | 0.82 | 1 | | | | | | |
| | 0.73 | 0.95 | 0.84 | 1 | | | | | |
| | 0.61 | 0.86 | 0.61 | 0.79 | 1 | | | | |
| | 0.63 | 0.81 | 0.79 | 0.88 | 0.66 | 1 | | | |
| | 0.66 | 0.79 | 0.74 | 0.79 | 0.77 | 0.84 | 1 | | |
| | 0.54 | 0.82 | 0.63 | 0.71 | 0.88 | 0.61 | 0.76 | 1 | |
| ⟨m⟩ | 0.55 | 0.73 | 0.61 | 0.68 | 0.69 | 0.64 | 0.70 | 0.85 | 1 |
Figure 3. Comparisons of measures of map quality for pairs of maps based on enantiomorphic heavy-atom substructures. For structures in nonchiral space groups, all pairs of solutions derived from enantiomorphic pairs of heavy-atom substructures were selected. The member of the pair leading to the map with the higher correlation coefficient to the corresponding model map was identified as the ‘correct’ hand and the other as the ‘inverse’ hand. The value of each measure of map quality for the correct hand is plotted as the abscissa in each plot and the value of the measure for the corresponding inverse hand is the ordinate. Maps based on MAD data are represented as black diamonds, those from MIR data (all maps examined in this figure are from single derivatives) are represented as red triangles and those from SAD data are represented as blue squares. (a) Skewness of electron density. (b) Contrast of electron density. (c) Correlation of local r.m.s. density. (d) Flatness of solvent region. (e) Number of regions enclosing high density. (f) Overlap of NCS-related density. (g) Phase correlation from statistical density modification. (h) R factor from statistical density modification. (i) Density truncation.
Decision-making accuracy for enantiomeric pairs
The percentage of cases in which the higher (or lower, as appropriate) value of the quality measure is associated with the higher value of the actual map correlation coefficient with the corresponding model map. Only cases in which the actual map correlations differ by at least 0.05 are considered.
| Quality measure(s) | Correct predictions (%) |
|---|---|
| skew | 98 |
| | 94 |
| | 95 |
| | 94 |
| | 95 |
| | 90 |
| | 93 |
| | 94 |
| | 97 |
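The decision rule behind this table can be written down directly: for each pair of hands, the measure favours the hand with the higher value (or the lower value, for a measure such as the R factor where lower is better), and the choice counts as correct if that hand also has the higher actual map correlation. Pairs whose actual correlations differ by less than 0.05 are skipped, as stated in the caption. The function name and tuple layout are hypothetical, and ties in the measure are resolved in favour of the second hand.

```python
def hand_decision_accuracy(pairs, higher_is_better=True, min_diff=0.05):
    """pairs: (measure_hand1, true_cc_hand1, measure_hand2, true_cc_hand2).

    Returns the percentage of usable pairs in which the quality measure favours
    the hand with the higher true map correlation. Pairs whose true correlations
    differ by less than `min_diff` are excluded.
    """
    correct = considered = 0
    for m1, cc1, m2, cc2 in pairs:
        if abs(cc1 - cc2) < min_diff:
            continue  # hands too similar in true quality to score the decision
        considered += 1
        pick_first = m1 > m2 if higher_is_better else m1 < m2
        if pick_first == (cc1 > cc2):
            correct += 1
    return 100.0 * correct / considered if considered else float("nan")


pairs = [(0.5, 0.6, 0.1, 0.2), (0.2, 0.3, 0.4, 0.5), (0.3, 0.4, 0.1, 0.42), (0.1, 0.6, 0.3, 0.1)]
print(hand_decision_accuracy(pairs))  # 2 of the 3 usable pairs are called correctly
```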
Figure 4. Map qualities of density-modified maps. (a) Qualities of density-modified maps as a function of the qualities of the corresponding experimental maps. (b) Comparison of qualities of pairs of density-modified maps for the same structure derived from experimental maps of similar quality (see text).
Decision-making accuracies in choosing the solution with the best experimental or density-modified map
The percentage of correct predictions of best maps is the percentage of cases in which the solution with the highest value of the quality measure has a map correlation coefficient with the corresponding model map within 0.02 of that of the best obtained for any solution for that structure. The analysis is based on 372 sets of structure factors and associated maps obtained from 149 data sets as in Fig. 1, selecting the top-ranked 2–6 solutions and carrying out density modification with RESOLVE (Terwilliger, 2000) to yield density-modified maps. A model was built into each density-modified map using a rapid method for building helices and strands. If the map–model correlation was less than 0.35, the building procedure was repeated with a standard cycle of building using the methods in the PHENIX AutoBuild wizard (Terwilliger et al., 2008), and the map–model correlation from the full standard procedure was used. Only structures for which at least one map–model correlation was at least 0.20 are included in the analysis. The worst error in identification of the best maps is the largest difference between the correlation coefficient of the best map with the corresponding model map and that of the map with the highest value of the quality measure.
| Quality measure | Correct predictions of best maps, experimental maps (%) | Correct predictions of best maps, density-modified maps (%) | Worst error in identification of best maps, experimental maps | Worst error in identification of best maps, density-modified maps |
|---|---|---|---|---|
| Bayesian estimate using skew and correlation of local r.m.s. density | 91 | 88 | 0.29 | 0.58 |
| Map–model correlation for model built into density-modified map | 87 | 92 | 0.40 | 0.26 |
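Both statistics in this table follow directly from the caption's definitions. A minimal sketch, assuming each structure is given as a list of (quality measure, true map correlation) pairs for its solutions; the 0.02 tolerance and the 0.20 inclusion threshold come from the caption, while the function name and data layout are hypothetical.

```python
def best_map_selection(structures, tolerance=0.02, min_best=0.20):
    """structures: for each structure, a list of (quality_measure, true_map_cc) per solution.

    Returns (percentage of structures whose top-scoring solution has a true map
    correlation within `tolerance` of the best solution, worst shortfall).
    Structures whose best true correlation is below `min_best` are excluded.
    """
    kept = [s for s in structures if max(cc for _, cc in s) >= min_best]
    if not kept:
        return float("nan"), 0.0
    hits = 0
    worst = 0.0
    for solutions in kept:
        best_cc = max(cc for _, cc in solutions)
        chosen_cc = max(solutions, key=lambda sol: sol[0])[1]  # highest-scoring solution
        shortfall = best_cc - chosen_cc
        if shortfall <= tolerance:
            hits += 1
        worst = max(worst, shortfall)
    return 100.0 * hits / len(kept), worst


structures = [
    [(0.9, 0.60), (0.5, 0.70)],  # measure picks a solution 0.10 below the best
    [(0.8, 0.50), (0.2, 0.51)],  # within 0.02 of the best: counted as correct
    [(0.3, 0.10), (0.4, 0.15)],  # best correlation below 0.20: excluded
]
print(best_map_selection(structures))
```

Here the result is a percentage of 50.0 and a worst error of about 0.10, showing how a single poor choice dominates the worst-error column even when the hit rate is moderate.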
Figure 5. Comparison of the quality of density-modified maps obtained using the skewness of electron density and the correlation of local r.m.s. density for scoring with those obtained using the true map quality (correlation to the corresponding model map) for scoring. See text for details. The light blue bars labeled ‘Perfect scoring’ correspond to running the PHENIX AutoSol wizard and using the actual experimental map quality to make decisions at each step prior to obtaining density-modified phases, and using the actual density-modified map quality to make the final choice of solution. The dark maroon bars labeled ‘Bayesian scoring’ correspond to using the Bayesian scores for experimental maps based on the skewness of electron density and correlation of local r.m.s. density, and using the model–map correlation to choose the final density-modified solution. The light green bars labeled ‘Random scoring’ correspond to using random scores to make decisions about experimental map quality and using the model–map correlation to choose the final solution. Each ‘random scoring’ value is the average of ten separate runs of the PHENIX AutoSol wizard carried out with different random seeds. Note that the ‘perfect scoring’ method does not necessarily lead to the best final map. For example, an experimental map that is not the best one but is chosen by another scoring method could adventitiously yield additional sites that lead to a better final solution. (a) Structures determined using MAD.
Structures shown are aep-transaminase (PDB code 1m32; Chen et al., 2002), armadillo (3bct; Huber et al., 1997), cobd (1kus; Cheong et al., 2002), cp-synthase (1l1e; Huang et al., 2002), cyanase (1dw9; Walsh et al., 2000), epsin (1edu; Hyman et al., 2000), gene-5 (1vqb; Skinner et al., 1994), gere (1fse; Ducros et al., 2001), gpatase (1ecf; Muchmore et al., 1998), group2-intron (1kxk; Zhang & Doudna, 2002), ic-lyase (1f61; Sharma et al., 2000), lysozyme (unpublished results; CSHL Macromolecular Crystallography Course), mbp (1ytt; Burling et al., 1996), mev-kinase (1kkh; Yang et al., 2002), nsf-d2 (1nsf; Yu et al., 1998), p32 (1p32; Jiang et al., 1999), p9 (1bkb; Peat et al., 1998), pdz (1kwa; Daniels et al., 1998), psd-95 (1jxm; Tavares et al., 2001), rab3a (1zbd; Ostermeier & Brunger, 1999), s-hydrolase (1a7a; Turner et al., 1998), synapsin (1auv; Esser et al., 1998), tryparedoxin (1qk8; Alphey et al., 1999) and vmp (1l8w; Eicken et al., 2002). (b) Structures determined using SAD: 1029B (1n0e; Chen et al., 2004), 1038B (1lql; Choi et al., 2003), 1063B (1lfp; Shin et al., 2002), 1071B (1nf2; Shin, Roberts et al., 2003), 1102B (1l2f; Shin, Nguyen et al., 2003), 1167B (1s12; Shin et al., 2005), rnase-p (1nz0; Kazantsev et al., 2003), calmodulin (1exr; Wilson & Brunger, 2000), fusion-complex (1sfc; Sutton et al., 1998), insulin (2bn3; Nanao et al., 2005), myoglobin (A. Gonzales, personal communication), nsf-n (1qcs; Yu et al., 1999), sec17 (1qqe; Rice & Brunger, 1999) and ut-synthase (1e8c; Gordon et al., 2001). Note that fusion-complex was solved with SAD plus SIR.
(c) Structures determined using MIR: flr (1bkj; Tanner et al., 1996), granulocyte (2gmf; Rozwarski et al., 1996), groEL (1oel; Braig et al., 1995), hn-rnp (1ha1; Shamoo et al., 1997), penicillopepsin (3app; James & Sielecki, 1983), qaprtase (1qpo; Sharma et al., 1998), rh-dehalogenase (1bn7; Newman et al., 1999), rnase-s (1rge; Sevcik et al., 1996), rop (1f4n; Willis et al., 2000) and synaptotagmin (1dqv; Sutton et al., 1999).
Figure 6. Histograms of density corresponding to a poor map (dotted lines, correlation to model map of 0.04) and to a good map (solid lines, correlation to model map of 0.66). See text for details.