Pavel V Afonine, Ralf W Grosse-Kunstleve, Vincent B Chen, Jeffrey J Headd, Nigel W Moriarty, Jane S Richardson, David C Richardson, Alexandre Urzhumtsev, Peter H Zwart, Paul D Adams.
Abstract
phenix.model_vs_data is a high-level command-line tool for the computation of crystallographic model and data statistics, and the evaluation of the fit of the model to data. Analysis of all Protein Data Bank structures that have experimental data available shows that in most cases the reported statistics, in particular R factors, can be reproduced within a few percentage points. However, there are a number of outliers where the recomputed R values are significantly different from those originally reported. The reasons for these discrepancies are discussed.
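As a point of reference for the statistics discussed here, the following minimal sketch computes R factors from the conventional definition R = Σ||F_obs| − k|F_calc|| / Σ|F_obs|. It is not the phenix.model_vs_data implementation: the function names, the single least-squares scale factor and the plain-list inputs are assumptions made for illustration, and the real tool also applies bulk-solvent and anisotropic scaling.

```python
def least_squares_scale(f_obs, f_calc):
    """Scale k that minimizes sum((f_obs - k * f_calc) ** 2)."""
    num = sum(o * c for o, c in zip(f_obs, f_calc))
    den = sum(c * c for c in f_calc)
    return num / den


def r_factor(f_obs, f_calc, scale=None):
    """Conventional R factor: sum(||Fobs| - k|Fcalc||) / sum(|Fobs|)."""
    f_obs = [abs(o) for o in f_obs]
    f_calc = [abs(c) for c in f_calc]  # abs() also gives the modulus of complex values
    if scale is None:
        scale = least_squares_scale(f_obs, f_calc)
    return sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags):
    """Return (R_work, R_free); R_free is None when no test reflections are flagged.

    Assumption of this sketch: the scale is derived from the work set only
    and then reused for the test set.
    """
    work = [(abs(o), abs(c)) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(abs(o), abs(c)) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    scale = least_squares_scale([o for o, _ in work], [c for _, c in work])
    r_work = r_factor([o for o, _ in work], [c for _, c in work], scale=scale)
    r_free = None
    if test:
        r_free = r_factor([o for o, _ in test], [c for _, c in test], scale=scale)
    return r_work, r_free


if __name__ == "__main__":
    # Hypothetical amplitudes; the last reflection is flagged as a test reflection.
    f_obs = [10.0, 20.0, 30.0, 40.0]
    f_calc = [5.0, 10.0, 15.0, 22.0]
    flags = [False, False, False, True]
    print(r_work_and_free(f_obs, f_calc, flags))
```

Returning None for a missing test set mirrors the "n.a." entries that appear in the tables when no R free could be computed.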
Year: 2010 PMID: 20648263 PMCID: PMC2906258 DOI: 10.1107/S0021889810015608
Source DB: PubMed Journal: J Appl Crystallogr ISSN: 0021-8898 Impact factor: 3.304
Figure 1. Example phenix.model_vs_data output (for PDB entry 3dcv). Model information includes composition and geometry statistics. Data information includes completeness in resolution shells. Model-to-data fit information includes R factors calculated for the whole set of structure factors using an optimized bulk-solvent model, anisotropic scaling, and TLS and twinning if applicable. R factors are also recalculated after applying the resolution limits and σ cutoffs reported in the PDB header.
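The effect of applying reported resolution limits and σ cutoffs before recalculating R factors can be sketched as a simple reflection filter. This is an illustrative assumption, not the tool's code: the function name, the tuple layout and the convention that a reflection is rejected when F_obs < cutoff × σ are all hypothetical choices.

```python
def apply_cutoffs(reflections, d_min=None, d_max=None, sigma_cutoff=None):
    """Keep reflections inside the resolution range and above the sigma cutoff.

    reflections: iterable of (d_spacing_in_angstrom, f_obs, sigma) tuples.
    d_min / d_max: high- and low-resolution limits in angstroms (None = no limit).
    sigma_cutoff: reject reflections with f_obs < sigma_cutoff * sigma (None = keep all).
    """
    kept = []
    for d_spacing, f_obs, sigma in reflections:
        if d_min is not None and d_spacing < d_min:
            continue  # beyond the high-resolution limit
        if d_max is not None and d_spacing > d_max:
            continue  # beyond the low-resolution limit
        if sigma_cutoff is not None and f_obs < sigma_cutoff * sigma:
            continue  # too weak relative to its estimated error
        kept.append((d_spacing, f_obs, sigma))
    return kept


# Hypothetical reflections: (d spacing, F_obs, sigma)
reflections = [
    (25.0, 120.0, 3.0),  # lower resolution than d_max
    (3.1, 80.0, 2.0),    # kept
    (2.2, 4.0, 3.0),     # fails the sigma cutoff
    (1.8, 30.0, 2.5),    # higher resolution than d_min
]
selected = apply_cutoffs(reflections, d_min=2.0, d_max=20.0, sigma_cutoff=2.0)
```

R factors computed on the filtered subset can differ noticeably from those computed on all reflections, which is why the output reports both.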
Figure 2. Histogram of differences between R work reported in the PDB file header and the value calculated with phenix.model_vs_data. Resolution and σ cutoffs were applied in the calculation if available.
Comparison of published R factors and solvent parameters (from the PDB file header) with those recomputed using default parameters, recomputed using the published values of k sol and B sol, and recomputed using slightly different values of r shrink and r solv (those used in REFMAC; last two columns).
All values were recomputed with PHENIX. R factors are given as R work/R free and solvent parameters as k sol/B sol.
| PDB code | Resolution (Å) | Published: R work/R free | Published: k sol/B sol | Default parameters: R work/R free | Default parameters: k sol/B sol | Published k sol and B sol: R work/R free | REFMAC r shrink and r solv: R work/R free | REFMAC r shrink and r solv: k sol/B sol |
|---|---|---|---|---|---|---|---|---|
| 1jvx | 2.5 | 23.2/30.4 | 0.55/132.1 | 23.0/29.8 | 0.32/60.0 | 23.3/30.5 | 23.8/30.4 | 0.31/60.0 |
| 1jzb | 2.8 | 23.3/27.7 | 0.58/122.4 | 22.7/24.6 | 0.28/25.9 | 23.1/27.1 | 22.6/24.6 | 0.29/21.5 |
| 1kk7 | 3.2 | 25.9/31.3 | 0.31/162.0 | 24.7/28.1 | 0.20/60.0 | 25.4/29.2 | 24.6/28.1 | 0.20/60.0 |
| 1r30 | 3.4 | 25.6/30.0 | 0.34/136.6 | 22.7/26.1 | 0.31/80.0 | 23.2/26.9 | 22.6/26.3 | 0.31/80.0 |
| 1tve | 3.0 | 28.9/36.3 | 0.32/108.7 | 27.0/35.0 | 0.33/46.1 | 27.4/35.5 | 26.9/35.2 | 0.32/43.4 |
| 3cf1 | 4.4 | 22.9/28.6 | 0.30/179.2 | 25.3/29.0 | 0.32/198.4 | 25.5/29.3 | 26.2/29.9 | 0.31/197.7 |
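To see why different k sol/B sol pairs can still give similar R factors, it helps to look at the resolution dependence of the solvent term. The sketch below assumes the commonly used flat bulk-solvent form, in which the mask structure factors are scaled by k sol · exp(−B sol · s²/4) with s = 1/d. That formula is not stated in this record, so treat it, and the function name, as assumptions.

```python
import math


def bulk_solvent_scale(k_sol, b_sol, d_spacing):
    """Scale applied to the solvent-mask structure factors at a given d spacing.

    Assumed flat bulk-solvent form: k_sol * exp(-b_sol * s**2 / 4), with s = 1 / d.
    """
    s_squared = 1.0 / (d_spacing ** 2)
    return k_sol * math.exp(-b_sol * s_squared / 4.0)


if __name__ == "__main__":
    # k sol/B sol pairs taken from the 1jvx row of the table above.
    published = (0.55, 132.1)
    default = (0.32, 60.0)
    for d in (20.0, 10.0, 5.0, 2.5):
        print(
            d,
            round(bulk_solvent_scale(*published, d), 4),
            round(bulk_solvent_scale(*default, d), 4),
        )
```

Under this form the solvent contribution decays quickly with resolution, so the two parameters mainly affect the low-resolution reflections.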
Examples of structures where the original anisotropic atomic displacement parameters are missing and the corresponding PDB files contain only isotropic atomic displacement parameters
Columns 3 and 4 show the published and recomputed R factors. See §3.1.3 for details.
| PDB code | Resolution (Å) | Published R factor | Recomputed R factor |
|---|---|---|---|
| 352d | 0.95 | 15.2 | 20.8 |
| 1brf | 0.95 | 13.2 | 17.1 |
| 1dj6 | 1.00 | 16.5 | 19.2 |
| 2fn3 | 1.00 | 12.8 | 17.0 |
| 1pjx | 0.85 | 12.1 | 16.6 |
| 1q6z | 1.00 | 12.2 | 17.2 |
| 1ucs | 0.62 | 13.7 | 17.6 |
Figure 3. Differences between R work computed for the original structures with H atoms and the same structures after removal of the H atoms, shown as a function of resolution. See §3.1.5 for details.
Figure 4. Differences between R work values (shown as a function of resolution) computed for structures without H atoms and the same structures with H atoms restored based on ideal geometry. The atomic displacement parameter and occupancy of each restored H atom were set to be identical to those of the bonded atom. See §3.1.5 for details.
Examples of PDB entries with missing water molecules
See §3.1.6 for details.
| PDB code | Published | Recomputed | Recomputed after adding water | Number of added water molecules |
|---|---|---|---|---|
| 1kel | 19.9/25.8 | 26.4/27.2 | 17.4/21.7 | 648 |
| 1nko | 27.7/30.1 | 27.1/29.3 | 19.8/22.1 | 108 |
| 1p4k | 18.2/22.0 | 22.3/25.1 | 15.3/19.8 | 603 |
| 1r3f | 22.8/25.7 | 25.0/26.0 | 18.8/23.0 | 240 |
| 1rh9 | 18.2/20.5 | 25.5/25.9 | 18.7/21.3 | 508 |
| 1wou | 21.9/22.9 | 23.6/24.0 | 19.0/22.9 | 42 |
| 1xxs | 16.6/24.7 | 22.1/24.5 | 18.8/22.6 | 117 |
| 2jjf | 16.6/18.5 | 21.3/22.1 | 15.3/17.6 | 260 |
| 2ou9 | 15.9/22.0 | 28.4/29.8 | 19.1/21.4 | 312 |
| 2z1y | 18.0/21.7 | 24.1/24.1 | 16.5/19.6 | 1051 |
| 3d9z | 14.5 | 19.9/20.5 | 15.0/17.8 | 199 |
| 3fy3 | 14.9/20.3 | 24.0/26.3 | 18.4/23.0 | 185 |
| 6msi | 21.5 | 23.3/24.1 | 17.9/22.1 | 48 |
| 1ejg | 9.0/9.4 | 20.8/20.7 | 8.3/8.6 | 128 |
Where the corresponding R factors were not available in the PDB file header, the values were extracted from the corresponding publications.
Crystal structures represented by multiple models
R work and R free as extracted from the PDB file headers (second column) and as recalculated using phenix.model_vs_data (third column). (n.a.: not available.)
| PDB code | PDB file header | Recomputed with phenix.model_vs_data | PDB code | PDB file header | Recomputed with phenix.model_vs_data |
|---|---|---|---|---|---|
| 1gu8 | 23.0/25.6 | 23.0/25.7 | 2g0v | 5.1/5.4 | 18.5/n.a. |
| 1htq | 20.4/22.3 | 20.7/n.a. | 2g0x | 5.5/5.3 | 18.5/n.a. |
| 1l2g | 27.8/29.7 | 25.7/28.7 | 2g0z | 5.8/7.0 | 18.4/n.a. |
| 1mz0 | 15.0/17.3 | 14.6/16.7 | 2g10 | 4.5/4.9 | 17.3/n.a. |
| 1n6j | 24.3/26.8 | 28.5/31.2 | 2g11 | 5.1/5.7 | 17.4/n.a. |
| 1ohh | 23.2/28.0 | 21.7/n.a. | 2g12 | 5.3/6.2 | 17.4/n.a. |
| 1ot6 | 14.4/16.1 | 14.6/n.a. | 2g14 | 5.1/5.8 | 17.3/n.a. |
| 1ot9 | 13.4/16.1 | 13.5/n.a. | 2g32 | 23.9/25.8 | 25.1/27.3 |
| 1t3n | 26.5/28.6 | 25.6/28.0 | 2gn0 | 18.8/22.2 | 23.1/25.9 |
| 1u0c | 21.4/27.7 | 28.6/n.a. | 2gpm | n.a./27.0 | 24.8/33.0 |
| 1u0d | 21.7/25.7 | 37.8/38.5 | 2gq4 | n.a./27.0 | 25.1/28.4 |
| 1vjm | 25.2/29.8 | 24.7/29.3 | 2gq5 | n.a./31.8 | 26.5/31.7 |
| 1wte | 17.1/22.3 | 21.2/26.3 | 2gq6 | n.a./29.5 | 27.4/28.5 |
| 1x0i | 23.8/28.2 | 25.2/28.9 | 2gq7 | n.a./31.0 | 24.8/31.2 |
| 1yk0 | 24.0/28.4 | 23.5/23.8 | 2grz | 10.6/10.9 | 56.9/58.8 |
| 1yrq | 17.1/22.0 | 22.4/26.0 | 2j9j | 14.2/19.1 | 15.3/n.a. |
| 1zbl | 21.7/25.3 | 26.0/28.2 | 2je4 | 14.3/18.4 | 21.4/n.a. |
| 1zev | 21.8/27.9 | 29.0/33.1 | 2ntw | 15.3/19.5 | 14.4/n.a. |
| 1zy8 | 20.8/27.6 | 20.9/27.1 | 2q3m | 15.7/21.7 | 15.7/21.2 |
| 2aaz | 29.0/30.5 | 27.8/29.4 | 2q3o | 18.0/23.5 | 17.9/23.1 |
| 2ce2 | 14.4/16.3 | 21.8/23.3 | 2q3p | 18.2/22.4 | 18.1/21.9 |
| 2cl6 | 14.6/18.6 | 23.8/27.4 | 2q3u | 13.5/17.1 | 14.3/17.4 |
| 2cl7 | 14.8/17.0 | 20.3/23.4 | 2ull | 16.5/19.2 | 50.1/n.a. |
| 2clc | 14.9/18.0 | 23.7/27.0 | 2vtu | 27.2/31.0 | 30.7/26.6 |
| 2cld | 14.9/17.6 | 21.9/24.8 | 3c5f | 22.4/26.3 | 22.2/26.1 |
| 2d6b | 18.2/21.3 | 17.3/n.a. | 3cmy | 17.2/21.3 | 22.4/25.1 |
| 2e1c | 20.6/23.0 | 31.1/31.4 | 3cye | 19.3/23.1 | 18.1/22.0 |
| 2evw | 15.6/23.6 | 20.9/23.6 | 406d | 26.2/29.4 | 33.6/35.8 |
Crystal structures solved using neutron data
R work and R free as extracted from the PDB file header (second column), and as recalculated using phenix.model_vs_data (third column). (n.a.: not available.)
| PDB code | PDB file header | Recalculated with phenix.model_vs_data |
|---|---|---|
| 1c57 | 27.0/30.1 | 30.0/33.7 |
| 1cq2 | 16.0/25.0 | 32.7/32.8 |
| 1iu6 | 20.1/22.8 | 20.6/23.2 |
| 1l2k | 20.1/23.8 | 19.9/23.3 |
| 1v9g | 22.2/29.4 | 24.6/30.4 |
| 1vcx | 18.6/21.7 | 18.5/21.4 |
| 1wq2 | 22.9/28.9 | 27.8/31.3 |
| 1wqz | 25.2/27.4 | 24.0/30.3 |
| 1xqn | 26.6/32.0 | 35.3/35.7 |
| 2dxm | 19.7/26.0 | 20.4/26.7 |
| 2efa | 21.6/29.1 | 24.5/28.9 |
| 2gve | 27.1/31.9 | 25.0/30.1 |
| 2inq | n.a./23.3 | 20.8/24.8 |
| 2mb5 | n.a. | 23.7/n.a. |
| 2r24 | 25.7/29.1 | 25.6/29.1 |
| 2vs2 | 21.9/28.1 | 23.1/22.7 |
| 2yz4 | 27.9/31.2 | 28.1/31.4 |
| 2zoi | 19.2/21.9 | 19.8/22.1 |
| 2zpp | 22.1/26.0 | 23.1/27.4 |
| 2zye | 19.3/22.2 | 19.4/22.0 |
| 3byc | 26.4/31.5 | 27.1/28.6 |
| 3cwh | 23.7/28.8 | 23.9/23.1 |
| 3hgn | 19.6/21.6 | 19.6/21.5 |
| 3ins | 18.2/n.a. | 19.3/n.a. |
| 5pti | n.a. | 18.7/n.a. |
| 5rsa | n.a. | 18.3/n.a. |
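The "R work/R free" cells in the tables above, including the "n.a." entries, can be compared programmatically. The following is a minimal sketch: the helper names and the 5 percentage point outlier threshold are hypothetical choices for illustration, not values taken from the paper.

```python
def parse_r_pair(cell):
    """Parse 'R_work/R_free' text into a (r_work, r_free) tuple; 'n.a.' becomes None."""
    parts = [p.strip() for p in cell.split("/")]
    values = [None if p == "n.a." else float(p) for p in parts]
    while len(values) < 2:
        values.append(None)  # a single value is treated as R_work only
    return values[0], values[1]


def r_work_difference(header_cell, recomputed_cell):
    """Recomputed minus published R_work, or None if either value is missing."""
    published, _ = parse_r_pair(header_cell)
    recomputed, _ = parse_r_pair(recomputed_cell)
    if published is None or recomputed is None:
        return None
    return recomputed - published


# A few rows copied from the neutron-data table above.
rows = [
    ("1c57", "27.0/30.1", "30.0/33.7"),
    ("1cq2", "16.0/25.0", "32.7/32.8"),
    ("1xqn", "26.6/32.0", "35.3/35.7"),
    ("2inq", "n.a./23.3", "20.8/24.8"),
    ("5pti", "n.a.", "18.7/n.a."),
]

THRESHOLD = 5.0  # hypothetical cutoff, in percentage points
outliers = []
for code, header, recomputed in rows:
    diff = r_work_difference(header, recomputed)
    if diff is not None and abs(diff) > THRESHOLD:
        outliers.append(code)
```

Entries without a published R work are skipped rather than counted as agreeing, so missing header values do not hide discrepancies.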