Dariusz Brzezinski, Zbigniew Dauter, Wladek Minor, Mariusz Jaskolski.
Abstract
Crystallographic models of biological macromolecules have been ranked using the quality criteria associated with them in the Protein Data Bank (PDB). The outcomes of this quality analysis have been correlated with time and with the journals that published papers based on those models. The results show that the overall quality of PDB structures has substantially improved over the last ten years, but this period of progress was preceded by several years of stagnation or even depression. Moreover, the study shows that the historically observed negative correlation between journal impact and the quality of structural models presented therein seems to disappear as time progresses.
Keywords: PDB; X-ray crystallography; nucleic acids; proteins; structure quality
Year: 2020 PMID: 32311227 PMCID: PMC7340579 DOI: 10.1111/febs.15314
Source DB: PubMed Journal: FEBS J ISSN: 1742-464X Impact factor: 5.542
Quality metric means, standard deviations, and fractions of missing values in the PDB.
| Metric | Mean | Standard deviation | Missing values (%) |
|---|---|---|---|
| Clashscore | 8.05 | 9.11 | 0.10 |
| Ramachandran outliers (%) | 0.49 | 1.26 | 1.69 |
| Rotamer outliers (%) | 3.26 | 3.65 | 1.72 |
| RSRZ outliers (%) | 4.05 | 4.06 | 9.56 |
| Rfree (%) | 23.35 | 3.82 | 4.29 |
| Supporting metrics | | | |
| R (%) | 19.31 | 3.25 | 2.39 |
| Resolution (Å) | 2.13 | 0.56 | 0 |
| Year of deposition | – | – | 0 |
Evaluation of data imputation methods. Mean results of 100 random experiments with standard deviations given in parentheses, in units of the last significant digit of the mean.
| Error | Method | Clashscore | RSRZ outliers (%) | Ramachandran outliers (%) | Rotamer outliers (%) | Rfree (%) |
|---|---|---|---|---|---|---|
| MAD | MICE | | 2.01 (2) | 0.21 (1) | | |
| | Mean | 4.4 (3) | 2.30 (2) | 0.49 (1) | 2.11 (4) | 2.50 (4) |
| | Median | 2.9 (3) | | | 1.39 (4) | 2.50 (4) |
| MAE | MICE | | | | | |
| | Mean | 5.7 (6) | 2.74 (3) | 0.61 (2) | 2.56 (6) | 3.00 (3) |
| | Median | 5.1 (7) | 2.60 (3) | 0.49 (3) | 2.34 (7) | 3.00 (3) |
| RMSE | MICE | | | | | |
| | Mean | 8.9 (23) | 4.04 (14) | 1.24 (16) | 3.65 (14) | 3.82 (4) |
| | Median | 9.3 (24) | 4.17 (14) | 1.32 (16) | 3.85 (15) | 3.82 (4) |
Best values for each error estimation method are given in bold. MAD, median absolute deviation; MAE, mean absolute error; RMSE, root‐mean‐square error.
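The three error measures named in the note above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: it assumes that MAD means the median of the absolute differences between imputed and true values, and the clashscore numbers and the `imputation_errors` helper are hypothetical.

```python
import math
import statistics

def imputation_errors(true_values, imputed_values):
    """Return (MAD, MAE, RMSE) for paired true and imputed values."""
    abs_err = [abs(t - p) for t, p in zip(true_values, imputed_values)]
    mad = statistics.median(abs_err)                           # median absolute deviation
    mae = statistics.fmean(abs_err)                            # mean absolute error
    rmse = math.sqrt(statistics.fmean([e * e for e in abs_err]))  # root-mean-square error
    return mad, mae, rmse

# Hypothetical clashscores that were hidden from the imputer.
true_values = [2.0, 5.0, 8.0, 11.0, 30.0]
# Baseline "Mean" method: every hidden value is replaced by the mean.
baseline = [statistics.fmean(true_values)] * len(true_values)
mad, mae, rmse = imputation_errors(true_values, baseline)
```

With mean imputation the RMSE reduces to the standard deviation of the held-out values, which is consistent with the Mean rows of the RMSE block sitting close to the standard deviations reported in the quality-metric table.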
Fig. 1. P 1 analysis. Variation in the mean P 1 percentile (higher is better) over time (top) and as a function of resolution (bottom) for proteins (left) and nucleic acids (right). Error bars indicate estimated unbiased standard errors of the mean.
Fig. 2. Comparison of P 1(t,d) of protein and nucleic acid structures over time. Variation in mean P 1(t,d) quality percentile (y‐axis, higher is better), comparing nucleic acid and protein structures (color) over time (x‐axis). Error bars indicate estimated unbiased standard errors of the mean.
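The idea behind a time- and resolution-aware percentile such as P 1(t,d) can be sketched as follows. This page describes it only as a comparison with structures of similar resolution present in the PDB at the time of deposition, so everything concrete here is an assumption: the `Structure` fields, the generic `score` (higher is better), and the ±0.1 Å `window` used to define "similar resolution" are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Structure:
    score: float       # hypothetical quality score, higher is better
    resolution: float  # in Å
    year: int          # year of deposition

def percentile_t_d(target, archive, window=0.1):
    """Percent of comparable structures that the target outscores.

    Comparable (an assumption): deposited no later than the target and
    within +/- `window` Å of its resolution.
    """
    peers = [s for s in archive
             if s is not target
             and s.year <= target.year
             and abs(s.resolution - target.resolution) <= window]
    if not peers:
        return None  # no reference population to rank against
    worse = sum(1 for s in peers if s.score < target.score)
    return 100.0 * worse / len(peers)

# Hypothetical mini-archive.
archive = [
    Structure(0.2, 2.00, 2001),
    Structure(0.5, 2.05, 2003),
    Structure(0.9, 1.95, 2005),
    Structure(0.7, 2.00, 2006),
    Structure(0.4, 3.00, 2002),  # excluded below: resolution too different
]
p = percentile_t_d(archive[3], archive)
```

Restricting the peer group by deposition date keeps older structures from being judged against standards that did not exist when they were deposited.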
All‐time journal ranking according to P 1(t,d). The ranking includes all the journals that had at least 100 primary citations of structures in the PDB. P 1(t,d) higher than 50% means that the structures published in a given journal were, on average, better than 50% of structures of similar resolution present in the PDB at the time of deposition. Journals with more than 1000 structures are highlighted in gray. The most frequent venue (To be published) is highlighted in bold.
| Rank | Journal | Mean P 1(t,d) | Mean resolution (Å) | G‐mean resolution (Å) | V‐mean resolution (Å) | Structure count |
|---|---|---|---|---|---|---|
| 1 | | 87.38 | 2.02 | 2.00 | 1.92 | 132 |
| 2 | | 71.05 | 2.03 | 1.97 | 1.81 | 418 |
| 3 | | 69.25 | 1.94 | 1.88 | 1.70 | 241 |
| 4 | | 68.32 | 1.95 | 1.90 | 1.78 | 153 |
| 5 | | 68.30 | 1.92 | 1.87 | 1.71 | 527 |
| 6 | | 67.76 | 1.95 | 1.88 | 1.69 | 281 |
| 7 | | 67.42 | 1.88 | 1.82 | 1.64 | 167 |
| 8 | | 66.72 | 2.24 | 2.15 | 1.91 | 169 |
| 9 | | 66.13 | 2.08 | 2.03 | 1.89 | 276 |
| 10 | | 65.77 | 2.12 | 2.04 | 1.81 | 103 |
| 11 | | 65.66 | 1.82 | 1.77 | 1.62 | 242 |
| 12 | | 65.42 | 2.11 | 2.06 | 1.89 | 1033 |
| 13 | | 65.11 | 2.02 | 1.97 | 1.81 | 1539 |
| 14 | | 64.85 | 2.50 | 2.43 | 2.20 | 107 |
| 15 | | 64.72 | 2.25 | 2.17 | 1.96 | 656 |
| 16 | | 64.69 | 2.32 | 2.25 | 2.04 | 111 |
| 17 | | 64.58 | 1.81 | 1.75 | 1.59 | 280 |
| 18 | | 64.38 | 2.44 | 2.37 | 2.18 | 126 |
| 19 | | 64.12 | 2.19 | 2.12 | 1.92 | 1013 |
| 20 | | 64.00 | 2.01 | 1.96 | 1.83 | 112 |
| 21 | | 63.83 | 2.04 | 1.98 | 1.83 | 1104 |
| 22 | | 63.59 | 2.09 | 2.02 | 1.81 | 1466 |
| 23 | | 63.39 | 2.16 | 2.09 | 1.88 | 1847 |
| 24 | | 63.25 | 1.92 | 1.85 | 1.64 | 1065 |
| 25 | | 62.82 | 1.78 | 1.73 | 1.57 | 171 |
| 26 | | 62.61 | 1.88 | 1.82 | 1.64 | 265 |
| 27 | | 62.60 | 1.81 | 1.77 | 1.64 | 102 |
| 28 | | 62.39 | 1.88 | 1.86 | 1.77 | 115 |
| 29 | | 61.98 | 1.87 | 1.83 | 1.68 | 268 |
| 30 | | 61.91 | 2.35 | 2.29 | 2.14 | 115 |
| 31 | | 61.75 | 1.79 | 1.73 | 1.56 | 147 |
| 32 | | 61.57 | 2.15 | 2.09 | 1.92 | 2057 |
| 33 | | 61.48 | 1.96 | 1.90 | 1.74 | 188 |
| 34 | | 61.40 | 1.94 | 1.88 | 1.69 | 556 |
| 35 | To be published | 61.39 | 2.03 | 1.98 | 1.81 | 22 421 |
| 36 | | 61.37 | 2.21 | 2.11 | 1.86 | 3538 |
| 37 | | 61.30 | 2.10 | 2.04 | 1.85 | 814 |
| 38 | | 61.11 | 1.95 | 1.85 | 1.65 | 173 |
| 39 | | 61.10 | 1.96 | 1.88 | 1.65 | 309 |
| 40 | | 61.01 | 2.12 | 2.06 | 1.88 | 1062 |
| 41 | | 60.95 | 2.44 | 2.32 | 1.97 | 245 |
| 42 | | 60.89 | 2.00 | 1.93 | 1.74 | 2369 |
| 43 | | 60.59 | 2.05 | 2.00 | 1.84 | 294 |
| 44 | | 60.17 | 2.08 | 2.03 | 1.88 | 902 |
| 45 | | 60.06 | 1.99 | 1.91 | 1.70 | 4952 |
| 46 | | 59.99 | 2.35 | 2.27 | 2.02 | 129 |
| 47 | | 59.56 | 2.08 | 2.02 | 1.84 | 228 |
| 48 | | 59.45 | 2.05 | 2.00 | 1.84 | 8896 |
| 49 | | 59.41 | 2.16 | 2.09 | 1.87 | 412 |
| 50 | | 59.04 | 2.07 | 2.02 | 1.81 | 128 |
| 51 | | 59.01 | 2.05 | 2.01 | 1.88 | 279 |
| 52 | | 58.88 | 2.03 | 1.98 | 1.86 | 161 |
| 53 | | 58.77 | 2.28 | 2.21 | 1.98 | 2127 |
| 54 | | 58.53 | 2.20 | 2.11 | 1.86 | 5348 |
| 55 | | 58.46 | 2.07 | 2.02 | 1.85 | 2235 |
| 56 | | 58.29 | 2.07 | 2.02 | 1.86 | 5525 |
| 57 | | 58.24 | 2.34 | 2.26 | 2.06 | 957 |
| 58 | | 58.00 | 2.21 | 2.17 | 2.05 | 138 |
| 59 | | 57.99 | 2.09 | 2.04 | 1.87 | 600 |
| 60 | | 57.95 | 2.33 | 2.22 | 1.85 | 182 |
| 61 | | 57.91 | 2.12 | 2.06 | 1.89 | 11 055 |
| 62 | | 57.49 | 2.15 | 2.08 | 1.90 | 1038 |
| 63 | | 57.37 | 2.22 | 2.16 | 2.01 | 371 |
| 64 | | 57.28 | 2.10 | 2.04 | 1.88 | 659 |
| 65 | | 57.09 | 2.07 | 2.01 | 1.84 | 1999 |
| 66 | | 56.55 | 2.52 | 2.42 | 2.16 | 399 |
| 67 | | 56.35 | 2.34 | 2.28 | 2.08 | 121 |
| 68 | | 56.28 | 2.18 | 2.11 | 1.92 | 976 |
| 69 | PNAS | 55.79 | 2.27 | 2.19 | 1.94 | 7376 |
| 70 | | 55.51 | 2.05 | 2.00 | 1.80 | 168 |
| 71 | | 55.43 | 2.35 | 2.25 | 2.01 | 336 |
| 72 | | 55.40 | 2.41 | 2.30 | 2.03 | 869 |
| 73 | | 55.36 | 1.92 | 1.85 | 1.66 | 199 |
| 74 | | 55.25 | 2.13 | 2.06 | 1.88 | 9507 |
| 75 | | 54.60 | 2.42 | 2.36 | 2.20 | 189 |
| 76 | | 54.30 | 2.07 | 2.03 | 1.92 | 168 |
| 77 | | 53.84 | 2.52 | 2.42 | 2.12 | 3060 |
| 78 | | 53.44 | 2.26 | 2.21 | 2.04 | 178 |
| 79 | | 53.44 | 2.41 | 2.33 | 2.11 | 279 |
| 80 | | 53.15 | 2.50 | 2.39 | 2.10 | 1949 |
| 81 | | 52.94 | 2.52 | 2.45 | 2.22 | 119 |
| 82 | | 52.92 | 2.35 | 2.28 | 2.09 | 211 |
| 83 | | 52.10 | 2.26 | 2.19 | 2.02 | 296 |
| 84 | | 51.35 | 2.63 | 2.41 | 2.08 | 149 |
| 85 | | 51.10 | 2.44 | 2.37 | 2.18 | 265 |
| 86 | | 51.01 | 2.39 | 2.31 | 2.00 | 104 |
| 87 | | 50.38 | 2.45 | 2.37 | 2.14 | 1599 |
| 88 | | 50.19 | 2.19 | 2.16 | 2.05 | 1590 |
| 89 | | 49.78 | 2.40 | 2.31 | 2.07 | 2915 |
| 90 | | 49.38 | 2.54 | 2.45 | 2.20 | 1563 |
| 91 | | 49.15 | 2.37 | 2.30 | 2.10 | 1910 |
Journals whose average P 1(t,d) differs significantly from the average P 1(t,d) of the entire PDB, according to Welch's t‐test with the Bonferroni correction at significance level α = 0.001. Mean denotes the arithmetic mean, G‐mean denotes the geometric mean (log‐average), and V‐mean denotes the mean in Å⁻³.
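The three resolution averages defined in this note can be sketched as below. The arithmetic and geometric means are standard. The V‐mean is computed here under an assumption: resolutions are averaged as d⁻³ and the result is converted back to Å by taking the power −1/3. The `resolution_means` helper and the sample values are hypothetical.

```python
import math
import statistics

def resolution_means(resolutions):
    """Return (arithmetic mean, G-mean, V-mean) of resolutions given in Å."""
    mean = statistics.fmean(resolutions)
    # Geometric mean as the exponential of the average logarithm (log-average).
    g_mean = math.exp(statistics.fmean([math.log(d) for d in resolutions]))
    # Assumed V-mean: average d**-3, then map back to Å.
    v_mean = statistics.fmean([d ** -3 for d in resolutions]) ** (-1.0 / 3.0)
    return mean, g_mean, v_mean

# Hypothetical resolutions in Å.
mean, g_mean, v_mean = resolution_means([1.0, 2.0, 4.0])
```

Under this reading the V‐mean can never exceed the G‐mean, which in turn can never exceed the arithmetic mean. Every row of the ranking table follows that ordering.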
Fig. 3. Journal ranking over time according to P 1(t,d). The plot shows the journal's rank (y‐axis) in a given time period (x‐axis). The ranking includes the 25 most popular journals, that is, the journals with the most structures, ranked based on structures deposited within 5‐year windows. A point appears only if a journal published at least 30 structures in a given 5‐year interval.
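The windowed ranking described in this caption can be sketched roughly as follows. The caption does not say whether the 5‐year windows overlap, so this sketch assumes consecutive, non-overlapping windows. It also leaves out the preselection of the 25 most popular journals. The `rank_by_window` function and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def rank_by_window(records, start_year, window=5, min_count=30):
    """Rank journals by mean percentile inside consecutive year windows.

    records: iterable of (journal, deposition_year, percentile) tuples.
    Returns {window_start_year: [journals, best first]}.
    """
    buckets = defaultdict(list)
    for journal, year, pct in records:
        if year < start_year:
            continue
        # Start year of the window this deposition falls into.
        w = start_year + ((year - start_year) // window) * window
        buckets[(w, journal)].append(pct)

    per_window = defaultdict(list)
    for (w, journal), values in buckets.items():
        if len(values) >= min_count:  # drop sparsely populated points
            per_window[w].append((fmean(values), journal))

    return {w: [journal for _, journal in sorted(rows, reverse=True)]
            for w, rows in sorted(per_window.items())}

# Hypothetical records; min_count is lowered to 2 to keep the example small.
records = [
    ("A", 2000, 60), ("A", 2001, 70), ("B", 2000, 80), ("B", 2004, 90),
    ("C", 2002, 99), ("A", 2005, 50), ("A", 2009, 40), ("B", 2006, 30),
    ("B", 2007, 20),
]
ranking = rank_by_window(records, start_year=2000, min_count=2)
```

Journal "C" has a single structure in its window and therefore produces no point, mirroring the minimum-count rule in the caption.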
Comparison of journal ranking by Brown and Ramaswamy [18] with rankings of the same journals created using P 1(t,d). Numbers of structures considered from a given journal are shown in parentheses. The top three journals according to B&R are highlighted in green, and the bottom three journals are highlighted in red.
| B&R ranking [18] | Ranking according to | Ranking according to |
|---|---|---|
Journals whose quality was determined to be significantly different from the average quality of structures in the entire PDB, at significance level α = 0.001.
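A significance filter of this kind can be sketched as below, assuming it is the same Welch's t‐test with the Bonferroni correction that the note to the all‐time ranking names. The t statistic and the Welch–Satterthwaite degrees of freedom are computed exactly. The p-value is approximated with the normal distribution, which is only reasonable for the large samples considered here. The sample data and the number of tests are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of each mean
    vb = variance(sample_b) / nb
    t = (fmean(sample_a) - fmean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def significant(sample_a, sample_b, n_tests, alpha=0.001):
    """Two-sided test against a Bonferroni-corrected threshold alpha / n_tests."""
    t, _df = welch_t(sample_a, sample_b)
    # Normal approximation to the t distribution (an assumption, large df only).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return p < alpha / n_tests

# Hypothetical percentiles for one journal and for the whole archive.
journal = [60 + (i % 7) for i in range(200)]
archive = [50 + (i % 11) for i in range(5000)]
flag = significant(journal, archive, n_tests=91)
```

Dividing α by the number of journals tested keeps the chance of any false positive across the whole ranking at or below α.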
Fig. 4. Scatterplot of mean journal P 1(t,d) and the journal's impact over time. Variation in mean journal P 1(t,d) (y‐axis) in a given year (color) plotted against the journal's IPP. IPP uses the same formula as the 3‐year impact factor, but is based on publicly available Scopus data. The two regression lines show linear trends for 1999 (indigo) and 2018 (yellow) along with 95% confidence intervals (gray areas).
Fig. 5. Correlation between structure quality and journal impact. The plot shows Spearman's rank correlation (y‐axis) over time (x‐axis) between structure quality measured by P 1(t,d) and journal impact measured using the IPP and SNIP metrics. IPP uses the same formula as the 3‐year impact factor but is based on Scopus data, whereas SNIP additionally takes into account the scientific field.
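Spearman's rank correlation, the statistic plotted in Fig. 5, is the Pearson correlation of the ranks of the two variables. A minimal standard-library sketch follows, with ties given their average rank. The impact and quality numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import fmean

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical journal impact values and mean quality percentiles for one year.
impact = [1.2, 3.4, 5.0, 9.8]
quality = [66.0, 61.0, 58.0, 52.0]
rho = spearman(impact, quality)
```

Because only ranks enter the calculation, the coefficient is insensitive to the very different scales of impact metrics and quality percentiles.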