Sohini Chakraborti1, Kaushik Hatti2, Narayanaswamy Srinivasan1.
Abstract
Our understanding of the structure–function relationships of biomolecules, and our ability to apply that understanding to drug discovery programs, depend substantially on the availability of structural information on ligand–protein complexes. However, correctly interpreting the electron density of a small molecule bound to a macromolecule in a crystal structure is not trivial. Our quality assessment of ~0.28 million small molecule–protein binding site pairs, derived from crystal structures corresponding to ~66,000 PDB entries, indicates that the majority (65%) of the pairs might need little (54%) or no (11%) attention. Among the remaining 35% of pairs that need attention, 11% of the pairs (including structures with high/moderate resolution) pose serious concerns. Unfortunately, most users of crystal structures lack the training to evaluate the quality of a crystal structure against its experimental data and, in general, rely on resolution as a 'gold standard' quality metric. Our work aims to sensitize non-crystallographers to the fact that resolution, which is a global quality metric, need not be an accurate indicator of local structural quality. In this article, we demonstrate the use of several freely available tools that quantify local structural quality and are easy to use from a non-crystallographer's perspective. We further propose a few solutions for consideration by the scientific community to promote quality research in structural biology and applied areas.
Keywords: PDB; binding pose; electron density map; ligand–protein crystal structures; quality assessment; resolution
Year: 2021 PMID: 34202053 PMCID: PMC8268033 DOI: 10.3390/ijms22136830
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Overall percentage distribution of quality assessment results of 276,377 small molecule–protein binding site pairs in 66,851 PDB entries. (A) Percentage distribution of quality categories of ligands. (B) Percentage distribution of quality categories of protein binding sites. (C) Percentage distribution of quality categories of ligand–protein binding site pairs. (D) Chart showing the percentage of ligand–protein binding site pairs in which neither the ligand nor the protein binding site is of ‘Bad’ quality.
Figure 2. Year-wise percentage distribution of protein–ligand (P–L) binding site pairs in our dataset from 2000 to July 2019. (A) Year-wise percentage distribution of the normalized total number of pairs (NTP) of protein–ligand binding sites. The NTP values were calculated using the equation given in Section 4.6. The number of protein–ligand binding site pairs shows a steady increase over the years. (B) Year-wise distribution of the percentage of P–L binding site pairs in each quality category based on VHELIBS assessment. The quality trend of each of the nine categories (GG, GD, GB, DG, DD, DB, BG, BD, BB) of P–L binding site pairs has remained nearly the same over approximately the last two decades.
Figure 3. Distribution of quality scores obtained from VHELIBS vs. resolution (Å) of the corresponding structure. (A) Quality score of ligands vs. resolution. (B) Quality score of protein binding sites vs. resolution. The green, yellow, and red circles represent the ‘Good’ (score = 0), ‘Dubious’ (0 < score ≤ 2), and ‘Bad’ (score > 2) categories, respectively. The vertical dashed lines in both plots are drawn at 2.5 Å; they show that some of the structures with the worst local quality scores (indicated by black arrows) were solved at a resolution better than 2.5 Å. Each plot contains 276,377 data points.
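The score thresholds in the Figure 3 legend, combined with the two-letter pair categories listed in the Figure 2 legend, can be written as a small lookup. The Python sketch below is illustrative only: the function names `categorize` and `pair_category` are hypothetical and are not part of VHELIBS, and the ligand-first letter order is an assumption inferred from table entries such as ‘0; 2; GD’.

```python
def categorize(score: float) -> str:
    """Map a VHELIBS-style score to 'G' (Good), 'D' (Dubious), or 'B' (Bad).

    Thresholds follow the Figure 3 legend: 0 is Good, (0, 2] is Dubious,
    and anything above 2 is Bad.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    if score == 0:
        return "G"
    if score <= 2:
        return "D"
    return "B"


def pair_category(ligand_score: float, site_score: float) -> str:
    """Two-letter category; ligand letter first (assumed order)."""
    return categorize(ligand_score) + categorize(site_score)


if __name__ == "__main__":
    # Ligand score 0 and binding site score 2, as in the 'GD' table entries.
    print(pair_category(0, 2))
```

Three categories for the ligand and three for the binding site give the nine pair categories (GG through BB) tracked in Figure 2B.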
Figure 4. Electron density maps around the ligands bound at site S1 of the structures discussed in case study-1. (A) L1 in C1 (2.3 Å). (B) L2 in C2 (2.3 Å). (C) L3 in C3 (2.4 Å). (D) L4 in C4 (2.5 Å). (E) L10 in C17 (2.5 Å). The ligands are shown as a ball and stick model. The neighboring protein residues are shown as thin sticks. The blue translucent blobs are the ‘2mFo-DFc’ maps contoured at 1.5σ, surrounding all well-determined atoms in the models. The ‘mFo-DFc’ maps (also called difference maps; shown as mesh) are colored red (negative density difference, contoured at −3σ) and green (positive density difference, contoured at +3σ). Red density around an atom indicates that the atom is either not present in the crystal or not well determined by the data, or it points to other aspects of incorrect modeling. Green density marks aspects of a structure that are reflected in the experimental data but have not been accounted for in the model. The same representation styles and contour levels are used in all the figures presented in this article. The images were generated with the 3D visualizer freely available on the PDBe website. Readers are encouraged to refer to the blog available on the PDBe website for a detailed explanation and guide [25].
Figure 5. Electron density maps around the ligand L’ bound at site S2 of the structures discussed in case study-1. (A) C1. (B) C2. (C) C3. (D) C4. (E) C17. For details on the resolution of the structures, graphical representation styles, and color codes of density maps, kindly refer to the legend of Figure 4.
Quality assessment of protein–ligand binding sites in C1–C4, and C17 (case study-1).
| Entity | VHELIBS * | | | | | EDIA Analysis | | | | | | | | | | | | | | | | | | | | PDB Validation Report | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Ligand Score; Binding Site Score; Category | | | | | EDIAm a | | | | | OPIA b | | | | | Mean B-Factor c | | | | | Occ. < 1.0 d | | | | | RSCC e | | | | |
| | C1 | C2 | C3 | C4 $ | C17 | C1 | C2 | C3 | C4 | C17 | C1 | C2 | C3 | C4 | C17 | C1 | C2 | C3 | C4 | C17 | C1 | C2 | C3 | C4 | C17 | C1 | C2 | C3 | C4 | C17 |
| Ligand at S1 |
|
| NA |
|
|
|
|
|
|
|
|
|
|
| 75.01 | 19.96 | 19.97 |
|
| 0 | 0 |
|
|
|
|
|
| |||
| S1: Phe1 | NA | NA | NA | NA | NA | | 0.90 | 0.81 | | | | 91 | 73 | | | 56.53 | 29.01 | 35.90 | 45.28 | 38.92 | 0 | 0 | 0 | 0 | 0 | 0.93 | 0.93 | 0.95 | 0.94 | 0.95 |
| S1: Asn55 | NA | NA | NA | NA | NA | | | | | | 75 | 50 | 75 | | | 59.71 | 41.85 | 38.72 | 46.98 | 53.29 | 0 | 0 | 0 | 0 | 0 | | | | | 0.94 |
| S1: Phe154 | NA | NA | NA | NA | NA | | | | 0.80 | | 73 | 55 | 55 | | | 57.93 | 37.09 | 43.20 | 45.71 | 55.62 | 0 | 0 | 0 | 0 | 0 | 0.90 | 0.93 | 0.93 | 0.93 | 0.94 |
| S1: Glu157 | NA | NA | NA | NA | NA | | | | | | | 56 | 56 | | | 54.83 | 32.04 | 36.58 | 38.39 | 35.86 | | | | | 0 | 0.96 | 0.96 | 0.94 | 0.93 | 0.98 |
| S1: Leu158 | NA | NA | NA | NA | NA | | 0.90 | | 0.82 | | 75 | 75 | 75 | 75 | 75 | 51.19 | 29.96 | 31.86 | 35.88 | 29.28 | 0 | 0 | 0 | 0 | 0 | 0.96 | 0.98 | 0.98 | 0.96 | 0.93 |
| S1: Glu165 | NA | NA | NA | NA | NA | 0.87 | 0.84 | 0.91 | 0.91 | 0.84 | 56 | 56 | 78 | 89 | 67 | 57.10 | 36.72 | 41.59 | 47.95 | 38.21 | 0 | 0 | 0 | 0 | 0 | 0.91 | 0.95 | 0.96 | 0.96 | 0.92 |
| S1: Arg168 | NA | NA | NA | NA | NA | | | | | | | | | | | 60.94 | 45.93 | 45.71 | 56.23 | 62.37 | | | | | | 0.96 | 0.94 | 0.94 | 0.92 | 0.93 |
| Ligand at S2 | 2; | 1;1; | 1;1; | 1;2; DD | 1;1; | 0.88 | 0.93 | 0.87 | 0.86 | 0.85 | 71 | 90 | 71 | 71 | 67 | 39.96 | 23.01 | 24.32 | 24.14 | 25.24 | 0 | 0 | 0 | 0 | 0 | 0.94 | 0.95 | 0.98 | 0.97 | 0.97 |
The het codes of the ligands at S1 of C1 (2.3 Å), C2 (2.3 Å), C3 (2.4 Å), C4 (2.5 Å), and C17 (2.5 Å) are L1, L2, L3, L4, and L10, respectively. The het code of the ligand at S2 in all five structures is L’. The important residues at site S1 that lack fair electron density support are listed in the table. Note that the residue numbers given in the PDB are not revealed, in keeping with our principle of masking the identities of the structures for the reason stated in the text. The numbers used here are relative to the first residue (Phe) in the table, which is assigned residue number ‘1’ in this paper.
* A ligand/binding site is classified as ‘Bad’ (B) by VHELIBS when the score is above 2, indicated in bold.
$ As explained in the text, the quality assessment of the ligand (L4) bound at site S1 of C4 could not be performed with VHELIBS.
a An EDIAm score of any fragment (ligand/residue) below 0.8 indicates that at least three atoms in that fragment are not well supported by electron density. The values below 0.8 are shown in bold.
b OPIA: overall percentage of well-resolved interconnected atoms; values below 50 (shown in bold) indicate that less than 50% of the interconnected atoms in the particular fragment have good electron density support.
c B-factors are measured in units of Å². The numbers within the brackets, ‘( )’, in the first and last row indicate the average B-factors of the binding site residues around the respective ligand. These values were calculated by averaging the mean B-factor of the protein residues (that are within 4.5 Å from the ligand) obtained from the EDIA server. Wherever a ligand’s B-factor is 1.5 times more than that of its surrounding protein residues, the B-factor of the former is shown in bold, and it demands careful inspection.
d Occ. < 1.0: number of atoms in the fragment that have an occupancy less than unity; cases where one or more atoms have Occ. < 1.0 are shown in bold. Notably, the ligands bound at S1 of C1, C2, and C17 have 22, 20, and 19 non-hydrogen atoms, respectively. None of the atoms in these ligands have Occ. = 1.0.
e RSCC: a score below 0.9 indicates that the atoms in the ligand/residue are not well supported by electron density and is shown here in bold. The scores are taken from the respective PDB validation report.
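The cutoffs stated in the footnotes above can be collected into one check per fragment. The following Python sketch is only an illustration under stated assumptions: `FragmentMetrics` and `quality_flags` are hypothetical names (not part of EDIA, VHELIBS, or the PDB validation pipeline), and the B-factor rule is read here as "ligand B-factor greater than 1.5 times the binding-site average", which is one possible reading of the footnote.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FragmentMetrics:
    """Local quality metrics for one ligand or residue (hypothetical container)."""
    ediam: float                 # EDIAm score
    opia: float                  # OPIA, in percent
    rscc: float                  # real-space correlation coefficient
    atoms_occ_below_one: int     # number of atoms with occupancy < 1.0
    mean_b: float                # mean B-factor of the fragment, in Å^2
    site_mean_b: Optional[float] = None  # binding-site average B; ligands only


def quality_flags(m: FragmentMetrics, b_ratio: float = 1.5) -> List[str]:
    """Return the footnote criteria that the fragment fails."""
    flags = []
    if m.ediam < 0.8:
        flags.append("EDIAm < 0.8")
    if m.opia < 50:
        flags.append("OPIA < 50")
    if m.rscc < 0.9:
        flags.append("RSCC < 0.9")
    if m.atoms_occ_below_one > 0:
        flags.append("Occ. < 1.0")
    # Assumed reading of footnote c: ligand B-factor > 1.5 x site average.
    if m.site_mean_b is not None and m.mean_b > b_ratio * m.site_mean_b:
        flags.append("B-factor > 1.5 x binding site")
    return flags


if __name__ == "__main__":
    # Values for S1: Glu165 in C1, taken from the table above.
    glu165_c1 = FragmentMetrics(ediam=0.87, opia=56, rscc=0.91,
                                atoms_occ_below_one=0, mean_b=57.10)
    print(quality_flags(glu165_c1))
```

A fragment that returns an empty list passes all five criteria; any returned flag marks a value that would appear in bold in the table and deserves visual inspection of the density.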
Figure 6. Electron density maps around the ligands (substrate and co-substrate) bound to the structures (C5 and C6) discussed in case study-2. (A) L” in C5 (1.0 Å). (B) L5 in C5 (1.0 Å). (C) L’” in C6 (1.1 Å). (D) L6 in C6 (1.1 Å). For details on graphical representation styles and color codes of density maps, kindly refer to the legend of Figure 4.
Quality assessment of ligands bound to the structures C5–C10 (case study-2).
| Protein–Ligand Complex Identifier | VHELIBS * | | EDIA Analysis | | | | | | | | PDB Validation Report | | PDB-REDO | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Ligand Score; Binding Site Score; | | EDIAm a | | OPIA b | | Mean B-Factor c | | Occ. < 1.0 d | | RSCC e | | ΔRSCC f | |
| | SL | CSL | SL | CSL | SL | CSL | SL | CSL | SL | CSL | SL | CSL | SL | CSL |
| C5 | | | | | | | | | 0 | 0 | | | 0.281 | 0.226 |
| C6 | | | | | | | | | 0 | 0 | | | 0.188 | 0.279 |
| C7 | | | | | | | | | 0 | 0 | | | | |
| C8 | 2; | | | | | | | | 0 | 0 | | | | |
| C9 | 2; | | | | | | | | 0 | 0 | | | 0.153 | 0.240 |
| C10 | N/A | 0; 0; GG | N/A | 0.84 | N/A | 73 | N/A | 5.5 | N/A | 0 | N/A | N/A | N/A | 0.004 |
The het codes of the substrate ligands (SL) in C5, C6, C7, C8, and C9 are L”, L’”, L’”, L”, and L’”, respectively. The het codes of the co-substrate ligands (CSL) in C5, C6, C7, C8, C9, and C10 are L5, L6, L7, L5, L7, and L5, respectively.
* A ligand/binding site is classified as ‘Bad’ (B) by VHELIBS when the score is above 2, indicated here in bold.
a An EDIAm score of any fragment (ligand/residue) below 0.8 indicates that at least three atoms in that fragment are not well supported by electron density. The values below 0.8 are shown in bold.
b OPIA: overall percentage of well-resolved interconnected atoms; a value below 50 (shown in bold) indicates that less than 50% of the interconnected atoms in the particular fragment have good electron density support. Notably, most of the ligands (except the substrate in C8 and co-substrate in C10) have an OPIA score = 0.
c B-factors are measured in units of Å². The numbers within the brackets, ‘( )’, indicate the average B-factors of the binding site residues around the respective ligand. These values were calculated by averaging the mean B-factor of the protein residues (that are within 4.5 Å from the ligand) obtained from the EDIA server. Wherever a ligand’s B-factor is 1.5 times more than that of its surrounding protein residues, the B-factor of the former is shown in bold, and it demands careful inspection.
d Occ. < 1.0: number of atoms in the fragment that have an occupancy less than unity.
e RSCC: a score below 0.9 indicates that the atoms in the ligand/residue are not well supported by electron density and is shown here in bold. The scores are taken from the respective PDB validation report (if available).
f ΔRSCC: change in RSCC after re-refinement. A negative value indicates a worse fit after re-refinement in PDB-REDO. The values in bold indicate an insignificant change between the final and initial density map fits (quantified by RSCC). Although significant changes are observed for the ligands in C5, C6, and C9, the density fits remain too poor to give a high RSCC.
N/A: not applicable.
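The sign convention for ΔRSCC in footnote f can be made concrete with a short calculation. The Python sketch below is a minimal illustration: the function names are hypothetical, the input RSCC values in the example are invented for demonstration and do not come from the table, and no cutoff for an "insignificant" change is encoded because the footnotes do not state one.

```python
def delta_rscc(rscc_original: float, rscc_redo: float) -> float:
    """Change in RSCC after re-refinement (re-refined minus original).

    With this sign convention a negative value means the fit is worse
    after re-refinement, matching footnote f.
    """
    return rscc_redo - rscc_original


def describe(rscc_original: float, rscc_redo: float, good_rscc: float = 0.9) -> str:
    """Summarize the direction of change and whether the fit is still poor."""
    delta = delta_rscc(rscc_original, rscc_redo)
    if delta < 0:
        direction = "worse"
    elif delta > 0:
        direction = "better"
    else:
        direction = "unchanged"
    # An improved fit can still fall short of the RSCC >= 0.9 guideline.
    status = "well supported" if rscc_redo >= good_rscc else "still poorly supported"
    return f"{direction}; {status}"


if __name__ == "__main__":
    # Invented example values, not taken from the paper.
    print(describe(0.55, 0.83))
```

The second part of the summary reflects the caveat in the footnote: a sizeable positive ΔRSCC does not by itself mean that the re-refined ligand reaches a high RSCC.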
Figure 7. Electron density maps around the ligands (substrate and co-substrate) bound to the structures (C7 and C8) discussed in case study-2. (A) L’” in C7 (1.54 Å). (B) L7 in C7 (1.54 Å). (C) L” in C8 (1.78 Å). (D) L5 in C8 (1.78 Å). For details on graphical representation styles and color codes of density maps, kindly refer to the legend of Figure 4.
Figure 8. Electron density maps around the ligands (substrate and co-substrate) bound to the structures (C9 and C10) discussed in case study-2. (A) L’” in C9 (1.05 Å). (B) L7 in C9 (1.05 Å). (C) L5 in C10 (1.08 Å). For details on graphical representation styles and color codes of density maps, kindly refer to the legend of Figure 4.
Figure 9. Electron density maps around the ligands bound to a few structures discussed in case study-3. (A) L8 in C11 (3.4 Å). (B) L8 in C13 (3.4 Å). (C) L8 in C14 (3.4 Å). (D) L8 in C16 (2.1 Å). For details on graphical representation styles and color codes of density maps, kindly refer to the legend of Figure 4.
Quality assessment of ligands bound to the structures C11–C16 (case study-3).
| Protein–Ligand Complex Identifier; Ligand Code | VHELIBS * | EDIA Analysis | | | | PDB Validation | PDB-REDO |
|---|---|---|---|---|---|---|---|
| | Ligand Score; Binding Site Score; Category | EDIAm a | OPIA b | Mean B-factor c | Occ. < 1.0 d | RSCC e | ΔRSCC f |
| C11; L8 | | | | 79.20 | | | |
| C12; L9 | 0; | 0.88 | 81 | 106.33 | 0 | 0.93 | |
| C13; L8 | 0; | | 61 | 109.38 | 0 | 0.95 | 0.058 |
| C14; L8 | 0; 2; GD | 0.84 | 77 | 47.55 | 0 | 0.95 | −0.180 |
| C15; L9 | 0; 1; GD | | 56 | 49.71 | 0 | 0.97 | −0.055 |
| C16; L8 | | | | | 0 | | 0.190 |
* A ligand/binding site is classified as ‘Bad’ (B) by VHELIBS when the score is above 2, indicated here in bold.
a An EDIAm score of any fragment (ligand/residue) below 0.8 indicates that at least three atoms in that fragment are not well supported by electron density. The values below 0.8 are shown in bold.
b OPIA: overall percentage of well-resolved interconnected atoms; a value below 50 (shown in bold) indicates that less than 50% of the interconnected atoms in the particular fragment are well resolved.
c B-factors are measured in units of Å². The numbers within the brackets, ‘( )’, indicate the average B-factors of the binding site residues around the respective ligand. These values were calculated by averaging the mean B-factor of the protein residues (that are within 4.5 Å from the ligand) obtained from the EDIA server. Wherever a ligand’s B-factor is 1.5 times more than that of its surrounding protein residues, the B-factor of the former is shown in bold, and it demands careful inspection.
d Occ. < 1.0: number of atoms in the fragment that have an occupancy less than unity. Notably, L8 has 31 non-hydrogen atoms, and all these 31 atoms of L8 in C11 have occupancy less than 1.0.
e RSCC: a score below 0.9 indicates that the atoms in the ligand/residue are not well supported by electron density and is shown here in bold. The scores are taken from the respective PDB validation report.
f ΔRSCC: change in RSCC after re-refinement. A negative value indicates a worse fit after re-refinement in PDB-REDO. The values in bold indicate an insignificant change between the final and initial density map fits (quantified by RSCC).
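Footnote c describes the bracketed binding-site B-factor as an average over the mean B-factors of protein residues within 4.5 Å of the ligand. A minimal Python sketch of that averaging step follows; the function name `site_mean_b`, the input layout, and the example numbers are all hypothetical, and treating a residue at exactly 4.5 Å as included is an assumption.

```python
from statistics import mean
from typing import Iterable, Tuple


def site_mean_b(residues: Iterable[Tuple[float, float]], cutoff: float = 4.5) -> float:
    """Average the per-residue mean B-factors of residues near the ligand.

    residues: pairs of (residue mean B-factor in Å^2,
                        shortest residue-ligand distance in Å).
    cutoff:   distance in Å; residues at or below it are counted (assumed).
    """
    nearby = [b for b, distance in residues if distance <= cutoff]
    if not nearby:
        raise ValueError("no residues within the distance cutoff")
    return mean(nearby)


if __name__ == "__main__":
    # Invented example: two residues inside the cutoff, one outside.
    example = [(30.0, 3.2), (40.0, 4.5), (90.0, 6.0)]
    print(site_mean_b(example))
```

The resulting average is the reference against which the ligand's own mean B-factor is compared when applying the 1.5-fold rule in the same footnote.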