Abstract
A novel procedure for the automatic identification of ligands in macromolecular crystallographic electron-density maps is introduced. It is based on the sparse parameterization of density clusters and the matching of the pseudo-atomic grids thus created to conformationally variant ligands using mathematical descriptors of molecular shape, size and topology. In large-scale tests on experimental data derived from the Protein Data Bank, the procedure could quickly identify the deposited ligand within the top-ranked compounds from a database of candidates. This indicates the suitability of the method for the identification of binding entities in fragment-based drug screening and in model completion in macromolecular structure determination.
Keywords: drug design; ligands; macromolecular X-ray crystallography; shape descriptors
Year: 2014 PMID: 25004962 PMCID: PMC4089483 DOI: 10.1107/S1399004714008578
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
Figure 1. Schematic representation of the protocol for ligand identification, shown for adenosine triphosphate (ATP) in the structure of a putative N-type ATP pyrophosphatase (PDB entry 3rk1; Forouhar et al., 2011) at 2.3 Å resolution. The (Fo − Fc, αc) difference density map is shown contoured at 1.0σ above the mean; free atoms are shown as balls. The thickness of the visual slab has been adjusted for each image to provide the best view; however, it is reduced in (b) in order to clarify the electron density of interest following protein model display in (a).
Figure 2. Trimming of pseudo-atomic grid clusters for feature comparison with ligand features. Difference (Fo − Fc, αc) maps are shown contoured at 2.5σ above the mean. (a) Density values for placed free atoms are sorted in descending order and the differences between adjacent values are calculated. The standard deviations of the density differences are plotted, and only those atoms with density higher than the marked point are output. The data are shown for PDB entry 4iun (Li et al., 2010). (b) The output atoms, shown as balls, are trimmed further based on distance cutoffs to produce the final shape for screening, shown as crosses. It is an excellent match to the deposited ligand, THP. (c) As in (a), but for the data in PDB deposition 3mb5 (Guelorget et al., 2010), where three clusters are identified and marked with arrows. (d) The third cluster, marked by arrow 3, is a good match to the final ligand, SAM.
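The density-based trimming step in panel (a) can be illustrated with a minimal sketch. The caption states that densities are sorted in descending order and that differences between adjacent values are analysed, but it does not give the exact rule for choosing the marked cutoff point. The sketch below therefore assumes a simplified criterion, cutting at the single largest drop between adjacent sorted values, in place of the standard-deviation analysis used in the paper. The function name `trim_by_density_gap` is hypothetical.

```python
def trim_by_density_gap(densities):
    """Return the indices of free atoms kept after density-based trimming.

    Assumption (not from the source): the cutoff is placed at the largest
    drop between adjacent values in the descending-sorted density list.
    """
    if len(densities) < 2:
        return list(range(len(densities)))

    # Atom indices ordered from highest to lowest density.
    order = sorted(range(len(densities)), key=lambda i: densities[i], reverse=True)
    sorted_vals = [densities[i] for i in order]

    # Differences between adjacent sorted density values.
    gaps = [sorted_vals[k] - sorted_vals[k + 1] for k in range(len(sorted_vals) - 1)]

    # Position of the largest drop; atoms above it are kept.
    cut = max(range(len(gaps)), key=lambda k: gaps[k])
    return sorted(order[: cut + 1])


# Hypothetical densities: three strong atoms and three weak ones.
kept = trim_by_density_gap([0.9, 0.2, 0.85, 0.8, 0.15, 0.1])
print(kept)
```

With several density clusters, as in panel (c), a single largest gap would not be enough; the same sorted-difference list would need to be searched for multiple cut points.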
The 22 features used to compare the sparse-grid density representation with the set of ligands in multiple conformations
| Feature type | No. of such features | Reference (where appropriate) |
|---|---|---|
| Third-order moment invariants | 11 | Lo & Don (1989) |
| Chirality index | 1 | Hattne & Lamzin (2011) |
| Features based on interatomic distances | 2 | Crippen & Havel (1988) |
| Features based on interatomic connectivity | 4 | Burden (1989) |
| Central moments of the Euclidean distances of the atomic coordinates | 3 | Tabachnick & Fidell (1996) |
| No. of atoms | 1 | |
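As an illustration of the simplest feature types in the table, the following sketch computes the atom count and three central moments of the Euclidean distances of the atomic coordinates. The table does not say which reference point the distances are measured from or which three moments are used. This sketch assumes distances to the centroid and the second, third and fourth central moments. The function name `distance_moment_features` is hypothetical, and the remaining 18 features are not covered.

```python
import math


def distance_moment_features(coords):
    """Return [n_atoms, m2, m3, m4] for a list of (x, y, z) coordinates.

    Assumptions (not from the source): distances are measured from the
    centroid, and the three moments are the 2nd, 3rd and 4th central moments.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("at least one coordinate is required")

    # Centroid of the point set.
    centroid = tuple(sum(p[axis] for p in coords) / n for axis in range(3))

    # Euclidean distance of every point from the centroid.
    dists = [math.dist(p, centroid) for p in coords]
    mean = sum(dists) / n

    # Central moments of the distance distribution.
    moments = [sum((d - mean) ** k for d in dists) / n for k in (2, 3, 4)]
    return [n] + moments


# Three collinear points: distances to the centroid are 2, 1 and 3.
features = distance_moment_features([(0, 0, 0), (1, 0, 0), (5, 0, 0)])
print(features)
```

Because these features depend only on distances, they are unchanged by rotation and translation, which is what allows a density cluster to be compared with a ligand conformer without first superposing them.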
Figure 3. The NNRMSD differences between the sparse grids calculated for the training set and the ligand coordinates deposited in the PDB are compared for (a) data for various resolutions and (b) ligands of different sizes. The error bars depict the standard deviation of the values across the set.
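NNRMSD is not defined in this excerpt. Assuming it denotes a nearest-neighbour root-mean-square deviation, in which each sparse-grid point is paired with its closest deposited ligand atom, the measure could be sketched as follows. The function name `nn_rmsd` and the direction of the pairing (grid points to ligand atoms) are assumptions.

```python
import math


def nn_rmsd(points, reference):
    """Nearest-neighbour RMSD between two sets of (x, y, z) coordinates.

    Assumption (not from the source): every point in `points` is matched to
    its closest point in `reference`, and the root mean square of those
    distances is returned. No one-to-one atom correspondence is required.
    """
    if not points or not reference:
        raise ValueError("both coordinate sets must be non-empty")

    # Squared distance from each point to its nearest reference point.
    squared = [min(math.dist(p, r) for r in reference) ** 2 for p in points]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical sparse-grid points and ligand atoms.
grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
print(nn_rmsd(grid, ligand))
```

A measure of this kind is asymmetric: swapping the two arguments generally gives a different value, so the direction of comparison should be fixed when values are compared across structures.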
The ligands used for training purposes, listed by PDB three-letter code with the corresponding common ligand name (either the drug name or the compound name commonly used in the literature)
Those with an asterisk next to their code are screened in at least two different pucker conformations.
| Ligand three-letter code | Ligand common name |
|---|---|
| 017 | Darunavir |
| 1PE | Pentaethylene glycol |
| 2GP | Guanosine 2′-monophosphate |
| 2PE | Nonaethylene glycol |
| 5GP* | Guanosine 5′-monophosphate |
| A3P* | Adenosine 3′,5′-diphosphate |
| ACO* | Acetyl coenzyme A |
| ADE | Adenine |
| ADN | Adenosine |
| ADP | Adenosine 5′-diphosphate |
| AKG | 2-Oxoglutaric acid |
| AMP | Adenosine monophosphate |
| ATP* | Adenosine 5′-triphosphate |
| B3P | 2-[3-(2-Hydroxy-1,1-dihydroxymethyl-ethylamino)-propylamino]-2-hydroxymethyl-propane-1,3-diol |
| BCL | Bacteriochlorophyll A |
| BTB | Bis-tris buffer |
| BTN | Biotin |
| C2E* | Cyclic diguanosine monophosphate |
| CAM | Camphor |
| CDL | Cardiolipin |
| CHD | Cholic acid |
| CIT | Citric acid |
| CLA | Chlorophyll A |
| CMP | Adenosine 3′,5′-cyclic monophosphate |
| COA | Coenzyme A |
| CXS | 3-Cyclohexyl-1-propylsulfonic acid |
| CYC | Phycocyanobilin |
| DIO | 1,4-Diethylene dioxide |
| DTT | 1,4-Dithiothreitol |
| EPE | HEPES |
| F3S | Fe3–S4 cluster |
| FAD* | Flavin-adenine dinucleotide |
| FMN* | Flavin mononucleotide |
| FPP | Farnesyl diphosphate |
| GOL | Glycerol |
| GSH | Glutathione |
| H4B | 5,6,7,8-Tetrahydrobiopterin |
| HC4 | |
| HEA* | Haem A |
| HED | 2-Hydroxyethyl disulfide |
| HEM | Haem |
| IMD | Imidazole |
| IPH | Phenol |
| LDA | Lauryl dimethylamine- |
| MES | 2-( |
| MLI | Malonate ion |
| MLT | |
| MPD | (4 |
| MTE | Phosphonic acid mono-(2-amino-5,6-dimercapto-4-oxo-3,7,8A,9,10,10A-hexahydro-4H-8-oxa-1,3,9,10-tetraaza-anthracen-7-ylmethyl)ester |
| MYR | Myristic acid |
| NAD* | Nicotinamide adenine dinucleotide |
| NAP* | Nicotinamide adenine dinucleotide phosphate |
| NCO | Cobalt hexammine(III) |
| NHE | 2-( |
| OLA | Oleic acid |
| ORO | Orotic acid |
| P6G | Hexaethylene glycol |
| PEG | Di(hydroxyethyl)ether |
| PEP | Phosphoenolpyruvate |
| PG4 | Tetraethylene glycol |
| PGA | 2-Phosphoglycolic acid |
| PGO | |
| PHQ | Benzyl chlorocarbonate |
| PLM | Palmitic acid |
| PLP | Pyridoxal-5′-phosphate |
| POP | Pyrophosphate 2− |
| PYR | Pyruvic acid |
| RET | Retinal |
| SAM* | |
| SF4 | Iron–sulfur cluster |
| SIA | |
| SO4 | Sulfate ion |
| SPO | Spheroidene |
| STU* | Staurosporine |
| TAM | Tris(hydroxyethyl)aminomethane |
| THP | Thymidine 3′,5′-diphosphate |
| TLA | |
| TPP | Thiamine diphosphate |
| TRS | Tris buffer |
| TYD | Thymidine 5′-diphosphate |
| U10 | Coenzyme Q10 |
| UPG | Uridine 5′-diphosphate-glucose |
Figure 4. (a) Final ranks of the correct compound following real-space refinement and ranking by CC for the 550 compounds passing through feature-based ligand selection. (b) Performance with data at various resolutions amongst those ligands passed to the final real-space refinement step. (c) Performance with ligands of different sizes amongst those ligands passed to the final real-space refinement step.