Dariusz Brzezinski, Przemyslaw J Porebski, Marcin Kowiel, Joanna M Macnar, Wladek Minor.
Abstract
Structure-guided drug design depends on the correct identification of ligands in crystal structures of protein complexes. However, the interpretation of electron density maps is challenging and often burdened with confirmation bias. Ligand identification can be aided by automatic methods such as CheckMyBlob, a machine learning algorithm that learns to generalize ligand descriptions from sets of moieties deposited in the Protein Data Bank. Here, we present the CheckMyBlob web server, a platform that can identify ligands in unmodeled fragments of electron density maps or validate ligands in existing models. The server processes PDB/mmCIF and MTZ files and returns a ranking of the 10 most likely ligands for each detected electron density blob, along with interactive 3D visualizations. Additionally, for each prediction/validation, a plugin script is generated that enables users to conduct a detailed analysis of the server results in Coot. The CheckMyBlob web server is available at https://checkmyblob.bioreproducibility.org.
Year: 2021 PMID: 33905501 PMCID: PMC8262754 DOI: 10.1093/nar/gkab296
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic of the ligand clustering procedure. Arrows depict example cluster splits based on different criteria at subsequent stages of the process.
Basic statistics and average predictive performance metrics with standard deviations (in parentheses, in the unit of the last significant digit of the mean value) on the cross-validated (10-fold CV) training set and the holdout test set
| | 10-fold CV | Holdout set |
|---|---|---|
| Ligand instances | 696 887 | 17 150 |
| Mean resolution (Å) | 2.2 | 2.5 |
| Accuracy (%) | 71.2(9) | 58.9 |
| Top-5 accuracy (%) | 90.7(5) | 87.2 |
| Top-10 accuracy (%) | 94.9(2) | 92.5 |
| Micro-averaged recall (%) | 71.2(9) | 58.9 |
| Micro-averaged precision (%) | 69.3(11) | 62.7 |
| Micro-averaged F1 (%) | 69.3(11) | 55.7 |
| Cohen's kappa (%) | 64.6(12) | 46.2 |
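The cross-validation column in the table above uses concise notation, where the digits in parentheses give the standard deviation in units of the last significant digit of the mean. The following minimal sketch shows how such a value can be expanded into a mean and a standard deviation. The function name `parse_concise` is illustrative only and is not part of CheckMyBlob.

```python
import re
from decimal import Decimal
from typing import Tuple


def parse_concise(value: str) -> Tuple[Decimal, Decimal]:
    """Expand concise notation such as '71.2(9)' into (mean, standard deviation).

    The digits in parentheses are read in units of the last significant
    digit of the mean, as stated in the table caption.
    """
    match = re.fullmatch(r"(-?\d+(?:\.(\d+))?)\((\d+)\)", value.strip())
    if match is None:
        raise ValueError(f"not in concise notation: {value!r}")
    mean_text, decimals, sd_digits = match.groups()
    n_decimals = len(decimals) if decimals else 0
    mean = Decimal(mean_text)
    # Shift the parenthesised digits right by the number of decimals in the mean.
    sd = Decimal(sd_digits).scaleb(-n_decimals)
    return mean, sd


if __name__ == "__main__":
    for cell in ("71.2(9)", "69.3(11)", "94.9(2)"):
        mean, sd = parse_concise(cell)
        print(f"{cell} -> {mean} +/- {sd}")
```

Under this reading, 71.2(9) stands for 71.2 with a standard deviation of 0.9, and 69.3(11) stands for 69.3 with a standard deviation of 1.1.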
Figure 2. Schematic representation of the CheckMyBlob workflow and screenshots of the interactive results visualization page and Coot ligand analysis script. The user provides input files (an MTZ file and PDB or mmCIF file) and chooses to either detect unmodeled ligands or validate existing ligands. Next, blobs are detected, extracted from electron density maps, and described by a set of numerical features. The obtained numerical features are input to a machine learning model, which outputs a ranking of the ten most likely ligands for each blob. This probability-based ranking can be viewed on the interactive results visualization page and tested in Coot through a downloadable script.
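The last step of the workflow, turning per-class probabilities into a ranking of the most likely ligands, can be sketched as follows. This is only an illustration of a probability-based top-k ranking and of how a top-k hit is counted. It is not the server's implementation, and the function names, class labels, and probability values below are invented for the example.

```python
from typing import Dict, List, Tuple


def rank_ligands(probabilities: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k most probable ligand classes, highest probability first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def top_k_hit(probabilities: Dict[str, float], true_label: str, k: int) -> bool:
    """True if the correct class appears among the k highest-ranked classes."""
    return true_label in [label for label, _ in rank_ligands(probabilities, k)]


def top_k_accuracy(blobs: List[Tuple[Dict[str, float], str]], k: int) -> float:
    """Fraction of blobs whose true class is within the top-k ranking."""
    hits = sum(top_k_hit(probs, label, k) for probs, label in blobs)
    return hits / len(blobs)


if __name__ == "__main__":
    # Hypothetical model output for a single blob (labels and values are made up).
    example = {
        "ATP-like": 0.41,
        "HEM-like": 0.22,
        "SO4-like": 0.18,
        "GOL-like": 0.12,
        "ZN-like": 0.07,
    }
    for label, probability in rank_ligands(example, k=3):
        print(f"{label}: {probability:.2f}")
```

Top-5 and top-10 accuracy, as reported for the model, count a prediction as correct whenever the true ligand class appears anywhere in the first 5 or 10 positions of such a ranking.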
CheckMyBlob's predictive performance on metal ions validated by CMM (28,29). Total number of ligands: 34 932. Total classification accuracy on this dataset: 92.3%
| Ligand group | Precision (%) | Recall (%) | F1-score (%) | Ligand instances |
|---|---|---|---|---|
| MG-like | 88.8 | 87.4 | 88.1 | 7 063 |
| CA-like | 89.9 | 93.0 | 91.4 | 12 682 |
| ZN-like | 96.4 | 95.2 | 95.8 | 14 686 |
| SR-like | 0.0 | 0.0 | 0.0 | 19 |
| CD-like | 85.1 | 59.6 | 70.1 | 441 |
| HG-like | 40.5 | 41.5 | 41.0 | 41 |
MG-like: Mg, Na, Al; CA-like: Ca, K; ZN-like: Zn, Mn, Cu, Fe, Ni, Co, Cr, Ga, Ti, V; SR-like: Sr, Rb; CD-like: Cd, Ag, Mo, Ru, Pd, Y, Rh, Zr, In; HG-like: all metals with atomic number 55 (Cs) or higher.
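As a consistency check on the metal-ion table, the sketch below encodes the group definitions from the footnote, recomputes each F1-score as the harmonic mean of precision and recall, and recomputes the total accuracy as the instance-weighted mean of the per-group recalls. It assumes that every ion belongs to exactly one true group, and the names `metal_group`, `f1`, and `overall_accuracy` are illustrative. Because the inputs are the rounded table values, the results are approximate.

```python
from typing import Dict, Tuple

# Group membership copied from the table footnote (element symbols).
GROUPS = {
    "MG-like": {"Mg", "Na", "Al"},
    "CA-like": {"Ca", "K"},
    "ZN-like": {"Zn", "Mn", "Cu", "Fe", "Ni", "Co", "Cr", "Ga", "Ti", "V"},
    "SR-like": {"Sr", "Rb"},
    "CD-like": {"Cd", "Ag", "Mo", "Ru", "Pd", "Y", "Rh", "Zr", "In"},
}

# (precision %, recall %, ligand instances), copied from the table.
TABLE: Dict[str, Tuple[float, float, int]] = {
    "MG-like": (88.8, 87.4, 7063),
    "CA-like": (89.9, 93.0, 12682),
    "ZN-like": (96.4, 95.2, 14686),
    "SR-like": (0.0, 0.0, 19),
    "CD-like": (85.1, 59.6, 441),
    "HG-like": (40.5, 41.5, 41),
}


def metal_group(symbol: str, atomic_number: int) -> str:
    """Map a metal to its group; atomic number 55 (Cs) or higher is HG-like."""
    if atomic_number >= 55:
        return "HG-like"
    for group, members in GROUPS.items():
        if symbol in members:
            return group
    raise KeyError(f"{symbol} is not listed in any group")


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overall_accuracy(table: Dict[str, Tuple[float, float, int]]) -> float:
    """Instance-weighted mean recall, in percent."""
    total = sum(n for _, _, n in table.values())
    correct = sum(recall / 100 * n for _, recall, n in table.values())
    return 100 * correct / total


if __name__ == "__main__":
    for group, (precision, recall, _) in TABLE.items():
        print(f"{group}: F1 = {f1(precision, recall):.1f}")
    print(f"Overall accuracy = {overall_accuracy(TABLE):.1f}%")
```

The recomputed F1-scores and the overall accuracy agree with the reported values to one decimal place, and the instance counts sum to the stated total of 34 932.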