Heesung Shim, Hyojin Kim, Jonathan E Allen, Heike Wulff.
Abstract
The identification of promising lead compounds showing pharmacological activities toward a biological target is essential in early-stage drug discovery. With the recent increase in available small-molecule databases, virtual high-throughput screening using physics-based molecular docking has emerged as an essential tool in assisting fast and cost-efficient lead discovery and optimization. However, the best scored docking poses are often suboptimal, resulting in incorrect screening and chemical property calculation. We address the pose classification problem by leveraging data-driven machine learning approaches to identify correct docking poses from AutoDock Vina and Glide screens. To enable effective classification of docking poses, we present two convolutional neural network approaches: a three-dimensional convolutional neural network (3D-CNN) and an attention-based point cloud network (PCN) trained on the PDBbind refined set. We demonstrate the effectiveness of our proposed classifiers on multiple evaluation data sets including the standard PDBbind CASF-2016 benchmark data set and various compound libraries with structurally different protein targets including an ion channel data set extracted from Protein Data Bank (PDB) and an in-house KCa3.1 inhibitor data set. Our experiments show that excluding false positive docking poses using the proposed classifiers improves virtual high-throughput screening to identify novel molecules against each target protein compared to the initial screen based on the docking scores.
Year: 2022 PMID: 35447030 PMCID: PMC9131459 DOI: 10.1021/acs.jcim.1c01510
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 6.162
List of Our Curated PDB Ion Channel Data Set
| PDB ID | Protein name |
|---|---|
| 1J95 | KCSA potassium channel with TBA (tetrabutylammonium) and potassium |
| 2LY0 | Membrane ion channel M2 solution NMR structure of the influenza A virus S31N mutant (19–49) in presence of drug M2WJ332 |
| 2RLF | Proton channel M2 from influenza A in complex with inhibitor rimantadine |
| 3JAF | Structure of alpha-1 glycine receptor by single particle electron cryomicroscopy, glycine/ivermectin-bound state |
| 4TNW | Avermectin-sensitive glutamate-gated chloride channel GluCl alpha |
| 4XDK | Crystal structure of human two pore domain potassium ion channel TREK2 (K2P10.1) in complex with norfluoxetine |
| 4XDL | Crystal structure of human two pore domain potassium ion channel TREK2 (K2P10.1) in complex with a brominated fluoxetine derivative |
| 5EK0 | Human Nav1.7-VSD4-NavAb in complex with GX-936 |
| 5IS0 | Structure of TRPV1 in complex with capsazepine, determined in lipid nanodisc |
| 5KLG | Structure of CavAb(W195Y) in complex with Br-dihydropyridine derivative UK-59811 |
| 5KMD | Structure of CavAb in complex with amlodipine |
| 5KMF | Structure of CavAb in complex with nimodipine |
| 5KMH | Structure of CavAb in complex with Br-verapamil |
| 5OSC | GLIC-GABAAR alpha1 chimera crystallized in complex with pregnenolone sulfate |
| 5VDH | Crystal structure of human glycine receptor alpha-3 bound to AM-3607, glycine, and ivermectin |
| 5VDI | Crystal structure of human glycine receptor alpha-3 mutant N38Q bound to AM-3607, glycine, and ivermectin |
| 6HUG | CryoEM structure of human full-length alpha1-beta3-gamma2L GABA(A)R in complex with picrotoxin and megabody Mb38 |
| 6JPA | Rabbit Cav1.1-verapamil complex |
| 6JPB | Rabbit Cav1.1-Bay K8644 complex |
| 6JUH | Structure of CavAb in complex with efonidipine |
| 6KEB | Structure basis for Diltiazem block of a voltage-gated calcium channel |
| 6LQA | Voltage-gated sodium channel Nav1.5 with quinidine |
| 6MVX | NavAb voltage-gated sodium channel, I217C, in complex with Class 1C antiarrhythmic flecainide |
| 6RV3 | Crystal structure of the human two pore domain potassium ion channel TASK-1 (K2P3.1) in a closed conformation with a bound inhibitor BAY 1000493 |
| 6RV4 | Crystal structure of the human two pore domain potassium ion channel TASK-1 (K2P3.1) in a closed conformation with a bound inhibitor BAY 2341237 |
| 6SXF | Crystal structure of the voltage-gated sodium channel NavMs (F208L) in complex with Tamoxifen |
| 6UZ0 | Cardiac sodium channel (Nav1.5) with flecainide |
| 6WJ5 | Structure of human TRPA1 in complex with inhibitor GDC-0334 |
| 6X40 | Human GABAA receptor alpha1-beta2-gamma2 subtype in complex with GABA plus picrotoxin |
| 6YSN | Human TRPC5 in complex with Pico145 (HC-608) |
| 7BYM | Cryo-EM structure of human KCNQ4 with retigabine |
| 7BYN | Cryo-EM structure of human KCNQ4 with linopirdine |
| 7CR1 | Human KCNQ2 in complex with ztz240 |
| 7D4P | Structure of human TRPC5 in complex with clemizole |
| 7D4Q | Structure of human TRPC5 in complex with HC-070 |
| 7JUP | Structure of human TRPA1 in complex with antagonist compound 21 |
| 7LQZ | Structure of squirrel TRPV1 in complex with RTX |
| 7MZC | Cryo-EM structure of minimal TRPV1 with RTX bound in C1 state |
| 7MZD | Cryo-EM structure of minimal TRPV1 with RTX bound in C2 state |
| 7RHJ | Cryo-EM structure of human rod CNGA1/B1 channel in |
Figure 1. Example of 20 Vina docking scores with their RMSD values. The RMSD values were calculated between each docking pose and the crystal structure of the 3R17 ligand (human carbonic anhydrase, hCA) from the PDBbind 2019 refined set.
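The pose-to-crystal RMSD described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes both coordinate lists contain the same atoms in the same order, applies no superposition or symmetry correction, and uses a 2.0 Å cutoff for a "correct" pose, which is a common docking convention that this excerpt does not state. The names `rmsd`, `label_pose`, and `RMSD_CUTOFF` are hypothetical.

```python
import math

RMSD_CUTOFF = 2.0  # angstroms; assumed cutoff, not taken from the source


def rmsd(pose, reference):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference must be non-empty and equally long")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return math.sqrt(squared / len(pose))


def label_pose(pose, reference, cutoff=RMSD_CUTOFF):
    """Label a docking pose by its RMSD to the crystal ligand."""
    return "correct" if rmsd(pose, reference) <= cutoff else "incorrect"


# Toy coordinates (made up for illustration only).
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near_pose = [(x + 0.3, y, z) for x, y, z in crystal]        # shifted 0.3 A
far_pose = [(x + 3.0, y + 4.0, z) for x, y, z in crystal]   # shifted 5.0 A

for name, pose in (("near", near_pose), ("far", far_pose)):
    print(name, round(rmsd(pose, crystal), 2), label_pose(pose, crystal))
```

No alignment step is applied because docking poses and the crystal ligand are normally expressed in the same receptor frame; that is also an assumption of this sketch.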
Figure 2. (Left) Closed state of the KCa3.1 channel (PDB ID: 6CNM) and binding site of inhibitors (red box). (Right) General structures of KCa3.1 channel triarylmethane and cyclohexadiene based inhibitors. The four channel alpha subunits are rainbow colored; the channel associated with calmodulin is shown in yellow.
Figure 3. Overall network architecture of the proposed 3D-CNN and PCN. The input to both networks is a set of 3D atomic structures with their features (3D pose representation). The PCN consumes this input directly, whereas the 3D-CNN consumes a voxelized version of it. The optional interaction features are concatenated with the activations of one of the fully connected layers.
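The difference between the two input forms can be illustrated with a small voxelization sketch. It is a simplified stand-in, not the authors' preprocessing: the box size, the resolution, the two-channel feature layout, and the function name `voxelize` are all assumptions made for this example. Each atom carries a position and a feature vector (the point-cloud form); voxelization bins those atoms into a cubic grid and sums the features per cell (the form a 3D-CNN would take).

```python
import math


def voxelize(atoms, center, box_size=24.0, resolution=1.0):
    """Bin atoms into a sparse cubic grid centered on `center`.

    atoms: iterable of (x, y, z, features), features being a list of floats.
    Returns {(i, j, k): summed feature vector}; atoms outside the box are dropped.
    box_size and resolution (in angstroms) are illustrative defaults.
    """
    n_cells = int(round(box_size / resolution))
    half = box_size / 2.0
    grid = {}
    for x, y, z, feats in atoms:
        index = tuple(
            math.floor((coord - origin + half) / resolution)
            for coord, origin in zip((x, y, z), center)
        )
        if not all(0 <= i < n_cells for i in index):
            continue  # atom lies outside the grid
        cell = grid.setdefault(index, [0.0] * len(feats))
        for channel, value in enumerate(feats):
            cell[channel] += value
    return grid


# Hypothetical point-cloud input with two feature channels:
# [is_ligand_atom, is_protein_atom].
atoms = [
    (0.2, 0.2, 0.2, [1.0, 0.0]),
    (0.7, 0.4, 0.1, [0.0, 1.0]),   # falls in the same voxel as the first atom
    (50.0, 0.0, 0.0, [1.0, 0.0]),  # outside the 24 A box, dropped
]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
print(grid)
```

A dense tensor for a convolutional network could be filled from this sparse dictionary; a point-cloud network would instead take the `atoms` list as is.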
Prediction Performance of the Proposed Pose Classification on CASF-2016
| Model | Pose class | No. poses | Accuracy (%) | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|
| RF_i | incorrect | 1633 | 69.9 | 0.70 | 0.96 | 0.81 |
| | correct | 796 | | 0.68 | 0.16 | 0.26 |
| 3D-CNN | incorrect | 1633 | 83.4 | 0.92 | 0.82 | 0.87 |
| | correct | 796 | | 0.70 | 0.86 | 0.77 |
| 3D-CNN_i | incorrect | 1633 | 83.7 | 0.84 | 0.93 | 0.88 |
| | correct | 796 | | 0.82 | 0.65 | 0.72 |
| 3D-CNN_a | incorrect | 1633 | 87.4 | 0.87 | 0.96 | 0.91 |
| | correct | 796 | | 0.89 | 0.71 | 0.79 |
| 3D-CNN_ia | incorrect | 1633 | | 0.93 | 0.89 | 0.91 |
| | correct | 796 | | 0.80 | 0.87 | 0.83 |
| PCN | incorrect | 1633 | 81.4 | 0.86 | 0.86 | 0.86 |
| | correct | 796 | | 0.71 | 0.72 | 0.72 |
| PCN_a | incorrect | 1633 | 82.1 | 0.89 | 0.84 | 0.86 |
| | correct | 796 | | 0.70 | 0.79 | 0.74 |
| PCN_ia | incorrect | 1633 | | 0.90 | 0.92 | 0.91 |
| | correct | 796 | | 0.82 | 0.78 | 0.80 |
Models, from top to bottom: Random Forest with protein–ligand interaction features (RF_i), 3D-CNN, 3D-CNN with protein–ligand interaction features (3D-CNN_i), 3D-CNN with affine transformation (3D-CNN_a), 3D-CNN with both interaction features and affine transformation (3D-CNN_ia), PCN, PCN with affine transformation (PCN_a), and PCN with both interaction features and affine transformation (PCN_ia).
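The columns of the table are linked by standard definitions, which gives a quick consistency check. The sketch below recomputes the F1 score from precision and recall, and approximates overall accuracy from per-class recall and pose counts. Because the tabulated recalls are rounded to two decimals, the recomputed accuracy only agrees with the reported value to within a few tenths of a percentage point. The function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_from_recall(n_incorrect, recall_incorrect, n_correct, recall_correct):
    """Approximate overall accuracy (%) from per-class recall and class sizes.

    recall * class size estimates the number of correctly classified poses
    in each class; their sum over all poses is the accuracy.
    """
    hits = n_incorrect * recall_incorrect + n_correct * recall_correct
    return 100.0 * hits / (n_incorrect + n_correct)


# Values copied from the RF_i and 3D-CNN rows of the CASF-2016 table.
rf_i_acc = accuracy_from_recall(1633, 0.96, 796, 0.16)     # reported: 69.9
cnn_acc = accuracy_from_recall(1633, 0.82, 796, 0.86)      # reported: 83.4
rf_i_f1_correct = f1_score(0.68, 0.16)                     # reported: 0.26

print(round(rf_i_acc, 1), round(cnn_acc, 1), round(rf_i_f1_correct, 2))
```

The same relation can be applied to any row of the table, with the caveat that the result is an estimate limited by the rounding of the recall values.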
Figure 4. ROC curves of the pose classification models (3DCNN, 3DCNN_i, 3DCNN_a, 3DCNN_ia, PCN, PCN_a, PCN_ia, and RF_i) on CASF-2016.
Prediction Performance of the Proposed Pose Classification on PDB Ion Channel Data Set
| Model | Pose class | No. poses | Accuracy (%) | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|
| RF_i | incorrect | 191 | 62.0 | 0.62 | 1.00 | 0.77 |
| | correct | 117 | | 1.00 | 0.01 | 0.02 |
| 3D-CNN | incorrect | 191 | 74.4 | 0.77 | 0.84 | 0.80 |
| | correct | 117 | | 0.69 | 0.59 | 0.64 |
| 3D-CNN_i | incorrect | 191 | 70.5 | 0.69 | 0.94 | 0.80 |
| | correct | 117 | | 0.77 | 0.32 | 0.45 |
| 3D-CNN_a | incorrect | 191 | 62.7 | 0.63 | 0.99 | 0.77 |
| | correct | 117 | | 0.67 | 0.03 | 0.07 |
| 3D-CNN_ia | incorrect | 191 | 75.6 | 0.78 | 0.84 | 0.81 |
| | correct | 117 | | 0.71 | 0.62 | 0.66 |
| PCN | incorrect | 191 | 76.6 | 0.82 | 0.80 | 0.81 |
| | correct | 117 | | 0.68 | 0.72 | 0.70 |
| PCN_a | incorrect | 191 | | 0.94 | 0.78 | 0.85 |
| | correct | 117 | | 0.72 | 0.91 | 0.80 |
| PCN_ia | incorrect | 191 | 81.2 | 0.89 | 0.80 | 0.84 |
| | correct | 117 | | 0.72 | 0.84 | 0.77 |
Figure 5. ROC curves of the pose classification models (3DCNN, 3DCNN_i, 3DCNN_a, 3DCNN_ia, PCN, PCN_a, PCN_ia, and RF_i) on the PDB ion channel data set.
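For readers unfamiliar with how such curves are obtained, the following sketch builds an ROC curve from per-pose confidence scores and binary labels (1 for a correct pose, 0 for an incorrect one) and integrates it with the trapezoidal rule. The labels and scores are invented for illustration, and the helper names `roc_curve` and `auc` are local to this example; the figure itself may have been produced with a different tool.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume all poses tied at this score before emitting a curve point.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


# Made-up confidences for six poses.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
fpr, tpr = roc_curve(labels, scores)
print(round(auc(fpr, tpr), 3))
```

The area equals the fraction of (correct, incorrect) pose pairs in which the correct pose receives the higher confidence, which for this toy input is 8 of 9 pairs.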
Figure 6. Pearson correlations between docking scores and binding affinities using seven pose classification models for the PDB ion channel data set. (a) Vina scores of top-ranked poses, (b) average Vina scores of all docking poses, and (c) average Vina scores across correct poses filtered by the proposed pose classifier (3D-CNN_ia).
Figure 7. Pearson correlations between the docking scores and binding affinities using seven pose classification models for the KCa3.1 channel inhibitor data set (left, Vina; right, Glide). (a) Vina scores of top-ranked poses, (b) average Vina scores of all docking poses, (c) average Vina scores across correct poses filtered by the proposed pose classifier (3D-CNN_i), (d) Glide scores of top-ranked poses, (e) average Glide scores of all docking poses, and (f) average Glide scores across correct poses filtered by the proposed pose classifier (3D-CNN_a).
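The three per-compound scoring schemes compared in these panels (top-ranked score, average over all poses, average over poses the classifier accepts) and their Pearson correlation with binding affinity can be sketched as below. The data are invented, the 0.5 confidence threshold is an assumption, and so is the choice to skip compounds with no accepted pose; none of these details are given in this excerpt. Docking scores are assumed to follow the convention that lower (more negative) is better.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def top_score(poses):
    """Best (lowest) docking score among a compound's poses."""
    return min(score for score, _ in poses)


def mean_score(poses):
    """Average docking score over all poses."""
    return sum(score for score, _ in poses) / len(poses)


def filtered_mean_score(poses, threshold=0.5):
    """Average score over poses with classifier confidence >= threshold, or None."""
    kept = [score for score, conf in poses if conf >= threshold]
    return sum(kept) / len(kept) if kept else None


# Hypothetical compounds: (list of (docking score, classifier confidence), affinity).
compounds = [
    ([(-9.0, 0.9), (-7.0, 0.2), (-8.0, 0.8)], 7.5),
    ([(-9.5, 0.1), (-6.0, 0.7), (-6.4, 0.6)], 5.8),
    ([(-8.0, 0.3), (-7.4, 0.9), (-7.0, 0.8)], 6.6),
]

for scheme in (top_score, mean_score, filtered_mean_score):
    pairs = [(scheme(p), a) for p, a in compounds if scheme(p) is not None]
    scores, affinities = zip(*pairs)
    print(scheme.__name__, round(pearson(scores, affinities), 2))
```

In this toy input the second compound has a well-scored pose that the classifier rejects, so filtering changes its per-compound score considerably; that is the kind of false positive the filtering step is meant to exclude.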
Figure 8. Examples of correct and incorrect docking poses in the PDB ion channel data set with pose classification results (4TNW, top; 4XDK, bottom). Each docking pose is annotated with its RMSD, Vina score, and the model confidence of one of our pose classifiers (3D-CNN_ia), in that order. The model confidence lies in [0, 1], where a value close to 0 indicates an incorrect pose.
Figure 9. Pearson correlations between binding affinity and docking scores of the top 10, 20, 30, and 40 ranked compounds based on the confidence scores of our pose classifier models on the KCa3.1 channel inhibitor data set. 3D-CNN_i with Vina docking poses (left) and 3D-CNN_a with Glide docking poses (right).
Figure 10. pIC50 of the top 10 ranked compounds in the KCa3.1 channel inhibitor data set without (left) and with (right) the pose classifier (3D-CNN_a). Orange indicates strong binders (pIC50 ≥ 7); yellow indicates compounds with 6 ≤ pIC50 < 7.
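For reference, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so a pIC50 of 7 corresponds to an IC50 of 100 nM. The sketch below converts an IC50 given in nanomolar (an assumed input unit) and assigns the two bins named in the caption. The bin labels "moderate" and "weak" and the function names are hypothetical; the caption only names the "strong" class.

```python
import math


def pic50_from_nanomolar(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); input is assumed to be in nM."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def potency_bin(pic50):
    """Bins following the caption: >= 7 is strong, 6 to < 7 is the second band."""
    if pic50 >= 7.0:
        return "strong"    # orange in the figure
    if pic50 >= 6.0:
        return "moderate"  # yellow in the figure; label name is illustrative
    return "weak"          # not highlighted; label name is illustrative


# Made-up IC50 values in nM.
for ic50 in (20.0, 250.0, 5000.0):
    value = pic50_from_nanomolar(ic50)
    print(ic50, round(value, 2), potency_bin(value))
```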