Congzhou M Sha, Jian Wang, Nikolay V Dokholyan.
Abstract
Virtual screening is a cost- and time-effective alternative to traditional high-throughput screening in the drug discovery process. Both virtual screening approaches, structure-based molecular docking and ligand-based cheminformatics, suffer from computational cost, low accuracy, and/or reliance on prior knowledge of a ligand that binds to a given target. Here, we propose a neural network framework, NeuralDock, which accelerates the process of high-quality computational docking by a factor of 10⁶ and does not require prior knowledge of a ligand that binds to a given target. By approximating both protein-small molecule conformational sampling and energy-based scoring, NeuralDock accurately predicts the binding energy and affinity of a protein-small molecule pair, based on protein pocket 3D structure and small molecule topology. We use NeuralDock and 25 GPUs to dock 937 million molecules from the ZINC database against superoxide dismutase-1 in 21 h, which we validate with physical docking using MedusaDock. Due to its speed and accuracy, NeuralDock may be useful in brute-force virtual screening of massive chemical libraries and training of generative drug models.
Keywords: binding affinity; drug screening; machine learning; small molecule screening; virtual docking
Year: 2022 PMID: 35392534 PMCID: PMC8980736 DOI: 10.3389/fmolb.2022.867241
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
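As a rough check of the throughput quoted in the abstract, the figures given there (937 million molecules, 25 GPUs, 21 h) can be converted into a per-GPU rate. This is back-of-the-envelope arithmetic only. It assumes that all GPUs ran for the full 21 h with an even workload, which the abstract does not state.

```python
# Back-of-the-envelope throughput from the abstract's figures.
# Assumption (not stated in the source): all 25 GPUs were busy for the full 21 h.
molecules = 937_000_000
gpus = 25
hours = 21

gpu_seconds = gpus * hours * 3600          # total GPU time in seconds
rate_per_gpu = molecules / gpu_seconds     # molecules per GPU per second
seconds_per_molecule = 1.0 / rate_per_gpu  # time per molecule on one GPU

print(f"{rate_per_gpu:.0f} molecules per GPU per second")
print(f"{seconds_per_molecule * 1000:.2f} ms per molecule per GPU")
```

Under these assumptions the rate works out to roughly 500 molecules per GPU per second, or about 2 ms per molecule on a single GPU.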
FIGURE 1. The neural network architecture on the left and performance comparison with MedusaDock on the right. (A) Inputs, hidden layers, and outputs are shown for the architecture. The protein pocket is flattened and fed into a subnetwork, and the ligand is processed similarly. The outputs of the two subnetworks are concatenated and fed into another subnetwork, which outputs 13 × 7 + 1 values representing the 7 summary statistics of the 13 energies output by MedusaDock, as well as the pK of the protein-ligand pair. The structure of each FC layer is shown at the bottom. (B) The 45 million parameter NeuralDock network achieves class-leading performance on the PDBbind 2013 core set (r = 0.85, p < 0.0001). (C) The 45 million parameter NeuralDock network achieves good agreement with experimentally determined pK on the validation set (r = 0.62, p < 0.0001).
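The output layout described in panel (A) can be made concrete with a little arithmetic: 13 energies × 7 summary statistics plus one pK gives 92 values. The sketch below splits such a flat vector into a 13 × 7 table and a scalar. The ordering (energy-major, pK last) and the function name `split_outputs` are assumptions made for illustration, since the caption does not specify how the values are arranged.

```python
N_ENERGIES = 13   # energy terms reported by MedusaDock (from the caption)
N_STATS = 7       # summary statistics per energy (from the caption)
N_OUTPUTS = N_ENERGIES * N_STATS + 1  # 92 values: statistics plus one pK


def split_outputs(flat):
    """Split a flat 92-value output into (13 x 7 statistics, pK).

    Assumed layout (hypothetical): energy-major statistics first, pK last.
    """
    if len(flat) != N_OUTPUTS:
        raise ValueError(f"expected {N_OUTPUTS} values, got {len(flat)}")
    stats = [
        list(flat[i * N_STATS:(i + 1) * N_STATS])
        for i in range(N_ENERGIES)
    ]
    pk = flat[-1]
    return stats, pk


# Dummy vector standing in for a network output (not real predictions).
dummy = [float(i) for i in range(N_OUTPUTS)]
stats, pk = split_outputs(dummy)
print(len(stats), len(stats[0]), pk)
```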
FIGURE 2. 1UXM A4V SOD1 dimer chains A (gold) and B (lilac), with the protein pocket of interest (green) in billion-molecule docking. A cartoon, stick, and water-accessible surface representation of 1UXM, an A4V mutant SOD1 dimer structure. The image was generated using PyMOL 2.4.0 (Schrödinger, 2021).
Correlation coefficients of NeuralDock predicted minimum energy and MedusaDock output for a variety of architectures.
| Number of hidden layers per subnetwork (total) | Dimension of hidden layer | Number of trainable parameters | Correlation coefficient for E_{without VDWR} |
|---|---|---|---|
| 10 FC blocks (30 total) | 2048 | 83,221,686 | 0.794 |
| 10 FC blocks (30 total) | 1024 | 45,618,267 | 0.838* |
| 6 FC blocks (18 total) | 512 | 12,055,131 | 0.800 |
| 6 FC blocks (18 total) | 256 | 4,913,499 | 0.758 |
| 6 inception blocks (18 total) | N/A | 55,937,979 | 0.775 |
Correlation coefficients for binding affinity prediction of a variety of neural networks.
| Model | PDBbind core set binding affinity correlation | Number of test set structures | Number of training set structures |
|---|---|---|---|
| Def2018 General Ensemble | 0.80 | 280 | 18,450 protein-ligand complexes and 22,584,102 poses |
| KDEEP | 0.82 | 195 | 13,308 protein-ligand complexes |
| TopBP-ML | 0.85 | 195 | 22,886 compounds against each of 102 protein targets |
| NeuralDock | 0.85 | 154 | 2,331 protein-ligand complexes |
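The correlation coefficients reported in these tables are values of r between predicted and reference quantities. Assuming they are Pearson coefficients (the tables do not say which estimator was used), a minimal pure-Python sketch of the computation follows. The function name `pearson_r` and the sample numbers are illustrative and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values, not taken from the paper.
predicted = [4.1, 5.0, 6.2, 7.3, 5.5]
measured = [4.0, 5.4, 5.9, 7.8, 5.1]
print(round(pearson_r(predicted, measured), 3))
```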
FIGURE 3. Comparisons among MedusaDock energies, NeuralDock predicted energies, and experimental binding affinity data. (A) The correlation between NeuralDock predicted E_{without VDWR} and MedusaDock E_{without VDWR} on the validation set (blue circles, r = 0.83, p < 0.0001) and 100 random small molecules docked to 1UXM (orange triangles, r = 0.69, p < 0.0001). NeuralDock performs well on the validation set in predicting MedusaDock energies, and the trend generalizes to 1UXM with no significant difference (2-way ANCOVA F = 0.67, p = 0.41). (B) The correlations of MedusaDock E_{without VDWR} (blue circles) and NeuralDock predicted E_{without VDWR} (magenta triangles) with experimental binding affinity (pK) on the validation set (r = −0.48 for both data sets, p < 0.0001), with no significant difference (2-way ANCOVA F = 1.27, p = 0.26). (C) The 100 small molecules with maximum NeuralDock pK (green triangles), from docking of 936,054,166 small molecules from the ZINC library against 1UXM; the corresponding predicted E_{without VDWR} is plotted (lilac circles). Points farther to the left have higher binding affinity (higher pK) and lower energy (lower E_{without VDWR}). (D) and (E) The relative frequency distributions (300 bins) of NeuralDock predicted pK (mean 4.07, std 0.47) and E_{without VDWR} (mean −36.6, std 4.1), respectively, on 8,099,176 (about 0.9% of total) randomly selected small molecules from the docking of 1UXM. The plots are centered at the means, the x-axis ranges are ± 5 standard deviations from the mean, and colors are repeated from (C). Note that both the E_{without VDWR} values and the pKs in (C) are drawn from the extreme tails of the distributions shown in (D) and (E).
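Panel (C) keeps only the 100 highest-pK molecules out of roughly 936 million predictions, and panels (D) and (E) plot ± 5 standard deviations around each mean. The sketch below illustrates both steps, assuming predictions arrive as a stream of (identifier, pK) pairs. `heapq.nlargest` retains the top k without holding the full list in memory, and a z-score shows how far into the tail a value lies. The identifiers and pK values are invented, and the helper names `top_k_by_pk` and `z_score` are hypothetical; only the mean and standard deviation come from the caption.

```python
import heapq

PK_MEAN, PK_STD = 4.07, 0.47  # predicted-pK statistics from the caption


def top_k_by_pk(predictions, k=100):
    """Return the k (molecule_id, pK) pairs with the highest pK."""
    return heapq.nlargest(k, predictions, key=lambda pair: pair[1])


def z_score(value, mean=PK_MEAN, std=PK_STD):
    """Number of standard deviations between value and the mean."""
    return (value - mean) / std


# Plot range used in panels (D) and (E): mean +/- 5 standard deviations.
x_min, x_max = PK_MEAN - 5 * PK_STD, PK_MEAN + 5 * PK_STD

# Invented predictions; the identifiers are placeholders, not ZINC IDs.
stream = (("mol%d" % i, 3.0 + (i % 50) * 0.1) for i in range(1000))
best = top_k_by_pk(stream, k=3)
print(best[0], round(z_score(best[0][1]), 2))
print(round(x_min, 2), round(x_max, 2))
```

Because the input is consumed as a generator, the same pattern would apply to a prediction stream far larger than memory, at the cost of one pass over the data.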