A Acharya, R Agarwal, M B Baker, J Baudry, D Bhowmik, S Boehm, K G Byler, S Y Chen, L Coates, C J Cooper, O Demerdash, I Daidone, J D Eblen, S Ellingson, S Forli, J Glaser, J C Gumbart, J Gunnels, O Hernandez, S Irle, D W Kneller, A Kovalevsky, J Larkin, T J Lawrence, S LeGrand, S-H Liu, J C Mitchell, G Park, J M Parks, A Pavlova, L Petridis, D Poole, L Pouchard, A Ramanathan, D M Rogers, D Santos-Martins, A Scheinberg, A Sedova, Y Shen, J C Smith, M D Smith, C Soto, A Tsaris, M Thavappiragasam, A F Tillack, J V Vermaas, V Q Vuong, J Yin, S Yoo, M Zahran, L Zanetti-Polzi.
Abstract
We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the proteome of SARS-CoV-2. The MD involves temperature replica exchange enhanced sampling, making use of massively parallel supercomputing to quickly sample the configurational space of protein drug targets. Using the Summit supercomputer at the Oak Ridge National Laboratory, more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to 10 configurations of each of the 24 SARS-CoV-2 systems using AutoDock Vina. Comparison to experiment demonstrates remarkably high hit rates for the top-scoring tranches of compounds identified by our ensemble approach. We also demonstrate that, using AutoDock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 h. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and artificial intelligence (AI) methods to cluster MD trajectories and rescore docking poses.
Year: 2020 PMID: 33326239 PMCID: PMC7754786 DOI: 10.1021/acs.jcim.0c01010
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
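The ensemble docking step described in the abstract, in which each compound is docked into several representative binding-site conformations, can be sketched as a simple aggregation over per-conformation scores. This is a minimal illustration and not the authors' pipeline: the compound IDs and scores are hypothetical, and ranking each compound by its most favorable (lowest) score across the ensemble is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def ensemble_rank(scores: Dict[str, List[float]], top_n: int) -> List[Tuple[str, float]]:
    """Rank compounds by their best score over all receptor conformations.

    `scores` maps a compound ID to one docking score per conformation
    (lower is better, as in AutoDock Vina's kcal/mol convention).
    Keeping the minimum per compound is an assumption of this sketch.
    """
    best = {cid: min(vals) for cid, vals in scores.items() if vals}
    ranked = sorted(best.items(), key=lambda item: item[1])
    return ranked[:top_n]


# Hypothetical scores for three compounds docked to three conformations.
example_scores = {
    "cmpd_A": [-6.1, -7.4, -5.9],
    "cmpd_B": [-8.0, -6.2, -6.5],
    "cmpd_C": [-5.0, -5.3, -5.1],
}
top_two = ensemble_rank(example_scores, top_n=2)
print(top_two)
```

A compound that scores poorly against one conformation can still rank highly if another conformation accommodates it, which is the motivation for docking against an ensemble rather than a single structure.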
Model Systems Simulated
| Protein/System Notes | | |
|---|---|---|
| S (Spike) Protein Receptor Binding Domain (RBD)/“Apo” (PDB:6W41) | S Protein RBD/Complexed with ACE2 (PDB:6W41) | MPro/monomer, CHARMM-GUI default protonation (PDB:6Y2E) |
| MPro/dimer, CHARMM-GUI default protonation | MPro/dimer, “charged” protonation variant (PDB:6WQF) | MPro monomer/HIE41 protonation variant (PDB:6WQF) |
| MPro dimer/HIE protonation variant (PDB:6WQF) | MPro monomer/HID41 protonation variant (PDB:6WQF) | MPro dimer/HID41 protonation variant (PDB:6WQF) |
| NSP15 (endoribonuclease)/hexamer (PDB:6VWW) | NSP15 (endoribonuclease)/monomer (PDB:6VWW) | NSP10:NSP16 complex (methyltransferase) (PDB:6W4H) |
| NSP10/monomer (PDB:6W4H) | NSP16/monomer (PDB:6W4H) | N (nucleocapsid) N-terminus phosphoprotein/monomer (PDB:6M3M) |
| N (nucleocapsid) N-terminus phosphoprotein/tetramer (PDB:6M3M) | N (nucleocapsid) N-terminus phosphoprotein/tetramer complexed with Zn (PDB:6YVO) | N (nucleocapsid) N-terminus phosphoprotein/monomer alternate crystal structure (PDB:6YVO) |
| NSP9/monomer (PDB:6W4B) | NSP9/dimer (PDB:6W4B) | NSP3 ADP ribose phosphatase/asymmetric unit (PDB:6W02) |
| PLPro/monomer “charged” protonation variant (PDB:6W9C) | PLPro/monomer “neutral” variant (PDB:6WRH) | NSP3 ADP ribose phosphatase (PDB:6W02) |
List of Proteins and Binding Sites Used for “Smaller Database” Docking. PPI Refers to a Protein–Protein Interface
| receptor/binding site | receptor/binding site |
|---|---|
| MPro monomer/catalytic pocket | NSP15 monomer/catalytic pocket |
| MPro dimer/PPI | NSP15 dimer/PPI |
| NSP9 dimer/FTMap sites | NSP10 monomer/PPI to NSP16 |
| nucleocapsid phosphoprotein/RNA binding site | NSP16 monomer/PPI to NSP10 |
| nucleocapsid phosphoprotein/PPI | NSP10:NSP16/PPI |
| nucleocapsid tetramer/FTMap sites | NSP3 ADRP domain (asymmetric unit, dimer)/active site |
| NSP3 ADRP domain (monomer) active site | NSP9 monomer/PPI |
In some cases, FTMap was used to identify potential binding sites (see SI Table S2).
Figure 1. Simulation throughput per replica. Each point represents the performance achieved by replica-exchange MD simulations on a single protein/water system. Runs used one replica per node (each node has six GPUs) and between 24 and 40 replicas per system.
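For readers unfamiliar with temperature replica exchange, the exchange step between two neighboring replicas is conventionally accepted with the Metropolis probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T) and E is the potential energy. The sketch below illustrates that standard criterion only; the temperatures and energies are hypothetical, and nothing here reflects the specific MD engine or settings used for the runs in the figure.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance probability for exchanging replicas i and j.

    e_i, e_j: potential energies (kcal/mol) of the configurations currently
    held at temperatures t_i and t_j (kelvin).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# Hypothetical pair: the colder replica (300 K) holds the higher energy,
# so moving the lower-energy configuration to 300 K is always accepted.
p_downhill = swap_probability(-1000.0, -1005.0, 300.0, 310.0)

# Reverse case: the swap is accepted only with probability below 1.
p_uphill = swap_probability(-1005.0, -1000.0, 300.0, 310.0)
print(p_downhill, round(p_uphill, 3))
```

Closely spaced temperatures keep the exponent small and the acceptance rate high, which is one reason replica counts grow with system size.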
Figure 2. Configurational variability of PLPro (PDB: 6WRH) with neutral HIS protonation states. (A) Overlay of 26 RMSD-aligned structures from the lowest temperature replica spanning the 750 ns of sampling. (B) Population distribution for shape anisotropy (κ) and solvent accessible surface area (SASA), with redder colors indicating greater occupancy of these κ-SASA combinations. The distributions are also reflected by one-dimensional histograms above and to the right of the plot, and by black dots within the population distribution, which mark the positions of 10% of the total snapshots considered. (C) Pairwise RMSD clustering for the lowest temperature replica, with the snapshots ordered according to their cluster. The clusters in this instance were defined using a cutoff of half the maximum RMSD observed within the simulation and are labeled by color, with a color bar for reference located above the plot. (D) Pairwise RMSD distribution across all snapshots. (E) Population statistics for the clusters introduced in (C).
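Panel (C) defines clusters with a cutoff of half the maximum pairwise RMSD, and panel (E) reports cluster populations. The caption does not name the clustering algorithm, so the sketch below uses a simple greedy "leader" scheme purely as a stand-in; the distance matrix is hypothetical, and only the half-maximum cutoff rule comes from the caption.

```python
from collections import Counter
from typing import List


def half_max_cutoff(dist: List[List[float]]) -> float:
    """Cutoff equal to half of the largest pairwise RMSD in the matrix."""
    return 0.5 * max(max(row) for row in dist)


def leader_cluster(dist: List[List[float]], cutoff: float) -> List[int]:
    """Greedy leader clustering (an assumed stand-in algorithm).

    Each snapshot joins the first cluster whose leader lies within `cutoff`;
    otherwise it becomes the leader of a new cluster.
    """
    leaders: List[int] = []
    labels: List[int] = []
    for i in range(len(dist)):
        for label, leader in enumerate(leaders):
            if dist[i][leader] <= cutoff:
                labels.append(label)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical symmetric pairwise RMSD matrix (angstroms) for four snapshots.
rmsd = [
    [0.0, 1.0, 4.0, 4.2],
    [1.0, 0.0, 3.8, 4.0],
    [4.0, 3.8, 0.0, 0.6],
    [4.2, 4.0, 0.6, 0.0],
]
cutoff = half_max_cutoff(rmsd)
labels = leader_cluster(rmsd, cutoff)
populations = Counter(labels)  # cluster label -> number of snapshots
print(cutoff, labels, dict(populations))
```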
Figure 3. Configurational variability of the PLPro (PDB: 6WRH) active site region, generally bounded by the black dashed lines; this is the next step in analysis after Figure . Each of the differently colored aligned protein models represents the center of a populous cluster, as defined by active site conformation RMSD. Residues such as R164, E165, Y266, Q267, and F302 vary substantially in conformation and highlight the conformational variation within the ensemble created through T-REMD. For clearer visualization, only residues 91 and onward for PLPro are shown, as this selection was used for active site alignment. Within the VMD[122] rendering, side chains are displayed without their hydrogens.
Figure 4. Distribution of the number of compounds that appear in the top-500 selections of n targets, out of 9014 compounds.
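The distribution in this figure can be thought of as a count, for each compound, of how many targets place it in their top-ranked list. A minimal sketch follows; the target names and compound IDs are hypothetical, and the list length is left to the caller rather than fixed at 500.

```python
from collections import Counter
from typing import Dict, Iterable


def overlap_distribution(top_lists: Dict[str, Iterable[str]]) -> Counter:
    """Return a Counter mapping n -> number of compounds found in exactly n lists."""
    per_compound: Counter = Counter()
    for compounds in top_lists.values():
        # A set guards against a compound being listed twice for one target.
        per_compound.update(set(compounds))
    return Counter(per_compound.values())


# Hypothetical top lists for three targets.
toy_lists = {
    "target_1": ["a", "b", "c"],
    "target_2": ["b", "c", "d"],
    "target_3": ["c", "e"],
}
distribution = overlap_distribution(toy_lists)
print(dict(distribution))
```

Compounds at the high-n end of such a distribution score well against many targets and may reflect nonspecific binders rather than target-selective hits.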
Number of Duplicate Compounds Found in the Top 500 Lists in Specific Pairs of Proteins
In grey/diagonal: the number of compounds unique to the corresponding target/site in the respective top 500 lists.
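A table of this shape, with shared compounds off the diagonal and target-unique compounds on the diagonal, can be built with set operations. The sketch below is illustrative only: the target names and compound IDs are hypothetical and do not reproduce the paper's data.

```python
from typing import Dict, Iterable, Tuple


def pairwise_overlap(top_lists: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], int]:
    """Off-diagonal: compounds shared by two lists. Diagonal: compounds unique to one list."""
    names = sorted(top_lists)
    sets = {name: set(top_lists[name]) for name in names}
    matrix: Dict[Tuple[str, str], int] = {}
    for a in names:
        for b in names:
            if a == b:
                # Everything that appears in any other target's list.
                others = set().union(*(sets[o] for o in names if o != a))
                matrix[(a, b)] = len(sets[a] - others)
            else:
                matrix[(a, b)] = len(sets[a] & sets[b])
    return matrix


# Hypothetical top lists for three targets.
pair_lists = {
    "target_1": ["a", "b", "c"],
    "target_2": ["b", "c", "d"],
    "target_3": ["c", "e"],
}
overlap = pairwise_overlap(pair_lists)
print(overlap[("target_1", "target_2")], overlap[("target_1", "target_1")])
```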
Number of Top-Scoring Computationally Predicted Compounds, Corresponding NCATS-Tested Compounds As a Subset from First Column, Percentage of Strong and Strong+Moderately Active Compounds for the Spike Protein (Top) and MPro (Bottom) Targets
| no. of top compounds (docking) | no. of corresponding compounds tested (NCATS) | percentage of NCATS actives (strong) | percentage of NCATS actives (strong+moderate) |
|---|---|---|---|
| Spike | | | |
| 673 | 235 | 14.0% | 53.2% |
| 420 | 158 | 17.1% | 57.0% |
| 292 | 108 | 20.4% | 61.1% |
| 149 | 55 | 25.5% | 67.3% |
| 81 | 27 | 33.3% | 77.8% |
| 17 | 4 | 100.0% | 100.0% |
| MPro | | | |
| 968 | 359 | - | 7.0% |
| 648 | 221 | - | 6.8% |
| 459 | 156 | - | 7.7% |
| 248 | 86 | - | 9.3% |
| 136 | 45 | - | 8.9% |
| 32 | 7 | - | 14.3% |
Top compounds from docking were obtained from the top 500, 300, 200, 100, and 50 ranked lists that correspond to each of the spike and MPro targets. For both systems multiple docking runs were considered and only unique compounds are reported.
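The percentages in the table are hit rates: the number of experimentally active compounds divided by the number of docking-selected compounds that were actually tested. The sketch below shows that arithmetic. The active count of 9 is inferred here from the reported 33.3% of 27 tested compounds and is not stated in the table, and the baseline rate passed to `enrichment` is a hypothetical value chosen only to demonstrate the call.

```python
def hit_rate(n_active: int, n_tested: int) -> float:
    """Percentage of tested compounds that were experimentally active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return 100.0 * n_active / n_tested


def implied_actives(rate_percent: float, n_tested: int) -> int:
    """Back-compute the nearest whole number of actives from a reported rate."""
    return round(rate_percent / 100.0 * n_tested)


def enrichment(rate_percent: float, baseline_percent: float) -> float:
    """Ratio of a hit rate to a baseline rate (baseline must be supplied by the caller)."""
    return rate_percent / baseline_percent


# Spike row with 27 tested compounds and a reported 33.3% strong actives.
n_active = implied_actives(33.3, 27)      # inferred, not given in the table
rate = round(hit_rate(n_active, 27), 1)

# Hypothetical 10% baseline, used only to illustrate the function.
fold = round(enrichment(hit_rate(n_active, 27), 10.0), 2)
print(n_active, rate, fold)
```

Rows with very few tested compounds, such as the one with 4 tested, give rates that move in large steps with each additional active, so they are best read with the sample size in mind.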
Figure 5. Comparison of S-protein (Spike) true-positive rates for strong actives. The solid line shows the percentage of experimental NCATS positives among the top computationally predicted chemicals. The dashed line shows the constant NCATS positive rate for comparison.
Figure 6. General benchmarking of AutoDock-GPU and AutoDock Vina performance against a subset of the Enamine database.
Figure 7. Example mutational entropy analysis. Residues are colored by entropy, with redder colors corresponding to greater entropy.
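The caption does not define how the entropy is computed. One common choice, assumed here for illustration, is the Shannon entropy of the residue distribution at each position of a multiple sequence alignment. The sequences below are hypothetical, and gap characters, if present, would simply be treated as another symbol in this sketch.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(sequences: List[str]) -> List[float]:
    """Per-position entropy for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical four-sequence, three-column alignment.
toy_alignment = ["MKV", "MRV", "MKV", "MKI"]
entropies = alignment_entropy(toy_alignment)
print([round(h, 3) for h in entropies])
```

Fully conserved positions give an entropy of zero, while positions that tolerate many substitutions give higher values.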
Figure 8. Deep learning clusters T-REMD simulations of the NSP15 hexameric complex into conformational states that are potentially relevant for docking studies. (A) A 3D representation of the CVAE learned from the T-REMD simulations shows the presence of multiple conformational states. Each conformation from the simulation is colored by its RMSD to the starting structure, revealing distinct directions in the conformational landscape along which low- and high-RMSD structures are distributed. To understand this representation better, we use a t-distributed stochastic neighbor embedding (t-SNE) algorithm to embed the data into a low-dimensional space, where we can clearly visualize how the conformational landscape is organized. In this two-dimensional space, we visualize various observables from the simulations, including (B) RMSD to the native structure, (C) SASA, and (D) radius of gyration. In each of these cases, we can observe the presence of at least three dominant substates with distinct structural characteristics, which can be further used for docking simulations.
Figure 9. Pair interaction energy (PIE) decomposition analysis for FMO–DFTB/PCM plotted against FMO–MP2/3-21G/PCM data.
Binding Energy of the Top-3 Best Ranked by FMO–DFTB/PCM and Their Binding Free Energy Predicted by AutoDock Vina
| ligand SWEETLEAD ID | protein cluster ID | FMO–DFTB/PCM ΔE | Vina ΔG |
|---|---|---|---|
| 4752 | 7 | –67.75 | –5.40 |
| 7055 | 11 | –66.78 | –7.60 |
| 4698 | 12 | –66.41 | –7.60 |