Vishwesh Venkatraman, Thomas H Colligan, George T Lesica, Daniel R Olson, Jeremiah Gaiser, Conner J Copeland, Travis J Wheeler, Amitava Roy.
Abstract
The SARS-CoV-2 pandemic has highlighted the importance of efficient and effective methods for identification of therapeutic drugs, and in particular has laid bare the need for methods that allow exploration of the full diversity of synthesizable small molecules. While classical high-throughput screening methods may consider up to millions of molecules, virtual screening methods hold the promise of enabling appraisal of billions of candidate molecules, thus expanding the search space while concurrently reducing costs and speeding discovery. Here, we describe a new screening pipeline, called drugsniffer, that is capable of rapidly exploring drug candidates from a library of billions of molecules, and is designed to support distributed computation on cluster and cloud resources. As an example of performance, our pipeline required ∼40,000 total compute hours to screen for potential drugs targeting three SARS-CoV-2 proteins among a library of ∼3.7 billion candidate molecules.
Keywords: SARS-CoV-2; computer aided drug design; de novo design; machine learning; protein-ligand docking; virtual screening
Year: 2022 PMID: 35559261 PMCID: PMC9086895 DOI: 10.3389/fphar.2022.874746
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.988
Several open-access software tools for virtual screening. In a number of the tools, such as dockECR and VirtualFlow, multiple docking programs are used to predict scores between a single target or multiple targets (merging and shrinking approach) and a library of compounds. The AMIDE software carries out large-scale chemical ligand docking over a large dataset of proteins with the aim of identifying potential side effects of new drugs. iDrug, Pharmit (for structure-based pharmacophore modeling), iStar, e-LEA3D, USR-VS (3D shape-based similarity), MTiOpenScreen, and ChemicalToolbox are web-based platforms for computer-aided drug design. ChemicalToolbox allows for integration with other tools and workflows (molecular dynamics) that are part of the Galaxy software framework (https://galaxyproject.org/). e-LEA3D uses a de novo drug design strategy that identifies fragments, or combinations of fragments, that fit a QSAR model or the binding site of a protein. LBVS, ligand-based virtual screening; SBVS, structure-based virtual screening.
* iDrug uses a pocket structure to define the pharmacophore descriptors needed for LBVS. However, it does not explicitly calculate the interaction between a ligand and the pocket, as docking does. In our opinion, it is only marginally SBVS.
| Software | LBVS | SBVS | ADMET |
|---|---|---|---|
| dockECR | ✗ | ✓ | ✗ |
| MolAr | ✗ | ✓ | ✗ |
| iDrug | ✓ | ✓ | ✗ |
| ChemicalToolbox | ✗ | ✓ | ✓ |
| VirtualFlow | ✗ | ✓ | ✓ |
| AMIDE | ✗ | ✓ | ✗ |
| VSPipe | ✗ | ✓ | ✗ |
| DockBlaster | ✗ | ✓ | ✗ |
| e-LEA3D | ✗ | ✓ | ✗ |
| Pharmit | ✓ | ✗ | ✗ |
| iStar | ✗ | ✓ | ✗ |
| USR-VS | ✓ | ✗ | ✗ |
| MTiOpenScreen | ✗ | ✓ | ✗ |
| DrugSniffer | ✓ | ✓ | ✓ |
FIGURE 1. Outline of the drugsniffer virtual screening pipeline. The stages are: (1) model the targets (e.g., using AlphaFold, or a crystal structure where available); (2) identify possible binding sites/pockets (e.g., using FPocket); (3) design multiple de novo ligands for the target pockets using AutoGrow; (4) use the designed molecules as seeds to identify similar compounds from small-molecule libraries (using ECFP4 fingerprints as found in RDKit); (5) dock the molecules identified by the similarity search (using AutoDock Vina) and calculate the interaction energy between the target and the docked poses; (6) re-score the best-docked poses of all the molecules using our new scoring function (terms for the function are provided by SMINA and DLIGAND2); (7) identify potentially toxic compounds using our fast ADMET analyzer (using FP-ADMET).
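Stage (4) can be illustrated with a minimal, dependency-free sketch. It assumes each molecule has already been reduced to the set of "on" bit indices of its ECFP4 fingerprint (in practice these would come from RDKit), and it assumes the Tanimoto coefficient with a cutoff of 0.5 as the similarity criterion. The caption names neither the metric nor a threshold, so both are illustrative choices, as are the function names and the toy fingerprints.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set in a fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_to_seed(
    seed: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> List[Tuple[str, float]]:
    """Return library entries at least `threshold` similar to the seed, best first."""
    scored = ((name, tanimoto(seed, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (hypothetical bit indices standing in for real ECFP4 output).
seed_fp = frozenset({1, 5, 9, 12})
library = {
    "mol_a": frozenset({1, 5, 9, 12}),  # identical to the seed
    "mol_b": frozenset({1, 5, 9, 40}),  # 3 shared bits out of 5 in the union
    "mol_c": frozenset({2, 3}),         # no bits in common
}
hits = similar_to_seed(seed_fp, library)
print(hits)
```

Only the molecules that pass this filter would move on to the docking stage, which is what keeps the docking workload far below the size of the full library.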
The small molecule databases searched as part of the VS protocol.
| Database | Number of ligands |
|---|---|
| Sweetlead | ≈4,000 |
| DrugBank | ≈10,000 |
| MolPort | ≈7,600,000 |
| PubChem | ≈103,000,000 |
| ZINC15 | ≈417,000,000 |
| GDB | ≈1,003,000,000 |
| SAVI | ≈1,009,000,000 |
| Enamine | ≈1,200,000,000 |
| Total | ≈3,700,000,000 |
https://simtk.org/projects/sweetlead
https://www.drugbank.ca/releases/latest
https://www.molport.com/shop/libraries-collections
http://ftp.ncbi.nlm.nih.gov/pubchem/Compound/
http://files.docking.org/catalogs/
http://gdb.unibe.ch/downloads/
https://cactus.nci.nih.gov/download/savi_download/
https://enamine.net/library-synthesis/real-compounds/real-database
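As a quick arithmetic check, the sketch below sums the approximate library sizes from the table and relates the total to the ∼40,000 compute hours quoted in the abstract. The per-molecule figure is only an amortized average over the whole library: it assumes nothing about how the time is split between pipeline stages, and most molecules are presumably filtered out before docking.

```python
# Approximate library sizes, copied from the table above.
library_sizes = {
    "Sweetlead": 4_000,
    "DrugBank": 10_000,
    "MolPort": 7_600_000,
    "PubChem": 103_000_000,
    "ZINC15": 417_000_000,
    "GDB": 1_003_000_000,
    "SAVI": 1_009_000_000,
    "Enamine": 1_200_000_000,
}

total_molecules = sum(library_sizes.values())

# Total compute time reported in the abstract for three SARS-CoV-2 targets.
compute_hours = 40_000
seconds_per_molecule = compute_hours * 3600 / total_molecules

print(f"total library size: {total_molecules:,}")
print(f"amortized cost: {seconds_per_molecule:.3f} s per library molecule")
```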
FIGURE 2. Affinity prediction model. The model consists of three separate paths from input to output, each composed of five sequential fully-connected (FC) layers. Each path uses a separate set of activation functions, allowing the network to learn diverse representations of the input. The outputs of the three paths are concatenated and passed through a final FC layer that emits a prediction of binding or non-binding. FC layers are represented with blue blocks. The number of nodes in each FC layer is indicated below the block. Activation functions applied to the output of the FC layers are shown in circles. The model was trained for 2,000 epochs with a batch size of 8,192 using the Adam optimizer (Diederik and Ba, 2014) with default β₁ and β₂ parameters and a learning rate of 0.001. Dropout (Srivastava et al., 2014) with p = 0.5 was applied after each fully connected layer during training, and also during validation.
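The three-path layout described in the caption can be sketched as a plain-Python forward pass. This is a structural illustration only, not the authors' model: the layer widths, the choice of one activation per path (ReLU, tanh, sigmoid), the sigmoid on the output, and the random weights are all assumptions, since the caption gives those details only in the figure itself. Dropout and training are omitted.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases)


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical choices: one activation per path and small layer widths.
ACTIVATIONS: List[Callable[[float], float]] = [relu, math.tanh, sigmoid]
HIDDEN_SIZES = [16, 16, 8, 8, 4]  # five FC layers per path


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights in a small symmetric range, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, layer: Layer) -> Vector:
    """Fully-connected layer: one output per weight row."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def build_model(n_features: int, rng: random.Random):
    """Three independent five-layer paths plus a final FC layer on their concatenation."""
    paths = []
    for _ in ACTIVATIONS:
        layers, n_in = [], n_features
        for n_out in HIDDEN_SIZES:
            layers.append(init_layer(n_in, n_out, rng))
            n_in = n_out
        paths.append(layers)
    head = init_layer(len(ACTIVATIONS) * HIDDEN_SIZES[-1], 1, rng)
    return paths, head


def forward(model, x: Vector) -> float:
    """Run each path on the same input, concatenate, and score with the final layer."""
    paths, head = model
    merged: Vector = []
    for layers, activation in zip(paths, ACTIVATIONS):
        h = x
        for layer in layers:
            h = [activation(v) for v in dense(h, layer)]
        merged.extend(h)
    return sigmoid(dense(merged, head)[0])  # assumed: probability-like binding score


rng = random.Random(0)
model = build_model(n_features=10, rng=rng)
score = forward(model, [0.1] * 10)
print(f"untrained binding score: {score:.3f}")
```

In a real implementation the same structure would normally be written with a deep-learning framework, which would also supply the Adam optimizer and dropout mentioned in the caption.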
Software used in the VS pipeline.
| Software | Comments |
|---|---|
| RDKit | Routines for ECFP4 fingerprint generation |
| Chemistry Development Kit | logP estimation routines |
| OpenBabel | Interconversion of chemical file formats |
| MGLTools | Interconversion of chemical file formats |
| AutoDock Vina | Protein-ligand docking |
| DLigand2 | Statistical potential term for protein-ligand binding affinity prediction |
| SMINA | Scoring terms for protein-ligand binding affinity prediction |
| AUTOGROW4 | De novo ligand design |
| FP-ADMET | Prediction of ADMET properties |
https://www.rdkit.org
https://cdk.github.io/
http://openbabel.org/wiki/Main_Page
https://ccsb.scripps.edu/mgltools/downloads/
https://github.com/ccsb-scripps/AutoDock-Vina
https://github.com/sysu-yanglab/DLIGAND2
https://github.com/mwojcikowski/smina
https://git.durrantlab.pitt.edu/jdurrant/autogrow4
https://gitlab.com/vishsoft/fpadmet
FIGURE 3. Test data consisting of 3,900 ligand-protein pairs and 213,000 decoy-protein pairs were analyzed with the tools listed in the legend, with each tool producing a binding affinity estimate for every pair. Default parameters were used for all tools; our model was trained as described in the text. A ROC curve was produced for each tool, based on the list of pairs sorted by predicted affinity.
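The ROC construction described in the caption can be sketched as follows. The example assumes scores oriented so that a higher value means stronger predicted binding (docking energies, where lower is better, would need their sign flipped first) and does not treat tied scores specially. The function names and the five toy pairs are illustrative, not data from the study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Walk down the list sorted by predicted affinity, tracing (FPR, TPR) points.

    labels: 1 for a true ligand-protein pair, 0 for a decoy-protein pair.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, is_ligand in ranked:
        if is_ligand:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example: two ligands and three decoys.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]
points = roc_curve(scores, labels)
print(f"AUC = {auc(points):.3f}")
```

With roughly 55 decoys per true ligand, as in the test set described above, the early part of the curve (low false positive rate) is usually the most informative region for screening.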
FIGURE 4. Bar plots showing the distribution of the predicted classes for the different ADMET endpoints. EPA1 is the highest toxicity category (LD50 ≤ 50 mg/kg). EPA2 (moderately toxic) includes chemicals with 50 < LD50 ≤ 500 mg/kg. EPA3 (slightly toxic) includes chemicals with 500 < LD50 ≤ 5,000 mg/kg. Safe chemicals (LD50 > 5,000 mg/kg) are included in EPA4. Green indicates compounds that are better suited for further study.
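The LD50 thresholds in the caption translate directly into a small lookup. The function below is a minimal sketch of that binning only; the function name is hypothetical, and it does not reproduce how FP-ADMET predicts the class from a fingerprint.

```python
def epa_category(ld50_mg_per_kg: float) -> str:
    """Map an LD50 value (mg/kg) to the EPA toxicity category used in the caption."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    if ld50_mg_per_kg <= 50:
        return "EPA1"  # highest toxicity
    if ld50_mg_per_kg <= 500:
        return "EPA2"  # moderately toxic
    if ld50_mg_per_kg <= 5000:
        return "EPA3"  # slightly toxic
    return "EPA4"      # considered safe


for ld50 in (25, 50, 300, 5000, 12000):
    print(ld50, epa_category(ld50))
```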