Agastya P Bhati, Shunzhou Wan, Dario Alfè, Austin R Clyde, Mathis Bode, Li Tan, Mikhail Titov, Andre Merzky, Matteo Turilli, Shantenu Jha, Roger R Highfield, Walter Rocchia, Nicola Scafuri, Sauro Succi, Dieter Kranzlmüller, Gerald Mathias, David Wifling, Yann Donon, Alberto Di Meglio, Sofia Vallecorsa, Heng Ma, Anda Trifan, Arvind Ramanathan, Tom Brettin, Alexander Partin, Fangfang Xia, Xiaotan Duan, Rick Stevens, Peter V Coveney.
Abstract
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. A major bottleneck is screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The resulting workflow operates at a scale that depends on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins, and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.
Keywords: artificial intelligence; free energy predictions; machine learning; molecular dynamics; novel drug design
Year: 2021 PMID: 34956592 PMCID: PMC8504892 DOI: 10.1098/rsfs.2021.0018
Source DB: PubMed Journal: Interface Focus ISSN: 2042-8898 Impact factor: 3.906
Figure 1. Integrated modelling pipeline for new COVID-19 treatments where blind ML is made ‘smarter’ with accurate PB methods. It represents an entire virtual drug discovery pipeline, from hit to lead through to lead optimization. The constituent components are a DL-based surrogate model for docking (ML1), Autodock-GPU (S1), DeepDriveMD (S2), and coarse- and fine-grained binding free energies (S3-CG and S3-FG). The arrows show the information transferred between the different methods. (Source: IMPECCABLE [55].)
Classes of secondary structure that DSSP defines.
| letter ID | number ID | class of the secondary structure |
|---|---|---|
| G | 0 | 3₁₀ helix (first helix) |
| H | 1 | α-helix |
| I | 2 | π-helix |
| E | 3 | extended strand (β-sheet) |
| B | 4 | isolated β-bridge |
| T | 5 | helix turn |
| S | 6 | bend |
| C | 7 | coil (no SS found) |
Jensen–Shannon divergence (a measure of the similarity between two probability distributions, bounded between 0 and 1) between predicted and observed secondary-structure class transitions in the 6VXX trajectory of the spike protein system. Data are presented in decreasing order of similarity. Each label gives the initial and final class. Identical initial and final classes mean that, after some oscillation, the residue returns to its initial class.
| transition | Jensen–Shannon divergence |
|---|---|
| ‘43’ | 4.91 × 10⁻³ |
| ‘34’ | 5.67 × 10⁻³ |
| ‘01’ | 7.70 × 10⁻³ |
| ‘33’ | 1.02 × 10⁻² |
| ‘12’ | 1.21 × 10⁻² |
| ‘00’ | 1.62 × 10⁻² |
| ‘11’ | 1.93 × 10⁻² |
| ‘21’ | 4.38 × 10⁻² |
| ‘22’ | 6.14 × 10⁻² |
| ‘44’ | 6.51 × 10⁻² |
| ‘10’ | 1.54 × 10⁻¹ |
| ‘04’ | 3.68 × 10⁻¹ |
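The divergence reported in this table can be illustrated with a minimal sketch. It assumes the base-2 logarithm (which gives the stated 0 to 1 bound) and that each transition label is a pair of number IDs from the DSSP table. The function names and the example histograms are hypothetical and are not data from the study.

```python
import math

# DSSP letter codes ordered by the number ID given in the table (G=0 ... C=7).
DSSP_LETTERS = "GHIEBTSC"

def decode_transition(label):
    """Map a two-digit label such as '43' to (initial, final) DSSP letters.
    Assumes each digit is a number ID from the DSSP table."""
    return DSSP_LETTERS[int(label[0])], DSSP_LETTERS[int(label[1])]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 add 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithm, bounded in [0, 1]."""
    # Normalise so that raw histogram counts are accepted as input.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical histograms of a transition property (illustrative only).
predicted = [10, 20, 30, 40]
observed = [12, 18, 33, 37]
print(decode_transition("43"), js_divergence(predicted, observed))
```

Identical distributions give a divergence of 0 and distributions with no overlap give 1, so small values in the table indicate close agreement between predicted and observed transitions.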
Figure 2. Structures of the four target proteins studied, in each case shown bound to a compound. From left to right: 3CLPro, PLPro, ADRP and NSP15. The proteins are shown in cartoon representation and compounds in stick representation.
Overview of computing cost for the different calculations in the computing pipeline on Oak Ridge National Laboratory's Summit supercomputer.
| calculation | physical time required in each MD simulation (ns) | no. independent MD simulations per ligand–protein complex | computing time per calculation (node-hours) | computing time per ligand–protein complex (node-hours) | used theoretical performance (TF) |
|---|---|---|---|---|---|
| docking | — | several thousands | 0.0001 | — | — |
| ESMACS | 12 | 25 | — | 10 | 420 |
| TIES | 6 | 65 | — | 700 | 29 400 |
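The per-complex figures in this table can be cross-checked with simple arithmetic. The sketch below copies the ESMACS and TIES rows and derives the aggregate simulated time per complex and the ratio of the last two columns. The helper names are hypothetical, and the campaign estimate assumes every complex costs exactly the tabulated number of node-hours.

```python
# Per-complex figures copied from the table above (Summit).
COSTS = {
    "ESMACS": {"ns_per_sim": 12, "n_sims": 25, "node_hours": 10, "tf": 420},
    "TIES": {"ns_per_sim": 6, "n_sims": 65, "node_hours": 700, "tf": 29_400},
}

def aggregate_ns(method):
    """Total simulated time per ligand-protein complex, in ns."""
    row = COSTS[method]
    return row["ns_per_sim"] * row["n_sims"]

def tf_per_node_hour(method):
    """Ratio of the 'used theoretical performance' column to node-hours."""
    row = COSTS[method]
    return row["tf"] / row["node_hours"]

def campaign_node_hours(method, n_complexes):
    """Node-hours for n_complexes, assuming each costs the tabulated amount."""
    return COSTS[method]["node_hours"] * n_complexes

for method in COSTS:
    print(method, aggregate_ns(method), tf_per_node_hour(method))

# Illustrative campaign: ESMACS on 100 compounds for each of 4 targets.
print(campaign_node_hours("ESMACS", 100 * 4))
```

Both rows give the same ratio between the last two columns, which suggests that the performance column scales directly with node-hours.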
Number of the most promising compounds for each of the four proteins investigated. For each protein, the top 100 compounds, chosen from 10 000 docked small molecules, are evaluated by the ESMACS approach. Listed are the numbers of compounds whose binding free energies fall in the ranges bounded by K values of 10 nM (−10.98 kcal mol⁻¹), 100 nM (−9.61 kcal mol⁻¹) and 1 µM (−8.24 kcal mol⁻¹).
| energy (kcal mol⁻¹) | 3CLPro | ADRP | NSP15 | PLPro |
|---|---|---|---|---|
| ΔG < −10.98 | 1 | 0 | 3 | 6 |
| −10.98 ≤ ΔG < −9.61 | 2 | 2 | 1 | 8 |
| −9.61 ≤ ΔG < −8.24 | 1 | 4 | 10 | 5 |
| ΔG < −8.24 (total) | 4 | 6 | 14 | 19 |
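The energy thresholds quoted in the caption follow from the standard relation ΔG = RT ln K. The sketch below is a minimal illustration, assuming K is a dissociation-type constant in mol/L relative to a 1 M standard state and a temperature of 300 K. The temperature is an assumption chosen because it reproduces the quoted values; it is not stated in this excerpt. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def binding_free_energy(k_molar, temperature=300.0):
    """Binding free energy (kcal/mol) from a constant K in mol/L: dG = RT ln K.
    The 300 K default is an assumption, not a value from the source."""
    return R_KCAL * temperature * math.log(k_molar)

def affinity_bin(dg):
    """Assign a free energy (kcal/mol) to the ranges used in the table."""
    if dg < -10.98:
        return "better than 10 nM"
    if dg < -9.61:
        return "10 nM to 100 nM"
    if dg < -8.24:
        return "100 nM to 1 uM"
    return "weaker than 1 uM"

for k in (10e-9, 100e-9, 1e-6):
    print(k, round(binding_free_energy(k), 2))
```

More negative free energies correspond to smaller K and hence tighter binding, which is why the first row of the table holds the strongest predicted binders.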
Figure 3. Correlations between the ESMACS results and docking predictions. Weak or no correlations are obtained for the four protein targets (3CLPro, ADRP, NSP15 and PLPro), with correlation coefficients of 0.25, −0.06, 0.16 and 0.20, respectively.
Results from TIES calculations on a set of ligand transformations studied for ADRP. ΔΔG is the relative binding affinity for a transformation, that is, the change in binding affinity on morphing one ligand into the other. σ is the uncertainty associated with the relative binding affinity predicted by TIES.
| transformation | ΔΔG | σ |
|---|---|---|
| a0–a2 | 1.48 | 0.60 |
| a0–a4 | 1.82 | 0.66 |
| a0–a5 | 1.14 | 0.60 |
| a0–a6 | 3.22 | 0.44 |
| a0–a7 | 1.32 | 0.43 |
| a0–a9 | 0.25 | 0.57 |
| a0–a10 | 1.52 | 0.70 |
| a0–a41 | 3.41 | 0.53 |
| a0–a44 | 1.18 | 0.49 |
| a0–a45 | −0.46 | 0.52 |
| a0–a46 | 2.91 | 0.70 |
| a0–a47 | 0.36 | 0.57 |
| a0–a48 | −0.55 | 0.57 |
| a0–a49 | 1.84 | 0.46 |
| a0–a50 | 0.52 | 0.64 |
| a1–a42 | −0.29 | 0.82 |
| a1–a43 | 2.05 | 1.03 |
| a3–a42 | 0.49 | 0.81 |
| a42–a43 | 4.62 | 0.82 |
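One way to read this table is to ask which relative binding affinities are clearly distinguishable from zero given their uncertainties. The sketch below flags transformations whose magnitude exceeds twice the reported σ. The 2σ threshold is an illustrative assumption, not a criterion taken from the study, and the function name is hypothetical. Values are copied from the table, with ASCII hyphens in the labels.

```python
# (transformation, relative binding affinity, sigma) copied from the table above.
TIES_RESULTS = [
    ("a0-a2", 1.48, 0.60), ("a0-a4", 1.82, 0.66), ("a0-a5", 1.14, 0.60),
    ("a0-a6", 3.22, 0.44), ("a0-a7", 1.32, 0.43), ("a0-a9", 0.25, 0.57),
    ("a0-a10", 1.52, 0.70), ("a0-a41", 3.41, 0.53), ("a0-a44", 1.18, 0.49),
    ("a0-a45", -0.46, 0.52), ("a0-a46", 2.91, 0.70), ("a0-a47", 0.36, 0.57),
    ("a0-a48", -0.55, 0.57), ("a0-a49", 1.84, 0.46), ("a0-a50", 0.52, 0.64),
    ("a1-a42", -0.29, 0.82), ("a1-a43", 2.05, 1.03), ("a3-a42", 0.49, 0.81),
    ("a42-a43", 4.62, 0.82),
]

def distinguishable(results, n_sigma=2.0):
    """Return transformations with |value| > n_sigma * sigma.
    The default n_sigma = 2 is an illustrative choice."""
    return [name for name, ddg, sigma in results if abs(ddg) > n_sigma * sigma]

flagged = distinguishable(TIES_RESULTS)
print(len(flagged), "of", len(TIES_RESULTS), "exceed 2 sigma:", flagged)
```

A stricter or looser threshold can be explored by changing `n_sigma`; transformations that fall below it are those where the two ligands cannot be confidently ranked against each other.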