| Literature DB >> 34070140 |
Marko Jukič (1,2), Blaž Škrlj (3), Gašper Tomšič (4), Sebastian Pleško (5), Črtomir Podlipnik (6), Urban Bren (1,2)
Abstract
COVID-19 is a new, potentially life-threatening illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In 2021, new variants of the virus carrying multiple key mutations, such as B.1.1.7, B.1.351, P.1 and B.1.617, emerged and threaten to render available vaccines or potential drugs ineffective. In this regard, we highlight 3CLpro, the main viral protease, as a valuable therapeutic target that carries no mutations in the pandemically relevant variants described. Targeting 3CLpro could therefore provide trans-variant effectiveness; the enzyme is well supported by structural studies, and biological evaluation assays for it are readily available. With this in mind, we performed a high-throughput virtual screening experiment using CmDock and the "In-Stock" chemical library to prepare prioritisation lists of compounds for further studies. We coupled the virtual screening experiment to a machine learning-supported classification and activity regression study to maximise enrichment and to bring the available structural data on known 3CLpro inhibitors into the prepared focused libraries. All virtual screening hits were classified into the 3CLpro inhibitor, viral cysteine protease inhibitor or remaining chemical space based on a calculated set of 208 chemical descriptors. Finally, we analysed whether the current set of 3CLpro inhibitors could be used for activity prediction and observed that 3CLpro inhibitors are drastically under-represented compared to the chemical space of viral cysteine protease inhibitors. We postulate that this methodology of 3CLpro inhibitor library preparation and compound prioritisation far surpasses the selection of compounds from available commercial "corona focused libraries".
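The three-way classification of hits into chemical spaces can be illustrated with a deliberately simple sketch. It is not the authors' pipeline: nearest-centroid classification on z-scored descriptors is swapped in for the NeuralNet and XGB learners compared in the model table, three toy descriptors stand in for the 208 used in the study, and every value, label and function name below is hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical toy data: 3 descriptors per compound (e.g. MW, logP, ring count)
# standing in for the 208 descriptors used in the study.
X_TRAIN = [
    [450.0, 3.1, 6], [430.0, 2.9, 5],   # labelled 3CLpro inhibitors
    [320.0, 1.2, 3], [300.0, 1.0, 4],   # labelled viral cysteine protease inhibitors
    [180.0, 0.5, 1], [200.0, 0.3, 2],   # labelled remaining chemical space
]
Y_TRAIN = ["3CLpro", "3CLpro", "viral_cys_protease", "viral_cys_protease", "other", "other"]


def zscore_params(rows):
    """Per-descriptor mean and population std dev (0 replaced by 1 to avoid division by zero)."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(row, mus, sds):
    """Z-score one descriptor vector with the training parameters."""
    return [(x - m) / s for x, m, s in zip(row, mus, sds)]


def fit_centroids(rows, labels):
    """Return one centroid per class in z-scored descriptor space, plus the scaling parameters."""
    mus, sds = zscore_params(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(scale(row, mus, sds))
    centroids = {lab: [mean(c) for c in zip(*members)] for lab, members in groups.items()}
    return centroids, mus, sds


def predict(row, centroids, mus, sds):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    z = scale(row, mus, sds)
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))


centroids, mus, sds = fit_centroids(X_TRAIN, Y_TRAIN)
print(predict([440.0, 3.0, 6], centroids, mus, sds))  # a hypothetical screening hit
```

Z-scoring matters here because descriptors such as molecular weight and logP live on very different scales; without it, the largest-magnitude descriptor would dominate the distance.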
Keywords: 3C-like protease; 3CLpro; COVID-19; Mpro; SARS-CoV-2; chemical library design; compound prioritisation; high-throughput; in silico drug design; inhibitors; machine learning; virtual screening
Year: 2021 PMID: 34070140 PMCID: PMC8158358 DOI: 10.3390/molecules26103003
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Summary of dominant SARS-CoV-2 variants and relevant mutations.
| Variant¹ | Alternative Name | Sprot/ | Key Mutations | Comment | 3CLpro/PLpro² |
|---|---|---|---|---|---|
| B.1.1.7 | UK Variant | 8/23 | H69/V70 del | higher transmissibility | none/A1708D |
| B.1.351 | South African Variant | 9/21 | K417N (RBD) | escapes host immune response | none/K1655N |
| P.1 | Brazilian Variant | 10/17 | K417N/T (RBD) | under research | none/K1795Q |
| B.1.617 | Indian Variant | 7/23 | G142D | under research | none/under research |
¹ Other known variants include COH.20G (S: Q677H; Midwest variant) and B.1.429 (L452R).
² The mutations in PLpro are located far outside the enzyme's active site.
Figure 1. Active site of the SARS-CoV-2 3CLpro (Mpro) enzyme (PDB ID: 6Y7M) with the bound small molecule emphasised and pocket designations on the right. Active-site residues are shown as green lines with a transparent blue-white surface within 6 Å of the small-molecule inhibitor, which is shown as blue sticks.
Figure 2. Detailed tranche description of the ZINC15 "In-Stock" subset, with a total of 9,322,002 compounds used for further calculations (obtained from https://zinc15.docking.org, accessed on 8 May 2021, at the time of SMILES download; the database on the master server is continuously updated).
Figure 3. Receptor docking volume generated for 3CLpro (PDB ID: 6Y7M) in the active site near the Cys145 residue, defined as a sphere of 7 Å around the reference ligand OEW. The protein is depicted as a pink-magenta cartoon, with residues emphasised as lines coloured by atom (carbon green, oxygen red, nitrogen blue, hydrogen white) and a transparent blue-white active-site surface. The docking volume boundary mesh is shown in blue. The isomesh (contour level 0.99) was constructed in PyMOL 2.1.0 from a grid calculated with cmgrid (v0.1.1).
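One common way to realise a docking volume defined by a reference ligand is to keep every grid point within the chosen radius of at least one ligand atom, i.e. a union of spheres. The sketch below assumes that interpretation with the 7 Å radius from the caption; the grid spacing, function names and ligand coordinates are hypothetical, and CmDock's own cavity mapper may apply further steps that are not reproduced here.

```python
import itertools
import math

RADIUS = 7.0     # Å, docking-volume radius stated for 6Y7M
GRID_STEP = 1.0  # Å, hypothetical grid spacing chosen for illustration


def in_volume(point, ligand_atoms, radius=RADIUS):
    """True if `point` lies within `radius` of at least one reference-ligand atom."""
    return any(math.dist(point, atom) <= radius for atom in ligand_atoms)


def docking_grid(ligand_atoms, radius=RADIUS, step=GRID_STEP):
    """Grid points inside the union of spheres around the reference-ligand atoms.

    A cubic grid is laid over the ligand's bounding box padded by `radius`,
    and only points passing `in_volume` are kept.
    """
    lo = [min(a[i] for a in ligand_atoms) - radius for i in range(3)]
    hi = [max(a[i] for a in ligand_atoms) + radius for i in range(3)]
    axes = [
        [lo[i] + k * step for k in range(int((hi[i] - lo[i]) / step) + 1)]
        for i in range(3)
    ]
    return [p for p in itertools.product(*axes) if in_volume(p, ligand_atoms, radius)]


# Hypothetical two-atom "ligand" (coordinates in Å); a real run would read OEW from 6Y7M.
LIGAND = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
points = docking_grid(LIGAND)
print(len(points), "grid points, approx. volume", len(points) * GRID_STEP ** 3, "Å^3")
```

A finer `step` approximates the sphere boundary more closely at the cost of more distance evaluations.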
Figure 4. HTVS workflow with post-docking filtering, cluster analysis and compound classification according to the chemical space of 3CLpro inhibitors collected in the ChEMBL database.
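The clustering algorithm used in the workflow is not specified here, so the following only illustrates one widely used option for post-docking cluster analysis: Butina (Taylor-Butina) clustering on Tanimoto distances between binary fingerprints, represented as sets of on-bit indices. The fingerprints, the 0.5 distance cutoff and the function names are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two on-bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def butina(fps, cutoff=0.5):
    """Butina clustering: compounds within `cutoff` Tanimoto distance are neighbours.

    Candidates are visited by decreasing neighbour count; each unassigned
    candidate becomes a centroid and absorbs its still-unassigned neighbours.
    Returns clusters as index lists with the centroid first.
    """
    n = len(fps)
    neighbours = [
        {j for j in range(n) if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    for i in sorted(range(n), key=lambda k: (-len(neighbours[k]), k)):
        if i not in unassigned:
            continue
        members = [i] + sorted(neighbours[i] & unassigned)
        unassigned -= set(members)
        clusters.append(members)
    return clusters


# Hypothetical fingerprints for five docking hits (on-bit indices only).
FPS = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 12, 13}]
print(butina(FPS))  # [[0, 1, 2], [3, 4]]
```

Because each compound is assigned exactly once, the centroids double as natural candidates when only a few representatives per scaffold can be tested.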
Top-scoring compounds identified in the HTVS against the SARS-CoV-2 main protease, for further prioritisation in biological evaluation experiments.
| No. | Structure | Mr (g/mol) | Cluster/QPlogS¹ | CmDock Docking Score² | Classification³ |
|---|---|---|---|---|---|
| 1 |  | 451.54 | 5/−6.42 | −32.51 |  |
| 2 |  | 465.35 | 5/−6.22 | −29.02 |  |
| 3 |  | 459.49 | 5/−3.89 | −26.80 |  |
| 4 |  | 400.45 | 4/−4.54 | −25.58 |  |
| 5 |  | 325.37 | 5/−3.59 | −25.53 |  |
| 6 |  | 396.85 | 5/−6.47 | −25.05 |  |
| 7 |  | 399.83 | 2/−3.44 | −24.76 |  |
| 8 |  | 425.50 | 4/−5.38 | −24.51 |  |
| 9 |  | 353.44 | 5/−4.40 | −24.17 |  |
| 10 |  | 494.55 | 5/−5.60 | −24.01 |  |
| 11 |  | 490.60 | 5/−2.87 | −23.98 |  |
| 12 |  | 337.37 | 5/−4.75 | −23.61 |  |
| 13 |  | 425.52 | 5/−5.66 | −23.53 |  |
| 14 |  | 335.34 | 5/−3.90 | −23.26 |  |
| 15 |  | 401.44 | 4/−5.00 | −23.18 |  |
¹ QPlogS: predicted aqueous solubility, log S, where S (mol dm⁻³) is the solute concentration in a saturated solution in equilibrium with the crystalline solid (the range recommended by QikProp is −6.5 to −0.5).
² CmDock INTER (intermolecular) docking score.
³ As predicted by the NeuralNet model.
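As an illustration of how the table could feed compound prioritisation, the sketch below transcribes its numeric columns, checks each QPlogS value against the QikProp range given in footnote 1, and keeps the best-scoring (most negative) compound per cluster. Selecting one representative per cluster is an assumed diversity heuristic, not a procedure stated by the authors; the function names are hypothetical.

```python
# (no, Mr, cluster, QPlogS, CmDock score) transcribed from the hit table
HITS = [
    (1, 451.54, 5, -6.42, -32.51), (2, 465.35, 5, -6.22, -29.02),
    (3, 459.49, 5, -3.89, -26.80), (4, 400.45, 4, -4.54, -25.58),
    (5, 325.37, 5, -3.59, -25.53), (6, 396.85, 5, -6.47, -25.05),
    (7, 399.83, 2, -3.44, -24.76), (8, 425.50, 4, -5.38, -24.51),
    (9, 353.44, 5, -4.40, -24.17), (10, 494.55, 5, -5.60, -24.01),
    (11, 490.60, 5, -2.87, -23.98), (12, 337.37, 5, -4.75, -23.61),
    (13, 425.52, 5, -5.66, -23.53), (14, 335.34, 5, -3.90, -23.26),
    (15, 401.44, 4, -5.00, -23.18),
]

QPLOGS_RANGE = (-6.5, -0.5)  # QikProp-recommended range for QPlogS


def solubility_ok(qplogs, lo=QPLOGS_RANGE[0], hi=QPLOGS_RANGE[1]):
    """True if the predicted log S lies inside the recommended range."""
    return lo <= qplogs <= hi


def cluster_representatives(hits):
    """Hypothetical heuristic: best-scoring soluble compound per cluster, best first."""
    best = {}
    for no, mr, cluster, qplogs, score in hits:
        if not solubility_ok(qplogs):
            continue
        if cluster not in best or score < best[cluster][4]:
            best[cluster] = (no, mr, cluster, qplogs, score)
    return sorted(best.values(), key=lambda h: h[4])  # most negative score first


reps = cluster_representatives(HITS)
for no, mr, cluster, qplogs, score in reps:
    print(f"cluster {cluster}: compound {no} (score {score})")
```

With the tabulated values, all 15 hits fall inside the recommended solubility range, and the sketch selects compounds 1, 4 and 7 as representatives of clusters 5, 4 and 2.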
Figure 5. Calculated bound conformations of the top-scoring hit compounds in the 3CLpro active site. The protein is shown as a pink cartoon with an emphasised light-blue molecular surface. (A) The reference ligand OEW is shown as magenta sticks and the top-scoring hit in grey. (B) The reference ligand OEW is shown as magenta sticks and the top 10 scoring compounds as red lines, emphasising their analogous binding mode in the P1-P2-P3 pockets of the active site.
Comparison of machine learning model performance.
| Task (Macro F1/MSE)¹ | NeuralNet² | XGB² | Linear² | Majority/Average³ |
|---|---|---|---|---|
| Classification | 0.895 ± 0.05 | 0.889 ± 0.014 | 0.667 ± 0.015 | 0.283 ± 0.001 |
| Regression | 0.002 ± 0.001 | 0.003 ± 0.001 | 0.012 ± 0.002 | 0.005 ± 0.001 |
¹ Macro F1 for classification and mean squared error (MSE) for regression. ² Learner. ³ Majority-class classifier for classification; average of the training target values for regression.
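The metrics and baselines in the comparison table can be made concrete with a small sketch: macro F1 is the unweighted mean of per-class F1 scores, MSE is the mean squared difference between true and predicted values, and the baselines always predict the majority training class or the mean training target. The toy labels and values are hypothetical; this is not the authors' evaluation code.

```python
from collections import Counter
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return mean(scores)


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def majority_baseline(y_train, n):
    """Predict the most frequent training class n times."""
    return [Counter(y_train).most_common(1)[0][0]] * n


def average_baseline(y_train, n):
    """Predict the mean training target n times."""
    return [mean(y_train)] * n


# Hypothetical toy labels and predictions.
Y_TRUE = ["3CLpro", "3CLpro", "cys_protease", "other", "other", "other"]
Y_PRED = ["3CLpro", "cys_protease", "cys_protease", "other", "other", "3CLpro"]
print(macro_f1(Y_TRUE, Y_PRED))
print(macro_f1(Y_TRUE, majority_baseline(Y_TRUE, len(Y_TRUE))))

T_TRUE = [1.0, 2.0, 3.0]
print(mse(T_TRUE, [1.0, 2.0, 5.0]), mse(T_TRUE, average_baseline(T_TRUE, len(T_TRUE))))
```

Because macro F1 weights every class equally, a majority-class predictor scores poorly even when the majority class is large, which makes it a useful floor for the classification row.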