Rabhi Yassine, Mrabet Makrem, Fnaiech Farhat.
Abstract
A global pandemic has followed the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), strongly affecting the health sector as well as the world economy. Although a few approved and effective vaccines exist at the time of writing, there is a worldwide sense of urgency to discover new technical tools and new drugs as quickly as possible. Many studies are therefore underway to develop new tools and therapies against SARS-CoV-2 and other viruses, using different approaches. One coronavirus target protein that has been studied extensively is the 3-chymotrypsin-like (3CL) protease, which is directly involved in the cotranslational and posttranslational modifications of the viral polyproteins essential for the existence and replication of the virus in the host. Currently, most of these studies aim to repurpose known, clinically approved drugs against the new virus, but this approach has had limited success. Recently, different studies have demonstrated that artificial intelligence-based techniques can learn existing chemical spaces and generate new, effective small molecules. In this study, we combined a generative recurrent neural network model with transfer learning methods and active learning-based algorithms to design novel small molecules capable of effectively inhibiting the 3CL protease in human cells. We then analyzed these small molecules to find the binding site that matches the structure of the 3CL protease of the target virus, alongside the other analyses performed in this study. Based on these screening results, some molecules achieved a good binding score close to -18 kcal/mol, and we consider them good potential candidates for further synthesis and testing against SARS-CoV-2.
Year: 2021 PMID: 34124259 PMCID: PMC8172298 DOI: 10.1155/2021/6696012
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Number of trials in each therapeutic area.
Figure 2. Flowchart of the strategy to identify candidate SARS-CoV-2 drugs.
Figure 3. SARS-CoV-2 main protease: cartoon form (a) and surface form (b).
Datasets for generation tasks.
| Dataset | Purpose |
|---|---|
| ZINC | Commercially available compounds for virtual screening |
| ChEMBL | A manually curated database of bioactive drug-like molecules |
| SureChEMBL | Named compounds from chemical patents |
| eMolecules | Purchasable molecules |
| Natural | Natural product molecules |
| DrugBank | FDA-approved drugs, experimental drugs, drugs available worldwide |
Selected drug information of current ongoing clinical studies on SARS-CoV-2.
| Drug name | Mechanism of action | Indication | DrugBank ID |
|---|---|---|---|
| Remdesivir | RNA polymerase inhibitor | Anti-Ebola passed phase III, COVID-19 phase III | DB14761 |
| Lopinavir | Protease inhibitor | Anti-HIV approved, COVID-19 | DB01601 |
| Ritonavir | Protease inhibitor | Anti-HIV approved, COVID-19 | DB00503 |
| Emtricitabine | Nucleoside reverse transcriptase inhibitor | Anti-HIV approved, anti-HBV | DB00879 |
| Tenofovir | Nucleoside reverse transcriptase inhibitor | Anti-HIV phase III, anti-HBV | DB14126 |
| Ribavirin | Viral mRNA and protein synthesis inhibitor | Anti-HCV, anti-HBV, anti-SARS, anti-influenza, COVID-19 | DB00811 |
| Methylprednisolone | Corticosteroid | COVID-19 phase II, allergic asthma and rheumatic disorders approved | DB00959 |
| Oseltamivir | Neuraminidase inhibitor; sialidase inhibitor | Anti-influenza approved, COVID-19 phase III | DB00198 |
| Danoprevir | Protease inhibitor | Anti-HCV phase III, COVID-19 phase IV | DB11779 |
| Chloroquine | Not reported | Antimalarial approved, anti-HIV phase III, anti-HCV, COVID-19 phase IV | DB00608 |
Figure 4. The RNN-LSTM model used to generate SMILES strings. To start, the character “S” is fed in and the hidden and cell states are initialized. The network then samples symbol by symbol until the end character, “\n”, is produced.
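
The sampling procedure described in the Figure 4 caption can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `next_symbol_probs` is a hypothetical stand-in for the trained RNN-LSTM (a real network would carry hidden and cell states between steps instead of re-reading the prefix), and `toy_model` and `max_len` are assumptions introduced only to make the sketch runnable.

```python
import random

START, END = "S", "\n"  # start and end characters as named in the Figure 4 caption


def sample_smiles(next_symbol_probs, max_len=100, rng=None):
    """Sample one string symbol by symbol until END is drawn or max_len is reached.

    next_symbol_probs(prefix) is a hypothetical stand-in for the trained network:
    it returns a dict mapping each candidate next symbol to its probability.
    """
    rng = rng or random.Random()
    prefix = START
    while len(prefix) - len(START) < max_len:
        probs = next_symbol_probs(prefix)
        symbols = list(probs)
        weights = [probs[s] for s in symbols]
        symbol = rng.choices(symbols, weights=weights, k=1)[0]
        if symbol == END:
            break  # the end character terminates the sequence
        prefix += symbol
    return prefix[len(START):]  # drop the start character


def toy_model(prefix):
    """Hypothetical model: emits 'C' or 'O', then forces END after 4 symbols."""
    if len(prefix) - len(START) >= 4:
        return {END: 1.0}
    return {"C": 0.7, "O": 0.3}


if __name__ == "__main__":
    print(sample_smiles(toy_model, rng=random.Random(0)))
```

In practice each sampled string would still need to be checked for chemical validity before any further analysis, since a character-level sampler gives no guarantee of a well-formed SMILES string.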
Figure 5. Flowchart of the strategy to identify the best binding performance candidates.
Figure 6. Generated SMILES molecules.
Figure 7. PCA projection of the molecular descriptors of the generated molecules and the original training molecules.
Figure 8. Distribution of molecular weight and calculated logP (clogP) for generated and original molecules.
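
As a sanity check on molecular weights like those reported in this record, the weight of a molecule can be recomputed from its chemical formula. The sketch below is an illustration under stated assumptions, not the tool the authors used: it handles only flat formulas without parentheses or charges, and the `ATOMIC_WEIGHT` table holds rounded standard atomic weights for the six elements that appear in the formulas listed here.

```python
import re

# Rounded standard atomic weights in g/mol (assumption: only these elements occur).
ATOMIC_WEIGHT = {
    "C": 12.011, "H": 1.008, "N": 14.007,
    "O": 15.999, "P": 30.974, "S": 32.06,
}


def molecular_weight(formula):
    """Return the molecular weight (g/mol) of a flat formula such as 'C46H50N4O8'."""
    total = 0.0
    matched = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
        matched += len(element) + len(count)
    if matched != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return total


if __name__ == "__main__":
    for name, formula in [("generated molecule 1", "C46H50N4O8"),
                          ("remdesivir", "C27H35N6O8P")]:
        print(f"{name}: {molecular_weight(formula):.2f} g/mol")
```

For the two formulas shown, the computed values agree with the tabulated molecular weights (786.92 and 602.58) to within rounding of the atomic weights.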
A summary of selected drug properties for the top anti-SARS-CoV-2 molecules generated using our proposed method, compared with remdesivir and several HIV drugs.
| No. | Chemical formula (CF) | Source | Binding affinity (kcal/mol) | Molecular weight (MW, g/mol) | log P | log S | PSA (Å²) | Similarity to remdesivir |
|---|---|---|---|---|---|---|---|---|
| 1 | C46H50N4O8 | Generated | -18.3 | 786.92 | 3.82 | -6.94 | 190.99 | 0.30 |
| 2 | C51H59N5O6 | Generated | -18.2 | 838.05 | 4.32 | -7.85 | 156.93 | 0.35 |
| 3 | C52H62N6O6 | Generated | -18.2 | 867.10 | 4.06 | -7.62 | 169.82 | 0.38 |
| 4 | C51H60N6O6 | Generated | -18.1 | 853.07 | 3.72 | -7.35 | 169.82 | 0.38 |
| 5 | C50H58N6O6 | Generated | -18 | 839.04 | 3.38 | -7.08 | 169.82 | 0.38 |
| 6 | C45H49N5O7 | Generated | -17.7 | 771.91 | 3.20 | -6.75 | 196.78 | 0.35 |
| 7 | C45H48N4O8 | Generated | -17.7 | 772.89 | 3.59 | -6.67 | 190.99 | 0.35 |
| 8 | C45H48N4O8 | Generated | -17.7 | 772.89 | 3.48 | -6.67 | 190.99 | 0.30 |
| 9 | C52H61N5O7 | Generated | -17.7 | 868.08 | 3.90 | -7.72 | 166.16 | 0.42 |
| 10 | C53H63N5O6 | Generated | -17.7 | 866.11 | 5.01 | -8.39 | 156.93 | 0.34 |
| 11 | C46H52N4O7 | Generated | -17.6 | 772.93 | 4.30 | -7.07 | 173.92 | 0.36 |
| 12 | C52H61N5O6 | Generated | -17.5 | 852.08 | 4.67 | -8.12 | 156.93 | 0.34 |
| 13 | C52H61N5O6 | Generated | -17.4 | 852.08 | 4.67 | -8.12 | 156.93 | 0.34 |
| 14 | C49H56N6O6 | Generated | -17.3 | 825.01 | 3.03 | -6.81 | 169.82 | 0.39 |
| 15 | C52H66N4O9 | Generated | -17.1 | 891.11 | 4.89 | -8.11 | 211.22 | 0.37 |
| 16 | C46H52N6O6 | Generated | -16.9 | 784.95 | 3.857 | -7.49 | 178.61 | 0.43 |
| 17 | C44H46N4O8 | Generated | -16.9 | 758.86 | 3.09 | -6.25 | 190.99 | 0.40 |
| 18 | C47H53N5O6 | Generated | -16.7 | 783.96 | 4.80 | -8.26 | 165.72 | 0.37 |
| 19 | C52H62N4O6 | Generated | -16.6 | 839.08 | 5.32 | -8.27 | 148.06 | 0.35 |
| 20 | C45H48N4O8 | Generated | -16.5 | 772.89 | 3.48 | -6.67 | 190.99 | 0.31 |
| 21 | C45H50N6O6 | Generated | -16.4 | 770.92 | 3.51 | -7.22 | 178.61 | 0.43 |
| 22 | C46H51N5O6 | Generated | -16.2 | 769.93 | 4.46 | -7.99 | 165.72 | 0.37 |
| 23 | C45H48N4O8 | Generated | -16.2 | 772.89 | 3.48 | -6.67 | 190.99 | 0.31 |
| 24 | C48H58N4O9 | Generated | -16.1 | 835.00 | 3.39 | -7.40 | 225.21 | 0.37 |
| 25 | C53H62N6O7 | Generated | -16 | 895.11 | 5.17 | -9.07 | 194.82 | 0.33 |
| 26 | C53H62N6O7 | Generated | -16 | 895.11 | 5.30 | -8.51 | 198.58 | 0.41 |
| 27 | C48H58N4O10 | Generated | -16 | 851.00 | 4.06 | -7.12 | 211.66 | 0.46 |
| 28 | C47H53N5O5 | Generated | -16 | 767.96 | 4.31 | -7.55 | 140.72 | 0.41 |
| 29 | C44H54N6O6 | Generated | -15.9 | 762.94 | 2.47 | -6.49 | 168.96 | 0.37 |
| 30 | C48H57N3O10 | Generated | -15.8 | 835.99 | 3.79 | -7.33 | 219.42 | 0.38 |
| 31 | C43H51N5O6 | Generated | -15.7 | 733.90 | 3.94 | -7.52 | 165.72 | 0.34 |
| 32 | C42H51N5O6 | Generated | -15.7 | 721.89 | 4.04 | -7.38 | 165.72 | 0.33 |
| 33 | C43H51N5O6 | Generated | -15.6 | 733.90 | 3.94 | -7.52 | 165.72 | 0.34 |
| 34 | C51H66N4O10 | Generated | -15.5 | 895.10 | 5.06 | -8.28 | 201.61 | 0.47 |
| 35 | C43H50N4O6 | Generated | -15.5 | 718.89 | 4.00 | -7.09 | 153.69 | 0.33 |
| 36 | C45H51N5O5 | Generated | -15.4 | 741.92 | 3.75 | -7.27 | 140.72 | 0.40 |
| 37 | C43H52N4O8 | Generated | -15.3 | 752.90 | 3.37 | -6.34 | 190.99 | 0.31 |
| 38 | C50H64N4O10 | Generated | -15.1 | 881.07 | 4.76 | -7.96 | 201.61 | 0.47 |
| 39 | C49H62N4O11 | Generated | -15 | 883.04 | 4.13 | -8.09 | 210.84 | 0.46 |
| 40 | C49H62N4O11 | Generated | -15 | 883.04 | 4.92 | -7.89 | 210.84 | 0.47 |
| 41 | C49H63N5O10 | Generated | -14.9 | 882.06 | 4.60 | -7.62 | 204.85 | 0.52 |
| 42 | C27H35N6O8P | Remdesivir | -13.2 | 602.58 | 0.30 | -4.99 | 213.35 | 1.0 |
| 43 | C38H53N5O7S2 | HIV-TMC-310911 | -11.2 | 755.99 | 5.07 | -6.40 | 179.17 | 0.58 |
| 44 | C38H50N6O5 | HIV-saquinavir | -11.1 | 670.85 | 2.83 | -5.65 | 166.74 | 0.48 |
| 45 | C38H52N6O7 | HIV-atazanavir | -9 | 704.86 | 3.37 | -6.07 | 171.21 | 0.45 |
| 46 | C27H37N3O7S | HIV-darunavir | -8.8 | 547.67 | 2.23 | -3.95 | 148.79 | 0.47 |
| 47 | C32H45N3O4S | HIV-nelfinavir | -8.3 | 567.79 | 4.45 | -5.58 | 127.19 | 0.43 |
| 48 | C25H35N3O6S | HIV-amprenavir | -8.3 | 505.63 | 2.25 | -3.74 | 139.56 | 0.40 |
| 49 | C36H47N5O4 | HIV-indinavir | -8.1 | 613.80 | 2.84 | -3.32 | 118.02 | 0.47 |
| 50 | C33H44N4O6S | HIV-PPL-100 | -8.1 | 624.80 | 4.18 | -5.05 | 159.43 | 0.43 |
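
The "Similarity to remdesivir" column is a score between 0 and 1, with remdesivir itself at 1.0. This record does not state which metric was used; a common choice for comparing molecules is the Tanimoto coefficient over fingerprint bits, and the sketch below assumes that metric purely for illustration. The two fingerprints are hypothetical bit sets, not derived from any molecule in the table.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical fingerprints: indices of the bits that are set.
    reference = {1, 4, 7, 9, 12}
    candidate = {1, 4, 8, 12, 15, 20}
    print(tanimoto(reference, candidate))  # 3 shared bits out of 8 distinct bits
    print(tanimoto(reference, reference))  # identical fingerprints
```

Under this metric, low scores such as the 0.30 to 0.52 range seen for the generated molecules would indicate that they share only a minority of structural features with the reference compound.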
Figure 9. (a) The best candidate found and the SARS-CoV-2 main protease (cartoon view). (b) The best candidate found and the SARS-CoV-2 main protease (surface view). (c) Connections between the best candidate found and the SARS-CoV-2 main protease.