Gabriela Ilona B Janairo, Derrick Ethelbhert C Yu, Jose Isagani B Janairo.
Abstract
The widespread infection caused by the 2019 novel coronavirus (SARS-CoV-2) has initiated global efforts to search for antiviral agents. Drug discovery is the first step in developing commercially viable pharmaceutical products against novel diseases. To accelerate the screening and drug discovery workflow for potential SARS-CoV-2 protease inhibitors, a machine learning model that predicts the binding free energies of compounds to the SARS-CoV-2 main protease is presented. The optimized multiple linear regression model, which was trained and tested on 226 natural compounds, demonstrates reliable prediction performance (test r² = 0.81, test RMSE = 0.43) while requiring only five topological descriptors. The externally validated model can help conserve available resources by limiting biological assays to compounds for which the model predicts favorable binding. Emerging, highly infectious diseases will remain a threat to human health and development, which is why computational tools for rapid response are important.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13721-021-00326-2.
Keywords: COVID-19; Natural products; QSAR; SARS-CoV-2 main protease; Topological descriptor
Year: 2021 PMID: 34336544 PMCID: PMC8308067 DOI: 10.1007/s13721-021-00326-2
Source DB: PubMed Journal: Netw Model Anal Health Inform Bioinform ISSN: 2192-6670
Fig. 1 Narrowing down the number of molecular descriptors by applying thresholds and feature selection to achieve a parsimonious model
Fig. 2 Results of diagnostics performed on the lasso regression model using the “olsrr” package. a The actual vs predicted values plot shows whether the model is linear and how well the data points fit the regression line. b The residual histogram checks whether the residuals are normally distributed. c The residual plot shows whether the model is homoscedastic. d The outlier and leverage plot detects the observations influencing the model. Blue points are normal, red points are leverages, green points are outliers, and pink points are both outliers and leverages
Computed VIF and tolerance for each of the five molecular descriptors selected by lasso regression
| Variables | Tolerance | VIF |
|---|---|---|
| WTPT.2 | 0.260 | 3.842 |
| VAdjMat | 0.501 | 1.996 |
| MDEC.23 | 0.538 | 1.859 |
| MDEC.33 | 0.762 | 1.313 |
| FMF | 0.303 | 3.303 |
Variance inflation factor (VIF) and tolerance are metrics to detect multicollinearity
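The relationship between the two columns can be checked directly. By the standard definitions, the tolerance of descriptor j is 1 − R_j², where R_j² comes from regressing that descriptor on the remaining ones, and VIF is the reciprocal of tolerance. The sketch below only recomputes VIF from the tolerance values listed in the table; the variable names are illustrative and are not taken from the authors' workflow.

```python
# Tolerance and VIF values copied from the table above.
tolerance = {
    "WTPT.2": 0.260,
    "VAdjMat": 0.501,
    "MDEC.23": 0.538,
    "MDEC.33": 0.762,
    "FMF": 0.303,
}
reported_vif = {
    "WTPT.2": 3.842,
    "VAdjMat": 1.996,
    "MDEC.23": 1.859,
    "MDEC.33": 1.313,
    "FMF": 3.303,
}

# Standard identity: VIF_j = 1 / tolerance_j, with tolerance_j = 1 - R_j^2.
computed_vif = {name: 1.0 / tol for name, tol in tolerance.items()}

for name, vif in computed_vif.items():
    gap = abs(vif - reported_vif[name])
    print(f"{name:8s} computed VIF = {vif:.3f}, reported = {reported_vif[name]:.3f}, gap = {gap:.3f}")
```

The small gaps between the computed and reported values are consistent with the tolerances having been rounded to three decimals before publication. All five VIF values sit well below the rule-of-thumb cutoffs of 5 or 10 that are often used to flag problematic multicollinearity.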
Summary of prediction performance of the machine learning models
Gray-shaded rows used the original dataset composed of 226 compounds. White rows used the refined dataset composed of 203 compounds
Results of the external validation. References for the reported binding free energies (BFE): compounds 1–7 (Farabi et al. 2020), compounds 8–17 (Khaerunnisa et al. 2020), compounds 18–30 (Prasanth et al. 2020)
| No. | CAS no. | Compound name | Reported BFE | Predicted BFE | Difference | % error |
|---|---|---|---|---|---|---|
| 1 | 154-23-4 | Catechin | − 7.24 | − 7.19 | − 0.05 | 0.68 |
| 2 | 39728-80-8 | Zingerol | − 5.40 | − 5.52 | 0.12 | 2.19 |
| 3 | 539-86-6 | Allicin | − 4.03 | − 3.65 | − 0.38 | 9.41 |
| 4 | 520-18-3 | Kaempferol | − 8.58 | − 7.21 | − 1.37 | 16.01 |
| 5 | 480-41-1 | Naringenin | − 7.89 | − 7.17 | − 0.72 | 9.10 |
| 6 | 22608-11-3 | Demethoxycurcumin | − 7.99 | − 7.21 | − 0.78 | 9.73 |
| 7 | 4670-05-7 | Theaflavin | − 9.00 | − 8.77 | − 0.23 | 2.51 |
| 8 | 480-10-4 | Astragalin | − 8.80 | − 7.99 | − 0.81 | 9.25 |
| 9 | 21637-25-2 | Isoquercitrin | − 8.70 | − 7.97 | − 0.73 | 8.40 |
| 10 | 482-36-0 | Hyperoside | − 8.60 | − 8.02 | − 0.58 | 6.73 |
| 11 | 81-27-6 | Sennoside A | − 8.30 | − 8.80 | 0.50 | 6.08 |
| 12 | 1415-73-2 | Aloin A | − 8.20 | − 7.89 | − 0.31 | 3.76 |
| 13 | 38953-85-4 | Isovitexin | − 8.00 | − 7.93 | − 0.07 | 0.85 |
| 14 | 3463-92-1 | Carpaine | − 7.90 | − 8.04 | 0.14 | 1.80 |
| 15 | 529-92-0 | Cusparine | − 7.90 | − 7.70 | − 0.20 | 2.58 |
| 16 | 54983-96-9 | Piperitol | − 7.80 | − 8.00 | 0.20 | 2.54 |
| 17 | 520-36-5 | Kaempferol | − 7.80 | − 7.17 | − 0.63 | 8.05 |
| 18 | 6750-60-3 | Spathulenol | − 6.60 | − 6.54 | − 0.06 | 0.85 |
| 19 | 83-48-7 | Stigmasterol | − 7.10 | − 7.46 | 0.36 | 5.03 |
| 20 | 925213-53-2 | Subamolide A | − 5.50 | − 6.11 | 0.61 | 11.05 |
| 21 | 530-57-4 | Syringic acid | − 5.50 | − 5.53 | 0.03 | 0.58 |
| 22 | 21453-69-0 | Lirioresinol B | − 7.40 | − 7.75 | 0.35 | 4.74 |
| 23 | 12798-57-1 | Procyanidin-B5 | − 7.70 | − 8.81 | 1.11 | 14.41 |
| 24 | 607-80-7 | Sesamin | − 7.60 | − 8.32 | 0.72 | 9.54 |
| 25 | 485-19-8 | Reticuline | − 7.00 | − 7.27 | 0.27 | 3.86 |
| 26 | 65230-04-8 | Anhydrocinnzeylanine | − 6.60 | − 7.16 | 0.56 | 8.56 |
| 27 | 523-80-8 | Apiole | − 5.40 | − 6.25 | 0.85 | 15.78 |
| 28 | 499-75-2 | Carvacrol | − 5.30 | − 5.27 | − 0.03 | 0.60 |
| 29 | 87-44-5 | Caryophyllene | − 6.20 | − 6.33 | 0.13 | 2.15 |
| 30 | 23953-63-1 | Carpacin | − 5.40 | − 6.16 | 0.76 | 14.04 |
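The last two columns appear to follow a simple convention: the difference is the reported BFE minus the predicted BFE, and the % error is the absolute difference relative to the magnitude of the reported BFE. This convention is inferred from the tabulated values and is not stated in the record itself. A minimal sketch using four rows from the table (the function name `compare` is hypothetical):

```python
# (compound, reported BFE, predicted BFE, tabulated % error), copied from the table above.
rows = [
    ("Catechin", -7.24, -7.19, 0.68),
    ("Kaempferol", -8.58, -7.21, 16.01),
    ("Sennoside A", -8.30, -8.80, 6.08),
    ("Apiole", -5.40, -6.25, 15.78),
]

def compare(reported, predicted):
    """Return (difference, percent error) under the assumed convention."""
    difference = reported - predicted                     # sign as in the table
    pct_error = 100.0 * abs(difference) / abs(reported)   # relative to reported BFE
    return difference, pct_error

results = {name: compare(rep, pred) for name, rep, pred, _ in rows}

for name, _, _, tabulated in rows:
    diff, pct = results[name]
    print(f"{name:12s} difference = {diff:+.2f}, % error = {pct:.2f} (table: {tabulated:.2f})")
```

The recomputed differences match the table to two decimals. The recomputed % errors deviate from the tabulated ones by a few hundredths, presumably because the published percentages were derived from unrounded predictions.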
Fig. 3 Distribution of differences between actual and predicted BFE