Muhammad Tahir Ul Qamar, Xi-Tong Zhu, Ling-Ling Chen, Laila Alhussain, Maha A Alshiekheid, Abdulrahman Theyab, Mohammad Algahtani.
Abstract
Leveraging machine learning has been shown to improve the accuracy of structure-based virtual screening. Furthermore, a tremendous amount of empirical data is publicly available, which further enhances the performance of the machine learning approach. In this proof-of-concept study, the 3CLpro enzyme of SARS-CoV-2 was used. Structure-based virtual screening relies heavily on scoring functions. It is widely accepted that target-specific scoring functions may perform more effectively than universal scoring functions in real-world drug research and development processes. It would therefore benefit drug discovery to develop a method that can effectively build target-specific scoring functions. In the current study, the BindingDB database was used to retrieve experimental data. Smina was utilized to generate protein-ligand complexes for the extraction of InteractionFingerPrint (IFP) and SimpleInteractionFingerPrint (SIFP) fingerprints via the Open Drug Discovery Toolkit (ODDT). The present study found that RandomForestClassifier and RandomForestRegressor performed well when used with the above fingerprints along with the Molecular ACCess System (MACCS), Extended Connectivity Fingerprint (ECFP4), and ECFP6 fingerprints. The area under the precision-recall curve was 0.80, which is considered a satisfactory level of accuracy. In addition, our enrichment factor analysis indicated that our trained scoring function ranked molecules more accurately than smina's generic scoring function. Further molecular dynamics simulations indicated that the top-ranked molecules identified by our developed scoring function were highly stable in the active site, supporting the validity of our developed process. This research may provide a template for developing target-specific scoring functions against specific enzyme targets.
Keywords: COVID-19; SARS-CoV-2; machine learning; smina; target specific scoring function
Year: 2022 PMID: 36232307 PMCID: PMC9570399 DOI: 10.3390/ijms231911003
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
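The enrichment factor analysis mentioned in the abstract can be illustrated with a minimal sketch. The function name, the `fraction` parameter, and all scores and labels below are hypothetical toy values, not data or settings from the study; the sketch only assumes the usual definition, in which the hit rate among the top-ranked fraction is divided by the hit rate in the whole library.

```python
def enrichment_factor(scores, is_active, fraction=0.01, higher_is_better=True):
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size)."""
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and of equal length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    n_top = max(1, round(n_total * fraction))  # size of the top-ranked slice
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0],
                    reverse=higher_is_better)
    hits = sum(label for _, label in ranked[:n_top])
    return (hits / n_top) / (n_actives / n_total)


# Toy library: 10 molecules, the first 2 are actives (1), the rest decoys (0).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted pIC50 values: higher means better.
model_scores = [6.9, 6.5, 5.0, 4.8, 4.5, 4.2, 4.0, 3.9, 3.5, 3.0]
# Hypothetical docking scores: more negative means better.
docking_scores = [-7.0, -7.5, -10.8, -10.6, -8.0, -8.1, -7.9, -6.5, -9.0, -9.5]

ef_model = enrichment_factor(model_scores, labels, fraction=0.2)
ef_docking = enrichment_factor(docking_scores, labels, fraction=0.2,
                               higher_is_better=False)
print(ef_model, ef_docking)
```

In this toy case the first ranking places both actives in the top 20% (EF of 5), while the second places none there (EF of 0). The `higher_is_better` flag is needed because docking scores and predicted potencies are ordered in opposite directions.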
Figure 1. Domain organization and structural view of the 3CLpro enzyme.
Figure 2. Workflow of the 3CLpro-specific machine learning scoring function. The input consisted of docked protein-ligand poses in PDB and mol2 formats.
Figure 3. Chemical space analysis of the actives and decoys. The chemical space was defined by molecular weight and logP.
Figure 4. Characteristics of the active and decoy molecules. Lipinski’s Rule of Five (Ro5) analysis of (a) active and (b) decoy molecules. Normalized principal moments ratio (NPR) analysis of (c) actives and (d) decoys.
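As a rough illustration of the two analyses named in the Figure 4 caption, the sketch below counts Ro5 violations from precomputed descriptors and derives the two NPR coordinates from three principal moments of inertia. It assumes the standard Ro5 thresholds (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and the usual NPR definition (I1/I3 and I2/I3 with I1 ≤ I2 ≤ I3). The function names and all input values are hypothetical; in practice the descriptors would come from a cheminformatics toolkit.

```python
def ro5_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of Lipinski's four rules a molecule breaks."""
    broken = [
        mol_weight > 500,   # molecular weight in Da
        logp > 5,           # octanol-water partition coefficient
        h_donors > 5,       # hydrogen-bond donors
        h_acceptors > 10,   # hydrogen-bond acceptors
    ]
    return sum(broken)


def npr(i1, i2, i3):
    """Return (NPR1, NPR2) = (I1/I3, I2/I3) after sorting the moments."""
    smallest, middle, largest = sorted((i1, i2, i3))
    if largest <= 0:
        raise ValueError("principal moments must be positive")
    return smallest / largest, middle / largest


# Hypothetical descriptor values for two made-up molecules.
print(ro5_violations(350.4, 2.1, 2, 5))    # drug-like: no violations
print(ro5_violations(720.9, 6.3, 6, 12))   # breaks all four rules

# Shape corners of the NPR plot: rod near (0, 1), disc (0.5, 0.5), sphere (1, 1).
print(npr(5.0, 5.0, 10.0))   # disc-like
print(npr(4.0, 4.0, 4.0))    # sphere-like
```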
Figure 5. Target-specific scoring function performance: (a,c) ROC curves and (b,d) precision-recall curves.
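The two curve types in Figure 5 are usually summarized by a single area value each. The sketch below shows one common way to compute them without external libraries: average precision as a step-wise estimate of the area under the precision-recall curve, and ROC AUC as the fraction of active-decoy pairs ranked in the right order. This is an illustration under those assumptions, not necessarily the estimator used in the study, and the scores and labels are made-up toy values.

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each active (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_actives


def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    actives = [s for s, label in zip(scores, labels) if label]
    decoys = [s for s, label in zip(scores, labels) if not label]
    if not actives or not decoys:
        raise ValueError("both classes are required")
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Toy predictions: 1 = active, 0 = decoy.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]

ap = average_precision(toy_scores, toy_labels)
auc = roc_auc(toy_scores, toy_labels)
print(round(ap, 3), round(auc, 3))
```

The precision-recall summary is the more informative of the two when decoys greatly outnumber actives, because it is not inflated by the large number of correctly rejected decoys.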
Figure 6. Correlation graph of actual and predicted pIC50 values.
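For readers less familiar with the quantity plotted in Figure 6, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so an IC50 of 100 nM corresponds to a pIC50 of 7. The sketch below shows that conversion and a plain Pearson correlation between actual and predicted values. The function names and the example values are hypothetical and are not taken from the study.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


print(pic50_from_nm(100.0))   # 100 nM
print(pic50_from_nm(1e7))     # 10 mM, a very weak binder

# Hypothetical actual vs. predicted pIC50 values.
actual = [5.0, 6.0, 7.0]
predicted = [5.2, 5.9, 6.8]
r = pearson_r(actual, predicted)
print(round(r, 3))
```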
Comparison of top 5 molecules ranked by Smina and our developed scoring function.
| Top 5 Molecules Scored by Smina | | | Top 5 Molecules Scored by the 3CLpro-Specific Machine Learning Model | | |
|---|---|---|---|---|---|
| Molecules | Smina Score | Actual pIC50 | Molecules | 3CLpro-Specific Score | Actual pIC50 |
| Mol_1514 | −10.80 | 4.79 | Mol_336 | 6.95 | 7.10 |
| Mol_890 | −10.64 | 4.79 | Mol_821 | 6.67 | 7.01 |
| Mol_1170 | −10.43 | 2 | Mol_522 | 6.62 | 7.08 |
| Mol_1112 | −10.35 | 2 | Mol_1355 | 6.47 | 7.27 |
| Mol_280 | −10.25 | 4.49 | Mol_819 | 6.39 | 6.66 |
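The contrast in the table can be made concrete with a few lines of arithmetic. The sketch below uses only the values listed in the table; the variable names and the choice of summary statistics (mean actual pIC50 per list, and mean absolute error of the 3CLpro-specific score) are illustrative choices rather than analyses reported by the authors. No error is computed for Smina because its docking score is not on the pIC50 scale.

```python
# (molecule, score, actual pIC50), copied from the table above.
smina_top5 = [
    ("Mol_1514", -10.80, 4.79),
    ("Mol_890", -10.64, 4.79),
    ("Mol_1170", -10.43, 2.0),
    ("Mol_1112", -10.35, 2.0),
    ("Mol_280", -10.25, 4.49),
]
model_top5 = [
    ("Mol_336", 6.95, 7.10),
    ("Mol_821", 6.67, 7.01),
    ("Mol_522", 6.62, 7.08),
    ("Mol_1355", 6.47, 7.27),
    ("Mol_819", 6.39, 6.66),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


smina_mean_actual = mean(actual for _, _, actual in smina_top5)
model_mean_actual = mean(actual for _, _, actual in model_top5)
# The model score is a predicted pIC50, so it can be compared to the actual value.
model_mae = mean(abs(score - actual) for _, score, actual in model_top5)

print(round(smina_mean_actual, 2))  # about 3.61
print(round(model_mean_actual, 2))  # about 7.02
print(round(model_mae, 2))          # about 0.40
```

On these ten molecules, the top picks of the 3CLpro-specific model are on average more than three pIC50 units more potent than the top picks of Smina.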
Figure 7. Dynamic stability of the top 2 molecules: (a) RMSD, (b) RoG, (c) protein-ligand distance, (d) RMSF.
Figure 8. Dynamic stability of the top 2 molecules screened with smina: (a) RMSD, (b) RoG.
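The two stability measures shared by Figures 7 and 8 have simple definitions, sketched below for a single trajectory frame. RMSD is the root-mean-square deviation of atom positions from a reference structure, and the radius of gyration (RoG) is the mass-weighted root-mean-square distance of atoms from their center of mass. The sketch assumes the frame has already been superimposed on the reference; the function names, coordinates, and masses are hypothetical, and a real analysis would normally rely on a trajectory analysis package.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of atoms from their center of mass."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and of equal length")
    total_mass = sum(masses)
    center = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    weighted = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Toy two-atom system; the frame is the reference shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(frame, reference))

# Two equal masses placed 2 units apart.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0]))
```

A flat RMSD trace over the simulation and a steady RoG are the usual signs of a stable complex.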
Figure 9. Decoy generation process through the DeepCoys algorithm. The figure was adapted from [24].