Divya B Korlepara¹, C S Vasavi¹, Shruti Jeurkar¹, Pradeep Kumar Pal¹, Subhajit Roy¹,², Sarvesh Mehta¹, Shubham Sharma¹, Vishal Kumar¹, Charuvaka Muvva¹, Bhuvanesh Sridharan¹, Akshit Garg¹, Rohit Modee¹, Agastya P Bhati³, Divya Nayar⁴, U Deva Priyakumar⁵.
Abstract
Computational methods, and more recently modern machine learning methods, have played a key role in structure-based drug design. Although several benchmarking datasets are available for machine learning applications in virtual screening, accurately predicting the binding affinity of a protein-ligand complex remains a major challenge. New datasets are therefore important, because they allow models to be developed that predict binding affinities better than state-of-the-art scoring functions. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes selected from the Protein Data Bank (PDB). The dataset provides binding affinities together with their energy components (electrostatic, van der Waals, polar solvation and non-polar solvation energies). These were calculated from molecular dynamics simulations using the MM-PBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and correlated well with the available experimental values. Because the energy components are available, specific components may be optimized during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for binding affinity prediction.
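The abstract describes each binding affinity as being accompanied by four energy components. The following is a minimal sketch of how such a per-complex record might be represented and recombined. It assumes that the reported MM-PBSA binding energy is the plain sum of the electrostatic, van der Waals, polar solvation and non-polar solvation terms, with any entropy term omitted. That assumption is not stated in this record. The class and field names are hypothetical and are not the dataset's actual column names.

```python
from dataclasses import dataclass


@dataclass
class EnergyComponents:
    """Per-complex MM-PBSA terms in kcal/mol (hypothetical field names)."""

    electrostatic: float       # gas-phase electrostatic interaction energy
    van_der_waals: float       # gas-phase van der Waals interaction energy
    polar_solvation: float     # Poisson-Boltzmann (polar) solvation energy
    nonpolar_solvation: float  # surface-area based (non-polar) solvation energy

    def gas_phase(self) -> float:
        # Molecular-mechanics part: electrostatics + van der Waals.
        return self.electrostatic + self.van_der_waals

    def solvation(self) -> float:
        # Implicit-solvent part: polar + non-polar solvation.
        return self.polar_solvation + self.nonpolar_solvation

    def binding_energy(self) -> float:
        # Assumption: the binding energy is the sum of the four terms,
        # with no entropy (-T*dS) correction applied.
        return self.gas_phase() + self.solvation()


# Illustrative numbers only, not taken from PLAS-5k.
example = EnergyComponents(-45.2, -38.7, 52.1, -4.3)
print(round(example.binding_energy(), 2))
```

Keeping the components separate is useful because, in line with the abstract's remark about optimizing desired components, a model could target `gas_phase()` or `solvation()` on its own rather than only their sum.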
Year: 2022 PMID: 36071074 PMCID: PMC9451116 DOI: 10.1038/s41597-022-01631-9
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1: Protocol for input preparation and simulations.
Fig. 2: Correlation plots between experimental and calculated binding affinities for a subset of 2000 PDB IDs. Binding affinities were calculated (a) using AutoDock Vina and (b) using MM-PBSA.
Correlation between experimental and predicted binding free energies for different enzyme classes, on the subset of PLAS-5k containing 2000 PDB IDs for which experimental binding affinities are available.
| Enzyme class | Number of complexes in each class | ||||
|---|---|---|---|---|---|
| Transferase | 613 | 0.456 | 0.454 | 0.521 | 0.517 |
| Hydrolase | 572 | 0.345 | 0.357 | 0.620 | 0.670 |
| Oxido-reductases | 273 | 0.475 | 0.413 | 0.325 | 0.328 |
| Isomerase | 56 | 0.603 | 0.625 | 0.694 | 0.707 |
| Ligase | 72 | 0.432 | 0.419 | 0.667 | 0.662 |
| Lyase | 36 | 0.438 | 0.358 | 0.534 | 0.492 |
| Others | 378 | 0.411 | 0.403 | 0.529 | 0.552 |
Peptide inhibitors were excluded from this subset.
Fig. 3: Prediction of binding affinity assessed by correlation with experimental data. FDA-approved drugs for HIV-1 protease targets: (a) experimental vs docking, (b) experimental vs MM-PBSA. Tuberculosis targets: (c) experimental vs docking, (d) experimental vs MM-PBSA.
Fig. 4: Pearson correlation coefficient after training OnionNet on the PLAS-5k dataset.
| Measurement(s) | Binding Affinity |
|---|---|
| Technology Type(s) | Molecular dynamics simulation/MM-PBSA |
| Factor Type(s) | 3D protein structures |
| Sample Characteristic - Organism | NA |
| Sample Characteristic - Environment | NA |
| Sample Characteristic - Location | NA |