Chao Yang, Eric Anthony Chen, Yingkai Zhang.
Abstract
Molecular docking plays a significant role in early-stage drug discovery, from structure-based virtual screening (VS) to hit-to-lead optimization, and its capability and predictive power depend critically on the protein–ligand scoring function. In this review, we give a broad overview of recent scoring function development and of docking-based applications in drug discovery. We outline the strategies and resources available for structure-based VS and discuss the assessment and development of classical and machine learning protein–ligand scoring functions. In particular, we highlight recent progress in machine learning scoring functions, ranging from descriptor-based models to deep learning approaches. We also discuss the general workflow and docking protocols of structure-based VS, such as structure preparation, binding site detection, docking strategies, and post-docking filtering and re-scoring, and present a case study of a large-scale docking-based VS test on the LIT-PCBA data set.
Keywords: datasets; deep learning; machine learning; molecular docking; protein–ligand scoring function; virtual screening
Year: 2022 PMID: 35889440 PMCID: PMC9323102 DOI: 10.3390/molecules27144568
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Figure 1. Schematics of the categories, datasets, and evaluations of the protein–ligand scoring functions.
Machine learning scoring functions.
| ML Algorithm | Name | Input Features | Dataset | Year |
|---|---|---|---|---|
| RF | RF-score | Protein–ligand atom-type pair counts in predefined distance cutoff | PDBbind v2007 | 2010 |
| RF | SFCscoreRF | Descriptors of ligand-dependent, specific interactions, surface area | PDBbind v2007 | 2013 |
| RF | ΔVinaRF20 | Vina empirical terms, surface area terms | PDBbind v2014 | 2017 |
| XGB | ΔVinaXGB | Vina empirical terms, surface area terms, ligand stability terms, bridge water terms | PDBbind v2016 | 2019 |
| XGB | ΔLinF9XGB | A series of gauss terms characterizing protein–ligand interactions, surface area terms, ligand descriptors, bridge water terms and pocket features | PDBbind | 2022 |
| ERT | ET-score | Distance-weighted interatomic contacts between protein and ligand | PDBbind v2016 | 2021 |
| GBT | AGL-Score | Algebraic graph theory-based features of protein–ligand complex | PDBbind | 2019 |
| GBT | ECIF-GBT | Protein–ligand atom-type pair counts considering each atom connectivity | PDBbind v2016 | 2021 |
| NN | NNScore 1.0 | Descriptors of specific interactions and ligand-dependent | MOAD | 2010 |
| NN | NNScore 2.0 | Vina empirical terms, protein–ligand atom-type pair counts in predefined distance cutoff | MOAD | 2011 |
| CNN | AtomNet | Local structure-based 3D grid from protein–ligand structures | DUD-E | 2017 |
| CNN | Pafnucy | Atom property-based 3D grid from protein–ligand structures | PDBbind v2016 | 2017 |
| CNN | Kdeep | Atom type-based 3D grid from protein–ligand structures | PDBbind v2016 | 2018 |
| CNN | OnionNet | Rotation-free element-pair specific contacts between protein and ligand atoms in different distance ranges | PDBbind v2016 | 2019 |
| GNN | PotentialNet | Atom node feature and distance matrix | PDBbind v2007 | 2018 |
| GNN | graphDelta | Atom node features considering local environment and distance matrix | PDBbind v2018 | 2020 |
| GNN | SIGN | Distance matrix of atom nodes and angle matrix of bond edges | PDBbind v2016 | 2021 |
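Several descriptor-based entries in the table use "protein–ligand atom-type pair counts in predefined distance cutoff" as input features. The sketch below illustrates that general idea only; it is not the implementation of any listed scoring function. The element lists, the 12 Å cutoff, the atom representation, and the function name `pair_count_features` are all assumptions chosen for illustration.

```python
from itertools import product
from math import dist

# Illustrative choices (assumptions, not taken from the table above).
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "S", "F", "P", "Cl", "Br", "I")
CUTOFF = 12.0  # distance cutoff in angstroms


def pair_count_features(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Count protein-ligand element pairs closer than `cutoff`.

    Each atom is a tuple (element, (x, y, z)). Returns a dict keyed by
    (protein_element, ligand_element) with the number of atom pairs
    of that type found within the cutoff.
    """
    counts = {pair: 0 for pair in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)}
    for p_el, p_xyz in protein_atoms:
        for l_el, l_xyz in ligand_atoms:
            key = (p_el, l_el)
            # Pairs involving elements outside the lists are ignored.
            if key in counts and dist(p_xyz, l_xyz) < cutoff:
                counts[key] += 1
    return counts


# Toy complex with made-up coordinates.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0)), ("O", (4.0, 0.0, 0.0))]
features = pair_count_features(protein, ligand)
```

The resulting fixed-length count vector (here 4 × 9 = 36 entries) is the kind of input a random forest or gradient-boosted model could be trained on against measured binding affinities.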
Figure 2. Two models of molecular docking. (A) Lock-and-key model. (B) Induced-fit model.
Figure 3. General scheme of a VS workflow.
Figure 4. Workflow of the docking-based VS protocol on the LIT-PCBA benchmark.
Figure 5. Collected LIT-PCBA benchmark test results from four different groups (Zhou et al. [197], Sunseri et al. [90], Tran-Nguyen et al. [198], and Yang et al. [103]). (A) The average enrichment factor at the top 1% (mean EF1%) is used to evaluate early hit enrichment performance. (B) The number of targets satisfying EF1% > 2 is used as a metric to assess the generalizability of the scoring functions across all 15 diverse targets.
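The two metrics named in the Figure 5 caption can be sketched as follows, using the common definition of the enrichment factor: the hit rate among the top-ranked fraction of compounds divided by the hit rate in the whole library. This is a minimal illustration under stated assumptions, not the evaluation code of any of the cited groups. It assumes that higher scores mean better predicted binders (docking scores where lower is better would need their sign flipped), that the top-1% subset size is rounded to the nearest integer with a minimum of one compound, and that ties keep input order. The function names and target labels are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked library.

    scores: predicted scores, higher = better (assumption).
    labels: 1 for active, 0 for inactive.
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Size of the top-ranked subset; the rounding convention is an assumption.
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def summarize_targets(ef_by_target, threshold=2.0):
    """Return (mean EF, number of targets with EF above `threshold`)."""
    values = list(ef_by_target.values())
    mean_ef = sum(values) / len(values)
    n_pass = sum(1 for ef in values if ef > threshold)
    return mean_ef, n_pass


# Toy library: 200 compounds, 4 actives, scores already in descending order.
scores = [200 - i for i in range(200)]
labels = [1 if i in (0, 50, 100, 150) else 0 for i in range(200)]
ef = enrichment_factor(scores, labels)

# Hypothetical per-target EF1% values, for illustration only.
mean_ef, n_pass = summarize_targets({"target_a": ef, "target_b": 1.0, "target_c": 3.0})
```

In the toy library the top 1% holds two compounds, one of which is active, so the hit rate of 0.5 against a background rate of 4/200 gives EF1% = 25. An EF1% of 1 corresponds to random selection, which is why a threshold above 1 (here 2) is a reasonable bar for counting a target as enriched.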