Debby D Wang, Moon-Tong Chan, Hong Yan.
Abstract
Binding affinity prediction (BAP) using protein-ligand complex structures is crucial to computer-aided drug design, but remains a challenging problem. To achieve efficient and accurate BAP, machine-learning scoring functions (SFs) based on a wide range of descriptors have been developed. Among those descriptors, protein-ligand interaction fingerprints (IFPs) are competitive due to their simple representations, detailed profiles of key interactions and straightforward integration with machine-learning algorithms. In this paper, we adopt a building-block-based taxonomy to review a broad range of IFP models, and compare representative IFP-based SFs in target-specific and generic scoring tasks. Atom-pair-counts-based and substructure-based IFPs show great potential in these tasks.
Keywords: Computer-aided drug design; Interaction fingerprint; Machine learning; Protein–ligand binding affinity; Scoring function
Year: 2021; PMID: 34900139; PMCID: PMC8637032; DOI: 10.1016/j.csbj.2021.11.018
Source DB: PubMed; Journal: Comput Struct Biotechnol J; ISSN: 2001-0370; Impact factor: 7.271
Fig. 1 Example of a protein–ligand complex and the interacting atoms. (A) Complex of HIV-1 protease and its inhibitor (PDB ID: 2QNQ). (B) Protein–ligand interacting atoms defined by a distance threshold ().
Fig. 2 Interpretations of the construction processes of representative IFPs. (A) SIFt. (B) CHIF. (C) Atom-pair counts used by RF-Score. (D) APIF. (E) SPLIF.
An overview of different IFP models and the related scoring tasks.
| Category | IFP | Ref | Format | | | Target-specific scoring | | | Generic scoring (evaluated on | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | Binary | Integer | Floating number | Target | Machine learning algorithm | Evaluation | Version | Machine learning algorithm | Evaluation |
| Residue-based | SIFt, r-SIFt | | | | | – | – | – | – | – | – |
| | w-SIFt | | | | | p38 | – | | – | – | – |
| | Continuous IFP | | | | | HIV-1 protease | GBDT | | – | – | – |
| | MIEC-IFP | | | | | 5HT2AR, | multiple | | – | – | – |
| Atom-based | KB-IFP | | | | | – | – | – | – | – | – |
| | PADIF | | | | | – | – | – | – | – | – |
| Atom-pair-counts | APC | | | | | – | – | – | v2007 | RF | |
| | EAPC | | | | | subset of | NN | | v2009 | NN | |
| | APCiDB | | | | | – | – | – | v2016 | CNN | |
| | RAPCiDB | | | | | – | – | – | v2016 | CNN | |
| | ECIF | | | | | – | – | – | v2016 | GBDT | |
| Multi-interaction | APIF | | | | | – | – | – | – | – | – |
| | Pharm-IF | | | | | – | – | – | – | – | – |
| | TIFP | | | | | – | – | – | – | – | – |
| Substructure | IASF | | | | | – | – | – | – | – | – |
| | SPLIF | | | | | – | – | – | – | – | – |
| | PLEC FP | | | | | – | – | – | v2016 | NN | |
| | PrtCmm IFP | | | | | – | – | – | v2019 | RF | |
GBDT: gradient boosting decision tree, RF: random forest, NN: neural network, CNN: convolutional neural network.
R: Pearson’s correlation between predicted and experimental affinities, RMSE: root-mean-square error, SD: standard deviation.
APC: atom-pair counts, EAPC: evolved APC, APCiDB: atom-pair counts in distance bins, RAPCiDB: residue-atom-pair counts in distance bins.
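As a rough illustration of the APC and APCiDB definitions above, the following minimal sketch counts protein-ligand element pairs within a distance cutoff, optionally split into distance bins. The function names, the element-only atom typing, the toy coordinates and the bin edges are assumptions made for illustration; they are not the exact descriptors of any scoring function listed in the table.

```python
import math
from bisect import bisect_right
from collections import Counter


def atom_pair_counts(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count (protein element, ligand element) pairs closer than `cutoff`.

    Each atom is a tuple (element, (x, y, z)). The 12.0 default is an
    illustrative choice, not a value taken from the table above.
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            if math.dist(p_xyz, l_xyz) < cutoff:
                counts[(p_elem, l_elem)] += 1
    return counts


def atom_pair_counts_in_bins(protein_atoms, ligand_atoms, upper_edges):
    """Same counts, keyed additionally by the index of the distance bin.

    `upper_edges` is a sorted list of bin upper bounds (hypothetical
    values); pairs at or beyond the last edge are ignored.
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)
            if d < upper_edges[-1]:
                counts[(p_elem, l_elem, bisect_right(upper_edges, d))] += 1
    return counts


# Toy complex with made-up coordinates.
protein = [("C", (0, 0, 0)), ("N", (3, 0, 0)), ("O", (20, 0, 0))]
ligand = [("C", (0, 0, 1)), ("O", (0, 4, 0))]

apc = atom_pair_counts(protein, ligand)
apc_binned = atom_pair_counts_in_bins(protein, ligand, [2.0, 4.5, 12.0])
```

Flattening such a counter over a fixed list of element pairs (and bins) gives the integer vector that a regressor such as a random forest would take as input.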
Several toolkits offering the use of IFPs.
| Toolkit | Online address | IFP type | Ref | BAP works |
|---|---|---|---|---|
| IChem | http://bioinfo-pharma.u-strasbg.fr/labwebsite/download.html | Residue-based | | - |
| OEChem | https://www.eyesopen.com/oechem-tk | Residue-based | | - |
| PyPLIF | http://code.google.com/p/pyplif | Residue-based | | - |
| MOE | https://www.chemcomp.com/Products.htm | Residue-based | | - |
| ODDT | https://github.com/oddt/oddt | Substructure-based | | |
Datasets in PDBbind database for scoring tasks.
| Task | Target protein | Number of | Affinity range |
|---|---|---|---|
| Target-specific | HIV-1 protease | 301 | |
| | BETA-SECRETASE 1 | 326 | |
| | BROMODOMAIN-CONTAINING PROTEIN 4 | 176 | |
| Generic scoring | Multiple (Refined Set) | 4852 | |
| | Multiple (Core Set) | 285 | |
IFPs for constructing target-specific scoring functions.
| IFP | Distance threshold (Å) | Key interactions / pharmacophoric properties |
|---|---|---|
| SIFt1 | 4.5 | Contact, main-chain atom, side-chain atom, polar, nonpolar, hydrogen-bond donor/acceptor |
| SIFt2 | 4.5 | Hydrogen-bond donor/acceptor, hydrophobic, polar, nonpolar, aromatic (face-to-face), aromatic (edge-to-face), metal-acceptor |
| HIF | 10 | Hydrogen bonds |
| CIF | 10 | Close contacts |
| CHIF | 10 | Hydrogen bonds and close contacts |
| APC | 12 | Atom-pair counts |
| APCiDB | 30.5 | Atom-pair counts in distance bins |
| ECIF | 6 | Extended connectivity interaction features |
| APIF | 10 | Pairwise interactions (hydrophobic, hydrogen-bond donor/acceptor) |
| SPLIF | 4.5 | Implicitly encodes all possible local interactions ( |
| PLEC FP | 4.5 | Implicitly encodes all possible local interactions ( |
| PrtCmm IFP | 4.5 | Implicitly encodes all possible local interactions ( |
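To make the residue-based entries in the table more concrete, here is a minimal sketch of the "contact" bit only: one bit per binding-site residue, set when any ligand atom lies within the threshold of any atom of that residue. The function name, the residue labels, the coordinates and the reading of the threshold as a distance in Å are assumptions for illustration; the remaining SIFt bits (main-chain, side-chain, polar and so on) need atom typing that is not shown here.

```python
import math


def contact_fingerprint(residue_atoms, ligand_coords, residue_order, cutoff=4.5):
    """Return one contact bit per residue, in the order given by `residue_order`.

    `residue_atoms` maps a residue label to a list of (x, y, z) coordinates.
    A bit is 1 if any residue atom is within `cutoff` of any ligand atom.
    """
    bits = []
    for residue in residue_order:
        in_contact = any(
            math.dist(r_xyz, l_xyz) <= cutoff
            for r_xyz in residue_atoms[residue]
            for l_xyz in ligand_coords
        )
        bits.append(1 if in_contact else 0)
    return bits


# Hypothetical binding site with made-up coordinates.
site = {
    "ASP25": [(0, 0, 0), (1, 0, 0)],
    "GLY27": [(0, 5, 0)],
    "ILE50": [(10, 0, 0)],
}
ligand_xyz = [(0, 0, 3), (0, 0, 4)]

fingerprint = contact_fingerprint(site, ligand_xyz, ["ASP25", "GLY27", "ILE50"])
```

Keeping a fixed residue order is what makes bit strings from different ligands of the same target comparable, which is why this kind of fingerprint suits target-specific rather than generic scoring.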
Fig. 3 Performances of IFP Scores in three target-specific tasks (targets: HIV-1 protease, BETA-SECRETASE 1, BROMODOMAIN-CONTAINING PROTEIN 4). These IFP Scores were constructed by associating an IFP model (SIFt1, SIFt2, CIF, HIF, CHIF, APC, APCiDB, ECIF, APIF, SPLIF, PLEC FP or PrtCmm IFP) and a machine-learning method (RFs, GBDTs or regression trees). Performances are evaluated using Pearson’s correlation (upper panels) and RMSE (lower panels) between the predicted and experimental affinities.
Fig. 4 Average performances of IFP Scores in three target-specific tasks (targets: HIV-1 protease, BETA-SECRETASE 1, BROMODOMAIN-CONTAINING PROTEIN 4). Pearson’s correlations and RMSEs between the predicted and experimental affinities are presented in the left and right panels, respectively.
Fig. 5 Performances of IFP Scores in the generic scoring task (multiple targets). The performances are evaluated on PDBbind v2019 Core Set, with Pearson’s correlations and RMSEs between the predicted and experimental affinities presented in the left and right panels.
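The two metrics reported in Figs. 3 to 5 can be computed as in the following minimal sketch. The function names and the toy affinity values are made up for illustration and are not data from the paper.

```python
import math


def pearson_r(predicted, experimental):
    """Pearson's correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / math.sqrt(var_p * var_e)


def rmse(predicted, experimental):
    """Root-mean-square error between predictions and measurements."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Toy affinities: every prediction is off by exactly one unit.
experimental = [4.0, 6.0, 8.0]
predicted = [5.0, 7.0, 9.0]

r_value = pearson_r(predicted, experimental)
rmse_value = rmse(predicted, experimental)
```

In this toy case the predictions are perfectly correlated with the measurements yet still carry a constant error, which is one reason the two metrics are reported side by side.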