Hongjian Li1,2, Jiangjun Peng3, Pavel Sidorov4, Yee Leung5, Kwong-Sak Leung5,6, Man-Hon Wong6, Gang Lu2, Pedro J Ballester4. 1. SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, New Territories, Hong Kong. 2. CUHK-SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong. 3. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China. 4. Cancer Research Center of Marseille (CRCM), INSERM, Institut Paoli-Calmettes, Aix-Marseille University, CNRS, F-13009 Marseille, France. 5. Institute of Future Cities. 6. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong.
Abstract
MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when trained only on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size, and to what extent they learn from dissimilar or similar training complexes. RESULTS: We present a systematic study investigating how the accuracy of classical and machine-learning SFs varies with the similarity of protein-ligand complexes between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes into the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from training complexes dissimilar to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the other SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. AVAILABILITY AND IMPLEMENTATION: https://github.com/HongjianLi/MLSF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
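The abstract's core experimental design partitions training complexes by their similarity to the test set, e.g. keeping only the most dissimilar fraction before training. The sketch below illustrates one of the three similarity metrics mentioned, ligand-structure comparison via the Tanimoto coefficient on fingerprint bit sets, and a selection step that retains the most dissimilar fraction of training ligands. This is a minimal illustration, not the paper's code: the function names, the set-based fingerprint representation and the ranking by maximum similarity to any test ligand are assumptions for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two ligand fingerprints,
    represented here as sets of 'on' bit indices (assumption)."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Empty fingerprints share no bits; define similarity as 0.
    return len(a & b) / union if union else 0.0

def most_dissimilar(train_fps, test_fps, fraction):
    """Return the names of the `fraction` of training ligands whose
    maximum Tanimoto similarity to any test ligand is lowest
    (hypothetical selection step mirroring the study's design)."""
    ranked = sorted(train_fps.items(),
                    key=lambda kv: max(tanimoto(kv[1], t) for t in test_fps))
    k = max(1, round(fraction * len(ranked)))
    return [name for name, _ in ranked[:k]]
```

For example, with a training ligand sharing bits with the test set and one sharing none, `most_dissimilar` at fraction 0.5 keeps only the latter. The paper additionally uses protein-structure and protein-sequence similarity metrics, which would require different comparison functions.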
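XGB-Score is built on Extreme Gradient Boosting, whose underlying principle is fitting a sequence of small trees to the residuals of the current ensemble. The sketch below implements that principle from scratch with depth-1 regression trees (stumps) on synthetic data; it is illustrative only, since the actual XGB-Score would use the xgboost library with regularized objectives and protein-ligand descriptors as features.

```python
# Minimal gradient boosting with regression stumps (illustrative sketch;
# not the XGB-Score implementation, which uses the xgboost library).
import numpy as np

def fit_stump(X, r):
    """Find the (feature, threshold) split minimizing squared error on r."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]  # (feature, threshold, left value, right value)

def boost(X, y, n_rounds=50, lr=0.3):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        j, t, lv, rv = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, j] <= t, lv, rv)
        stumps.append((j, t, lv, rv))
    return base, lr, stumps

def predict(model, X):
    base, lr, stumps = model
    pred = np.full(len(X), base)
    for j, t, lv, rv in stumps:
        pred += lr * np.where(X[:, j] <= t, lv, rv)
    return pred
```

Like the RF and XGBoost SFs discussed in the abstract, this kind of nonparametric model can keep improving as more training complexes are added, whereas a classical SF with a fixed functional form cannot.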