
Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data.

Hongjian Li, Jiangjun Peng, Pavel Sidorov, Yee Leung, Kwong-Sak Leung, Man-Hon Wong, Gang Lu, Pedro J Ballester.

Abstract

MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this trend has not been studied systematically. It is therefore unclear how these SFs would perform when trained only on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine-learning algorithms other than RF can improve accuracy with increasing training set size, and to what extent they learn from dissimilar or similar training complexes.
RESULTS: We present a systematic study of how the accuracy of classical and machine-learning SFs varies with the similarity of protein-ligand complexes between the training and test sets. We considered three types of similarity metric, based on comparing either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that adding a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 outperformed X-Score even when trained on just the most dissimilar 32% of the training complexes, showing that its superior performance owes considerably to learning from training complexes that are dissimilar to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the other SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing.
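The RESULTS paragraph refers to training on only the most dissimilar fraction of complexes. As a minimal sketch, assuming ligands are represented as fingerprint bit sets and that a training complex's similarity to the test set is its maximum Tanimoto similarity to any test complex (the abstract does not state the exact aggregation or fingerprint used), such a dissimilarity-ordered training subset could be selected as below. All function names and the toy data are hypothetical; the protein-structure and protein-sequence metrics would swap in a different pairwise similarity function.

```python
"""Hypothetical sketch: select the training complexes least similar to a test set."""
from typing import Dict, FrozenSet, List

# Assumed representation: a ligand fingerprint as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two bit sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity_to_test(train: Dict[str, Fingerprint],
                           test: Dict[str, Fingerprint]) -> Dict[str, float]:
    """For each training complex, its highest similarity to any test complex (assumed definition)."""
    return {name: max(tanimoto(fp, t) for t in test.values())
            for name, fp in train.items()}


def most_dissimilar_subset(train: Dict[str, Fingerprint],
                           test: Dict[str, Fingerprint],
                           fraction: float) -> List[str]:
    """Return roughly `fraction` of the training complexes, least similar to the test set first."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    scores = max_similarity_to_test(train, test)
    # Ascending similarity; ties broken by name so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (scores[name], name))
    k = max(1, round(fraction * len(ranked)))  # keep at least one complex
    return ranked[:k]


# Toy usage with made-up fingerprints.
train = {"c1": frozenset({1, 2, 3}), "c2": frozenset({7, 8}),
         "c3": frozenset({1, 2, 9}), "c4": frozenset({20})}
test = {"t1": frozenset({1, 2, 3, 4})}
print(most_dissimilar_subset(train, test, 0.5))  # ['c2', 'c4']
```

Sweeping `fraction` upward from a small value and retraining each SF on the resulting subset would give an accuracy-versus-dissimilar-training-data curve of the kind the study describes; this selection rule is an illustration, not the authors' published protocol.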
AVAILABILITY AND IMPLEMENTATION: https://github.com/HongjianLi/MLSF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year:  2019        PMID: 30873528     DOI: 10.1093/bioinformatics/btz183

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  10 in total

1.  Nonparametric chemical descriptors for the calculation of ligand-biopolymer affinities with machine-learning scoring functions.

Authors:  Edelmiro Moman; Maria A Grishina; Vladimir A Potemkin
Journal:  J Comput Aided Mol Des       Date:  2019-11-14       Impact factor: 3.686

2.  Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review.

Authors:  Rocco Meli; Garrett M Morris; Philip C Biggin
Journal:  Front Bioinform       Date:  2022-06-17

3.  Machine-Learning- and Knowledge-Based Scoring Functions Incorporating Ligand and Protein Fingerprints.

Authors:  Kazuhiro J Fujimoto; Shota Minami; Takeshi Yanai
Journal:  ACS Omega       Date:  2022-05-25

4.  Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark.

Authors:  Hongjian Li; Gang Lu; Kam-Heung Sze; Xianwei Su; Wai-Yee Chan; Kwong-Sak Leung
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 11.622

5.  Machine learning on ligand-residue interaction profiles to significantly improve binding affinity prediction.

Authors:  Beihong Ji; Xibing He; Jingchen Zhai; Yuzhao Zhang; Viet Hoang Man; Junmei Wang
Journal:  Brief Bioinform       Date:  2021-09-02       Impact factor: 11.622

6.  Concise Polygenic Models for Cancer-Specific Identification of Drug-Sensitive Tumors from Their Multi-Omics Profiles.

Authors:  Stefan Naulaerts; Michael P Menden; Pedro J Ballester
Journal:  Biomolecules       Date:  2020-06-26

7.  Machine learning prediction of 3CLpro SARS-CoV-2 docking scores.

Authors:  Lukas Bucinsky; Dušan Bortňák; Marián Gall; Ján Matúška; Viktor Milata; Michal Pitoňák; Marek Štekláč; Daniel Végh; Dávid Zajaček
Journal:  Comput Biol Chem       Date:  2022-02-26       Impact factor: 3.737

8.  AtomNet PoseRanker: Enriching Ligand Pose Quality for Dynamic Proteins in Virtual High-Throughput Screens.

Authors:  Kate A Stafford; Brandon M Anderson; Jon Sorenson; Henry van den Bedem
Journal:  J Chem Inf Model       Date:  2022-03-02       Impact factor: 4.956

9.  3D-RISM-AI: A Machine Learning Approach to Predict Protein-Ligand Binding Affinity Using 3D-RISM.

Authors:  Kazu Osaki; Toru Ekimoto; Tsutomu Yamane; Mitsunori Ikeguchi
Journal:  J Phys Chem B       Date:  2022-08-15       Impact factor: 3.466

10.  A Free Web-Based Protocol to Assist Structure-Based Virtual Screening Experiments.

Authors:  Nathalie Lagarde; Elodie Goldwaser; Tania Pencheva; Dessislava Jereva; Ilza Pajeva; Julien Rey; Pierre Tuffery; Bruno O Villoutreix; Maria A Miteva
Journal:  Int J Mol Sci       Date:  2019-09-19       Impact factor: 5.923
