In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening.

Jochen Sieg, Florian Flachsenberg, Matthias Rarey

Abstract

Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model's predictions are right for the right reason. Hence, we re-evaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, we demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from standard benchmarks. On the basis of these results, we conclude that there is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, we provide guidelines for setting up validation experiments and give a perspective on how new data sets could be generated.
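One simple way to probe a benchmark for the kind of bias described above is to ask how well a single trivial, ligand-only descriptor separates actives from inactives. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function names `roc_auc` and `bias_check`, the 0.7 threshold, and the toy descriptor values are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bias_check(feature: Sequence[float], labels: Sequence[int],
               threshold: float = 0.7) -> Tuple[float, bool]:
    """Flag a descriptor that separates the classes on its own.

    The threshold is an arbitrary illustrative value (assumption).
    """
    auc = roc_auc(feature, labels)
    # Separation in either direction is equally suspicious.
    separation = max(auc, 1.0 - auc)
    return separation, separation >= threshold


# Synthetic toy values (NOT real benchmark data): one descriptor per
# compound, label 1 = active, 0 = inactive/decoy.
toy_feature = [410.0, 395.5, 420.1, 388.0, 300.2, 310.7, 295.0, 305.3]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

separation, flagged = bias_check(toy_feature, toy_labels)
print(f"separation={separation:.2f} flagged={flagged}")
```

If a descriptor that carries no information about the protein-ligand interaction is flagged in this way, a strong retrospective result on that data set may reflect the data set's composition rather than learned binding features.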

Year: 2019 | PMID: 30835112 | DOI: 10.1021/acs.jcim.8b00712

Source DB: PubMed | Journal: J Chem Inf Model | ISSN: 1549-9596 | Impact factor: 4.956


