Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization.

Abstract

Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems, that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification, and we show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, it may be the case that the previously reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
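The abstract names AVE but does not give its formula. The sketch below assumes the commonly cited nearest-neighbor formulation: for a validation set V and a training set T, take the fraction of molecules in V whose nearest neighbor in T lies within distance d, average that fraction over a grid of thresholds d, and combine four such terms for actives and inactives. The fingerprint type (sets of "on" bit indices), the Tanimoto distance, the 101-point threshold grid, and all function names are illustrative assumptions, not the authors' implementation.

```python
from typing import FrozenSet, Sequence

# Assumption: a fingerprint is represented as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b|; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nn_coverage(valid: Sequence[Fingerprint],
                train: Sequence[Fingerprint],
                steps: int = 100) -> float:
    """Average, over thresholds d = 0, 1/steps, ..., 1, of the fraction of
    validation molecules whose nearest training neighbor is within d.
    Both sequences must be non-empty."""
    nearest = [min(tanimoto_distance(v, t) for t in train) for v in valid]
    total = 0.0
    for k in range(steps + 1):
        d = k / steps
        total += sum(1 for x in nearest if x <= d) / len(valid)
    return total / (steps + 1)


def ave_bias(valid_actives: Sequence[Fingerprint],
             valid_inactives: Sequence[Fingerprint],
             train_actives: Sequence[Fingerprint],
             train_inactives: Sequence[Fingerprint],
             steps: int = 100) -> float:
    """Assumed AVE-style bias: how much closer validation actives sit to
    training actives than to training inactives, plus the mirror-image
    term for validation inactives. Values near 0 suggest low redundancy."""
    active_term = (nn_coverage(valid_actives, train_actives, steps)
                   - nn_coverage(valid_actives, train_inactives, steps))
    inactive_term = (nn_coverage(valid_inactives, train_inactives, steps)
                     - nn_coverage(valid_inactives, train_actives, steps))
    return active_term + inactive_term


if __name__ == "__main__":
    a = frozenset({1, 2, 3})   # toy "active" fingerprint
    b = frozenset({7, 8, 9})   # toy "inactive" fingerprint, disjoint from a
    # Validation molecules duplicate their training class: strongly positive bias.
    print(ave_bias([a], [b], [a], [b]))
    # Actives and inactives are indistinguishable: the two terms cancel.
    print(ave_bias([a], [a], [a], [a]))
```

Under these assumptions, a split whose validation molecules duplicate training molecules of the same class scores close to 2, while a split in which both classes are equally near to each training class scores 0. This is the sense in which such a measure accounts for similarity among inactives as well as actives.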

Substances: Ligands

Year: 2018 PMID: 29698607 DOI: 10.1021/acs.jcim.7b00403

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Cited

29 in total

1. Data Set Augmentation Allows Deep Learning-Based Virtual Screening to Better Generalize to Unseen Target Classes and Highlight Important Binding Interactions.

Authors: Jack Scantlebury; Nathan Brown; Frank Von Delft; Charlotte M Deane
Journal: J Chem Inf Model Date: 2020-08-04 Impact factor: 4.956

2. First report of q-RASAR modeling toward an approach of easy interpretability and efficient transferability.

Authors: Arkaprava Banerjee; Kunal Roy
Journal: Mol Divers Date: 2022-06-29 Impact factor: 3.364

3. Hierarchical confounder discovery in the experiment-machine learning cycle.

Authors: Alex Rogozhnikov; Pavan Ramkumar; Rishi Bedi; Saul Kato; G Sean Escola
Journal: Patterns (N Y) Date: 2022-02-22

4. What Does the Machine Learn? Knowledge Representations of Chemical Reactivity.

Authors: Joshua A Kammeraad; Jack Goetz; Eric A Walker; Ambuj Tewari; Paul M Zimmerman
Journal: J Chem Inf Model Date: 2020-03-03 Impact factor: 4.956

5. D3R grand challenge 4: blind prediction of protein-ligand poses, affinity rankings, and relative binding free energies.

Authors: Conor D Parks; Zied Gaieb; Michael Chiu; Huanwang Yang; Chenghua Shao; W Patrick Walters; Johanna M Jansen; Georgia McGaughey; Richard A Lewis; Scott D Bembenek; Michael K Ameriks; Tara Mirzadegan; Stephen K Burley; Rommie E Amaro; Michael K Gilson
Journal: J Comput Aided Mol Des Date: 2020-01-23 Impact factor: 3.686

6. Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design.

7. Property-Unmatched Decoys in Docking Benchmarks.

8. Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement. (Review)

9. Practical Model Selection for Prospective Virtual Screening.

10. Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction.

Authors: Matthew C Robinson; Robert C Glen; Alpha A Lee
Journal: J Comput Aided Mol Des Date: 2020-01-20 Impact factor: 3.686