
Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets.

Christian Kramer, Peter Gedeck.

Abstract

With the emergence of large collections of protein-ligand complexes complemented by binding data, as found in PDBbind or BindingMOAD, new opportunities for parametrizing and evaluating scoring functions have arisen. With such large data collections available, it becomes feasible to fit scoring functions in a QSAR style, i.e., by defining protein-ligand interaction descriptors and analyzing them with modern machine-learning methods. As with any data-modeling approach, the model must be validated carefully. Here, we show that for a relatively simple scoring function there are large differences in R (0.77 vs 0.46) or R² (0.59 vs 0.21), depending on whether it is validated against the PDBbind core set or in a leave-cluster-out cross-validation. If proteins from the same family are present in both the training and validation sets, the prediction quality estimated by standard validation techniques is overly optimistic.
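The effect described above can be illustrated with a toy sketch. It is not the authors' scoring function, descriptors, or data: the synthetic affinities, the one-hot "family fingerprint" columns, and the helper names (`fit_predict`, `cv_predictions`, `make_toy_data`) are hypothetical, and the model is plain least squares via NumPy's `np.linalg.lstsq`. Each complex's affinity is a ligand term plus a family-specific offset. A random 5-fold split puts members of every family in both training and test folds, so the model can memorize family offsets. Leave-cluster-out holds out whole families, so it cannot. R² is reported as the square of Pearson R, which is consistent with the abstract's pairs (0.77² ≈ 0.59, 0.46² ≈ 0.21).

```python
import numpy as np


def pearson_r(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Pearson correlation coefficient R between measured and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (minimum-norm solution via lstsq)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def cv_predictions(X, y, fold_ids):
    """Out-of-fold predictions: each fold is predicted by a model trained on all other folds."""
    preds = np.empty(len(y))
    for f in np.unique(fold_ids):
        test = fold_ids == f
        preds[test] = fit_predict(X[~test], y[~test], X[test])
    return preds


def make_toy_data(n_families=10, per_family=30, seed=0):
    """Hypothetical toy set: affinity = ligand term + family-specific offset + noise."""
    rng = np.random.default_rng(seed)
    family = np.repeat(np.arange(n_families), per_family)
    ligand_desc = rng.normal(size=len(family))
    offsets = rng.normal(scale=2.0, size=n_families)
    y = ligand_desc + offsets[family] + rng.normal(scale=0.3, size=len(family))
    # One-hot "family fingerprint" columns let the model memorize family offsets.
    family_desc = np.eye(n_families)[family]
    X = np.column_stack([ligand_desc, family_desc])
    return X, y, family


X, y, family = make_toy_data()

# Random 5-fold CV: members of every family land in both training and test folds.
random_folds = np.random.default_rng(1).permutation(len(y)) % 5
r_random = pearson_r(y, cv_predictions(X, y, random_folds))

# Leave-cluster-out CV: each protein family is held out as a whole.
r_lco = pearson_r(y, cv_predictions(X, y, family))

print(f"random 5-fold:      R = {r_random:.2f}, R^2 = {r_random**2:.2f}")
print(f"leave-cluster-out:  R = {r_lco:.2f}, R^2 = {r_lco**2:.2f}")
```

On such toy data the random split reports a much higher R than leave-cluster-out. The family columns of a held-out family are all zero during training, so the minimum-norm fit assigns them no weight and the family offset cannot be predicted. The exact numbers depend on the seed and do not reproduce the paper's values.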


Year:  2010        PMID: 20936880     DOI: 10.1021/ci100264e

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


Related articles: 18 in total (first 10 shown)

1.  Visualizing convolutional neural network protein-ligand scoring.

Authors:  Joshua Hochuli; Alec Helbling; Tamar Skaist; Matthew Ragoza; David Ryan Koes
Journal:  J Mol Graph Model       Date:  2018-06-18       Impact factor: 2.518

2.  Protein-Ligand Scoring with Convolutional Neural Networks.

Authors:  Matthew Ragoza; Joshua Hochuli; Elisa Idrobo; Jocelyn Sunseri; David Ryan Koes
Journal:  J Chem Inf Model       Date:  2017-04-11       Impact factor: 4.956

3.  A D3R prospective evaluation of machine learning for protein-ligand scoring.

Authors:  Jocelyn Sunseri; Matthew Ragoza; Jasmine Collins; David Ryan Koes
Journal:  J Comput Aided Mol Des       Date:  2016-09-03       Impact factor: 3.686

4.  AGL-Score: Algebraic Graph Learning Score for Protein-Ligand Binding Scoring, Ranking, Docking, and Screening.

Authors:  Duc Duy Nguyen; Guo-Wei Wei
Journal:  J Chem Inf Model       Date:  2019-07-01       Impact factor: 4.956

5.  Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review.

Authors:  Rocco Meli; Garrett M Morris; Philip C Biggin
Journal:  Front Bioinform       Date:  2022-06-17

6.  Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise.

Authors:  David Ryan Koes; Matthew P Baumgartner; Carlos J Camacho
Journal:  J Chem Inf Model       Date:  2013-02-12       Impact factor: 4.956

7.  Target-Specific Prediction of Ligand Affinity with Structure-Based Interaction Fingerprints.

Authors:  Florian Leidner; Nese Kurt Yilmaz; Celia A Schiffer
Journal:  J Chem Inf Model       Date:  2019-08-19       Impact factor: 4.956

8.  Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design.

Authors:  Paul G Francoeur; Tomohide Masuda; Jocelyn Sunseri; Andrew Jia; Richard B Iovanisci; Ian Snyder; David R Koes
Journal:  J Chem Inf Model       Date:  2020-09-10       Impact factor: 4.956

9.  Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark.

Authors:  Hongjian Li; Gang Lu; Kam-Heung Sze; Xianwei Su; Wai-Yee Chan; Kwong-Sak Leung
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 11.622

10.  One Size Does Not Fit All: The Limits of Structure-Based Models in Drug Discovery.

Authors:  Gregory A Ross; Garrett M Morris; Philip C Biggin
Journal:  J Chem Theory Comput       Date:  2013-08-05       Impact factor: 6.006


北京卡尤迪生物科技股份有限公司 (Beijing Coyote Bioscience Co., Ltd.) © 2022-2023.