
Impact of benchmark data set topology on the validation of virtual screening methods: exploration and quantification by spatial statistics.

Sebastian G Rohrer, Knut Baumann.

Abstract

A common finding of many reports evaluating ligand-based virtual screening methods is that validation results vary considerably with changing benchmark data sets. It is widely assumed that these data-set-specific effects are caused by the redundancy, self-similarity, and cluster structure inherent to those data sets. These phenomena manifest themselves in the data sets' representation in descriptor space, which is termed the data set topology. A methodology for the characterization of data set topology based on spatial statistics is introduced. The method is nonparametric and can deal with arbitrary distributions of descriptor values. With this methodology it is possible to associate differences in virtual screening performance on different data sets with differences in data set topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability to represent the benchmark data sets with a more favorable topology. Finally, it is shown that the composition of some benchmark data sets causes topologies that lead to overoptimistic validation results even in very "simple" descriptor spaces. Spatial statistics analysis as proposed here facilitates the detection of such biased data sets and may provide a tool for the future design of unbiased benchmark data sets.
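The abstract does not say which spatial statistic is used, so the following is only an illustrative sketch of how a nonparametric, nearest-neighbor-based characterization of data set topology might look. It assumes Euclidean distances in descriptor space and compares two empirical distribution functions: active-to-active nearest-neighbor distances and decoy-to-active nearest-neighbor distances. The function names `nn_distances`, `ecdf`, and `topology_score` are hypothetical and do not come from the paper.

```python
import numpy as np


def nn_distances(queries, targets, exclude_self=False):
    """Euclidean distance from each query row to its nearest target row."""
    diff = queries[:, None, :] - targets[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    if exclude_self:
        # Only valid when queries and targets are the same set of points.
        np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1)


def ecdf(values, thresholds):
    """Empirical CDF: fraction of values <= each threshold (no distribution assumed)."""
    return (values[:, None] <= thresholds[None, :]).mean(axis=0)


def topology_score(actives, decoys, n_thresholds=50):
    """Mean difference between two nearest-neighbor distance CDFs.

    Assumption (not taken from the abstract): a negative score means actives
    sit closer to each other than decoys sit to actives (clumped actives);
    a positive score means the opposite; the score lies in [-1, 1].
    """
    d_aa = nn_distances(actives, actives, exclude_self=True)  # active -> nearest other active
    d_da = nn_distances(decoys, actives)                      # decoy  -> nearest active
    t = np.linspace(0.0, max(d_aa.max(), d_da.max()), n_thresholds)
    return float((ecdf(d_da, t) - ecdf(d_aa, t)).mean())


if __name__ == "__main__":
    # Toy 2-D descriptor space: a tight cluster of actives, decoys far away.
    actives = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
    decoys = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
    print(topology_score(actives, decoys))  # negative for this clumped layout
```

Under these assumptions, a strongly negative score would flag a data set in which actives are trivially separable from decoys, the kind of topology the abstract associates with overoptimistic validation results.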


Year:  2008        PMID: 18380447     DOI: 10.1021/ci700099u

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  4 in total

1.  Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

Authors:  Jie Xia; Ermias Lemma Tilahun; Terry-Elinor Reid; Liangren Zhang; Xiang Simon Wang
Journal:  Methods       Date:  2014-12-03       Impact factor: 3.608

2.  Open-source platform to benchmark fingerprints for ligand-based virtual screening.

Authors:  Sereina Riniker; Gregory A Landrum
Journal:  J Cheminform       Date:  2013-05-30       Impact factor: 5.514

3.  An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.

Authors:  Jie Xia; Hongwei Jin; Zhenming Liu; Liangren Zhang; Xiang Simon Wang
Journal:  J Chem Inf Model       Date:  2014-05-01       Impact factor: 4.956

4.  Predictiveness curves in virtual screening.

Authors:  Charly Empereur-Mot; Hélène Guillemain; Aurélien Latouche; Jean-François Zagury; Vivian Viallon; Matthieu Montes
Journal:  J Cheminform       Date:  2015-11-04       Impact factor: 5.514

