| Literature DB >> 17125164 |
Martin Whittle1, Valerie J Gillet, Peter Willett, Jens Loesel.
Abstract
This paper presents a theoretical model of how data fusion can be used to combine the results of multiple similarity searches of chemical databases. The model is based on frequency distributions of similarity values that are fused using a multiple integration over regions defined by the particular fusion rule that is being applied. For pairwise fusion, the resulting double integrals are straightforward to evaluate for simple model distributions. Similarity values for recovered-active and recovered-nonactive frequency distributions are independently modeled using a constant background, linearly biased terms, and a first-order correlated term. The model shows that two standard fusion rules can give performance enhancements in some cases but that the results of fusion are dependent on many factors that, taken together, can lead to seemingly inconsistent levels of enhancement.Mesh:
Year: 2006 PMID: 17125164 DOI: 10.1021/ci049615w
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956