Effect of data standardization on chemical clustering and similarity searching.

Chia-Wei Chu, John D Holliday, Peter Willett.

Abstract

Standardization is used to ensure that the variables in a similarity calculation make an equal contribution to the computed similarity value. This paper compares the use of seven different methods that have been suggested previously for the standardization of integer-valued or real-valued data, comparing the results with unstandardized data. Sets of structures from the MDL Drug Data Report and IDAlert databases and represented by Pipeline Pilot physicochemical parameters, molecular holograms and Molconn-Z parameters are clustered using the k-means and Ward's clustering methods. The resulting classifications are evaluated in terms of the degree of clustering of active compounds selected from eleven different biological activity classes, with these classes also being used in similarity searches. It is shown that there is no consistent pattern when the various standardization methods are ranked in order of decreasing effectiveness and that there is no obvious performance benefit (when compared to unstandardized data) that is likely to be obtained from the use of any particular standardization method.
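To illustrate what standardization does to a similarity calculation, here is a minimal sketch. It assumes two commonly used schemes, z-score and range (min-max) scaling, which may or may not be among the seven methods the paper compares, since the abstract does not name them. The descriptor values (a molecular-weight-like column and a logP-like column) and all function names are hypothetical. The sketch only shows that standardization can change which compound is ranked nearest; it makes no claim that the standardized ranking is better.

```python
import math
from statistics import mean, pstdev


def standardize_zscore(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    # A constant column carries no information, so map it to 0.0.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in rows
    ]


def standardize_range(rows):
    """Scale each column linearly onto the interval [0, 1]."""
    cols = list(zip(*rows))
    los = [min(c) for c in cols]
    his = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, los, his)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour(rows, query_index):
    """Index of the row closest to rows[query_index], excluding itself."""
    query = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: euclidean(query, rows[i]))


# Hypothetical descriptors: [molecular-weight-like value, logP-like value]
molecules = [
    [300.0, 1.0],
    [310.0, 4.0],
    [360.0, 1.2],
    [500.0, 5.0],
]

raw_hit = nearest_neighbour(molecules, 0)
zscore_hit = nearest_neighbour(standardize_zscore(molecules), 0)
range_hit = nearest_neighbour(standardize_range(molecules), 0)
print(raw_hit, zscore_hit, range_hit)
```

On the unstandardized data the first column dominates the distance because its values are about two orders of magnitude larger, so the nearest neighbour is decided almost entirely by that column. After either scaling both columns contribute on a comparable scale and a different compound is ranked nearest.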

Year: 2009 PMID: 19434820 DOI: 10.1021/ci800224h

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Cited

2 in total

1. Analysis and use of fragment-occurrence data in similarity-based virtual screening.

Authors: Shereena M Arif; John D Holliday; Peter Willett
Journal: J Comput Aided Mol Des Date: 2009-06-18 Impact factor: 3.686

2. Classifiers and their Metrics Quantified.

Authors: J B Brown
Journal: Mol Inform Date: 2018-01-23 Impact factor: 3.353
