Literature DB >> 17552511

Molecular basis sets - a general similarity-based approach for representing chemical spaces.

Akshay S Raghavendra1, Gerald M Maggiora.   

Abstract

A new method, based on generalized Fourier analysis, is described that utilizes the concept of "molecular basis sets" to represent chemical space within an abstract vector space. The basis vectors in this space are abstract molecular vectors. Inner products among the basis vectors are determined using an ansatz that associates molecular similarities between pairs of molecules with their corresponding inner products. Moreover, the fact that similarities between pairs of molecules are, in essentially all cases, nonzero implies that the abstract molecular basis vectors are nonorthogonal, but since the similarity of a molecule with itself is unity, the molecular vectors are normalized to unity. A symmetric orthogonalization procedure, which optimally preserves the character of the original set of molecular basis vectors, is used to construct appropriate orthonormal basis sets. Molecules can then be represented, in general, by sets of orthonormal "molecule-like" basis vectors within a proper Euclidean vector space. However, the dimension of the space can become quite large. Thus, the work presented here assesses the effect of basis set size on a number of properties including the average squared error and average norm of molecular vectors represented in the space-the results clearly show the expected reduction in average squared error and increase in average norm as the basis set size is increased. Several distance-based statistics are also considered. These include the distribution of distances and their differences with respect to basis sets of differing size and several comparative distance measures such as Spearman rank correlation and Kruscal stress. All of the measures show that, even though the dimension can be high, the chemical spaces they represent, nonetheless, behave in a well-controlled and reasonable manner. Other abstract vector spaces analogous to that described here can also be constructed providing that the appropriate inner products can be directly evaluated as is the case in this work, a problem that is well-known in kernel-based machine learning.

Year:  2007        PMID: 17552511     DOI: 10.1021/ci600552n

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  1 in total

1.  Error bounds on the SCISSORS approximation method.

Authors:  Imran S Haque; Vijay S Pande
Journal:  J Chem Inf Model       Date:  2011-09-08       Impact factor: 4.956

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.