Evaluation of mutual information and genetic programming for feature selection in QSAR.

Vishwesh Venkatraman, Andrew Rowland Dalby, Zheng Rong Yang.

Abstract

Feature selection is a key step in quantitative structure-activity relationship (QSAR) analysis. Chance correlations and multicollinearity are two major problems often encountered when attempting to find generalized QSAR models for use in drug design. Optimal QSAR models require an objective variable relevance analysis step to produce robust classifiers with low complexity and good predictive accuracy. Genetic algorithms coupled with information-theoretic approaches such as mutual information have been used to find near-optimal solutions to such multicriteria optimization problems. In this paper, we describe a novel approach for analyzing QSAR data based on these methods. Our experiments with the Thrombin dataset, previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001, demonstrate the feasibility of this approach. We found that it is important to take into account the data distribution and the rule "interestingness", and to consider more invariant and monotonic measures of feature selection.

Copyright 2004 American Chemical Society
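As a rough illustration of the mutual information criterion mentioned in the abstract, the sketch below scores discrete descriptors by their mutual information with a class label and ranks them. It is a plain filter-style ranking, not the authors' combined genetic programming procedure; the function names `mutual_information` and `rank_features` and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature indices by decreasing MI with labels, MI scores)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 copies the label, feature 1 is
# independent of it, feature 2 is constant.
labels = [1, 1, 0, 0]
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 1)]
order, scores = rank_features(rows, labels)
print(order, scores)
```

A ranking of this kind scores each descriptor in isolation, so it does not address the multicollinearity issue raised in the abstract; a search over feature subsets, such as the genetic approach the authors describe, would be needed for that.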

Year:  2004        PMID: 15446827     DOI: 10.1021/ci049933v

Source DB:  PubMed          Journal:  J Chem Inf Comput Sci        ISSN: 0095-2338


  5 in total

1.  On the interpretation and interpretability of quantitative structure-activity relationship models.

Authors:  Rajarshi Guha
Journal:  J Comput Aided Mol Des       Date:  2008-09-11       Impact factor: 3.686

2.  IMMAN: free software for information theory-based chemometric analysis.

Authors:  Ricardo W Pino Urias; Stephen J Barigye; Yovani Marrero-Ponce; César R García-Jacas; José R Valdes-Martiní; Facundo Perez-Gimenez
Journal:  Mol Divers       Date:  2015-01-26       Impact factor: 2.943

3.  A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities.

Authors:  Mohammad Amin Valizade Hasanloei; Razieh Sheikhpour; Mehdi Agha Sarram; Elnaz Sheikhpour; Hamdollah Sharifi
Journal:  J Comput Aided Mol Des       Date:  2017-12-26       Impact factor: 3.686

4.  AlPOs synthetic factor analysis based on maximum weight and minimum redundancy feature selection.

Authors:  Yuting Guo; Jianzhong Wang; Na Gao; Miao Qi; Ming Zhang; Jun Kong; Yinghua Lv
Journal:  Int J Mol Sci       Date:  2013-11-08       Impact factor: 5.923

5.  Effective automated feature construction and selection for classification of biological sequences.

Authors:  Uday Kamath; Kenneth De Jong; Amarda Shehu
Journal:  PLoS One       Date:  2014-07-17       Impact factor: 3.240
