
Implementing the Fisher's discriminant ratio in a k-means clustering algorithm for feature selection and data set trimming.

Thy-Hou Lin, Huang-Te Li, Keng-Chang Tsai.

Abstract

Fisher's discriminant ratio has been used as a class separability criterion and implemented in a k-means clustering algorithm to perform simultaneous feature selection and data set trimming on a set of 221 HIV-1 protease inhibitors. A total of 43 molecular descriptors are computed for each inhibitor, and they are scaled to lie between 0 and 1 before being subjected to the feature selection process. Since the purpose is to select the most class-sensitive descriptors, several feature evaluation indices, such as the Shannon entropy, the linear regression of selected descriptors on the pKi of selected inhibitors, and a stepwise variable selection program, are used to filter them. While the Shannon entropy provides the information content of each computed descriptor, the more class-sensitive descriptors are searched for by both the linear regression and stepwise variable selection procedures. The inhibitors are first divided into several different numbers of classes and are subsequently divided into five classes, because this division gives the best feature selection result. Most of the good features selected are topological descriptors, and they correlate well with the pKi values. The outliers, that is, the inhibitors with less class-sensitive values for each selected descriptor, are identified and gathered by the k-means clustering algorithm. These are the trimmed inhibitors, while the remaining ones are retained. We find that 98 inhibitors (44%) can be retained when the number of good descriptors selected for clustering is three. The descriptor values of these selected inhibitors are far more class sensitive than the original ones, as evidenced by a substantial increase in statistical significance when they are subjected to both the SYBYL CoMFA PLS and Cerius2 PLS regression analyses.
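
As an illustration only, the sketch below shows one common per-descriptor form of Fisher's discriminant ratio (between-class scatter divided by within-class scatter) applied to min-max scaled values. The abstract does not give the exact formulation the authors used, so this form, the function names `min_max_scale` and `fisher_ratio`, and the toy data are all assumptions made for the example, not the published method.

```python
from collections import defaultdict


def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to all 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fisher_ratio(values, labels):
    """Between-class over within-class scatter for a single descriptor.

    Assumed form (not taken from the paper):
    sum_c n_c (mean_c - mean)^2 / sum_c sum_{i in c} (x_i - mean_c)^2
    """
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for members in groups.values():
        mean_c = sum(members) / len(members)
        between += len(members) * (mean_c - overall_mean) ** 2
        within += sum((v - mean_c) ** 2 for v in members)
    # A descriptor with zero within-class scatter separates the classes perfectly.
    return float("inf") if within == 0.0 else between / within


# Hypothetical toy data: two descriptors for six compounds in two classes.
labels = [0, 0, 0, 1, 1, 1]
separating = min_max_scale([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
mixed = min_max_scale([1.0, 8.0, 3.0, 2.0, 9.0, 7.0])
scores = {
    "separating": fisher_ratio(separating, labels),
    "mixed": fisher_ratio(mixed, labels),
}
```

Under this assumed form, a descriptor whose values cluster tightly within each class and differ between classes receives a higher score, which is the sense in which a descriptor can be called class sensitive; ranking descriptors by such a score is one plausible way to pick the few that are passed to clustering.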


Year:  2004        PMID: 14741013     DOI: 10.1021/ci030295a

Source DB:  PubMed          Journal:  J Chem Inf Comput Sci        ISSN: 0095-2338


  4 in total

1.  Quantitative structure-activity relationship by CoMFA for cyclic urea and nonpeptide-cyclic cyanoguanidine derivatives on wild type and mutant HIV-1 protease.

Authors:  Speranta Avram; Cristian Bologa; Maria-Luiza Flonta
Journal:  J Mol Model       Date:  2005-02-16       Impact factor: 1.810

2.  Automatic discrimination between safe and unsafe swallowing using a reputation-based classifier.

Authors:  Mohammad S Nikjoo; Catriona M Steele; Ervin Sejdić; Tom Chau
Journal:  Biomed Eng Online       Date:  2011-11-15       Impact factor: 2.819

3.  A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.

Authors:  Mohammad Peikari; Sherine Salama; Sharon Nofech-Mozes; Anne L Martel
Journal:  Sci Rep       Date:  2018-05-08       Impact factor: 4.379

4.  Synthesis of 2-alkylthio-N-(quinazolin-2-yl)benzenesulfonamide derivatives: anticancer activity, QSAR studies, and metabolic stability.

Authors:  Aneta Pogorzelska; Beata Żołnowska; Jarosław Sławiński; Anna Kawiak; Krzysztof Szafrański; Mariusz Belka; Tomasz Bączek
Journal:  Monatsh Chem       Date:  2018-07-13       Impact factor: 1.451

