A comparative study on feature selection methods for drug discovery.

Abstract

Feature selection is frequently used as a preprocessing step for machine learning. Removing irrelevant and redundant information often improves the performance of learning algorithms. This paper is a comparative study of feature selection in drug discovery, with a focus on aggressive dimensionality reduction. Five methods were evaluated: information gain, mutual information, the χ² test, odds ratio, and the GSS coefficient. Two well-known classification algorithms, Naïve Bayesian and Support Vector Machine (SVM), were used to classify the chemical compounds. The results showed that the Naïve Bayesian classifier benefited significantly from feature selection, while the SVM performed better when all features were used. In this experiment, information gain and the χ² test were the most effective feature selection methods. Using information gain with a Naïve Bayesian classifier, removal of up to 96% of the features yielded improved classification accuracy as measured by sensitivity. When information gain was used to select the features, the SVM was much less sensitive to the reduction of the feature space: the feature set size was reduced by 99% while losing only a few percentage points of sensitivity (from 58.7% to 52.5%) and specificity (from 98.4% to 97.2%). In contrast to information gain and the χ² test, mutual information performed relatively poorly because of its bias toward rare features and its sensitivity to probability estimation errors.

Copyright 2004 American Chemical Society
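
The abstract names five scoring criteria but does not give their formulas. As a rough illustration only, the sketch below computes each score for one binary feature (for example, a substructure that is present or absent) from a 2x2 contingency table. The definitions are the ones commonly used in the text-categorization literature and are an assumption here, not necessarily those of the paper. In particular, "mutual information" is taken to be the pointwise form, and the odds ratio is log-transformed with 0.5 additive smoothing. The function name feature_scores and the counts a, b, c, d are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def feature_scores(a, b, c, d):
    """Score one binary feature against a binary class (active / inactive).

    a: actives with the feature      b: inactives with the feature
    c: actives without the feature   d: inactives without the feature
    """
    n = a + b + c + d

    # Information gain: class entropy minus class entropy given the feature.
    h_class = entropy([a + c, b + d])
    h_given = (a + b) / n * entropy([a, b]) + (c + d) / n * entropy([c, d])
    info_gain = h_class - h_given

    # Chi-square statistic of the 2x2 table (0 if any margin is empty).
    margins = (a + c) * (b + d) * (a + b) * (c + d)
    chi2 = n * (a * d - c * b) ** 2 / margins if margins else 0.0

    # Pointwise mutual information between "feature present" and "active".
    # Assumed definition; undefined when a == 0, reported here as -inf.
    pmi = math.log2(a * n / ((a + c) * (a + b))) if a > 0 else float("-inf")

    # Log odds ratio with 0.5 additive smoothing (an assumption, used to
    # avoid division by zero on empty cells).
    log_odds = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))

    # GSS coefficient: P(t,c)P(~t,~c) - P(t,~c)P(~t,c).
    gss = (a * d - c * b) / n ** 2

    return {"info_gain": info_gain, "chi2": chi2, "pmi": pmi,
            "log_odds": log_odds, "gss": gss}


# A feature that separates 50 actives from 50 inactives perfectly ...
perfect = feature_scores(50, 0, 0, 50)
# ... versus a feature seen in a single active compound and nowhere else.
rare = feature_scores(1, 0, 49, 50)
print(perfect)
print(rare)
```

Under these assumed definitions, the rare feature receives the same pointwise mutual information as the perfect one, although its information gain is close to zero. This is one way to picture the bias toward rare features that the abstract reports for mutual information.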

Year: 2004 PMID: 15446842 DOI: 10.1021/ci049875d

Source DB: PubMed Journal: J Chem Inf Comput Sci ISSN: 0095-2338

Cited by: 16 in total

1. Comparative virtual screening and novelty detection for NMDA-GlycineB antagonists.

Authors: Bjoern A Krueger; Tanja Weil; Gisbert Schneider
Journal: J Comput Aided Mol Des Date: 2009-11-05 Impact factor: 3.686

2. Boosted feature selectors: a case study on prediction P-gp inhibitors and substrates.

Authors: Gonzalo Cerruela García; Nicolás García-Pedrajas
Journal: J Comput Aided Mol Des Date: 2018-10-26 Impact factor: 3.686

3. Influence of feature rankers in the construction of molecular activity prediction models.

Authors: Gonzalo Cerruela-García; José Pérez-Parra Toledano; Aída de Haro-García; Nicolás García-Pedrajas
Journal: J Comput Aided Mol Des Date: 2019-12-31 Impact factor: 3.686

4. Hybrid gene selection approach using XGBoost and multi-objective genetic algorithm for cancer classification.

Authors: Xiongshi Deng; Min Li; Shaobo Deng; Lei Wang
Journal: Med Biol Eng Comput Date: 2022-01-13 Impact factor: 2.602

5. Benchmark of filter methods for feature selection in high-dimensional gene expression survival data.

Authors: Andrea Bommert; Thomas Welchowski; Matthias Schmid; Jörg Rahnenführer
Journal: Brief Bioinform Date: 2022-01-17 Impact factor: 11.622

6. Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds.

Authors: Edward O Cannon; Ata Amini; Andreas Bender; Michael J E Sternberg; Stephen H Muggleton; Robert C Glen; John B O Mitchell
Journal: J Comput Aided Mol Des Date: 2007-03-27 Impact factor: 4.179

7. Data perturbation independent diagnosis and validation of breast cancer subtypes using clustering and patterns.

Authors: G Alexe; G S Dalgin; R Ramaswamy; C Delisi; G Bhanot
Journal: Cancer Inform Date: 2007-02-19

8. A hybrid approach for biomarker discovery from microarray gene expression data for cancer classification.

Authors: Yanxiong Peng; Wenyuan Li; Ying Liu
Journal: Cancer Inform Date: 2007-02-22

9. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. (Review)

Authors: Ahmet Sureyya Rifaioglu; Heval Atas; Maria Jesus Martin; Rengul Cetin-Atalay; Volkan Atalay; Tunca Doğan
Journal: Brief Bioinform Date: 2019-09-27 Impact factor: 11.622

10. Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons.

Authors: Ana L Teixeira; João P Leal; Andre O Falcao
Journal: J Cheminform Date: 2013-02-11 Impact factor: 5.514