
Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy.

Salvador García, Francisco Herrera.

Abstract

Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed to address this problem, such as modifying learning methods or applying a preprocessing stage. Within preprocessing focused on balancing data, two tendencies exist: reducing the set of examples (undersampling) or replicating minority class examples (oversampling). Undersampling with imbalanced datasets can be considered a prototype selection procedure whose purpose is to balance datasets to achieve a high classification rate, avoiding the bias toward majority class examples. Evolutionary algorithms have been used for classical prototype selection with good results, where the fitness function is associated with the classification and reduction rates. In this paper, we propose a set of methods called evolutionary undersampling that take into consideration the nature of the problem and use different fitness functions to obtain a good trade-off between the balance of the class distribution and performance. The study includes a taxonomy of the approaches and an overall comparison among our models and state-of-the-art undersampling methods. The results have been contrasted by using nonparametric statistical procedures and show that evolutionary undersampling outperforms the nonevolutionary models as the degree of imbalance increases.
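
To make the idea of undersampling as prototype selection concrete, the following is a minimal Python sketch and not the authors' implementation. It evolves a binary mask over the majority class examples and scores each mask by the geometric mean of the per-class accuracies of a leave-one-out 1-NN classifier, minus a penalty for residual class imbalance. The fitness form, the penalty weight of 0.2, the plain generational genetic algorithm (tournament selection, uniform crossover, bit-flip mutation, elitism), and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
import random


def g_mean_fitness(mask, majority, minority, penalty=0.2):
    """Score one undersampling mask (assumed fitness, for illustration only)."""
    selected = [p for p, keep in zip(majority, mask) if keep]
    if not selected:
        return -1.0  # a mask that drops every majority example is useless
    prototypes = [(p, 0) for p in selected] + [(p, 1) for p in minority]

    def predict(x):
        # 1-NN over the retained prototypes, skipping x itself (leave-one-out)
        return min((math.dist(p, x), lab) for p, lab in prototypes if p is not x)[1]

    tnr = sum(predict(x) == 0 for x in majority) / len(majority)
    tpr = sum(predict(x) == 1 for x in minority) / len(minority)
    balance_gap = abs(1 - len(minority) / len(selected))
    return math.sqrt(tpr * tnr) - penalty * balance_gap


def evolve_mask(majority, minority, pop_size=20, generations=30, seed=0):
    """Search for a good mask with a simple generational GA (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(majority)
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]

    def fitness(mask):
        return g_mean_fitness(mask, majority, minority)

    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in population),
                        key=lambda t: t[0], reverse=True)
        next_gen = [m for _, m in scored[:2]]  # elitism: keep the two best masks
        while len(next_gen) < pop_size:
            # tournament selection of two parents, three candidates each
            a = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            b = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            # uniform crossover followed by bit-flip mutation
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [(not g) if rng.random() < 1.0 / n else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Synthetic two-class data: 40 majority points and 8 minority points.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
minority = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(8)]

best = evolve_mask(majority, minority)
print("majority examples kept:", sum(best), "of", len(majority))
print("fitness:", round(g_mean_fitness(best, majority, minority), 3))
```

The penalty term is what distinguishes this from classical prototype selection: it rewards masks that bring the number of retained majority examples close to the number of minority examples, rather than masks that simply shrink the training set.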

Year:  2009        PMID: 19708770     DOI: 10.1162/evco.2009.17.3.275

Source DB:  PubMed          Journal:  Evol Comput        ISSN: 1063-6560            Impact factor:   3.277


  5 in total

1.  Improving predictions in imbalanced data using Pairwise Expanded Logistic Regression.

Authors:  Xiaoqian Jiang; Robert El-Kareh; Lucila Ohno-Machado
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

2.  Pathway-wide association study identifies five shared pathways associated with schizophrenia in three ancestral distinct populations.

Authors:  C Liu; C A Bousman; C Pantelis; E Skafidas; D Zhang; W Yue; I P Everall
Journal:  Transl Psychiatry       Date:  2017-02-21       Impact factor: 6.222

3.  Spiking neurons with short-term synaptic plasticity form superior generative networks.

Authors:  Luziwei Leng; Roman Martel; Oliver Breitwieser; Ilja Bytschok; Walter Senn; Johannes Schemmel; Karlheinz Meier; Mihai A Petrovici
Journal:  Sci Rep       Date:  2018-07-13       Impact factor: 4.379

4.  Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier.

Authors:  Xuchun Wang; Mengmeng Zhai; Zeping Ren; Hao Ren; Meichen Li; Dichen Quan; Limin Chen; Lixia Qiu
Journal:  BMC Med Inform Decis Mak       Date:  2021-03-20       Impact factor: 2.796

5.  Comparison of mortality prediction models for road traffic accidents: an ensemble technique for imbalanced data.

Authors:  Yookyung Boo; Youngjin Choi
Journal:  BMC Public Health       Date:  2022-08-02       Impact factor: 4.135
