
RACOG and wRACOG: Two Probabilistic Oversampling Techniques.

Barnan Das, Narayanan C Krishnan, Diane J Cook.

Abstract

As machine learning techniques mature and are used to tackle complex scientific problems, challenges arise such as the imbalanced class distribution problem, where one of the target class labels is under-represented in comparison with the other classes. Existing oversampling approaches for addressing this problem typically do not consider the probability distribution of the minority class while synthetically generating new samples. As a result, the minority class is not well represented, which leads to high misclassification error. We introduce two Gibbs sampling-based oversampling approaches, namely RACOG and wRACOG, to synthetically generate and strategically select new minority class samples. The Gibbs sampler uses the joint probability distribution of the attributes of the data to generate new minority class samples in the form of a Markov chain. While RACOG selects samples from the Markov chain based on a predefined lag, wRACOG selects those samples that have the highest probability of being misclassified by the existing learning model. We validate our approach using five UCI datasets that were carefully modified to exhibit class imbalance and one new application domain dataset with inherent extreme class imbalance. In addition, we compare the classification performance of the proposed methods with that of three other existing resampling techniques.
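
The sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes discrete attributes and, as a stand-in for whatever joint distribution model the paper uses, a Laplace-smoothed empirical joint over the minority samples. The function names (`gibbs_oversample`, `pick_hardest`), the parameters `burn_in`, `lag`, and `alpha`, and the scoring callable `p_minority` are all hypothetical names introduced for illustration.

```python
import random
from collections import Counter


def gibbs_oversample(minority, n_new, burn_in=100, lag=20, alpha=1.0, seed=0):
    """Draw n_new synthetic minority samples with a Gibbs sampler.

    Assumption: the joint distribution is the empirical count of each
    attribute tuple plus a smoothing constant alpha. Under that joint, the
    conditional of attribute j given the rest is proportional to
    count(tuple with attribute j set to v) + alpha.
    """
    rng = random.Random(seed)
    rows = [tuple(r) for r in minority]
    d = len(rows[0])
    counts = Counter(rows)
    # Each attribute may take any value observed for it in the minority class.
    domains = [sorted({r[j] for r in rows}) for j in range(d)]
    state = list(rng.choice(rows))  # start the chain at a real minority sample
    samples, step = [], 0
    while len(samples) < n_new:
        # One Gibbs sweep: resample every attribute conditioned on the others.
        for j in range(d):
            weights = []
            for v in domains[j]:
                candidate = tuple(state[:j] + [v] + state[j + 1:])
                weights.append(counts[candidate] + alpha)
            state[j] = rng.choices(domains[j], weights=weights)[0]
        step += 1
        # Lag-based selection: skip burn-in, then keep every lag-th state.
        if step > burn_in and (step - burn_in) % lag == 0:
            samples.append(tuple(state))
    return samples


def pick_hardest(candidates, p_minority, k):
    """Keep the k candidates the current model is least likely to label as
    minority, i.e. those it is most likely to misclassify.

    p_minority is a hypothetical callable mapping one sample to the model's
    predicted minority-class probability.
    """
    return sorted(candidates, key=p_minority)[:k]


if __name__ == "__main__":
    toy_minority = [(0, 1), (0, 1), (1, 0), (1, 1)]
    lagged = gibbs_oversample(toy_minority, n_new=5, burn_in=10, lag=3)
    print(lagged)
    # Toy stand-in for a trained classifier's minority-class probability.
    print(pick_hardest(lagged, lambda r: 0.1 + 0.5 * r[0], k=2))
```

The first function mirrors the lag-based selection attributed to RACOG, and the second mirrors the misclassification-based selection attributed to wRACOG. The abstract does not say how the joint distribution is modeled or how the learning model is updated, so both are left as assumptions here.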


Keywords:  Gibbs sampling; Imbalanced class distribution; Markov chain Monte Carlo (MCMC); oversampling

Year:  2014        PMID: 27041974      PMCID: PMC4814938          DOI: 10.1109/TKDE.2014.2324567

Source DB:  PubMed          Journal:  IEEE Trans Knowl Data Eng        ISSN: 1041-4347            Impact factor:   6.977


