RSMOTE: improving classification performance over imbalanced medical datasets.

Mehdi Naseriparsa, Ahmed Al-Shammari, Ming Sheng, Yong Zhang, Rui Zhou.

Abstract

INTRODUCTION: Medical diagnosis is a crucial step in patient treatment. However, diagnosis is prone to bias when the underlying datasets are imbalanced. To overcome the imbalanced dataset problem, the synthetic minority oversampling technique (SMOTE) was proposed; it generates new synthetic samples at the data level to balance the minority and majority classes. However, the synthetic samples are generated on a random basis, which causes the class mixture problem and thus degrades classification performance and biases diagnosis.
PURPOSE: To overcome the shortcomings of SMOTE, several modified methods have been proposed that generate synthetic samples along the line segment between selected minority samples. Most of these methods adopt one of two policies for selecting the minority samples from which synthetic samples are generated: borderline region sampling or safe region sampling. However, both suffer from the over-generalisation problem. We propose a modified SMOTE-based resampling method called RSMOTE to alleviate the imbalanced medical dataset problem. We provide an in-depth analysis and verify the performance of RSMOTE over imbalanced medical datasets.
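The line-segment generation step shared by SMOTE-style methods can be illustrated with a minimal sketch. The function name `interpolate` and the sample values are hypothetical and chosen only for illustration; this shows plain interpolation between two minority samples, not the RSMOTE selection policy.

```python
import random


def interpolate(sample, neighbour, rng):
    """Return a synthetic point on the line segment between two minority samples."""
    gap = rng.random()  # uniform draw in [0, 1)
    # Move a fraction `gap` of the way from `sample` towards `neighbour`.
    return [a + gap * (b - a) for a, b in zip(sample, neighbour)]


# Hypothetical minority sample and one of its minority neighbours.
rng = random.Random(0)
x = [1.0, 2.0]
nn = [3.0, 6.0]
synthetic = interpolate(x, nn, rng)
```

Because the same random fraction is applied to every feature, the synthetic point always lies on the segment joining the two samples; which pairs of samples are chosen is what distinguishes the modified methods.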
METHODS: The proposed RSMOTE divides the minority sample domain into four regions (normal, semi-normal, semi-critical, and critical) based on minority sample density analysis. RSMOTE identifies the minority sample regions globally and applies resampling near a specific group of samples.
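A density-based partition into four regions might look like the sketch below. The abstract does not state how density is measured or where the region boundaries fall, so everything here is an assumption: density is taken as the fraction of minority samples among the k nearest neighbours, the thresholds are set at 0.25, 0.5 and 0.75, and "normal" is assumed to be the densest region. The function names and toy data are hypothetical.

```python
import math


def minority_density(index, points, labels, k, minority=1):
    """Fraction of the k nearest neighbours of points[index] that are minority.

    Assumption: a k-NN ratio stands in for the paper's density analysis.
    """
    dists = sorted(
        (math.dist(points[index], p), j)
        for j, p in enumerate(points)
        if j != index
    )
    nearest = [j for _, j in dists[:k]]
    return sum(labels[j] == minority for j in nearest) / k


def region(density):
    """Map a density in [0, 1] to a region name (thresholds are assumed)."""
    if density >= 0.75:
        return "normal"
    if density >= 0.5:
        return "semi-normal"
    if density >= 0.25:
        return "semi-critical"
    return "critical"


# Toy data: a tight minority cluster plus one minority point among majority samples.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
          (5, 6), (6, 5), (4, 5), (5, 4), (6, 6), (4, 4)]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

regions = {
    i: region(minority_density(i, points, labels, k=3))
    for i, label in enumerate(labels)
    if label == 1
}
```

Under these assumptions the clustered minority samples fall in the "normal" region and the isolated one in the "critical" region, so resampling can be restricted to whichever group is selected.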
RESULTS: Our analysis and experiments verify that when synthetic samples are generated in regions with high minority sample density, classification performance improves because the risk of class mixture is low. Unlike some safe region methods, RSMOTE decides the region of minority samples on a global basis, thus removing the over-generalisation problem. Classic and additional evaluation metrics are used to measure the effectiveness of the modified method: Recall, FP Rate, Precision, F-Measure, ROC area, and Average Aggregated Metric. We carried out experiments over various imbalanced medical datasets.
CONCLUSION: Based on minority sample density analysis, we propose the RSMOTE method, which divides the minority sample domain into four regions. RSMOTE includes four resampling methods, each of which carries out resampling on a specific region. According to the experimental results, resampling on regions with high minority sample density obtained better results, while resampling on regions with lower minority sample density obtained inferior results. Thus, we conclude that RSMOTE is a more flexible resampling method for imbalanced medical datasets, capable of generating samples in regions with various minority sample densities. © Springer Nature Switzerland AG 2020.

Keywords:  Class mixture; Classification performance; Imbalanced learning; Medical diagnosis; SMOTE

Year:  2020        PMID: 32549976      PMCID: PMC7292850          DOI: 10.1007/s13755-020-00112-w

Source DB:  PubMed          Journal:  Health Inf Sci Syst        ISSN: 2047-2501


