RSMOTE: improving classification performance over imbalanced medical datasets.

Mehdi Naseriparsa¹, Ahmed Al-Shammari¹˒², Ming Sheng³, Yong Zhang³, Rui Zhou¹

Abstract

INTRODUCTION: Medical diagnosis is a crucial step in patient treatment. However, diagnosis is prone to bias when the underlying datasets are imbalanced. To address the imbalanced-dataset problem, the synthetic minority oversampling technique (SMOTE) was proposed; it generates new synthetic samples at the data level to balance the minority and majority classes. However, the synthetic samples are generated on a random basis, which causes the class mixture problem and thus degrades classification performance and biases diagnosis.
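SMOTE places each synthetic sample on the line segment between a minority sample and one of its k nearest minority neighbours, x_new = x_i + λ(x_nn − x_i) with λ drawn uniformly from [0, 1]. The minimal sketch below illustrates that interpolation step. The function names are hypothetical. Picking seeds uniformly at random, rather than looping over every minority sample as in the original formulation, is a simplification. The default k = 5 is the conventional choice and was not taken from this paper.

```python
import math
import random

def k_nearest(points, i, k):
    """Indices of the k points closest (Euclidean) to points[i], excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation (sketch)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))           # random minority seed (simplification)
        j = rng.choice(k_nearest(minority, i, k))  # one of its k nearest minority neighbours
        gap = rng.random()                         # lambda in [0, 1)
        x, y = minority[i], minority[j]
        # new point lies on the segment from x towards y
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, y)))
    return synthetic
```

Because the neighbour is chosen blindly, a segment can cross into a region dominated by the majority class. That is the class mixture problem described above.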
PURPOSE: To overcome the shortcomings of SMOTE, several modified methods have been proposed that generate synthetic samples along the line segments between selected minority samples. Most of these methods adopt one of two policies for selecting the minority samples used to generate synthetic samples: borderline region sampling or safe region sampling. However, both policies suffer from the over-generalisation problem. We propose a modified SMOTE-based resampling method, called RSMOTE, to alleviate the imbalanced-dataset problem in medical data. We provide an in-depth analysis and verify the performance of RSMOTE over imbalanced medical datasets.
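The sketch below makes the two selection policies concrete using the rule from Borderline-SMOTE, which is not RSMOTE's own rule. For each minority sample, count how many of its m nearest neighbours in the whole dataset belong to the majority class. If all of them do, the sample is treated as noise. If at least half do, it is "danger" (borderline). Otherwise it is "safe". Borderline methods seed from the danger samples, while safe-region methods favour samples with few majority neighbours. The function name and the default m = 5 are assumptions.

```python
import math

def categorize_minority(samples, labels, minority_label, m=5):
    """Label each minority sample 'noise', 'danger' or 'safe' (Borderline-SMOTE-style rule)."""
    categories = {}
    for i, (x, y) in enumerate(zip(samples, labels)):
        if y != minority_label:
            continue
        # m nearest neighbours taken from the whole training set (both classes)
        neighbours = sorted(
            (math.dist(x, p), j) for j, p in enumerate(samples) if j != i
        )[:m]
        n_majority = sum(1 for _, j in neighbours if labels[j] != minority_label)
        if n_majority == m:
            categories[i] = "noise"   # surrounded entirely by the majority class
        elif n_majority >= m / 2:
            categories[i] = "danger"  # borderline sample: seed for borderline sampling
        else:
            categories[i] = "safe"    # mostly minority neighbours: safe region
    return categories
```

Both policies decide a sample's status from a small local neighbourhood only. The abstract identifies this kind of selection as the source of over-generalisation.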
METHODS: The proposed RSMOTE divides the minority sample domain into four regions (normal, semi-normal, semi-critical, and critical) based on an analysis of minority sample density. RSMOTE determines the region of each minority sample globally and applies resampling near a specific group of samples.
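This abstract does not state how RSMOTE measures density or where it places the region boundaries. The sketch below is therefore a hypothetical stand-in, not the paper's method. It takes a minority sample's density to be the share of minority samples among its k nearest neighbours in the full dataset. It then maps that value to the four named regions using equal-width cut-offs (0.75, 0.5, 0.25), which are assumptions. The ordering from "normal" (highest density) to "critical" (lowest) is likewise an assumption.

```python
import math

def minority_density(samples, labels, minority_label, k=5):
    """Share of minority samples among each minority sample's k nearest neighbours."""
    density = {}
    for i, x in enumerate(samples):
        if labels[i] != minority_label:
            continue
        # k nearest neighbours over the whole dataset (both classes)
        neighbours = sorted(
            (math.dist(x, p), j) for j, p in enumerate(samples) if j != i
        )[:k]
        same_class = sum(labels[j] == minority_label for _, j in neighbours)
        density[i] = same_class / len(neighbours)
    return density

def assign_region(density):
    """Map a density in [0, 1] to a region; the 0.75/0.5/0.25 cut-offs are hypothetical."""
    if density >= 0.75:
        return "normal"
    if density >= 0.5:
        return "semi-normal"
    if density >= 0.25:
        return "semi-critical"
    return "critical"
```

A typical call would be `{i: assign_region(d) for i, d in minority_density(X, y, 1).items()}`, which yields one region name per minority sample index.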
RESULTS: Our analysis and experiments verify that generating synthetic samples in regions with high minority sample density improves classification performance, because the risk of class mixture there is low. Unlike some safe region methods, RSMOTE determines the region of minority samples on a global basis, thus removing the over-generalisation problem. Both classic and additional evaluation metrics are used to measure the effectiveness of the modified method: Recall, FP Rate, Precision, F-Measure, ROC area, and Average Aggregated Metric. We carried out experiments on various imbalanced medical datasets.
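The classic metrics listed above follow from the binary confusion matrix, with the minority class treated as positive. ROC area can be computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The sketch below is a plain-Python illustration. It assumes F-Measure means the balanced F1 score. It does not reproduce the Average Aggregated Metric, because the abstract does not define it.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Recall, FP rate, precision and F1 from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "fp_rate": fp_rate, "precision": precision, "f_measure": f_measure}

def roc_area(y_true, scores, positive=1):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with three positives of which two are caught and one false alarm among five negatives, recall, precision and F1 are all 2/3 and the FP rate is 0.2.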
CONCLUSION: Based on the minority sample density analysis, we propose the RSMOTE method, which divides the minority sample domain into four regions. RSMOTE includes four resampling methods, each of which resamples a specific region. According to the experimental results, resampling on regions with high minority sample density gave better results, while resampling on regions with lower minority sample density gave inferior results. We therefore conclude that RSMOTE is a more flexible resampling method for imbalanced medical datasets, capable of generating samples at various minority sample densities. © Springer Nature Switzerland AG 2020.
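The idea of one resampling method per region can be sketched as SMOTE-style interpolation whose seeds are restricted to a single region. In the sketch below, the region label of every minority sample is assumed to be precomputed, because this abstract does not specify how RSMOTE computes it. Neighbours are drawn from all minority samples, which is also an assumption, since the paper may restrict them differently. `region_smote` is a hypothetical name.

```python
import math
import random

def region_smote(minority, regions, target, n_new, k=3, seed=0):
    """SMOTE-style oversampling seeded only from minority samples in the `target` region.

    minority: list of minority feature tuples
    regions:  region name for each minority sample, in the same order (assumed given)
    """
    seeds = [i for i, r in enumerate(regions) if r == target]
    if not seeds:
        return []  # nothing to resample in this region
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    out = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        # k nearest minority neighbours of the seed (any region: an assumption)
        neighbours = sorted(
            (math.dist(minority[i], p), j) for j, p in enumerate(minority) if j != i
        )[:k]
        _, j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        out.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return out
```

Calling the function once per region name gives four variants. Under the reported results, the variant seeded from the highest-density region would be expected to perform best.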

Keywords: Class mixture; Classification performance; Imbalanced learning; Medical diagnosis; SMOTE

Year: 2020 PMID: 32549976 PMCID: PMC7292850 DOI: 10.1007/s13755-020-00112-w

Source DB: PubMed Journal: Health Inf Sci Syst ISSN: 2047-2501