A multiple combined method for rebalancing medical data with class imbalances.

Yun-Chun Wang, Ching-Hsue Cheng.

Abstract

Most classification algorithms assume that classes are balanced. However, datasets with class imbalances are everywhere. Actual medical datasets have imbalanced classes, which severely degrades identification models and sacrifices classification accuracy on the minority class, even though that class is the most influential and representative. Medical decisions are often irreversible: the tolerance for misjudgment is low, and errors may cause irreparable harm to patients. Therefore, this study proposes a multiple combined method to rebalance medical data with class imbalances. The combined methods include (1) resampling methods (synthetic minority oversampling technique [SMOTE] and undersampling [US]), (2) particle swarm optimization (PSO), and (3) MetaCost. Two experiments on nine medical datasets were conducted to verify the proposed method and compare it with the listed methods. A decision tree is used to generate decision rules so that the results are easy to understand. The results show that (1) the proposed method with ensemble learning can improve the area under the receiver operating characteristic curve (AUC), recall, precision, and F1 metrics; (2) MetaCost can increase sensitivity; (3) SMOTE can effectively enhance AUC; (4) US can improve sensitivity, F1, and misclassification costs in data with a high class imbalance ratio; and (5) PSO-based attribute selection can increase sensitivity and reduce data dimensionality. Finally, we suggest that for datasets with an imbalance ratio > 9, the decision must be based on the US results. When the imbalance ratio is < 9, the decision-maker can consider the results of SMOTE and US together to identify the best decision.
Copyright © 2021 Elsevier Ltd. All rights reserved.
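
The resampling step and the closing recommendation in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes the imbalance ratio is the majority-class count divided by the minority-class count (the abstract does not define it), uses plain random undersampling for US, and uses a simplified SMOTE-style interpolation between minority neighbours. PSO-based attribute selection and MetaCost are not shown. All function names and default parameters are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority count / minority count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(X, y, seed=0):
    """US sketch: randomly keep only as many rows per class as the minority has."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    keep = []
    for cls in counts:
        idx = [i for i, label in enumerate(y) if label == cls]
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like(X, y, minority, k=3, seed=0):
    """Simplified SMOTE-style oversampling (needs at least two minority rows).

    Each synthetic row lies on the segment between a minority row and one
    of its k nearest minority neighbours (squared Euclidean distance).
    Rows are added until the minority matches the largest class.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(minority_rows)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


def recommended_resampling(labels, threshold=9.0):
    """Abstract's rule: ratio > 9 -> US only; otherwise compare SMOTE and US.

    A ratio of exactly 9 is not covered by the abstract; this sketch
    assumes it falls in the "compare both" branch.
    """
    if imbalance_ratio(labels) > threshold:
        return ["US"]
    return ["SMOTE", "US"]
```

For example, a dataset with 20 negative and 2 positive cases has an imbalance ratio of 10 under this definition, so `recommended_resampling` returns only `"US"`.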

Keywords:  Class imbalance; MetaCost; Particle swarm optimization; Synthetic minority oversampling technique

Year:  2021        PMID: 34091384     DOI: 10.1016/j.compbiomed.2021.104527

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589


  2 in total

1.  Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 Protein sequences.

Authors:  Ranjeet Kumar Rout; Sk Sarif Hassan; Sabha Sheikh; Saiyed Umer; Kshira Sagar Sahoo; Amir H Gandomi
Journal:  Comput Biol Med       Date:  2021-11-10       Impact factor: 6.698

2.  Detection of Embryonic Trisomy 21 in the First Trimester Using Maternal Plasma Cell-Free RNA.

Authors:  Carl P Weiner; Mark L Weiss; Helen Zhou; Argyro Syngelaki; Kypros H Nicolaides; Yafeng Dong
Journal:  Diagnostics (Basel)       Date:  2022-06-07