
A learning method for the class imbalance problem with medical data sets.

Der-Chiang Li, Chiao-Wen Liu, Susan C Hu.

Abstract

Medical data sets consist predominantly of "normal" samples, with only a small percentage of "abnormal" ones, which gives rise to the so-called class imbalance problem. In such problems, feeding all the data to the classifier to build the learning model usually biases the model toward the majority class. To deal with this, this paper uses a strategy that over-samples the minority class and under-samples the majority class to balance the data sets. For the majority class, this paper builds a Gaussian-type fuzzy membership function and applies an alpha-cut to reduce the data size; for the minority class, it uses the mega-trend diffusion membership function to generate virtual samples. Furthermore, after balancing the class sizes, this paper extends the data attributes into a higher-dimensional space using classification-related information to improve classification accuracy. Two medical data sets, Pima Indians' diabetes and BUPA liver disorders, are used to illustrate the approach. The results indicate that the proposed method achieves better classification performance than SVM, the C4.5 decision tree, and the methods of two other studies. © 2010 Elsevier Ltd. All rights reserved.
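The majority-class step described in the abstract (Gaussian-type fuzzy membership followed by an alpha-cut) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact membership formula, so centering each attribute's Gaussian on the class mean, scaling by the class standard deviation, and combining attributes with a minimum are all assumptions, as are the function names `gaussian_membership` and `alpha_cut_undersample`. The mega-trend diffusion step for the minority class is not sketched here because the abstract does not state its formula.

```python
import numpy as np


def gaussian_membership(X):
    """Fuzzy membership of each row of X in its own class.

    Assumption: one Gaussian per attribute, centered on the class mean with
    the class standard deviation as width; attributes are combined with min.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # Guard against constant attributes (zero spread).
    sigma = np.where(sigma == 0, 1.0, sigma)
    per_attribute = np.exp(-0.5 * ((X - mu) / sigma) ** 2)
    return per_attribute.min(axis=1)


def alpha_cut_undersample(X, alpha):
    """Keep only the samples whose membership is at least alpha."""
    X = np.asarray(X, dtype=float)
    return X[gaussian_membership(X) >= alpha]


# Toy majority class with one attribute; 10.0 lies far from the class center.
majority = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
kept = alpha_cut_undersample(majority, alpha=0.5)
print(kept.ravel())  # the outlying sample 10.0 is dropped
```

A larger alpha keeps only samples close to the class center and so shrinks the majority class more aggressively; in practice alpha would be tuned until the class sizes are roughly balanced.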

Year: 2010; PMID: 20347072; DOI: 10.1016/j.compbiomed.2010.03.005

Source DB: PubMed; Journal: Comput Biol Med; ISSN: 0010-4825; Impact factor: 4.589


