Affinity and class probability-based fuzzy support vector machine for imbalanced data sets.

Xinmin Tao, Qing Li, Chao Ren, Wenjie Guo, Qing He, Rui Liu, Junrong Zou.

Abstract

Learning from imbalanced data sets poses a major challenge for the data mining community. Although the conventional support vector machine (SVM) is generally fairly robust on imbalanced classification problems, it gives every training sample the same contribution to learning, which biases the final decision boundary in favor of the majority class, especially in the presence of outliers or noise. In this paper, we propose a new affinity and class probability-based fuzzy support vector machine technique (ACFSVM). The affinity of a majority class sample is calculated from a support vector data description (SVDD) model trained only on the given majority class training samples, in a kernel space similar to that used for FSVM learning. The obtained affinity can be used to identify possible outliers and some border samples among the majority class training samples. To eliminate the effect of noise, we employ the kernel k-nearest neighbor method to determine the class probability of the majority class samples in the same kernel space as before. Samples with lower class probabilities are more likely to be noise, and their contribution to learning is reduced by the low memberships constructed by combining the affinities and the class probabilities. Thus, ACFSVM pays more attention to the majority class samples with higher affinities and class probabilities while reducing the effect of those with lower affinities and class probabilities, eventually shifting the final classification boundary toward the majority class. In addition, the minority class samples are assigned relatively high memberships to guarantee their importance for model learning.
Extensive experiments on imbalanced data sets from the UCI repository demonstrate that the proposed approach achieves better generalization performance in terms of G-Mean, F-Measure, and AUC than other existing classification techniques for imbalanced data sets.
Copyright © 2019 Elsevier Ltd. All rights reserved.
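
The weighting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes an RBF kernel, replaces the trained SVDD model with a simpler kernel-space centroid distance, and assumes that affinity and class probability are combined by a plain product. All function names (`rbf`, `kernel_dist2`, `class_probability`, `centroid_affinity`, `memberships`) and the toy data are hypothetical.

```python
import math


def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def kernel_dist2(a, b, kernel=rbf):
    """Squared distance in kernel space: k(a,a) - 2 k(a,b) + k(b,b)."""
    return kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)


def class_probability(i, X, y, k=3, kernel=rbf):
    """Kernel kNN: share of the k nearest neighbours of X[i] with label y[i]."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: kernel_dist2(X[i], X[j], kernel))
    nearest = others[:k]
    return sum(1 for j in nearest if y[j] == y[i]) / len(nearest)


def centroid_affinity(majority, kernel=rbf, eps=1e-6):
    """Stand-in for SVDD (assumption): affinity falls with the kernel-space
    distance of each majority sample to the majority-class centroid."""
    n = len(majority)
    mean_kk = sum(kernel(a, b) for a in majority for b in majority) / (n * n)
    dists = []
    for x in majority:
        mean_kx = sum(kernel(x, z) for z in majority) / n
        d2 = kernel(x, x) - 2.0 * mean_kx + mean_kk
        dists.append(math.sqrt(max(d2, 0.0)))
    d_max = max(dists)
    return [1.0 - d / (d_max + eps) for d in dists]


def memberships(X, y, majority_label, k=3, kernel=rbf):
    """Majority samples: affinity * class probability (assumed combination).
    Minority samples: membership fixed at 1.0 so they keep full weight."""
    maj_idx = [i for i, lab in enumerate(y) if lab == majority_label]
    aff = centroid_affinity([X[i] for i in maj_idx], kernel)
    m = [1.0] * len(X)
    for a, i in zip(aff, maj_idx):
        m[i] = a * class_probability(i, X, y, k, kernel)
    return m


# Toy data: label 0 is the majority class, label 1 the minority class.
X = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.3, -0.2), (0.1, 0.2),  # majority core
    (1.5, 1.5),                                                    # isolated majority point
    (4.0, 4.1),                                                    # majority point inside minority cluster
    (4.0, 4.0), (4.2, 3.9), (3.9, 4.2),                            # minority
]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

m = memberships(X, y, majority_label=0)
for point, label, weight in zip(X, y, m):
    print(point, label, round(weight, 3))
```

On this toy data the majority sample lying inside the minority cluster receives membership 0, the isolated majority sample receives a lower membership than the core sample at the origin, and the minority samples keep membership 1. Such memberships could then be passed as per-sample weights to a weighted SVM solver, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`.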

Keywords:  Affinity; Class probability; Fuzzy support vector machine (FSVM); Imbalanced data; Kernel kNN

Year:  2019        PMID: 31739268     DOI: 10.1016/j.neunet.2019.10.016

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


  3 in total

1.  Diagnostic classification of cancers using DNA methylation of paracancerous tissues.

Authors:  Baoshan Ma; Bingjie Chai; Heng Dong; Jishuang Qi; Pengcheng Wang; Tong Xiong; Yi Gong; Di Li; Shuxin Liu; Fengju Song
Journal:  Sci Rep       Date:  2022-06-23       Impact factor: 4.996

2.  Deep Learning-Based Imbalanced Classification With Fuzzy Support Vector Machine.

Authors:  Ke-Fan Wang; Jing An; Zhen Wei; Can Cui; Xiang-Hua Ma; Chao Ma; Han-Qiu Bao
Journal:  Front Bioeng Biotechnol       Date:  2022-01-21

3.  Experimental Study and Comparison of Imbalance Ensemble Classifiers with Dynamic Selection Strategy.

Authors:  Dongxue Zhao; Xin Wang; Yashuang Mu; Lidong Wang
Journal:  Entropy (Basel)       Date:  2021-06-28       Impact factor: 2.524
