A Distance-Based Weighted Undersampling Scheme for Support Vector Machines and its Application to Imbalanced Classification.

Qi Kang, Lei Shi, MengChu Zhou, XueSong Wang, QiDi Wu, Zhi Wei.   

Abstract

Support vector machines (SVMs) play a prominent role in classical machine learning, especially in classification and regression. Through structural risk minimization, they have earned a reputation for reducing overfitting, avoiding the curse of dimensionality, and not falling into local minima. Nevertheless, existing SVMs do not perform well when facing class imbalance and large-scale samples. Undersampling is a plausible way to address imbalanced problems, but it suffers from soaring computational complexity and reduced accuracy because of its many iterations and random sampling process. To improve the classification performance of SVMs on imbalanced data, this work proposes a weighted undersampling (WU) scheme for SVM based on space geometry distance, yielding an improved algorithm named WU-SVM. In WU-SVM, majority samples are grouped into subregions (SRs) and assigned different weights according to their Euclidean distance to the hyperplane. The samples in an SR with a higher weight have a greater chance of being sampled and used in each learning iteration, so as to retain as much of the data distribution information of the original data sets as possible. Comprehensive experiments are performed to test WU-SVM on 21 binary-class and six multiclass publicly available data sets. The results show that it outperforms state-of-the-art methods in terms of three popular metrics for imbalanced classification, i.e., area under the curve, F-Measure, and G-Mean.
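
The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' WU-SVM implementation: the abstract does not say how subregions are formed or how weights are derived from distance, so the equal-size grouping by distance rank and the inverse-mean-distance weights below are assumptions, as are the function names `distance_to_hyperplane` and `weighted_undersample` and the fixed hyperplane `(w, b)`.

```python
import math
import random


def distance_to_hyperplane(x, w, b):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def weighted_undersample(majority, w, b, n_subregions, n_keep, rng):
    """Draw n_keep majority samples without replacement.

    Subregions closer to the hyperplane get a higher weight; this
    direction of weighting is an assumption, not taken from the paper.
    """
    # Assumed grouping rule: rank by distance, cut into equal-size subregions.
    ranked = sorted(majority, key=lambda x: distance_to_hyperplane(x, w, b))
    size = math.ceil(len(ranked) / n_subregions)
    subregions = [ranked[i:i + size] for i in range(0, len(ranked), size)]

    # Assumed weighting rule: one weight per subregion, the inverse of its
    # mean distance, shared by every sample in that subregion.
    pool, weights = [], []
    for sr in subregions:
        mean_d = sum(distance_to_hyperplane(x, w, b) for x in sr) / len(sr)
        sr_weight = 1.0 / (mean_d + 1e-12)
        pool.extend(sr)
        weights.extend([sr_weight] * len(sr))

    # Weighted sampling without replacement: draw one index, then remove it.
    chosen = []
    for _ in range(min(n_keep, len(pool))):
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen


# Hypothetical usage: 200 random 2-D majority points, hyperplane x1 - x2 = 0.
rng = random.Random(0)
majority = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(200)]
subset = weighted_undersample(
    majority, w=(1.0, -1.0), b=0.0, n_subregions=4, n_keep=20, rng=rng
)
```

In an iterative scheme such as the one the abstract describes, the hyperplane would presumably come from the SVM trained in the previous iteration, and the drawn subset would be combined with the minority samples for the next round of training; that loop is left out here.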

Year:  2017        PMID: 29990027     DOI: 10.1109/TNNLS.2017.2755595

Source DB:  PubMed          Journal:  IEEE Trans Neural Netw Learn Syst        ISSN: 2162-237X            Impact factor:   10.451


  5 in total

1.  IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions.

Authors:  Yan Wang; Xiaopeng Zhu; Lili Yang; Xuemei Hu; Kai He; Cuinan Yu; Shaoqing Jiao; Jiali Chen; Rui Guo; Sen Yang
Journal:  Interdiscip Sci       Date:  2022-02-22       Impact factor: 2.233

2.  Research on Brand Image Evaluation Method Based on Consumer Sentiment Analysis.

Authors:  ZhengMin Li
Journal:  Comput Intell Neurosci       Date:  2022-05-27

3.  A Deep Learning-Based Sentiment Classification Model for Real Online Consumption.

Authors:  Yang Su; Yan Shen
Journal:  Front Psychol       Date:  2022-04-14

4.  Deep Learning-Based Imbalanced Classification With Fuzzy Support Vector Machine.

Authors:  Ke-Fan Wang; Jing An; Zhen Wei; Can Cui; Xiang-Hua Ma; Chao Ma; Han-Qiu Bao
Journal:  Front Bioeng Biotechnol       Date:  2022-01-21

5.  Solving the class imbalance problem using ensemble algorithm: application of screening for aortic dissection.

Authors:  Lijue Liu; Xiaoyu Wu; Shihao Li; Yi Li; Shiyang Tan; Yongping Bai
Journal:  BMC Med Inform Decis Mak       Date:  2022-03-28       Impact factor: 2.796
