Why Does Rebalancing Class-Unbalanced Data Improve AUC for Linear Discriminant Analysis?


Abstract

Many established classifiers fail to identify the minority class when it is much smaller than the majority class. To tackle this problem, researchers often first rebalance the class sizes in the training dataset, through oversampling the minority class or undersampling the majority class, and then use the rebalanced data to train the classifiers. This leads to interesting empirical patterns. In particular, using the rebalanced training data can often improve the area under the receiver operating characteristic curve (AUC) for the original, unbalanced test data. The AUC is a widely-used quantitative measure of classification performance, but the property that it increases with rebalancing has, as yet, no theoretical explanation. In this note, using Gaussian-based linear discriminant analysis (LDA) as the classifier, we demonstrate that, at least for LDA, there is an intrinsic, positive relationship between the rebalancing of class sizes and the improvement of AUC. We show that the largest improvement of AUC is achieved, asymptotically, when the two classes are fully rebalanced to be of equal sizes.
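The claim that full rebalancing is asymptotically best can be illustrated with a small population-level calculation. This is a minimal sketch under assumptions not stated in the abstract: the two classes are Gaussian with different, diagonal covariances; the LDA direction uses a pooled covariance weighted by the training class proportions; and all numbers (`delta`, `var0`, `var1`) are hypothetical. The function name `lda_auc` is illustrative, and the sketch is not the paper's derivation.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lda_auc(pi, delta, var0, var1):
    """Population AUC of the LDA score when class 1 makes up a
    fraction `pi` of the training data.

    Assumes diagonal class covariances (given as per-coordinate
    variances), so the pooled covariance can be inverted coordinate-wise.
    """
    # Pooled variance: class variances weighted by training proportions.
    pooled = [(1.0 - pi) * v0 + pi * v1 for v0, v1 in zip(var0, var1)]
    # LDA direction w = pooled^{-1} * (mu1 - mu0).
    w = [d / p for d, p in zip(delta, pooled)]
    # Mean gap between the projected class scores.
    signal = sum(wj * dj for wj, dj in zip(w, delta))
    # Variance of (score of class 1) minus (score of class 0).
    noise = sum(wj * wj * (v0 + v1) for wj, v0, v1 in zip(w, var0, var1))
    # AUC = P(score of a class-1 point > score of a class-0 point).
    return normal_cdf(signal / math.sqrt(noise))


# Hypothetical example: same mean gap in both coordinates,
# but the classes are noisy in different coordinates.
delta = [1.0, 1.0]   # mu1 - mu0
var0 = [1.0, 4.0]    # majority-class variances
var1 = [4.0, 1.0]    # minority-class variances

for pi in (0.05, 0.2, 0.35, 0.5):
    print(f"minority fraction {pi:.2f}: AUC = {lda_auc(pi, delta, var0, var1):.4f}")
```

In this toy setting the AUC rises as the minority fraction moves toward 0.5, because the equally weighted pooled covariance yields the direction that maximizes the ratio inside `normal_cdf`. If the two class covariances were identical, the proportion would not change the direction, and the AUC would be the same for every `pi`.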

Year: 2015 PMID: 26353332 DOI: 10.1109/TPAMI.2014.2359660

Source DB: PubMed Journal: IEEE Trans Pattern Anal Mach Intell ISSN: 0098-5589 Impact factor: 6.226

Cited

9 in total

1. Early Detection of Human Epileptic Seizures Based on Intracortical Microelectrode Array Signals.

Authors: Yun S Park; G Rees Cosgrove; Joseph R Madsen; Emad N Eskandar; Leigh R Hochberg; Sydney S Cash; Wilson Truccolo
Journal: IEEE Trans Biomed Eng Date: 2019-06-06 Impact factor: 4.538

2. Comparison of logistic regression, support vector machines, and deep learning classifiers for predicting memory encoding success using human intracranial EEG recordings.

Authors: Akshay Arora; Jui-Jui Lin; Alec Gasperian; Joseph Maldjian; Joel Stein; Michael Kahana; Bradley Lega
Journal: J Neural Eng Date: 2018-09-13 Impact factor: 5.043

3. A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM.

Authors: Qi Wang; ZhiHao Luo; JinCai Huang; YangHe Feng; Zhong Liu
Journal: Comput Intell Neurosci Date: 2017-01-30

4. Integrated Chemometrics and Statistics to Drive Successful Proteomics Biomarker Discovery. (Review)

Authors: Anouk Suppers; Alain J van Gool; Hans J C T Wessels
Journal: Proteomes Date: 2018-04-26

5. XGBoost-based and tumor-immune characterized gene signature for the prediction of metastatic status in breast cancer.

Authors: Qingqing Li; Hui Yang; Peipei Wang; Xiaocen Liu; Kun Lv; Mingquan Ye
Journal: J Transl Med Date: 2022-04-18 Impact factor: 8.440

6. An empirical evaluation of sampling methods for the classification of imbalanced data.

Authors: Misuk Kim; Kyu-Baek Hwang
Journal: PLoS One Date: 2022-07-28 Impact factor: 3.752

7. Decoding declarative memory process for predicting memory retrieval based on source localization.

Authors: Jenifer Kalafatovich; Minji Lee; Seong-Whan Lee
Journal: PLoS One Date: 2022-09-08 Impact factor: 3.752

8. Prediction of Drug-Induced Long QT Syndrome Using Machine Learning Applied to Harmonized Electronic Health Record Data.

Authors: Steven T Simon; Divneet Mandair; Premanand Tiwari; Michael A Rosenberg
Journal: J Cardiovasc Pharmacol Ther Date: 2021-03-08 Impact factor: 2.457

9. From ERPs to MVPA Using the Amsterdam Decoding and Modeling Toolbox (ADAM).

Authors: Johannes J Fahrenfort; Joram van Driel; Simon van Gaal; Christian N L Olivers
Journal: Front Neurosci Date: 2018-07-03 Impact factor: 4.677
