
Improving logistic regression on the imbalanced data by a novel penalized log-likelihood function.

Lili Zhang, Trent Geisler, Herman Ray, Ying Xie

Abstract

Logistic regression is estimated by maximizing a log-likelihood objective function that is formulated under the assumption of maximizing overall accuracy. That assumption does not hold for imbalanced data, and the resulting models tend to be biased towards the majority class (i.e. non-event), which can cause great loss in practice. One strategy for mitigating this bias is to penalize the misclassification costs of observations differently in the log-likelihood function. Existing solutions require either difficult hyperparameter estimation or high computational complexity. We propose a novel penalized log-likelihood function that includes penalty weights as decision variables for observations in the minority class (i.e. event) and learns them from the data along with the model coefficients. In the experiments, the proposed logistic regression model is compared with existing ones on the area under the receiver operating characteristic (ROC) curve across 10 public datasets and 16 simulated datasets, as well as on training time. A detailed analysis is conducted on an imbalanced credit dataset to examine the estimated probability distributions, additional performance measurements (i.e. type I error and type II error) and model coefficients. The results demonstrate that both the discrimination ability and the computational efficiency of logistic regression models are improved by using the proposed log-likelihood function as the learning objective.
© 2021 Informa UK Limited, trading as Taylor & Francis Group.
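
The general idea of penalizing minority-class observations differently in the log-likelihood can be illustrated with a minimal sketch. The abstract does not give the exact form of the proposed function, so the code below is not the authors' method: it assumes a single fixed weight (the hypothetical parameter `w_event`) on the event terms, whereas the paper learns the penalty weights from data as decision variables. The simulated data, the plain gradient-ascent fit, the 0.5 threshold and all function names are likewise assumptions made for illustration.

```python
import numpy as np


def weighted_log_likelihood(beta, X, y, w_event=1.0):
    """Log-likelihood with a fixed weight on the event (y = 1) terms.

    w_event = 1.0 gives the ordinary logistic log-likelihood.
    """
    z = X @ beta
    # log p = -log(1 + exp(-z)) and log(1 - p) = -log(1 + exp(z)),
    # written with logaddexp for numerical stability.
    log_p = -np.logaddexp(0.0, -z)
    log_1mp = -np.logaddexp(0.0, z)
    return float(np.sum(w_event * y * log_p + (1.0 - y) * log_1mp))


def fit(X, y, w_event=1.0, lr=0.5, n_iter=5000):
    """Maximize the weighted log-likelihood by plain gradient ascent."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        # Gradient of the weighted log-likelihood with respect to beta.
        grad = X.T @ (w_event * y * (1.0 - p) - (1.0 - y) * p)
        beta += lr * grad / len(y)
    return beta


def type_ii_error(beta, X, y, threshold=0.5):
    """Share of true events classified as non-events (convention assumed here)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return float(np.mean(p[y == 1] < threshold))


# Simulated imbalanced data: events are rare because the intercept is negative.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-3.0 + 1.5 * x)))).astype(float)

beta_plain = fit(X, y, w_event=1.0)     # ordinary logistic regression
beta_weighted = fit(X, y, w_event=5.0)  # events penalized five times as heavily

print("event rate:", y.mean())
print("plain    coefficients:", beta_plain, "type II:", type_ii_error(beta_plain, X, y))
print("weighted coefficients:", beta_weighted, "type II:", type_ii_error(beta_weighted, X, y))
```

In this sketch the weight has to be chosen by hand, which is the kind of hyperparameter estimation the abstract describes as a drawback of existing solutions.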

Keywords:  Logistic regression; binary classification; cost-sensitive; imbalanced data; maximum likelihood; penalized log-likelihood function

Year:  2021        PMID: 36213775      PMCID: PMC9542776          DOI: 10.1080/02664763.2021.1939662

Source DB:  PubMed          Journal:  J Appl Stat        ISSN: 0266-4763            Impact factor:   1.416


