
A Tutorial on Generative Adversarial Networks with Application to Classification of Imbalanced Data.

Yuxiao Huang, Kara G Fields, Yan Ma.

Abstract

A challenge unique to classification model development is imbalanced data. In a binary classification problem, class imbalance occurs when one class, the minority group, contains significantly fewer samples than the other class, the majority group. In imbalanced data, the minority class is often the class of interest (e.g., patients with disease). However, a classifier trained on imbalanced data will exhibit bias towards the majority class and, in extreme cases, may ignore the minority class completely. A common strategy for addressing class imbalance is data augmentation. However, traditional data augmentation methods are associated with overfitting, where the model is fit to the noise in the data. In this tutorial we introduce an advanced method for data augmentation: Generative Adversarial Networks (GANs). The advantages of GANs over traditional data augmentation methods are illustrated using the Breast Cancer Wisconsin study. To promote the adoption of GANs for data augmentation, we present an end-to-end pipeline that encompasses the complete life cycle of a machine learning project, along with alternatives and good practices, both in the paper and in a separate video. Our code, data, full results and video tutorial are publicly available in the paper's GitHub repository.
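The bias toward the majority class described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the label counts (5 minority, 95 majority) and the helper names `majority_baseline`, `accuracy`, and `recall` are hypothetical and chosen only for illustration. The sketch shows the extreme case in which a classifier ignores the minority class completely and still reports high accuracy.

```python
# Hypothetical toy labels (not from the paper):
# 1 = minority class (e.g., patients with disease), 0 = majority class.
labels = [1] * 5 + [0] * 95


def majority_baseline(train_labels):
    """Build a classifier that always predicts the most frequent training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return lambda _features: most_common


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# The baseline ignores its input entirely, mimicking the extreme case of bias.
predict = majority_baseline(labels)
predictions = [predict(None) for _ in labels]

imbalance_ratio = labels.count(0) / labels.count(1)  # majority count / minority count
acc = accuracy(labels, predictions)
minority_recall = recall(labels, predictions)

print(f"imbalance ratio: {imbalance_ratio}")
print(f"accuracy:        {acc}")
print(f"minority recall: {minority_recall}")
```

In this toy setting accuracy alone hides the failure on the class of interest, which is why a metric computed on the minority class, such as recall, is worth reporting alongside it when evaluating any augmentation method.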


Keywords:  class imbalance; classification; data augmentation; generative adversarial networks; machine learning

Year:  2021        PMID: 36199763      PMCID: PMC9529000          DOI: 10.1002/sam.11570

Source DB:  PubMed          Journal:  Stat Anal Data Min        ISSN: 1932-1864            Impact factor:   1.247


