
A Kernel Theory of Modern Data Augmentation.

Tri Dao, Albert Gu, Alexander J Ratner, Virginia Smith, Christopher De Sa, Christopher Ré.

Abstract

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in invariant kernels, tangent propagation, and robust optimization. Finally, we provide several proof-of-concept applications showing that our theory can be useful for accelerating machine learning workflows, such as reducing the amount of computation needed to train using augmented data, and predicting the utility of a transformation prior to training.
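The first-order and second-order approximation described in the abstract can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: a linear feature map, the logistic loss, a single 2D training point, and a small fixed set of rotations as the augmentation. All names (`x`, `w`, `angles`, and the helper functions) are hypothetical. It compares the exact augmented loss with the loss at the averaged feature (first order) and with that loss plus a variance penalty (second order).

```python
import math

def logistic_loss(score, label=1.0):
    """Logistic loss log(1 + exp(-label * score))."""
    return math.log1p(math.exp(-label * score))

def logistic_loss_second_derivative(score, label=1.0):
    """Second derivative of the logistic loss in the score (label is +1 or -1)."""
    p = 1.0 / (1.0 + math.exp(-label * score))
    return p * (1.0 - p)

def rotate(point, angle):
    """Rotate a 2D point counterclockwise by `angle` radians."""
    x0, x1 = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x0 - s * x1, s * x0 + c * x1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy setup (assumption, not from the paper):
x = (1.0, 0.5)                                   # one training point, label +1
w = (1.0, 1.0)                                   # fixed linear model weights
angles = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]  # augmentation: small rotations

augmented = [rotate(x, a) for a in angles]
scores = [dot(w, z) for z in augmented]

# Exact augmented objective: average loss over all transformed copies.
exact = sum(logistic_loss(s) for s in scores) / len(scores)

# First order: evaluate the loss once, at the averaged feature vector.
mean_feature = tuple(
    sum(z[i] for z in augmented) / len(augmented) for i in range(2)
)
mean_score = dot(w, mean_feature)
first_order = logistic_loss(mean_score)

# Second order: add 0.5 * l''(mean score) * variance of the score,
# which penalizes how much the model output varies across augmentations.
score_variance = sum((s - mean_score) ** 2 for s in scores) / len(scores)
second_order = first_order + 0.5 * logistic_loss_second_derivative(
    mean_score
) * score_variance

print("exact augmented loss:", exact)
print("first-order (feature averaging):", first_order)
print("second-order (+ variance term):", second_order)
```

Because the logistic loss is convex, the feature-averaged value never exceeds the exact augmented loss (Jensen's inequality); the variance term accounts for most of the remaining gap when the transformations are small. The practical appeal is that both approximations need only one loss evaluation per original point rather than one per augmented copy.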

Year: 2019
PMID: 31777848
PMCID: PMC6879382
Source DB: PubMed
Journal: Proc Mach Learn Res


