
The general inefficiency of batch training for gradient descent learning.

D Randall Wilson, Tony R Martinez.

Abstract

Gradient descent training of neural networks can be done in either a batch or on-line manner. A widely held myth in the neural network community is that batch training is as fast as or faster than on-line training, and/or more 'correct', because it supposedly uses a better approximation of the true gradient for its weight updates. This paper explains why batch training is almost always slower than on-line training, often by orders of magnitude, especially on large training sets. The main reason is that on-line training can follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and thus converge in fewer iterations through the training data. Empirical results on a large (20,000-instance) speech recognition task and on 26 other learning tasks demonstrate that convergence can be reached significantly faster using on-line training than batch training, with no apparent difference in accuracy.

Year:  2003        PMID: 14622875     DOI: 10.1016/S0893-6080(03)00138-2

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


Related articles: 13 in total

1.  Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network.

Authors:  Eshel Faraggi; Bin Xue; Yaoqi Zhou
Journal:  Proteins       Date:  2009-03

2.  Comparative assessment of glucose prediction models for patients with type 1 diabetes mellitus applying sensors for glucose and physical activity monitoring.

Authors:  K Zarkogianni; K Mitsis; E Litsa; M-T Arredondo; G Fico; A Fioravanti; K S Nikita
Journal:  Med Biol Eng Comput       Date:  2015-06-07       Impact factor: 2.602

3.  Optimizing artificial neural network models for metabolomics and systems biology: an example using HPLC retention index data.

Authors:  L Mark Hall; Dennis W Hill; Lochana C Menikarachchi; Ming-Hui Chen; Lowell H Hall; David F Grant
Journal:  Bioanalysis       Date:  2015       Impact factor: 2.681

4.  Improving quantitative structure-activity relationship models using Artificial Neural Networks trained with dropout.

Authors:  Jeffrey Mendenhall; Jens Meiler
Journal:  J Comput Aided Mol Des       Date:  2016-02-01       Impact factor: 3.686

5.  Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists.

Authors:  Alberto Testolin; Ivilin Stoianov; Michele De Filippo De Grazia; Marco Zorzi
Journal:  Front Psychol       Date:  2013-05-06

6.  Operant conditioning: a minimal components requirement in artificial spiking neurons designed for bio-inspired robot's controller.

Authors:  André Cyr; Mounir Boukadoum; Frédéric Thériault
Journal:  Front Neurorobot       Date:  2014-07-25       Impact factor: 2.650

7.  Dual Temporal Scale Convolutional Neural Network for Micro-Expression Recognition.

Authors:  Min Peng; Chongyang Wang; Tong Chen; Guangyuan Liu; Xiaolan Fu
Journal:  Front Psychol       Date:  2017-10-13

8.  Diagnosis of Malignancy in Thyroid Tumors by Multi-Layer Perceptron Neural Networks With Different Batch Learning Algorithms.

Authors:  Saeedeh Pourahmad; Mohsen Azad; Shahram Paydar
Journal:  Glob J Health Sci       Date:  2015-03-30

9.  Convergence of batch gradient learning with smoothing regularization and adaptive momentum for neural networks.

Authors:  Qinwei Fan; Wei Wu; Jacek M Zurada
Journal:  Springerplus       Date:  2016-03-08

10.  Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding.

Authors:  Xu Min; Wanwen Zeng; Ning Chen; Ting Chen; Rui Jiang
Journal:  Bioinformatics       Date:  2017-07-15       Impact factor: 6.937

