
Flat minima.

S Hochreiter, J Schmidhuber.

Abstract

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to "simple" networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a "good" weight prior. Instead we have a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. Automatically, it effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and "optimal brain surgeon/optimal brain damage".
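The sketch below is a minimal, hypothetical illustration of the general idea of preferring flat minima during training, not the paper's flat-minimum-search objective: it adds a squared-gradient-norm penalty (a crude flatness proxy) to a standard loss and trains with double backpropagation in PyTorch. The model, data, and the name `flatness_weight` are illustrative assumptions.

```python
# Hypothetical sketch of a flatness-style penalty (NOT the authors' exact
# flat-minimum-search algorithm): penalize the squared gradient norm of the
# training loss so that regions where the error changes slowly are preferred.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.Tanh(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
flatness_weight = 1e-3  # illustrative strength of the flatness penalty

x = torch.randn(64, 10)   # toy data, stand-in for a real training set
y = torch.randn(64, 1)

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # First-order gradients with create_graph=True so we can differentiate
    # through them again (second-order information, backprop-like cost).
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    # Near a flat minimum the loss surface changes slowly, so this term is small.
    flatness_penalty = sum(g.pow(2).sum() for g in grads)
    total = loss + flatness_weight * flatness_penalty
    total.backward()
    optimizer.step()
```

The penalty term is only a rough stand-in for the box-based flatness measure described in the paper; it is meant to show where such a regularizer enters a training loop, not to reproduce the published results.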

MeSH:

Year: 1997        PMID: 9117894        DOI: 10.1162/neco.1997.9.1.1

Source DB: PubMed        Journal: Neural Comput        ISSN: 0899-7667        Impact factor: 2.026


Related articles: 20 in total

1.  Initialization and self-organized optimization of recurrent neural network connectivity.

Authors:  Joschka Boedecker; Oliver Obst; N Michael Mayer; Minoru Asada
Journal:  HFSP J       Date:  2009-10-26

2.  Provenance of correlations in psychological data.

Authors:  Thomas L Thornton; David L Gilden
Journal:  Psychon Bull Rev       Date:  2005-06

3.  [Review] REBUS and the Anarchic Brain: Toward a Unified Model of the Brain Action of Psychedelics.

Authors:  R L Carhart-Harris; K J Friston
Journal:  Pharmacol Rev       Date:  2019-07       Impact factor: 25.468

4.  The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima.

Authors:  Yu Feng; Yuhai Tu
Journal:  Proc Natl Acad Sci U S A       Date:  2021-03-02       Impact factor: 11.205

5.  Archetypal landscapes for deep neural networks.

Authors:  Philipp C Verpoort; Alpha A Lee; David J Wales
Journal:  Proc Natl Acad Sci U S A       Date:  2020-08-25       Impact factor: 11.205

6.  Global Model Analysis of Cognitive Variability.

Authors:  David L Gilden
Journal:  Cogn Sci       Date:  2009-08-10

7.  The Iterated Classification Game: A New Model of the Cultural Transmission of Language.

Authors:  Samarth Swarup; Les Gasser
Journal:  Adapt Behav       Date:  2009       Impact factor: 1.942

8.  Discrimination of smoking status by MRI based on deep learning method.

Authors:  Shuangkun Wang; Rongguo Zhang; Yufeng Deng; Kuan Chen; Dan Xiao; Peng Peng; Tao Jiang
Journal:  Quant Imaging Med Surg       Date:  2018-12

9.  PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging.

Authors:  Anthony Sicilia; Xingchen Zhao; Anastasia Sosnovskikh; Seong Jae Hwang
Journal:  Med Image Comput Comput Assist Interv       Date:  2021-09-21

10.  Degeneracy and Redundancy in Active Inference.

Authors:  Noor Sajid; Thomas Parr; Thomas M Hope; Cathy J Price; Karl J Friston
Journal:  Cereb Cortex       Date:  2020-10-01       Impact factor: 5.357

