Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses.

Charles G Frye, James Simon, Neha S Wadia, Andrew Ligeralde, Michael R DeWeese, Kristofer E Bouchard.

Abstract

Despite the fact that the loss functions of deep neural networks are highly nonconvex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by numerically characterizing the local curvature near critical points of the loss function, where the gradients are near zero. Such studies have reported that neural network losses enjoy a no-bad-local-minima property, in disagreement with more recent theoretical results. We report here that the methods used to find these putative critical points suffer from a bad local minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care in both interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.
© 2021 Massachusetts Institute of Technology.
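
The gradient-flat condition described in the abstract can be illustrated with a small sketch. It is not the authors' method or code. It assumes a hypothetical two-parameter toy loss f(x, y) = x^3/3 + x + y^2, chosen because it has no critical points at all: its gradient (x^2 + 1, 2y) never vanishes. A simple critical-point finder that runs gradient descent on the squared gradient norm g = 0.5 * ||grad f||^2 follows the direction H * grad f, where H is the Hessian, so it can stall wherever the gradient lies in the kernel of H, even though the gradient itself is not zero. The function names, step size, and starting point below are illustrative assumptions.

```python
import math

def grad(x, y):
    """Gradient of the toy loss f(x, y) = x**3 / 3 + x + y**2 (an assumed example)."""
    return (x * x + 1.0, 2.0 * y)

def hess_vec(x, y, v):
    """Hessian-vector product; the Hessian of the toy loss is diag(2x, 2)."""
    return (2.0 * x * v[0], 2.0 * v[1])

def grad_norm(x, y):
    """Euclidean norm of the gradient of the toy loss."""
    return math.hypot(*grad(x, y))

def minimize_grad_norm(x, y, lr=0.05, steps=2000):
    """Gradient descent on g = 0.5 * ||grad f||^2, whose gradient is H @ grad f."""
    for _ in range(steps):
        dx, dy = hess_vec(x, y, grad(x, y))
        x, y = x - lr * dx, y - lr * dy
    return x, y

# Start away from the origin and let the gradient-norm minimizer run.
x_end, y_end = minimize_grad_norm(0.8, -0.5)
g_end = grad(x_end, y_end)
hg_end = hess_vec(x_end, y_end, g_end)

print("end point:", (x_end, y_end))
print("gradient norm at end point:", grad_norm(x_end, y_end))
print("norm of H @ grad at end point:", math.hypot(*hg_end))
```

In this sketch the iterates settle near the origin, where H * grad f is approximately zero while the gradient norm stays near 1. The end point is therefore a stationary point of the gradient norm but not a critical point of the loss, which is the situation the abstract calls gradient-flat.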

Year: 2021    PMID: 34496389    PMCID: PMC8919680    DOI: 10.1162/neco_a_01388

Source DB: PubMed    Journal: Neural Comput    ISSN: 0899-7667    Impact factor: 2.026


