Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses.

Charles G Frye, James Simon, Neha S Wadia, Andrew Ligeralde, Michael R DeWeese, Kristofer E Bouchard.

Abstract

Despite the fact that the loss functions of deep neural networks are highly nonconvex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by numerically characterizing the local curvature near critical points of the loss function, where the gradients are near zero. Such studies have reported that neural network losses enjoy a no-bad-local-minima property, in disagreement with more recent theoretical results. We report here that the methods used to find these putative critical points suffer from a bad local minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care in both interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.
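
The condition described in the abstract can be made concrete with a toy example. The sketch below is not taken from the paper: the loss L(x, y) = x + y**2, the step size, and all function names are assumptions chosen for illustration. It uses the identity that the gradient of 0.5 * ||grad L||^2 equals the Hessian applied to grad L, so the gradient norm is stationary wherever grad L lies in the kernel of the Hessian, even if grad L itself is far from zero. Plain gradient descent on the squared gradient norm stands in here for a critical-point finder; the abstract does not specify which methods were studied.

```python
# Toy loss L(x, y) = x + y**2 (an illustrative assumption, not from the paper).

def grad(x, y):
    """Gradient of L(x, y) = x + y**2."""
    return (1.0, 2.0 * y)

def hessian_vector_product(x, y, v):
    """The Hessian of L is diag(0, 2), so H @ v = (0, 2 * v[1])."""
    return (0.0, 2.0 * v[1])

def grad_of_half_sq_grad_norm(x, y):
    """Gradient of 0.5 * ||grad L||^2, which equals H @ grad L."""
    return hessian_vector_product(x, y, grad(x, y))

# Minimize the squared gradient norm by plain gradient descent.
x, y = 0.3, 0.5
step = 0.1  # hypothetical step size
for _ in range(200):
    dx, dy = grad_of_half_sq_grad_norm(x, y)
    x -= step * dx
    y -= step * dy

final_grad = grad(x, y)
final_hg = grad_of_half_sq_grad_norm(x, y)
grad_norm = (final_grad[0] ** 2 + final_grad[1] ** 2) ** 0.5

# The search has stalled (H @ grad L is essentially zero), yet the gradient
# norm is about 1: this is a gradient-flat point, not a critical point.
print("H @ grad L:", final_hg)
print("||grad L||:", grad_norm)
```

In this toy case every point on the line y = 0 is gradient-flat: the loss is exactly linear along the gradient direction (1, 0), so a method that only checks whether the gradient norm has stopped decreasing would wrongly report a critical point there.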

Year: 2021; PMID: 34496389; PMCID: PMC8919680; DOI: 10.1162/neco_a_01388

Source DB: PubMed; Journal: Neural Comput; ISSN: 0899-7667; Impact factor: 2.026
