| Literature DB >> 32094327 |
Complexity control by gradient descent in deep networks
Tomaso Poggio, Qianli Liao, Andrzej Banburski.
Abstract
Overparametrized deep networks predict well, despite the lack of an explicit complexity control during training, such as an explicit regularization term. For exponential-type loss functions, we solve this puzzle by showing an effective regularization effect of gradient descent in terms of the normalized weights that are relevant for classification.
Year: 2020 PMID: 32094327 PMCID: PMC7039878 DOI: 10.1038/s41467-020-14663-9
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
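A minimal sketch of the normalization the abstract refers to, assuming a bias-free (hence positively homogeneous) ReLU network: dividing each layer's weight matrix by its Frobenius norm gives a normalized network whose outputs stay bounded even as the raw weights grow during training. The model shape, layer sizes, and function names below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def normalized_logits(model: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    """Evaluate the network with each linear layer's weight divided by its
    Frobenius norm. For a bias-free ReLU net the output is homogeneous in
    the weights, so this equals the raw logits divided by prod_k ||W_k||."""
    scale = 1.0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            scale = scale * m.weight.norm()   # rho_k = ||W_k||_F
    return model(x) / scale

# Hypothetical bias-free ReLU classifier (CIFAR-10-shaped inputs, 10 classes).
model = nn.Sequential(
    nn.Linear(3 * 32 * 32, 512, bias=False), nn.ReLU(),
    nn.Linear(512, 10, bias=False),
)
x = torch.randn(8, 3 * 32 * 32)
logits_tilde = normalized_logits(model, x)    # bounded, unlike the raw logits
```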
Fig. 1: Classical generalization and consistency in deep networks.
a Unnormalized cross-entropy loss in CIFAR-10 for randomly labeled data. b Cross-entropy loss of the normalized network for randomly labeled data. c Generalization gap in cross-entropy loss (difference between training and testing loss) of the normalized network for randomly labeled data, as a function of the number of training examples N. The generalization gap converges to zero as N grows, though only very slowly.
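A sketch of the Fig. 1 comparison under simplifying assumptions: train on randomly labeled data and track the cross-entropy of the raw network (panel a) against that of the normalized network (panel b). Random tensors stand in for CIFAR-10 images here, and the two-layer net, learning rate, and step count are arbitrary choices, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, D, C = 256, 3 * 32 * 32, 10
x = torch.randn(N, D)                 # stand-in for CIFAR-10 images
y = torch.randint(0, C, (N,))         # random labels: no signal to learn

model = nn.Sequential(nn.Linear(D, 256, bias=False), nn.ReLU(),
                      nn.Linear(256, C, bias=False))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def product_of_norms(net: nn.Module) -> torch.Tensor:
    p = 1.0
    for m in net.modules():
        if isinstance(m, nn.Linear):
            p = p * m.weight.norm()
    return p

for step in range(2001):
    logits = model(x)
    loss = F.cross_entropy(logits, y)  # raw loss (panel a): driven toward 0
    if step % 500 == 0:
        with torch.no_grad():          # normalized loss (panel b): plateaus
            norm_loss = F.cross_entropy(logits / product_of_norms(model), y)
        print(f"step {step}: raw {loss.item():.4f}  "
              f"normalized {norm_loss.item():.4f}")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the normalized output is bounded, its cross-entropy cannot follow the raw loss to zero, mirroring the contrast between panels a and b.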
Fig. 2: No overfitting in deep networks.
Empirical and expected error in CIFAR-10 as a function of the number of neurons in a 5-layer convolutional network. The expected classification error does not increase as the number of parameters grows beyond the size of the training set.
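A minimal sketch of the Fig. 2 protocol, assuming a bias-free 5-layer CNN (four conv layers plus a linear readout) whose channel count is swept so the parameter count crosses the training-set size (50,000 for CIFAR-10); the training loop and data loading are elided, and all names are hypothetical.

```python
import torch.nn as nn

def make_convnet(width: int, num_classes: int = 10) -> nn.Sequential:
    """5-layer conv net for 32x32x3 inputs; `width` (channels per layer)
    controls the total parameter count."""
    layers, in_ch = [], 3
    for _ in range(4):                 # 32 -> 16 -> 8 -> 4 -> 2 spatially
        layers += [nn.Conv2d(in_ch, width, 3, stride=2, padding=1,
                             bias=False),
                   nn.ReLU()]
        in_ch = width
    layers += [nn.Flatten(), nn.Linear(width * 2 * 2, num_classes,
                                       bias=False)]
    return nn.Sequential(*layers)

# Sweep widths so the parameter count passes the size of the training set;
# per Fig. 2, the test classification error should flatten rather than rise.
for width in [4, 8, 16, 32, 64, 128, 256]:
    model = make_convnet(width)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"width {width:4d}: {n_params:,} parameters")
    # train(model); measure train/test classification error (elided)
```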