Masayuki Ohzeki, Shuntaro Okada, Masayoshi Terabe, Shinichiro Taguchi.
Abstract
We numerically test an optimization method for deep neural networks (DNNs) that uses quantum fluctuations inspired by quantum annealing. For efficient optimization, our method exploits the quantum tunneling effect to pass through potential barriers. The path integral formulation of the DNN optimization generates an attracting force that simulates the quantum tunneling effect. In the standard quantum annealing method, the quantum fluctuations vanish in the last stage of optimization. In this study, we propose a learning protocol that retains a finite quantum fluctuation strength to obtain higher generalization performance, which is a type of robustness. We demonstrate the performance of our method on two well-known open datasets: the MNIST dataset and the Olivetti face dataset. Although computational costs prevent us from testing our method on large datasets with high-dimensional data, the results show that our method can enhance generalization performance by keeping the quantum fluctuations at a finite value.
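The attracting force described in the abstract couples M replicated networks toward their mean, mimicking the quantum tunneling term of the path integral formulation. Below is a minimal sketch of this idea using plain gradient descent on a toy double-well loss; the function name `replica_coupled_sgd` and the coupling strength `gamma` (kept finite throughout training, as the abstract proposes) are illustrative assumptions, not the authors' actual implementation, which builds the coupling into an Adam-style optimizer.

```python
import numpy as np

def replica_coupled_sgd(grad_fn, w0, n_replicas=5, gamma=0.1,
                        lr=0.01, steps=200, seed=0):
    """Sketch: each replica follows its own loss gradient plus an
    attractive force toward the replica mean. `gamma` is a hypothetical
    coupling strength held finite for the whole run."""
    rng = np.random.default_rng(seed)
    # Initialize M replicas as small random perturbations of w0.
    W = np.stack([w0 + 0.1 * rng.standard_normal(w0.shape)
                  for _ in range(n_replicas)])
    for _ in range(steps):
        mean = W.mean(axis=0)
        for i in range(n_replicas):
            # Loss gradient plus attraction toward the replica centroid.
            W[i] = W[i] - lr * (grad_fn(W[i]) + gamma * (W[i] - mean))
    return W

# Toy double-well loss f(w) = (w^2 - 1)^2, gradient 4w(w^2 - 1).
grad = lambda w: 4 * w * (w**2 - 1)
replicas = replica_coupled_sgd(grad, np.array([2.0]))
print(replicas.mean())  # replicas settle near w = 1, a minimum of the double well
```

The finite `gamma` keeps the replicas loosely bound rather than collapsing them onto a single configuration, which is the mechanism the paper credits for the improved generalization.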
Year: 2018 PMID: 29967442 PMCID: PMC6028692 DOI: 10.1038/s41598-018-28212-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic pictures of two local minima and quantum effects.
Figure 2. Accuracy on test data (red dashed curves: classical Adam; blue solid curves: quantum Adam) for a single-layer NN on MNIST. All results from the M-replicated systems are indicated by light-colored curves. The bold curves denote the average, and the thin curves the maximum, over the replicated NNs. The horizontal axis represents the epoch, and the vertical axis the accuracy on the test data.
Figure 3. Loss function on test data for an autoencoder trained on MNIST. All results from the replicated systems are indicated by light-colored curves. The bold and thin curves indicate the average and the minimum over the replicated NNs. The horizontal axis represents the epoch, and the vertical axis the loss function on the test data. The inset shows an enlarged view of the average loss functions during epochs 800–1000.
Figure 4. Accuracy on test data for classification of Olivetti face images. The curve conventions are the same as in Fig. 2. The horizontal axis represents the epoch, and the vertical axis the accuracy on the test data.