Mingming Shen, Jing Yang, Shaobo Li, Ansi Zhang, Qiang Bai
Abstract
Deep neural networks are widely used in image processing for micromachines, such as 3D shape detection in microelectronic high-speed dispensing and object detection in microrobots. It is well known that hyperparameters and their interactions affect neural network model performance. Exploiting the mathematical correlations between hyperparameters and the corresponding deep learning model to adjust hyperparameters intelligently is the key to obtaining an optimal solution from a deep neural network model. Leveraging these correlations is also significant for unlocking the "black box" of deep learning by revealing the mechanism of its mathematical principles. However, there is no complete system combining mathematical derivation with experimental verification to quantify the impacts of hyperparameters on the performance of deep learning models. Therefore, in this paper, the authors analyzed the mathematical relationships among four hyperparameters: the learning rate, batch size, dropout rate, and convolution kernel size. A generalized multiparameter mathematical correlation model was also established, which showed that the interactions between these hyperparameters play an important role in the neural network's performance. Experiments running convolutional neural network algorithms on the MNIST dataset were conducted to validate the proposal. Notably, this research can help establish a universal multiparameter mathematical correlation model to guide the deep learning parameter adjustment process.
Keywords: deep neural network; hyperparameters; image processing; multiparameter mathematical correlation model
Year: 2021 PMID: 34945353 PMCID: PMC8704841 DOI: 10.3390/mi12121504
Source DB: PubMed Journal: Micromachines (Basel) ISSN: 2072-666X Impact factor: 2.891
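This record does not include the authors' code; the following is a minimal PyTorch sketch, assuming a small LeNet-style CNN for MNIST, that shows where the four studied hyperparameters enter: dropout rate q, batch size m, learning rate lr, and convolution kernel size ke. The class name SmallCNN and all layer sizes are illustrative, not taken from the paper.

```python
# Minimal, illustrative PyTorch sketch (not the authors' code): a small
# CNN for MNIST exposing the four hyperparameters studied in the paper.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, q: float = 0.5, ke: int = 5):
        super().__init__()
        pad = ke // 2  # roughly preserve spatial size; the 7x7 below holds for ke = 1..7
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=ke, padding=pad),
            nn.ReLU(),
            nn.MaxPool2d(2),  # ~28x28 -> ~14x14
            nn.Conv2d(32, 64, kernel_size=ke, padding=pad),
            nn.ReLU(),
            nn.MaxPool2d(2),  # ~14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Dropout(p=q),      # dropout rate q
            nn.Linear(128, 10),   # 10 MNIST classes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# m (batch size) and lr (learning rate) enter through the data loader
# and the optimizer rather than through the model itself:
model = SmallCNN(q=0.5, ke=5)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()  # the training cross-entropy loss of Figures 2-10
```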
Figure 1. Training accuracies (%) obtained under different dropout rates q.
Figure 2. Model convergence under different q: (a) convergence of the training cross-entropy loss; (b) convergence of the test accuracy.
The running time required with the same number of steps for different q.

| q | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| Running time (s) | 10,094 | 10,118 | 10,270 | 10,023 | 10,089 | 10,240 | 10,192 | 10,041 | 10,060 |
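A sweep like the one tabulated above reduces to a simple timing loop. The harness below is a hypothetical sketch: train_loader, the step count, and the SmallCNN class from the earlier sketch are all assumptions, since the paper's experiment scripts are not part of this record.

```python
# Hypothetical timing harness for the dropout-rate sweep (illustrative).
import itertools
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumed MNIST loader; batch size m is held fixed while q is swept.
train_loader = DataLoader(
    datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=64,
    shuffle=True,
)
loss_fn = nn.CrossEntropyLoss()

def train_for_steps(model, loader, optimizer, steps):
    """Run a fixed number of optimization steps, matching the tables'
    'same number of steps' setup (the actual step count is not given here)."""
    batches = itertools.cycle(loader)
    model.train()
    for _ in range(steps):
        x, y = next(batches)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

for q in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    model = SmallCNN(q=q, ke=5)  # SmallCNN from the sketch above
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    start = time.perf_counter()
    train_for_steps(model, train_loader, optimizer, steps=1000)  # placeholder step count
    print(f"q={q}: {time.perf_counter() - start:.0f} s")
```

Consistent with the table, the per-step cost should barely change with q, since dropout adds only an elementwise mask.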
Figure 3. Training accuracies (%) obtained under different batch sizes m.
Figure 4. Model convergence under different m: (a) convergence of the training cross-entropy loss; (b) convergence of the test accuracy.
The running time required with the same number of steps at different m.

| m | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|
| Running time (s) | 1482 | 2101 | 2352 | 3474 | 5677 | 10,089 |
Figure 5. Training accuracies (%) obtained under different learning rates lr.
Figure 6. Model convergence under different lr: (a) convergence of the training cross-entropy loss; (b) convergence of the test accuracy.
The running time required with the same number of steps at different lr.

| lr | 1 × 10⁻² | 1 × 10⁻³ | 1 × 10⁻⁴ | 1 × 10⁻⁵ | 1 × 10⁻⁶ | 1 × 10⁻⁷ |
|---|---|---|---|---|---|---|
| Running time (s) | 5596 | 5619 | 5677 | 5496 | 5554 | 5514 |
Figure 7. Training accuracies (%) obtained under different convolution kernel sizes ke.
Figure 8. Model convergence under different ke: (a) convergence of the training cross-entropy loss; (b) convergence of the test accuracy.
The running time required for the same number of steps at different ke.

| ke | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Running time (s) | 6719 | 6064 | 5443 | 5586 | 5677 | 5918 | 6069 |
Network parameter design for the confirmatory experiment.
| Experiment Number | q | m | ke | lr |
|---|---|---|---|---|
| 1 | 0.3 | 32 | 4 | 1 × 10⁻³ |
| 2 | 0.4 | 32 | 4 | 1 × 10⁻³ |
| 3 | 0.5 | 32 | 4 | 1 × 10⁻³ |
| 4 | 0.3 | 32 | 4 | 1 × 10⁻⁴ |
| 5 | 0.4 | 32 | 4 | 1 × 10⁻⁴ |
| 6 | 0.5 | 32 | 4 | 1 × 10⁻⁴ |
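The six configurations form a small grid: q varies over {0.3, 0.4, 0.5} and lr over {1 × 10⁻³, 1 × 10⁻⁴}, with m = 32 and ke = 4 held fixed. A hypothetical enumeration in the same register as the earlier sketches:

```python
# Enumerate the confirmatory grid: q x lr, with m and ke fixed (illustrative).
import itertools

qs = [0.3, 0.4, 0.5]
lrs = [1e-3, 1e-4]
configs = [
    {"q": q, "m": 32, "ke": 4, "lr": lr}
    for lr, q in itertools.product(lrs, qs)  # same ordering as the table rows
]
for number, cfg in enumerate(configs, start=1):
    print(number, cfg)
```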
Figure 9. Training and test accuracies (%) obtained in the confirmatory experiment.
Figure 10. Model convergence in the confirmatory experiment: (a) convergence of the training cross-entropy loss; (b) convergence of the test accuracy.
The running time required for the same number of steps.
| Experiment number | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Running time (s) | 5298 | 5337 | 5358 | 5314 | 5297 | 5332 |
Figure 11. Statistical box plots for the LRelu, Relu, and Selu models at different steps over 10 repeated experiments: (a) test accuracy of LRelu; (b) training loss of LRelu; (c) test accuracy of Relu; (d) training loss of Relu; (e) test accuracy of Selu; (f) training loss of Selu.
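For reference, LRelu, Relu, and Selu in Figure 11 are the standard leaky-ReLU, ReLU, and SELU activations. A minimal comparison using PyTorch built-ins follows; this is an illustration, not the authors' configuration, and the leaky slope of 0.01 is an assumption.

```python
# The three activations compared in Figure 11 (PyTorch built-ins):
#   ReLU(x)  = max(0, x)
#   LReLU(x) = x if x > 0 else slope * x
#   SELU(x)  = scale * (x if x > 0 else alpha * (exp(x) - 1)), fixed scale/alpha
import torch
import torch.nn.functional as F

x = torch.linspace(-2.0, 2.0, steps=5)
print(F.relu(x))
print(F.leaky_relu(x, negative_slope=0.01))
print(F.selu(x))
```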