Jianjun Jiang, Jing Zhang, Lijia Zhang, Xiaomin Ran, Jun Jiang, Yifan Wu.
Abstract
Deep belief networks (DBNs), a deep learning technique, have been successfully used in many fields. However, the structure of a DBN is difficult to design for different datasets. Hence, a DBN structure design algorithm based on information entropy and reconstruction error is proposed. Unlike previous algorithms, it combines network depth and node number and optimizes them simultaneously. First, the mathematical model of the structural design problem is established, and the boundary constraint on node number based on information entropy is derived by introducing the idea of information compression. Moreover, the optimization objective for network performance based on reconstruction error is proposed by deriving the fact that network energy is proportional to reconstruction error. Finally, the improved simulated annealing (ISA) algorithm is used to adjust the number of DBN layers and nodes simultaneously. Experiments were carried out on three public datasets (MNIST, Cifar-10 and Cifar-100). The results show that the proposed algorithm can design a suitable structure for each dataset, yielding a trained DBN with the lowest reconstruction error and prediction error rate among the compared algorithms. The proposed algorithm outperforms the other algorithms in the comparison and can be used to assist in setting DBN structural parameters for different datasets.
Keywords: DBN; artificial intelligence; deep learning; improved simulated annealing algorithm; information entropy; reconstruction error; structure design
Year: 2018 PMID: 33266651 PMCID: PMC7512514 DOI: 10.3390/e20120927
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
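The search loop described in the abstract, which changes the number of layers and the number of nodes within one move set, can be sketched with plain simulated annealing. This is a minimal sketch and not the authors' ISA: the function `anneal_structure`, its move set, the cooling schedule, and the `toy_cost` objective are all assumptions made for illustration. In the paper's setting the objective would presumably be the reconstruction error of a DBN trained with the candidate structure, and the width bounds `lo` and `hi` would come from the information-entropy constraint.

```python
import math
import random


def anneal_structure(cost, init, lo, hi, max_layers,
                     t0=1.0, cooling=0.95, steps=300, seed=0):
    """Plain simulated annealing over a list of hidden-layer widths.

    cost: callable mapping a list of widths to a float (lower is better).
    lo, hi: assumed bounds on the width of any hidden layer.
    """
    rng = random.Random(seed)
    current = list(init)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temperature = t0

    for _ in range(steps):
        candidate = list(current)
        move = rng.choice(("resize", "add", "drop"))
        if move == "add" and len(candidate) < max_layers:
            # Change depth: insert a new layer of random width.
            candidate.insert(rng.randrange(len(candidate) + 1),
                             rng.randint(lo, hi))
        elif move == "drop" and len(candidate) > 1:
            # Change depth: remove one layer.
            candidate.pop(rng.randrange(len(candidate)))
        else:
            # Change width: nudge one layer, clamped to [lo, hi].
            i = rng.randrange(len(candidate))
            candidate[i] = min(hi, max(lo, candidate[i] + rng.randint(-32, 32)))

        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept
        # worse candidates while the temperature is still high.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling

    return best, best_cost


def toy_cost(widths):
    # Hypothetical stand-in for "train a DBN, return its reconstruction
    # error": it simply prefers four layers of about 100 units each.
    return sum((w - 100) ** 2 for w in widths) / 1e4 + abs(len(widths) - 4)


best, best_cost = anneal_structure(toy_cost, init=[500], lo=10, hi=1000,
                                   max_layers=8)
print(best, round(best_cost, 4))
```

Because depth and width moves share one proposal step, a single accept/reject decision covers both kinds of change, which is the sense in which the two are optimized together here.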
Figure 1. RBM structure in a DBN.
Computational complexity of reconstruction error and network energy.
| Means | Multiplication Quantity | Addition Quantity |
|---|---|---|
| Reconstruction Error | | |
| Network Energy | | |
Note. V and H represent the number of neurons in all visible layers and hidden layers, respectively.
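For orientation, the two quantities compared in this table can be computed for a single RBM as sketched below. The sketch uses the textbook RBM energy, E(v, h) = -a·v - b·h - vᵀWh, and a squared reconstruction error after one deterministic up-down pass with sigmoid units. These definitions, the helper names, and the use of activation probabilities instead of sampled states are assumptions; the paper's exact formulas and operation counts may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rbm_energy(v, h, W, a, b):
    """Textbook RBM energy: E(v, h) = -a.v - b.h - v^T W h."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    pair_term = sum(v[i] * W[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - pair_term


def reconstruction_error(v, W, a, b):
    """Squared error between v and its one-step reconstruction.

    Uses activation probabilities (no sampling) so the result is
    deterministic; this is an illustrative choice.
    """
    n_vis, n_hid = len(a), len(b)
    h = [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(n_vis)))
         for j in range(n_hid)]
    v_rec = [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(n_hid)))
             for i in range(n_vis)]
    return sum((vi - ri) ** 2 for vi, ri in zip(v, v_rec))


# Tiny hypothetical RBM: 2 visible units, 1 hidden unit.
W = [[0.5], [0.2]]   # W[i][j]: weight between visible i and hidden j
a = [0.1, 0.0]       # visible biases
b = [0.3]            # hidden biases

energy = rbm_energy([1, 0], [1], W, a, b)
error = reconstruction_error([1, 0], W, a, b)
print(energy, error)
```

In this sketch the energy needs a single pass over the weight matrix, while the reconstruction error needs an upward and a downward pass plus the sigmoid evaluations.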
Algorithm parameter settings for the Cifar-10 dataset.
| Batch Size | Iterations (Supervised, Unsupervised) | Learning Algorithm | Momentum | Learning Rate (Supervised, Unsupervised) | Activation Function | Output Classifier | | |
|---|---|---|---|---|---|---|---|---|
| 2000 | (1500,50) | Momentum gradient | 0.5 | (0.5,0.5) | Sigmoid | Softmax | 1 | 0.7 |
DBN structures obtained by the five algorithms on the Cifar-10 dataset.
| Algorithm | DBN Structure | Reconstruction Error |
|---|---|---|
| REE | [3072,200,200,200,200,200,200,10] | 3.9989 |
| TSCL | [3072,3008,2009,500,507,406,99,208,316,58,36,10] | 5.0036 |
| RCE | [3072,100,100,100,100,100,100,10] | 3.6587 |
| IERESA | [3072,2959,756,1024,146,99,95,10] | 1.4032 |
| IEREISA | [3072,2958,756,1033,134,99,95,10] | 1.1106 |
Figure 2. DBN reconstruction error variation of five structural algorithms on the Cifar-10 dataset.
Figure 3. RMSE variation of five algorithms on the Cifar-10 dataset.
Figure 4. Prediction error rate of five algorithms on the Cifar-10 dataset.
Figure 5. Runtime of five algorithms on the Cifar-10 dataset.
Algorithm parameter settings for the MNIST dataset.
| Batch Size | Iterations (Supervised, Unsupervised) | Learning Algorithm | Momentum | Learning Rate (Supervised, Unsupervised) | Activation Function | Output Classifier | | |
|---|---|---|---|---|---|---|---|---|
| 200 | (30,500) | Momentum gradient | 0.5 | (0.5,0.5) | Sigmoid | Softmax | 5 | 0.5 |
Figure 6. DBN reconstruction error variation of five algorithms on the MNIST dataset.
DBN structures obtained by the five algorithms on the MNIST dataset.
| Algorithm | DBN Structure | Reconstruction Error |
|---|---|---|
| REE | [784,200,200,200,200,200,200,10] | 3.9989 |
| TSCL | [784,777,659,452,68,106,69,78,16,28,36,10] | 5.0036 |
| RCE | [784,100,100,100,100,100,100,10] | 3.6587 |
| IERESA | [784,150,138,112,102,92,82,10] | 1.4032 |
| IEREISA | [784,155,150,112,112,100,75,10] | 1.1106 |
Figure 7. RMSE variation of DBN of five algorithms on the MNIST dataset.
Figure 8. Prediction error rate of five algorithms on the MNIST dataset.
Figure 9. Runtime of five algorithms on the MNIST dataset.
Parameter settings of the IEREIGA algorithm on different datasets.
| Dataset | Coding Length | Population | Max Number of Generations | Crossover Probability | Mutation Probability |
|---|---|---|---|---|---|
| Cifar-10 | 12 | 10 | 10 | 0.75 | 0.01 |
| Cifar-100 | 12 | 10 | 10 | 0.75 | 0.01 |
| MNIST | 10 | 10 | 10 | 0.75 | 0.01 |
Experimental results of the IEREISA algorithm on different datasets.
| Dataset | Number of Layers | Number of Neurons | Reconstruction Error | RMSE | Prediction Accuracy |
|---|---|---|---|---|---|
| Cifar-10 | 8 | [3072,2958,756,1033,134,99,95,10] | 1.1106 | 3.3010 | 69.65% |
| Cifar-100 | 10 | [3072,2586,880,112,86,73,99,95,86,100] | 36.2558 | 10.0777 | 61.94% |
| MNIST | 8 | [784,155,150,112,112,100,75,10] | 6.2096 | 0.0299 | 99.19% |
Experimental results of the IERESA algorithm on different datasets.
| Dataset | Number of Layers | Number of Neurons | Reconstruction Error | RMSE | Prediction Accuracy |
|---|---|---|---|---|---|
| Cifar-10 | 8 | [3072,2959,756,1024,146,99,95,10] | 1.4032 | 3.4263 | 67.43% |
| Cifar-100 | 10 | [3072,2516,892,117,86,73,98,95,85,100] | 36.8585 | 11.7817 | 61.70% |
| MNIST | 8 | [784,150,138,112,102,92,82,10] | 6.2397 | 0.0302 | 99.08% |
Experimental results of the IEREGA algorithm on different datasets.
| Dataset | Number of Layers | Number of Neurons | Reconstruction Error | RMSE | Prediction Accuracy |
|---|---|---|---|---|---|
| Cifar-10 | 9 | [3072,2436,1056,102,461,156,114,95,10] | 2.0031 | 3.4003 | 64.34% |
| Cifar-100 | 10 | [3072,2516,892,201,88,98,102,94,85,100] | 38.6475 | 11.8016 | 61.60% |
| MNIST | 8 | [784,155,150,112,107,95,74,10] | 6.3305 | 0.0311 | 99.07% |