Deep Residual Learning for Nonlinear Regression
Dongwei Chen ¹, Fei Hu ²,³, Guokui Nian ³,⁴,⁵, Tiantian Yang ¹
Abstract
Deep learning plays a key role in recent developments of machine learning. This paper develops a deep residual neural network (ResNet) for the regression of nonlinear functions. Convolutional and pooling layers are replaced by fully connected layers in the residual block. To evaluate the new regression model, we train and test neural networks of different depths and widths on simulated data and find the optimal depth and width. Multiple numerical tests of the optimal regression model on several simulated datasets show that the new regression model performs well. Comparisons are also made between the optimal residual regression and other linear and nonlinear approximation techniques, such as lasso regression, decision trees, and support vector machines; the optimal residual regression model has better approximation capacity than the other models. Finally, the residual regression is applied to the prediction of a real-world relative humidity series. Our study indicates that the residual regression model is stable and applicable in practice.
Keywords: deep residual learning; neural network; nonlinear approximation; nonlinear regression
Year: 2020 PMID: 33285968 PMCID: PMC7516619 DOI: 10.3390/e22020193
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The shortcut connections of a deep residual neural network (ResNet) for image processing. (a) An identity block, employed when the input and output have the same dimensions. (b) A convolution block, used when the dimensions differ.
Figure 2. The shortcut connections of ResNet for nonlinear regression. Convolution layers are replaced by dense layers; each dense block and identity block contains three hidden dense layers. (a) An identity block, employed when the input and output have the same dimensions. (b) A dense block, used when the dimensions differ.
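The two block types in Figure 2 translate directly into code. Below is a minimal sketch, assuming TensorFlow/Keras (the record does not name the authors' framework): each block stacks three hidden dense layers, the identity block adds its input back unchanged, and the dense block routes the shortcut through an extra dense layer so the dimensions can change.

```python
# Minimal sketch of the Figure 2 blocks, assuming TensorFlow/Keras.
from tensorflow.keras import layers

def identity_block(x, width):
    """Shortcut adds the input unchanged; needs matching input/output sizes."""
    shortcut = x
    for _ in range(3):                         # three hidden dense layers per block
        x = layers.Dense(width, activation="relu")(x)
    x = layers.Add()([x, shortcut])            # identity shortcut connection
    return layers.Activation("relu")(x)

def dense_block(x, width):
    """Shortcut passes through a dense layer so the dimensions can change."""
    shortcut = layers.Dense(width)(x)          # projection shortcut
    for _ in range(3):
        x = layers.Dense(width, activation="relu")(x)
    x = layers.Add()([x, shortcut])
    return layers.Activation("relu")(x)
```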
Table 1. Fundamental blocks of ResNet regression.
| Type of Block | Input Shape | Output Shape | Activation Function | Input and Output Are of Same Size |
|---|---|---|---|---|
| Input | (1, 6) | (1, None *) | ReLU | False |
| Dense | (1, None) | (1, None) | ReLU | False |
| Identity | (1, None) | (1, None) | ReLU | True |
| Output | (1, None) | (1,1) | Linear | False |
* None means the size of the input in this dimension is not fixed.
Figure 3. Structure of the ResNet regression model. #1 means 1 dense block and #M means M identity blocks; in this paper, 1 dense block and 2 identity blocks are stacked repeatedly between the dashed lines.
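The stacking pattern of Figure 3 can be sketched as follows, reusing the block functions above. Note the arithmetic: with three hidden layers per block, one stack of 1 dense block plus 2 identity blocks contributes 9 layers, so three stacks plus the input layer give the depth of 28 reported as optimal in Table 6; the width of 16 also comes from that table. The optimizer and loss below are assumptions.

```python
# Sketch of the full model in Figure 3 (width and depth are the optimal
# values from the hyperparameter table; optimizer and loss are assumptions).
from tensorflow.keras import Model, layers

def build_resnet_regressor(n_features=6, width=16, n_stacks=3):
    inputs = layers.Input(shape=(n_features,))          # input shape (1, 6), Table 1
    x = layers.Dense(width, activation="relu")(inputs)  # input block
    for _ in range(n_stacks):       # 1 dense block + 2 identity blocks, repeated
        x = dense_block(x, width)
        x = identity_block(x, width)
        x = identity_block(x, width)
    outputs = layers.Dense(1, activation="linear")(x)   # output block
    return Model(inputs, outputs)

model = build_resnet_regressor()
model.compile(optimizer="adam", loss="mse")  # assumed training setup
```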
Figure 4. Datasets for the residual regression model. Panels (a)–(d) show simulated data with nonlinearity order 1 to 4, respectively.
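The generating Equations (2)–(5) are not reproduced in this record, so the following is only a hypothetical stand-in showing how simulated datasets with a prescribed nonlinearity order might be built over the six input features of Table 1; `make_dataset` and its polynomial form are illustrative, not the paper's equations.

```python
# Hypothetical stand-in for the Figure 4 data; NOT the paper's Equations (2)-(5).
import numpy as np

def make_dataset(order, n_samples=10_000, n_features=6, seed=0):
    """Random polynomial target of the given nonlinearity order (illustrative)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, n_features))
    coef = rng.normal(size=(order, n_features))
    # sum_k sum_j coef[k, j] * x_j**(k + 1): highest power equals `order`
    y = sum((X ** (k + 1)) @ coef[k] for k in range(order))
    return X, y

datasets = {order: make_dataset(order) for order in (1, 2, 3, 4)}
```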
Table 2. Residual regression model with different depths.
| Depth of ResNet | Number of Parameters | Stopping Epoch | Training Time | Training Loss (10⁻⁴) | Validation Loss (10⁻⁴) | Testing Loss (10⁻⁴) |
|---|---|---|---|---|---|---|
| 10 | 4581 | 50 | 00:30:04 | 6.6637 | 3.6566 | 3.6687 |
| 19 | 9581 | 37 | 00:23:42 | 6.7196 | 4.4022 | 4.3824 |
| **28** | 14,581 | … | … | … | … | … |
| 37 | 19,581 | 50 | 00:59:48 | 5.0877 | 5.1125 | 5.2183 |
| 46 | 24,581 | 34 | 01:16:57 | 6.3228 | 5.3790 | 5.3401 |
| 55 | 29,581 | 41 | 02:14:36 | 5.4676 | 4.3013 | 4.1437 |
| 82 | 44,581 | 44 | 02:27:26 | 6.6924 | 3.8426 | 3.8526 |
| 100 | 54,581 | 50 | 02:25:18 | 8.2667 | 19.000 | 18.943 |
| 145 | 79,581 | 39 | 03:46:58 | 15.000 | 2497.0 | 2499.5 |
| 244 | 134,581 | 50 | 09:09:05 | 1273.0 | 1277.0 | 1279.0 |
* Bold indicates the optimal ResNet depth, which attains the minimum testing loss.
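The sweep behind Tables 2 and 3 can be reproduced with a short training loop. The 50-epoch cap and the use of early stopping are inferred from the Stopping Epoch column; the patience, optimizer, and batch size below are assumptions.

```python
# Sketch of one run in the depth/width sweep. The 50-epoch cap and early
# stopping are inferred from the "Stopping Epoch" column; patience, optimizer,
# and batch size are assumptions.
from tensorflow.keras.callbacks import EarlyStopping

def train_and_score(model, X_tr, y_tr, X_val, y_val, X_te, y_te):
    model.compile(optimizer="adam", loss="mse")
    stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
    history = model.fit(X_tr, y_tr, validation_data=(X_val, y_val),
                        epochs=50, batch_size=32, callbacks=[stop], verbose=0)
    stopping_epoch = len(history.history["loss"])
    test_loss = model.evaluate(X_te, y_te, verbose=0)
    return stopping_epoch, test_loss
```

Depth is varied by changing `n_stacks` in `build_resnet_regressor`, width by changing `width`; each configuration is scored on the same train/validation/test splits.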
Table 3. Residual regression model with different widths.
| Width of ResNet | Number of Parameters | Stopping Epoch | Training Time | Training Loss (10⁻⁴) | Validation Loss (10⁻⁴) | Testing Loss (10⁻⁴) |
|---|---|---|---|---|---|---|
| 1 | 198 | 50 | 00:41:38 | 48.000 | 547.00 | 548.44 |
| 4 | 1125 | 50 | 00:44:36 | 15.000 | 13.000 | 13.255 |
| 8 | 3145 | 50 | 00:44:13 | 5.6651 | 3.0078 | 3.0666 |
| 12 | 6061 | 50 | 00:20:54 | 5.2821 | 6.7455 | 6.6151 |
| **16** | 9873 | … | … | … | … | … |
| 20 | 14,581 | 44 | 00:55:34 | 5.7221 | 3.5205 | 3.4855 |
| 30 | 30,271 | 50 | 00:54:05 | 4.8670 | 3.5846 | 3.5973 |
| 50 | 78,451 | 42 | 00:18:28 | 5.5210 | 3.9034 | 3.9000 |
| 70 | 149,031 | 44 | 00:21:20 | 5.1793 | 4.2886 | 4.2837 |
| 90 | 242,011 | 50 | 00:22:11 | 4.5157 | 2.8237 | 2.8627 |
| 150 | 655,351 | 50 | 01:02:57 | 5.4313 | 3.4565 | 3.4189 |
| 300 | 2,570,701 | 47 | 00:45:00 | 5.6756 | 8.2258 | 8.1343 |
| 500 | 7,084,501 | 50 | 01:16:35 | 6.2678 | 4.0093 | 3.9767 |
| 700 | 13,838,301 | 50 | 01:56:21 | 6.8682 | 3.3580 | 3.4051 |
* Bold indicates the optimal ResNet width, which attains the minimum testing loss.
Table 4. Optimal regression model on nonlinear datasets.
| Nonlinearity of Dataset | Number of Parameters | Stopping Epoch | Training Time | Training Loss (10⁻⁴) | Validation Loss (10⁻⁴) | Testing Loss (10⁻⁴) |
|---|---|---|---|---|---|---|
| 1 | 9873 | 45 | 00:46:15 | 1.0619 | 0.2552 | 0.2550 |
| 2 | 9873 | 35 | 00:36:09 | 8.2566 | 4.2014 | 4.2117 |
| 3 | 9873 | 50 | 00:22:22 | 4.7534 | 2.1222 | 2.1360 |
| 4 | 9873 | 50 | 00:57:26 | 3.6542 | 2.0439 | 2.0481 |
Figure 5. Results of the optimal regression model on simulated nonlinear data. A red cross (×) marks predicted data and a green circle (○) marks simulated data generated by Equations (2)–(5).
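A plot like Figure 5 can be redrawn in a few lines of matplotlib; the marker styles follow the caption, while everything else (index-based x-axis, single panel) is an assumption.

```python
# Redrawing the Figure 5 comparison; marker styles follow the caption.
import matplotlib.pyplot as plt

def plot_fit(model, X_test, y_test):
    y_pred = model.predict(X_test, verbose=0).ravel()
    idx = range(len(y_test))
    plt.scatter(idx, y_test, marker="o", c="green", label="simulated data")
    plt.scatter(idx, y_pred, marker="x", c="red", label="predicted data")
    plt.xlabel("sample index")            # x-axis choice is an assumption
    plt.legend()
    plt.show()
```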
Table 5. Comparisons of regression techniques.
| Regression Techniques Used | Stopping Epoch | Training Time | Training Loss (10⁻⁴) | Validation Loss (10⁻⁴) | Testing Loss (10⁻⁴) |
|---|---|---|---|---|---|
| Linear regression | NA | 00:00:20 | 371.61 | NA | 371.68 |
| Ridge regression | NA | 00:00:33 | 371.64 | NA | 371.59 |
| Lasso regression | NA | 00:00:39 | 371.51 | NA | 371.97 |
| Elastic regression | NA | 00:34:48 | 371.51 | NA | 372.00 |
| **Residual regression** | … | … | … | … | … |
| ANN without shortcuts | 34 | 00:27:07 | 7.2640 | 6.6379 | 6.6537 |
| Decision tree regression | NA | 00:17:06 | 9.4083 | NA | 9.7777 |
| Support vector regression | NA | 43:51:36 | 126.75 | NA | 126.76 |
* Bold indicates that the residual regression model attains the minimum testing loss.
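The baselines in this table map naturally onto scikit-learn estimators (an assumption; the record does not name the implementation). The settings below use the optimal hyperparameters from the next table, reading "penalty parameter" as sklearn's `alpha` (or `C` for SVR).

```python
# Sketch of the baseline comparison, assuming scikit-learn; hyperparameters
# are the optimal values from the next table. X_train/y_train/X_test/y_test
# come from the data and split sketches above.
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

baselines = {
    "Linear regression": LinearRegression(),
    "Ridge regression": Ridge(alpha=1e2),
    "Lasso regression": Lasso(alpha=1e-5),
    "Elastic regression": ElasticNet(alpha=1e-4, l1_ratio=1.0),
    "Decision tree regression": DecisionTreeRegressor(max_depth=10),
    "Support vector regression": SVR(C=1e2),
}
for name, reg in baselines.items():
    reg.fit(X_train, y_train)
    print(f"{name}: test MSE = {mean_squared_error(y_test, reg.predict(X_test)):.6f}")
```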
Table 6. Optimal hyperparameters of regression techniques.
| Regression Techniques Used | Name of Hyperparameters | Range of Hyperparameters | Optimal Hyperparameters |
|---|---|---|---|
| Linear regression | NA | NA | NA |
| Ridge regression | Penalty parameter of ℓ2 regularization | 10⁻¹⁰, 10⁻⁹, …, 10¹⁰ | 10² |
| Lasso regression | Penalty parameter of ℓ1 regularization | 10⁻¹⁰, 10⁻⁹, …, 10¹⁰ | 10⁻⁵ |
| Elastic regression | Penalty parameter; ℓ1 ratio | 10⁻¹⁰, 10⁻⁹, …, 10¹⁰; 0.0, 0.1, …, 0.9, 1.0 | 10⁻⁴; 1.0 |
| Residual regression | Width; depth | NA | 16; 28 |
| ANN without shortcuts | Width; depth | NA | 16; 28 |
| Decision tree regression | Maximum depth | 1, 2, …, 9, 10 | 10 |
| Support vector regression | Penalty parameter | 10², 10³, …, 10⁷ | 10² |
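The ranges above suggest a straightforward grid search. Here is a sketch for the ridge case, assuming scikit-learn's `GridSearchCV`; the 5-fold cross-validation protocol is an assumption.

```python
# Sketch of the grid search behind this table (ridge case), assuming
# scikit-learn's GridSearchCV; the 5-fold CV protocol is an assumption.
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [10.0 ** k for k in range(-10, 11)]},  # 10^-10 ... 10^10
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)   # the table reports an optimum near alpha = 10^2
```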
Figure 6. The prediction of regression models for relative humidity. The red cross (×) marks predicted data and the green circle (○) marks ERA5 observations.