Jingyi Liu1, Shuni Song1, Jiayi Wang1, Maimutimin Balaiti2, Nina Song2, Sen Li2.
Abstract
As industrial quality requirements for cold rolled strips have risen, flatness has become one of the most important indicators of strip quality. In this paper, production data from a 1250 mm tandem cold mill in a steel plant are modeled with an improved deep neural network (the improved DNN) to increase the accuracy of strip shape prediction. First, the type of activation function is analyzed, and the monotonicity of the activation function is found to be independent of the convexity of the loss function in the deep network: regardless of whether the activation function is monotonic, the loss function is not strictly convex. Second, the non-convex optimization of the loss function, extended from the deep linear network to the deep nonlinear network, is discussed, and the critical point of the deep nonlinear network is identified as the global minimum point. Finally, an improved Swish activation function based on batch normalization is proposed, and its performance is evaluated on the MNIST dataset. The experimental results show that the loss of the improved Swish function is lower than that of other activation functions. The prediction accuracy of a DNN with the improved Swish function is 0.38% higher than that of a DNN with the regular Swish function. For the DNN with the improved Swish function, the mean square error of the flatness prediction for cold rolled strip is reduced by about 65% relative to the regular DNN (from 3.731 to 1.305 on the test set). The accuracy of the improved DNN meets and exceeds industrial requirements. Shape prediction with the improved DNN can assist and guide the industrial production process, reducing the scrap rate and industrial cost.
Keywords: Swish activation function; batch normalization; cold rolled strip; deep neural network; non-convex optimization
Year: 2022 PMID: 35062616 PMCID: PMC8780483 DOI: 10.3390/s22020656
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
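The abstract describes an improved Swish activation function based on batch normalization but this record does not give its formula. The following is only a minimal sketch of one plausible reading, assuming the pre-activations of a batch are normalized and then passed through Swish. The function names `batch_normalize` and `bn_swish`, and the parameters `gamma`, `shift`, `beta`, and `eps`, are hypothetical and are not taken from the paper.

```python
import math


def batch_normalize(values, eps=1e-5):
    """Normalize a batch of scalars to zero mean and (almost) unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x), written as x / (1 + exp(-beta * x))."""
    return x / (1.0 + math.exp(-beta * x))


def bn_swish(values, gamma=1.0, shift=0.0, beta=1.0):
    """Hypothetical combination: batch-normalize, rescale, then apply Swish.

    gamma and shift stand in for the learnable scale and offset of batch
    normalization; this is an assumption, not the paper's definition.
    """
    normalized = batch_normalize(values)
    return [swish(gamma * v + shift, beta) for v in normalized]


if __name__ == "__main__":
    print(bn_swish([1.0, 2.0, 3.0, 4.0]))
```

Normalizing first keeps the inputs to Swish in a bounded range around zero, which is the usual motivation for pairing batch normalization with a smooth activation; whether the paper applies the two in this order is not stated here.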
The common activation functions.
| Activation Function | Formula | Description |
|---|---|---|
| Sigmoid function | $f(x) = \dfrac{1}{1 + e^{-x}}$ | The gradient of the Sigmoid function easily falls into the saturation zone during backpropagation. |
| Tanh function | $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | It suffers from soft saturation and vanishing gradients. |
| ReLU function | $f(x) = \max(0, x)$ | As training intensifies, some weights fall into the hard saturation zone and are no longer updated. |
| ELU function | $f(x) = x$ for $x > 0$; $f(x) = \alpha(e^{x} - 1)$ for $x \le 0$ | ELU alleviates the vanishing gradient problem. |
| Swish function | $f(x) = x \cdot \mathrm{sigmoid}(\beta x)$ | Swish performs better as the amount of data increases. |
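The five activation functions in the table can be written out in a few lines. This is a sketch using their standard textbook definitions, which may differ in notation from the paper; the default values `alpha=1.0` and `beta=1.0` are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, split by sign so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent; saturates softly at -1 and +1."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit; zero output (and zero gradient) for x < 0."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Exponential linear unit; smooth negative branch bounded by -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); non-monotonic for negative x."""
    return x * sigmoid(beta * x)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, sigmoid(x), tanh(x), relu(x), elu(x), swish(x))
```

Evaluating the functions at a few negative inputs makes the saturation behaviour described in the table visible: ReLU is exactly zero there, while ELU and Swish still return small non-zero values.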
Figure 1. Comparison of loss. The horizontal axis represents the epoch (0 to 17.5), and the vertical axis represents the loss value (0 to 0.8).
Figure 2. Comparison of accuracy. The horizontal axis represents the epoch (0 to 17.5), and the vertical axis represents the accuracy value (0.7 to 1).
Figure 3. Comparison of loss. The horizontal axis represents the epoch (0 to 17.5), and the vertical axis represents the loss value (0 to 0.35).
Figure 4. Comparison of accuracy. The horizontal axis represents the epoch (0 to 17.5), and the vertical axis represents the accuracy value (0.8 to 1).
Comparison of the best loss and accuracy value for each activation function.
| Activation Function | Training Set Loss | Test Set Loss | Training Set Accuracy | Test Set Accuracy |
|---|---|---|---|---|
| Improved Swish activation function | 0.0288 | 0.1088 | 0.9928 | 0.9852 |
| Swish function | 0.0279 | 0.1159 | 0.9933 | 0.9815 |
| ReLU function | 0.029 | 0.1307 | 0.993 | 0.982 |
| ELU function | 0.0308 | 0.1133 | 0.9912 | 0.9802 |
| Sigmoid function | 0.1034 | 0.1066 | 0.9658 | 0.9731 |
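As a quick check of how the test-set accuracies in the table compare, the short script below computes the gain of the improved Swish function over the regular Swish function both in percentage points and as a relative change. The dictionary keys are shortened labels introduced here for illustration; the numbers are copied from the table.

```python
# Test-set accuracy per activation function, copied from the table above.
test_accuracy = {
    "Improved Swish": 0.9852,
    "Swish": 0.9815,
    "ReLU": 0.982,
    "ELU": 0.9802,
    "Sigmoid": 0.9731,
}

baseline = test_accuracy["Swish"]
improved = test_accuracy["Improved Swish"]

# Absolute gain (difference of the two accuracies) and gain relative to Swish.
absolute_gain = improved - baseline
relative_gain = absolute_gain / baseline

best = max(test_accuracy, key=test_accuracy.get)

print(f"best test accuracy: {best}")
print(f"absolute gain: {absolute_gain * 100:.2f} percentage points")
print(f"relative gain: {relative_gain * 100:.2f}%")
```

The absolute difference is 0.37 percentage points and the relative gain is about 0.38%, so the 0.38% figure quoted in the abstract is consistent with a relative comparison.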
Figure 5. Reduction curve of the loss function.
Comparison of real and predicted values.
| Sensor Area | Predicted Value | True Value | Sensor Area | Predicted Value | True Value | Sensor Area | Predicted Value | True Value |
|---|---|---|---|---|---|---|---|---|
| f939 | −3.299 | −6.014 | f9310 | 20.291 | 20.623 | f9311 | 23.519 | 24.176 |
| f9312 | 23.388 | 23.659 | f9313 | 23.052 | 23.336 | f9314 | 22.456 | 23.036 |
| f9315 | 21.554 | 22.622 | f9316 | 19.862 | 21.321 | f9317 | 17.045 | 18.237 |
| f9318 | 13.198 | 14.230 | f9319 | 8.647 | 9.440 | f9320 | 3.902 | 4.500 |
| f9321 | −1.152 | 0.202 | f9322 | −8.183 | −7.844 | f9323 | −16.903 | −19.211 |
| f9324 | −23.236 | −25.932 | f9325 | −25.180 | −26.736 | f9326 | −25.405 | −25.921 |
| f9327 | −26.771 | −24.859 | f9328 | −27.216 | −25.265 | f9329 | −23.888 | −24.721 |
| f9330 | −16.256 | −17.145 | f9331 | −6.582 | −6.235 | f940 | 1.962 | 2.316 |
| f941 | 8.507 | 8.611 | f942 | 13.305 | 13.130 | f943 | 16.826 | 16.793 |
| f944 | 19.659 | 19.976 | f945 | 21.367 | 21.737 | f946 | 22.339 | 22.387 |
| f947 | 22.915 | 22.913 | f948 | 23.323 | 23.340 | f949 | 23.283 | 23.501 |
| f9410 | 23.124 | 23.366 | f9411 | 17.751 | 17.802 | f9412 | −7.461 | −7.534 |
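To summarize the table, the sketch below computes the per-sensor error, the mean absolute error, and the mean square error over the 36 tabulated pairs. The pairs are listed in table order (left to right, top to bottom). These statistics describe only this one tabulated sample and are not expected to match the MSE values the paper reports for the full training and test sets.

```python
# (predicted, true) pairs in table order: left to right, top to bottom.
pairs = [
    (-3.299, -6.014), (20.291, 20.623), (23.519, 24.176),
    (23.388, 23.659), (23.052, 23.336), (22.456, 23.036),
    (21.554, 22.622), (19.862, 21.321), (17.045, 18.237),
    (13.198, 14.230), (8.647, 9.440), (3.902, 4.500),
    (-1.152, 0.202), (-8.183, -7.844), (-16.903, -19.211),
    (-23.236, -25.932), (-25.180, -26.736), (-25.405, -25.921),
    (-26.771, -24.859), (-27.216, -25.265), (-23.888, -24.721),
    (-16.256, -17.145), (-6.582, -6.235), (1.962, 2.316),
    (8.507, 8.611), (13.305, 13.130), (16.826, 16.793),
    (19.659, 19.976), (21.367, 21.737), (22.339, 22.387),
    (22.915, 22.913), (23.323, 23.340), (23.283, 23.501),
    (23.124, 23.366), (17.751, 17.802), (-7.461, -7.534),
]

# Signed error per sensor area: predicted minus true.
errors = [predicted - true for predicted, true in pairs]

mse = sum(e * e for e in errors) / len(errors)
mae = sum(abs(e) for e in errors) / len(errors)

# Index (in table order) of the sensor area with the largest absolute error.
worst_index = max(range(len(errors)), key=lambda i: abs(errors[i]))

print(f"MSE over {len(errors)} sensor areas: {mse:.3f}")
print(f"MAE over {len(errors)} sensor areas: {mae:.3f}")
print(f"largest absolute error: {abs(errors[worst_index]):.3f} at index {worst_index}")
```

The largest deviation is at the first entry (sensor area f939), and the errors shrink markedly in the last rows (f941 to f9412), where predicted and true values agree to within a few tenths.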
Figure 6. Comparison of real and predicted values. (a) BP; (b) DNN; (c) Improved DNN. The horizontal axis represents sample points (0 to 175,000), and the vertical axis represents the predicted plastic shape value (18 to 24).
Figure 7. Three-dimensional fitting effect of the improved DNN model.
Comparison of the model.
| Model | MSE of Training Set | MSE of Test Set |
|---|---|---|
| BP | 7.851 | 8.329 |
| DNN | 3.229 | 3.731 |
| Improved DNN | 1.281 | 1.305 |
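The relative size of these errors is easier to read as ratios. The sketch below divides the MSE of the improved DNN by that of each baseline, using only the numbers from the table; the variable names are introduced here for illustration.

```python
# (training MSE, test MSE) per model, copied from the table above.
mse = {
    "BP": (7.851, 8.329),
    "DNN": (3.229, 3.731),
    "Improved DNN": (1.281, 1.305),
}

improved_train, improved_test = mse["Improved DNN"]

# Ratio of the improved DNN's MSE to each baseline's MSE (lower is better).
ratios = {}
for name in ("BP", "DNN"):
    base_train, base_test = mse[name]
    ratios[name] = (improved_train / base_train, improved_test / base_test)

for name, (train_ratio, test_ratio) in ratios.items():
    print(
        f"Improved DNN vs {name}: "
        f"training MSE ratio {train_ratio:.3f}, test MSE ratio {test_ratio:.3f}"
    )
```

On the test set the improved DNN's MSE is roughly 0.35 of the regular DNN's, that is, a reduction of about 65%, and roughly 0.16 of the BP network's.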