Amirhessam Tahmassebi, Amir H. Gandomi, Simon Fong, Anke Meyer-Baese, Simon Y. Foo.
Abstract
In this study, a multi-stage optimization procedure is proposed for developing deep neural network models, resulting in a powerful deep learning pipeline called intelligent deep learning (iDeepLe). The proposed pipeline is then evaluated on a challenging real-world problem: modeling the spectral acceleration experienced by a particle during earthquakes. The approach has three main stages, which optimize the deep model's topology, its hyper-parameters, and its performance, respectively. Across these stages, the pipeline optimizes the deep model via adaptive learning rate optimization algorithms for both accuracy and complexity, while simultaneously solving for the unknown parameters of the regression model. Among the seven adaptive learning rate optimization algorithms tested, Nadam showed the best performance in this study. The proposed approach is shown to be a suitable tool for generating solid models of this complex real-world system, and the results also show that the parallel pipeline of iDeepLe has the capacity to handle big-data problems.
Year: 2018 PMID: 30231077 PMCID: PMC6145533 DOI: 10.1371/journal.pone.0203829
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. iDeepLe flowchart.
Parameter settings for grid-search stage 1 (a minimal search sketch follows the table).
| Hyper-parameter | Settings |
|---|---|
| Number of layers | 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20 |
| Number of neurons in each layer | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50 |
| Batch sizes | 50 |
| Number of epochs | 1000 |
| Activation functions | ReLU, Tanh, Softplus, Softsign, Linear, Softmax, Sigmoid |
| Optimizers | Adam |
| Learning rates | 0.001 |
| Losses | MSE |
| Score metrics | R |
| Number of HPC nodes | 20 |
| Number of folds in cross-validation | 10 |
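Read as pseudocode, stage 1 is an exhaustive sweep over the network topology. The sketch below is a minimal, hedged reconstruction in Python/tf.keras; the helpers `build_model` and `cv_pearson_r` and the dataset `X, y` are illustrative assumptions, not the authors' released code.

```python
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model(n_layers, n_neurons, activation, n_features):
    """Stack n_layers Dense layers of n_neurons units each, with a linear output."""
    model = Sequential()
    model.add(Dense(n_neurons, activation=activation, input_shape=(n_features,)))
    for _ in range(n_layers - 2):
        model.add(Dense(n_neurons, activation=activation))
    model.add(Dense(1, activation='linear'))  # regression output
    # Adam with its default learning rate (0.001) and MSE loss, per the table
    model.compile(optimizer='adam', loss='mse')
    return model

def cv_pearson_r(n_layers, n_neurons, activation, X, y):
    """Mean Pearson R over 10-fold cross-validation (the table's score metric)."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(X):
        model = build_model(n_layers, n_neurons, activation, X.shape[1])
        model.fit(X[train_idx], y[train_idx], batch_size=50, epochs=1000, verbose=0)
        pred = model.predict(X[test_idx], verbose=0).ravel()
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))

# Grid from the stage-1 table; each (layers, neurons, activation) triple is scored.
layer_grid = [3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20]
neuron_grid = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]
activation_grid = ['relu', 'tanh', 'softplus', 'softsign', 'linear', 'softmax', 'sigmoid']
```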
Fig 2. The architecture of the optimized deep neural network model.
Properties of the activation functions used in different layers of iDeepLe (a NumPy sketch follows the table).
| Function | Equation | Derivative | Range |
|---|---|---|---|
| ReLU | $f(x) = \max(0, x)$ | $f'(x) = 0$ for $x < 0$; $1$ for $x > 0$ | [0, ∞) |
| Softmax | $f_i(x) = e^{x_i} / \sum_j e^{x_j}$ | $\partial f_i / \partial x_j = f_i(\delta_{ij} - f_j)$ | (0, 1) |
| Sigmoid | $f(x) = 1 / (1 + e^{-x})$ | $f'(x) = f(x)\,(1 - f(x))$ | (0, 1) |
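For concreteness, the three retained activations can be written in a few lines of NumPy; this is an illustrative sketch, not code taken from the paper.

```python
import numpy as np

def relu(x):
    """ReLU: f(x) = max(0, x); its derivative is 0 for x < 0 and 1 for x > 0."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Sigmoid: f(x) = 1 / (1 + e^{-x}); output range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def softmax(x):
    """Softmax over the last axis, shifted by max(x) for numerical stability."""
    z = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return z / np.sum(z, axis=-1, keepdims=True)
```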
Parameter settings for grid-search stage 2 (the optimizer sweep is sketched after the table).
| Hyper-parameter | Settings |
|---|---|
| Number of layers | 8 |
| Number of neurons in each layer | 8, 30, 25, 20, 12, 8, 4, 1 |
| Batch sizes | 50 |
| Number of epochs | 1000 |
| Activation functions | ReLU, Softmax, Sigmoid |
| Optimizers | SGD, Adagrad, Adadelta, RMSprop, Adam, Adamax, Nadam |
| Learning rates | 1.0, 0.1, 0.005, 0.002, 0.001 |
| Losses | MSE |
| Score metrics | R |
| Number of HPC nodes | 10 |
| Number of folds in cross-validation | 10 |
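Stage 2 freezes the stage-1 topology (8 layers with 8-30-25-20-12-8-4-1 neurons) and sweeps the seven optimizers against the learning-rate list. A minimal sketch of that loop follows; `cv_score_stage2` is a hypothetical helper in the style of `cv_pearson_r` from the stage-1 sketch.

```python
from itertools import product
from tensorflow.keras import optimizers

def make_optimizer(name, lr):
    """Instantiate a tf.keras optimizer by name with a given learning rate."""
    registry = {
        'sgd': optimizers.SGD, 'adagrad': optimizers.Adagrad,
        'adadelta': optimizers.Adadelta, 'rmsprop': optimizers.RMSprop,
        'adam': optimizers.Adam, 'adamax': optimizers.Adamax,
        'nadam': optimizers.Nadam,
    }
    return registry[name](learning_rate=lr)

results = {}
for name, lr in product(['sgd', 'adagrad', 'adadelta', 'rmsprop',
                         'adam', 'adamax', 'nadam'],
                        [1.0, 0.1, 0.005, 0.002, 0.001]):
    # Score the fixed topology with this optimizer/learning-rate pair via
    # 10-fold cross-validated R (cv_score_stage2 is a hypothetical helper).
    results[(name, lr)] = cv_score_stage2(make_optimizer(name, lr))

print(max(results, key=results.get))  # best (optimizer, learning rate) pair
```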
Optimized hyper-parameters of the optimization algorithms after grid-search stage 2 (tf.keras instantiations are sketched after the table).
| Optimizer | $lr$ | $\epsilon$ | $\rho$ | decay | $\beta_1$ | $\beta_2$ |
|---|---|---|---|---|---|---|
| SGD | 0.01 | None | None | 0.0 | None | None |
| Adagrad | 0.01 | 1e-08 | None | 0.0 | None | None |
| Adadelta | 1.0 | 1e-08 | 0.95 | 0.0 | None | None |
| RMSprop | 0.001 | 1e-08 | 0.9 | 0.0 | None | None |
| Adam | 0.001 | 1e-08 | None | 0.0 | 0.9 | 0.999 |
| Adamax | 0.002 | 1e-08 | None | 0.0 | 0.9 | 0.999 |
| Nadam | 0.002 | 1e-08 | None | 0.004 | 0.9 | 0.999 |
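In tf.keras, these tabulated values map onto the optimizer constructors roughly as follows. This is a sketch; note that the 0.004 decay listed for Nadam corresponds to the legacy Keras argument `schedule_decay`, which has no direct counterpart in current tf.keras and is therefore omitted.

```python
from tensorflow.keras import optimizers

opts = {
    'SGD':      optimizers.SGD(learning_rate=0.01),
    'Adagrad':  optimizers.Adagrad(learning_rate=0.01, epsilon=1e-08),
    'Adadelta': optimizers.Adadelta(learning_rate=1.0, rho=0.95, epsilon=1e-08),
    'RMSprop':  optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08),
    'Adam':     optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                                beta_2=0.999, epsilon=1e-08),
    'Adamax':   optimizers.Adamax(learning_rate=0.002, beta_1=0.9,
                                  beta_2=0.999, epsilon=1e-08),
    # Table lists decay=0.004 for Nadam (legacy Keras `schedule_decay`);
    # current tf.keras Nadam exposes no such argument, so it is omitted here.
    'Nadam':    optimizers.Nadam(learning_rate=0.002, beta_1=0.9,
                                 beta_2=0.999, epsilon=1e-08),
}
```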
Fig 3. Density plots of the predictor input variables.
Fig 4. Scatter-matrix presentation of the predictor input variables with their probability histograms.
Parameter settings for grid-search stage 3 (the batch-size/epoch sweep is sketched after the table).
| Hyper-parameter | Settings |
|---|---|
| Number of layers | 8 |
| Number of neurons in each layer | 8, 30, 25, 20, 12, 8, 4, 1 |
| Batch sizes | 50, 100, 150, 200 |
| Number of epochs | 50, 100, 200, 300, 500, 800, 1000 |
| Activation functions | ReLU, Softmax, Sigmoid |
| Optimizers | SGD, Adagrad, Adadelta, RMSprop, Adam, Adamax, Nadam |
| Learning rates | 0.01, 0.01, 1.0, 0.001, 0.001, 0.002, 0.002 (one per optimizer, in the order listed above) |
| Losses | MAE, MSE |
| Score metrics | R, EV (explained variance) |
| Number of HPC nodes | 10 |
| Number of folds in cross-validation | 10 |
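Stage 3 keeps the optimized topology and optimizers and sweeps batch size against epoch count. A minimal sketch scored by explained variance follows; `build_deep_model` is a hypothetical helper that builds the fixed 8-30-25-20-12-8-4-1 network, and `X, y` is the assumed dataset.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import explained_variance_score

batch_grid = [50, 100, 150, 200]
epoch_grid = [50, 100, 200, 300, 500, 800, 1000]

for batch_size in batch_grid:
    for epochs in epoch_grid:
        ev = []
        for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(X):
            model = build_deep_model()  # hypothetical: fixed stage-2 topology
            model.fit(X[train_idx], y[train_idx],
                      batch_size=batch_size, epochs=epochs, verbose=0)
            pred = model.predict(X[test_idx], verbose=0).ravel()
            ev.append(explained_variance_score(y[test_idx], pred))
        print(batch_size, epochs, float(np.mean(ev)))  # the table's EV metric
```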
Fig 5. 10-fold cross-validation R² over the number of epochs for different batch sizes, employing different optimization algorithms.
Fig 6. Statistical metrics for different batch sizes employing different optimization algorithms.
Fig 7. Histograms of the ratio of predicted to measured ln(SA) for different batch sizes. The mean and coefficient of variation of this ratio are reported for each batch size, and the CB 2008 model is included for comparison.
External validation results of the deep models with different batch sizes and the CB 2008 model (the criteria are defined after the table).
| Condition | Batch 50 | Batch 100 | Batch 150 | Batch 200 | CB 2008 |
|---|---|---|---|---|---|
| $R > 0.8$ | 0.96264 | 0.96150 | 0.96120 | 0.96031 | 0.91938 |
| $0.85 < k < 1.15$ | 0.99710 | 0.98845 | 0.99373 | 0.98863 | 0.96735 |
| $0.85 < k' < 1.15$ | 0.99332 | 1.00164 | 0.99632 | 1.00123 | 1.01279 |
| $R_m > 0.5$ | 0.67588 | 0.67220 | 0.66959 | 0.66673 | 0.52372 |
| $R_0^2 \approx 1$ | 0.99993 | 0.99896 | 0.99967 | 0.99893 | 0.98997 |
| $R_0'^2 \approx 1$ | 0.99966 | 0.99896 | 0.99967 | 0.99893 | 0.98997 |
| $\|m\| < 0.1$ | 0.07904 | 0.08056 | 0.08201 | 0.08321 | 0.17120 |
| $\|n\| < 0.1$ | 0.07875 | 0.08166 | 0.08225 | 0.08436 | 0.18158 |
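The condition labels in this table follow the standard external validation criteria of Golbraikh and Tropsha. With $t_i$ the measured and $h_i$ the predicted ln(SA), they are commonly written as below; this is a reconstruction from the tabulated conditions, not copied verbatim from the paper.

```latex
k = \frac{\sum_i h_i t_i}{\sum_i h_i^2}, \qquad
k' = \frac{\sum_i h_i t_i}{\sum_i t_i^2}, \qquad
R_m = R^2 \left( 1 - \sqrt{\left| R^2 - R_0^2 \right|} \right), \qquad
m = \frac{R^2 - R_0^2}{R^2}, \qquad
n = \frac{R^2 - R_0'^2}{R^2}
```

Here $R_0^2$ and $R_0'^2$ are the squared correlation coefficients of the regression lines through the origin (predicted vs. measured and measured vs. predicted). A model is considered externally valid when $R > 0.8$, $0.85 < k < 1.15$, $0.85 < k' < 1.15$, $R_m > 0.5$, $|m| < 0.1$, and $|n| < 0.1$; per the table, all four deep models satisfy every criterion, while CB 2008 fails the $|m|$ and $|n|$ checks.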
Parameter settings for the Adam optimizer in popular deep learning libraries (the update rule these constants enter is given after the table).
| Library | $lr$ | $\epsilon$ | $\beta_1$ | $\beta_2$ |
|---|---|---|---|---|
| Keras | 0.001 | 1e-08 | 0.9 | 0.999 |
| TensorFlow | 0.001 | 1e-08 | 0.9 | 0.999 |
| Caffe | 0.001 | 1e-08 | 0.9 | 0.999 |
| Lasagne | 0.001 | 1e-08 | 0.9 | 0.999 |
| Torch | 0.001 | 1e-08 | 0.9 | 0.999 |
| MxNet | 0.001 | 1e-08 | 0.9 | 0.999 |
| Blocks | 0.002 | 1e-08 | 0.9 | 0.999 |
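The four tabulated constants enter the textbook Adam update of Kingma and Ba; for gradient $g_t$ at step $t$:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \qquad
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \frac{lr \cdot \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

These are the near-universal defaults ($lr = 0.001$, $\epsilon = 10^{-8}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$), which explains the agreement across libraries; per the table, Blocks deviates only in its learning rate.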
Fig 8. Radar plots of the regression metrics for the most common regression models compared to the iDeepLe model.