Antonino Laudani, Gabriele Maria Lozito, Francesco Riganti Fulginei, Alessandro Salvini.
Abstract
The problem of choosing a suitable activation function for the hidden layer of a feed forward neural network is reviewed comprehensively. Since the nonlinear component of a neural network is the main contributor to the network's mapping capabilities, the different choices that may lead to enhanced performance, in terms of training, generalization, or computational cost, are analyzed, both in general-purpose and in embedded computing environments. Finally, a strategy to convert a network configuration between different activation functions without altering the network mapping capabilities is presented.
Year: 2015 PMID: 26417368 PMCID: PMC4568332 DOI: 10.1155/2015/818243
Source DB: PubMed Journal: Comput Intell Neurosci
Analytic AFs.
| Ref. | Name | Expression | Notes |
|---|---|---|---|
| [ ] | Step-like | y = 0 for x < 0; y = 1 for x ≥ 0 | |
| | Linear | y = x | |
| | Saturated linear | y = −1 for x ≤ −1; y = x for −1 < x < 1; y = 1 for x ≥ 1 | |
| | Sigmoid | y = 1/(1 + e^(−x)) | |
| | Hyp. tangent | y = tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)) | |
| [ ] | Arctangent | y = arctan(x) | |
| [ ] | Quadratic sigmoid function | | |
| [ ] | Logarithmic-Exponential | | |
| [ ] | Triangular approximation | | |
| [ ] | Hermite polyn. | | |
| [ ] | Gaussian | y = e^(−x²) | |
| [ ] | PolyExp | | |
| [ ] | Wave | | |
| [ ] | Neural | | |
| [ ] | Sinusoidal | y = sin(x) | |
| [ ] | CosGauss | | |
| [ ] | Sinc | y = sin(x)/x for x ≠ 0; y = 1 for x = 0 | |
| [ ] | SinCos | | |
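The closed-form AFs in the table that have standard textbook definitions can be sketched directly; these are the common forms, and some of the cited works use scaled or shifted variants.

```python
import math

# Reference implementations of the standard closed-form AFs from the
# table above (textbook forms assumed; papers may use scaled variants).

def step(x):
    """Step-like: 0 below the threshold, 1 at or above it."""
    return 1.0 if x >= 0 else 0.0

def saturated_linear(x):
    """Identity clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def arctangent(x):
    return math.atan(x)

def gaussian(x):
    return math.exp(-x * x)

def sinc(x):
    """sin(x)/x, continuously extended to 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

for f in (step, saturated_linear, sigmoid, arctangent, gaussian, sinc):
    print(f"{f.__name__}(0) = {f(0.0)}, {f.__name__}(2) = {f(2.0):.4f}")
```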
From sigmoid to tanh.
The mapping follows from the identity sigmoid(z) = (1 + tanh(z/2))/2: each sigmoid hidden unit becomes a tanh unit with halved input weights and bias, a halved output weight, and a compensating term on the output bias.

| Parameter | Conversion |
|---|---|
| Hidden bias | b_h ← b_h/2 |
| Hidden weights | w_h ← w_h/2 |
| Output bias | b_o ← b_o + (1/2) Σ_i w_o,i |
| Output weights | w_o ← w_o/2 |
From tanh to sigmoid.
Conversely, tanh(z) = 2·sigmoid(2z) − 1 maps a tanh network to an equivalent sigmoid network:

| Parameter | Conversion |
|---|---|
| Hidden bias | b_h ← 2·b_h |
| Hidden weights | w_h ← 2·w_h |
| Output bias | b_o ← b_o − Σ_i w_o,i |
| Output weights | w_o ← 2·w_o |
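These mappings are easy to check numerically. A minimal sketch, assuming a single-input network with one sigmoid hidden layer and a linear output (all parameter values below are arbitrary illustrations):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_h, b_h, w_o, b_o, act):
    # y = b_o + sum_i w_o[i] * act(w_h[i] * x + b_h[i])
    return b_o + sum(wo * act(wh * x + bh)
                     for wh, bh, wo in zip(w_h, b_h, w_o))

# "Trained" sigmoid network (parameters chosen arbitrarily).
w_h, b_h = [0.7, -1.2, 2.0], [0.1, 0.4, -0.3]
w_o, b_o = [1.5, -0.8, 0.6], 0.2

# Converted tanh network, per the sigmoid-to-tanh table:
w_h2 = [w / 2 for w in w_h]   # hidden weights halved
b_h2 = [b / 2 for b in b_h]   # hidden biases halved
w_o2 = [w / 2 for w in w_o]   # output weights halved
b_o2 = b_o + 0.5 * sum(w_o)   # output bias compensated

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    y1 = forward(x, w_h, b_h, w_o, b_o, sigmoid)
    y2 = forward(x, w_h2, b_h2, w_o2, b_o2, math.tanh)
    assert abs(y1 - y2) < 1e-12, (x, y1, y2)
print("sigmoid and converted tanh networks agree")
```

Applying the tanh-to-sigmoid table to the converted parameters recovers the original network, since the two identities are inverses of each other.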
Summary of analytic AF.
| Ref. | Method | Convergence | Precision | Computational costs | Notes |
|---|---|---|---|---|---|
| [ ] | Hyperbolic tangent | N/A | MSE = 0.0165 | 0.3435 µs (Pentium II machine) | |
| [ ] | Arctangent | 24 Epochs | MSE = 1 | 0.3435 µs (Pentium II machine) | Backpropagation with |
| [ ] | Quadratic sigmoid | 4000 Epochs | MSE = 0.1–0.5 (2x more accurate than sigmoid on the same problem) | N/A | Backpropagation with |
| [ ] | Logarithmic-Exponential | 250 Epochs | MSE = 0.048 (fitting problem) | 6.1090 µs (Pentium II machine) | |
| [ ] | Spline-interpolant | 2000 Epochs | MSE 〈dB〉 = −17.94 dB (3.26 dB less than sigmoid on the same problem) | N/A | Backpropagation with |
| [ ] | Hermite polyn. | N/A | 98.5% accuracy (classification problem) | N/A | |
| [ ] | Neural | 50 Epochs | 97.6% accuracy (classification problem) | N/A | |
| [ ] | Composite AF | 900 Epochs | 92.8% accuracy (classification problem) | N/A | Even distribution of Gaussian, sinusoidal, and sigmoid |
| [ ] | Wave | 250 Epochs | MSE = 0.2465 (15x less accurate than sigmoid on the same problem) | 0.3830 µs (Pentium II machine) | |
| [ ] | CosGauss | 20 Epochs | MSE = 1.0 (10x more accurate than sigmoid on the same problem) | N/A | Implemented on a cascade correlation network |
| [ ] | Sinc | 250 Epochs | MSE = 0.0132 (0.25x more accurate than tanh on the same problem) | 104.3360 µs (Pentium II machine) | |
| [ ] | PolyExp | 250 Epochs | MSE = 0.1007 (6x less accurate than tanh on the same problem) | 0.3840 µs (Pentium II machine) | |
| [ ] | SinCos | 250 Epochs | MSE = 0.0114 (0.7x more accurate than tanh on the same problem) | 1.1020 µs (Pentium II machine) | |
Summary of fuzzy logic AF.
| Ref. | Method | Convergence | Precision | Computational costs | Notes |
|---|---|---|---|---|---|
| [ ] | Fuzzy-tanh | 20 Epochs (up to 4x faster than tanh on the same problem) | MAE = 0.039 (2.5x more accurate than tanh) | N/A | |
| [ ] | Type 2 Fuzzy | 41 Epochs (5x faster than tanh on the same problem) | MAE = 0.35 | N/A | Backpropagation with learning rate |
| [ ] | Fuzzy-tanh 2 | N/A | RMSE = 0.0116 (comparable to tanh on the same problem) | N/A | Trained with the extreme learning machine algorithm |
Summary of adaptive strategies.
| Ref. | Method | Convergence | Precision | Computational costs | Notes |
|---|---|---|---|---|---|
| [ ] | Scalable sigmoid | 6960 Epochs | % error = 0.033 | Training time: 2739 s | Learning rate 0.9 |
| [ ] | Sin-sigmoid | 8232 Epochs | % error = 0.045 | Training time: 2080 s | Learning rate 0.9 |
| [ ] | Morlet wavelet | 10000 Epochs | % error = 0.097 | Training time: 3046 s | Learning rate 0.2 |
| [ ] | Sigmoid-radial-sin | 5250 Epochs | N-RMSE = 0.09301 | N/A | Trained with the Levenberg-Marquardt algorithm |
| [ ] | Sin-sigmoid | 5000–9000 Epochs | 89.60–94.3% accuracy (classification problem) | N/A | Implemented on a higher order NN (HONN) |
| [ ] | Trainable AF | 20000 Epochs | RMSE 〈dB〉 = −35 dB | N/A | |
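Common to these strategies is that the AF exposes shape parameters to the same gradient descent that trains the weights. A minimal sketch, assuming a sigmoid with one trainable slope parameter `a` fitted by gradient descent; the function form, target, and learning rate here are illustrative assumptions, not taken from any of the cited works:

```python
import math

# Adaptive-AF sketch: sigmoid(a*x) with trainable slope `a`,
# updated by plain gradient descent on a squared-error loss.
# (Illustrative only; each cited paper defines its own adaptive form.)

def f(x, a):
    return 1.0 / (1.0 + math.exp(-a * x))

def df_da(x, a):
    s = f(x, a)
    return x * s * (1.0 - s)   # d/da of sigmoid(a*x)

# Fit the slope so that f(1, a) matches a target output of 0.9.
a, lr, target = 1.0, 0.5, 0.9
for _ in range(2000):
    err = f(1.0, a) - target
    a -= lr * err * df_da(1.0, a)

print(round(f(1.0, a), 3))  # converges toward the 0.9 target
```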
Summary of Lookup-Table approximations.
| Ref. | Method | Convergence | Precision | Computational costs | Notes |
|---|---|---|---|---|---|
| [ ] | RA-LUT (tanh) | N/A | MSE = 0.0053 | 17.5 µs on 50 MHz FPGA | Resources used: |
| [ ] | RA-LUT (Logsig) | N/A | MSE = 0.1598 | 17.5 µs on 50 MHz FPGA | Resources used: |
| [ ] | RA-LUT (Tansig) + FPU | N/A | MSE = 0.0150 | 47 µs on 50 MHz FPGA | Resources used: |
| [ ] | Error-optimized LUT | N/A | Max. error = 0.0378 | Propagation delay: 0.95 ns (2x faster than classic LUT approach) | Gate count: 70 |
| [ ] | Compact RA-LUT | N/A | Max. error = 0.0182 | Propagation delay: 2.46 ns | Gate count: 181 |
| [ ] | Hybrid | 11 Epochs | % error = 1.88 | Propagation delay: 0.8 ns | Trained on-chip with the Levenberg-Marquardt algorithm |
| [ ] | LUT | 16 Epochs | % error = 1.34 | Propagation delay: 2.2 ns | Trained on-chip with the Levenberg-Marquardt algorithm |
| [ ] | RA-LUT | 12 Epochs | % error = 0.89 | Propagation delay: 1.0 ns | Trained on-chip with the Levenberg-Marquardt algorithm |
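The LUT idea is easy to model in software: precompute samples of the AF over a bounded input range, then index by the input. A toy uniform-cell version for tanh follows; the entry count and input range are assumptions, and real RA-LUT designs size the cells non-uniformly to bound the error with fewer entries.

```python
import math

# Toy LUT-based tanh: sample the function over [-4, 4] into N uniform
# cells (one sample at each cell's midpoint) and saturate outside.

N, LO, HI = 64, -4.0, 4.0
CELL = (HI - LO) / N
LUT = [math.tanh(LO + (i + 0.5) * CELL) for i in range(N)]

def tanh_lut(x):
    if x <= LO:
        return -1.0
    if x >= HI:
        return 1.0
    return LUT[int((x - LO) / CELL)]

# Worst-case deviation from exact tanh over a fine sweep of the range.
max_err = max(abs(math.tanh(k * 0.04) - tanh_lut(k * 0.04))
              for k in range(-100, 101))
print(round(max_err, 4))  # about 0.06 for N = 64
```

The worst case lands near the origin, where tanh is steepest; this is exactly why range-addressable designs spend their entries where the slope is high.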
Summary of Piecewise Linear Approximations.
| Ref. | Method | Convergence | Precision | Computational costs | Notes |
|---|---|---|---|---|---|
| [ ] | Piecewise linear | N/A | MSE = 0.00049 | 213 clock cycles | Resources used: |
| [ ] | "Bajger-Omondi" method | N/A | Absolute error: up to 10^−6 for 128 pieces with 18-bit precision | N/A | |
| [ ] | PWL approximation | N/A | N/A | Propagation delay: 1.834 ns (100 ns more than LUT approach) | Resources used: |
| [ ] | A-Law | N/A | % error = 0.63 | Propagation delay: 3.729 ns | Resources used: |
| [ ] | Alippi | N/A | % error = 1.11 | Propagation delay: 3.441 ns | Resources used: |
| [ ] | PLAN | N/A | % error = 0.63 | Propagation delay: 4.265 ns | Resources used: |
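Of these schemes, PLAN is simple enough to reproduce in a few lines: its segments use power-of-two slopes, so each evaluation reduces to a shift and an add in hardware. A software sketch with the coefficients commonly quoted for PLAN (the cited bit widths and delays are not modeled here):

```python
import math

# PLAN piecewise-linear sigmoid: power-of-two slopes per segment,
# with the negative axis recovered by the symmetry s(-x) = 1 - s(x).
# Coefficients are the ones commonly quoted for this scheme.

def plan_sigmoid(x):
    ax = abs(x)
    if ax >= 5.0:
        y = 1.0
    elif ax >= 2.375:
        y = 0.03125 * ax + 0.84375
    elif ax >= 1.0:
        y = 0.125 * ax + 0.625
    else:
        y = 0.25 * ax + 0.5
    return y if x >= 0 else 1.0 - y

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Worst-case deviation over a fine sweep of [-8, 8].
max_err = max(abs(sigmoid(k / 100) - plan_sigmoid(k / 100))
              for k in range(-800, 801))
print(round(max_err, 4))  # worst case near x = ±1
```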
Summary of hybrid and higher order techniques.
| Ref. | Method | Convergence | Precision | Computational costs | Notes |
|---|---|---|---|---|---|
| [ ] | 4th-order Taylor | N/A | From 99.68% to 45% accuracy (classification problem) | Full NN computation time: 1.7 ms | Resources used: |
| [ ] | 5th-order Taylor | N/A | % error = 0.51 | N/A | Resources used: |
| [ ] | Hybrid with PWL and RA-LUT | N/A | Up to 6.80 | Elaboration time: 40 | Resources used: |
| [ ] | Hybrid with PWL and combinatorial | N/A | Up to 2.28 | Elaboration time: 40 | Resources used: |
| [ ] | High precision sigmoid/exponential | N/A | RMSE = 8.362 | Maximum operative frequency: 868.056 MHz | Resources used: |
| [ ] | PWL and optimized LUT | N/A | N/A | Propagation delay: 0.06 ns | Resources used: |
| [ ] | Four-polynomial tanh | N/A | MSE = 0.0039 | Full NN computation (50 MHz FPGA) | |
| [ ] | Five-polynomial tanh | N/A | MSE = 0.0018 | Full NN computation (50 MHz FPGA) | |
| [ ] | Five-polynomial Logsig | N/A | MSE = 0.0075 | Full NN computation (50 MHz FPGA) | |
| [ ] | Piecewise quadratic tanh | N/A | MEA = 4.1 | Throughput rate: 0.773 MHz | Resources used: |
| [ ] | Piecewise quadratic tanh | 33 Epochs | SE = 0.1 | N/A | |
| [ ] | Zhang quadratic approximation | N/A | MEA = 7.7 | Propagation delay: 3.9 ns | Resources used: |
| [ ] | Adjusted LUT | N/A | MEA = 0.0121 | Propagation delay: 2.80 ns | Area ( |
| [ ] | Adjusted LUT | N/A | MEA = 0.0246 | Propagation delay: 2.31 ns | Area ( |
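A short note on the Taylor rows: a low-order Maclaurin expansion is cheap to evaluate but only locally accurate, which is consistent with the wide accuracy spread reported for the 4th-order Taylor entry. For the sigmoid, sigmoid(x) = 1/2 + x/4 − x³/48 + O(x⁵) (the even-order terms vanish), and the truncation error grows quickly with |x|:

```python
import math

# Error of the 4th-order Maclaurin expansion of the sigmoid,
#   sigmoid(x) ≈ 1/2 + x/4 - x**3/48,
# demonstrating why Taylor-based AF hardware must restrict or rescale
# its input range: the approximation degrades away from the origin.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_taylor4(x):
    return 0.5 + x / 4 - x**3 / 48

for x in (0.5, 1.0, 2.0, 4.0):
    print(x, round(abs(sigmoid(x) - sigmoid_taylor4(x)), 4))
```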