Adrian Moldovan, Angel Caţaron, Răzvan Andonie.
Abstract
Current neural network architectures are often hard to train because of the increasing size and complexity of the datasets used. Our objective is to design more efficient training algorithms by exploiting causal relationships inferred from neural networks. Transfer entropy (TE) was initially introduced as an information transfer measure used to quantify the statistical coherence between events (time series). It was later related to causality, even though the two are not the same. Only a few papers report applications of causality or TE in neural networks. Our contribution is an information-theoretical method for analyzing information transfer between the nodes of feedforward neural networks. The information transfer is measured by the TE of feedback neural connections. Intuitively, TE measures the relevance of a connection in the network, and the feedback amplifies this connection. We introduce a backpropagation-type training algorithm that uses TE feedback connections to improve its performance.
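For reference, the TE the abstract refers to is, in Schreiber's standard formulation, the following; the history lengths k (target) and l (source) are parameters of the estimator, and the paper's exact estimator is not reproduced in this record:

```latex
% Transfer entropy from source J to target I (Schreiber, 2000),
% with history lengths k (target) and l (source):
T_{J \rightarrow I} \;=\; \sum p\!\left(i_{n+1}, i_n^{(k)}, j_n^{(l)}\right)
\log \frac{p\!\left(i_{n+1} \mid i_n^{(k)}, j_n^{(l)}\right)}
          {p\!\left(i_{n+1} \mid i_n^{(k)}\right)}
```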
Keywords: backpropagation; causality; deep learning; gradient descent; neural network; transfer entropy
Year: 2020 PMID: 33285877 PMCID: PMC7516405 DOI: 10.3390/e22010102
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. How two neurons with indices i and j in a network produce a series of activations. The threshold g is the red line that splits these activations into two groups: the ones above the threshold (blue) and the ones below it (red). These correspond to the two output time series of neurons i and j, which are binarized into the time series of binary values used to calculate the TE. The process is applied to all pairs of connected neurons.
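A minimal sketch of the binarize-then-estimate procedure the caption describes, assuming a history length of 1 and a plug-in (empirical frequency) estimator; `binarize` and `transfer_entropy` are illustrative names, not the authors' code:

```python
import numpy as np

def binarize(activations, g):
    """Split a series of neuron activations into a binary time series:
    1 where the activation is above the threshold g, 0 otherwise."""
    return (np.asarray(activations) > g).astype(int)

def transfer_entropy(x_j, x_i):
    """Plug-in estimate of TE(J -> I) for two binary time series with
    history length 1:
        TE = sum p(i_{n+1}, i_n, j_n)
             * log2( p(i_{n+1} | i_n, j_n) / p(i_{n+1} | i_n) )."""
    x_j, x_i = np.asarray(x_j), np.asarray(x_i)
    n = len(x_i) - 1
    # Empirical joint distribution over (i_{n+1}, i_n, j_n) triplets.
    joint = np.zeros((2, 2, 2))
    for t in range(n):
        joint[x_i[t + 1], x_i[t], x_j[t]] += 1
    joint /= n
    te = 0.0
    for i_next in (0, 1):
        for i_now in (0, 1):
            for j_now in (0, 1):
                p_iij = joint[i_next, i_now, j_now]   # p(i_{n+1}, i_n, j_n)
                if p_iij == 0:
                    continue
                p_ij = joint[:, i_now, j_now].sum()   # p(i_n, j_n)
                p_i = joint[:, i_now, :].sum()        # p(i_n)
                p_ii = joint[i_next, i_now, :].sum()  # p(i_{n+1}, i_n)
                te += p_iij * np.log2((p_iij / p_ij) / (p_ii / p_i))
    return te
```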
The XOR dataset. A training epoch consists of 200 vectors, randomly selected from this dataset.
| Input 1 | Input 2 | Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
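A minimal sketch of how one such epoch could be assembled from the table above; whether the 200 vectors are drawn with replacement is an assumption, since the record does not say:

```python
import numpy as np

# The four XOR patterns from the table above: (input1, input2, output).
XOR = np.array([[0, 0, 0],
                [0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])

def xor_epoch(rng, size=200):
    """Draw one training epoch: `size` vectors selected at random
    (assumed with replacement) from the XOR dataset."""
    rows = rng.integers(0, len(XOR), size=size)
    return XOR[rows, :2], XOR[rows, 2]

inputs, targets = xor_epoch(np.random.default_rng(0))
```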
Figure 2. The FF+FB architecture for the XOR problem. The TE values are calculated between neurons j and i, with neuron j in layer l and neuron i in the adjacent layer. The colored arrows from the neurons show how their outputs are used to calculate the TE values. A dotted-line arrow of the same color shows which weight the TE value is applied to (see Equation 3). The bias units are implemented but not shown here, since they do not use TE values in the algorithm.
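Equation 3 itself is not reproduced in this record, so the following is only a plausible sketch of how a TE value could modulate a backpropagation weight update, consistent with the abstract's claim that TE measures the relevance of a connection and that the feedback amplifies it; the function name and the exact scaling form are assumptions:

```python
import numpy as np

def te_scaled_update(w, grad, te, eta=0.1):
    """Hypothetical TE-modulated gradient-descent step.

    Assumes the TE of each connection scales that weight's update,
    amplifying the more 'relevant' connections:

        w_ij <- w_ij - eta * (1 + TE_{j->i}) * dE/dw_ij

    `te` holds one TE value per weight (same shape as `w` and `grad`).
    """
    return w - eta * (1.0 + te) * grad
```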
Figure 3. Ten runs on the XOR dataset. Each x-axis ends when the last of FF+FB and FF reaches either the maximum number of epochs or 100% training accuracy (log scale).
Comparison of the number of epochs required by FF+FB and FF to reach 100% training accuracy on the XOR dataset over 10 runs. The networks did not reach the target accuracy in run 1 for FF+FB and runs 2, 5, and 7 for FF; for those runs the table shows the maximum number of epochs (300) to which training was limited.
| Run | FF+FB Epochs | FF Epochs |
|---|---|---|
| 1 | 300 | 207 |
| 2 | 32 | 300 |
| 3 | 40 | 261 |
| 4 | 75 | 249 |
| 5 | 28 | 300 |
| 6 | 27 | 213 |
| 7 | 31 | 300 |
| 8 | 30 | 206 |
| 9 | 29 | 237 |
| 10 | 29 | 226 |
Comparison of various metrics between FF+FB and FF at the target validation accuracies on the specified datasets (average of 10 runs). Whenever the networks did not reach the targets, we used the maximum number of epochs and the last recorded accuracy to calculate the averages.
| Dataset | Target Accuracy | FF+FB Avg. Accuracy (%) | FF+FB Avg. Epochs | FF Avg. Accuracy (%) | FF Avg. Epochs | Accuracy Difference | Max Epochs |
|---|---|---|---|---|---|---|---|
| abalone | 52% | 53.01 | 6.2 | 52.16 | 37.5 | 0.84% | 50 |
| car | 73% | 72.21 | 163.5 | 73.14 | 184.2 | −0.92% | 300 |
| chess | 96% | 96.20 | 19.0 | 95.41 | 38.3 | 0.79% | 40 |
| glass | 52% | 52.46 | 154.6 | 35.84 | 294.4 | 16.61% | 300 |
| ionosphere | 92% | 92.17 | 16.4 | 92.26 | 22.5 | −0.09% | 60 |
| iris | 92% | 95.11 | 13.8 | 96.22 | 24.8 | −1.11% | 100 |
| liver | 70% | 68.46 | 212.1 | 61.82 | 294.2 | 6.63% | 300 |
| redwine | 52% | 50.18 | 134.6 | 49.89 | 171.8 | 0.29% | 200 |
| seeds | 85% | 87.46 | 41.3 | 87.14 | 136.2 | 0.31% | 200 |
| divorce | 98% | 98.03 | 6.9 | 98.62 | 7.4 | −0.58% | 20 |