Marcelo Bertalmío, Alex Gomez-Villa, Adrián Martín, Javier Vazquez-Corral, David Kane, Jesús Malo.
Abstract
The responses of visual neurons, as well as visual perception phenomena in general, are highly nonlinear functions of the visual input, while most vision models are grounded on the notion of a linear receptive field (RF). The linear RF has a number of inherent problems: it changes with the input, it presupposes a set of basis functions for the visual system, and it conflicts with recent studies on dendritic computations. Here we propose to model the RF in a nonlinear manner, introducing the intrinsically nonlinear receptive field (INRF). Apart from being more physiologically plausible and embodying the efficient representation principle, the INRF has a key property of wide-ranging implications: for several vision science phenomena where a linear RF must vary with the input in order to predict responses, the INRF can remain constant under different stimuli. We also prove that Artificial Neural Networks with INRF modules instead of linear filters have a remarkably improved performance and better emulate basic human perception. Our results suggest a change of paradigm for vision science as well as for artificial intelligence.
Year: 2020 PMID: 33004868 PMCID: PMC7530701 DOI: 10.1038/s41598-020-73113-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic of single-neuron spatial summation with an INRF. Contributions of the linear dendrites are summed with weights m, and contributions of the nonlinear dendrites are summed with weights w. The nonlinearity is shifted by a local average of the signal I around point x; this value is obtained by the nonlinear dendrites through feedback from the soma, represented by an arrow.
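The summation described in the caption can be sketched numerically. The kernel names m and w follow the text; the local-average weights g, the tanh nonlinearity, and the scaling lam are illustrative assumptions, not the paper's fitted choices:

```python
import numpy as np

def inrf_response(I, m, w, g, lam=1.0, sigma=np.tanh):
    """Single-neuron INRF spatial summation at one position x (sketch).

    I   : input samples around the neuron's position x
    m   : weights of the linear dendrites
    w   : weights of the nonlinear dendrites
    g   : averaging weights giving the local mean of I that shifts the
          nonlinearity (the feedback from the soma in the figure)
    lam : relative strength of the nonlinear branch (assumed)
    """
    mu = np.dot(g, I)                        # local average of I around x
    linear = np.dot(m, I)                    # linear dendritic sum
    nonlinear = np.dot(w, sigma(I - mu))     # shifted-nonlinearity sum
    return linear - lam * nonlinear
```

Setting lam = 0 (or w to zeros) recovers a purely linear RF, which is the formulation the paper contrasts against.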
Figure 2. OFF cells become ON cells when the spatial frequency of the stimulus increases. Each panel plots the cell response at the center region as a function of the input value at that region, for the stimulus shown in the inset (the red circle denotes the center region, whose gray value lies inside the input range). The response is computed as follows: the input stimulus is convolved with a small Gaussian to simulate retinal blur, then the INRF is applied, and finally the result is rectified (see “Methods”). Notice how in the left and middle-left panels the cell behaves as an OFF cell, since it responds only to stimuli below the average level of 0.5, while the reverse happens in the middle-right and right panels, where the cell responds only to stimuli above the average and therefore behaves as an ON cell.
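The three-stage computation in the caption (retinal blur, INRF, rectification) can be sketched in 1-D. All kernel widths, the tanh nonlinearity, and lam are illustrative choices, not the paper's fitted values:

```python
import numpy as np

def gaussian_kernel(std):
    """Normalized 1-D Gaussian truncated at 3 standard deviations."""
    radius = int(3 * std)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * std**2))
    return k / k.sum()

def cell_response(stimulus, m_std, w_std, g_std, lam, blur_std=1.0):
    """Sketch of the Fig. 2 pipeline: blur -> INRF -> rectification."""
    conv = lambda s, std: np.convolve(s, gaussian_kernel(std), mode="same")
    I = conv(stimulus, blur_std)                 # small Gaussian: retinal blur
    mu = conv(I, g_std)                          # local mean shifting the nonlinearity
    r = conv(I, m_std) - lam * conv(np.tanh(I - mu), w_std)
    return np.maximum(r, 0.0)                    # rectified output
```

For a flat stimulus at the average level the shifted nonlinearity vanishes, so the response reduces to the linearly filtered input.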
Figure 3. Brightness perception curves (averaged over all observers) show “crispening” at the surround luminance level when the background is uniform (a), but the phenomenon disappears for a salt-and-pepper background (c). A model based on the INRF that uses Gaussian kernels for m and w qualitatively replicates both cases (b,d) with a fixed set of parameters, which is not possible with a DoG linear RF formulation.
Figure 4. A simple L+NL model, consisting of a difference-of-Gaussians (DoG) linear RF followed by a pointwise nonlinearity, has to change with the input in order to reproduce the crispening phenomenon. When the model is adjusted to the uniform-background condition (red curve), it qualitatively replicates brightness perception for the uniform background (blue curve, left) but not for the salt-and-pepper surround (blue curve, right). The reverse happens when the model is adjusted to the salt-and-pepper background condition. Both the DoG filter and the nonlinearity change with the stimuli (see “Methods”).
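For contrast with the INRF, a minimal 1-D version of the L+NL scheme the caption describes: a DoG filter followed by a pointwise saturating nonlinearity. The Naka-Rushton-style form and all parameter values are illustrative assumptions:

```python
import numpy as np

def dog_kernel(std_center, std_surround, radius):
    """Difference of Gaussians: excitatory center minus inhibitory surround."""
    x = np.arange(-radius, radius + 1)
    def g(std):
        k = np.exp(-x**2 / (2 * std**2))
        return k / k.sum()
    return g(std_center) - g(std_surround)

def l_plus_nl(signal, std_center=1.0, std_surround=3.0, radius=9, n=2.0, s=0.5):
    """L+NL sketch: DoG linear filtering, then a pointwise saturating
    (Naka-Rushton-style) nonlinearity. Parameter values are illustrative."""
    lin = np.abs(np.convolve(signal, dog_kernel(std_center, std_surround, radius),
                             mode="same"))
    return lin**n / (lin**n + s**n)
```

Because both Gaussians are normalized, the DoG kernel integrates to zero, so the linear stage responds to contrast rather than to the mean level.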
Pearson correlation with Mean Opinion Scores (MOS) on the TID2013 database [52] for different image quality metrics: PSNR, SSIM [55], LPIPS [53,54], and INRF-IQ (proposed in the text).
| | PSNR | SSIM | LPIPS | INRF-IQ |
|---|---|---|---|---|
| Correlation with MOS | 57% | 65% | 76% | 74% |
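The figures in the table are ordinary Pearson correlation coefficients between each metric's scores and the MOS values; a minimal sketch (the function name is ours):

```python
import numpy as np

def pearson_with_mos(metric_scores, mos):
    """Pearson correlation between a quality metric's per-image scores
    and the corresponding mean opinion scores (MOS)."""
    m = np.asarray(metric_scores, dtype=float)
    s = np.asarray(mos, dtype=float)
    m = m - m.mean()                     # center both series
    s = s - s.mean()
    return float(m @ s / np.sqrt((m @ m) * (s @ s)))
```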
Figure 5. A model based on the INRF qualitatively predicts the observers’ response to White’s illusion when bandpass noise is added. For six different levels of bandpass-noise frequency, our model shows the same trend as the observers’ data published in [56]. This is particularly striking when comparing our results with those in [56], where none of the vision models tried, all based on linear RFs, were able to replicate this behaviour.
Figure 6. Light/dark asymmetry (“irradiation illusion”): a white square on a black background (left) is perceived as larger than a black square on a white background (right). This phenomenon can be reproduced with an L+NL model that changes with the stimulus [57]. Instead, we model the irradiation illusion with a fixed INRF followed by clipping to the range [0, 1] (see “Methods”).
Summary of the vision science experiments performed with different instances of the INRF model and the corresponding choice of parameters.
| Experiment | Nonlinearity on input | Kernel | Kernel | Kernel | Nonlinearity |
|---|---|---|---|---|---|
| OFF/ON cells | None | Constant (size = 85 px) | Constant (size = 512 px) | Constant (size = 85 px) | 1000 |
| Crispening | Naka–Rushton eq. (semisaturation from[ | Gaussian (std = | Gaussian (std = | Delta function | 3.88 |
| Image quality metric | Power law (lightness channel from CIELAB) | Gaussian (std = | Gaussian (std = | Delta function | 3.88 |
| White’s illusion | Power law (gamma-corrected data) | Gaussian (std = | Gaussian (std = | Delta function | 3.88 |
| Irradiation illusion | Naka–Rushton eq. (semisaturation = 18, n = 1) | Gaussian (std = | Gaussian (std = | Delta function | 3.88 |
Except for the OFF/ON cells experiment (first row), the results for the other four experiments have all been obtained with the same set of parameter values.
Comparison of classification error (%) between a CNN and the equivalent network using INRF elements instead of linear RF filters.
| Dataset | CNN | INRFnet |
|---|---|---|
| MNIST | 0.48 | 0.43 |
| CIFAR10 | 24.28 | 16.78 |
| CIFAR100 | 57.01 | 48.80 |
| SVHN | 6.26 | 3.41 |
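Swapping an INRF element for a linear convolution changes only the filtering step of a layer. A single-channel NumPy sketch, assuming a tanh nonlinearity and edge padding (in the trained networks, the kernels m, w and g are learned):

```python
import numpy as np

def conv2d(x, k):
    """Plain 2-D cross-correlation with edge padding (single channel)."""
    H, W = x.shape
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + H, j:j + W]
    return out

def inrf_module(x, m, w, g, lam=1.0):
    """INRF feature extractor replacing one linear conv filter (sketch)."""
    mu = conv2d(x, g)                            # local mean (kernel g)
    return conv2d(x, m) - lam * conv2d(np.tanh(x - mu), w)
```

On a flat input the shifted-nonlinearity branch is zero, so the module behaves exactly like the linear filter m; it departs from the linear response only where the input deviates from its local mean.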
Accuracy against whitebox adversarial attacks on the MNIST dataset.
| Attack methods | FGSM ( | FGSM ( | FGSM ( | DeepFool | Carlini–Wagner ( | Carlini–Wagner ( |
|---|---|---|---|---|---|---|
| CNN | 88.14% | 44.69% | 11.03% | 52.01% | 4.18% | 42.5% |
| INRFnet | 93.14% | 62.23% | 33.42% | 65.27% | 7.24% | 58.06% |
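FGSM, used in the attacks above, perturbs the input by epsilon times the sign of the loss gradient with respect to that input. A self-contained sketch on a toy logistic model (the model and its gradient are ours, not the paper's networks):

```python
import numpy as np

def fgsm(x, y, weights, eps):
    """Fast Gradient Sign Method on a toy logistic 'network' (sketch).

    x: input vector, y: true label in {0, 1}, eps: attack strength.
    For cross-entropy loss, the gradient w.r.t. the input is (p - y) * weights.
    """
    p = 1.0 / (1.0 + np.exp(-x @ weights))   # model's predicted probability
    grad_x = (p - y) * weights               # dLoss/dx (analytic, this model only)
    return x + eps * np.sign(grad_x)         # step along the gradient sign
```

The single signed step is what makes FGSM fast; stronger attacks such as Carlini–Wagner and DeepFool in the tables above instead optimize the perturbation iteratively.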
Accuracy against whitebox adversarial attacks on the CIFAR10 dataset.
| Attack methods | FGSM ( | FGSM ( | FGSM ( | DeepFool |
|---|---|---|---|---|
| CNN | 13.27% | 12.26% | 10.79% | 47.63% |
| INRFnet | 19.3% | 16.6% | 15.6% | 57.46% |