Literature DB >> 34772812

Divisive normalization unifies disparate response signatures throughout the human visual hierarchy.

Marco Aqil^1,2, Tomas Knapen^3,2, Serge O Dumoulin^3,2,4.

Abstract

Neural processing is hypothesized to apply the same mathematical operations in a variety of contexts, implementing so-called canonical neural computations. Divisive normalization (DN) is considered a prime candidate for a canonical computation. Here, we propose a population receptive field (pRF) model based on DN and evaluate it using ultra-high-field functional MRI (fMRI). The DN model parsimoniously captures seemingly disparate response signatures with a single computation, superseding existing pRF models in both performance and biological plausibility. We observe systematic variations in specific DN model parameters across the visual hierarchy and show how they relate to differences in response modulation and visuospatial information integration. The DN model delivers a unifying framework for visuospatial responses throughout the human visual hierarchy and provides insights into its underlying information-encoding computations. These findings extend the role of DN as a canonical computation to neuronal populations throughout the human visual hierarchy.

Entities: Chemical

Keywords: divisive normalization; human visual cortex; information-encoding models; population receptive fields; ultra-high-field fMRI

Mesh：

Year: 2021 PMID： 34772812 PMCID： PMC8609633 DOI： 10.1073/pnas.2108713118

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN： 0027-8424 Impact factor: 12.779

The principal aim of computational neuroscience is to elucidate the theoretical principles underlying brain function. A particularly fruitful approach has been the search for canonical neural computations: the idea that the same mathematical operations may underlie a multitude of seemingly distinct neural, perceptual, and cognitive phenomena. Identifying these biological information-processing algorithms would allow a description of many brain functions in terms of explicit, yet parsimonious, mathematical models (1, 2). Divisive normalization (DN) was first proposed to overcome the inability of linear receptive field models to capture nonlinear phenomena, such as contrast saturation and surround suppression, observed in primary visual cortex (V1) (3, 4). Over the past decades, DN has been used to explain many neurophysiological and behavioral observations in different sensory modalities (5–7), multisensory integration (8, 9), and cognitive domains (10, 11). DN also presents significant computational advantages for neural information processing by reducing redundancy and increasing sensitivity (12). Hence, it is now considered a prime candidate for a canonical neural computation (13). The fundamental notion underlying DN is that the output of neuronal computations generally does not only depend on the stimulus-driven input to the local neuronal population. Rather, responses also depend on the contextual activity of many other neurons. Models based on DN describe neuronal responses as the ratio of these two components: a direct input term in the numerator (activation) and a separate neuronal pool in the denominator (normalization). The divisive interaction of these two populations allows the model to describe a variety of nonlinear, contextual interactions. Evidence for the biological plausibility of DN comes from its ability to explain many aspects of neuronal responses, from single-neuron contrast-response functions and size tuning (14) to attention (10) and cross-orientation suppression (15). Prominent models of neural circuits have shown that biologically realistic substrates can implement DN computations (16–19). Combining direct input and contextual activity of other neurons, which is mathematically formalized by DN, appears to be an important property for healthy brain function (20, 21). Indeed, certain psychiatric disorders have been characterized as disorders of DN; i.e., they may stem from impaired ability to suitably weigh and combine direct input with contextual evidence (22–25). Here, we investigate the canonical nature of DN by extending the principle to the level of neuronal population responses throughout the human visual hierarchy. In vision, the population receptive field (pRF) is the region of visual space that elicits a response from a population of neurons, which may be recorded by using functional MRI (fMRI) or electrodes (26–30). fMRI measurements at ultra-high-field enable the investigation of potential canonical neural computations, such as DN, with high sensitivity and specificity (31, 32). We propose a pRF model based on DN and use fMRI at 7 T to ask whether it can 1) unify existing models; and 2) provide insights into visuo-cortical computations. We show that the DN model unifies and outperforms existing pRF models (27, 33, 34) and uniquely captures a variety of response signatures throughout the human visual cortex, with a single computation. We find systematic variations in both modulatory and spatial parameters of the DN model along the visual hierarchy, and we show how these variations underlie the different properties of responses observed. These findings provide direct evidence for DN as a canonical computation in neuronal populations throughout the human visual hierarchy.

Results

DN Model Outperforms Existing Models by Capturing Surround Suppression, Nonlinear Compression, and their Combination

We compare three existing pRF models—i.e., Gaussian (Gauss) (27), Difference of Gaussians (DoG) (33), and Compressive Spatial Summation (CSS) (34)—with the pRF model based on DN. The respective formulations are given in Fig. 1. The DoG model uses center-surround configurations to capture “suppression”: the negative deflections observed in blood-oxygen-level-dependent (BOLD) time courses when stimuli pass beyond the flanks of the Gauss pRF, particularly in early visual cortex (33). Unlike responses in early visual cortex, responses in late visual cortex do not display prominent suppression, yet are increasingly invariant to stimulus size. That is, large stimuli elicit much smaller responses than the sum of their components. The CSS model, incorporating a static nonlinearity, was proposed to explain these sublinear, or “compressed,” responses (34).

Fig. 1.

Existing pRF models and proposed DN model. (A) Gauss. (B) DoG. (C) CSS. (D) The DN pRF model. The DN model summarizes activation and normalization pRFs as isotropic, two-dimensional Gaussians, G1 and G2, centered on the same position (x0, y0) in visual space (x, y), but with different sizes, σ1 and σ2, and different amplitudes, a and c, respectively. Activation and normalization also have constant “baselines”—that is, an activation constant b and a normalization constant d. In agreement with previous studies, we observe a range of qualitatively different BOLD time courses across the visual cortex in response to spatial visual stimuli (Fig. 2), showing suppression (Fig. 2 , gray regions), compression (Fig. 2 ), and combinations of the two in the same single time course (Fig. 2 ). The improvement brought about by the DN model is particularly evident in comparison with the linear models (Gauss and DoG), where nonlinear summation is present (Fig. 2 ). The improvement of the DN model is also evident in comparison with the models without center-surround configurations (Gauss and CSS) in time courses where suppression is present (Fig. 2 ). Furthermore, going beyond existing models, the DN model is uniquely able to describe both phenomena with a single computation, whereas the other extant models can only capture one aspect at best. Hence, it is the only model capable of aptly capturing time courses showing combinations of compression and suppression (Fig. 2 ). Overall, these results show that the DN model is able to outperform all existing pRF models at the level of single BOLD time courses, by capturing response patterns showing compression, suppression, and combinations thereof.

Fig. 2.

PRF model based on DN outperforms existing models in single time courses and visual regions throughout the human visual cortex. (A–D) Example time course showing suppression (A), compression (B), and combinations of the two (C and D). The DN model supersedes existing models by simultaneously capturing both with a single computation. A–D, Insets on each time course show the pRF spatial profile. (E) We evaluate model performance based on the cross-validated variance explained (cvR2), where we fit model parameters on one-half of the data and evaluated model predictions with the other, independent half of the data. The model with highest cvR2 is shown across the cortical surfaces of two participants. (F) cvR2 distribution. Black and white horizontal lines within each violin plot are the mean and median of the distribution, respectively. Asterisks represent statistical significance of the performance difference between the DN model and the other models; the color and number of the asterisks represent which model performed better and statistical significance of the difference (Fisher’s permutation test. *; **; ***. No asterisks indicates no significant difference in either direction. (G) Mean cvR2 difference of DoG, CSS, and DN models relative to the original Gauss model. Asterisks represent statistical significance of the cvR2 difference between Gauss and other models (same significance as in F). Next, we directly compare the four models’ performance across nine visual regions spanning the hierarchy of human visual cortex (V1, secondary visual cortex [V2], third visual cortex [V3], visual area 3AB [V3AB], human visual area V4 [hV4], lateral occipital [LO], temporal occipital [TO], ventral occipital [VO], and intraparietal sulcus [IPS]) (35, 36). Fig. 2 shows the best model at each cortical location for two participants. Fig. 2 shows model performance and statistical significance of the difference between DN and other models (if present), with the DN model generally outperforming all others throughout the visual hierarchy. Fig. 2 shows the performance difference compared to the standard Gauss model and statistical significance of the difference between Gauss and other models. Combined, these quantifications show that the DN model significantly and systematically outperforms all other models across the visual hierarchy. To verify that these findings are not unique to a specific stimulation protocol, we devised four additional experimental conditions, in which we varied the width and speed of the visual stimulus to elicit potentially different response signatures. The DN model also systematically outperforms existing models throughout these additional experimental conditions ().

DN Model Constants Modulate Suppression and Compression

Next, we sought to disentangle how the DN model can capture patterns of suppression, compression, and their combination. To this end, we simulated the DN model pRF profile (a one-dimensional slice of the pRF shape in visual space) and the one-dimensional response function (the model response to the fraction of activation and normalization pRFs being stimulated). Simulating pRF profiles allows us to investigate how parameters modulate the spatial aspect of DN model behavior, while response functions allow us to investigate which parameters modulate the (non)linearity of DN model responses. Note that both pRF profiles and response functions are simplifications of the full model. Specifically, the pRF profiles do not take into account the nonlinearity of model responses, while the response functions disregard the effect of potentially different sizes σ1 and σ2 of the activation and normalization pRFs. The combination of these aspects is discussed separately in Size Ratio of Normalization and Activation pRFs Controls Spatial Oversaturation, through the analysis of size–response curves. Simulations of DN pRF profiles show that a nonzero value of the activation constant (b) is necessary for the model to create suppressive center-surround configurations (Fig. 3), which generate below-baseline time-course deflections (e.g., Fig. 2 ). Increasing the value of the activation constant generates progressively more pronounced signatures of suppression (Fig. 3). On the other hand, simulations of the model response function show that the normalization constant (d) inversely modulates the strength of compression in the model response (Fig. 3). That is, higher values of the normalization constant will generate a more linear response, while lower values generate responses showing stronger signatures of CSS. Hence, these simulations indicate that the activation constant b might play a key role in modulating the strength of suppression and that the normalization constant d may be the key modulator of compression in DN model responses.

Fig. 3.

Activation and normalization constants b and d modulate suppression and compression in the DN model. (A) The activation constant b modulates the strength of center-surround configurations in simulated DN pRF profiles. (B) The normalization constant d modulates the strength of nonlinear compression in simulated DN model response functions. (C) Low compression–low suppression (Gauss-like), high compression–low suppression (CSS-like), and low compression–high suppression (DoG-like) configurations may be captured as special cases of the (b, d) parameter space. In addition, the DN model can also capture high suppression–high compression configurations. (D–F) The activation constant b decreases systematically along the visual hierarchy (D), correlates with the strength of suppression in the DoG model (E), and with the performance improvement brought about by the DoG model over Gauss (F). This provides empirical evidence for the role of the activation constant b as modulator of suppression. (G–I) The normalization constant d decreases systematically along the visual hierarchy (G), correlates with the exponent of the CSS model (H), and with the performance improvement brought about by the CSS model over Gauss (I). This provides empirical evidence for the role of the normalization constant d as modulator of compression. Bar heights in D and G show the R2-weighted mean across participants, per visual region; data points show the R2-weighted mean for each participant’s visual region; error bars are SEM, corrected for volume-to-surface upsampling. Data points in E, F, H, and I represent the R2-weighted mean for eight sorted bins, each containing approximately equal numbers of vertices, pooled across participants, per visual region; error bars are SEM within each bin, corrected for volume-to-surface upsampling. Solid lines show best-fit linear regression; shaded areas represent CIs on the regression line obtained by bootstrapping. Together with the finding that eccentricity-size relationships are broadly similar across models (), these theoretical considerations suggest that qualitatively different response signatures, previously described by separate models and thought to represent different computations, may arise from a single canonical computation (DN) with locally varying parameters. In other words, Gauss, DoG, and CSS models may capture specific subregions of the DN model’s b–d parameter space (Fig. 3). Previous studies found that suppressive center-surround configurations are more prominent in early, rather than late, visual cortex (33). Conversely, CSS is more prominent in late, rather than early, visual cortex (34). If DN model activation and normalization constants are indeed indices of suppression and compression, we expect their values to change systematically from early to late visual cortex and correlate with relevant properties of the DoG and CSS models. Indeed, we find that the DN model activation and normalization constants b and d show systematic variations across the visual hierarchy, consistent with their hypothesized roles of suppression and compression modulators (Fig. 3 and ). The activation constant (b) positively correlates with the strength of suppression in the DoG model, as indexed by the ratio of DoG model time-course prediction area below and above baseline (Fig. 3). The normalization constant (d) positively correlates with compression in the CSS model, as indexed by the CSS model’s exponent parameter (Fig. 3). Furthermore, the constants also correlate with the R2 improvement brought about by the DoG and CSS models over the Gauss model, respectively (Fig. 3 ). Hence, variations of activation and normalization constants in the DN model recapitulate the contributions of center-surround configurations and CSS captured separately by the CSS and DoG models. We also note that variations in baseline constants do not appear to be related to variations in pRF sizes and amplitudes (). In sum, simulations, empirically observed systematic variations, and correlations with independent properties of existing models provide converging evidence that the DN model captures suppression and compression through local variations in its activation and normalization constants.

Size Ratio of Normalization and Activation pRFs Controls Spatial Oversaturation

In the previous sections, we have provided evidence that the DN model captures suppressive and compressive response signatures through local variations in activation and normalization constants. In addition, the DN model also allows us to probe the visuospatial information-integration properties of neuronal populations at different stages of the visual hierarchy. To do this, we simulate and empirically estimate stimulus size–response curves and attempt to identify relevant DN model parameters controlling different properties of the curves. In particular, we show that the ratio of normalization and activation pRF sizes (σ2 and σ1) is directly responsible for the presence (or absence) of spatial oversaturation. Single-cell and imaging recordings in early visual cortex have investigated responses to stimuli of increasing size, presented at the center of the receptive field (34, 37–39). Responses generally initially increase with increasing stimulus size, reach a maximum, and finally decrease (or plateau). Hence, in size–response curves, there are two potential nonlinearities to take into account: a first nonlinearity describing the initial rise of the curve and a second nonlinearity in the decrease or plateau of the response after the peak has been reached. The first nonlinearity reflects nonlinear spatial summation; the second nonlinearity reflects “spatial oversaturation.” Linear models, such as Gauss and DoG, cannot capture the first nonlinearity, since they lack the flexibility to nonlinearly sum inputs (33). Conversely, models without a surround, such as CSS and Gauss, are unable to capture the second nonlinearity: Their responses increase monotonically with stimulus size (34). Unlike other models, the DN model possesses both of these characteristics. Hence, we reasoned that it should be the most suitable model for capturing both nonlinearities in stimulus size–response curves. Confirming its role of compression modulator observed in the DN response function (Fig. 3), we found that the normalization constant d controls the first nonlinearity in theoretical size–response curves (Fig. 4 ). Concerning the second nonlinearity, we found the ratio of normalization and activation pRF sizes to be the crucial parameter. implies no spatial oversaturation: After reaching the maximal possible value, response remains constant despite increasing stimulus sizes; implies that responses will oversaturate, with larger values of generating stronger spatial oversaturation (Fig. 4). Hence, we found that the combined variation of d and allows the DN model to describe both nonlinearities in size–response curves (Fig. 4).

Fig. 4.

The size ratio of activation and normalization pRF sizes provides an index of spatial oversaturation. (A) Simulations indicate that the normalization constant d (inversely) controls the strength of compression, i.e., the first nonlinearity, in size–response curves. This effects cannot be captured by Gauss and DoG models, since they do not include nonlinear spatial summation mechanisms. (B) The size ratio of normalization and activation pRFs, , controls the strength of spatial oversaturation, i.e., the “second nonlinearity.” Higher values of generate stronger oversaturation. This effect cannot be captured by CSS and Gauss models, since they do not include surround mechanisms. (C) The combined effect of d and produces a range of size–response curves. (D) The size ratio decreases systematically along the visual hierarchy, suggesting larger size differences between activation and normalization pRFs and, hence, stronger oversaturation in early visual cortex. (E) DN model size–response curves for each of the visual regions under examination, showing a gradual, systematic variation across the visual hierarchy. This is consistent with the hypothesized roles and observed variations of DN model parameters. Only the DN model can capture the different properties of observed size–response curves, indicative of different information-integration strategies, with a single computation. Bar heights in D show the R2-weighted mean across participants, per visual region; data points show the R2-weighted mean for each participant’s visual region; error bars are SEM, corrected for volume-to-surface upsampling. Size–response curves in E are based the R2-weighted mean of DN model parameters, per visual region, across participants; shaded regions represent SEM across participants. Verifying these theoretical results empirically, we found that the size ratio systematically decreases along the visual hierarchy, indicating stronger spatial oversaturation in early visual cortex (Fig. 4 and ). Note that we did not constrain the size of the normalization pRF to be larger than the activation pRF; in principle, it could have been of equal size or smaller. Hence, the consistently larger size of the normalization pRF is an empirical finding, not an assumption built in the model. The variation in size ratio suggests that early visual cortex populations have a more marked size difference between activation and normalization pRFs, leading to oversaturating responses to larger stimuli, whereas in late visual cortex populations, the tuning of activation and normalization pRFs is relatively similar, producing a more invariant response to stimuli beyond the optimal size. We then computed size–response curves for each of the nine visual regions under investigation, based on the mean parameters in each region. We found that the shape of the curves varies systematically and robustly across the visual hierarchy, consistent with the hypothesized roles of d and (Fig. 4 and ). Importantly, size–response curves take into account all the model parameters simultaneously: pRF sizes, amplitudes, and baselines. If the variations observed in DN model parameters were due to some spurious collective rescaling, rather than to meaningful changes in responses, we would expect the size–response curves of different visual regions to collapse on each other. However, we did not find this to be the case. Hence, the analysis of size–response curves confirms that the DN model parameters capture specific, systematic variations in visuospatial response signatures along the visual hierarchy, with the size ratio being the key modulator of spatial oversaturation. In sum, we show that the DN model is the most suitable to capture the full range of both nonlinearities (nonlinear spatial summation and spatial oversaturation) present in size–response curves. We found that the model is able to capture these nonlinearities through variations in the normalization constant (d) and the size ratio of normalization and activation pRFs (), respectively. In particular, this identifies the size ratio, , as the parameter of interest for the investigation of visuospatial information integration in neuronal populations.

Discussion

Here, we introduce a pRF model based on DN. The model provides a single coherent explanation of a highly diverse set of response signatures, which previously required a similarly diverse multitude of pRF models (27, 33, 34). The DN model supersedes these existing models by unifying them in a biologically realistic and interpretable framework, in which modulatory and spatial integration processes capture separate aspects of visuospatial responses. Visuospatial responses vary systematically across the visual hierarchy. Specifically, at the population level, and in line with previous observations, suppression is more prominent in early, rather than late, visual cortex (33), while compression is more prominent in late, rather than early, visual cortex (34). The DN model captures these patterns across the visual hierarchy through variations in its activation and normalization constants b and d (Fig. 3). Importantly, the joint variation of these parameters allows DN to explain time courses that simultaneously exhibit both compression and suppression. The models geared toward explaining either compression or suppression are incapable of this. Thus, the activation and normalization constants provide a general mechanism for modulation of the DN computation, allowing it to capture a variety of visuospatial response signatures. Previous studies employing electrophysiological recordings in humans and animals have shown specific neural correlates of pRF suppression and compression (30, 40, 41). Hence, it seems to us at least plausible that the modulatory action of activation and normalization constants could reflect not only a computational, but also a biological, property of brain responses. Application of the DN model to these recordings could further clarify the biological interpretation of the model parameters. We speculate that modulatory constants b and d may be related to the baseline firing rates of activation and normalization pools and, hence, may serve as a proxy of the excitation–inhibition balance, which is known to be important for visuospatial computations (42–44). In line with this speculation, recent work has shown that several modulatory neural properties, such as receptor density profiles, specificity of responses, and excitability, all covary along a principal gradient from early sensory to late sensory to associative cortex (45), as does resting-state functional connectivity (46). The variations and the hypothesized computational roles of the modulatory constants b and d along the visual hierarchy are highly reminiscent of these gradients. Thus, the DN pRF model may offer a bridge between computation and biology. We show that the DN model is uniquely able to describe stimulus size–response curves with a wide range of properties (Fig. 4). In particular, the size ratio of normalization and activation pRFs () controls spatial oversaturation, the second nonlinearity in size–response curves. As describes the spatial extent of the overlap and, hence, the amount of interaction between activation and normalization components, it provides a proxy for visuospatial integration properties of neuronal populations. We speculate that variations in size ratio across the visual hierarchy may reflect a systematic variation in the extent of feedback and lateral connectivities, which are thought to underlie cortical mechanisms of spatial oversaturation (37, 47–50). A variety of different DN formulations have been proposed in the literature. For example, some models include distinct exponentiations of activation and normalization responses or an additional activation term in the denominator (7). In choosing a specific formulation for the DN pRF model, largely following ref. 1, our goal was to maximize parsimony, while still allowing the model sufficient flexibility to capture the disparate response signatures observed throughout the visual hierarchy. In our view, a formulation consisting of a Gaussian pRF in the numerator and one in the denominator, with respective sizes, amplitudes, and baselines, was the most intuitive DN formulation to achieve this goal. Within this formulation, multiple lines of evidence suggest that spatial suppression, compression, and oversaturation may be conceptualized as modulations of an underlying DN computation exerted by locally varying baseline constants and size ratio. We believe that our formulation has significant merits in terms of parsimony, flexibility, and interpretability. However, we would not claim that this is the only possible way to conceptualize these phenomena, that ours is the only potentially suitable formulation of a DN pRF model, or that it is necessarily the most parsimonious possible formulation. It is possible that, with additional or fewer computations and/or model parameters, phenomena may be conceptualized in different ways, leading to different interpretations. In particular, exponentiation may be a useful computational component to build back in the model if attempting to capture richer datasets. With regard to the ongoing debate concerning the isotropy of pRF shapes (51–56), we note that the shape of the activation and normalization pRFs could be adjusted. Although we support the evidence that visual pRFs are nearly Gaussian, the model could be easily extended to incorporate spatially skewed pRFs. Modeling brain responses to naturalistic stimuli is a crucial goal of computational neuroscience (57) and is likely to involve DN computations (58). Some authors have already proposed models that effectively capture BOLD responses to naturalistic visual stimuli (58, 59). These models are more complex than purely spatial pRF models, because they incorporate additional dimensions of feature tuning, such as spatial frequency and orientation. Nevertheless, the topographic representation of visual space has been shown to structure activations throughout the visual system (60, 61), and there are significant benefits to employing artificial stimuli varying only in some specific dimensions (62). In such a simplified context, it becomes significantly easier to interpret results and to more explicitly probe which computations are crucial for the neural representation. In other words, we argue that our singular focus on the visuospatial dimension allows DN’s canonical role to unequivocally manifest. Beyond the visuospatial application presented here, the model may also be applied in other sensory and cognitive domains. For instance, in numerosity and audition, pRFs have been successfully characterized with log-Gaussian shapes (40, 63, 64). Based on these findings, it seems plausible that DN pRF models in these domains would also be better characterized by log-Gaussian, rather than Gaussian, activation and normalization pRFs. Further domain-specific considerations may also be necessary: The normalization pRF may not simply be a tuning curve with a different size and the same position with respect to the activation pRF; it could, for example, have a different shape, be centered at a different position, or even exist in a different stimulus dimension. Our findings reinforce the notion of DN as a canonical neural computation (1, 13). We provide evidence that DN represents a canonical information-processing algorithm in neuronal populations throughout the entire human visual hierarchy, far beyond single neurons in V1. The applicability of the DN pRF model is not limited to sensory processing, but may also be extended toward cognitive and clinical directions (28–30, 65). For example, DN has been employed to describe effects of attention on single neurons (10). Electrophysiological studies (66, 67) and fMRI retinotopic mapping (68, 69) have shown that attention also modulates responses at the level of neuronal populations. These modulations have been modeled in the pRF framework (70–74). Hence, the DN pRF model may be instrumental in the effort to investigate attentional modulations of population responses. We remark that because of the complexity of fMRI analysis, careful evaluations, simulations, and validations of past and future studies are required to ensure that conclusions are not contaminated by statistical artifacts (75–77). As a clinical example, Northoff and Mushiake (24) have recently proposed that certain psychiatric disorders, i.e., schizophrenia and major depressive disorder, may be interpreted as dysfunctions of DN. In line with this proposal, pRFs have also shown to be systematically altered in autism and schizophrenia (28, 78, 79). Investigating alterations of DN pRF model parameters in clinical conditions may then shed new light on the underlying computational mechanisms and their disorders. Thus, beyond sensory processing, the canonical nature of the DN pRF model appears to be a promising avenue for cognitive and clinical neuroscience. In sum, in this work, we present a pRF model based on DN. The combination of pRFs, DN, and fMRI uniquely allows us to probe both modulatory aspects and visuospatial information integration in neuronal population responses. We show that the DN model outperforms and supersedes existing models by providing a unified, interpretable, and biologically plausible explanation of phenomena that were previously thought to be unrelated. DN is, hence, revealed to underlie visuospatial neuronal population responses as their properties vary over large scales. Our findings consolidate the notion of DN as a canonical neural computation throughout the human brain.

Materials and Methods

Stimulus Description

The visual stimuli were generated by using the Python PsychoPy package (80). Stimuli were displayed on an MRI-compatible screen located outside the bore (Cambridge Research Systems 32-inch LCD widescreen, 1,920 × 1,080 resolution, 120-Hz refresh rate). The participants viewed the screen by mirrors placed on top of the scanner head coil. We used a bar-shaped stimulus with a checkerboard pattern (100% Michelson contrast) vignetted by a circular aperture (27). The stimulus aperture diameter subtended ∼10∘ of visual angle. The checkerboard pattern inside the bar moved parallel to the bar orientation, and the bar itself stepped in the perpendicular direction at every repetition time (TR). Four bar orientations (0∘, 45∘, 90∘, and 135∘) and two different motion directions were used, giving a total of eight different bar configurations in each condition. We varied the width and speed of the bar stimulus, obtaining five different experimental conditions (Fig. 5). The width of the bar subtended 2.5∘, 1.25∘, or 0.625∘, depending on the condition (four units, two units, or one unit); the bar swept across the stimulus aperture in TR-locked steps of 0.3125∘, 0.625∘, or 1.25∘, depending on the condition (Slow, Regular, or Fast). A period of 15 s of mean luminance (0% contrast) was presented every two bar passes. Additionally, mean luminance periods of 15 s and 30 s were presented at the beginning and end of each scan, giving a total of five blocks of mean luminance during each scan, presented at evenly spaced intervals. To ensure that participants fixated at the center of the screen, a small fixation dot (0.1∘) was presented in the middle of the stimulus. This fixation dot changed color (red to green) at semirandom time intervals, and participants were instructed to report this change of color. Responses were recorded with an air-pressure button and monitored in real time to guarantee that participants maintained responsiveness throughout each scan and served as a proxy for fixation.

Fig. 5.

PRF mapping stimuli in different experimental conditions. Schematic representations of the five experimental conditions are shown. High-contrast checkerboard bars of varying speed and size sweep the screen in TR-locked steps (here, 1 TR = 1.5 s).

pRF Models

We extended the method first described for the original Gaussian pRF model in ref. 27. Here, we fit and compared three existing pRF models—Gauss, DoG (33), and CSS (34)—and a pRF model based on DN. Model predictions were generated according to the respective formulation and parameters, convolved with a standard hemodynamic response function. For the Gauss, CSS, and DoG models, the predicted neural response to an arbitrary time-dependent, two-dimensional, contrast-defined visual stimulus at time t is: In the DN model, both the activation and normalization pRFs are isotropic, two-dimensional Gaussian functions of space, G1 and G2, centered on the same position (x0, y0) in visual space (x, y), but with different sizes, σ1 and σ2, and different amplitudes, a and c, respectively. The inclusion of variable amplitudes for both activation and normalization pRFs ensures that a larger pRF will not necessarily generate a larger response to a stimulus of the same relative size and accounts for different choices of integral or of pRF volume normalization, without loss of generality. Activation and normalization components also have “baseline” constants—that is, an activation constant b and a normalization constant d, respectively. Hence, the model’s predicted neural response is: In the above equations, the spatial dependence of pRFs and the spatiotemporal dependence of stimuli are omitted for brevity, and we denote:with representing element-wise product.

Fitting

PRF modeling was carried out with a dedicated Python package, prfpy. At every cortical location, the parameters of each model were optimized by minimizing the residual sum of squares between the model prediction and the measured BOLD signal. Similarly to ref. 33, we estimated the baseline of the BOLD signal at each cortical location using the mean-luminance blocks. During model fitting, the BOLD baseline was set to the value estimated from data for all models. Fitting began with a grid search for the best Gauss model parameters (pRF position and size) and an iterative optimization. The optimal position and size parameters of the Gauss models were used as starting values for iterative optimizations of all other models. For the DN model, we also carried out an intermediate grid search for its additional parameters, before the same iterative optimization routine used for the CSS and DoG models. This is because the parameter space of the DN model is of higher dimensionality and far less smooth than that of other models. This additional grid stage does not confer benefits when applied to the simpler models. The subtraction of the constant ratio b/d in Eq. was applied so that the DN model predicted zero response in the absence of a stimulus, since the BOLD baseline value does not carry any intrinsic physical interpretation. For non-fMRI data, where the neural baseline activity may be directly measured and contain useful information, the subtraction could be removed. In that case, the baseline firing rate (model prediction in the absence of a stimulus) would be equal to the ratio of activation and normalization constants (b/d).

Performance

We compared model performance in terms of cross-validated variance explained (cvR2) on a 50–50 even–odd-scans data split for the participants where the higher number of scans provided higher signal reliability (two participants, six repeated scans per condition). Statistical significance of performance differences was evaluated with a permutation test, corrected for the volume-to-surface upsampling of time courses. For each visual region, we selected 10,000 randomized subsamples of vertices, with the same proportion as the volume-to-surface upsampling. For each subsample, the mean cvR2 difference between pairs of models was compared with the mean of a null distribution obtained by randomizing the model label of vertices in the subsample. A statistically significant difference in the direction of either model in the pair was confirmed if the proportion of times the mean of the subsample was greater or smaller, respectively, than the mean of the randomized null-distribution was smaller than across all subsamples. Evaluating model statistical significance with other common nonparametric tests (Kolmogorov–Smirnov statistics or Wilcoxon signed-rank test) did not result in any major differences in the statistical significance.

Parameter Analysis

Bar heights in parameter plots represent the R2-weighted mean of parameters in each visual region across participants; data points represent the R2-weighted mean of parameters in each participant’s visual region; error bars on data points represent SEM within each participant’s visual region, corrected for volume-to-surface upsampling. Data points in scatter plots represent the R2-weighted mean of data, split in eight bins containing approximately equal numbers of vertices; error bars represent SEM within each bin, corrected for volume-to-surface upsampling. Solid lines show best-fit linear regression; shaded areas represent CIs on the regression line obtained by bootstrapping. Only vertices where the model had and eccentricity within 0∘and 4.5∘ are used. Data from all five experimental conditions are included. Cortical surface visualizations were obtained with the dedicated software package pycortex (81).

PRF Profiles and Response Function

We simulated the DN pRF profile P(x) (Fig. 3), a one-dimensional slice of the pRF shape in visual space, and the response function r(s) (Fig. 3), the model response to the fraction s of activation and normalization pRFs being stimulated:where G1 and G2 are Gaussians. Note that both the DN pRF profile P(x) and the response function r(s) are simplifications of the full model behavior. In particular, P(x) does not take into account the nonlinear behavior of the model; r(s) does not take into account the potentially different sizes of activation and normalization pRFs.

Size–Response Curves

Size–response curves (Fig. 4) were computed as the model’s predicted response to a circular stimulus presented at the center of the pRF. Size–response curves take into account all model parameters: amplitudes, sizes, and baseline constants. Hence, they account for both the potentially different sizes and amplitudes of activation and normalization pRFs, as well as nonlinearities in the response. The size–response curve of each visual region is based on the R2-weighted mean of DN model parameters; shaded regions represent SEM across participants. Size–response curves for individual participants are shown in . Only vertices where the model had and eccentricity within 0∘ and 4.5∘ were used. Data from all five experimental conditions are included.

Participants

Seven participants (ages 25 to 41 y, two female) participated in this study. All participants had normal or corrected-to-normal visual acuity. All studies were performed with the informed written consent of the participants and were approved by the Human Ethics Committee of Vrije Universiteit Amsterdam.

Anatomical Scans

T1-weighted and T2-weighted structural MRI data were acquired by using a Philips Achieva 7-T scanner with a 32-channel Nova Medical head coil, at a resolution of 0.7-mm isotropic. Freesurfer 7.1 recon-all was used to obtain native cortical surface reconstructions for each participant. The software makes use of the T2w image to refine the segmentation obtained by T1w alone, particularly in the exclusion of sinus and at the pial surface border.

Functional Scans

fMRI data were acquired by using a Philips Achieva 7-T scanner with an 32-channel Nova Medical head coil. The participants were scanned with a two-dimensional EPI sequence with 60 slices. The following parameters were used: TR = 1,500 ms, echo time (TE) = 23 ms, and flip angle = 65∘. The functional resolution was 1.7 mm isotropic, with a field of view of 216 × 216 mm. The first 10 s of recorded data at the beginning of each scan were automatically discarded to avoid start-up magnetization transients. The scan durations were 570 s, 330 s, or 210 s, depending on the condition (Slow, Regular, or Fast). For five participants, two scans per condition were collected, over separate sessions, giving a total of 10 scans per participant. For 2 participants, 6 scans were collected for each of the 5 conditions, during multiple sessions, giving a total of 30 scans per participant. Repeated scans of the same experimental condition were averaged to obtain a high signal-to-noise ratio. Foam padding, or custom head-casts in the case of subject (sub)-006 and sub-007, were used to minimize head movement. At the end of each experimental scan, a top-up scan with opposing phase-encoding directions was recorded, in order to perform susceptibility distortion correction.

Preprocessing

“For each of the BOLD scans found per participant, the following preprocessing was performed. First, a reference volume and its skull-stripped version were generated by using a custom methodology of fMRIPrep. Head-motion parameters with respect to the BOLD reference (transformation matrices and six corresponding rotation and translation parameters) were estimated before any spatiotemporal filtering by using mcflirt (ref. 82; FSL 5.0.9). BOLD runs were slice-time-corrected by using 3dTshift from AFNI 20160207 (ref. 83; Research Resource Identifier [RRID]: SCR_005927). A B0-nonuniformity map (or fieldmap) was estimated based on two (or more) echo-planar imaging (EPI) references with opposing phase-encoding directions, with 3dQwarp from AFNI 20160207 (ref. 83; RRID:SCR_005927). Based on the estimated susceptibility distortion, a corrected EPI reference was calculated for a more accurate coregistration with the anatomical reference. The BOLD reference was then coregistered to the T1w reference by using bbregister (FreeSurfer), which implemented boundary-based registration (84). Coregistration was configured with six degrees of freedom. The BOLD time series were resampled onto the following surfaces (FreeSurfer reconstruction nomenclature): fsnative. The BOLD time series (including slice-timing correction when applied) were resampled onto their original, native space by applying a single, composite transform to correct for head-motion and susceptibility distortions. All resamplings can be performed with a single interpolation step by composing all the pertinent transformations (i.e., head-motion transform matrices, susceptibility distortion correction when available, and coregistrations to anatomical and output spaces). Surface resamplings were performed by using mri_vol2surf (FreeSurfer). Many internal operations of fMRIPrep use Nilearn 0.6.2 (ref. 85; RRID:SCR_001362), mostly within the functional processing workflow. For more details of the pipeline, see the section corresponding to workflows in fMRIPrep’s documentation (https://fmriprep.org/en/latest/workflows.html).” The preprocessed BOLD time courses in fsnative space output by fMRIPrep were converted to percent signal change. Part of the above text was automatically generated by fMRIPrep with the express intention that users should copy and paste this text into their manuscripts unchanged. It is released under the CC0 license.

Visual Regions

Visual regions (V1, V2, V3, V3AB, hV4, LO, TO, VO, and IPS) were defined for each participant on the basis of individual polar angle and eccentricity maps, following refs. 35 and 36.

79 in total

Divisive normalization unifies disparate response signatures throughout the human visual hierarchy.

Results

DN Model Outperforms Existing Models by Capturing Surround Suppression, Nonlinear Compression, and their Combination

DN Model Constants Modulate Suppression and Compression

Size Ratio of Normalization and Activation pRFs Controls Spatial Oversaturation

Discussion

Materials and Methods

Stimulus Description

pRF Models

Fitting

Performance

Parameter Analysis

PRF Profiles and Response Function

Size–Response Curves

Participants

Anatomical Scans

Functional Scans

Preprocessing

Visual Regions

1. Modeling center-surround configurations in population receptive fields using fMRI.

2. The retinotopy of visual spatial attention.

3. A computational perspective on autism.

4. Population Receptive Field Shapes in Early Visual Cortex Are Nearly Circular.

5. Compressive spatial summation in human visual cortex.

6. Computing with a canonical neural circuits model with pool normalization and modulating feedback.

Review 7. Ultra-high field MRI: Advancing systems neuroscience towards mesoscopic human brain function.

8. A Neural Signature of Divisive Normalization at the Level of Multisensory Integration in Primate Cortex.

Review 9. Circuits and Mechanisms for Surround Modulation in Visual Cortex.

10. Top-down feedback controls spatial summation and response amplitude in primate visual cortex.

1. Normalizing population receptive fields.

2. Divisive normalization is an efficient code for multivariate Pareto-distributed environments.