
A spatiotemporal mechanism of visual attention: Superdiffusive motion and theta oscillations of neural population activity patterns.

Guozhang Chen1,2,3, Pulin Gong1,2.   

Abstract

Recent evidence has demonstrated that during visual spatial attention sampling, neural activity and behavioral performance exhibit large fluctuations. To understand the origin of these fluctuations and their functional role, here, we introduce a mechanism based on the dynamical activity pattern (attention spotlight) emerging from neural circuit models in the transition regime between different dynamical states. This attention activity pattern with rich spatiotemporal dynamics flexibly samples from different stimulus locations, explaining many key aspects of temporal fluctuations such as variable theta oscillations of visual spatial attention. Moreover, the mechanism expands our understanding of how visual attention exploits spatially complex fluctuations characterized by superdiffusive motion in space and makes experimentally testable predictions. We further illustrate that attention sampling based on such spatiotemporal fluctuations provides profound functional advantages such as adaptive switching between exploitation and exploration activities and is particularly efficient at sampling natural scenes with multiple salient objects.


Year:  2022        PMID: 35452293      PMCID: PMC9032965          DOI: 10.1126/sciadv.abl4995

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.957


INTRODUCTION

Attention is one of the most central cognitive functions, governing the efficient selection and gating of sensory events and actions (, ). Understanding the neural mechanisms underlying attentional selection is a long-standing key question in systems and computational neuroscience (–). Conventional models of attention suggest that sustained neural firing constitutes a neural correlate of sustained attention (). However, recent studies have increasingly demonstrated that the behavioral performance of sampling multiple spatial locations (i.e., visual spatial attention), features, and objects exhibits temporal fluctuations with a theta rhythm (3 to 10 Hz) (–). Neurophysiological recordings have further revealed that these behavioral fluctuations originate from endogenous fluctuations of neural population activity with theta oscillations (, –) and that these theta oscillations are nested with gamma bursts to implement effective attentional sampling (). These fluctuating temporal dynamics of behavioral and neural activities have been found in both top-down, goal-driven (, ) and bottom-up, stimulus-driven () attention tasks, suggesting a general neural mechanism underlying attentional sampling, independent of particular task requirements. The prevalent modeling framework for understanding the neural mechanism of attentional sampling is based on winner-take-all (WTA) neural networks (, ). In these network models of bottom-up attention, saliency maps are sequentially sampled by the focus of attention through the interplay between the mechanisms of WTA and inhibition of return. The WTA dynamics detect the location of the highest saliency at any given time, and inhibition of return suppresses the currently attended location; this interplay thus allows the focus of attention to dynamically and sequentially sample the saliency map ().
Virtually all previous models of bottom-up attention, including local circuits (–) and large-scale models with a feedforward setup (, ), are based on the WTA paradigm. These circuit models generate sequential sampling behavior but are unable to explain the emergent large fluctuations of attention, including theta rhythmic activity. In addition, experimental evidence supporting the presence of inhibition of return at both the behavioral and neural levels remains controversial (, ). On the other hand, neural circuit models for explaining the neural effects of top-down attention often treat attention as a static inhibitory () or excitatory input (, ) to local circuits, thereby ignoring its dynamical fluctuations entirely. Therefore, despite extensive investigation, the fundamental questions of the neural circuit mechanism underlying attention fluctuations and their functional role remain unclear. Here, we propose a new mechanism for implementing flexible visual spatial attention sampling based on rich, complex spatiotemporal dynamics of neural population activity. This spatiotemporal mechanism enables the attention activity pattern to behave as a dynamical spotlight switching between objects located at different spatial locations. Crucially, these attention sampling processes occur in a fundamentally automatic and flexible manner (Fig. 1A) without imposing additional mechanisms such as inhibition of return as in the classical framework of WTA (, ). We illustrate our spatiotemporal mechanism by using a biophysically realistic, spatially extended neural circuit with neural adaptation that implements stimulus-driven, bottom-up attention.
The localized attention activity pattern with fluctuating spatiotemporal dynamics emerges from the neural circuit model working in the transition regime between different cortical states (i.e., asynchronous and propagating wave states); the attention pattern provides a mechanistic account of a range of neural and behavioral properties of visual attention, including theta rhythmic activity accompanied by an arrhythmic 1/f component and the coupling between theta and gamma activities.
Fig. 1.

Complex spatiotemporal dynamics of the attention activity pattern underlying flexible spatial attention sampling.

(A) Schematic diagram illustrating the network structure and our proposed attention mechanism. The spatially extended, spiking neural circuit consists of excitatory (blue triangles) and inhibitory (red circles) neurons. The activity pattern (black circles) emerging from the circuit sequentially samples different locations of a natural scene in a fundamentally autonomous and flexible manner, with the focus points of attention denoted by dashed circles. The spatiotemporal dynamics of this activity pattern provide a unified account of key neurophysiological and behavioral features of spatial attention sampling, including theta (θ) oscillations that are nested with gamma (γ) bursts. (B) Schematic diagram showing the trajectory of superdiffusive motion, which consists of small-movement steps occasionally interspersed by long jumps.

Going beyond these key temporal properties, our model reveals a previously unrecognized spatial property of attention sampling; that is, the movement of the attention pattern exhibits large spatial fluctuations with a cluster of short step sizes occasionally interspersed with longer movements or jumps and can thus be characterized as superdiffusive Lévy motion (Fig. 1B), a type of motion that is associated with behaviors such as efficient search patterns of animals foraging for food (, ). The superdiffusive property of the attention activity pattern along with its temporal fluctuation property (i.e., theta rhythmic activity) provides crucial functional advantages: They enable the attentional activity pattern not only to efficiently sample known stimulus-relevant locations but also to occasionally explore locations of unknown relevance. This thus provides a dynamical mechanism for flexibly switching between exploitation and exploration, a hallmark feature of flexible attention sampling ().
Flexible switching between exploitation and exploration is also a key property of other cognitive functions such as decision-making (, ), suggesting that the mechanism could be of general applicability to understanding exploitation and exploration in brain functions ranging from attention to decision-making. We further demonstrate that flexible switching between exploitation and exploration in our attention mechanism is particularly efficient at sampling complex visual environments such as natural scenes and can reproduce psychophysical observations of attention maps () with better performance than the classical WTA model. The computational advantage for sampling natural scenes is further elucidated by developing a mathematical model that captures the key dynamical properties (i.e., superdiffusive motion and oscillations) of the attention activity pattern. These results thus offer a novel perspective to understand the essential spatiotemporal organization properties of bottom-up attention sampling and its underlying dynamical circuit mechanism.

RESULTS

Theta oscillatory components and their dynamical properties

We consider a biophysically plausible spiking circuit model comprising inhibitory and excitatory neurons (). This circuit model incorporates well-known experimental observations, including distance-dependent synaptic connectivity () and balanced excitatory and inhibitory synaptic inputs (Materials and Methods) (). The spiking circuit exhibits a rich repertoire of dynamical activity states (), ranging from the asynchronous to the propagating wave states. Most of the model parameters such as the reversal potentials and synaptic time constants are constrained by experimental measurements; a free parameter is the synaptic inhibition-to-excitation (I-E) ratio ζ (Materials and Methods), which can be adjusted to study its influence on circuit activity. For different values of ζ, we identify the circuit activity states based on the mean population firing rate and spatiotemporal activity dynamics. When the I-E ratio ζ is decreased, the firing rate gradually increases; after ζc = 3.31, it increases more rapidly (Fig. 2A). The population firing rate can be well fitted by two power functions, one for ζ < ζc [adjusted coefficient of determination (R2) = 0.998] and another for ζ > ζc, with a much larger exponent than that for ζ < ζc (R2 = 0.988). When ζ < ζc, coherent activity in the form of a localized propagating wave emerges; this wave pattern propagates across the neural circuit with a relatively smooth and regular trajectory (state I; fig. S1A and movie S1). However, when ζ > ζc, the circuit exhibits an asynchronous state without any structured patterns in spiking activity (state III; fig. S1B and movie S2). When the circuit is near the transition state (ζ ≈ ζc) between the asynchronous (i.e., disordered) and the localized wave (i.e., ordered) states (state II), a localized activity pattern emerges and exhibits complex spatiotemporal dynamics. This localized pattern hovers around one location and then switches to another location in an intermittent manner (Fig. 
2B and movie S3). In this study, we elucidate that this localized activity pattern, with its intermittent switching dynamics, can function as the neural basis of an attention spotlight, adaptively sampling the different spatial locations where stimulus objects are input to the circuit. Because of this functional role in spatial attention sampling, we refer to this localized activity pattern as the attention activity pattern, focusing on stimulus-driven, bottom-up attention as in the classical WTA model (), unless otherwise stated.
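The two-branch power-law fit used to identify the state transition (Fig. 2A) can be sketched as follows. This is a minimal sketch, not the authors' code; the function names, initial guesses, and solver settings are my own assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(zeta, a, b, c):
    """Power function a * zeta**b + c, the form fitted to firing rate vs. I-E ratio."""
    return a * zeta**b + c

def fit_state_branches(zeta, rate, zeta_c=3.31, p0=(5.0, -2.0, 0.0)):
    """Fit separate power laws below and above the transition point zeta_c.

    Returns the (a, b, c) coefficients of each branch; a marked change in the
    exponent b across zeta_c signals the transition between activity states.
    """
    below = zeta < zeta_c
    p_lo, _ = curve_fit(power_law, zeta[below], rate[below], p0=p0, maxfev=20000)
    p_hi, _ = curve_fit(power_law, zeta[~below], rate[~below], p0=p0, maxfev=20000)
    return p_lo, p_hi
```

In practice, one would compare the adjusted R² of the two branches against a single-branch fit to confirm the transition, as reported in the text.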
Fig. 2.

Distinct activity states emerging from the spiking neural circuit.

(A) Mean firing rate of the excitatory population indicates a transition of activity states around the I-E ratio ζ = ζc = 3.31. The red and blue solid lines are the two power functions fitted to the data points marked by red squares and blue circles, in the form of a1x^b1 + c1 and a2x^b2 + c2, respectively. The fitted coefficients are a1 = 15.78 ± 0.80 (95% confidence bounds), b1 = −4.604 ± 0.12, c1 = −11.23 ± 0.98, a2 = 3.3 ± 0.09, b2 = −15.79 ± 1, and c2 = 1.43 ± 0.05. The error bars show 1 SD calculated over 15 trials. Two dashed lines separate states I to III. State I exhibits a localized propagating wave pattern, state III corresponds to the asynchronous state, and state II is the transition state. (B) Black dots denote spontaneous spikes in the past 10-ms period, showing a spatially localized activity pattern. The red circle represents 1 SD of the fitted two-dimensional (2D) Gaussian firing rate profile. The line with color gradient shows the trajectory of the localized pattern in the previous 200 ms, which consists of the pattern center estimated by the peak of the fitted Gaussian profile (red triangle).

We first illustrate that the rich spatiotemporal dynamics of the attention activity pattern provide a mechanistic account of key neural and behavioral features found during a distributed spatial attention task in which two simultaneously presented objects are monitored to detect an unpredictable change in one of the objects (–). To model this scenario, we use two external inputs at two different locations of the spiking circuit model (Materials and Methods). Rather than maintaining persistent or sustained activity, the circuit appears to “monitor” these two locations sequentially with the localized activity pattern switching between them in alternation (Fig. 3A and fig. S2A); this sequential switching behavior is the hallmark of spatial attention sampling (–).
Fig. 3.

Dynamics of the attention activity pattern account for neural temporal fluctuations.

(A) Snapshots of firing rates during a 10-ms period at different time points. Red circles denote 1 SD of the Gaussian profiles of the two input objects. One object is in the center, and the other is in one of the corners. (B) Spiking raster showing the spike times of a subpopulation of 100 excitatory neurons at the location where one of the objects is input to the circuit. (C) Normalized time-frequency diagram of local field potentials (LFPs) recorded at the center position. Black, white solid, and white dashed lines denote the time courses of the raw LFP, the 30- to 100-Hz band-pass LFP (gamma band), and the 3- to 10-Hz band-pass LFP (theta band), respectively. The amplitudes of these LFPs are normalized. (D) Averaged power spectrum of the trajectories of the attention activity pattern on the X coordinate (the Y coordinate is similar). An approximately straight line in this log-log plot indicates that the power spectrum follows Power ∝ 1/f^β, with β = 1.4. The shaded area represents the SEM across 500 trials. (E) Same as in (D) but for the multiunit activity (MUA), with the exponent β = 2.5. (F) Distribution of the number of times the attention pattern visits the center area per second. (G) Characteristics of the gamma amplitude modulation by theta phase. The color represents the modulation index of phase-amplitude coupling (PAC). Freq., frequency; Amp., amplitude. a.u., arbitrary units.

We now demonstrate that the dynamics of the attention activity pattern underlie a variety of neurophysiological properties of attentional sampling, such as theta rhythmic oscillations and theta-gamma cross-frequency coupling of neural activity (, ). As shown in Fig. 3A, one object is in the center and another one is in one of the corners; because of the periodic boundary conditions of the two-dimensional (2D) circuit model, the latter appears to occupy one-quarter of each corner.
The dynamical switching of the activity pattern from one location to another causes neurons to alternate between phases of vigorous (on) and faint (off) spiking; that is, these neurons fire in bursts. Fig. 3B and fig. S2 (B and C) show this burst-like spiking activity of neurons in each of the two object locations, respectively. The firing rates when one of the locations is sampled and is not sampled are 67.57 ± 0.94 and 6.62 ± 0.35 spikes/s (means ± SEM), respectively. To further quantify the firing rate difference between the on and off activities and directly compare with experimental data, we calculate the on-off firing rate modulation index μ = (ron − roff)/(ron + roff) as in (); in our model, μ = 0.83 ± 0.01, which is comparable with the value (∼0.5) reported in (). Notably, on average, the pattern visits each of these locations about four times per second; that is, the switching behavior of the pattern exhibits a theta rhythmic element. At each time point, we calculate the center position of the attention pattern by fitting a 2D Gaussian function to the localized firing rate profile of the excitatory neurons (Eq. 17 and Materials and Methods); this center is defined as the focus of attention. The center positions at sequential time points provide the movement trajectory of the attention pattern. The power spectrum of these trajectories, shown in Fig. 3D, contains a theta peak at ∼4.08 Hz, followed by a 1/f decay associated with the arrhythmic component of the activity. We further apply the irregular resampling method developed in () to separate arrhythmic 1/f activity from the oscillatory activity and find that the theta rhythm spectrum exceeds the 1/f spectrum by more than 2 SDs (P = 3.64 × 10−4, upper-tail t test). The activity pattern switching in the theta frequency band naturally gives rise to theta rhythmic oscillations of the population activity of neurons, as recorded in multiunit activity (MUA; the sum of spikes of a group of neurons; fig. S2, D and E; see Materials and Methods for details) and the local field potential (LFP; calculated from the sum of synaptic currents of excitatory neurons; fig. S2, G and H; see Materials and Methods for details). To demonstrate this, we calculate the power spectra of MUAs in one of the two object locations. As shown in Fig. 3E, the MUA exhibits rhythmic fluctuations in the theta band (4.15 Hz), appearing as a bump on top of 1/f activity, and it exceeds 1/f activity by more than 3 SDs (P = 1.04 × 10−31, upper-tail t test), as found in monkey V4 (). Calculations of the LFP reveal very similar behavior, in that a prominent theta peak is present (fig. S3A) (). Note that the theta band power of MUAs, LFPs, and pattern trajectories under the task condition is significantly stronger than that during spontaneous activity (P = 1.08 × 10−118, 9.72 × 10−136, and 3.94 × 10−4, respectively; upper-tail t test; fig. S3, B to D), indicating that they are mainly driven by external objects; this stimulus-induced theta activity has been found to reflect bottom-up–driven attentional sampling, as in macaque V4 () and macaque inferotemporal cortex (). Rather than being regular, as often assumed, a key feature of the theta activity in our model is that it exhibits large temporal variability. To demonstrate this, we calculate the sampling rate by counting the number of times the attention activity pattern visits one of the two object locations per second; each location is defined by a circle whose radius is 1 SD of the size of the input objects (the result is not sensitive to this SD value). Calculating the sampling rate by counting the number of times that the MUA firing rate reaches a threshold of population activity (the 90th percentile of MUA) would yield similar results. As shown in Fig. 3F, the sampling rate varies from 2 to 8 Hz with a mean of 4.12 Hz, indicating that the pattern switches to each position at a variable rate.
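The focus-of-attention estimate, obtained by fitting a 2D Gaussian to the instantaneous firing-rate profile, can be sketched as follows. This is a hedged sketch of the general technique (not Eq. 17 itself); the isotropic Gaussian form, grid units, and seeding heuristic are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(coords, amp, x0, y0, sigma):
    """Isotropic 2D Gaussian evaluated on flattened (x, y) coordinate grids."""
    x, y = coords
    return amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

def attention_focus(rate_map):
    """Estimate the focus of attention as the peak of a 2D Gaussian fitted
    to a firing-rate map of shape (ny, nx)."""
    ny, nx = rate_map.shape
    x, y = np.meshgrid(np.arange(nx), np.arange(ny))
    coords = np.vstack([x.ravel(), y.ravel()])
    # Seed the fit from the location of the maximum firing rate.
    iy, ix = np.unravel_index(np.argmax(rate_map), rate_map.shape)
    p0 = (rate_map.max(), ix, iy, max(nx, ny) / 10)
    popt, _ = curve_fit(gauss2d, coords, rate_map.ravel(), p0=p0)
    return popt[1], popt[2]  # (x0, y0): the pattern center
```

Applying this frame by frame yields the trajectory r(t) whose power spectrum is analyzed in Fig. 3D.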
The variability of theta rhythmic activity is also reflected by the fact that the theta oscillation peak rides on top of an arrhythmic (1/f) component in MUA and LFP. To demonstrate that the 1/f arrhythmic component contributes to the variability of the sampling rate, we use irregular resampling (, ) to separate 1/f activity from the oscillatory component and find that the variance of the sampling rate based on the MUA with the 1/f component removed is 0.38, significantly smaller than the value (0.42) calculated from the raw MUA (P = 3.71 × 10−10, upper-tail t test; calculated over 500 trials). While such an arrhythmic 1/f component accompanying theta oscillations has been widely observed during attention tasks, existing studies have traditionally treated it as noise or a nuisance variable and often removed it to highlight oscillatory components. However, the 1/f arrhythmic component is a key dynamical property of our attention mechanism. As illustrated below, these variable properties of the theta activity and the 1/f component only emerge in the dynamical transition regime between different states of the circuit model with neural adaptation, and they form an experimentally testable prediction of our attention sampling mechanism.
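A simplified stand-in for the irregular-resampling separation is to fit the 1/f background in log-log coordinates while excluding the theta band, so a theta peak appears as an excess of the raw spectrum over the fit. This is my own simplified alternative, not the paper's IRASA-style method; the band edges, frequency cutoff, and Welch settings are assumptions.

```python
import numpy as np
from scipy.signal import welch

def one_over_f_background(sig, fs, exclude=(3.0, 10.0), fmax=100.0):
    """Estimate the arrhythmic 1/f background of a signal's power spectrum.

    Fits Power ~ 1/f**beta in log-log space, excluding the theta band, so
    that oscillatory peaks stand out above the returned background.
    """
    f, p = welch(sig, fs=fs, nperseg=4 * int(fs))
    f, p = f[1:], p[1:]  # drop the DC bin
    keep = ((f < exclude[0]) | (f > exclude[1])) & (f < fmax)
    slope, intercept = np.polyfit(np.log(f[keep]), np.log(p[keep]), 1)
    background = np.exp(intercept + slope * np.log(f))
    return f, p, background  # beta = -slope
```

The ratio p/background then highlights rhythmic components riding on the arrhythmic floor.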

Gamma bursts are nested in theta oscillations

We next demonstrate that each theta sampling cycle is implemented through bursts of gamma band activity (i.e., gamma bursts), with theta-gamma coupling as found in primate visual cortex during spatial attention sampling tasks (–). The time-frequency spectrum in Fig. 3C shows that, once the attention activity pattern switches to an object, it gives rise to gamma bursts in the LFP near that object. Spatially, the gamma bursts are organized as a localized pattern; the gamma pattern switches between the two object locations in alternation (fig. S2F). Note that, in our model, the bursts occur in the high-frequency gamma band (>60 Hz), which has been shown to be a reliable proxy of population spiking activity (). In our model, spiking bursts and gamma bursts occur ∼4 times/s. Figure 3C and fig. S2 (G and H) show that these gamma bursts ride on the crest phase of theta oscillations, implying theta-gamma phase-amplitude coupling (PAC). To quantify this theta-gamma PAC, we calculate the phase-amplitude modulation index, which measures the strength of cross-frequency PAC against shuffled data (Materials and Methods) (). Cross-frequency PAC is investigated in a 2D frequency space, with the modulation index calculated for each frequency pair. As shown in Fig. 3G, there is a distinct peak in the theta-gamma modulation index, indicating that the amplitude of the gamma band power is modulated systematically by theta phase. The theta rhythmic modulation of gamma in our model is most pronounced at the high-frequency end of the gamma band (>60 Hz), as found in monkey visual cortex during attention tasks (). Note that the theta-gamma coupling is an emergent, intrinsic property of our spiking neural circuit. This is different from existing models of theta-gamma PAC; for instance, in the Lisman and Idiart model (), theta-gamma PAC is realized by externally imposing a theta component on a circuit model that only generates gamma oscillations.
As demonstrated below, the specific properties of the theta-gamma PAC including its peak frequency and modulation index can be modulated by neurophysiological mechanisms such as the ratio of excitation and inhibition and neural adaptation.
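A common way to compute a phase-amplitude modulation index is the Tort-style normalized KL divergence between the theta-phase-binned gamma-amplitude distribution and a uniform distribution. This is a sketch of that generic estimator, not necessarily the paper's exact procedure (the shuffled-data normalization mentioned in the text is omitted); the band edges, filter order, and bin count are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass(sig, lo, hi, fs, order=4):
    """Zero-phase band-pass filter."""
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, sig)

def pac_modulation_index(lfp, fs, theta=(3.0, 10.0), gamma=(60.0, 100.0), n_bins=18):
    """Phase-amplitude modulation index: normalized KL divergence between the
    theta-phase-binned gamma-amplitude distribution and a uniform distribution."""
    phase = np.angle(hilbert(bandpass(lfp, *theta, fs)))
    amp = np.abs(hilbert(bandpass(lfp, *gamma, fs)))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    mean_amp = np.array([amp[(phase >= edges[i]) & (phase < edges[i + 1])].mean()
                         for i in range(n_bins)])
    p = mean_amp / mean_amp.sum()
    return (np.log(n_bins) + np.sum(p * np.log(p))) / np.log(n_bins)
```

An index near zero indicates no coupling; larger values indicate that gamma amplitude depends systematically on theta phase, as in Fig. 3G.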

Theta rhythms and behavioral performance

We next demonstrate that the temporal theta rhythmic fluctuations of the neural activity pattern underlie theta fluctuations of behavioral performance, as found in spatial attention tasks involving multiple objects (, ). In these experiments, primates had to report a change to a target object occurring at a random time (referred to as target onset) by executing a saccade to that location or pressing a relevant button (response). The behavioral measure, i.e., reaction time, is the period from target onset to response (fig. S4A). As in experimental studies (), we first add one input object to the circuit, followed by a second object 500 ms later (fig. S4A). A target, i.e., a change in the contrast level of one of the two objects, is then presented after a randomized period of up to 1000 ms following the onset of the second object (fig. S4A) (). After the target onset, we assume that a response is generated when MUA at the target location reaches a threshold (fig. S4A), which is chosen to be 3 SDs of MUA after the onset of the second object (the result is not sensitive to this value). We then calculate the mean reaction time for each target onset time across 5000 trials; this results in a vector of mean reaction times with an entry for each target onset time, referred to as the time course of reaction time (fig. S4B) as in (). This parsimonious way of modeling the response by applying a threshold to population firing rates is a common practice in studies of mechanistic circuit models [for instance, ()], with the assumption that threshold-crossing events might be read out by neurons in a downstream motor command center to generate a response (i.e., a saccadic eye movement) (). By analyzing the reaction time as a function of target onset time, we find that responses mostly occur at the crests of the MUA at the target input location (fig. S4C) and that the reaction time fluctuates at a theta rhythm.
We further confirm this by calculating the power spectrum of the reaction time course; there is a theta peak on top of the 1/f power spectrum of the reaction time (Fig. 4A). Replacing the raw MUA with the gamma-band MUA (Fig. 3C) would give rise to a similarly variable reaction time course. However, if the MUA time course is randomly shuffled, then both the theta peak and the 1/f activity disappear (Fig. 4A, red line). Our model mainly addresses uncued (distributed) attention tasks, during which multiple objects are monitored to detect changes in one of them (), so cues are not included in our modeling framework.
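The threshold-crossing readout described above can be sketched as follows. This is a minimal sketch; the threshold convention used here (pre-target baseline mean + 3 SD) is a simplifying assumption standing in for the paper's 3-SD criterion.

```python
import numpy as np

def reaction_time(mua, target_onset, dt_ms=1.0, n_sd=3.0):
    """Reaction time as the first threshold crossing of MUA after target onset.

    `mua` is a 1D time series sampled every dt_ms; `target_onset` is the
    sample index of the target. The threshold is set from the pre-target
    baseline as mean + n_sd * SD (an assumed convention).
    """
    baseline = mua[:target_onset]
    threshold = baseline.mean() + n_sd * baseline.std()
    crossed = np.nonzero(mua[target_onset:] >= threshold)[0]
    return crossed[0] * dt_ms if crossed.size else np.nan  # NaN: no response
```

Averaging this quantity over trials for each target onset time yields the reaction-time course whose spectrum is shown in Fig. 4A.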
Fig. 4.

Model reproduces temporal characteristics of reaction time.

(A) Power spectra of the reaction time. The power spectrum follows Power ∝ 1/fβ, with the exponent β = 2. If the MUA time course is randomly shuffled, then both the theta peak and the 1/f component disappear (red curve). (B) Cross-correlation between reaction time and MUA.

To further quantify the relationship between the simulated behavior and the underlying neural activity, we calculate the cross-correlations between the time course of reaction time and that of MUA (Materials and Methods). We find that there are large correlations (0.54 ± 0.02; means ± SEM) between them (Fig. 4B); similar correlations between MUA and reaction time have been found in macaque V4 during a bottom-up attentional sampling task (). In summary, we have demonstrated that the attention activity pattern with complex dynamics provides a unified account of key temporal properties of neurophysiological and behavioral attributes of spatial attentional sampling.
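The reaction-time/MUA cross-correlation can be computed with a normalized lagged correlation. A minimal sketch; the function name and lag convention are my own assumptions.

```python
import numpy as np

def xcorr(a, b, max_lag):
    """Normalized cross-correlation between two equal-length time courses,
    for lags from -max_lag to +max_lag (in samples).

    cc[i] is the mean product of z-scored a[t] and b[t + lags[i]].
    """
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.array([
        np.mean(a[max(0, -k):n - max(0, k)] * b[max(0, k):n - max(0, -k)])
        for k in lags
    ])
    return lags, cc
```

The peak value and its lag summarize how strongly (and with what delay) reaction time tracks the underlying population activity.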

Spatial dynamics of visual attention sampling

Going beyond the existing studies that have mainly focused on the temporal fluctuations of attentional sampling (, , ), we next elucidate a novel property of attentional sampling from the perspective of its spatial fluctuations. We demonstrate that this previously unappreciated property plays a fundamental role in attentional sampling, providing computational advantages such as flexible switching between exploration and exploitation. The attention activity pattern wanders around one spatial location for a while and then switches to another one (movie S4); during this process, clusters of short step sizes are occasionally interspersed with longer movements or jumps. This intermittent switching motion of the localized activity pattern in space can be fundamentally characterized as superdiffusive Lévy motion, a type of random motion that has been widely observed in natural systems, including movements of humans and other animals (, ), and in the movement of gamma activity patterns in the monkey middle temporal (MT) area (). To quantify the superdiffusive Lévy motion of the attention activity pattern, we calculate its mean square displacement (MSD) and the distribution of the increments of the pattern's movement. We track the localized pattern over time and calculate its center, r(t) = (x(t), y(t)). The MSD of the pattern can be calculated on the basis of r(t) as MSD(τ) = 〈∣r(t + τ) − r(t)∣^2〉, where τ is the time lag. As shown in Fig. 5A, the MSD is linear on a log-log scale, indicating that it is a power function of τ, such that MSD(τ) ∝ τ^β. The diffusion exponent β determines the type of random motion: β = 1 indicates Brownian motion, β > 1 indicates a superdiffusive process, and β < 1 indicates a subdiffusive process. We find that the attention activity pattern has β = 1.2, indicating that its movement is superdiffusive: the pattern can travel much farther from its starting point than a Brownian walk in the same duration.
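The MSD and diffusion-exponent analysis can be sketched as follows (a minimal sketch; the trajectory units and lag choices are assumptions):

```python
import numpy as np

def mean_square_displacement(r, lags):
    """MSD(tau) = <|r(t + tau) - r(t)|^2> for a trajectory r of shape (T, 2)."""
    return np.array([np.mean(np.sum((r[lag:] - r[:-lag]) ** 2, axis=1))
                     for lag in lags])

def diffusion_exponent(r, lags):
    """Slope of log MSD versus log tau: beta = 1 is Brownian, beta > 1
    superdiffusive, beta < 1 subdiffusive."""
    msd = mean_square_displacement(r, lags)
    beta, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return beta
```

Applied to the attention pattern's center trajectory, this fit yields the β = 1.2 reported in Fig. 5A.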
We further examine the distribution of the increment Δx(t) = x(t + Δt) − x(t) of the pattern's center trajectory for a fixed time interval Δt = 5 ms and find that it can be fitted by a symmetric Lévy α-stable distribution (Materials and Methods); using maximum likelihood, we find that α = 1.27 ± 0.0016 (95% confidence bounds; Fig. 5B). The tail of this distribution asymptotically follows a power law, p(x) ∼ x^(−1 − α); as a result, α is often referred to as the tail index. Such a heavy-tailed property is also present in the distribution of the displacements Δr, which can be fitted by a heavy-tailed function P(Δr) ∼ Δr^β with β = −2.26, xmin = 9.99 μm, and xmax = 208.38 μm [Kolmogorov-Smirnov (KS) test; Fig. 5B, inset, and Materials and Methods]. These heavy-tailed distributions and the corresponding superdiffusion are the characteristic features of Lévy motion ().
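The paper fits a symmetric Lévy α-stable distribution by maximum likelihood; as a lighter-weight check of the power-law tail p(x) ∼ x^(−1−α), the tail index can also be estimated from the k largest increments with a Hill estimator. This is a different, simpler method than the one used in the paper; the sample data (Cauchy increments, for which α = 1) and the choice of k are purely illustrative:

```python
import numpy as np

def hill_tail_index(samples, k):
    """Hill estimator of the power-law tail index alpha from the k largest |samples|.

    For a distribution with tail p(x) ~ x^(-1-alpha), the estimator is
    1 / mean(log(X_(i) / X_(k+1))) over the k largest order statistics X_(i)."""
    x = np.sort(np.abs(np.asarray(samples, dtype=float)))[::-1]
    top = x[:k + 1]
    return 1.0 / np.mean(np.log(top[:-1] / top[k]))

# Example: increments of a Cauchy walk have a power-law tail with alpha = 1.
rng = np.random.default_rng(1)
cauchy_steps = rng.standard_cauchy(200_000)
alpha_hat = hill_tail_index(cauchy_steps, k=2000)
print(round(alpha_hat, 2))  # near 1 for Cauchy increments
```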
Fig. 5.

Spatial properties of the attention activity pattern.

(A) MSD of trajectories of the attention activity pattern demonstrates superdiffusion. The red dashed line shows MSD(τ) ∝ τ^1.2. The shaded area represents the SEM across 500 trials. (B) Distribution of the increments over 100 s. The red line denotes a fitted symmetric Lévy α-stable distribution. Inset: Distribution of the displacement of the attention activity patterns shows a power-law tail. The red dashed reference line has an exponent of −2.2. Dis., displacement. (C) The color map represents the logarithmic probability of the pattern visiting one cell of the grid. The dashed and solid circles represent the stimulus locations within 1 SD or 2 SDs of the stimulus object's Gaussian profile, respectively. Positions of inputs are shifted under periodic boundary conditions for better illustration. (D to F) Theta sampling rate is insensitive to stimulus properties. White circles in violin plots denote group medians; violins are kernel density estimates. (D) The sampling rate as a function of interstimulus distance (Intersti. dis.). (E) The sampling rate as a function of stimulus contrast. (F) The sampling rate as a function of the difference between the contrast levels of two stimuli.


Flexible switching between exploitation and exploration

The heavy-tailed displacement distribution of the superdiffusive Lévy motion means that the motion of the attention activity pattern contains large positional fluctuations at multiple spatial scales. Because of these large fluctuations, the attention activity pattern does not behave like a stable attractor, as in the classical WTA model, nor does it only switch between the two object locations. Rather, it samples space through a sequence of displacements, bringing the attention activity pattern toward and occasionally away from the objects. To quantify this, we calculate the proportion of time spent sampling object-irrelevant locations (i.e., locations outside the two objects) by counting the instances when the attention activity pattern jumps outside the two object locations across 500 trials (each trial lasts 10 s). Each object location is defined by a circle whose radius is 1 SD (or 2 SDs) of the size of the inputs (Eq. 13 and Fig. 5C); this irrelevant sampling accounts for 35.06% (or 12.25%) of the total sampling time. This result suggests that the attention activity pattern, or spotlight, can be directed toward the known objects for exploitative purposes or occasionally deployed across the entire visual field to explore other, less relevant sources of information. This spatial property thus provides a mechanism for flexibly switching between exploitation and exploration of the visual space. Such adaptive switching prevents the attention pattern from becoming overly focused on any given location and promotes more active sampling of the visual environment. Note that during top-down attention tasks, the attention spotlight of monkeys explored irrelevant locations for a similar proportion of time (), as found in our model; this similarity indicates that the flexibility, or freedom, to explore irrelevant spatial locations could be a general property of attention sampling regardless of task.
In contrast, the classical WTA model lacks the freedom to explore task-irrelevant locations, because its attention dynamics are based on deterministic WTA attractors. Such free exploration, however, provides an opportunity to flexibly process unexpected or irrelevant events, a key functioning mechanism in many cognitive domains such as memory retrieval and decision-making (, ). The ability to flexibly switch between exploration and exploitation is another core prediction of our attention mechanism.
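The exploitation/exploration proportion described above can be sketched by testing, at each time step, whether the pattern center lies inside any object circle. The object centers, circle radius, and trajectory below are hypothetical placeholders, not values from the paper:

```python
import numpy as np

def irrelevant_fraction(traj, object_centers, radius):
    """Fraction of time steps at which the pattern center lies outside every
    object circle (exploration), as opposed to inside one (exploitation)."""
    traj = np.asarray(traj, dtype=float)          # shape (T, 2)
    centers = np.asarray(object_centers, float)   # shape (n_obj, 2)
    d = np.linalg.norm(traj[:, None, :] - centers[None, :, :], axis=2)
    outside_all = np.all(d > radius, axis=1)
    return outside_all.mean()

# Example: 101 points on a line between hypothetical objects at (0,0) and (10,0).
traj = np.column_stack([np.linspace(0, 10, 101), np.zeros(101)])
frac = irrelevant_fraction(traj, [(0, 0), (10, 0)], radius=1.05)
print(round(frac, 2))  # 0.78: most of this toy trajectory lies between the objects
```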

Attentional sampling is insensitive to object properties

Because of the large fluctuations and long jumps inherent in superdiffusive Lévy motion, the attention activity pattern is able to switch between different locations without being significantly affected by their properties, such as their spatial separation. To demonstrate this, we systematically change the interobject distance and quantify the sampling rate of the attention pattern, i.e., the number of visits to either of the two object locations per second. As shown in Fig. 5D, the sampling rates fluctuate slightly but remain in the theta band, indicating that distance has little effect on shifting the focus of the attention pattern. This relatively constant sampling rate across object distances reveals that the time taken by the focus of attention to shift from one location to another is relatively invariant to their distance, as found in (). In contrast, in the classical WTA model of bottom-up attention, the shifting time of the focus of attention is strongly dependent on the distance between objects (, ). In the classical model of bottom-up attention sampling, when two objects have the same contrast level, the WTA mechanism cannot discriminate a winner, and the focus of attention is thus trapped at either of the two objects, a limitation highlighted in (). In our model, the fluctuations allow the attention activity pattern to sample the objects in a way that is insensitive to the difference in their contrast levels, thus overcoming this limitation of the WTA model. All the above results are based on objects with the same contrast levels (cA = cB; Eq. 13), the same situation as in experimental studies (–). We further systematically change the contrast levels of the two objects and calculate the sampling rate. As shown in Fig. 5E, as the contrast levels increase, the medians of the sampling rates first increase slightly (cA = cB < 1.75; Eq. 13) and then plateau (cA = cB > 1.75); all sampling rates remain in the theta band, and the rich dynamical features of attentional sampling, including theta-band oscillations, the 1/f power spectrum, and theta-gamma PAC, are still present. This increase of the sampling rate with contrast level leads to an increase of the reaction time peak frequency (fig. S4D), indicating that small changes in the sampling rate affect the fluctuations of reaction time. When a global, uniform object is added to the circuit (Materials and Methods), the attention activity pattern exhibits the same dynamical properties as in the two-object scenario, as shown by its sampling trajectory as well as its MUA and LFP activities (fig. S5). Our attention sampling mechanism is also insensitive to differences between the object contrast levels. To demonstrate this, we fix the contrast level of one object, cA, and systematically increase the contrast level of the other object, cB. As the contrast difference ∣cA − cB∣ increases, the medians of the sampling rates change little and remain in the theta band (Fig. 5F). We further calculate the dwelling time (fixation time) of the attention activity pattern, defined as the duration the pattern spends at one object location before switching to the other, and find that the dwelling time increases linearly with the contrast level (fig. S6). This result indicates that, although the overall sampling rate remains in the theta range, the location with higher contrast "attracts" more attention in terms of dwelling time. We next characterize the effect of the number of objects on the attention sampling dynamics. We add up to five objects (2, 3, 4, or 5) to the circuit and repeat the same analysis as performed above for the two-object case.
We find that the LFP, MUA, MSD, and displacement distribution of the attention activity pattern show similar features (i.e., theta peak on top of 1/f arrhythmic activity and superdiffusive Lévy motion), as found for the two-object case (fig. S7, A to E). The variance of the sampling rate, however, increases with the object number (fig. S7F). As for the two-object case, we further quantify the exploratory behavior by calculating the proportion of sampling object-irrelevant locations; the proportion of this irrelevant sampling increases from 12.25% (35.06%) to 29.13% (72.00%) as the object number increases from two to five, with each object location defined by 1 SD (or 2 SDs) of the input size (fig. S7, G and F). Such an increasing trend of exploration forms another prediction of our attention sampling mechanism. In summary, these results indicate that our attention mechanism is robust to variation in the object properties and has great flexibility that can overcome the fundamental limitations of the classical WTA model.
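A simplified version of the sampling-rate measure used above counts, per second, how often the pattern center is found at one object location after having been at the other. The trajectory, circle radius, and 5-ms resolution below are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def sampling_rate(traj, object_centers, radius, dt):
    """Visits per second to either object location: count changes of the
    nearest-object label among the time steps spent inside an object circle."""
    traj = np.asarray(traj, dtype=float)
    centers = np.asarray(object_centers, float)
    d = np.linalg.norm(traj[:, None, :] - centers[None, :, :], axis=2)
    # Label each step with the containing object's index, or -1 if outside all.
    labels = np.where((d < radius).any(axis=1), d.argmin(axis=1), -1)
    visited = labels[labels >= 0]
    switches = np.count_nonzero(np.diff(visited) != 0)
    return switches / (len(traj) * dt)

# Example: a pattern alternating between objects at (0,0) and (10,0) every 125 ms.
dt = 0.005                                # 5-ms resolution
labels = (np.arange(2000) // 25) % 2      # 25 steps (125 ms) per location, 10 s total
traj = np.column_stack([labels * 10.0, np.zeros(2000)])
rate = sampling_rate(traj, [(0, 0), (10, 0)], radius=1.0, dt=dt)
print(rate)  # 7.9 switches per second, i.e., in the theta range
```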

Attentional sampling of natural images

We next illustrate that, when sampling complex visual scenes such as natural scenes, our attention sampling mechanism with the spatiotemporal properties illustrated above provides an account of the emergence of attention maps and scan paths, as found in psychophysical studies (, ). To model the saliency-driven, bottom-up attentional sampling of natural scenes, following the procedure in (), we first obtain the saliency map of each natural scene, which is then used as input to the spiking neural circuit model (Fig. 6, A and B, and Materials and Methods). To demonstrate that the attention dynamics are robust with respect to different ways of generating saliency maps, we use four methods, the Itti et al. method (), the adaptive whitening saliency (AWS) method (), the graph-based visual saliency (GBVS) method (), and the Judd et al. method (), to produce saliency maps of 1003 natural scenes (i.e., the MIT1003 dataset) ranging from natural outdoor scenes to human portraits (Fig. 6, A and B) ().
Fig. 6.

Spatiotemporal dynamics of the attention activity pattern when sampling nature scenes.

(A and B) Two examples of human and simulated attention maps (AMs). Simulated AMs are generated by our model and the WTA model (), with different saliency maps (SM) generated by four different methods: Itti et al. (), AWS (), GBVS (), and Judd et al. (). (C) Attention activity pattern wanders and jumps from one location to another. Black dots represent spikes during a period of 5 ms. The curve with color gradient shows the trajectory of the localized centers of the pattern over the next 400 ms. (D) Averaged power spectrum of pattern trajectories with the saliency maps of MIT1003. An approximately straight line in this log-log plot indicates that the power spectrum follows Power ∝ 1/fβ, with the exponent β = 1.9. The shaded area represents SEM calculated across 500 trials. Sim., simulated. (E) The increment of pattern trajectories over 5 ms can be fitted as a symmetric Lévy α-stable distribution (red line). Inset: Distribution of the displacement of spontaneous patterns over 5-ms intervals shows a power-law tail. Saliency maps used in (C) to (E) are generated by the Judd et al. () method.

Figure 6C shows that the attention activity pattern wanders or stays around one salient part of a natural scene for a while before switching to another part of the scene (movie S5). Crucially, this sampling process happens in an automatic manner, without imposing any extra mechanisms such as the inhibition of return as assumed in the WTA model. In our model and in the WTA model, only one attention activity pattern switches between different salient parts of complex natural scenes. Notably, it has been found that, when multiple objects were presented, neurons in primate visual cortex showed fluctuating activity consistent with switching between responding to each object across time (, ), suggesting that one population activity pattern might be in response to multiple stimuli.
As in the two-object cases illustrated above, we then track the attention pattern and characterize its spatiotemporal dynamics; we find that the pattern trajectories exhibit similar temporal fluctuations, with their power spectra exhibiting a theta peak (∼3.6 Hz) riding on top of 1/f activity. The theta peak exceeds the estimate of the 1/f power spectrum by more than 3 SDs (P = 1.04 × 10−31, upper-tail t test; Fig. 6D). These 1/f power spectra and theta-band oscillations disappear when the pattern trajectories are randomly shuffled over time. Spatially, the attention activity pattern samples the saliency maps with similar Lévy-motion displacements; as shown in Fig. 6E, the distribution of the attention pattern displacements in space is heavy-tailed. We further validate our attention sampling mechanism by demonstrating that it can reproduce psychophysical observations of attention maps with better performance than the classical WTA model. To this end, we follow common practice in the literature and operationalize the distribution of the focus of attention (i.e., the attention map) as the distribution of eye fixations (). We use the MIT1003 dataset, a collection of eye-tracking data from 15 subjects freely viewing 1003 natural images (). This database permits the analysis of fixation points and gaze-scanning paths. As shown in fig. S8A, the distribution of human fixation durations is heavy-tailed. On the basis of the fixation points, we obtain the human attention map as ground truth by accumulating +1 at each human fixation location across subjects and then applying a low-pass filter with circular boundary conditions as in () (Materials and Methods). Similarly, to generate comparable attention maps from our model (Fig. 6, A and B), we first calculate the center of the attention activity pattern as described above (Materials and Methods), which is the focus of attention in our model when sampling a natural scene.
The fixation duration (i.e., dwelling time) of the attention activity pattern sampling natural images also has a heavy-tailed distribution (fig. S8A). The performance of our model in reproducing psychophysical observations of attention maps is quantified by comparing simulated attention maps with the benchmark (i.e., human attention maps) according to two metrics: the cross-correlation between human and simulated attention maps [correlation coefficient (CC)] and the shuffled area under the curve (sAUC) (Materials and Methods) (). To compare the performance of our mechanism and WTA, we first feed the aforementioned four kinds of saliency maps of the MIT1003 dataset to the classical WTA model () and then calculate attention maps using the same method; the simulated attention maps are also compared with the human attention maps based on CC and sAUC. As shown in Figs. 6 (A and B) and 7, for all four types of saliency maps, our model achieves higher scores than the WTA model on both CC (Fig. 7A) and sAUC (Fig. 7B). For example, the mean CC of our model with the Judd et al. () saliency maps is 0.37, whereas that of the WTA model is 0.30. These comparisons are statistically significant, with all P < 10−5 (t test, upper tail), indicating that our model outperforms the WTA model in simulating attention maps.
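The CC metric amounts to a Pearson correlation between the flattened human and simulated attention maps. The minimal version below omits the low-pass filtering and boundary handling described in Materials and Methods, and the maps are synthetic placeholders:

```python
import numpy as np

def attention_map_cc(map_a, map_b):
    """Pearson correlation coefficient (CC) between two same-shape attention maps."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

# Example: a synthetic map correlates perfectly with itself and only partially
# with a noise-corrupted copy of itself.
rng = np.random.default_rng(2)
human_map = rng.random((32, 32))
noisy_map = human_map + 0.5 * rng.standard_normal((32, 32))
cc_self = attention_map_cc(human_map, human_map)
cc_noisy = attention_map_cc(human_map, noisy_map)
print(round(cc_self, 3))  # 1.0
print(cc_noisy)           # between 0 and 1
```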
Fig. 7.

Comparison between human attention maps and simulated attention maps generated by two models (our model and the WTA model) with different saliency maps.

(A) The cross-correlation between the simulated and human attention map. White circles in violin plots denote group medians of 500 trials; violins are kernel density estimates. (B) Same as in (A) but for sAUC. O, our model; W, the WTA model; I, Itti et al. (); A, AWS (); G, GBVS (); J, Judd et al. ().

Aside from static attention maps, our attention sampling mechanism can account for the dynamical scan paths of attentional sampling. The scan paths of humans can be obtained from the MIT1003 dataset (). The simulated scan path is the trajectory of the center of the attention activity pattern when the saliency maps of the MIT1003 dataset are fed to our spiking circuit model (Materials and Methods). To quantitatively compare the simulated and human scan paths, we use MultiMatch, a vector-based, multidimensional scan path similarity measure (). It treats individual scan paths as vectors and compares their differences in terms of shape, direction, length, position, and duration. Its value ranges from 0 to 1; a value of 0 indicates that the paths are completely different, and a value of 1 indicates that they are identical. In Table 1, the indices between the simulated paths of attention focus and the human scan paths [e.g., the mean shape score when using the Judd et al. () saliency maps is 0.86] are larger than those for surrogate data drawn from a Gaussian distribution (e.g., a mean shape score of around 0.01), with all P ≪ 10−5 (two-tailed t test). For the human scan paths, we perform the same analysis as in our modeling study; we find that the power spectrum of the human scan paths has a theta peak on top of arrhythmic 1/f activity (fig. S8B) and that the displacement distribution is heavy-tailed with an exponent of −2.2 (fig. S8C).
Table 1.

Comparison between human scan paths and simulated scan paths with different saliency maps.

The values in parentheses are SEM × 10⁴ calculated across 500 trials.

Metric    | Itti et al. (15) | AWS (46)     | GBVS (47)    | Judd et al. (45)
Shape     | 0.85 (±0.13)     | 0.89 (±0.06) | 0.86 (±0.11) | 0.86 (±0.08)
Direction | 0.61 (±0.38)     | 0.64 (±0.23) | 0.61 (±0.35) | 0.60 (±0.25)
Length    | 0.86 (±0.24)     | 0.87 (±0.13) | 0.89 (±0.18) | 0.89 (±0.12)
Position  | 0.77 (±0.22)     | 0.81 (±0.15) | 0.78 (±0.21) | 0.79 (±0.14)
Duration  | 0.42 (±0.54)     | 0.45 (±0.42) | 0.41 (±0.54) | 0.41 (±0.42)

The hallmark of the scan paths generated in our model is that they exhibit large trial-by-trial variability. To quantify this variability, we calculate the SD of the MultiMatch metrics across different trials with the same natural scene. We find that the SD of shape is 0.03; of direction, 0.09; of length, 0.03; of position, 0.1; and of duration, 0.19. In contrast, in the classical WTA model () sampling natural images, the focus of attention always follows the order from the largest to the smallest saliency value without variability, as shown in fig. S9. Another key difference is that the dwelling time (fixation duration) of the WTA model is a fixed value determined by the time scale of inhibition of return, lacking the fixation variability found in the human scan paths and in the sampling paths of our model (fig. S8A).

Working regime of attention sampling

We next elucidate that the transition state (state II; Fig. 1C) between a state of ordered network activity (i.e., propagating waves) and one of more disordered activity (i.e., asynchronous activity), equipped with neural adaptation, is the circuit mechanism underlying the emergence of the attention activity pattern with the spatial and temporal properties illustrated above. To demonstrate this, we first shift the circuit away from state II by increasing the I-E ratio, moving it into state III; we find that the circuit model then generates a localized activity pattern that only switches between the two object locations (movie S6). The switching behavior is quite regular, with an oscillation frequency of ∼17 Hz, well above the ∼4-Hz theta sampling rate (Fig. 8, B and C); because of this regular switching of the activity pattern, all salient features of the temporal and spatial fluctuations of attentional sampling, including the variable sampling rate, the aperiodic 1/f component in MUA and LFP, the gamma bursts (fig. S10), the switching between exploitation and exploration, and the Lévy-motion displacements in space, are destroyed. Because of the absence of theta oscillations, there is no theta-gamma PAC in state III. On the other hand, when we shift the circuit into state I by decreasing the I-E ratio, the propagating activity pattern keeps moving and does not concentrate at the locations of the external input objects (movie S7); thus, there is a fundamental lack of sequential sampling dynamics. Consequently, the theta sampling dynamics and the superdiffusive Lévy motion do not emerge in state I (Fig. 8A and movie S7). In state II, the proportion of time the attention activity pattern samples task-irrelevant locations can be modulated by varying the I-E ratio (fig. S11).
These results indicate that the transition regime (state II) between the asynchronous and localized propagating wave states is essential for the emergence of rich attentional sampling dynamics; this regime is thus referred to as the working regime of attention sampling. In this regime, the localized wave pattern (i.e., ordered activity state) intermittently emerges from the asynchronous state (i.e., disordered activity state), inducing transitions between the two states and large fluctuations of firing rates (Fig. 3B). Note that, in this working regime, the I-E ratio ζc = 3.31 is similar to the ratio found in the visual cortex of awake monkeys (). The emergence of superdiffusive Lévy motion in this regime is analogous to its emergence in complex physical systems close to transitions between ordered and disordered states ().
Fig. 8.

Circuit mechanism of attentional sampling.

(A) Power spectrum of LFP when ζ/ζc = 0.7. (B) Same as in (A) but for ζ/ζc = 1.3. (C) Sampling rate of the attention pattern as a function of I-E ratio. White circles in violin plots denote group medians; violins are kernel density estimates. (D) Sampling rate of the attention activity pattern as a function of Δgk (Eq. 3). (E) Representative phase-amplitude comodulograms computed for one LFP record over 100 s when gk = 6 nS. The color represents the modulation index of PAC.

In previous modeling studies, oscillations are usually achieved by incorporating neural adaptation mechanisms such as spike-frequency adaptation (). Our model also incorporates neural adaptation, in the form of a slow potassium current (Eq. 2), which is the origin of the slow theta oscillatory component in our model. To illustrate this, we completely remove the adaptation from the circuit and find that the oscillatory components disappear. To further illustrate the role of neural adaptation, we gradually increase the strength of adaptation (Δgk in Eq. 3) from 1 to 10 nS and find that the median sampling frequency gradually increases to 7 Hz (Fig. 8D). As the sampling rate of the attention activity pattern changes, the peak phase frequency of theta-gamma PAC is modulated accordingly (Fig. 8E); this result thus suggests a neural mechanism (i.e., spike-frequency adaptation) for modulating theta-gamma PAC.

Computational advantages of our attentional sampling mechanism

The working regime (state II) gives rise to the best match with the human attention map and the highest efficiency of attentional sampling. To demonstrate this, we vary the I-E ratio to shift the circuit away from the working regime and characterize the CC between the human and simulated attention maps; as shown in fig. S8D, the CC is maximal in the working regime (P ≪ 10−5, upper-tail t test). We characterize the sampling efficiency as the number of locally most salient locations in natural scenes that are sampled by the attention activity pattern in 2 s (Materials and Methods). As shown in Fig. 9A, the number of sampled distinct locations is also maximal in the working regime (P ≪ 10−5, upper-tail t test). On the basis of the human scan path data (the MIT1003 dataset), we obtain the distribution of fixation times (tf; fig. S8A, inset), from which we can then calculate the distribution of sampling rates (1/tf; fig. S8A). As shown in this figure, the maximal number (upper bound) of objects that can be sampled per second is around 20; note that overswitching between objects without sufficient fixation time is related to attention-deficit/hyperactivity disorder (). Within this upper bound (<20), a larger number of objects sampled by the attention pattern can therefore be used to indicate higher sampling efficiency in our circuit model. Note that the efficient sampling mechanism of the attention activity pattern entailing Lévy motion is analogous to the efficient Lévy-motion foraging strategies that allow animals to optimally search for spatially distributed food () and lymphocyte T cells to efficiently find targets ().
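The sampling-efficiency measure can be sketched as counting how many locally most salient locations a trajectory passes within some radius of. The maxima, radius, and trajectory below are hypothetical; the paper's exact criterion is given in Materials and Methods:

```python
import numpy as np

def n_sampled_maxima(traj, maxima, radius):
    """Number of distinct locally-most-salient locations visited by a trajectory:
    a maximum counts as sampled if any trajectory point lies within `radius` of it."""
    traj = np.asarray(traj, dtype=float)      # shape (T, 2)
    maxima = np.asarray(maxima, dtype=float)  # shape (n_max, 2)
    d = np.linalg.norm(traj[:, None, :] - maxima[None, :, :], axis=2)
    return int(np.count_nonzero((d < radius).any(axis=0)))

# Example: a toy trajectory that passes near 3 of 4 hypothetical salient maxima.
maxima = [(0, 0), (5, 0), (0, 5), (5, 5)]
traj = [(0.1, 0.1), (4.9, 0.2), (0.2, 4.8), (2.5, 2.5)]
n_hit = n_sampled_maxima(traj, maxima, radius=0.5)
print(n_hit)  # 3
```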
Fig. 9.

Sampling efficiency of the attention activity pattern.

(A) Number of sampled positions in 2 s by the attention activity pattern is maximized in the working regime where the I-E ratio ζ = ζc. (B) Color-coded saliency map with patchy, salient parts whose local maxima are denoted by the red circles. Saliency map is generated by the Judd et al. method (). (C) Lévy motion with oscillations samples the patchy parts shown in (B). (D) Same as in (C) but for the Brownian motion. (E) Numbers of sampled positions by the Lévy motion with oscillations (L.O.), Brownian motion with oscillations (B.O.), Lévy motion without oscillations (L.), and Brownian motion without oscillations (B.).

To further reveal the computational properties and advantages of our attention mechanism, we use a simple yet effective mathematical model that captures the attention pattern dynamics. As demonstrated above, the attention activity pattern behaves like a random walker with rich spatiotemporal dynamics: It exhibits occasional long jumps in space, a characteristic feature of superdiffusive Lévy motion, together with an oscillatory component. To model a random walk with these features, we use a stochastic differential equation driven by Lévy motion with an auxiliary momentum term

dx(t) = v(t)dt, dv(t) = [−βv(t) + b(x)]dt + γdLα(t)  (Eq. 1)

where x(t) is a 2D vector representing the center of the attention activity pattern at time t, v(t) is an auxiliary variable representing the momentum term, β is the damping coefficient, b(x) is a drift term related to the energy (or probability) landscapes of natural scenes (Fig. 9B and Materials and Methods), γ is the strength of the noise, and Lα(t) is the Lévy motion (noise) whose step sizes over a time period Δt follow a symmetric Lévy α-stable distribution with a power-law tail with tail index 1 < α ≤ 2 (Materials and Methods). The momentum term v(t) is responsible for generating temporal oscillations in the trajectory of the random walker, with the frequency of the oscillations controlled by the damping coefficient β. Thus, the mathematical model (Eq. 1) is able to capture the essential dynamical features of the attention activity pattern emerging from our neural circuit model, i.e., the temporal oscillations and the Lévy motion. In the mathematical model, we set the tail index to α = 1.2, similar to the tail index characterizing the Lévy motion of the attention activity pattern in the neural circuit model, and set β = 0.03 to capture its oscillatory aspect; other values close to these would generate qualitatively similar results. We demonstrate the attention sampling efficiency by comparing our approach with other scenarios: sampling with Lévy motion but without the oscillatory component (α = 1.2 and β = 0) and Brownian motion-based sampling either without oscillations (α = 2.0 and β = 0) or with oscillations (α = 2.0 and β = 0.03); the former case (α = 2.0 and β = 0) corresponds to classical Markov chain Monte Carlo (MCMC) sampling (), and the latter (α = 2.0 and β = 0.03) corresponds to Hamiltonian Monte Carlo sampling (). We apply these different attention sampling approaches to natural scenes (Materials and Methods) and find that those involving Lévy motion flexibly switch between different salient patchy areas, as found in our circuit model (Fig. 9C); in contrast, the ones with Brownian motion tend to be trapped in some areas (Fig. 9D), thus lacking the flexibility of the attention activity pattern. We then calculate the number of distinct salient areas sampled by the random walks in a fixed time interval, for 1000 natural scenes randomly selected from the MIT1003 dataset. As shown in Fig. 9E, the Lévy motion with oscillations samples more positions than the Lévy motion without oscillations (P = 0.04, upper-tail t test). The oscillatory component generated by the momentum term in Eq. 1 reduces the correlation between successive samples, as in Hamiltonian Monte Carlo sampling (), thus enabling the sampler to efficiently sample the space.
The sampling performance of our mechanism (Lévy motion with oscillations) is much better than that of Brownian motion either with (P ≪ 10⁻¹⁰, upper-tail t test) or without oscillations (P ≪ 10⁻¹⁰, upper-tail t test). The mathematical model thus further elucidates the flexible and efficient properties of the attention pattern at the computational (or algorithmic) level; this algorithmic explanation, along with the circuit implementation demonstrated above, provides a comprehensive illustration of why and how our attention mechanism works.
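The dynamics of Eq. 1 can be illustrated numerically. Below is a minimal Python sketch (not the authors' code) that draws symmetric α-stable increments with the Chambers–Mallows–Stuck method and integrates the walker by an Euler–Maruyama scheme; for simplicity, the drift b(x) is set to zero here, and γ, dt, and the step counts are illustrative choices.

```python
import numpy as np

def alpha_stable(alpha, size, rng):
    """Symmetric alpha-stable increments (Chambers-Mallows-Stuck method)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def walker(alpha=1.2, beta=0.03, gamma=1.0, n=20_000, dt=1e-3, seed=0):
    """Euler-Maruyama integration of dx = v dt, dv = -beta v dt + gamma dL_alpha
    (the drift b(x) of Eq. 1 is set to zero for this illustration)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, 2))
    v = np.zeros(2)
    # Levy increments over a step dt scale as dt**(1/alpha)
    dL = gamma * dt ** (1.0 / alpha) * alpha_stable(alpha, (n, 2), rng)
    for t in range(1, n):
        v = v - beta * v * dt + dL[t]
        x[t] = x[t - 1] + v * dt
    return x
```

Setting α = 2 recovers Gaussian (Brownian) increments, so the same routine covers all four scenarios compared in Fig. 9E.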

A 2D firing rate model with adaptation

To demonstrate that the circuit mechanism of the transition state mediated by neural adaptation is not restricted to our particular choice of spiking neural circuit models, we consider a firing rate–based, 2D network model; the details of the rate model are described in Materials and Methods. Briefly, synaptic coupling strengths are distance dependent, and the model incorporates neural adaptation in the form of short-term synaptic depression; two objects are added to the rate model in the same way as in our spiking circuit model. As in the spiking circuit model, a critical parameter governing the transition between different network states is the relative strength of inhibition. By changing the inhibition strength J1, we perform numerical bifurcation analysis (Materials and Methods). The results, summarized in the bifurcation diagram of Fig. 10A, are similar to those obtained with the spiking model. At low values of J1 (J1 < 22.2), the network exhibits two steady, localized activity patterns at the two object locations. At intermediate values of J1 (22.2 ≤ J1 < 35.2), regular switching activity occurs (Fig. 10A); a single localized activity pattern alternates between the two object locations. At high values (J1 ≥ 35.2), the network shows chaotic behavior: A localized activity pattern irregularly moves across the network.
Fig. 10.

Dynamics of the firing rate model with neural adaptation.

(A) Bifurcation diagram showing the dependency of local neuronal activity on the inhibitory coupling strength J1. The local neuronal activity is the sum of firing rates for the group of neurons within a radius of 44.4 μm from the center. (B) Snapshots of 2D firing rate maps at different time moments when J1 = 35.2; the left is 100 ms after the right. Green circles denote the 1 SD of the Gaussian profile of two input objects. One object is in the center and another one is in one of the corners.

Around the transition point between the regular (ordered) and chaotic (disordered) states (J1 = 35.2), as in the spiking neural circuit model, the activity pattern mainly switches between the locations of the two objects and occasionally moves to other locations (Fig. 10B). However, unlike the spiking neural circuit model, the firing rate model does not show other fluctuating features such as gamma bursts and theta-gamma coupling. The existence of the switching behavior depends on the adaptive mechanism of synaptic depression; if the synaptic depression is removed, then the switching behavior disappears. Therefore, the complex dynamics of the attention activity pattern arise from the same mechanism as in the spiking neural circuit model, i.e., the transition regime equipped with neural adaptation.
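The adaptation-mediated switching described here can be sketched with a minimal two-population rate model in which cross-inhibition is weakened by short-term synaptic depression; this is an illustrative reduction with assumed parameters, not the paper's 2D rate model.

```python
import numpy as np

def f(x, theta=0.2, k=0.05):
    """Sigmoidal firing rate function (assumed gain parameters)."""
    return 1.0 / (1.0 + np.exp(-(x - theta) / k))

def simulate(T=5.0, dt=1e-3, I=1.0, beta=1.5, u=2.0, tau=0.01, tau_d=0.5):
    """Two rate units with depressing cross-inhibition (Euler integration)."""
    n = int(T / dt)
    r = np.zeros((n, 2))
    d = np.ones(2)                  # depression variables (1 = fully recovered)
    r[0] = [0.9, 0.0]               # population 1 starts dominant
    for t in range(1, n):
        inh = beta * (d * r[t - 1])[::-1]      # cross-inhibition, depressed
        r[t] = r[t - 1] + dt / tau * (-r[t - 1] + f(I - inh))
        d += dt / tau_d * (1.0 - d - u * d * r[t - 1])
    return r
```

With these parameters, the depression variable of the dominant population decays until the suppressed population escapes, producing alternation on a time scale set by τ_d; removing the depression term freezes the winner, mirroring the dependence on synaptic depression noted above.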

DISCUSSION

In this study, we have identified a circuit-based mechanism to understand how visual attention exploits intrinsic spatiotemporal fluctuations to efficiently and flexibly sample external environments. Our mechanism can account for a variety of key temporal properties of bottom-up, stimulus-driven attention that otherwise remain unexplained in existing studies; these include theta band oscillations in neural population activity as well as in behavioral performance (, ) and theta-gamma coupling (, ). In addition, we have revealed a novel spatial property (i.e., superdiffusive Lévy motion displacement in space) of visual spatial attention and elucidated that the attention mechanism with these spatial and temporal properties provides key functional advantages such as flexible switching between exploration and exploitation. Our model makes some direct and testable predictions about the spatiotemporal nature of attentional sampling and has implications for designing efficient probabilistic sampling algorithms in machine learning.

Spatiotemporal properties of attention sampling

In our circuit model of spatial bottom-up attention, a localized activity pattern is the neural basis of the dynamical attention spotlight. This attention activity pattern intermittently samples salient features distributed at different spatial locations and exhibits rich temporal and spatial dynamics that can account for a wide range of neurophysiological and behavioral properties of spatial attention. As we have demonstrated, this pattern exhibits movement with periodic components in the theta frequency range, thus providing a mechanistic account of theta oscillatory dynamics in both MUA and LFP activities, as widely observed during both bottom-up (, ) and top-down attention tasks (, ). Note that, in our model, the theta neural activity does not behave like a regular clock cycle, as often assumed in the theta theory of attention (), but rather exhibits fluctuations in the sampling rate. These fluctuations give rise to the property that theta rhythmic activity rides on top of arrhythmic 1/f-like activity. Theta oscillations accompanied by 1/f activity have been commonly found in experimental studies of attention (). However, in these studies, this 1/f component has been ignored or treated as noise or a nuisance variable. In contrast, in our mechanism, this arrhythmic component, along with the theta component, is a key property of attentional sampling, indicating the rich dynamics of attention. By highlighting the role of the arrhythmic 1/f component in attentional sampling, our work thus contributes to a growing line of research showing its importance in brain functions (). To further test the variability of theta oscillations as predicted in our study, it would be relevant to analyze their dynamical properties, including the variable sampling rate, in neural recordings and in behavior by measuring theta peaks on top of 1/f activity in MUA as well as in attention sampling paths.
In our circuit model of attention, when the neural activity pattern of attention switches to a certain location, MUA and LFP activities around that location exhibit transient, high-frequency gamma bursts (>60 Hz). These gamma bursts are locked to the top of the theta oscillation, and the amplitude of gamma activity is modulated by theta phase, as quantified by the phase-amplitude modulation index. This result thus indicates that each theta sampling cycle is implemented through gamma bursts, as observed in (). More generally, this is consistent with empirical evidence that high-frequency gamma activity indexes cognitive processing, including working memory and decision-making, with a high spatiotemporal resolution and is modulated by low-frequency activity (). However, existing models of attention cannot capture these temporal locking dynamics. Whereas some circuit models of other cognitive processes such as working memory can capture theta-gamma locking, they often assume that one of the oscillatory elements is imposed externally. For instance, in the classical Lisman and Idiart model of theta-gamma coupling for working memory, gamma activity is generated locally by interactions within a class of interneurons mediating fast γ-aminobutyric acid inhibitory postsynaptic currents, but the theta component is assumed to arise from slow afterdepolarizations (). In contrast, in our model, the theta-gamma coupling is an intrinsic, emergent property of the circuit without any imposed external modulating inputs. Aside from explaining its emergence and functional relevance in attention sampling, our study has also identified the neurophysiological mechanisms by which theta-gamma coupling can be modulated; in particular, we have found that changing the strength of neural adaptation can modulate the peak frequency of this coupling.
Going beyond the temporal fluctuations (i.e., theta oscillations) that have been the main focus of recent studies of attention, our results offer a novel perspective on the spatial organization of attentional sampling. Spatially, the attention activity pattern exhibits local movements occasionally interspersed with long jumps, with its movement step sizes following a heavy-tailed, power-law distribution. The movement of this pattern can thus be fundamentally characterized as superdiffusive Lévy motion. Occasional long jumps inherent to Lévy motion allow the attention activity pattern to quickly switch to another object even when it is far from the current attention location; this dynamical property naturally gives rise to a seemingly distance-independent property, i.e., distance has no notable effect on attention shifts. This distance independence has been observed in (). In contrast, for the WTA mechanism of bottom-up attention, the time taken by WTA networks to converge to the newly selected location depends primarily on the distance between the two locations (), as also noted in (). As we have illustrated, Lévy motion is particularly effective for sampling natural scenes replete with many simultaneously presented salient stimuli. It has been found that Lévy motion is essential for animals to optimally search for spatially distributed food () and for T cells to efficiently find targets (). Our finding that spatial Lévy motion is exploited for attentional sampling, a key cognitive function, suggests that internal and external sampling or searching processes may share the same principle. It has been found that attention and saccadic eye movements are closely related, as visual spatial attention determines the end points of saccades (). It has also been shown that saccade shifts during natural scene viewing follow Lévy motion ().
These previous observations suggest that the Lévy motion property may underlie attention sampling, as predicted in our study. To directly test Lévy motion and the theta sampling rate, it would be relevant to use decoding methods with high spatiotemporal resolution, such as the one used in (), to decode the spatial focus positions of the attention activity pattern and then analyze their properties, including heavy-tailed step size distributions and the power spectrum of attention sampling trajectories, in the same way as done in our modeling study.

Computational advantages of our attention sampling mechanism

What would be the functional advantages of our attentional sampling mechanism that exploits these rich and complex spatial and temporal fluctuations? Aside from being efficient for sampling complex natural scenes, our mechanism enables the focus of attention to adaptively switch between exploration and exploitation. For the two-object case, we have found that, in addition to dynamical transitions between the two relevant known locations (i.e., exploitation), the intrinsic large fluctuations of neural circuit dynamics autonomously drive the attention activity pattern occasionally away from these locations, so that it spends 20% of the time freely exploring other, irrelevant places (i.e., exploration) across the entire field. For the natural scene case, we have found that, while our attention sampling mechanism gives rise to trial-averaged attention maps similar to those found in behavioral studies and in the WTA models, the free exploration capacity of the attention pattern yields a certain degree of randomness or freedom and thus trial-by-trial variability in attention scan paths, as characterized by the SD of scan path metrics. Our study thus predicts the presence of variability in attention-related behaviors. In contrast, the WTA models are unable to capture the behavioral variability of attention scan paths, because the interplay between the WTA mechanism and inhibition of return generates deterministic scan paths in the order of saliency of natural scenes, thus lacking the freedom of exploration. A certain degree of free exploration, however, would be important for facilitating the processing of unexpected irrelevant events and for searching for more targets in natural environments ().
By quantitatively capturing the scan paths of attentional sampling of natural scenes and their variability, our attention model provides a framework for understanding the neural mechanisms, such as neural adaptation and I-E balance, by which the exploration-exploitation trade-off achieves the right balance to efficiently sample the natural world. Similar to attentional sampling, a host of cognitive processes including decision-making and memory retrieval involve trade-offs between exploiting known opportunities and exploring for better opportunities elsewhere (). The mechanism exploiting spatiotemporal fluctuations for attention sampling might thus be of general applicability to understanding these cognitive processes across different domains. As illustrated with the mathematical model, at the algorithmic level, the attention sampling mechanism established in this study provides crucial computational advantages from the theoretical perspective of sampling-based probabilistic inference. Similar to random diffusion (i.e., Brownian motion)–based sampling in the classical MCMC methods (), the random movements of Lévy motion can be used to sample probability distributions. Because their step sizes are uniformly small, MCMC samplers are prone to becoming trapped in one mode when sampling multimodal probability distributions (). This computationally difficult problem is naturally resolved by the inherent long jumps of Lévy motion, which allow the sampler to adaptively switch to, and thus sample from, different modes even when they are far away from each other. Furthermore, the theta oscillatory component of attentional sampling may enable the sampler to implement a type of Hamiltonian Monte Carlo sampling, in which oscillatory elements are introduced to speed up the sampling process ().
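The mode-trapping argument can be made concrete with a toy random-walk Metropolis sampler on a bimodal target (an illustrative construction; the target, scales, and step counts are assumptions, not from the paper). Heavy-tailed proposals occasionally jump directly between the two well-separated modes, whereas Gaussian proposals stay trapped.

```python
import numpy as np

def log_p(x):
    """Bimodal target: two well-separated unit-variance Gaussian modes."""
    return np.logaddexp(-0.5 * (x + 20.0) ** 2, -0.5 * (x - 20.0) ** 2)

def alpha_stable(alpha, size, rng):
    """Symmetric alpha-stable draws (Chambers-Mallows-Stuck)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def metropolis(alpha, n=50_000, seed=0):
    """Random-walk Metropolis with symmetric alpha-stable proposals;
    alpha = 2 reduces to Gaussian (Brownian-like) proposals."""
    rng = np.random.default_rng(seed)
    steps = alpha_stable(alpha, n, rng)
    x = np.empty(n)
    x[0] = -20.0
    for t in range(1, n):
        prop = x[t - 1] + steps[t]
        if np.log(rng.uniform()) < log_p(prop) - log_p(x[t - 1]):
            x[t] = prop            # accept
        else:
            x[t] = x[t - 1]        # reject, stay put
    return x

levy = metropolis(1.2)
brown = metropolis(2.0)
```

The symmetric proposals keep the acceptance rule simple; the Lévy chain's rare long proposals that happen to land near the far mode are accepted with high probability, which is the discrete analog of the "big jump" argument in the text.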
The combination of the computational advantages from both the spatial and temporal aspects of our attention sampling mechanism thus has the potential to inform powerful, general probabilistic sampling algorithms for machine learning. Note that incorporating human-like cognitive functions such as attention into large-scale deep neural networks (DNNs) is crucial for developing the next generation of powerful DNNs (, ). Motivated by our results, we suggest that future DNNs may benefit from our attention mechanism by being designed or trained to incorporate Lévy motion and oscillatory fluctuations in their attention networks.

Neural circuit mechanism of attention sampling

The spatial and temporal organization properties of attentional sampling emerge in the transition regime between the propagating wave and asynchronous states in the spiking neural circuit model with neural adaptation. In this transition regime, excitation has large fluctuations and is balanced by inhibition, as found in experimental studies (). Shifting away from this regime by either increasing or decreasing inhibition reduces the complexity of the spatiotemporal dynamics of attention; all key features, including the Lévy motion, would disappear. The emergence of Lévy motion in the transition regime of our model is analogous to its emergence in complex physical systems close to transitions between ordered and disordered states (). Compared with complex physical systems, a unique feature of neural systems is that there exist diverse synaptic rules operating across multiple time scales, such as synaptic scaling and spike timing–dependent plasticity (). It has been shown that these synaptic rules can interact with the ongoing dynamics of neural circuits, enabling the emergence of cortical-like dynamics and balanced excitation and inhibition (); this mechanism might self-organize attention networks toward the state transition regime between different activity states as revealed in this study. As we have illustrated, neural adaptation in firing rate (i.e., spike frequency adaptation) is essential for the genesis of the temporal theta oscillatory component of the movement of the attention activity pattern. Such a neural adaptation mechanism underlying slow temporal oscillations such as theta has been studied in neural networks composed of two competing neural populations mediated by cross-inhibition (). These networks of competing neural populations can exhibit oscillatory behavior where, upon the simultaneous presentation of two stimuli, the two populations are active in alternation, with a frequency that decreases with stronger inhibition.
As in (), the frequency of the theta oscillations depends on the relative strength of excitatory and inhibitory inputs. In our model, the frequency of the slow oscillations increases as the strength of inhibition increases; in (), however, stronger inhibition causes oscillations at a lower frequency. As discussed above, theta oscillations in our model exhibit large temporal variability and are accompanied by the arrhythmic 1/f component, instead of being perfectly periodic as in existing networks of two competing neural groups with neural adaptation. This variability is an intrinsic, emergent property of the neural circuit model. In contrast, an external source of noise of suitable strength had to be added to generate randomness in the alternating switching between the representations of two stimuli in competing neural pool models with adaptation (, ). Another key feature of our circuit models is that they are spatially extended, 2D circuit models, with the neural coupling probability being a function of distance, as widely observed in the brain (, ). This neurophysiologically realistic spatial extension, in turn, gives rise to much richer spatiotemporal dynamics than can be expected from simpler models such as those with mutually inhibitory neural pools (). This scenario is similar to other complex systems, in which 2D spatial extension is essential for the emergence of complex spatiotemporal dynamics (). In this study, the spatiotemporal mechanism of spatial attention sampling has mainly been illustrated in the paradigm of bottom-up attention, as modeled in the classical WTA models. Similar neural features, such as theta oscillations, theta-gamma coupling, and burst-like neural spikes, have been widely observed during top-down, goal-driven, or cued attention tasks (, , ), and the state of top-down attention exhibits similarly large trial-to-trial fluctuations ().
We thus expect that the same spatiotemporal mechanism underlies top-down, goal-driven attention. To directly elucidate this, future work is needed to extend our neural circuits to model specific goal-related tasks. It would also be interesting to extend our circuit model to include different cortical areas to investigate the large-scale neural circuit mechanisms underlying the interactions of bottom-up and top-down attention () and attention-guided perception and behavior (). Our neural circuit model and its extension thus provide a framework for understanding neural mechanisms of attention and its role in brain functions.

MATERIALS AND METHODS

The neural network model

Dynamics

Our circuit model was described previously in (). Briefly, individual neurons are embedded on a 2D plane with periodic boundary conditions. NE excitatory neurons are located at integer coordinates, and NI inhibitory neurons are uniformly randomly distributed on the plane. We take the distance between two neighboring excitatory neurons (one grid unit) in our model to be 7.4 μm based on measurements from rat visual cortex (). The membrane potential Viα of neuron i in population α ∈ {E, I} evolves according to

C dViα/dt = −gL(Viα − VL) − IK,iα + Isyn,iα + Iext,iα     (2)

where the membrane capacitance C = 0.25 nF, the leaky conductance gL = 16.7 nS, the reversal potential for the leaky conductance VL = −70 mV (), IK,iα is the potassium current that implements spike frequency adaptation (), Isyn,iα is the received synaptic current, and Iext,iα is the external current. When the membrane potential reaches the threshold Vth = −50 mV, a spike is generated, and the membrane potential is reset to the resetting potential of −60 mV for a refractory period τref = 4 ms (). The potassium current is absent on the inhibitory population (, ), and on the excitatory population it is given by IK,iE(t) = gK,i(t)[ViE(t) − VK], where VK = −85 mV is the reversal potential for the potassium current (). The active potassium conductance gK,i(t) is incremented at each spike and decays exponentially between spikes; i.e., when the mth spike is generated by neuron i, gK is increased by ΔgK = 3 nS and then decays exponentially with the time constant τK = 80 ms (). The recurrent synaptic current Isyn,iα in Eq. 2 is a conductance-based sum over presynaptic populations, in which giαβ is the conductance of the recurrent current from the presynaptic population β ∈ {E, I} and the excitatory and inhibitory reversal potentials enter accordingly (). The conductance giαβ is given by Eq. 5, where aαβ and Jαβ represent the coupling topology and the connection strength (), respectively; these are described below. The nondimensional gating variable sjβ(t) in Eq. 5 obeys first-order synaptic dynamics (Eq. 6), where τβ is the conductance decay time constant, tjm is the time of the mth spike emitted by neuron j from population β, and the conduction delay is drawn from a uniform random distribution between 0 and 4 ms; the saturating function hβ(·) is defined in Eq. 7. In our model, the external currents are excitatory (Eq. 8), with Jext = 2 nS. The external gating variable is driven by a Poisson spike train (Eq. 9) whose rates are specified below; the saturation term in Eq. 6 is ignored for the external inputs. We implement a key neurophysiological feature of the mammalian primary cortex in our model; that is, the ratio between the excitatory and inhibitory postsynaptic currents is identical across the whole excitatory population (, ). To model this in our circuit, we consider the I-E ratio ζi, the ratio of total inhibitory to total excitatory connection strength received by excitatory neuron i, where the in-degree counts the number of connections received by neuron i from the inhibitory population and JijIE denotes the connection strength from inhibitory neuron j to excitatory neuron i. To equalize the I-E ratio across the neurons to a desired network-wide ratio, that is, ⟨ζ⟩ = ζ, the JijIE values for neuron i are sampled from a Gaussian distribution with a mean chosen to yield the desired ratio and an SD that is 25% of the mean. The I-E ratio in the working regime is denoted by ζc = 3.31 and is varied to explore the spatiotemporal dynamics of neural activities in our model. The rest of the parameters are the same as in ().
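A single-neuron sketch of the adaptation mechanism (a conductance-based leaky integrate-and-fire neuron with a spike-triggered potassium conductance), using the parameter values stated above and an assumed constant external drive; interspike intervals lengthen as gK accumulates.

```python
import numpy as np

# Parameters from the Methods (excitatory neuron); I_ext is an assumed constant drive.
C, g_L, V_L = 0.25, 0.0167, -70.0          # nF, uS, mV
V_th, V_reset, t_ref = -50.0, -60.0, 4.0   # mV, mV, ms
V_K, dg_K, tau_K = -85.0, 0.003, 80.0      # mV, uS, ms
I_ext, dt, T = 0.6, 0.1, 500.0             # nA, ms, ms

V, g_K, t_last = V_L, 0.0, -1e9
spikes = []
for step in range(int(T / dt)):
    t = step * dt
    if t - t_last < t_ref:                 # refractory: clamp at reset potential
        V = V_reset
    else:
        I_K = g_K * (V - V_K)              # adaptation (potassium) current
        V += dt / C * (-g_L * (V - V_L) - I_K + I_ext)
        if V >= V_th:                      # spike: reset and increment adaptation
            spikes.append(t)
            V, g_K, t_last = V_reset, g_K + dg_K, t
    g_K -= dt / tau_K * g_K                # exponential decay of g_K
```

With nF, μS, mV, ms, and nA, the units in C dV/dt are consistent without conversion factors, which is why gL and ΔgK appear as 0.0167 and 0.003 μS.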

Connectivity

Our circuit model incorporates the key connectivity properties found at the synaptic level, including distance-dependent connectivity (), the common-neighbor property (), and heterogeneous degrees (); these properties are essential for the emergence of realistic neural dynamics at the individual-neuron and circuit levels, including the variability of spiking dynamics and dynamical propagating patterns, as detailed in (). Specifically, the connection probability between each pair of neurons (i from population α and j from population β), where α and β are either excitatory (E) or inhibitory (I), decays with their Euclidean distance, with a decay constant that depends on the populations involved. The decay constants for the excitatory connections are consistent with experimentally measured ranges (); a comparable value is used for the inhibitory connection range. In addition, the connections have the common-neighbor property; that is, the connection probability between a pair of excitatory neurons is proportional to the number of presynaptic neighbors they share (), with the pair sharing the most common neighbors being twice as likely to connect as the pair sharing the fewest. Experimentally, the distribution of excitatory synaptic connection strengths has been found to be heavy tailed in cortical neurons () and can be fitted by a log-normal function, as measured in rat visual cortex layer 5 (). We use a similar log-normal distribution of J, with a mean of 4.0 nS and an SD of 1.9 nS. Simulations of the model are performed using the Euler method with a time step of 0.1 ms in custom C++ simulation software and MATLAB 2018a. NE = 63 × 63 = 3969 excitatory neurons and NI = 1000 inhibitory neurons are modeled. The attention sampling dynamics are not sensitive to the network size; for example, we have found that, when NE = 51 × 51, NI = 650 or NE = 103 × 103, NI = 2652, the sampling rates are still in the theta range. The initial membrane potentials are uniformly distributed between Vrt = −60 mV and Vth = −50 mV.
A typical trial is one realization of the circuit simulated for at least 10 biological seconds unless otherwise stated, with the first 2 s excluded.
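Two ingredients above are straightforward to sketch in isolation: Euclidean distance on the periodic (toroidal) grid and a log-normal weight distribution matched to the stated mean and SD. This is a hedged illustration; the exact distance-dependent probability profile follows the cited references and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def torus_distance(p, q, n=63):
    """Euclidean distance on the periodic (toroidal) n x n grid."""
    d = np.abs(np.asarray(p, float) - np.asarray(q, float))
    d = np.minimum(d, n - d)          # wrap around the shorter way
    return np.hypot(*d)

# Log-normal synaptic strengths with mean 4.0 nS and SD 1.9 nS:
# match the first two moments to the underlying normal's (mu, sigma).
mean, sd = 4.0, 1.9
sigma2 = np.log(1.0 + (sd / mean) ** 2)
mu = np.log(mean) - sigma2 / 2.0
J = rng.lognormal(mu, np.sqrt(sigma2), size=100_000)
```

The moment-matching step is the standard conversion: for a log-normal with target mean m and SD s, σ² = ln(1 + s²/m²) and μ = ln m − σ²/2.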

External inputs

The Poisson spike rate in Eq. 9 at time t and position r is the sum of a base rate and a stimulus-driven term, where the base Poisson rates on the excitatory and inhibitory neurons are constants. fext(r, t) represents the spatial firing rate profile of two 2D localized objects, A and B, used to simulate bottom-up, stimulus-driven attention monitoring of two objects

fext(r, t) = H(t − tonset)[cA exp(−‖r − sA‖²/(2σA²)) + cB exp(−‖r − sB‖²/(2σB²))]

where H(·) denotes the Heaviside function, the stimulus onset tonset = 4 s, ‖x − y‖ represents the distance between x and y under the periodic boundary conditions, the stimulus widths σA = σB = 44, and the dimensionless stimulus contrasts cA = cB = 0.8. sA and sB represent the two object centers; one is at the center of the network space and the other is at a corner of the network space, unless otherwise stated. The results are not sensitive to these values. To demonstrate the computational power of our attention sampling, we add saliency maps to our circuit model in the same way as in (). The saliency map is a scalar, 2D map whose activity topographically represents visual saliency derived from contrasts of low-level visual features (). Saliency maps generated by four different methods are used in this study to demonstrate the generality of our attention sampling mechanism: the Itti et al. (), AWS (), GBVS (), and Judd et al. () methods (Fig. 6, A and B). Saliency maps are resized to the size of the excitatory population (63 by 63) and multiplied by a Hanning window at the four boundaries to weaken the effect of the periodic boundary condition. The external inputs to excitatory neuron i at r, i.e., Iext in Eq. 8, are then given by the saliency map value scaled by μ and normalized by its average, where μ is the scale parameter, SM is the value of the saliency map at location r, and the bar represents the average value across all locations. μ = 2, 0.5, 0.6, and 0.26 nA for saliency maps generated by the Itti et al. (), AWS (), GBVS (), and Judd et al. () methods, respectively; the attention pattern dynamics are not sensitive to the value of μ; any value from 0.1 to 5 nA would yield similar results.

Detection of the localized activity pattern

To track the localized activity pattern, we use the following detection method (). As the firing rate profile of the excitatory population in our model is well fitted by a 2D circular Gaussian function, we assume the firing rate of excitatory neuron i to be

λi(θ) = h exp(−‖ri − c‖²/(2σ²))

where ri is the coordinate vector of neuron i and θ = (c, σ², h) is the parameter vector containing the center position, the variance, and the height of the Gaussian function, respectively. With the assumption that the firing activity of neuron i is a Poisson process with rate λi, the log-likelihood of the parameters is

ln L(θ) = Σi [ni ln(λiΔt) − λiΔt − ln(ni!)]

where ni is the spike count of excitatory neuron i over a small window Δt = 5 ms centered around time t. We then perform the maximum likelihood fit to the spiking data on the plane with periodic boundaries. The goodness of fit is measured by comparing the Gaussian model with a null hypothesis that the firing rate profile is uniform. If the log-likelihood ratio is >2, then the Gaussian profile is better supported by the data than the null model; we regard any fit with a log-likelihood ratio <2 as an irregular spiking state. We can then track the localized activity pattern based on its fitted center positions.
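The detection procedure can be sketched as follows (an illustrative reimplementation with assumed grid size and bump parameters): generate Poisson counts from a planted Gaussian bump, brute-force the maximum likelihood center with known width and height, and compare against the uniform null; the ln(ni!) term cancels in the ratio and is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 21
yy, xx = np.mgrid[0:N, 0:N]
true_c, sigma, height = (8.0, 13.0), 3.0, 10.0   # expected counts folded into height

# Planted Gaussian bump and one realization of Poisson spike counts
lam = height * np.exp(-((yy - true_c[0]) ** 2 + (xx - true_c[1]) ** 2) / (2 * sigma ** 2))
counts = rng.poisson(lam)

def loglik(c):
    """Poisson log-likelihood of a Gaussian bump centered at c (ln n! dropped)."""
    mu = height * np.exp(-((yy - c[0]) ** 2 + (xx - c[1]) ** 2) / (2 * sigma ** 2))
    mu = np.maximum(mu, 1e-12)
    return np.sum(counts * np.log(mu) - mu)

# Brute-force maximum likelihood over candidate centers on the grid
lls = np.array([[loglik((i, j)) for j in range(N)] for i in range(N)])
best = np.unravel_index(np.argmax(lls), lls.shape)

# Null model: uniform rate with the same per-bin ML estimate (the mean count)
mu0 = max(counts.mean(), 1e-12)
ll0 = np.sum(counts * np.log(mu0) - mu0)
llr = lls[best] - ll0
```

In the paper's pipeline the width and height are also fitted and distances respect the periodic boundaries; fixing them here keeps the sketch short while preserving the likelihood-ratio logic.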

Power-law fitting and Lévy α-stable distribution

Using maximum likelihood methods (), we fit power laws to the pattern displacement X. The fitting function is f(x) ∝ x^−β for xmin ≤ x ≤ xmax, where f is the probability density function of X. The lower bound xmin, the upper bound xmax, and the exponent β are the fitting parameters. For each possible combination of xmin and xmax, we estimate the exponent via the maximum likelihood method and the KS test. We then select the values that give the minimum KS statistic. After finding the best-fit power law, we further assess the goodness of fit via the log-likelihood ratio against other candidate distributions and the Vuong test. The symmetric Lévy α-stable distribution, denoted as 𝒮α𝒮(α, γ), has a tail index α and a scale parameter γ; its probability density is defined through its characteristic function (). The probability density exhibits a heavy tail, with a power-law asymptote of the form p(x) ∝ |x|^−(α+1) as |x| → ∞, for all 0 < α < 2.
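The maximum likelihood step for a fixed lower bound can be illustrated with the standard continuous power-law estimator (in the style of Clauset et al.; a sketch on synthetic data with an unbounded tail, so xmax is effectively infinite).

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true, xmin, n = 2.5, 1.0, 50_000

# Inverse-transform sampling from p(x) ∝ x^(-beta) for x >= xmin
u = rng.uniform(size=n)
x = xmin * (1.0 - u) ** (-1.0 / (beta_true - 1.0))

# Continuous maximum likelihood estimator of the exponent
beta_hat = 1.0 + n / np.sum(np.log(x / xmin))
```

The estimator's standard error is (β − 1)/√n, so with 50,000 samples the recovered exponent is tight around the true value; the paper's pipeline additionally scans (xmin, xmax) pairs and selects by the KS statistic.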

MUA, LFP proxy, and wavelet analysis

MUA is defined as the summed activity of a local group of neurons whose distance to an electrode is within a five-grid-point radius, binned at 1 ms; the result is not sensitive to radii from two to eight grid points. To obtain an LFP signal in our model, we use a proxy: the LFP at a simulated electrode at r is a Gaussian-weighted sum of the absolute values of the total currents received by the neurons in each population α from each population β, with the spatial scale of the Gaussian window σ = 8 grid units (). Note that our results are not sensitive to this spatial scale. To detect gamma bursts, we filter the LFPs from 1 to 200 Hz with a fourth-order Butterworth filter. We then perform wavelet transforms on the filtered LFPs using complex Morlet wavelets with bandwidth parameters that are 1.5 times the center frequencies ().

Modulation index of PAC

To quantify PAC, we introduce a modulation index (MI) (). We first filter the raw LFP at the two frequency bands (the phase frequency fp and the amplitude frequency fA); we denote the filtered LFPs as LFPfp and LFPfA. We next perform the Hilbert transform on the two filtered LFPs to obtain the time series of the phases of LFPfp (denoted as Φ) and the time series of the amplitude envelope of LFPfA (denoted as A). We then calculate the normalized mean amplitude at phase bin j

P(j) = ⟨A⟩Φ(j) / Σk ⟨A⟩Φ(k)

where ⟨A⟩Φ(j) is the mean of A in phase bin j; the number of phase bins is N = 20. The MI is proportional to the Kullback-Leibler distance (DKL) between the distribution P and the uniform distribution U, MI = DKL(P, U)/ln N. Therefore, when MI = 0, the mean amplitude is uniformly distributed over the phases; when MI = 1, P is a Dirac-like distribution.
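A minimal implementation of this MI, computed on synthetic phase and amplitude series (skipping the filtering and Hilbert steps by constructing the phase and envelope analytically; the 6-Hz phase and 0.8 modulation depth are illustrative choices).

```python
import numpy as np

def modulation_index(phase, amp, n_bins=20):
    """Tort-style MI: KL divergence of the phase-binned mean amplitude
    distribution from uniform, normalized by ln(n_bins)."""
    bins = np.floor((phase % (2 * np.pi)) / (2 * np.pi / n_bins)).astype(int)
    mean_amp = np.array([amp[bins == j].mean() for j in range(n_bins)])
    p = mean_amp / mean_amp.sum()
    return np.sum(p * np.log(p * n_bins)) / np.log(n_bins)

t = np.arange(0, 10, 1e-3)                     # 10 s at 1 kHz
theta_phase = 2 * np.pi * 6 * t                # 6-Hz phase ramp
coupled = 1.0 + 0.8 * np.cos(theta_phase)      # gamma envelope locked to theta
flat = np.ones_like(t)                         # no coupling

mi_coupled = modulation_index(theta_phase, coupled)
mi_flat = modulation_index(theta_phase, flat)
```

A flat envelope gives MI = 0 exactly, and the phase-locked envelope gives a clearly positive MI, matching the interpretation given above.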

Cross-correlation between reaction time and MUA

The time course of the reaction time is a vector of mean reaction times with an entry for each target onset time, referred to as the time course of reaction time as in () and denoted as RT(t). The cross-correlation between RT(t) and MUA(t) is calculated as the standard normalized cross-correlation over time lags τ, C(τ) = ⟨[RT(t) − ⟨RT⟩][MUA(t + τ) − ⟨MUA⟩]⟩/(σRT σMUA).
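A minimal normalized cross-correlation over lags, applied to synthetic series in which one signal is a delayed copy of the other (the 5-sample lag and noise level are illustrative assumptions).

```python
import numpy as np

def norm_xcorr(a, b, max_lag):
    """Normalized cross-correlation of two signals over lags in [-max_lag, max_lag]."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    lags = np.arange(-max_lag, max_lag + 1)
    out = []
    for L in lags:
        if L >= 0:
            out.append(np.mean(a[: len(a) - L] * b[L:]))   # a(t) vs b(t + L)
        else:
            out.append(np.mean(a[-L:] * b[: len(b) + L]))
    return lags, np.array(out)

rng = np.random.default_rng(0)
mua = rng.normal(size=2000)
rt = np.roll(mua, 5) + 0.1 * rng.normal(size=2000)   # RT lags MUA by 5 samples
lags, c = norm_xcorr(mua, rt, 20)
best_lag = lags[np.argmax(c)]
```

The peak of c sits at the imposed lag, which is how a lead-lag relation between reaction time and MUA would be read off in practice.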

Attention maps and their evaluation metrics

The human attention map G is built by inserting 1's at human fixation locations and convolving the result with a Gaussian kernel (SD = 34) for smoothing (). The simulated attention map A is generated by a similar method but based on the simulated fixation locations over 3 s (the results are not sensitive to this time interval as long as it is longer than 1.5 s); the same Gaussian kernel is convolved with the map of fixation locations in our model. The calculation of the fixation locations of the attention activity pattern is based on its movement. Because of the Lévy motion nature of the pattern, its trajectory consists of clusters of small steps intermittently interspersed with long jumps. To distinguish them, we set a threshold at the 97th percentile of displacements over 5 ms. The averaged positions of the individual clusters are the fixation points of the attention pattern. We then use two metrics to evaluate A against G: 1) the 2D Pearson correlation coefficient (CC) and 2) the shuffled AUC (sAUC) (). AUC is the area under the receiver operating characteristic (ROC) curve. Using this score, human fixations are considered the positive set, and some points sampled nonuniformly from the image form the negative set. The attention map is then treated as a binary classifier separating the positive samples from the negative ones. By thresholding over the saliency map and plotting the true positive rate versus the false positive rate, an ROC curve is obtained for each image. The ROC curves are then averaged over all images, and the area under the final ROC curve is calculated. Perfect prediction corresponds to a score of 1, whereas a score of 0.5 indicates chance level.
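The two metrics can be sketched generically (this plain AUC illustrates the ROC construction; sAUC additionally draws the negative set nonuniformly, e.g., from other images' fixations, which is omitted here).

```python
import numpy as np

def pearson_cc(a, g):
    """2D Pearson correlation coefficient between two maps."""
    a, g = a.ravel().astype(float), g.ravel().astype(float)
    a = a - a.mean()
    g = g - g.mean()
    return np.dot(a, g) / np.sqrt(np.dot(a, a) * np.dot(g, g))

def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic:
    fraction of positive/negative pairs ranked correctly, ties counted half."""
    pos = np.asarray(pos_scores, float)
    neg = np.asarray(neg_scores, float)
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties
```

The rank form of AUC is equivalent to sweeping a threshold over the map and integrating the resulting ROC curve, which is the procedure described above.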

Sampling efficiency

Sampling efficiency of the attention activity pattern

To quantify the sampling efficiency of the attention activity patterns, we count the number of sampled locations during a fixed time interval. First, the network space (63 by 63) is reduced to 9 by 9 by coarse-graining. Second, if the focus point of the attention activity pattern falls in a unit square of the reduced space that contains a locally most salient part, that unit square is counted as a sampled location. The locally most salient points (O1, O2, ⋯) are identified sequentially: O1 is the point with the maximum value in the saliency map; the pixel values within 5 pixels of O1 are then set to −1; O2 is the point with the maximum value in the resulting map; and this process continues until the whole map equals −1.
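The sequential extraction of locally most salient points can be sketched as follows; `salient_points` is a hypothetical helper name, and a Euclidean disc is assumed for "the area within 5 pixels":

```python
import numpy as np

def salient_points(saliency, radius=5):
    """Sequentially extract locally most salient points: take the global
    maximum, set the area within `radius` pixels of it to -1 (a Euclidean
    disc is assumed here), and repeat until the whole map equals -1."""
    s = saliency.astype(float).copy()
    ii, jj = np.indices(s.shape)
    points = []
    while np.any(s > -1):
        i, j = np.unravel_index(np.argmax(s), s.shape)
        points.append((int(i), int(j)))
        s[(ii - i) ** 2 + (jj - j) ** 2 <= radius ** 2] = -1
    return points

# Two isolated peaks are recovered in order of saliency.
toy = np.zeros((20, 20))
toy[5, 5], toy[15, 15] = 2.0, 1.0
pts = salient_points(toy)
```
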

Sampling efficiency of random walks

We use a simple model (Eq. 1) to simulate the attention activity pattern dynamics. The drift term b(x) is defined in terms of the partial Riesz fractional derivative, which we estimate with methods similar to those proposed by Çelik and Duman (). It is computationally difficult to calculate the derivative [or gradient, ∇ log π(x)] of complex natural scenes. We thus use a number of Gaussian functions to approximate the patchy, salient parts of natural scenes (a typical example is shown in Fig. 9B): π(x) = (1/Z) Σk 𝒢(μk, σI), where Z is the normalization coefficient and 𝒢(μ, σI) is the Gaussian function with peak location μ and SD σ = 4. I is the two-by-two identity matrix, and each μk is a locally most salient position of a saliency map (63 by 63); we choose the number of salient patches K = 15, and the result is not sensitive to K. The Euler method is used to integrate the stochastic differential equation with a step size Δt of 0.001. The initial position of the random walker is drawn from a uniform distribution on the 2D probability landscape, and the walker's positions are constrained by periodic boundary conditions on the landscape. Within 100,000 steps, if the random walker visits a Gaussian patch, the patch is counted as a sampled location; repeated visits to the same location are excluded. The results are not sensitive to these parameters.
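A simplified version of this sampling-efficiency simulation is sketched below. As stated assumptions, it omits the drift term b(x), uses Cauchy (α = 1 stable) increments as a generic heavy-tailed stand-in for the fractional dynamics, draws patch centres at random rather than from a saliency map, and runs fewer steps than the 100,000 used in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 63                                   # periodic landscape size
K = 15                                   # number of Gaussian salient patches
sigma = 4.0                              # patch SD
mus = rng.uniform(0, L, size=(K, 2))     # random patch centres (illustrative)

def count_sampled_patches(n_steps=20_000, scale=0.05):
    """Heavy-tailed random walk on the periodic landscape; a patch counts
    as sampled when the walker comes within one SD of its centre, and
    repeated visits are excluded via the set."""
    x = rng.uniform(0, L, size=2)
    visited = set()
    for _ in range(n_steps):
        # Cauchy (alpha = 1 stable) increments: clusters of small steps
        # interspersed with occasional long jumps.
        x = (x + scale * rng.standard_cauchy(2)) % L
        d = np.linalg.norm((mus - x + L / 2) % L - L / 2, axis=1)  # periodic
        k = int(np.argmin(d))
        if d[k] < sigma:
            visited.add(k)
    return visited

visited = count_sampled_patches()
```
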

A firing rate model

We use a 2D firing rate model with neural adaptation in the form of synaptic depression (). Individual units are embedded on a 2D plane with periodic boundary conditions, and the distance between two units is 7.4 μm, the same as in the spiking neural circuit model. The firing rate of neuron i at time t, mi(t), is described by

τ dmi/dt = −mi + f(Ii)

where the transfer function is f(I) = α log(1 + e^(I/α)). The input current to neuron i is the sum of a recurrent and an external contribution, Ii = Ii^rec + Ii^ext. The recurrent contribution to the current is

Ii^rec = Σj Wij sj(t) mj(t)

The synaptic strength Wij is determined by the distance between units i and j, which are assigned spatial coordinates r = (x, y), with x, y ∈ [0, 2π]; J1 and J0 are the excitatory and inhibitory coupling strengths, respectively. The dynamic variable si(t) represents the depression strength of the connections from presynaptic unit i to all its postsynaptic neighbors, following the dynamics

dsi/dt = (1 − si)/τR − U si mi

where τR is the time constant of recovery from synaptic depression and U is the fraction of synaptic resources used by each spike. The external current contains two terms,

Ii^ext = I0 + Ii^stim

where I0 is a spatially uniform, constant current. The second term comprises the external inputs to neuron i, given by Gaussian-profile stimuli centered at sA and sB with widths σA = σB = 106 and contrast levels cA = cB = 0.03. Other network parameters are τ = 10 ms, τR = 800 ms, J1 = 38.85, J0 = 24.7, U = 0.8, and α = 0.001. The neural network consists of 151 × 151 neurons with periodic boundary conditions. Simulations of the model are performed using the Euler method with a time step of 0.05 ms. The bifurcation diagram is constructed by ramping up the excitatory coupling strength J1 and plotting the extrema of the local population activity over 5 s; the local population activity is the sum of the firing rates mi over the population of neurons whose receptive fields contain one of the two inputs.
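A reduced sketch of these rate and depression equations on a 1D ring is given below. The cosine coupling kernel, the uniform current I0 = 0.5, and the ring size are assumed stand-ins (the exact distance-dependent form of W is not reproduced here, and the paper's model is 2D with 151 × 151 units); τ, τR, U, α, J1, and J0 take the values listed above:

```python
import numpy as np

# Reduced 1D-ring sketch of the rate model with synaptic depression; the
# cosine coupling kernel is an assumed stand-in for the distance-dependent W.
N = 64
tau, tau_R, U, alpha = 10.0, 800.0, 0.8, 0.001   # ms, ms, -, -
J1, J0 = 38.85, 24.7                             # excitatory / inhibitory
I0 = 0.5                                         # uniform current (assumed)
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
W = (J1 * np.cos(theta[:, None] - theta[None, :]) - J0) / N

def f(I):
    """Numerically stable softplus transfer: alpha * log(1 + exp(I/alpha))."""
    return alpha * np.logaddexp(0.0, I / alpha)

dt = 0.05                                        # ms, Euler step as in the text
m = 0.01 * np.random.default_rng(1).random(N)    # firing rates
s = np.ones(N)                                   # depression variables
for _ in range(int(200 / dt)):                   # simulate 200 ms
    I = W @ (s * m) + I0                         # recurrent + uniform current
    m += dt / tau * (-m + f(I))
    s += dt * ((1 - s) / tau_R - U * s * m)
    s = np.clip(s, 0.0, 1.0)                     # keep resources physical
```

The depression variable s transiently collapses when local activity surges, which is the mechanism that terminates each activity burst in this class of models.
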
References (65 in total; 10 shown):

1. Okun M, Lampl I. Instantaneous correlation of excitation and inhibition during ongoing and sensory-evoked activities. Nat Neurosci, 2008.

2. Engbert R, Trukenbrod HA, Barthelmé S, Wichmann FA. Spatial statistics and attentional dynamics in scene viewing. J Vis, 2015.

3. Munoz DP, Armstrong IT, Hampton KA, Moore KD. Altered control of visual fixation and saccadic eye movements in attention-deficit hyperactivity disorder. J Neurophysiol, 2003.

4. Henderson JM, Hayes TR. Meaning-based guidance of attention in scenes as revealed by meaning maps. Nat Hum Behav, 2017.

5. Lisman JE, Idiart MA. Storage of 7 ± 2 short-term memories in oscillatory subcycles. Science, 1995.

6. Madison DV, Nicoll RA. Control of the repetitive discharge of rat CA1 pyramidal neurones in vitro. J Physiol, 1984.

7. Hills TT, Todd PM, Lazer D, Redish AD, Couzin ID. Exploration versus exploitation in space, mind, and society. Trends Cogn Sci, 2014. (Review)

8. Zhang L, Tong MH, Marks TK, Shan H, Cottrell GW. SUN: A Bayesian framework for saliency using natural statistics. J Vis, 2008.

9. Gu Y, Qi Y, Gong P. Rich-club connectivity, diverse population coupling, and dynamical activity patterns emerging from local cortical circuits. PLoS Comput Biol, 2019.

10. Liu Y, Long X, Martin PR, Solomon SG, Gong P. Lévy walk dynamics explain gamma burst patterns in primate cerebral cortex. Commun Biol, 2021.
Cited by (2 in total):

1. Liu F, Zhao R. Enhancing spiking neural networks with hybrid top-down attention. Front Neurosci, 2022.

2. Qi Y, Gong P. Fractional neural sampling as a theory of spatiotemporal probabilistic computations in neural circuits. Nat Commun, 2022.
