Literature DB >> 24212672

Attention modulates spatial priority maps in the human occipital, parietal and frontal cortices.

Abstract

Computational theories propose that attention modulates the topographical landscape of spatial 'priority' maps in regions of the visual cortex so that the location of an important object is associated with higher activation levels. Although studies of single-unit recordings have demonstrated attention-related increases in the gain of neural responses and changes in the size of spatial receptive fields, the net effect of these modulations on the topography of region-level priority maps has not been investigated. Here we used functional magnetic resonance imaging and a multivariate encoding model to reconstruct spatial representations of attended and ignored stimuli using activation patterns across entire visual areas. These reconstructed spatial representations reveal the influence of attention on the amplitude and size of stimulus representations within putative priority maps across the visual hierarchy. Our results suggest that attention increases the amplitude of stimulus representations in these spatial maps, particularly in higher visual areas, but does not substantively change their size.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：
Oxygen

Year: 2013 PMID： 24212672 PMCID： PMC3977704 DOI： 10.1038/nn.3574

Source DB: PubMed Journal: Nat Neurosci ISSN： 1097-6256 Impact factor: 24.884

Prominent computational theories of selective attention posit that basic properties of visual stimuli are encoded in a series of interacting ‘priority’ maps that are found at each stage of the visual system[1-6]. The maps in different areas are thought to encode different stimulus features (e.g. orientation, color, motion) based on the selectivity of component neurons. Two general themes governing the organization of these maps have emerged. First, accurately encoding the spatial location of relevant stimuli is the fundamental goal of these priority maps, as spatial position is necessary to guide saccadic eye movements (and other exploratory and reflexive motor responses). Second, priority maps early in the visual system primarily reflect the physical salience of stimuli in the visual field, whereas priority maps in later areas increasingly index the behavioral relevance of stimuli, independent from physical salience[4,5]. Although many studies have investigated the influence of spatial attention on single unit neural activity over the last several decades[7-17], directly examining the impact of attention on the topographic profile across an entire spatial priority map is a major challenge because single-units have access to a limited window of the spatial scene[5]. This is a key limitation, because the relationship between changes in the size and amplitude of individual spatial receptive fields (RFs; or voxel-level RFs across populations of neurons) and changes in the fidelity of population-level spatial encoding are not related in a straightforward manner (see ref [18] for a discussion of this issue with respect to population codes for orientation). For example, if spatial RFs are uniformly shrunk by attention while viewing a stimulus, the population-level spatial representation (or priority map) carried by all those neurons might shrink or become sharper, but the code may be more vulnerable to uncorrelated noise (as there is less redundant coding of any given spatial position by the population). Alternatively, a uniform increase in spatial RF size might blur or increase the size of a spatial representation encoded by a population, but such a representation might be more robust to neural noise due to increased redundancy. Further complicating matters is the observation that spatial RFs have been shown to both increase and decrease in size with attention as a function of where the spatial RF is positioned relative to the attended stimulus. Spatial RFs tuned near an attended stimulus grow, and spatial RFs fully encompassing an attended stimulus shrink[10,19-23]. These RF size changes occur in parallel to changes in the amplitude (gain) of neural responses with attention[7-17]. Thus, the net impact of all of these changes on the fidelity of population-level spatial representations is unclear, and addressing this issue requires assessing how attention changes the profile of spatial representations encoded by the joint, region-level pattern of activity. Here, we assessed the modulatory role of attention on the spatial information content of putative priority maps by using an encoding model to reconstruct spatial representations of attended and unattended visual stimuli based on multivariate BOLD fMRI activation patterns within visually responsive regions of occipital, parietal, and frontal cortex. These reconstructions can be considered to reflect region-level spatial representations and they allow us to quantitatively track changes in parameters which characterize the topography of spatial maps within each region of interest (ROI). Importantly, this technique exploits the full multivariate pattern of BOLD signal across an entire region to evaluate the manner in which spatial representations are modulated by attention, rather than comparing multivariate decoding accuracy or considering the univariate response of each voxel in isolation. This approach can be used to examine mechanisms of attentional modulation that cannot be easily characterized by measuring changes in either the univariate mean BOLD signal or decoding accuracy[24-33] (Fig. 1, see also ref [34]).

Figure 1

The effects of spatial attention on region-level priority maps

Spatial attention might act via one of several mechanisms to change the spatial representation of a stimulus within a putative priority map.(a) The hypothetical spatial representation carried across an entire region in response to an unattended circular stimulus. (b) Under one hypothetical scenario, attention might enhance the spatial representation of the same stimulus by amplifying the gain of the spatial representation (i.e. multiplying the representation by a constant greater than 1). (c) Alternatively, attention might act via a combination of multiple mechanisms such as increasing the gain, decreasing the size, and increasing the baseline activity of the entire region (i.e. adding a constant to the response across all areas of the priority map). (d) Cross-sections of panels a–c. Note that this is not meant as an exhaustive description of different attentional modulations. (e) These different types of attentional modulation can give rise to identical responses when the mean BOLD response is measured across the entire expanse of a priority map. Note that simple Cartesian representations, such as those shown in a–c, may be visualized in early visual areas where retinotopy is well-defined at the spatial resolution of the BOLD response. However, later areas might still encode precise spatial representations of a stimulus even when clear retinotopic organization is not evident, so using alternative methods for reconstructing stimulus representations, such as the approach described in Figure 3, is necessary to evaluate the fidelity of information encoded in putative attentional priority maps.

Our results reveal that spatial attention increased the amplitude of region-level stimulus representations within putative priority maps carried by areas of occipital, parietal, and frontal cortex. However, we found little evidence that attention changes the size of stimulus representations in region-level priority maps, even though we observed increases in spatial filter size at the single-voxel level. In addition, the reconstructed spatial representations based on activation patterns in later regions of occipital, parietal, and frontal cortex showed larger attentional modulation than those from early areas, consistent with the hypothesis that representations in later regions increasingly transition to more selectively represent relevant stimuli[4,5]. These changes in the gain of spatial representations should theoretically increase the efficiency with which information about relevant objects in the visual field can be processed and subsequently used to guide perceptual decisions and motor plans[18].

Results

To evaluate how task demands influence the topography of spatial representations within different areas of the visual system, we designed a BOLD fMRI experiment that required participants to perform one of three tasks using an identical stimulus display (Fig. 2a). On each trial, participants (n = 8) maintained fixation at the center of the screen (see Online Methods: Eyetracking, Supplementary Fig. 1) while a full-contrast flickering checkerboard was presented in one of 36 spatial locations that sampled 6 discrete eccentricities (Fig. 2b). Participants either reported a faint contrast change at the fixation point (the “attend fixation” condition), reported a faint contrast change of the flickering checkerboard stimulus (the “attend stimulus” condition), or performed a spatial working memory task in which they compared the location of a probe stimulus, T2, with the remembered location of a target stimulus, T1, presented within the radius of the flickering checkerboard(the “spatial working memory” condition, see Fig. 2c). The spatial working memory task was included as an alternate means of inducing focused and sustained spatial attention around the stimulus position[35].

Figure 2

Task design & behavioral results

(a) Each trial consisted of a 500 ms target stimulus (T1), a 3000 ms flickering checkerboard (6 Hz, full contrast, 2.34° diameter), and a 500 ms probe stimulus (T2). T1 & T2 were at the same location on 50% of trials, and slightly offset on the remaining 50% of trials. During the stimulus presentation period, the stimulus dimmed briefly on 50% of trials and the fixation point dimmed on 50% of trials (each independently randomly chosen). Participants maintained fixation throughout the experiment, and eye position measured during scanning did not vary as a function of either task demands or stimulus position (see Supplementary Fig. 1). (b) On each trial, a single checkerboard stimulus appeared at one of 36 overlapping spatial locations with a slight spatial offset between runs (see Online Methods). Each spatial location was sampled once per run. This 6 × 6 grid of stimulus locations probes 6 unique eccentricities, as indicated by the color code of the dots (not present in actual stimulus display). (c) On alternating blocks of trials, participants either detected a dimming of the fixation point (attend fixation), detected a dimming of the checkerboard stimulus (attend stimulus), or they indicated if the spatial position of T1 and T2 matched (spatial working memory). Importantly, all tasks used a physically identical stimulus display – only the task demands varied. Each participant completed between 4 and 6 scanning runs of each of the 3 tasks. (d) For the attend fixation task, performance was better when the stimulus was presented at peripheral locations. In contrast, performance declined with increasing stimulus eccentricity in the attend stimulus and spatial working memory conditions. All error bars reflect ±1 S.E.M.

On average, performance in the attend fixation task was slightly, though non-significantly, higher than in the attend stimulus or spatial working memory tasks (Fig. 2d, main effect of condition: F(2,14) = 0.951, p = 0.41; attend fixation: 87.37 ± 6.46%, attend stimulus: 81.00 ± 6.67%, spatial working memory: 80.00 ± 2.09% accuracy, mean ± S.E.M.). However, we observed a different pattern of response errors across the 3 task demands: accuracy for the attend fixation condition was lowest on trials in which the flickering checkerboard stimulus was presented near fixation, whereas accuracy dropped off with increasing stimulus eccentricity for the attend stimulus and spatial working memory tasks (Fig. 2d, condition × eccentricity interaction: F(10,70) = 7.235, p < 0.0001). To compare spatial representations carried within different brain regions as a function of task demands, we first functionally identified 7 ROIs in each hemisphere of each participant using independent localizer techniques (see Online Methods: Functional localizers; Supplementary Table 1). Next we used an encoding model[36] (see also refs. [34,37,38]) to reconstruct a spatial representation of the stimulus that was presented on each trial using activation patterns from each ROI (Fig. 3). This method results in a “spatial representation” of the entire visual field measured on each trial that is constrained by activation across all voxels within each ROI. As a result, we obtain average spatial representations for each stimulus position for each ROI for each task condition which accurately reflect the stimulus viewed by the observer (Fig 4a). This method linearly maps high dimensional voxel space to a lower-dimensional spatial information space that corresponds to visual field coordinates (see Online Methods: Encoding Model for details).

Figure 3

Encoding model used to reconstruct spatial representations of visual stimuli

Spatial representations of stimuli in each of the 36 possible positions were estimated separately for each ROI. (a) Training the encoding model: a set of linear spatial filters forms the basis set, or “information channels”, that we use to estimate the spatial selectivity of the BOLD responses in each voxel (see Online Methods: Encoding model, Supplementary Figs. 2 & 3). The shape of these filters determines how each information channel should respond on each trial given the position of the stimulus that was presented (thus forming a set of regressors, or predicted channel responses). Then, we constructed a design matrix by concatenating the regressors generated for each trial. This design matrix, in combination with the measured BOLD signal amplitude on each trial, was then used to estimate a weight for each channel in each voxel using a standard general linear model (GLM). (b) Estimating channel responses: given the known spatial selectivity (or weight) profile of each voxel as computed in step a, we then used the pattern of responses across all voxels on each trial in the ‘test’ set to estimate the magnitude of the response in each of the 36 information channels on that trial. This estimate of the channel responses is thus constrained by the multivariate pattern of responses across all voxels on each trial in the test set, and results in a mapping from voxel space (hundreds of dimensions) onto a lower-dimensional channel space (36 dimensions, for mathematical details see Online Methods). Finally, we produced a smooth reconstructed spatial representation on every trial by summing the response of all 36 filters after weighting them by the respective channel responses on each trial. An example of a spatial representation computed from a single trial using data from V1 when the stimulus was presented at the location depicted in (a) is shown in the lower right panel.

Figure 4

Task demands modulate spatial representations

(a) Reconstructed spatial representations of each of 36 flickering checkerboard stimuli presented in a 6 × 6 grid. All 36 stimulus locations are shown, with each location’s representation averaged across participants (n = 8) using data from bilateral V1 during attend stimulus runs. One participant was not included in this analysis (AG3, see Supplementary Fig. 4). Each small image represents the reconstructed spatial representation of the entire visual field, and the position of the image in the panel corresponds to the location of the presented stimulus. (b) A subset of representations (corresponding to the upper left quadrant of the visual field, dashed box in a) for each ROI and each task condition. Results are similar for other quadrants (not shown, although see Fig. 5 for aggregate quantification of all reconstructions). All reconstructions in a and b are shown on the same color scale.

As a point of terminological clarification, we emphasize that we are reporting estimates of the spatial representation of a stimulus display based on the distributed activation pattern across all voxels within a ROI. Throughout the Results section, we will therefore refer to our actual measurements as “reconstructed spatial representations”. However, in the Discussion, we will interpret these measurements in the context of putative attentional priority maps that are thought to play a key role in shaping perception and decision making[1-6].

Reconstructed spatial representations of visual stimuli

Reconstructed spatial representations based on activation patterns in each ROI exhibited several qualitative differences as a function of stimulus eccentricity, task demands, and ROI (which we more formally quantify below). First, we note that representations were very precise in V1 (Fig 4a), and became successively coarser and more diffuse in areas of extrastriate, parietal, and frontal cortex (Fig. 4b). Similarly, representations of more eccentric stimuli were more diffuse compared to more foveal stimuli (e.g., compare eccentric to foveal representations within each ROI). We also observed higher fidelity representations of the upper visual field when using only voxels from the ventral aspects of V2 and V3, and higher fidelity representations of the lower visual field when using only voxels from the dorsal aspects of these regions (Supplementary Fig. 5a). This observation, which is consistent with known receptive field locations in non-human primates, confirms that our encoding model method recovered known properties of these visual subregions and that these reconstructions were not merely the result of fitting idiosyncratic aspects of our particular data set (i.e. overfitting noise). We further demonstrated this point by using the model to reconstruct representations of completely novel stimuli (Supplementary Fig. 5b). Second, the profile of reconstructed spatial representations within many regions also varied with task demands, consistent with the notion that these spatial representations reflect spatial maps of attentional priority. Note that, especially in hV4, hMT+, the intraparietal sulcus (IPS), and superior precentral sulcus (sPCS), the magnitude of the spatial representations increased when the participant was either attending to the flickering checkerboard stimulus or performing the spatial working memory task compared to when they were performing a task at fixation.

Size of spatial representations across eccentricity & ROI

Before formally evaluating the effects of attention on the profile of spatial representations, we first sought to quantify changes in the size of these representations due to stimulus eccentricity and ROI for comparison with known properties of the primate visual system. To this end, we fit a smooth surface to the spatial representations associated with each of the three task conditions separately for each of the 36 possible stimulus locations in each ROI (see Online Methods: Curvefitting and Supplementary Fig. 2). These fits generated an estimate of the amplitude, baseline offset, and the size of the represented stimulus within each reconstructed spatial representation. We averaged the fit parameters obtained from each ROI across stimulus locations that were at equivalent eccentricities and then across participants (yielding 6 sets of fit parameters, one set for each of the 6 possible stimulus eccentricities, see color code in Fig. 2b). We then used these fit parameters to make inferences about how the magnitudes and shapes of spatial representations of stimuli from each ROI varied across stimulus positions. First, we quantified the accuracy of fits by computing the Euclidean distance between the centroid of the fit function and the actual location of the stimulus across all eccentricities and task conditions. The estimated centroids were generally accurate and closely tracked changes in stimulus location (Fig 4a). However, the distances between fit centroids and the actual stimulus positions in sPCS were nearly double those of the next least accurate region, hMT+ (sPCS: 3.01° ± 0.077°, hMT+: 1.68° ± 0.17°, mean ± S.E.M.). Error distances in all other areas were relatively small (V1: 0.67° ± 0.084°, V2: 0.77° ± 0.12°, V3: 0.75° ± 0.095°, hV4: 1.16° ± 0.13°, IPS: 1.46° ± 0.20°). Thus, the relatively low correspondence between the estimated and actual stimulus position based on data from the sPCS suggests that the resulting fit parameters should be interpreted with caution (we return to this point in the Discussion). In early visual ROIs V1, V2, V3, and hV4, the size of the reconstructed spatial representations increased with increasing eccentricity, regardless of task condition (Fig. 5, main effect of eccentricity, 2-way ANOVA within each ROI, all p’s < 0.0004; unless otherwise specified, all statistical tests on fit parameters to spatial representations employed a non-parametric permutation procedure and corrected for multiple comparisons, see Online Methods: Statistical Methods). This increase in size with eccentricity is expected, given the use of a constant stimulus size and the well-documented increase in the size of spatial RFs in early visual areas with increasing eccentricity[39]. In addition, the size of the reconstructed stimulus representations also increased systematically from V1 to sPCS, which is also consistent with the known expansion of mean spatial RF sizes in parietal and frontal cortex[40,41] (3-way ANOVA, significant main effect of ROI on fit size, p < 0.0001).

Figure 5

Fit parameters to reconstructed spatial representations, averaged across like eccentricities

For each participant, we fit a smooth 2D surface (see Online Methods: Curvefitting) to the average reconstructed stimulus representation in all 36 locations, separately for each task condition and ROI. We allowed the amplitude, baseline, size, and center ({x,y} coordinate) of the fit basis function to vary freely during fitting. Fit parameters were then averaged within each participant across like eccentricities, and then averaged across participants. The size of the best fitting surface varied systematically with stimulus eccentricity and ROI, but did not vary as a function of task condition. In contrast, the amplitude of the best fitting surface increased with attention in hV4, hMT+ and sPCS (with a marginal effect in IPS, see text).

*, †, × indicate main effect of task condition, eccentricity, and interaction between task and eccentricity, respectively at the p < 0.05 level, corrected for multiple comparisons (see Online Methods: Statistical Procedures). Grey symbols indicate trends at the p < 0.025 level, uncorrected for multiple comparisons. Error bars reflect within-participant S.E.M.

One alternative explanation is that the size of represented stimuli increases with eccentricity because there is more trial-to-trial variability in the center point of the represented stimulus within reconstructions at more peripheral stimulus locations. In turn, this increase in trial-to-trial variability would ‘smear’ the spatial representations, leading to larger size estimates. However, our data speak against this possibility as increased variability in the reconstructed stimulus locations would also result in lower estimated amplitudes, so increases in fit size and decreases in fit amplitude across conditions would always be yoked and correlating the change in amplitude and the change in size within each eccentricity across each condition pair would reveal a negative correlation (e.g., if the size of the spatial representation measured at a given eccentricity increases with attention, then the amplitude decreases). No combinations of condition pair, eccentricity and ROI revealed a significant correlation between change in amplitude and change in size (all p’s > 0.05, corrected using FDR, see Online Methods: Statistical procedures). Furthermore, in a follow-up analysis we computed the population receptive field (pRF) for each voxel[42], which revealed that voxels tuned to more eccentric visual field positions have a larger pRF size (Supplementary Figs. 8–9; Supplementary Table 2). This combination of analyses supports the conclusion that increases in fit size with increases in stimulus eccentricity are not purely due to increased variability in reconstructed spatial representations.

Effects of attention on spatial representations

Despite being sensitive to expected changes in representation size based on anatomical properties of the visual system, task demands exerted a negligible influence on the size of reconstructed spatial representations, with no areas showing a significant effect (hV4 was closest at p = 0.033, but this did not survive correction for multiple comparisons, and p-values in all other regions were > 0.147). In contrast, the fit amplitude in hV4, hMT+, IPS, and sPCS is significantly modulated by task condition, with a higher amplitude in the attention and working memory conditions than in the fixation condition (Fig. 5, 3-way ANOVA, main effect of task condition, p = 0.0003). For example, in hV4, the amplitude of the best fitting surface to spatial representations of attended stimuli was higher during the attend stimulus and spatial working memory conditions compared to the attend fixation condition (2-way ANOVA, main effect of task condition, p < 0.0001). Similar effects were observed in hMT+ (2-way ANOVA, p = 0.0007) and sPCS (2-way ANOVA, p = 0.0007). A similar pattern was evident in IPS as well, but it did not survive correction for multiple comparisons (2-way ANOVA, uncorrected p = 0.011). Within individual ROIs, there was a significant interaction between task condition and eccentricity in hMT+ (p = 0.0003) with larger increases in amplitude observed for more eccentric stimuli. It is important to note that this increase in the amplitude of spatial representations with attention corresponds to a focal gain modulation that is restricted to the portion of visual space in the immediate neighborhood of the attended stimulus. Changes in fit amplitude do not result from a global increase in the BOLD signal that equally influences the response across an entire ROI; such a general and widespread modulation would be accounted for by an increase in the baseline fit parameter (see Supplementary Fig. 2; below). Finally, the impact of task condition on the amplitude of reconstructed spatial representations was more pronounced in later visual areas hV4/hMT+/IPS/sPCS compared to earlier areas V1/V2/V3 (3-way interaction between ROI, condition and eccentricity, p = 0.043). In addition to an increase in the fit amplitude of the reconstructed spatial representations, IPS and sPCS also exhibited a spatially global increase in baseline response levels across the entire measured spatial representation in the attend stimulus and spatial working memory conditions compared to the attend fixation condition (Fig. 5, 2-way ANOVAs, main effect of condition, IPS: p = 0.0014; sPCS: p = 0.0012; see Supplementary Fig. 6). The spatially non-selective increases may reflect the fact that spatial RFs in these regions are often large enough to encompass the entire stimulus display[40,41], so all stimuli might drive some increase in the response, irrespective of spatial position.

Controlling difficulty across task conditions

Slight differences in task difficulty in the first experiment (Fig. 2d) might have contributed to observed changes in the spatial representations. To address this possibility, we ran four participants from the original cohort in a second experimental session while carefully equating behavioral performance across all 3 tasks (Fig. 6a). Overall accuracy during this second session did not differ significantly across the 3 conditions, although a similar interaction is observed between task condition and stimulus eccentricity (Fig. 6a, 2-way repeated-measures ANOVA, main effect of condition: F(2,6) = 0.043, p = 0.96, condition × eccentricity interaction: F(10,30) = 3.28, p = 0.005; attend fixation: 78.8 ± 2.80%, attend stimulus: 80.0 ± 2.60%, spatial working memory: 79.8 ± 1.76%, mean ± S.E.M.). In addition, we also identified IPS visual field maps 0–3 using standard procedures so that we could more precisely characterize the effects of attention on stimulus representations in sub-regions of our larger IPS ROI (refs [31,32,43,44]; see Online Methods: Mapping IPS subregions, Supplementary Fig. 7).

Figure 6

Results are consistent when task difficulty is matched

(a) Four participants were re-scanned while carefully matching task difficulty across all three experimental conditions. As in Figure 2d, performance is better on the attend fixation task when the checkerboard is presented in the periphery, and performance on the attend stimulus and spatial working memory tasks is better when the stimulus is presented near the fovea. (b) A subset of illustrative reconstructed stimulus representations from V1, hV4, hMT+, IPS 0/1, averaged across like eccentricities (correct trials only, number of averaged trials indicated by inset). See Supplementary Figure 7 for details on IPS subregion identification.

To ensure that behavioral performance was not unduly biasing our results, we reconstructed spatial representations using only correct trials (~80% of total trials, Fig 6a). All representations were co-registered based on stimulus eccentricity before averaging (see Fig. 2b for corresponding eccentricity points). Even though our sample size was smaller (n = 4 vs. n = 8), the influence of attention on the topography of spatial representations was similar to our initial observations (Fig. 6b). In addition, mapping out retinotopic subregions of the IPS revealed that the functionally-defined IPS ROI presented in Figure 5 primarily corresponds to IPS0 and IPS1 (Supplementary Fig. 7a–b). When examining best-fit surfaces to spatial representations from this experiment (Fig. 7, fits computed using coregistered representations and only correct trials for each participant, see Online Methods: Curvefitting), we found that attention significantly modulated the amplitude across all regions (3-way ANOVA, main effect of task condition, p = 0.0162). When considered in isolation, only hV4 shows a significant change in amplitude with attention after correction for multiple comparisons (2-way repeated-measures ANOVA, p = 0.0022). However, similar trends were observed in V1, V2, and V3 (uncorrected p’s = 0.0243, 0.042, and 0.031, respectively). No significant main effect of task condition on the size of representations was found (all p’s > 0.135, minimum p for hMT+), and overall baseline levels only significantly increased as a function of task condition in hMT+ (p = 0.00197). Across all ROIs, there was a main effect of eccentricity on fit size (3-way ANOVA, p = 0.0016), but no main effect of task condition on fit size (3-way ANOVA, p = 0.423).

Figure 7

Fit parameters to spatial representations after controlling for task difficulty

As in Figure 5, a surface was fit to the averaged, coregistered spatial representations for each participant. However, in this case task difficulty was carefully matched between conditions, and representations were based solely on trials in which the participant made a correct behavioral response (Fig. 6b). Results are similar to those reported in Figure 5: attention acts to increase the fit amplitude of spatial representations in hV4, but does not act to decrease size. In hMT+, attention also acted in a non-localized manner to increase the baseline parameter. Statistics as in Figure 5. Error bars reflect within-participant S.E.M.

Population receptive fields (pRFs) expand with attention

For these same 4 participants, we computed the population receptive field (pRF, ref [42]) for each voxel in V1, hV4, hMT+ and IPS0 using data from the behaviorally-controlled replication experiment. Here, we computed pRFs by first using the initial step of our encoding model estimation procedure (Fig. 3a) to determine the response of each voxel to each position in the visual field (Supplementary Figs. 8–9, Online Methods: population receptive fields). We then fit each voxel’s response profile with the same surface used to characterize spatial representations. By comparing pRFs computed using data from each condition independently, we found that a majority of pRFs in hV4, hMT+ and IPS0 increase in size during either the attend stimulus or spatial working memory condition compared to the attend fixation condition. In contrast, pRF size in V1 was not significantly modulated by attention (Supplementary Fig. 9; see Supplementary Results for statistics). To reconcile the results that voxel-level pRFs expanded with attention, yet region-level spatial representations remained a constant size, we simulated data using estimated pRF parameters from hV4 (a region for which spatial representations increase in amplitude and pRFs increase in size; see Online Methods: Simulating data with different pRF properties) under different pRF modulation conditions. In the first condition, we generated data using pRFs with sizes centered around two mean values, resulting in a pRF scaling across all simulated voxels (average size across voxels increases, but some voxels decrease in size and others increase). Under these conditions, spatial representations increase in size (Supplementary Fig. 10a–b). In a second pRF modulation scenario, we used the fit pRF values from one participant’s hV4 ROI (Supplementary Fig. 8) to simulate data. In this case, spatial representations remained the same size, but increased in amplitude, consistent with our observations using real data (Figs. 5, 7, Supplementary Fig. 10c–d; this conclusion was also supported when pRF data from the other 3 observers was used to seed the simulation). Thus, the pattern of pRF modulations across all voxels enhances the amplitude of spatial representations while preserving their size.

Discussion

Spatial attention has previously been shown to alter the gain of single-unit responses associated with relevant visual features such as orientation[7-9,12,13,16,17] and motion direction[11,14,15], as well as to modulate the size of spatial RFs[10,19-23]. Here, we show that these local modulations jointly operate to increase the overall amplitude of the region-level spatial representation of an attended stimulus, without changing its represented size. Furthermore, these amplitude modulations were especially apparent in later areas of the visual system such as hV4, hMT+, and IPS, consistent with predictions made by computational theories of attentional priority maps[4,5]. We were able to reconstruct robust spatial representations across a range of eccentricities and for all 3 task conditions in all measured ROIs. Importantly, even though an identical reconstruction procedure was used in all areas, the size of the reconstructed spatial representations increased from early to later visual areas (Fig. 5). Single-unit receptive field sizes across cortical regions are thought to increase in a similar manner[39-41,45,46]. In addition, representations of stimuli presented at higher eccentricities were larger than representations of stimuli presented near the fovea, which also corresponds to known changes in RF size with eccentricity[39,42]. Furthermore, simulating data under conditions in which we uniformly scale the mean size of voxel-level pRFs reveals that such changes are detectable using our analysis method (Supplementary Fig. 10a–b). Thus, this technique is sensitive to detect changes in the size of spatial representations of stimuli that are driven by known neural constraints such as relative differences in RF size across cortical ROIs and eccentricity, even though these factors are not built-in to the spatial encoding model. Together, these empirical and modeling results suggest that at the level of region-wide priority maps, the representation of a stimulus does not expand or contract under the attentional conditions tested here, and underscores the importance of incorporating response changes across all encoding units when evaluating attentional modulations. The quantification method we implemented for measuring changes in spatial representations across tasks, eccentricities, and ROIs involved fitting a surface defined by several parameters: center location, amplitude, baseline offset, and size (Supplementary Fig. 2). Changes in activation which carry no information about stimulus location (such as changes in general arousal or responsiveness to stimuli presented in all locations due to large RFs) will influence the baseline parameter, as such changes reflect increased/decreased signal across an entire region. In contrast, a change in the spatial representation that changes the representation of a visual stimulus would result in a change in the amplitude or in the size parameter (or both). Here, we demonstrated that attention primarily operates by selectively increasing the amplitude of stimulus representations in several putative priority maps (Figs. 5 and 7), rather than increasing the overall BOLD signal more generally across entire regions. Interestingly, spatial reconstructions based on activation patterns from sPCS were relatively inaccurate compared to other ROIs, and this ROI primarily exhibited increases in the fit baseline parameter (Fig. 5). This region, which may be a human homolog of the functionally-defined macaque frontal eye fields[47,48] (FEF), may show degraded spatial selectivity in the present study due to the relatively large size of spatial receptive fields observed in many FEF neurons (typically ≥ 20° diameter: see ref. [41]) and the small area subtended by our stimulus display (9.31° horizontally across). Consistent with this possibility, previous reports of retinotopic organization in human frontal cortex used stimuli presented at higher eccentricities in order to resolve spatial maps (≥10°, ref. [49] to 25°, ref. [45]).

Attentional priority maps

The extensive literature on spatial “salience” or “priority” maps[1-6] postulates the existence of one or several maps of visual space, each carrying information about behaviorally relevant objects within the visual scene. Furthermore, priority maps in early visual areas (e.g. in primary visual cortex) are thought to primarily encode low-level stimulus features (e.g., contrast), whereas priority maps in later regions are thought to increasingly weight behavioral relevance over low-level stimulus attributes[4]. While many important insights have stemmed from observing single-unit responses as a function of changes in attentional priority (see [5] for a review), these results provide information about how isolated “pixels” in a priority map change under different task conditions. A previous fMRI study used multivariate decoding (classification) analyses to identify several frontal and parietal ROIs that exhibit similar activation patterns during covert attention, spatial working memory, and saccade generation tasks[32]. These results provide strong support for the notion that common priority maps support representations of attentional priority across multiple tasks. Here, we assessed how the holistic landscape across these priority maps measured using fMRI changed as attention was systematically varied. Our demonstration that spatial representation amplitude is enhanced with attention in later ROIs, but not earlier ones, supports the hypothesis that priority maps in higher areas are increasingly dominated by attentional factors, and suggests that these attentional modulations of priority maps operate via scaling the amplitude of the behaviorally-relevant item without changing its represented size.

Population receptive fields

In addition to measuring spatial representations carried by the pattern of activation across entire visual regions, we also estimated the voxel-level pRFs[42] for a subset of participants and ROIs by adding constraints to our encoding model estimation procedure (Supplementary Figs. 8–9; Online methods: population receptive fields). This alternative tool has been used previously to evaluate the aggregate spatial RF profile across all neural populations within voxels across different visual ROIs[42]. Changes in voxel-level pRFs can inform how a region dynamically adjusts the spatial sensitivity of its constituent filters in order to modulate its overall spatial priority map. First, we replicated the typical result that voxel-level pRFs tuned for more eccentric visual field positions are larger in size (Supplementary Table 2), and that pRFs for later visual regions tend to be larger than pRFs for earlier visual regions (Supplementary Fig. 9). Second, results from this complementary analysis revealed that, in regions which showed enhanced spatial representation amplitude with attention (hV4, hMT+, IPS0), pRF size increased (Supplementary Figs. 8–9), even though the corresponding region-level spatial representations did not increase in size (Fig. 7). This may seem like a disconnect, given that the particular pattern of pRF changes across all voxels within a region jointly shapes how the spatial priority map changes with attention. However, there is not necessarily a monotonic mapping between the size of the constituent filters and the size of population-level spatial representations (see below, Information content of attentional priority maps). Indeed, simulations based on the observed pattern of pRF changes with attention give rise to region-level increases in representation amplitude in the absence of changes in representation size, just as we observed in our data (Supplementary Fig. 10). This finding, together with our primary results concerning region-level spatial representations, provides important evidence that attentional modulation of spatial information encoding is a process that strongly benefits from study at the large-scale population level.

Comparing to previous results

At the level of single unit recordings, attention has been shown to decrease the size of MT spatial RFs when an animal is attending to a stimulus encompassed by the recorded neuron’s RF[19-21] and to increase the size of spatial RFs when an animal is attending nearby the recorded neuron’s RF[20-22]. In V4, spatial RFs appear to shift toward the attended region of space in a subset of neurons[10]. With respect to cortical space, these single-unit attentional modulations of spatial RFs suggest that unifocal attention may act to increase the cortical surface area responsive to a stimulus of constant size. Consistent with this prediction, our measured pRFs for extrastriate regions hV4, hMT+ and IPS0 increased in size with attention. In contrast, one previous report suggested that spatial attention instead narrows the activation profile along the cortical surface of visual cortex in response to a visual stimulus[50]. However, this inference was based on patterns of inter-trial correlations between BOLD activation patterns associated with dividing attention between 4 stimuli (one presented in each quadrant). These patterns were suggested to result from a combination of attention-related gain and narrowing of population-level responses[50]; that is, a narrower response along the cortical surface with attention. We did not observe any significant attention-related changes in the size of the reconstructed spatial representations in either primary visual cortex or other areas in extrastriate, parietal, or frontal cortex. However, the tasks performed by observers and the analysis techniques implemented were very different between these studies. Most notably, observers in the present study and in previous fMRI[24-33] and single-unit studies[10,19-21] were typically required to attend to a single stimulus, whereas population-level activation narrowing was observed when participants simultaneously attended to the precise spatial position of 4 Gabor stimuli, one presented in each visual quadrant[50]. Furthermore, our observation that pRFs increased in size during the attend stimulus and spatial working memory conditions is compatible with the pattern of spatial RF changes in single-units[10,19-23], and our data and simulations show that these local changes can result in a region-level representation that changes only in amplitude, not size (Supplementary Fig. 10). Collectively, it seems probable that the exact task demands (unifocal vs. multifocal attention) and stimulus properties (single stimulus vs. multiple stimuli) may play a key role in determining how attention influences the profile of spatial representations. Future work using analysis methods sensitive to region-level differences in spatial representations (e.g. applying encoding models like that described here to data acquired when participants perform different tasks), in conjunction with careful identification of neural RF properties across those task demand conditions (e.g., from simultaneous multi-unit electrophysiological recordings or in vivo two-photon Ca2+ imaging in rodents and primates), may provide complementary insights into when and how attention changes the shape and/or changes the amplitude of stimulus representations in spatial priority maps and how those changes are implemented in neural circuitry. Importantly, while our observations are largely consistent with measured RF changes at the single-unit level[10,19-23], we cannot make direct inferences that such single-unit changes are in fact occurring. A number of mechanisms, including a mechanism whereby only the gain of different populations is modulated by attention, could also account for the pattern of results we see both in our region-level spatial representations (Figs. 5 & 7) and our pRF measurements (Supplementary Figs. 8–9). We do note, however, that some neural mechanisms are highly unlikely given our measured spatial representations and pRFs. For example, we would not observe an increase in pRF size if spatial RFs of neurons within those voxels were to exclusively narrow with attention. As a result of these interpretational concerns, we restrict the inferences we draw from our results to the role of attention in modulating region-level spatial priority maps measured with fMRI, and make no direct claims about spatial information coding at a neural level.

Information content of attentional priority maps

One consequence of an observed increase in the amplitude of reconstructed priority maps is that the mutual information (MI) between the stimulus position and the observed BOLD responses should increase (see ref [18] for a more complete discussion). This increase can occur, in theory, because MI reflects the ratio of signal entropy (variability in neural responses tied systematically to changes in the stimulus) to noise entropy (variability in neural responses that is not tied to changes in the stimulus). Thus a multiplicative increase in the gain of the neural responses associated with an attended stimulus should increase MI because it will increase the variability of responses that are associated with an attended stimulus location, which will in turn increase signal entropy. In contrast, a purely additive shift in all neural responses (reflected by an increase in the fit baseline parameter) will not increase the dynamic range of responses associated with an attended stimulus location, causing MI to either remain constant (under a constant additive noise model), or even to decrease (under a Poisson noise model, in which noise increases with the mean). Previous fMRI work on spatial attention has not attempted to disentangle these two potential sources of increases in the BOLD signal, highlighting the utility of approaches that can support more precise inferences about how task demands influence region-level neural codes[24-33]. The information content of a neural code is not necessarily monotonically related to the size of the constituent neural filters[18]. Extremely small (pinpoint) or extremely large (flat) spatial filters each individually carry very little information about the spatial arrangement of stimuli within the visual field. Accordingly, the optimal filter size lies somewhere between these two extremes, and thus it is not straightforward to infer whether a change in filter size results in a more or less optimal neural code (in terms of information encoding capacity). By simultaneously estimating changes in filter size across an entire ROI subtending the entire stimulated visual field, we were able to demonstrate that the synergistic pattern of spatial filter (pRF) modulations with attention jointly constrains the region-level spatial representation to maintain a constant size, despite most voxels exhibiting an increase in pRF size (Supplementary Figs. 8–10). Together, our results demonstrate the importance of incorporating all available information across entire ROIs when evaluating the modulatory role of attention on the information content of spatial priority maps.

Online Methods

Participants

10 neurologically healthy volunteers (5 female, 25 ± 2.11 years, mean ± standard dev.) with normal or corrected-to-normal vision were recruited from the University of California, San Diego (UCSD). All participants provided written informed consent in accordance with the human participants Institutional Review Board at UCSD and were monetarily compensated for their participation. Participants participated in 2–3 scanning sessions, each lasting 2 hours, for the original experiment. Data from 2 participants (1 female) were excluded from the main analysis because of excessive head movement (AJ3) or because of unusually noisy reconstructions during attend fixation runs (AG3, see below). In the follow-up experiment in which behavioral performance was carefully controlled and IPS subregions were retinotopically mapped, 4 participants of our original cohort were scanned for an additional 2 sessions, each lasting 1.5–2 hrs.

Stimulus

Stimuli were rear-projected on a screen (90 cm width) located 380 cm from the participant’s eyes at the foot of the scanner table. The screen was viewed using a mirror attached to the headcoil. We presented an identical stimulus sequence during all imaging runs while asking observers to perform several different tasks. Each trial began with the presentation of a small red dot (T1) that was presented for 500 ms, followed by a flickering circular checkerboard stimulus at full contrast (2.34° diameter, 1.47 cycles/°) that was presented for 3 s, followed by a probe stimulus (T2) that was identical to T1. A 2 s intertrial interval (ITI) separated each trial (Fig. 2a). T1 was presented between 0.176° and 1.104° from the center of the checkerboard stimulus along a vector of a random orientation (in polar coordinates, θ was randomly chosen along the range of 0° to 360°, and r was uniformly sampled from the range 0.176° to 1.104°). This ensured that the location of T1 was not precisely predictive of checkerboard location. On 50% of trials T2 was presented in the same location as T1, and the remaining trials, T2 was presented between 0.176° and 1.104° from the center of the checkerboard along a vector oriented at least 90° from the vector along which T1 was plotted (r was uniformly sampled from the range 0.176° to 1.104° and θ was randomly chosen by adding between 90° and 270°, uniformly sampled, to θ). Polar coordinates use the center of the checkerboard stimulus as the origin. During the working memory condition (see below), participants based their response on whether T1 and T2 were presented in the exact same spatial position. The location of the checkerboard stimulus was pseudo-randomly chosen on each trial from a grid of 36 potential stimulus locations, spaced by 1.17°. The stimulus location grid was jittered by 0.827° diagonally either up and to the left or down and to the right on each run, allowing for an improved sampling of space. All figures are presented aligned to a common space by removing jitter (see below). On each run, there were 36 trials (one trial for each stimulus location) and 9 null trials in which participants passively fixated for the duration of a normal trial (6 s). We scanned participants for between 4 and 6 runs of each task, always ensuring each task was repeated an equal number of times.

Tasks

Participants performed one of 3 tasks during each functional run (Fig. 2c). During attend fixation runs, participants responded when they detected a brief contrast dimming of the fixation point (0.33 s) which occurred on 50% of trials. During attend stimulus runs, participants responded when they detected a brief contrast dimming of the flickering checkerboard stimulus (0.33 s) which occurred on 50% of trials. During spatial working memory runs, participants made a button press response to indicate whether T2 was in the same or a different location as T1. Importantly, all three events (T1, checkerboard, T2) occurred during all runs, ensuring that the sensory display remained identical and that we were measuring changes in spatial representations as a function of task demands rather than changes as a result of inconsistent visual stimulation. For the follow-up behavioral control experiment we dynamically adjusted difficulty (contrast dimming or T1/T2 separation distance) to achieve consistent accuracy of ~75% across tasks.

Eye tracking

Participants were instructed to maintain fixation during all runs. Fixation was monitored during scanning for 4 participants using an ASL LRO-R long-range eyetracking system (Applied Science Laboratories) with a sampling rate of 240 Hz. We recorded mean gaze as a function of stimulus location and task demands after excluding any samples in which both pupil & corneal reflection were not reliably detected (Supplementary Fig. 1).

Imaging

We scanned all participants on a 3T GE MR750 research-dedicated scanner at UCSD. Functional images were collected using a gradient EPI pulse sequence and an 8-channel head coil (19.2 × 19.2 cm FOV, 96 × 96 matrix size, 31 3 mm thick slices with 0 mm gap, TR = 2250 ms, TE = 30 ms, flip angle = 90°), yielding a voxel size of 2 ×2 × 3 mm. We acquired oblique slices with coverage extending from the superior portion of parietal cortex to ventral occipital cortex. We also acquired a high-resolution anatomical scan (FSPGR T1-weighted sequence, TR/TE = 11/3.3 ms, TI = 1100 ms, 172 slices, flip angle = 18°, 1 mm3 resolution). Functional images were coregistered to this scan. Images were preprocessed using FSL (Oxford, UK) and BrainVoyager 2.3 (BrainInnovations). Preprocessing included unwarping the EPI images using routines provided by FSL, slice-time correction, 3D motion correction (6 parameter affine transform), temporal high-pass filtering (to remove first, second and third order drift), transformation to Talairach space, and normalization of signal amplitudes by converting to Z-scores. We did not perform any spatial smoothing beyond smoothing introduced via resampling during the coregistration of functional images, motion correction, and transformation to Talairach space. When mapping IPS subregions, we scanned those participants using an identical pulse sequence, but instead used a 32 channel Nova Medical headcoil.

Functional localizers

All regions of interest (ROIs) used were identified using independent localizer runs acquired across multiple scanning sessions. Early visual areas were defined using standard retinotopic procedures[51,52]. We identified the horizontal and vertical meridians using functional data projected onto gray/white matter boundary surface reconstructions for each hemisphere. Using these meridians, we defined areas V1, V2v, V3v, hV4, V2d, and V3d. Unless otherwise indicated, data were concatenated across hemispheres and across dorsal/ventral aspects of each respective visual area. We scanned each participant for between 2 and 4 retinotopic mapping runs (n = 3 completed 2 runs, n = 3 completed 3 runs, n = 2 completed 4 runs). Human middle temporal area (hMT+) was defined using a functional localizer in which a field of dots either moved with 100% coherence in a pseudo-randomly selected direction or were randomly replotted on each frame to produce a visual ‘snow’ display[53,54]. Dots were each 0.081° in diameter and were presented in an annulus between 0.63° and 2.26° around fixation. During coherent dot motion, all dots moved at a constant velocity of 2.71°/s. Participants attended the dot display for transient changes in velocity (during coherent motion) or replotting frequency (snow). Participants completed between 1 and 3 runs of this localizer (n = 2 completed 1 run, n = 3 completed 2 runs, n = 3 completed 3 runs). IPS and superior precentral sulcus sPCS ROIs were defined using a functional localizer which required maintenance of a spatial location in working memory, a task commonly used to isolate IPS and sPCS, which is the putative human FEF[47,49]. A flickering checkerboard subtending ½ the visual field appeared for 12 s, during which time two spatial working memory trials were presented. During the flickering checkerboard presentation, we presented a red target dot for 500 ms, followed 2 s later by a green probe dot for 500 ms. After the probe dot appeared, participants indicated whether the probe dot was in the same location or a different location as the red target dot. Here, we limited our definition of IPS to the posterior aspect (Supplementary Table 1). ROIs were functionally defined with a threshold of FDR-corrected p < 0.05 or more stringent when patches of activation abutted one another. Participants completed between 1 (n = 2) and 2 (n = 6) runs of this scan. We also used data from these IPS/sPCS localizer scans to identify voxels in all other ROIs that were responsive to the portion of the visual field in which stimuli were presented in the main tasks since the large checkerboard stimuli subtended the same visual area as the stimulus array used in the main task. All ROIs were masked on a participant-by-participant basis such that further analyses only included voxels with significant responses during this localizer task (FDR-corrected p < 0.05).

Mapping IPS subregions

To determine the likely relative contributions of different IPS subregions to the localized ROI measured for all participants, we scanned the 4 participants who make up the behaviorally-controlled cohort presented in Figures 6 and 7 using a polar angle mapping stimulus and attentionally demanding task. We used 2 stimulus types and behavioral tasks to define borders between IPS subregions[31,32,43,44]. On all runs, we used a wedge stimulus spanning 72° polar angle and presented between 1.75° and 8.75° eccentricity rotating with a period of 24.75 s. On alternating runs, the wedge was either a 4 Hz flickering checkerboard stimulus (black/white, red/green, or blue/yellow) or a field of moving black dots (0.3 °, 13 dots/deg2, moving at 5 °/s, changing direction every 8 s). During checkerboard runs, participants quickly responded after detecting a brief (250 ms) contrast dimming of a portion of the checkerboard. During moving dots runs, participants quickly responded after detecting a brief (417 ms) increase in dot speed. Targets appeared with 20% probability every 1.5 s. Difficulty was adjusted to achieve approximately 75% correct performance by changing the magnitude of the contrast dimming (checkerboard) or dot speed increment (moving dots) between runs. On average, participants performed with 84.1% accuracy on the contrast dimming task and 75.4% accuracy on the moving dots task. 2 participants completed 14 (8 clockwise, 6 counter clockwise) runs, and 1 participant completed 10 runs (AC, 5 clockwise, 5 counter clockwise). 1 participant was scanned with 2 different stimulus setups: half of all runs used the parameters described above, and half used a wedge spanning 60° polar angle and rotating with a period of 36.00 s (AB, 6 runs clockwise, 6 runs counter clockwise). Preprocessing procedures were identical to those used for the main task. To compute the best visual field angle for each voxel in IPS, we shifted signals from counter-clockwise runs earlier in time by twice the estimated HRF delay (2 × 6.75 s = 13.5 s), then removed the first and last full cycle of data (we removed 22 TRs for all participants except AB, for which we removed 32 TRs), then reversed the time series so that all runs are “clockwise”. We then averaged these time-inverted counter-clockwise runs with clockwise runs. We computed power and phase at the stimulus frequency (1/24.75 Hz or 1/36 Hz, participant AB) and subtracted the estimated HRF delay (6.75 s) to align signal phase in each voxel with visual stimulus position. Finally, we projected maps onto reconstructed cortical surfaces for each subject and defined IPS 0–3 by identifying upper and lower vertical meridian responses (Supplementary Fig. 7a). Low statistical thresholds were used (computed using normalized power at the stimulus frequency) to identify borders of IPS subregions. Voxels were selected for further analysis by thresholding their activation during the same independent localizer task used to functionally define IPS and sPCS.

Encoding model

To measure changes in spatial representations under different task demands, we implemented an encoding model to reconstruct spatial representations of each stimulus that was used in the main task[36] (see also refs. [34,37,38]). This technique assumes that the signal measured in each voxel can be modeled as the weighted sum of different discrete neural populations, or information channels, that have different tuning properties (see ref. [36]). Using an independent set of ‘training’ data, we estimated weights that approximate the degree to which each underlying neural population contributed to the observed BOLD response in each voxel (Fig. 3a). Next, an independent set of ‘test’ data was used to estimate the activation within these information channels based on the activation pattern across all voxels within an ROI on each test trial using the information channel weights in each voxel that were estimated during the training phase (Fig. 3b). This approach requires specifying an explicit model for how neural populations encode information. Here, we assumed a simple model for visual encoding within each ROI that focused exclusively on the spatial selectivity of visually-responsive neural populations. To this end, we built a basis set of 36 2D spatial filters. We modeled these filters as cosine functions raised to a high power: f(r) = (½ cos(rπ/s) + ½)[7] for r < s, 0 elsewhere (Supplementary Fig. 2). This allowed the filters to maintain an approximately-Gaussian shape while reaching 0 at a fixed distance from the center (s°), which helped constrain curvefitting solutions (below). The s (size constant) parameter was fixed at 5r, which is 5.8153°. The 36 identical filters formed a 6×6 grid spanning visual space. Filters were separated by 2.094°, with centers tiled uniformly from 5.234° above, below, left and right of fixation (Fig. 3a). The full-width half-maximum (hereafter, FWHM) of all filters was 2.3103° (Supplementary Figs. 2 and 3). This ratio of filter size to spacing was chosen to avoid high correlations between predicted channel responses (caused by too much overlap between channels, which can result in a rank-deficient design matrix) and to accomplish smooth reconstructions (if filters are too small, reconstructed spatial representations are “patchy”, see Supplementary Fig. 3 for an illustration of reconstruction smoothness as a function of filter size:spacing ratio). All filters were assigned identical FWHMs so that known properties of the visual system, such as increasing receptive field size with eccentricity and along the visual stream[39-41], could be recovered without being built-in to the analysis. To avoid circularity in our analysis, we used a cross-validation approach to compute channel responses on every trial. First, we used all runs but three (1 run of each task condition) to create a ‘training’ set that had an equal number of trials in each condition. Using this training set, we estimated channel weights within each voxel across all task conditions (i.e., runs 1–5 of attend fixation, attend stimulus, and spatial working memory were used together to estimate channel weights, which were used to compute channel responses for run 6 of each task condition). The use of an equal number of trials from each condition in the training set ensures that channel weight estimation is not biased by any changes in BOLD response across task demands. Next, the weights estimated across all task demand conditions were used to compute channel response amplitudes for each trial individually. Trials were then sorted according to their task condition and spatial location. During the training phase, we created a design matrix which contained the predicted channel response for all 36 channels on every trial (Fig. 3a). These predicted channel responses were computed by convolving each basis function with a mask subtending the area over which the stimulus was presented and normalizing the design matrix to 1, such that reconstruction amplitudes are in units of BOLD Z-scores. To extract relevant portions of the BOLD signal on every trial for computing channel responses, we took an average of the signal over 2 TRs beginning 6.75 s after trial onset. This range was chosen by examination of BOLD HRFs and was the same across all participants. Qualitatively, results do not change when other reasonable HRF lags are used, such as using 2 TRs starting 4.5 s post-stimulus. Using this approach, we modeled voxel BOLD responses as a weighted sum of channel responses comprising each voxel[36,38]. This can be written as a general linear model of the form: where B1 is the BOLD response in each voxel measured during every trial (m voxels × n trials), W is a matrix that maps channel space to voxel space (m voxels × k channels), and C1 is a design matrix of predicted channel responses on each trial (k channels × n trials). The weight matrix was estimated by: Then, using data from the held out test data set (), the weight matrix estimated above was used to compute channel responses on every trial (), which were then sorted by task condition and spatial position.

Reconstructing spatial representations

To reconstruct the region-wide representation of the visual stimulus viewed on every trial, we computed a weighted sum of the basis set, using each channel response as the weight for the corresponding basis function (Fig. 3b, bottom right). Reconstructions were computed out to 5.234° eccentricity across the horizontal and vertical meridians, though visual stimuli only subtended at maximum 4.523° eccentricity across the horizontal or vertical meridians. This was done to avoid edge artifacts in the reconstructions. Additionally, at this stage the reconstructed visual fields were shifted to account for the slight jitter introduced in the presented stimulus locations and align reconstructions from all trials. Runs in which stimuli were jittered up and to the left were reconstructed by moving the centers of the basis functions down and to the right, and runs in which stimuli were jittered down and to the right were reconstructed by moving the centers of the basis functions up and to the left. These shifts serve to counter the spatial jitter of stimulus presentation for visualization and quantification. By including spatial jitter during stimulus presentation, we are able to attain a more nuanced estimate of channel weights by sampling 72 stimulus locations rather than 36. We averaged each participant’s reconstructions at all 36 spatial locations for each task condition across trials. For Figure 4, all n = 8 participants’ average reconstructions for each task condition were averaged and reconstructions from all ROIs/task conditions visualized on a common color scale to illustrate differences in channel response amplitude across the different task conditions and spatial locations. The 3×3 grid shown in Figure 4b is chosen as it is representative - results are similar for all quadrants. For the follow-up control experiment, we plotted reconstructed spatial representations from only correct trials by coregistering all representations for trials at matching eccentricities, then averaging across all coregistered representations for each participant at each eccentricity. We coregistered representations for like eccentricities to the top left quadrant (see inset, bottom of Fig. 6b). Representations were rotated in 90° steps and flipped across the diagonal (equivalent to a matrix transpose operation on pixel values) as necessary. Importantly, this analysis depends on two necessary conditions. First, individual voxels must respond to certain spatial positions more than others, although the shape of these spatial selectivity profiles is not constrained to follow any particular distribution (e.g. it need not resemble a Gaussian distribution). Second, the spatial selectivity profile for each voxel must be stable across time, such that spatial selectivity estimated based on data in the training set can generalize to the held-out test set.

Curvefitting

To quantify the effects of attention on visual field reconstructions we fit a basis function to all 36 average reconstructions for each participant for each task condition for each ROI using fminsearch as implemented in MATLAB 2012b (which uses the Nelder-Mead simplex search method; Mathworks, Inc). The error function used for fitting was the sum of squared errors between the reconstructed visual stimulus and the function: where r is computed as the Euclidean distance from the center of the fit function. We allowed baseline (b), amplitude (a), location (x, y), and size (s) to vary as free parameters. The size s was restricted so as not to be too large or too small (confined to 0.5815° < s < 26.17°), and the location was restricted around the region of visual stimulation (x, y lie within stimulus extent borders + 1.36° each side). Due to the number of free parameters in this function, we performed a two-step stochastic curve-fitting procedure to find the approximate best fit function for each reconstructed stimulus. First, we averaged reconstructions for each spatial location across all 3 task conditions and performed 50 fits with random starting points. The fit with the smallest sum squared error was used as the starting point around which all other starting points were randomly drawn when fitting to reconstructions from each task condition individually (same distributions used around these new starting points). When fitting individual task condition reconstructions, we performed 150 fits for each condition. We used parameters from the fit with the smallest sum squared error as a quantitative characterization of the reconstructed visual stimulus. Then, we averaged fit parameters across like eccentricities within each task condition, ROI, and participant. For the follow-up control experiment, we performed an identical fitting procedure on each of the coregistered representations to directly estimate best fit parameters at each eccentricity.

Excluded participant

For one participant (AG3), reconstructions from the attend fixation runs were unusually noisy and could not be well approximated by the basis function used for fitting. However, both attend stimulus and spatial working memory runs exhibited successful reconstructions (Supplementary Fig. 4). Recall that the estimated channel weights used to compute these stimulus reconstructions were identical across the 3 task conditions, so only changes in information coding across task demands could account for this radical shift in reconstruction fidelity. Because this participant’s reconstructions could not be accurately quantified for the attend fixation condition, their reconstructions and fit parameters for all conditions have been left out of data presented in the Results. However, as noted above, data from this participant are consistent with our main conclusion that attentional demands influence the amplitude of spatial representations.

Evaluating the relationship between amplitude and size

It may be the case that our observation of increasing spatial representation size with increasing stimulus eccentricity is purely a result of intertrial variability in the reconstructed stimulus position. That is, the same representation could be jittered across trials, and the resulting average representation across trials would appear “smeared” – and would be fit with a larger size and smaller amplitude. If this were true, changes in these parameters would always be negatively correlated with one another – an increase in size across conditions would always co-occur with a decrease in amplitude. To evaluate this possibility, for each eccentricity and each ROI and each condition pair (attend stimulus & attend fixation, spatial working memory & attend stimulus, and spatial working memory & attend fixation) we correlated the change in size with the change in amplitude (each correlation contained 8 observations, corresponding to n = 8 participants). To evaluate the statistical significance of these correlations, we repeated this procedure 10,000 times, each time shuffling the condition labels separately for size and amplitude, recomputing the difference, and then recomputing the correlation between changes in size and changes in amplitude. This resulted in a null distribution of chance correlation values against which we determined the probability of obtaining the true correlation value by chance. After correction for the false discovery rate, no correlations were significant (FDR, all p > 0.05; and note that FDR is more liberal than Bonferroni correction).

Representations from ventral and dorsal aspects of V2 & V3

For Supplementary Figure 5a, we generated reconstructions using an identical procedure to that used for Figure. 4, except we only used voxels assigned to the dorsal or ventral aspects of V2 & V3 instead of combining voxels across dorsal and ventral aspects as was done in the main analysis.

Stimulus reconstructions - novel stimuli

For Supplementary Figure 5b, we estimated channel weights using all runs of all task conditions from the main task as a ‘training set’. We used these weights to estimate channel responses from BOLD data taken from an entirely novel data set, which consisted of responses to a hemi-annulus shaped radial checkerboard (Supplementary Fig 5b, top row). This new experiment featured 4 stimulus conditions: left-in, left-out, right-in, right-out. Inner hemi-annuli subtended 0.633° to 2.262° eccentricity. Outer hemiannuli subtended 2.262° to 4.523° eccentricity. Stimuli were flickered at 6 Hz for 12 s on each trial while participants performed a spatial working memory task on small probe stimuli presented at different points within the current stimulus. BOLD signal used for reconstruction was taken as the average of 4 TRs beginning 4.5 s after stimulus onset. This data was used as the ‘test set’. Otherwise, the reconstruction process was identical to that for the main experiment, as were all other scan parameters & preprocessing steps.

Population receptive field estimation

To determine whether the spatial sensitivity of each voxel across all trials and all runs changed across conditions we implemented a novel version of a population receptive field analysis (pRF; refs [42,55]). For this analysis, we estimate the unimodal, isotropic pRF which best accounts for BOLD responses to each stimulus position within every single voxel. This analysis is complementary to the primary analyses described above. For 4 participants (those presented in Figs. 6 & 7 and Supplementary Fig. 7) and 4 ROIs for each participant (V1, hV4, hMT+, and IPS0, chosen as this set includes both ROIs with [hV4, hMT+ and IPS0] and without [V1] attentional modulation), we used data across all runs within each task condition and ridge regression[56] to identify pRFs for each voxel under each task condition. We computed these pRFs using a similar method to that used to compute channel weights in the encoding model analysis (Fig. 3a, Online Methods: the univariate step 1 of the encoding model, see equation 1). We generated predicted responses with the same information channels used for the encoding model analysis (Fig. 3a), and reconstructed pRFs for each task condition for a given voxel were defined as the corresponding spatial filters weighted by the computed weight for each channel (Supplementary Fig. 8a). In the main analysis in which we computed spatial reconstructions based on activation patterns across an entire ROI (Figs. 4 & 6c), any spatial information encoded by a voxel’s response could be exploited; this is true even if the voxel’s response to different locations was not unimodal (it need not follow any set distribution, so long as it responds consistently). However, univariate pRFs computed on a voxel-by-voxel basis cannot be well-characterized by an isotropic function if they are not unimodal[57]. Thus, to ensure that most pRFs were sufficiently unimodal to fit an isotropic function, we used ridge regression[56,57] when computing spatial filter weights for the pRF analysis. The regression equation for computing channel weights then becomes: where is an identity matrix (k × k). To identify an optimal ridge parameter (λ) we computed the Bayes information criterion (BIC; ref [58]) value across a range of λ values (0 to 500) for each voxel using data concatenated across all 3 task conditions. This allowed for an unbiased selection of λ with respect to task condition. The λ with the minimum mean BIC value across all voxels within a ROI was selected, and this λ was used to compute channel weights for each of the 3 task conditions separately. An increasing λ value results in greater sparseness of the best-fit channel weights for each voxel, and a λ value of 0 corresponds to ordinary least squares regression. After computing pRFs for each task condition, we fit each pRF with the same function used to fit spatial representations (Online Methods: Curvefitting) using a similar optimization procedure. We restricted fit size (FWHM) to be at greatest 8.08°, which corresponds to nearly the full diagonal distance across the stimulated visual field. This boundary was typically only encountered for hMT+ and IPS0, and served to discourage the optimization procedure from fitting large, flat surfaces. Then, we computed an R value for each fit, and used only voxels for which the minimum R across conditions was greater than or equal to the median of minimum R across conditions from all voxels in that participant’s ROI (Supplementary Fig. 8a–b). Because we only have a single parameter estimate for each condition for each voxel, we evaluated whether fit size is more likely to increase or decrease between each pair of task conditions (attend stimulus vs. attend fixation, spatial working memory vs. attend stimulus and spatial working memory vs. attend fixation) for each region for each participant by determining the percentage of voxels which lie above the unity line in a plot of one condition against another (see Supplementary Fig. 8d).

Simulating data with different pRF properties

In order to assess whether our region-level multivariate spatial representation analysis would be sensitive to changes in voxel-level univariate population receptive fields, we generated simulated data using two different pRF modulation models. For the first model (Supplementary Fig. 10a–b), we randomly generated 500 pRF functions so as to uniformly sample the visual field for each of 2 conditions (Condition A, smaller pRFs, and Condition B, larger pRFs). Across the 2 conditions, each simulated voxel’s pRF maintained its preferred position while its amplitude and baseline were each randomly and independently sampled across conditions from the same normal distribution (amplitude: μ = 0.8513, σ = 0.25; baseline: μ = −0.1952, σ = 0.25; these values were taken from the average fit pRF parameters across all participants for hV4, attend fixation and attend stimulus conditions, Supplementary Fig. 9a). pRF size (FWHM) was sampled from a normal distribution with σ = 0.5 and a mean of μ = 4.405° for Condition A (mean of pRF size for hV4, attend fixation) and μ = 4.89° for Condition B (mean of pRF size for hV4, attend stimulus; an increase of 11%). In our simulation, this resulted in 79% of simulated voxels showing larger pRF sizes in Condition B compared to Condition A. For the second model (Supplementary Fig. 10b), we used the upper median split of fit pRFs for the single participant shown in Supplementary Figure 8c, ROI hV4, to generate simulated BOLD data. This allowed us to simulate region-level BOLD data for each attention condition tested in our experiment, and enabled us to determine whether the changes in univariate voxel-level pRF size we observe (Supplementary Fig. 9) are consistent with the multivariate region-level spatial representations we present in the main text (Figs. 5, 7). After generating voxel-level pRFs using each of the two models described above, we added noise to the simulated weights (Gaussian noise added independently to each channel weight, σ = 0.1) and presented model voxels with 6 runs of all 36 spatial positions for each condition. We simulated each voxel’s BOLD response as the predicted channel response (response of corresponding spatial filter, Fig. 3a) to each stimulus weighted by the corresponding channel weights. We added Gaussian noise to the resulting BOLD data for each simulated voxel independently (σ = 0.1). Then, all analyses of multivariate spatial representations proceeded identically to those described above. We computed spatial representations using estimated channel weights computed across all conditions within a model (i.e. Condition A & Condition B, Supplementary Fig. 10a–b; or attend fixation, attend stimulus, and spatial working memory; Supplementary Fig. 10c–d), then fit average spatial representations with a smooth surface (see Online Methods: Curvefitting) to determine the amplitude and size of each spatial representation. We then averaged these parameters across all 36 positions.

Statistical procedures

All behavioral analyses on accuracy data were performed using a 2-way repeated-measures ANOVA with task condition and stimulus eccentricity modeled as fixed effects (3 levels and 6 levels, respectively, Fig. 2d, Fig. 6a). To assess whether fit parameters to reconstructed spatial representations reliably changed as a function of task demands, we performed a multi-stage permutation testing procedure. This non-parametric procedure was adopted because the spatial filters (basis functions) used to estimate the spatial selectivity of each voxel during the ‘training’ phase (Fig. 3a) overlapped, they were not independent (violating a key assumption of standard statistical tests). For each parameter (rows in Figs. 5 and 7), we first found ROI/parameter combinations which showed an omnibus main effect in a repeated-measures ANOVA (1 factor, 18 levels), corrected using a false discovery rate (FDR) algorithm[59] across all ROIs. Then, we computed F-scores for a 2-way repeated measures design with eccentricity and condition as factors (6 levels and 3 levels, respectively) for ROIs with significant omnibus main effects. For all tests, because we had a relatively small n (n = 8 for Fig. 5, n = 4 for Fig. 7 & Supplementary Fig. 7) and the range of parameters was in some cases restricted to be positive (size), we computed an F distribution for the null hypothesis that there is no main effect of omnibus factor (omnibus test) or that there is no main effect of condition, eccentricity, or their interaction (for a follow-up 2-way test) by shuffling trial labels within each participant 100,000 times. For each data permutation, we computed a new F score for the omnibus test, and for ROI/parameter combinations with a significant omnibus effect, a main effect of condition, eccentricity, and their interaction. P-values were estimated as the probability that the F score computed based on the shuffled data was equal to or greater than the F scores computed using the actual data. These additional tests were corrected for multiple comparisons using Bonferroni’s method within each parameter. We also occasionally highlight trends in the data by reporting p-values which do not reach significance under correction for multiple comparisons at this sample size as “marginal effects”, and such p-values are always reported as “uncorrected” in the text. For display, marginally significant tests are reported on Fig. 5 and Fig. 7 at uncorrected p < 0.025. In addition, we performed a 3-factor repeated-measures ANOVA with ROI, task condition, and eccentricity modeled as fixed effects to determine whether fit parameters changed across ROIs (n = 8, Fig. 5; n = 4, Fig. 7). We implemented the same permutation procedure described above to compute p-values (10,000 iterations). To determine whether pRF size increases at higher eccentricities, we computed a linear fit to a plot of each voxel’s pRF size vs. its pRF eccentricity for each ROI for each condition for each participant (Supplementary Fig. 8c). To determine whether the slope of the fit line was reliably positive for a given ROI/participant/condition, we computed confidence intervals around the best fit slopes using bootstrapping (resampled all voxels with replacement 10,000 times) and the related p value was defined as the as the probability that the slope was ≤ 0. We used a Bonferroni-corrected significance threshold for 48 planned comparisons (4 ROIs × 4 participants × 3 conditions) of α = 0.001 (see Supplementary Results, Supplementary Table 2 for reported statistics). To evaluate the statistical significance of the pRF size increase (Supplementary Fig. 9), we first performed a 2-way repeated-measures ANOVA with ROI and condition modeled as fixed effects and participant modeled as a random effect in which we shuffled ROI and condition labels for each participant and recomputed the percentage of voxels which increased in size across each condition pair. We repeated this shuffling procedure 10,000 times and compared F-scores computed using the real labels to the distribution generated using the shuffled labels, as above. Then, we compared whether each condition pairing resulted in a significant change in pRF size for each ROI by computing a T score testing against the null hypothesis that 50% of voxels show an increase in pRF size. As above, we generated a null T distribution by shuffling condition labels within each participant 10,000 times. For this analysis we used a Bonferroni-corrected significance threshold for 12 (4 ROIs × 3 conditions) planned comparisons of α = 0.0042.

55 in total