Susanne Stoll (1), Nonie J Finlayson (2), D Samuel Schwarzkopf (2). (1) Experimental Psychology, University College London, 26 Bedford Way, London, WC1H 0AP, UK. Electronic address: stollsus@gmail.com. (2) Experimental Psychology, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
Abstract
Our visual system readily groups dynamic fragmented input into global objects. How the brain represents global object perception remains however unclear. To address this question, we recorded brain responses using functional magnetic resonance imaging whilst observers viewed a dynamic bistable stimulus that could either be perceived globally (i.e., as a grouped and coherently moving shape) or locally (i.e., as ungrouped and incoherently moving elements). We further estimated population receptive fields and used these to back-project the brain activity measured during stimulus perception into visual space via a searchlight procedure. Global perception resulted in universal suppression of responses in lower visual cortex accompanied by wide-spread enhancement in higher object-sensitive cortex. However, follow-up experiments indicated that higher object-sensitive cortex is suppressed if global perception lacks shape grouping, and that grouping-related suppression can be diffusely confined to stimulated sites and accompanied by background enhancement once stimulus size is reduced. These results speak to a non-generic involvement of higher object-sensitive cortex in perceptual grouping and point to an enhancement-suppression mechanism mediating the perception of figure and ground.
Keywords:
Functional magnetic resonance imaging; Global perception; Population receptive field; Searchlight back-projection; Visual perceptual grouping; Visual space
Perceptual grouping binds together local image elements into global and coherent objects and segregates them from other objects in our visual field including the background (Roelfsema, 2006; Roelfsema and Houtkamp, 2011). This enables object recognition and tracking even if visual input is fragmented across space and time (Anderson and Sinha, 1997; Anstis and Kim, 2011; Lorenceau and Shiffrar, 1992), such as when we perceive a vehicle passing behind a row of trees. However, despite its ubiquity in everyday life, it remains unclear how global object perception is represented in the visual brain.
A plethora of studies in monkeys suggests that information about figure-ground organization is represented in lower and mid-tier visual areas. In particular, neurons in V1 and V4 respond more strongly to tilted elements belonging to a global shape as opposed to the background (Lamme, 1995; Poort et al., 2016, 2012). Likewise, V1 and V4 responses to elements grouped into contours are enhanced, whereas those to ungrouped background elements are suppressed (Chen et al., 2014; Gilad et al., 2013). Taken together, these findings indicate that the monkey visual system draws upon a response amplitude code to mediate figure-ground segregation. The functional relevance of such signatures and whether they are mediated by feedback, feedforward, or lateral connections or a combination thereof, however, remain a matter of active debate and research (e.g., de-Wit et al., 2012; Poort et al., 2016, 2012).
Do similar mechanisms exist in humans? Although a series of (early) functional magnetic resonance imaging (fMRI) studies addressed this question (e.g., Altmann et al., 2003; Scholte et al., 2008; Seghier et al., 2000), their analysis techniques often lacked the spatial sensitivity to quantify retinotopically-constrained response amplitude codes. More recently, however, Kok and de Lange (2014) combined standard fMRI recordings and population receptive field (pRF) modeling (Dumoulin and Wandell, 2008) to investigate the topographic profile of V1 and V2 activity to illusory Kanizsa shapes in much greater detail. When compared to non-illusory control stimuli, activity to Kanizsa shapes increased, whereas activity to the illusion-inducing elements decreased, while background activity remained unchanged. This pattern of results has been replicated recently and there is evidence that it might be laminar-specific (Kok et al., 2016). Another topographic fMRI study reported ground suppression in V1 (and also V2) without figure enhancement for structure-from-asynchrony textures vs unstructured control stimuli (Likova and Tyler, 2008). Thus, here too, a response amplitude mechanism emerges in lower visual areas, distinctively labeling multiple objects including the background.
The interpretation of these and similar studies is, however, complicated by the fact that changes in perception always went hand in hand with changes in the physical properties of the stimulus. This makes it impossible to determine unequivocally the source of such activity modulations. Bistable stimuli, for which our perception alternates between two mutually exclusive states without changes in the physical properties of the stimulus, provide a way to circumvent this issue.
Fang et al. (2008) and Murray et al. (2002) used a very elegant bistable stimulus (Lorenceau and Shiffrar, 1992) allowing for the investigation of perceptual grouping in dynamic occluded scenes where object tracking is often required. In their studies, participants underwent fMRI while viewing a translating diamond stimulus whose corners were occluded by three bars of the same color as the background. This stimulus could either be perceived as four individual segments translating vertically out-of-phase and thus incoherently (local, no-diamond percept; Fig. 1, A.) or as a diamond shape translating horizontally in-phase behind occluders and thus coherently (global, diamond percept; Fig. 1, B., and Supplementary Video S1). When participants experienced the global compared to the local percept, a striking pattern of results was observed: a reduction of activity in V1 (and also V2) accompanied by an increase of activity in the lateral-occipital complex (LOC) – a brain region known to respond more strongly to images of intact objects and shapes than a scrambled version thereof (e.g., Grill-Spector et al., 1998; Malach et al., 1995). Notably, this response pattern has recently been replicated (Grassi et al., 2018).
Fig. 1
Diamond experiment Example frames of the diamond stimulus and potential response amplitude profiles when the global percept is contrasted to the local one. A. Local, no-diamond percept. Here, the diamond stimulus was perceived as four individual segments oscillating vertically and incoherently with the segments on the left/right moving towards/away from one another, respectively, or vice versa (not shown). B. Global, diamond percept. Here, the four segments were grouped together and perceived as a diamond shape oscillating horizontally and coherently behind three occluders. The gray dashed frame denotes the inferred (but occluded) contours during the global state. The white arrows indicate the perceived movement direction of the diamond stimulus. Only in the global state, the perceived and physical movement direction coincided. C. Previously suggested response amplitude profile. The whole visual field is suppressed. D. Hypothesized response amplitude profile. The segments and background region are suppressed whereas the corners and center regions are enhanced. E. Response amplitude profile when the segments and corners region are predicted during the global state. The segments region is suppressed (due to a match between bottom-up input and higher-level feedback), the corners region enhanced (due to a mismatch between bottom-up input and higher-level feedback), and activity in the background and center region unchanged. F. The same as E., but if the whole diamond shape is predicted during the global state. The center region is now also enhanced. Black lines represent the extreme positions of the diamond stimulus. Black solid lines denote the visible ungrouped diamond segments (local, no-diamond percept). Black dashed lines additionally illustrate the inferred but invisible diamond shape when the segments were grouped together (global, diamond percept). White lines denote different visual field portions. Blue areas: Suppressive effects. 
Red areas: Enhancement effects. Black areas: No effect.
At first sight, such inverse activity modulations reflect exactly the type of relationship proposed by hierarchical predictive coding models (e.g., Clark, 2013; Mumford, 1992; Murray et al., 2004; Rao and Ballard, 1999). These models assume that lower visual areas flag an error whenever the predictive feedback from higher visual areas conflicts with the bottom-up input they receive. The general idea here is that when higher visual areas (e.g., the LOC) arrive at a global and coherent interpretation of a visual stimulus (e.g., the diamond shape behind occluders), the predictability of the bottom-up input is increased and thus the error signal attenuated. When the global diamond percept is then contrasted to the local no-diamond percept, a differential reduction of activity emerges in lower visual areas (e.g., V1).
As such, these models predict that the reduction in V1 activity for the global percept should be restricted to the retinotopic representation of the visible diamond segments (Fig. 1, E. and F.). This prediction, however, seems difficult to reconcile with the finding that the suppressive effects in V1 for the diamond vs no-diamond percept extend well beyond stimulated sites (i.e., the visible diamond segments) into the remaining background region (Fig. 1, C.; de-Wit et al., 2012). It is also incompatible with evidence showing that variations of the diamond stimulus result in increased (instead of decreased) V1 activity for the diamond vs the no-diamond percept (Caclin et al., 2012).
These discrepant results may be due to the coarse analysis techniques employed previously, precluding a more fine-grained inspection of topographic signatures underlying the perception of the diamond stimulus. 
The possibility remained, for instance, that V1 activity corresponding to the region within the diamond frame (i.e., the center) and/or the invisible parts (i.e., the occluded corners) increases, whereas activity corresponding to the more peripheral background is suppressed during the diamond state (Fig. 1, D.). de-Wit et al. (2012) treated most of these subareas as background region, although the center and corners region could, arguably, be treated as figure and/or contour regions too. Although this hypothesis argues against hierarchical predictive coding models in a strict sense (e.g., Mumford, 1992; Murray et al., 2004; Rao and Ballard, 1999) because there should be no systematic activity modulations in the peripheral background region (Fig. 1, E. and F.), it is compatible with the more general idea of a response amplitude mechanism labeling different parts of a visual scene distinctively (e.g., Gilad et al., 2013; Kok and de Lange, 2014; Lamme, 1995). Interestingly, such a response pattern has recently been observed for another dynamic bistable global-local stimulus (Anstis and Kim, 2011; Grassi et al., 2017).
Here, we combined standard fMRI measurements and pRF modeling (similar to Kok and de Lange, 2014) to test for fine-grained response amplitude mechanisms mediating global object perception. In a first experiment, we mapped the retinotopic organization of participants’ cortices and estimated the pRF of each voxel in visual cortex. In three further experiments, we recorded brain activity whilst participants viewed the diamond stimulus or a set of non-ambiguous stimuli with similar motion features but stable shape information to test for the generalizability of our findings. We then used each voxel’s pRF to back-project the voxel-wise brain activity measured during stimulus perception into visual space via a searchlight procedure. This allowed us to directly read out retinotopically-specific response amplitude codes along a large portion of the visual hierarchy.
Retinotopic mapping experiment
Methods
Participants
All participants (n = 11) of the three global object perception experiments took part in the retinotopic mapping experiment. We refer to these participants as P1–P11. They all had normal or corrected-to-normal visual acuity and gave written informed consent to partake in our experiments (see 3.1.1., 4.1.1., and 5.1.1. Participants for more details). If participants had already taken part in the retinotopic mapping experiment as part of another study in our laboratory, we reused these data. All experimental procedures were approved by the University College London Research Ethics Committee.
Apparatus
Functional and anatomical images were collected using a Siemens Avanto 1.5 T magnetic resonance imaging (MRI) scanner. To prevent an obstructed view, we used a customized version of the standard 32-channel coil, where the front visor was removed, reducing the number of channels to 30. For one participant (P2), however, the structural images were acquired with the standard 32-channel coil. Key presses were recorded via an MRI-button box for right-handers. Stimuli were projected onto a screen (resolution: 1920 × 1080 pixels; refresh rate: 60 Hz; background color: gray) at the back of the MRI scanner bore and viewed via a head-mounted mirror (viewing distance: approximately 67–68 cm; stimulus dimensions are based on the latter value; note that the variance in exact head/eye position is typically greater than this range). A list of software and toolboxes used in all experiments can be found in Supplementary Table S1.
Stimuli
The retinotopic mapping stimulus consisted of a simultaneous wedge-and-ring aperture (Supplementary Fig. S1 and Supplementary Video S2) centered within a screen-bounded rectangle in background gray. The wedge aperture was a sector (polar angle: 12°) of a disk (diameter: 17.03 dva), moving clockwise or counterclockwise in 60 discrete steps during 1 cycle (1 step/s). Consecutive wedges overlapped by 50%. The ring aperture consisted of an expanding or contracting annulus whose diameters varied in 36 logarithmic steps during 1 cycle (1 step/s). The diameter of the inner circle (minimum: 0.48 dva) was 56–58% of that of the outer circle (maximum: 40.38 dva, extending beyond the screen dimensions). The diameter of any current circle (outer or inner) was 10–11% larger/smaller compared to the previous one.
The wedge-and-ring aperture was superimposed onto circular images (diameter: 17.03 dva) depicting intact natural and colorful scenes/objects or a phase-scrambled version thereof (n = 456). The images and the wedge-and-ring aperture were centered around a central black fixation dot (diameter: 0.13 dva) that was superimposed onto a central disk (diameter: 0.38 dva). Within the resulting annulus surrounding the fixation dot, the opacity level of the gray background increased radially inwards in 12 equal steps (step size: 0.02 dva) from fully transparent (α = 0%) to fully opaque (α = 100%).
To support fixation compliance, a black polar grid (line width: 0.02 dva) at low opacity (α = 10.2%) centered around the fixation dot was superimposed onto the screen. The polar grid consisted of 10 circles whose diameters were evenly spaced between 0.38 and 27.35 dva, and 12 radial lines evenly spaced between polar angles of 0° and 330°. The radial lines extended from an eccentricity of 0.13–15.14 dva.
Procedure
The retinotopic mapping experiment consisted of 3 runs. Excluding the initial dummy interval (10 s; fixation dot and polar grid only), each run comprised 4 blocks. At the beginning of each block, the wedge-and-ring aperture was presented (90 s; 1.5 cycles of wedge rotation; 2.5 cycles of ring expansion/contraction), followed by a fixation interval (30 s; fixation dot and polar grid only).
The order of wedge and ring movement in each run was clockwise and expanding (block 1), clockwise and contracting (block 2), counterclockwise and expanding (block 3), or counterclockwise and contracting (block 4). Within each block, the type of carrier image (intact or phase-scrambled) alternated every 15 s with the first carrier image always being phase-scrambled in odd-numbered blocks and intact in even-numbered blocks. The carrier images themselves were switched every 500 ms and displayed 1–2 times in pseudorandomized order during each run. To avoid confounds due to the spatial distribution of low-level features, the images were always rotated with the orientation of the wedge aperture.
Participants had to fixate the fixation dot continuously and press a key whenever the dot turned red. Every 200 ms, with a probability of 0.03, the fixation dot underwent a randomized change in color for 200 ms (from black to red, green, blue, cyan, magenta, yellow, white, or remaining black). To also ensure attention on the wedge-and-ring aperture, participants were required to press a key whenever a Tartan image appeared. Due to technical issues, for one participant (P3), the last 10 volumes (part of the final 30 s fixation interval) were not acquired in one run. To account for this, we also eliminated the last 10 volumes in the remaining two runs for this participant before submitting the functional data to our preprocessing procedure.
MRI acquisition
Functional images were acquired with a T2∗-weighted multiband 2D echo-planar imaging sequence (Breuer et al., 2005) from 36 transverse slices centered on the occipital cortex (repetition time, TR = 1 s, echo time, TE = 55 ms, voxel size = 2.3 mm isotropic, flip angle = 75°, field of view, FoV = 224 mm × 224 mm, no gap, matrix size = 96 × 96, acceleration = 4). Slices were oriented to be approximately parallel to the calcarine sulcus while ensuring adequate coverage of the ventral occipital and inferior parietal cortex. Anatomical images were acquired with a T1-weighted magnetization-prepared rapid acquisition with gradient echo (MPRAGE) sequence (TR = 2.73 s, TE = 3.57 ms, voxel size = 1 mm isotropic, flip angle = 7°, FoV = 256 mm × 224 mm, matrix size = 256 × 224, 176 sagittal slices).
Preprocessing
After removing the first 10 volumes of each run to allow for T1-related signals to reach equilibrium, functional images were bias-corrected for intensity inhomogeneity, realigned, unwarped, and coregistered to the anatomical image. The anatomical image was used to construct a surface model, onto which the preprocessed functional data were projected. For each vertex in the surface mesh, we created an fMRI time series in each run by identifying the voxel in the functional images that fell half-way between the vertex coordinates in the gray-white matter and the pial surface. Finally, each time series was linearly detrended and z-standardized.
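The final preprocessing step (linear detrending followed by z-standardization) can be illustrated with a minimal sketch. The function name `detrend_and_zscore` and the toy signal are assumptions made for illustration; the toolboxes actually used are those listed in Supplementary Table S1 and may differ in detail (e.g., in how the standard deviation is computed).

```python
import numpy as np

def detrend_and_zscore(ts):
    """Remove a linear trend from a 1D time series, then z-standardize it."""
    ts = np.asarray(ts, dtype=float)
    t = np.arange(ts.size)
    # Least-squares fit of a straight line (slope and intercept).
    slope, intercept = np.polyfit(t, ts, deg=1)
    residual = ts - (slope * t + intercept)
    # Scale to zero mean and unit (population) standard deviation.
    return (residual - residual.mean()) / residual.std()

# Toy example (hypothetical data): a drifting sinusoid sampled once per second.
rng = np.random.default_rng(0)
time = np.arange(120)
raw = 100 + 0.05 * time + np.sin(2 * np.pi * time / 30) + rng.normal(0, 0.1, time.size)
clean = detrend_and_zscore(raw)
```

After this step, every vertex time series has zero mean, unit variance, and no residual linear drift, which puts all vertices on a common scale before averaging across runs.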
Data analysis
PRF estimation
The preprocessed time series for each vertex were averaged across runs. To estimate the pRF for each vertex, we then implemented a forward-modeling approach restricted to the posterior third of the cortex. Each pRF was modeled as a 2D isotropic Gaussian with four free parameters: x, y, σ, and β, where x and y denote the pRF center position in Cartesian coordinates relative to fixation, σ the size of the pRF, and β the amplitude of the signal. The pRF center position and size were expressed in dva. The estimation procedure was identical to our previous studies (Moutsiana et al., 2016; van Dijk et al., 2016). The resulting parameter maps were modestly smoothed with a spherical Gaussian kernel (FWHM = 3 mm; for experiment-specific smoothing procedures of pRF and response data, see 3.1.7. Data analysis). Note that vertices with a very poor goodness-of-fit (R² threshold: .01) and/or artifacts (non-positive σ or β) were removed prior to smoothing.
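To make the pRF model concrete, the following sketch evaluates a 2D isotropic Gaussian with parameters x, y, σ, and β on a visual-field grid and computes its overlap with a stimulus aperture. This is only an illustration of the forward-model idea: the function names, the grid resolution, the normalization by the pRF volume, and the toy hemifield apertures are assumptions, and the hemodynamic convolution and parameter fitting used in the actual estimation procedure (Moutsiana et al., 2016; van Dijk et al., 2016) are omitted.

```python
import numpy as np

def gaussian_prf(x0, y0, sigma, grid_x, grid_y):
    """2D isotropic Gaussian pRF profile evaluated on a visual-field grid (dva)."""
    return np.exp(-((grid_x - x0) ** 2 + (grid_y - y0) ** 2) / (2 * sigma ** 2))

def predict_response(apertures, prf, beta):
    """Overlap between each binary aperture frame and the pRF, scaled by beta.

    apertures: array of shape (n_frames, n_y, n_x), 1 where the stimulus is shown.
    The division by prf.sum() is an assumed normalization so that full-field
    stimulation yields a response equal to beta.
    """
    overlap = (apertures * prf).sum(axis=(1, 2))
    return beta * overlap / prf.sum()

# Hypothetical grid spanning +/- 8.5 dva around fixation.
coords = np.linspace(-8.5, 8.5, 101)
grid_x, grid_y = np.meshgrid(coords, coords)
prf = gaussian_prf(x0=2.0, y0=-1.0, sigma=1.0, grid_x=grid_x, grid_y=grid_y)

# Two toy frames: right hemifield only, then left hemifield only.
right = (grid_x > 0).astype(float)
left = (grid_x < 0).astype(float)
pred = predict_response(np.stack([right, left]), prf, beta=1.5)
```

Because the example pRF is centered in the right hemifield, the predicted response to the right-hemifield frame is far larger than that to the left-hemifield frame. In a full fit, x, y, σ, and β would be adjusted until the predicted time series best matches the measured one.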
Delineation of visual areas
Using the smoothed color-coded maps for eccentricity and polar angle projected onto the surface model of each hemisphere, we manually delineated V1–V3, V3A, V3B, LO-1, LO-2 (see all Wandell et al., 2007), V4, VO-1, and VO-2 (see all Winawer and Witthoft, 2015). Polar angle reversals served as a primary indicator for identifying boundaries between visual areas (Engel et al., 1997; Sereno et al., 1995). Example maps used for back-projection purposes (see 3.1.7. Data analysis) including delineations can be found in Supplementary Fig. S1 (C. and D.).
For all data analyses, the quarterfield delineations of each hemisphere were merged and areas V3B, LO-1, LO-2, VO-1, and VO-2 combined into a larger complex we label the ventral-and-lateral occipital complex (VLOC). These subareas tended to show increased activation for intact vs phase-scrambled images (Supplementary Fig. S1, E.), ensuring the functional validity of the VLOC as an object-sensitive complex. To this end, we performed a voxel-wise general linear model (GLM) for each participant on the preprocessed fMRI data from the retinotopic mapping experiment. The GLM comprised a constant boxcar regressor for each carrier type (intact vs phase-scrambled), convolved with a canonical hemodynamic response function. The fixation intervals were modeled implicitly and the obtained realignment estimates used as nuisance regressors. We applied Restricted Maximum Likelihood estimation with a first order autoregressive model, a high-pass filter (HPF) of 155 s, and implicit masking (threshold: 0.8). The voxel-wise differential beta values resulting from the GLM were then projected onto the surface model and smoothed moderately with a spherical Gaussian kernel (FWHM = 3 mm). Note that values flagged by implicit masking were discarded from smoothing and any subsequent visualizations. Similar functional localization procedures were applied previously to localize the LOC (e.g., de-Wit et al., 2012; Fang et al., 2008; Grill-Spector et al., 1998), which typically does not fully include the VO subareas and is not based on retinotopic principles. We thus refrained from labeling our complex ‘LOC’.
Importantly, compared to V1–V3, the subareas of the VLOC are smaller with fewer vertices and a sparser distribution of pRFs around the vertical meridian and the peripheral visual field (Amano et al., 2009; Larsson and Heeger, 2006). Combining these areas into the VLOC thus ensured a more complete coverage of the visual field in each participant, which was the basis for subsequent data analyses. Nonetheless, although less reliable, we provide exploratory searchlight back-projections (see 3.1.7. Data analysis for details) for a combination of these subareas in Supplementary Fig. S5, Fig. S9, and Fig. S13.
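The boxcar regressors of the localizer GLM described above can be sketched as follows for a single 90 s stimulation block followed by 30 s of fixation. Several simplifications are assumed here and labeled in the code: the double-gamma function with a 6 s response and 16 s undershoot is a common approximation of a canonical hemodynamic response function rather than the exact function used, ordinary least squares stands in for Restricted Maximum Likelihood estimation with an autoregressive model, and nuisance regressors, high-pass filtering, and masking are left out.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF (ASSUMED parameters approximating a canonical HRF)."""
    t = np.asarray(t, dtype=float)
    response = t ** (peak - 1) * np.exp(-t) / gamma(peak)
    dip = t ** (undershoot - 1) * np.exp(-t) / gamma(undershoot)
    hrf = response - ratio * dip
    return hrf / hrf.sum()

tr = 1.0                      # repetition time in seconds
n_vols = 120                  # 90 s stimulation block + 30 s fixation
t = np.arange(n_vols) * tr

# Carrier type alternates every 15 s, starting with phase-scrambled
# (as in odd-numbered blocks); fixation is left unmodeled (implicit baseline).
stim_on = t < 90
segment = (t // 15).astype(int)
scrambled = (stim_on & (segment % 2 == 0)).astype(float)
intact = (stim_on & (segment % 2 == 1)).astype(float)

hrf = double_gamma_hrf(np.arange(0, 32, tr))

def convolve(boxcar):
    """Convolve a boxcar with the HRF and trim to the run length."""
    return np.convolve(boxcar, hrf)[:n_vols]

design = np.column_stack([convolve(intact), convolve(scrambled), np.ones(n_vols)])

# Noise-free synthetic voxel (hypothetical): stronger response to intact images.
true_betas = np.array([2.0, 1.0, 0.5])
signal = design @ true_betas
betas, *_ = np.linalg.lstsq(design, signal, rcond=None)  # OLS, a simplification
differential_beta = betas[0] - betas[1]                  # intact vs phase-scrambled
```

A positive differential beta, as in this toy voxel, corresponds to the increased activation for intact vs phase-scrambled images that motivated treating the VLOC as an object-sensitive complex.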
Diamond experiment
Five healthy participants (P1–P5; 1 male; age range: 20–37 years; all right-handed), including the authors DSS and SS, took part in the diamond experiment.
In addition to the apparatus of the retinotopic mapping experiment, we used an EyeLink 1000 MRI compatible eye tracker system to record eye movement data of participants’ left eye.
The bistable diamond stimulus (similar to de-Wit et al., 2012; Fang et al., 2008) comprised a black rhombus-shaped frame (size: 7.92 × 7.92 dva; line width: 0.16) located around a white central fixation dot (diameter: 0.16 dva). Three vertical rectangles displayed in background color occluded the corners of the diamond stimulus. The middle rectangle (size: 3.75 × 17.03 dva) was centered around the fixation dot. The left and right rectangles (size: 22.84 × 17.03 dva, respectively) were centered vertically with their vertical line of symmetry coinciding with the left and right edges of the screen, so that the visible segments of the diamond had a length of 2.61 dva. When the diamond stimulus was centered around fixation, its corners were located at 5.6 dva eccentricity. The movement of the diamond followed a horizontal sine wave (A = 1.29 dva, f = 0.5 Hz, ω = 3.14, φ = 0).
The diamond display evoked two alternating and mutually exclusive perceptual states: a local percept of four individual segments translating vertically out-of-phase and thus incoherently (no-diamond; Fig. 1, A.) or a global percept of an inferred diamond shape translating horizontally in-phase behind three occluders and thus coherently (diamond; see all Fig. 1, B., and Supplementary Video S1).
The diamond experiment comprised 1 practice run (not analyzed) and 5 experimental runs. Experimental runs started with a background-only dummy interval (10 s). Next, an initial fixation interval (15 s) was presented, followed by the diamond display (400 s) and a final fixation interval (15 s). Except for the dummy interval, the fixation dot was continuously presented.
Participants were required to fixate the fixation dot continuously. During the diamond interval, they indicated their current percept via pressing a key assigned to their right index finger (diamond) or right middle finger (no-diamond). Except for the first percept in any given run, participants had to indicate perceptual switches only, but were allowed to press any key again if they lost track. During each run, participants’ eye position and pupil size were recorded at 60 Hz (downsampled). Prior to scanning, all participants were tested behaviourally in a separate session outside the scanner to ensure they could clearly perceive both perceptual states and spent a roughly equal amount of time in either. Three recruited participants were unable to do so and were hence replaced.
Functional images were acquired with the same sequence as in the retinotopic mapping experiment.
The preprocessing was identical to the retinotopic mapping experiment using the same structural image.
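The horizontal sine-wave motion of the diamond stimulus can be written as x(t) = A sin(ωt + φ). The short sketch below samples one motion cycle at the 60 Hz screen refresh rate. It assumes that ω denotes the angular frequency in rad/s, which is consistent with the reported values because 2πf = 2π × 0.5 ≈ 3.14; the variable names are illustrative only.

```python
import numpy as np

A = 1.29            # amplitude in dva
f = 0.5             # frequency in Hz
phi = 0.0           # phase offset in rad
omega = 2 * np.pi * f   # ASSUMED: angular frequency in rad/s (approx. 3.14)

refresh_rate = 60                       # screen refresh rate in Hz
n_frames = int(refresh_rate / f)        # frames per motion cycle (period = 1/f = 2 s)
t = np.arange(n_frames) / refresh_rate  # frame onset times in seconds

# Horizontal offset of the diamond center from fixation on each frame.
x = A * np.sin(omega * t + phi)
```

Under these assumptions, one full left-right oscillation takes 2 s (120 frames), and the diamond center deviates by at most 1.29 dva from fixation.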
Searchlight back-projections
To explore potential response amplitude mechanisms mediating global object perception, we first performed a voxel-wise GLM on the preprocessed data (HPF: 128 s). We used a variable epoch boxcar regressor (Grinband et al., 2008) for each perceptual state (diamond or no-diamond) as well as the period from the onset of the diamond display until participants’ first key press. The variable epochs for each perceptual state were the same as in the analysis of perceptual durations (see Supplementary material, 1.1.1. Data analysis). In all other respects (e.g. estimation procedure and nuisance regressors), the GLM was identical to the one specified for the retinotopic mapping experiment.
We computed the following contrasts of interest: diamond vs fixation, no-diamond vs fixation, and diamond vs no-diamond. The first two contrasts allowed us to verify the validity of our searchlight back-projection approach. Based on previous research on the positive and negative BOLD signal (Fracasso et al., 2018; Goense et al., 2012; Shmuel et al., 2002, 2006), we expected an increase of activity in the area within which the visible diamond segments moved and a decrease in non-stimulated sites, especially in lower visual areas (V1/V2), where pRF size is small (e.g., Alvarez et al., 2015; Amano et al., 2009; Dumoulin and Wandell, 2008; van Dijk et al., 2016). The contrast diamond vs no-diamond corresponded to analyses applied in previous work involving the diamond stimulus (e.g., de-Wit et al., 2012; Fang et al., 2008). Based on the studies by Fang et al. (2008) and de-Wit et al. (2012), we expected decreased activity in the area within which the diamond segments moved. However, we had no clear expectations as to how the remaining visual field would behave due to the coarser analysis techniques applied previously (de-Wit et al., 2012), evidence from figure-ground studies (Chen et al., 2014; Gilad et al., 2013; Kok and de Lange, 2014; Lamme, 1995; Likova and Tyler, 2008; Poort et al., 2012, 2016), and findings showing increased activity for the diamond vs no-diamond percept (Caclin et al., 2012).
The voxel-wise differential beta values from the GLM were subsequently projected onto the surface model. Both the raw pRF data and the differential beta estimates were then modestly smoothed in an identical fashion using a spherical Gaussian kernel (FWHM = 3 mm). Vertices whose pRF estimates showed a very poor goodness-of-fit (R² threshold: .01) or artifacts (non-positive σ or β) were removed prior to smoothing. Vertices flagged by implicit masking were likewise discarded from smoothing as well as any subsequent analyses. We then used the delineations for each visual area and hemisphere from the retinotopic mapping experiment to extract pRF estimates and differential beta estimates of vertices falling within their spatial extent and pooled them across hemispheres for each participant. Vertices whose pRF estimates showed a poor goodness-of-fit (R² threshold: .05) and/or eccentricities outside the stimulated retinotopic mapping area (beyond 8.5 dva) were discarded.
Subsequently, we defined a mesh grid (size: 17 × 17 dva) covering the stimulated retinotopic mapping area. The grid point coordinates were separated from one another by 0.1 dva in both the horizontal and vertical dimension (range: −8.5 to 8.5 dva, respectively). Next, a circular searchlight (radius: 1 dva) was passed through visual space (restricted to a maximal eccentricity of 8.5 dva + 0.1 dva to not miss any vertices) by translating its center point from one grid point to the next (for a similar approach in brain space, see Kriegeskorte et al., 2006). 
All vertices whose pRF center position fell into a given searchlight at a particular location were then identified. The differential beta estimates corresponding to the set of vertices within a given searchlight were summarized as a t-statistic by performing a one-sample t-test against 0. This way, we were able to account for the different numbers of vertices in each searchlight. T-statistics based on a single vertex/no vertices were set to 0. Importantly, t-statistics were only used as a descriptive measure here. Of note, this searchlight procedure automatically normalizes the input data into a standard space as defined by the mesh grid.
For the vertices within a given searchlight, we derived the inverse Euclidean distance of their pRF center position from the respective searchlight center, normalized by the searchlight radius. These normalized vertex-wise weights were summed up searchlight-wise, resulting in summary weights where higher values reflect a higher number of vertices within a given searchlight as well as vertices with a pRF center position closer to the searchlight center. The summary weights were then normalized by dividing them by the percentile of the resulting distribution of summary weights. Normalized summary weights > 1 were set to 1. Summary weights based on a single vertex were set to 0. Using the grid point coordinates, the resulting t-statistic maps were visualized as a heatmap. The color saturation of the heatmap was calibrated using the normalized summary weights, so that a higher saturation reflected a higher normalized summary weight.
The searchlight back-projections were obtained for each visual area and contrast of interest by pooling the data from all participants (after participant-wise smoothing). The pooling of data across participants improved the precision of searchlight back-projections because vertices from different participants complemented one another and covered the visual field more completely. 
Due to insufficient visual field coverage in V3A and particularly V4 in some participants, we excluded these areas from the searchlight and all subsequent analyses. Nonetheless, exploratory back-projections for V3A are listed in Supplementary Fig. S5.
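The searchlight procedure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the analysis code used in the study: the function names are hypothetical, the vertex weight is assumed to take the form 1 − d/r (the text only specifies an inverse distance normalized by the searchlight radius), and the percentile used for normalization is a placeholder because its value is not recoverable from the text.

```python
import numpy as np


def searchlight_backprojection(x0, y0, beta, radius=1.0, extent=8.5, step=0.1):
    """Back-project vertex-wise differential betas into visual space.

    x0, y0 : pRF center positions (dva) of the pooled vertices.
    beta   : differential beta estimate of each vertex.
    Returns the grid axis, the t-statistic map, and the raw summary weights.
    """
    x0, y0, beta = (np.asarray(a, dtype=float) for a in (x0, y0, beta))
    axis = np.round(np.arange(-extent, extent + step / 2, step), 10)
    t_map = np.zeros((axis.size, axis.size))
    weights = np.zeros_like(t_map)

    for i, gy in enumerate(axis):
        for j, gx in enumerate(axis):
            # Skip searchlight centers beyond the maximal eccentricity (+ one step).
            if np.hypot(gx, gy) > extent + step:
                continue
            dist = np.hypot(x0 - gx, y0 - gy)
            inside = dist <= radius
            n = int(inside.sum())
            if n < 2:
                continue  # single vertex / no vertices: t and weight stay 0
            b = beta[inside]
            sd = b.std(ddof=1)
            # One-sample t-test against 0, used descriptively only.
            t_map[i, j] = b.mean() / (sd / np.sqrt(n)) if sd > 0 else 0.0
            # ASSUMPTION: weight is 1 at the searchlight center and 0 at its edge.
            weights[i, j] = np.sum(1.0 - dist[inside] / radius)
    return axis, t_map, weights


def normalize_weights(weights, percentile=50.0):
    """Divide by a percentile of the nonzero weights and cap the result at 1.

    ASSUMPTION: the percentile value is a placeholder, not the one used in the study.
    """
    positive = weights[weights > 0]
    if positive.size == 0:
        return weights.copy()
    return np.minimum(weights / np.percentile(positive, percentile), 1.0)
```

With the default arguments the grid has 171 × 171 points, matching the 17 × 17 dva mesh sampled at 0.1 dva. The normalized weights could then be used as the alpha (saturation) channel when plotting the t-statistic map as a heatmap.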
Representational similarity of searchlight back-projections
To explore the impact of each participant’s data set on the pooled searchlight back-projections, we performed a representational similarity analysis (Kriegeskorte, 2008). To this end, we first conducted a leave-one-subject-out (LOSO) analysis by repeating the searchlight back-projection analysis whilst iteratively leaving out one participant. We then determined the dissimilarity (1-Spearman correlation) between the LOSO and the pooled back-projection matrices. Moreover, to assess the similarity structure more comprehensively, we also determined the dissimilarity between the individual (i.e., participant-wise) and the LOSO or pooled back-projection matrices. Importantly, for each back-projection pair, t-statistics based on a single vertex/no vertices were removed from both matrices prior to calculating the dissimilarity measure.
To visually summarize the dissimilarity structure, the resulting square matrices of dissimilarities (with zeros along the diagonal) were projected onto a 2D ordination space via classical (metric) multidimensional scaling (cMDS; Gower, 1966; Mardia, 1978). The lower the dissimilarity between two back-projection matrices, the closer they should be located in the 2D ordination space. Accordingly, if the pooled back-projections are representative of the whole study sample, the LOSO and individual back-projections should tightly cluster around or coincide with them.
Results
Searchlight back-projections
Fig. 2 depicts the searchlight back-projections for the pooled data per visual area and contrast of interest. When comparing the diamond or no-diamond percept to fixation, activity increased in the area within which the visible diamond segments moved. This pattern was fairly focal in V1 with suppressed differential activity in non-stimulated sites, but became more diffuse in V2, V3, and the VLOC.
Fig. 2
Diamond experiment Searchlight back-projections of differential brain activity as a function of contrast of interest and visual area. T-statistics surpassing a value of ±25 (first and second row) or ± 15 (third row) were set to that value. The saturation of colors reflects the number of vertices with a pRF inside a given searchlight plus the inverse distance of these pRFs from the searchlight center. White lines represent the extreme positions of the diamond stimulus. White solid lines denote the visible ungrouped diamond segments. White dashed lines additionally illustrate the inferred but invisible diamond shape when the segments were grouped together. D = Global, diamond percept. ND = Local, no-diamond percept. Fix = Fixation baseline. VLOC = Ventral-and-lateral occipital complex. Pooled = Data pooled across all 5 participants. pRF = Population receptive field.
For the contrast diamond vs no-diamond, we observed a wide-spread suppression of activity in V1, particularly along the horizontal meridian. Although V2 and V3 showed similar suppressive effects, these were less extensive and intermixed with distinct opposite effects. There was also no clear indication of a suppression streak along the horizontal meridian. Finally, unlike V1–V3, the contrast diamond vs no-diamond showed a wide-spread increase of activity in the VLOC.
Representational similarity of searchlight back-projections
Fig. 3 depicts the cMDS solution for dissimilarities calculated between the individual (Supplementary Fig. S4), pooled, and LOSO searchlight back-projections, separately for each contrast of interest and visual area. The corresponding representational dissimilarity matrices can be found in Supplementary Fig. S6.
Fig. 3
Diamond experiment Classical (metric) multidimensional scaling of the dissimilarities in Supplementary Fig. S6 as a function of contrast of interest and visual area. D = Global, diamond percept. ND = Local, no-diamond percept. Fix = Fixation baseline. VLOC = Ventral-and-lateral occipital complex. P1–P5 = Participant 1–5. Pooled = Data pooled across all 5 participants. Pooled-P1-Pooled-P5 = Data pooled across 4 participants with 1 participant left out (as indicated by the suffix).
For all visual areas and contrasts, the LOSO back-projections essentially coincided with the pooled back-projections, highlighting a low degree of dissimilarity. Thus, the pooled back-projections do not seem to be driven by a single participant. The individual back-projections clustered around the pooled ones, but less tightly than the LOSO back-projections, suggesting a higher degree of dissimilarity. Strikingly, for the contrast diamond vs no-diamond in V1 and V2, the back-projection pattern for P5 was located far away from the remaining ones, indicating a high degree of dissimilarity (see all Fig. 3). Indeed, when examining the representational dissimilarity matrices directly (Supplementary Fig. S6), it becomes evident that the back-projections for P5 in V1 and V2 showed a pattern largely opposite to the other participants (see also Supplementary Fig. S4).
Discussion
Here, we explored response amplitude mechanisms in human visual cortex underlying global object perception. Participants viewed a bistable diamond stimulus that was either perceived as four individual segments moving vertically and incoherently (local, no-diamond percept) or a diamond shape drifting horizontally and coherently behind occluders (global, diamond percept).
When contrasting either the diamond or no-diamond percept to fixation, our searchlight back-projections revealed enhanced activity in cortical sites stimulated by the visible diamond segments. This differential increase was focal in V1 and accompanied by reduced activity in non-stimulated sites, but became more wide-spread in V2, V3, and the VLOC. We therefore replicate previous work on stimulus-evoked retinotopic activation and background suppression in visual cortex (Fracasso et al., 2018; Goense et al., 2012; Shmuel et al., 2002, 2006). Our findings furthermore comply with predictions based on between-area differences in pRF size (Alvarez et al., 2015; Amano et al., 2009; Dumoulin and Wandell, 2008; van Dijk et al., 2016). Specifically, given that pRF size is larger in higher visual areas, there is a greater number of peripherally located pRFs encoding the visible diamond segments, resulting in a more diffuse topographic representation. In sum, these results confirm our expectations and validate our searchlight back-projection approach.
When we directly compared the diamond to the no-diamond percept, our searchlight analysis indicated a large-scale suppression of activity in V1 along with suppressive effects in V2 and V3 that tended to be less extensive. This global dampening effect speaks against the idea of a response amplitude mechanism in lower visual cortex labeling different portions of the diamond display distinctively to mediate global object perception (Chen et al., 2014; Gilad et al., 2013; Grassi et al., 2017; Kok and de Lange, 2014; Lamme, 1995; Likova and Tyler, 2008; Poort et al., 2012, 2016). 
Critically, however, it echoes prior reports of retinotopically-unspecific deactivation during the diamond vs no-diamond percept and an attenuation of these effects in V2/V3 (de-Wit et al., 2012).
In contrast, there was a wide-spread enhancement of activity in the VLOC for the diamond compared to the no-diamond percept. This finding mirrors previous studies on the diamond stimulus identifying the LOC as a source for modulatory feedback in lower visual areas (Fang et al., 2008; Murray et al., 2002). This idea is corroborated by a large body of work highlighting the sensitivity of LOC responses to global shape and intact objects even under occlusion conditions (Grill-Spector et al., 1999; Hegdé et al., 2008; Lerner et al., 2002, 2004; Malach et al., 1995; Vinberg and Grill-Spector, 2008). Moreover, given that visual stimulation was identical in the diamond and no-diamond percept, the universal deactivation we observed in lower visual cortex cannot be attributed to physical stimulus differences (Dumoulin and Hess, 2006) and was thus likely subject to top-down modulation.
However, it is unclear whether the inverse response patterns in the VLOC/LOC and lower visual cortex we and others (Fang et al., 2008; Grassi et al., 2018; Murray et al., 2002) quantified can be regarded as a generic perceptual grouping mechanism operating irrespective of shape perception. Recent evidence suggests, for instance, that activity in the LOC also decreases for intact vs scattered objects with abolished inter-part relations (Margalit et al., 2017), as is the case during the no-diamond percept. In order to address this question, our third experiment used a non-ambiguous stimulus consisting of four circular apertures, each carrying a random dot kinematogram (RDK). In the local condition, the RDKs translated vertically and incoherently. In the global condition, however, they moved horizontally and coherently and could thus be grouped together without forming a hybrid shape. 
These conditions closely echoed the motion features of the diamond stimulus whilst keeping shape information (i.e., the four circular apertures) constant and allowing for perceptual grouping. If the inverse pattern between the VLOC/LOC and lower visual cortex indeed constitutes a generic grouping mechanism, we should be able to conceptually replicate the findings from our diamond experiment.
Dots experiment
The authors DSS and SS as well as 3 other healthy participants (P1, P2 and P6–P8; 1 male; age range: 24–38 years; 1 left-handed) partook in this experiment.
The apparatus was identical to that of the diamond experiment, although the viewing distance to the head-mounted mirror was approximately 67 cm here as this facilitated the use of the eye tracker.
The dots stimulus comprised four circular apertures through which a random dot kinematogram (RDK), that is, a field (size: 2.85 × 2.85 dva) of moving black dots (diameter: 0.11 dva), was presented. The apertures were generated by removing all dots falling outside or on the edge of a circle (diameter: 2.85 dva) centered within the dots field. The aperture centers were positioned at the corners of a square (size: 5.69 × 5.69 dva) centered around a white central fixation dot (diameter: 0.16 dva). The dots of each aperture had a density of 12.33 dots/dva². All dots had a lifetime of 9 frames and were repositioned randomly within their field once they died. If the dots moved beyond the edge of their field, they were moved back by 1 field width. The position of a given dot at the beginning of each block was determined randomly as was the time a dot had already lived.
In the global horizontal condition, the dots in all apertures moved synchronously according to a horizontal sine wave (A = 1.31 dva, f = 0.5 Hz, ω = 3.14, φ = 0; Fig. 4, B.). In the local vertical condition, they followed an identical but vertical sine wave with the dots in the bottom-right and top-left apertures moving anti-synchronously (φ = 0) relative to the dots in the top-right and bottom-left apertures (φ = π; Fig. 4, A., and Supplementary Video S3). The horizontal condition mimicked the perceived movement during the global diamond percept and enabled participants to group the 4 apertures together through the Gestalt principle of common fate similar to the diamond stimulus. The vertical condition mirrored the perceived movement during the local no-diamond percept. 
Notably, the number of apertures and shape information remained the same in both conditions.
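The motion parameters given above can be made concrete with a small sketch. It assumes that the sine wave describes the displacement of the dots from their starting position, x(t) = A sin(ωt + φ) with ω = 2πf, and that the wrap-around rule can be expressed as a modulo operation on the field width; both are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

AMPLITUDE = 1.31                 # A, in dva
FREQUENCY = 0.5                  # f, in Hz
OMEGA = 2 * np.pi * FREQUENCY    # angular frequency, approx. 3.14 rad/s
FIELD = 2.85                     # width of the dots field, in dva


def displacement(t, phase=0.0):
    """Sinusoidal displacement (dva) at time t (s) for a given phase (rad)."""
    return AMPLITUDE * np.sin(OMEGA * np.asarray(t, dtype=float) + phase)


def wrap(pos, width=FIELD):
    """Move dots that left the field back by one field width.

    ASSUMPTION: positions are expressed relative to the field center.
    """
    return (np.asarray(pos, dtype=float) + width / 2) % width - width / 2


t = np.arange(0.0, 2.0, 0.5)                 # one full cycle lasts 2 s
global_all = displacement(t)                 # horizontal, all apertures in phase
local_a = displacement(t, phase=0.0)         # vertical, first aperture pair
local_b = displacement(t, phase=np.pi)       # vertical, second pair, anti-phase
```

In the local condition the two aperture pairs are always displaced by equal amounts in opposite directions, whereas in the global condition all apertures share one displacement, which is what permits grouping by common fate.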
Fig. 4
Dots experiment Example frames of the dots stimulus. A. Local, vertical condition. Here, the dots oscillated vertically and incoherently with the dots in the left/right apertures moving towards/away from one another, respectively, or vice versa (not shown), so that the apertures were perceived as four individual elements. B. Global, horizontal condition. Here, the dots in all apertures oscillated horizontally and coherently, so that the apertures could be grouped together into a global Gestalt without forming a hybrid shape. Since this stimulus was non-ambiguous, the white arrows naturally indicate the perceived and physical movement direction of the dots within the aperture.
The dots experiment comprised 8 experimental runs. Excluding the initial dummy interval (10 s without fixation dot), each run was split into 8 blocks. Within each block, a fixation interval (15 s) was presented followed by the dots stimulus (30 s) in either the vertical or horizontal condition. Within each run, the horizontal and vertical conditions were presented in an alternating fashion, starting with the vertical condition in odd-numbered and the horizontal condition in even-numbered runs. At the end of each run, a final fixation interval (15 s) was displayed.
Participants were required to fixate the fixation dot continuously. In the dots interval, they indicated whenever the dots in one of the circular apertures flickered briefly (by changing their color to background gray for 200 ms) via pressing a key with their right index finger (left apertures) or right middle finger (right apertures). The number of flicker events per block was determined randomly but was always 3, 6, or 9 with a gap of at least 200 ms between consecutive flicker events. The aperture within which the flicker events occurred was determined randomly. 
Participants’ eye position and pupil size were recorded during all but the final fixation interval at 60 Hz (downsampled).
The MRI acquisition was as in the retinotopic mapping and diamond experiment.
The preprocessing was identical to the retinotopic mapping and diamond experiment. It is of note, however, that P7 moved more than other participants during the dots experiment. Moreover, for this participant, coregistration in the retinotopic experiment was also less ideal than for others. It is thus important to perform any analyses with and without this participant.
The searchlight back-projection analysis was conducted in the same manner as in the diamond experiment with the following exceptions. The voxel-wise GLM on the preprocessed data (HPF: 185 s) involved a constant epoch boxcar regressor for each condition (horizontal or vertical) and an event-related regressor for the onset of the flicker events. We calculated the following contrasts of interest: horizontal vs fixation, vertical vs fixation, and horizontal vs vertical. The contrasts horizontal or vertical vs fixation were equivalent to the contrasts diamond or no-diamond vs fixation, respectively. The contrast horizontal vs vertical mirrored the contrast diamond vs no-diamond. Exploratory back-projections for V3A can be found in Supplementary Fig. S9.
The representational similarity analysis was conducted as in the diamond experiment.
Fig. 5 shows the searchlight back-projection profiles pooled across participants for each visual area and contrast of interest. When comparing the horizontal or vertical condition to fixation, there was enhanced activity in areas carrying the RDKs. This pattern was spatially relatively precise in V1 with suppressive effects in the central and peripheral visual field, and became more wide-spread in V2, V3, and the VLOC.
Fig. 5
Dots experiment Searchlight back-projections of differential brain activity as a function of contrast of interest and visual area. T-statistics surpassing a value of ±35 (first and second row) or ± 25 (third row) were set to that value. The saturation of colors reflects the number of vertices with a pRF inside a given searchlight plus the inverse distance of these pRFs from the searchlight center. White lines represent the spatial extent of the circular apertures carrying the RDK. H = Global, horizontal condition. V = Local, vertical condition. Fix = Fixation baseline. VLOC = Ventral-and-lateral occipital complex. Pooled = Data pooled across all 5 participants. RDK = Random dot kinematogram. pRF = Population receptive field.
For the direct comparison between the horizontal and vertical condition, we observed a fairly wide-spread deactivation across the whole visual field in all visual areas, occasionally intermixed with fairly focal opposite effects. These diffuse suppressive effects were particularly prominent around the central visual field and stimulated areas but not in the background area.
Fig. 6 illustrates the cMDS solution for the dissimilarities between the individual (Supplementary Fig. S8), pooled, and LOSO searchlight back-projections per contrast of interest and visual area. Supplementary Fig. S10 shows the corresponding representational dissimilarity matrices.
Fig. 6
Dots experiment Classical (metric) multidimensional scaling of the dissimilarities in Supplementary Fig. S10 as a function of contrast of interest and visual area. H = Global, horizontal condition. V = Local, vertical condition. Fix = Fixation baseline. VLOC = Ventral-and-lateral occipital complex. P1–P2 and P6–P8 = Participant 1–2 and 6–8. Pooled = Data pooled across all 5 participants. Pooled-P1-Pooled-P2 and Pooled-P6-Pooled-P8 = Data pooled across 4 participants with 1 participant left out (as indicated by the suffix).
The LOSO back-projections generally accorded well with the pooled ones, highlighting a low degree of dissimilarity. As such, the pooled back-projections do not seem to be driven by single participants, including P7, who moved more than other participants and for whom coregistration was difficult. The individual back-projections clustered circularly around the pooled ones, albeit less closely than the LOSO back-projections, indicating a higher degree of dissimilarity. This was particularly prominent for the contrast horizontal vs vertical in V1, V2, and the VLOC (see all Fig. 6). As the representational dissimilarity matrices indicate (Supplementary Fig. S10), this pattern highlights the highly idiosyncratic nature of the individual back-projections (see also Supplementary Fig. S8).
Here, we investigated response amplitude mechanisms related to the perception of a global Gestalt in an attempt to generalize the findings of our diamond experiment beyond shape perception. Participants viewed four apertures carrying random dots that moved either vertically and incoherently (local, vertical condition) or horizontally and coherently, allowing perceptual grouping into a global configuration (global, horizontal condition). These conditions echoed the global-local aspects of the diamond stimulus without varying in shape information. 
We hypothesized that if the inverse activity modulations in lower visual cortex and the VLOC/LOC we and others observed (Fang et al., 2008; Grassi et al., 2018; Murray et al., 2002) indeed mediate global object perception per se, we should be able to conceptually replicate this pattern.
To validate our analysis procedures, we compared the horizontal or vertical condition to fixation. Our searchlight back-projections highlighted increased differential activity in physically stimulated sites and suppressive effects in non-stimulated sites. The spatial precision of this pattern was relatively high in V1 and decreased from V2 over V3 to the VLOC. Collectively, these results are in line with our diamond experiment and confirm the spatial sensitivity of our back-projection approach.
To generalize the findings of our diamond experiment, we compared the horizontal and vertical condition directly, revealing a diffuse pattern of suppressed differential activity across large portions of the visual field in all visual areas. The wide-spread deactivation in lower visual cortex is consistent with our previous diamond results. The diffuse deactivation in the VLOC, however, contradicts the idea that its previously established inverse relationship to lower visual cortex represents a generic response amplitude mechanism mediating global object perception beyond shape perception.
An interesting additional finding is that V1 and V2 activity in the more peripheral background area did not seem to be strongly suppressed for the horizontal relative to the vertical condition, but showed a tendency to remain fairly unchanged or slightly enhanced. This could suggest that the dampening effects we observed are diffusely related to the stimulus and level out further in the periphery. 
Alternatively, this may be related to a comparably sparser distribution of pRFs in the background area along with a fairly large size and central presentation of the dots stimulus and thus relative undersampling of the background area. Consequently, the question arises as to whether the large-scale deactivation in lower visual cortex also occurs if the dots stimulus is smaller, e.g., confined to one visual field quadrant only. Critically, if this were not the case and the deactivation were quadrant-specific and absent from the remaining visual field, this could be regarded as a diffuse instantiation of a stimulus-referred response amplitude mechanism. In our fourth experiment, we therefore essentially repeated the dots experiment, but moved the dots stimulus to the top-right visual field quadrant.
Dots quadrant experiment
The author SS and 4 other healthy participants (P1, P6, and P9–P11; 1 male; age range: 20–36 years; all right-handed) participated in this experiment.
The apparatus was identical to that of the dots experiment.
The dots quadrant stimulus was identical to the dots stimulus except that the stimulus configuration was smaller and repositioned. Specifically, the dots field subtended 0.58 × 0.58 dva and the diameter of the circular apertures was thus 0.58 dva. The aperture midpoints were centered around the corners of a square with a size of 2.27 × 2.27 dva. The dots configuration was always presented in the top-right visual field quadrant. Its midpoint was located at a distance of 3.41 dva in the x- and y-direction from the center of the screen. The density of the dots in each aperture was 60.31 dots/dva² and thus higher than in the dots experiment. This way, we ensured that the movement of the dots was still clearly perceivable. As in the dots experiment, there was a local vertical (Fig. 7, A.) and a global horizontal condition (Fig. 7, B., and Supplementary Video S4).
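As a quick check of the stimulus geometry, the aperture centers implied by these values can be computed directly. The sketch below assumes a coordinate system with fixation at the origin and positive x and y pointing right and up, so that the top-right quadrant has positive coordinates; the helper name is hypothetical.

```python
import numpy as np


def aperture_centers(square_size, midpoint=(0.0, 0.0)):
    """Centers of four apertures placed at the corners of a square (dva)."""
    half = square_size / 2.0
    corners = np.array([[-half, half], [half, half],
                        [-half, -half], [half, -half]])
    return corners + np.asarray(midpoint, dtype=float)


# Dots experiment: square of 5.69 dva centered on fixation.
dots = aperture_centers(5.69)

# Dots quadrant experiment: square of 2.27 dva, midpoint at (3.41, 3.41) dva.
quadrant = aperture_centers(2.27, midpoint=(3.41, 3.41))

# Largest eccentricity reached by any aperture edge (aperture radius 0.29 dva).
max_ecc = np.hypot(quadrant[:, 0], quadrant[:, 1]).max() + 0.58 / 2
```

Under these assumptions, every aperture of the quadrant stimulus lies entirely within the top-right quadrant, and its outermost edge stays well inside the 8.5 dva covered by the retinotopic mapping stimulus.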
Fig. 7
Dots quadrant experiment Example frames of the dots quadrant stimulus. A. Local, vertical condition. Here, the dots oscillated vertically and incoherently with the dots in the leftmost/rightmost apertures moving towards/away from one another, respectively, or vice versa (not shown), so that the apertures were perceived as four individual elements. B. Global, horizontal condition. Here, the dots in all apertures oscillated horizontally and coherently, so that the apertures could be grouped together into a global Gestalt without forming a hybrid shape. Since this stimulus was non-ambiguous, the white arrows naturally indicate the perceived and physical movement direction of the dots within the aperture. The dots quadrant stimulus was only presented in the top-right visual field quadrant. For reasons of visibility, we cut out the stimulus region to provide a zoomed-in view, as indicated by the black dashed lines and the black double-headed arrows.
The procedure of the dots quadrant experiment was the same as for the dots experiment, although here, participants were required to press a key with their right index/middle finger when the dots of any of the leftmost/rightmost apertures flickered. Moreover, eye tracking data were also collected during the final fixation interval.
The MRI acquisition was identical to the other experiments except that we additionally collected a rapid MPRAGE (TR = 1.150 s, TE = 3.6 ms, voxel size = 2 mm isotropic, flip angle = 7°, FoV = 256 mm × 208 mm, matrix size = 128 × 104, 80 sagittal slices) to aid coregistration of the functional to the structural images if the structural image was acquired in a separate session.
The preprocessing was identical to all other experiments. However, if rerunning automated coregistration after manual registration failed, we performed a 2-pass-procedure where the functional images were first coregistered to the short MPRAGE and then to the long MPRAGE. 
Where necessary, this 2-pass-procedure was also applied to the retinotopic mapping data of a given participant.
Searchlight back-projections and representational similarity of searchlight back-projections
The searchlight back-projection and representational similarity analyses were conducted in the same manner as in the dots experiment. Exploratory back-projections for V3A can be found in Supplementary Fig. S13.
Fig. 8 depicts the searchlight back-projection profiles for the pooled data as a function of visual area and contrast of interest. When contrasting the horizontal or vertical condition to fixation, our back-projection profiles highlighted enhanced activity in stimulated visual field portions. This differential enhancement was confined to the top-right visual field quadrant in V1 and V2 with suppressive effects in the remaining quadrants, but increasingly extended into the top-left and bottom-right quadrants from V3 to the VLOC.
Fig. 8
Dots quadrant experiment Searchlight back-projections of differential brain activity as a function of contrast of interest and visual area. T-statistics surpassing a value of ±25 (first and second row) or ± 15 (third row) were set to that value. The saturation of colors reflects the number of vertices with a pRF inside a given searchlight plus the inverse distance of these pRFs from the searchlight center. White lines represent the spatial extent of the circular apertures carrying the RDK. H = Global, horizontal condition. V = Local, vertical condition. Fix = Fixation baseline. VLOC = Ventral-and-lateral occipital complex. Pooled = Data pooled across all 5 participants. RDK = Random dot kinematogram. pRF = Population receptive field.
For the contrast horizontal vs vertical, we observed a tendency for suppressive effects in stimulated areas of V1 and V2 and enhanced effects in the remaining visual field. In V3 and the VLOC, this pattern was much more pronounced and wide-spread.
Fig. 9 shows the cMDS solution for the dissimilarities calculated between the individual (Supplementary Fig. S12), pooled, and LOSO searchlight back-projections by contrast of interest and visual area. The corresponding representational dissimilarity matrices can be found in Supplementary Fig. S14.
Fig. 9
Dots quadrant experiment Classical (metric) multidimensional scaling of the dissimilarities in Supplementary Fig. S14 as a function of contrast of interest and visual area. H = Global, horizontal condition. V = Local, vertical condition. Fix = Fixation baseline. VLOC = Ventral-and-lateral occipital complex. P1, P6, and P9–P11 = Participant 1, 6, and 9–11. Pooled = Data pooled across all 5 participants. Pooled-P1, Pooled-P6, and Pooled-P9-Pooled-P11 = Data pooled across 4 participants with 1 participant left out (as indicated by the suffix).
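For readers unfamiliar with classical (metric) multidimensional scaling, the following minimal sketch shows the standard double-centering and eigendecomposition procedure applied to a symmetric dissimilarity matrix. The function name `classical_mds` and the two-dimensional output are illustrative assumptions. The dissimilarity measure used to build the matrices in Supplementary Fig. S14 is not restated here, so the input is simply taken as given.

```python
import numpy as np


def classical_mds(d, n_dims=2):
    """Classical (Torgerson) MDS of a symmetric dissimilarity matrix d.

    Returns an (n, n_dims) array of coordinates whose pairwise Euclidean
    distances approximate the entries of d.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]

    # Double-center the squared dissimilarities
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j

    # Eigendecomposition of the symmetric matrix; keep the largest eigenvalues
    evals, evecs = np.linalg.eigh(b)
    order = np.argsort(evals)[::-1][:n_dims]
    evals = evals[order]
    evecs = evecs[:, order]

    # Negative eigenvalues (non-Euclidean input) are truncated to zero
    return evecs * np.sqrt(np.clip(evals, 0.0, None))
```

Each row of the returned array would correspond to one back-projection (individual, pooled, or LOSO), so back-projections that lie close together in the plot have low dissimilarity.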
In virtually all cases, the LOSO back-projections coincided well with the pooled ones, suggesting a low degree of dissimilarity and thus speaking against an overly strong influence of single participants. The individual back-projections tended to cluster around the pooled ones, albeit less tightly than the LOSO back-projections, highlighting a higher degree of dissimilarity. However, some individual back-projections were located far apart from one another or the pooled back-projections. This was particularly true for the contrast horizontal vs vertical in V1 and the VLOC (see all Fig. 9). As confirmed by the representational dissimilarity matrices (Supplementary Fig. S14), this structure is indicative of a fairly high degree of dissimilarity and with that inter-individual variability (see also Supplementary Fig. S12).
Here, we tested for a diffuse instantiation of a stimulus-referred response amplitude mechanism related to parafoveal Gestalt perception. Participants viewed apertures filled with random dots in the top-right visual field quadrant. The dots moved either vertically and incoherently (local, vertical condition) or horizontally and coherently (global, horizontal condition). 
Based on the results of our dots experiment, we hypothesized that any suppression of activity might be diffusely related to the physical stimulus and thus the top-right visual field quadrant or bordering areas.
In line with our hypothesis, when contrasting the horizontal to the vertical condition, our searchlight back-projections revealed a trend for a reduction of activity near the stimulus location in V1 and V2 – a pattern that became more pronounced and wide-spread in V3 and the VLOC. Moreover, we observed an increase of activity in the remaining visual field in all visual areas. We therefore found evidence for an enhancement-suppression mechanism, possibly mediating the perception of figure and ground, as suggested previously (Chen et al., 2014; Gilad et al., 2013; Grassi et al., 2017; Kok and de Lange, 2014; Lamme, 1995; Likova and Tyler, 2008; Poort et al., 2012, 2016).
The absence of clear suppressive effects in V1 and V2 (as compared to V3 and the VLOC) might be related to the functional architecture of the visual cortex, noisy voxels, and the size of the dots quadrant stimulus. Specifically, in lower visual areas, pRFs are smaller and with that the number of pRFs encoding the physical stimulus tends to be reduced (although not necessarily), resulting in diminished response gain. Consequently, noisy voxels are likely to have a more pronounced impact on searchlight-wise response amplitude quantifications. Moreover, stimulus-driven activity modulations tend to be weaker for smaller and more eccentric stimuli (Nasr et al., 2015) and the distribution of pRFs is sparser in the peripheral visual field, as qualified by the saturation weighting in our searchlight back-projections. This might have additionally contributed to the unclear patterns in V1 and V2. 
Nevertheless, our validation analyses showed that when contrasting the vertical or horizontal condition to fixation, we were able to effectively stimulate the cortical area corresponding to the top-right visual field quadrant. This confirms the general feasibility of our back-projection approach.
General discussion
In three fMRI experiments, we used dynamic bistable (diamond experiment) and non-ambiguous stimuli (dots and dots quadrant experiment) to explore response amplitude mechanisms underlying global object perception in human visual cortex. All these stimuli could either be perceived globally (i.e., as a grouped and coherently moving Gestalt) or locally (i.e., as ungrouped and incoherently moving elements). Using pRFs as an encoding model, we back-projected brain responses measured during stimulus perception into visual space via a searchlight procedure. This enabled us to read out topographic profiles with great spatial detail.
Signatures in lower visual cortex
When contrasting global to local perception, our diamond and dots experiment revealed a fairly wide-spread suppression of activity across the whole visual field in lower visual cortex. However, unlike our diamond experiment, our dots experiment provided little evidence for pronounced activity modulations in the background region, suggesting that these suppressive effects might be diffusely related to the physical stimulus. Our dots quadrant experiment largely confirmed this notion, but revealed additionally a wide-spread increase of activity in the background area. Whereas the wide-spread suppressive effects from the diamond experiment speak against a response amplitude code mediating the perception of figure and ground, the results from the dots and dots quadrant experiment are largely compatible with this idea. In any case, the outcomes of our experiments seem to converge in that they suggest that perceptual grouping is associated with a reduction of activity in lower visual cortex.
Surprisingly, however, all these findings are at odds with recent evidence showing a decrease of brain activity in the background and stimulus region of another bistable global-local stimulus along with an increase in the center and inferred contour region for global vs local perception (Anstis and Kim, 2011; Grassi et al., 2017). Unlike our diamond stimulus, this bistable stimulus triggers a local percept of four individually rotating disk pairs or a global percept of two floating squares circling around the stimulus center. The mismatch in findings might therefore be related to differences in physical stimulus properties, such as the type and/or direction of motion (i.e., rotary vs oscillatory and rotational vs horizontal/vertical, respectively).
The emergence of suppressive effects in the dots and dots quadrant experiment, where shape information was kept constant during global and local perception, further highlights the importance of motion properties. 
This idea is in line with findings of reduced activity in lower visual cortex for coherent vs incoherent motion (Braddick et al., 2001; Costagli et al., 2014; Harrison et al., 2007; McKeefry et al., 1997; Schindler and Bartels, 2017), although no or opposite effects have also occasionally been observed (Braddick et al., 2001; Rees et al., 2000). However, unlike these studies on motion coherence, we did not compare coherent to random motion nor did Grassi et al. (2017). Rather, all our stimuli always comprised coherent motion, but were either perceived as ungrouped and moving vertically out-of-phase (local) or grouped and moving horizontally in-phase (global). Accordingly, although speculative, the perceived axis of motion might constitute an important factor driving our results.
A potential reason for a horizontal-vertical imbalance might be that there is a bias for vertical motion in lower visual cortex resulting in generally higher response amplitudes. In the case of the diamond experiment (in particular), this directional anisotropy might additionally interact with feature-based attention. Specifically, given that information about motion direction is inherently ambiguous for the diamond stimulus, during the local diamond state, observers may direct their attention to vertical motion and during the global diamond state to horizontal motion.
Interestingly, there is evidence for increased responses to horizontal/vertical motion around the horizontal/vertical meridian in lower visual cortex (Clifford et al., 2009). Along with a plethora of similar studies (Maloney et al., 2014; Raemaekers et al., 2009; Schellekens et al., 2013), this finding points to a radial response bias. 
Importantly, such a radial anisotropy appears incompatible with our results, as (if anything) it should produce meridian-related antagonistic effects for global as compared to local perception (i.e., an increase in differential activity around the horizontal meridian and decrease around the vertical meridian), which we did not observe. Critically, however, it is hitherto not clear to what extent these radial anisotropies are due to vignetting (Roth et al., 2018) and/or aperture-inward biases (Wang et al., 2014), leaving open the possibility for a vertical-horizontal anisotropy.
The role of feature-based attention as a perceptual modulator fits in with evidence that the attended direction of motion can be decoded from activity in lower visual cortex (Kamitani and Tong, 2006) even in the absence of direct physical stimulation (Serences and Boynton, 2007) and the idea that feature-based attention acts fairly globally across the visual field (Jehee et al., 2011; Maunsell and Treue, 2006; Saenz et al., 2002; Serences and Boynton, 2007; Treue and Martinez Trujillo, 1999). Strikingly, the combinatory effect of anisotropies and feature-based attention might also help explain why variations of the diamond stimulus triggering a local percept of vertical motion and a global percept of rotational motion (Caclin et al., 2012) or other bistable global-local stimuli (Grassi et al., 2017) produce distinct differential response profiles. Most importantly, as for our findings, this combinatory effect leads to the prediction that rotating the diamond display by 90° should produce the opposite pattern of results for global vs local perception.
Leaving all inconsistencies aside, our study overlaps with studies on motion coherence (Braddick et al., 2001; Costagli et al., 2014; Harrison et al., 2007; McKeefry et al., 1997; Schindler and Bartels, 2017) and Grassi et al.’s (2017) work in that it points to stimulus-referred suppressive effects for global vs local perception. 
This suppression might be related to a recently reported phenomenon known as the global slow-down effect (Kohler et al., 2009, 2014). This effect comprises a slow-down in the perceived speed of a stimulus configuration as a result of perceptual grouping and has hitherto only been demonstrated behaviourally (Kohler et al., 2009, 2014) for variations of the stimulus (Anstis and Kim, 2011) used by Grassi et al. (2017). As such, it would be worthwhile to examine whether the effect holds true for the diamond stimulus and ultimately also our dots and dots quadrant stimuli along with more conventional motion displays because these stimulus classes abstract from shape perception (for a similar point and a discussion on potential underlying mechanisms see Kohler et al., 2014).
The broad background enhancement we observed in the dots quadrant experiment, which was absent in the diamond and dots experiment, might be due to spatial attention. In particular, perceiving a grouped and coherently moving object parafoveally might require fewer attentional resources than perceiving an ungrouped and incoherently moving object. Accordingly, in the vertical condition, fewer attentional resources might have been available for processing the background area. This interpretation fits in with reports that spatial attention results in increased brain responses even in the absence of physical stimulation (Kastner et al., 1999; Silver et al., 2009). Due to the size and central presentation of the diamond and dots stimulus, we might have been unable to observe similar effects in the diamond and dots experiment. 
It is furthermore possible that the background enhancement is related to perceived background luminance, which has recently been found to be increased for global vs local perception (Han and VanRullen, 2016, 2017).
Interestingly, our finding of diffuse figure-related suppression and background enhancement seems incompatible with studies in monkeys reporting opposite effects (e.g., Poort et al., 2016; Gilad et al., 2013). This inconsistency might be due to the lack of background elements in our study. Indeed, there is evidence that BOLD responses in V1–V3 to aligned vs unaligned contours increase in the presence of background clutter and decrease in the absence of it. A flip of antagonistic background responses was, however, not observed (Qiu et al., 2016).
Building upon previous research involving the diamond stimulus (de-Wit et al., 2012), it is important to highlight that our results in lower visual cortex across all experiments seem to contradict suggestions of predictive coding theories that suppressive effects should be confined to cortical sites encoding the physical stimulus and accompanied by unchanged activity in the background region (e.g., Mumford, 1992; Murray et al., 2004; Rao and Ballard, 1999). They furthermore seem to conflict with alternative accounts, such as response sharpening (e.g., Kersten et al., 2004; Kersten and Yuille, 2003; Murray et al., 2004). Response sharpening accounts assume that predictive feedback from higher-tier areas sharpens diffuse responses in lower-tier areas (due to noise or ambiguity) by increasing activity matching the global interpretation of the bottom-up input and decreasing non-matching activity. 
Accordingly, when contrasting global to local object perception, activity should increase in stimulated and decrease in non-stimulated sites – a pattern we did not observe.
Critically, due to the complex nature of the BOLD signal and its relation to neuronal activity as well as its limited spatio-temporal resolution (e.g., Goense et al., 2012; Logothetis, 2003, 2008; Shmuel et al., 2006), the interpretability of increases and decreases in brain activity is limited. For instance, a study applying optical imaging coupled with electrode recordings in macaque V1 showed that decreases in metabolic activity can be accompanied by local increases in spiking (Kinoshita et al., 2009). Such a mismatch could be explained by a net decrease of inhibitory activity, resulting in facilitated spiking of some stimulus-responsive cells, but an overall decrease in metabolic activity, manifesting itself in reduced optical signal strength. The population responses we assessed in our study therefore do not rule out that the signatures hypothesized by predictive coding (e.g., Mumford, 1992; Murray et al., 2004; Rao and Ballard, 1999) or response sharpening accounts (e.g., Kersten et al., 2004; Kersten and Yuille, 2003; Murray et al., 2004) manifest at the single neuron level. Similarly, our study considers the visual cortex as a 2D sheet and consequently ignores the possibility of local behavior as well as entanglement of such signatures across laminae (Kuehn and Sereno, 2018).
Similarities and differences between higher and lower visual cortex
Whereas our findings for the VLOC in the dots and dots quadrant experiment largely paralleled those in lower visual cortex for global vs local perception, we observed a large-scale response enhancement in the diamond experiment that was antagonistic to responses in lower visual cortex. The absence of this antagonism when shape information did not change suggests that it does not represent a generic grouping mechanism.
It could be argued that our failure to find evidence for such an opposite pattern is due to the fact that non-ambiguous stimuli strongly favor a single perceptual interpretation and thus involve less predictive feedback (Wang et al., 2013). This explanation seems unlikely because an inverse V1-LOC relationship has also been established for non-ambiguous shape-like stimuli vs unstructured displays (Murray et al., 2002). Moreover, at least broadly in line with our results, recent studies (Grassi et al., 2016, 2017, 2018) found no evidence for increased activity in the LOC (or subregions thereof) when a dynamic, bistable global-local stimulus constantly triggered shape-based interpretations (i.e., moving disks forming large squares or small circles).
The absence of a (stimulus-related) increase in VLOC activity in the dots and dots quadrant experiment seems incompatible with a study reporting enhanced LOC activity for intact compared to scattered objects with disturbed inter-part relations (Margalit et al., 2017). Yet, in this study, inter-part relations were abolished by disturbing the contiguity of different shape parts. In our experiments, however, the position of the apertures did not change during the local state nor did shape information, which might explain the discrepant results.
Importantly, however, the fact that we observed comparable response patterns for the VLOC and lower visual cortex in the dots and dots quadrant experiment makes the VLOC no less likely as a potential feedback hub. 
Similarly, in the context of predictive coding models (e.g., Mumford, 1992; Murray et al., 2004; Rao and Ballard, 1999), where higher-level areas are assumed to show distinct response profiles, it could be argued that VLOC activity is (partially) mediated by pathways bypassing V1 (potentially via MT), effectively rendering it a lower-level area. Evidence for direct geniculate inputs to monkey MT (Sincich et al., 2004) has, for instance, been used to explain suppressive effects in hMT+ for coherent vs incoherent motion perception (Harrison et al., 2007). However, even when considering alternative pathways, predictive coding accounts cannot easily explain the background enhancement we observed.
Inter-individual variability
The wealth of evidence presented here is based on data pooled across a small number of participants. As such, it is important to find ways of flagging an overly large influence of a single participant. Although the results of our representational similarity analyses did not indicate such a bias, they collectively highlighted the highly idiosyncratic nature of the individual back-projection profiles. Some of these idiosyncrasies are likely due to a lower signal-to-noise ratio at the individual level triggered by a generally lower number of available data points. They might also be related to inter-individual variability in pRF estimates and processing of the global-local stimuli, such as gaps in visual field maps (see Supplementary Fig. S4, Fig. S8, and Fig. S12), processing differences in subareas of the VLOC, or differences in switch rates, perceptual durations (see Supplementary material, 1.1.2. Results, and Supplementary Fig. S2), perceptual vividness, and attention allocation.
Eye movements
Our results might be confounded by (excessive) eye movements. Yet, all participants for whom we were able to acquire eye tracking data showed a relatively high degree of fixation stability. Moreover, we found little evidence for systematic eye position biases related to the perceived movement direction of our stimuli (see all Supplementary material, 1.1.1., 1.2.1., 1.3.1. Data analysis, 1.1.2., 1.2.2., 1.3.2. Results, and Supplementary Fig. S3, Fig. S7, and Fig. S11). Consequently, differences in eye position variability cannot easily explain our results, although we cannot preclude the involvement of eye-related dynamics including blinks, pupil dilatation, visually-guided saccades and/or microsaccades (e.g., Hupé et al., 2009; Tse et al., 2010).
Descriptive statistics
The present study presents a description of brain activity patterns underlying global object perception. It remains to be seen whether the signatures identified here replicate and generalize beyond the specific conditions tested, such as stimulus type, sample, field strength, scanner type, experimental design, and analysis type.
Conclusion
We found evidence for a suppression of activity in lower visual cortex accompanied by an increase of activity in the VLOC for global relative to local object perception. While the suppressive effects in lower visual cortex manifested themselves irrespective of shape grouping, this was not the case for the enhanced responses in the VLOC. Instead, once shape information was held constant during both global and local object perception, the VLOC also showed a decrease of activity. As such, the antagonistic patterns between lower visual cortex and the VLOC we initially quantified cannot be regarded as a generic grouping mechanism. We furthermore observed that grouping-related suppressive effects can be diffusely confined to stimulated visual field portions (once stimulus size is reduced) and surrounded by enhancement effects, potentially pointing to a response amplitude mechanism mediating the perception of figure and ground.
Data and code availability
Preprocessed data, analysis code, and stimulus videos are available at https://doi.org/10.17605/OSF.IO/E6C8S.
Contributions
Susanne Stoll: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Software; Validation; Visualization; Writing - original draft; Writing - review & editing. Nonie J. Finlayson: Investigation; Writing - review & editing. D. Samuel Schwarzkopf: Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Resources; Software; Supervision; Writing - review & editing.
Declaration of competing interest
The authors declare no conflict of interest. The research sponsor had no role in the study design, the collection, analysis and interpretation of the data or the write-up and decision to submit this article for peer review.