JohnMark Taylor1, Yaoda Xu2. 1. Visual Inference Laboratory, Zuckerman Institute, Columbia University, United States. Electronic address: johnmarkedwardtaylor@gmail.com. 2. Department of Psychology, Yale University, United States. Electronic address: xucogneuro@gmail.com.
Abstract
Despite decades of research, our understanding of the relationship between color and form processing in the primate ventral visual pathway remains incomplete. Using fMRI multivoxel pattern analysis, we examined coding of color and form, using a simple form feature (orientation) and a mid-level form feature (curvature), in human ventral visual processing regions. We found that both color and form could be decoded from activity in early visual areas V1 to V4, as well as in the posterior color-selective region and shape-selective regions in ventral and lateral occipitotemporal cortex defined based on their univariate selectivity to color or shape, respectively (the central color region only showed color but not form decoding). Meanwhile, decoding biases towards one feature or the other existed in the color- and shape-selective regions, consistent with their univariate feature selectivity reported in past studies. Additional extensive analyses show that while all these regions contain independent (linearly additive) coding for both features, several early visual regions also encode the conjunction of color and the simple, but not the complex, form feature in a nonlinear, interactive manner. Taken together, the results show that color and form are encoded in a biased distributed and largely independent manner across ventral visual regions in the human brain.
Research over the past several decades has provided us with a wealth of
knowledge regarding the representation of color and form information in the primate
brain. Both color and form information have been shown to be represented and
transformed across multiple levels of processing, with the relevant neural processes
spanning the entire visual processing hierarchy, from the retina to higher-level
ventral stream regions. Notably, human fMRI studies have identified form-processing
regions in lateral and ventral occipito-temporal cortex (OTC) (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi and
Kanwisher, 2001; Orban et al.,
2004), and both monkey neurophysiology and human fMRI studies have
reported color-processing regions in ventral OTC (Hadjikhani et al., 1998; Brewer et al.,
2005; Conway et al., 2007; Lafer-Sousa and Conway, 2013; Lafer-Sousa et al., 2016; Chang et al., 2017; Conway,
2018). Despite these advances, past studies tended to examine a single
feature in isolated brain regions with a range of different methods or stimuli,
making it difficult to construct an overarching view of how color and form are coded
relative to each other within a brain region and across different regions along the
primate ventral processing pathway.
In the present study, we address these limitations by comprehensively probing
visual processing regions along the ventral visual pathway with the same stimuli to
document the extent to which color and form are encoded in overlapping versus
independent brain regions, compare the magnitude of color and form decoding for the
regions that encode both (allowing us to test whether regions that have shown
univariate selectivity for a given feature exhibit a corresponding multivariate
feature preference), and determine whether regions with information about both
features encode it in an additive versus interactive manner. We draw on a paradigm
developed by Seymour et al. (2010) that uses
fMRI and multi-voxel pattern analysis (MVPA) to examine color and form coding, both
replicating their results and extending them in two ways. First, in addition to the
early visual areas examined in their study, we examined higher-level ventral stream
regions exhibiting univariate selectivity to either color or shape information.
Second, in addition to examining the coding of color and orientation (a low-level
form feature) as in their study, we documented the coding of color and a mid-level
form feature, curvature, across the early visual and ventral visual regions.
Together, our approach provides an updated documentation of the representation of
color, form, and their conjunction across the human ventral visual pathway.
Color and form processing across the visual hierarchy
Past work has demonstrated that both color and form information are
successively transformed across a series of processing stages, spanning from
early visual cortex to anterior temporal lobe regions. Early visual areas V1 to
V4 have been shown to contain both color and shape information, with some degree
of mesoscale segregation of neurons tuned to each of these two features (e.g.,
Livingstone and Hubel, 1988; Gegenfurtner et al., 1996; Conway, 2001; Johnson
et al., 2001; Ts’o et al.,
2001; Brewer et al., 2005;
Conway et al., 2007; Brouwer and Heeger, 2009; Conway et al., 2010; Seymour et al., 2010; Shapley and
Hawken, 2011).
For higher ventral regions beyond V4, coding for color and form exhibits
more anatomical separation. Specifically, macaque inferotemporal cortex (IT)
contains neurons tuned to high-level shape features (e.g., Tanaka, 1996; DiCarlo
et al., 2012; Lehky and Tanaka,
2016; Bao et al., 2020), and
arguably homologous regions in the human lateral and ventral OTC exhibit higher
fMRI responses to coherent shapes than scrambled stimuli (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi
and Kanwisher, 2001; Orban et al.,
2004). Critically, damage to these cortical regions can result in
loss of form perception, with spared color perception (Benson and Greenberg, 1969; Goodale and Milner, 2004). Analogously, a series of
posterior, central and anterior color-selective regions in ventral visual cortex
have been shown to exhibit color-tuning in the macaque, and show higher fMRI
responses to colored than greyscale stimuli in the human brain (Hadjikhani et al., 1998; Brewer et al., 2005; Conway et al., 2007; Lafer-Sousa and
Conway, 2013; Lafer-Sousa et al.,
2016; Chang et al., 2017;
Conway et al., 2018). Damage to these
color regions has been linked to deficits in color processing with largely
spared form processing (Siuda-Krzywicka and
Bartolomeo, 2020; Bouvier and Engel,
2006).
The existence of regions reliably showing univariate selectivity to color
and form, along with the lesion data, is consistent with the view that different
features are encoded by anatomically distinct neural populations in high-level
vision. However, color and form information may be encoded in distributed,
fine-grained activation patterns that univariate mean-activation methods cannot
detect (e.g., Haxby et al., 2001).
Indeed, macaque IT and color regions contain both color and form information
(Komatsu and Ideura, 1993; McMahon and Olson, 2009; Chang et al., 2017; Rosenthal et al., 2018; Duyck et
al., 2021) and the human shape-selective region in lateral occipital
cortex has been found to contain color information using fMRI multivoxel pattern
analysis (Bannert and Bartels, 2013, 2018).
How should we reconcile the existence of regions showing univariate
selectivity for color or form with the evidence suggesting that tuning for these
features might be broadly distributed throughout the ventral visual pathway? A
primary goal of the present study is to systematically document the multivariate
coding of color and form throughout the human ventral visual processing pathway,
compare how the relative coding strength of these two types of features may vary
across brain regions, and determine whether regions exhibiting univariate
feature selectivity as reported in previous work would show a similar bias in
multivariate feature decoding.
Independent or interactive coding of color and form?
The mesoscale segregation of neurons specialized for processing color and
form features in early visual areas, the existence of distinct higher-level
visual regions showing univariate selectivity to color or form, and the
behavioral deficits associated with damage to these regions are consistent with
independent coding of color and form in the primate brain. Available
psychophysical evidence also supports this view, such as from visual search and
illusory conjunction effects (e.g., Treisman and
Gelade, 1980; Treisman and Schmidt,
1982), leading Treisman and Gelade
(1980) to posit feature integration theory:
different visual features are initially encoded on their own distinct feature
maps, and focused attention then spatially links the different features
associated with the same object to encode conjunctions of features.
Meanwhile, various behavioral studies have shown that color and form may
be automatically encoded in a conjoined and interactive format without requiring
a separate, laborious attention-driven binding step (e.g., Stromeyer, 1969; Victor et al., 1989; Cavanagh,
1991; Heywood et al., 1991;
Barbur et al., 1994, 1998; Holcombe and Cavanagh, 2001; Mandelli and Kiper, 2005). At the level of
neural coding, non-additive (interactive) feature coding has been found in human
early visual areas (Engel, 2005; Seymour et al., 2010; see more details of
the latter study below) and macaque V4 and color regions (Bushnell and Pasupathy, 2012; Chang et al., 2017). It is largely absent in IT, and
has not been explicitly tested in V1 and V2 in macaques (Friedman et al., 2003; McMahon and Olson, 2009). Notably, interactive
feature coding in the human brain has thus far only been tested for simple form
features, such as orientation, leaving it unknown whether this processing format
is also used for the conjunction of color with more complex form features.
A second goal of the present study is thus to examine the prevalence of
non-additive color and form coding in the ventral visual pathway by testing
whether it is present for both the conjunction of color and simple form features
and that of color and more complex form features, and determining whether it can
be found in lower as well as higher ventral regions in the human brain. Due to
the “combinatorial explosion” involved in directly encoding every
possible combination of color and form features, it is possible that interactive
coding may only be used for some form features but not others, making it
important to determine how broadly it applies.
Present study
To answer the outstanding questions raised above, we replicated and
extended a previous human fMRI MVPA study by Seymour et al. (2010) that examined color and orientation coding in
early visual areas. In this study, spiral stimuli were shown that were either
clockwise or counterclockwise, and either red or green (see Fig. 1 for an illustration). In the single
conjunction condition, spiral stimuli for each orientation and
color combination were shown in different blocks of trials, with the phase of
each spiral alternating over the course of the block to ensure that any form
decoding was not a confound from differing retinotopic footprints of the
stimuli. fMRI decoding revealed the presence of both color and orientation
information in V1, V2, V3, and V4. In the double conjunction
condition, pairs of stimuli with both features
differing (e.g., either Red-Clockwise and Green-Counterclockwise, or
Red-Counterclockwise and Green-Clockwise) were shown alternating throughout a
block of trials, such that the two kinds of block had the same individual
features, but differed in how they were conjoined. fMRI decoding revealed
interactive coding of color and orientation throughout V1 to V4. These regions
thus appeared to encode not just color and orientation, but also how they were
combined.
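The logic of the double-conjunction comparison can be illustrated with a toy model. The sketch below is not the analysis used by Seymour et al. (2010) or in the present study; it assumes, purely for illustration, that each voxel's response is a weighted sum of a color term, an orientation term, and an optional color-by-orientation interaction term, and that the response to a block is the mean response to the stimuli alternating within it. All weights and function names are hypothetical.

```python
# Toy model: each voxel's response to a stimulus is a sum of a color term,
# an orientation term, and an optional color-by-orientation interaction.
# All weights below are made-up illustrative numbers, not data.

RED, GREEN = 1, -1   # color coded as +1 / -1
CW, CCW = 1, -1      # orientation coded as +1 / -1

def voxel_response(color, orient, w_color, w_orient, w_inter):
    """Response of one voxel to one stimulus under the toy model."""
    return w_color * color + w_orient * orient + w_inter * color * orient

def block_pattern(stimuli, weights):
    """Mean response of each voxel over the stimuli alternating in a block."""
    pattern = []
    for w_color, w_orient, w_inter in weights:
        responses = [voxel_response(c, o, w_color, w_orient, w_inter)
                     for c, o in stimuli]
        pattern.append(sum(responses) / len(responses))
    return pattern

# The two kinds of double-conjunction block.
block_a = [(RED, CW), (GREEN, CCW)]
block_b = [(RED, CCW), (GREEN, CW)]

def block_difference(weights):
    """Voxelwise difference between the two double-conjunction block patterns."""
    a = block_pattern(block_a, weights)
    b = block_pattern(block_b, weights)
    return [x - y for x, y in zip(a, b)]

# Three hypothetical voxels: (w_color, w_orient, w_inter).
additive = [(0.8, 0.3, 0.0), (-0.2, 0.5, 0.0), (0.4, -0.6, 0.0)]
interactive = [(0.8, 0.3, 0.25), (-0.2, 0.5, -0.4), (0.4, -0.6, 0.1)]

print(block_difference(additive))     # color and orientation terms cancel
print(block_difference(interactive))  # only the interaction term survives
```

Under this toy model the color and orientation terms cancel within each block, so the two block types differ only through the interaction weights (by twice the interaction weight in each voxel). With purely additive coding there is nothing for a classifier to pick up, which is why above-chance decoding of the two block types is taken as evidence of conjunction coding.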
Fig. 1.
Stimuli and experimental design. A. In Experiment 1 (left),
logarithmic spiral stimuli (adapted from Seymour
et al., 2010) were shown that could be oriented clockwise or
counterclockwise, and colored red or green. These spirals have the property that
their arms are a fixed angle from the radius at all points, ensuring that gross
radial biases in cortical retinotopic maps could not drive decoder performance.
In Experiment 2 (right), spiky and curvy tessellation stimuli were used, with
the same colors as Experiment 1. The stimuli alternated phase once per second,
such that black shapes within the circular aperture became colored, and vice
versa. B. The two kinds of blocks present in both experiments.
Stimuli were either presented in single-conjunction blocks, where a single
stimulus type (e.g., Red-CW spiral) was presented for the entire block with its
phase alternating once per second, or in double-conjunction blocks, where
stimuli varying with respect to both features alternated once per second within
a block. Thus, in Experiment 1, the two kinds of double-conjunction block were
Red-CW/Green-CCW and Red-CCW/Green-CW; in Experiment 2, the two kinds of
double-conjunction block were Red-Spiky/Green-Pinwheel and
Red-Pinwheel/Green-Spiky.
While Seymour et al. (2010) was
elegantly designed and theoretically informative, it was limited to the coding
of color and orientation in early visual areas. To address the two main
questions we raised earlier, here we extended Seymour et al. (2010) in two important ways: first, we examine not
only early visual regions but also higher-level ventral stream regions defined
based on their univariate selectivity to either color or form, and second, we
examine color and form coding not just for a simple form feature (orientation),
but also a mid-level form feature, curvature. Our study additionally allowed us
to replicate the interactive coding for color/orientation conjunctions in early
visual areas as reported by Seymour et al.
(2010) and test whether such a coding scheme is specific to simple
form features in early visual areas, or is a broader motif of color-form
processing in human ventral cortex. As an additional extension of their study,
we devised a new analysis technique, pattern difference MVPA,
that we used as a further method to probe for interactive color-form tuning;
this method can also be used to look for subtle interaction effects in fMRI
paradigms beyond the present study.
We found that color and form could be decoded from activity in early
visual areas V1 to V4, as well as in the posterior color-selective region and
shape-selective regions in ventral and lateral occipitotemporal cortex defined
based on their univariate selectivity to color or shape, respectively (the
central color region only showed color but not form decoding). Meanwhile,
decoding bias towards one feature or the other existed in the color- and
shape-selective regions, largely consistent with their univariate feature
selectivity reported in past studies. While all regions encoding both color and
shape contained independent (linearly additive) coding of the two features,
several analyses found evidence that early visual cortex additionally contains a
tuning component that encodes the conjunction of color and the simple, but not
the complex, form feature in a nonlinear, interactive manner. Taken together,
the results show that color and form are encoded in a biased distributed and
largely independent manner across ventral visual regions in the human brain.
Materials and methods
Participants
Experiment 1 included 12 healthy,
right-handed adults (7 females, between 25 and 34 years old, average age 30.6
years old) with normal color vision and normal or corrected to normal visual
acuity. Experiment 2 included 13 healthy
adults (7 females, between 25 and 34 years old, average age 28.7 years old).
Four participants partook in both experiments. Participants were members of the
Harvard community with prior scanning experience. All participants gave informed
consent prior to the experiments and received payment. The experiments were
approved by the Committee on the Use of Human Subjects at Harvard
University.
Stimuli
Experiment 1: Colored spirals
Stimulus design and experimental design for Experiment 1 were largely adapted from Seymour et al. (2010), with identical
stimuli and tasks but some differences in the number and timing of the
blocks. Participants viewed colored spiral stimuli that varied by
color—red or green—and orientation—clockwise (CW) or
counterclockwise (CCW)—resulting in four different kinds of spirals
(Fig. 1A). Spirals were presented
on a black background.
The spirals used were logarithmic spirals, defined
by the formula $r = ae^{b\theta}$,
which have the property that the angle between the radius of the spiral and
an arm of the spiral at any point is fixed, in this case at 45°. This
property ensures that there is a constant relationship between the location
of an edge of a spiral arm in visual space and the radial component of its
angle, as would not be the case if oriented gratings were used (for example,
a horizontal oriented grating would have a maximal radial component along
the horizontal midline, and minimal radial component along the vertical
midline). This constraint accounts for the known radial
bias in early visual cortex, in which radial orientations are
preferentially represented in early visual topographic maps (e.g., zones of
cortex corresponding to the top of the visual field have an
over-representation of vertically oriented angles), ensuring that successful
decoding of orientation could not simply be due to activation of different
sub-regions of topographic maps (Sasaki et
al., 2006; Mannion et al.,
2009; Seymour et al.,
2010). Stimuli were generated by first drawing 40 spiral lines at
evenly spaced angles from the origin according to the above formula and
filling in alternating regions of the spiral with the stimulus color and the
background color, black, resulting in 20 spiral arms. The spiral subtended a
circular region covering 9.7° of visual angle, with an internal
aperture in the middle, within which a white fixation dot was displayed. As
mentioned earlier, the spiral arms could be oriented either clockwise or
counterclockwise. Additionally, depending on which of the spiral arms were
colored and which were black, each spiral could be presented in one of two
phases.
The exact spiral colors used in the experiment were generated using
the following procedure. To generate initially isoluminant shades of red and
green, each participant performed a flicker-adjustment procedure inside the
scanner (Kaiser, 1991), in which a
flickering checkerboard with the two colors being adjusted flashed at 30 Hz,
and participants adjusted the colors until the flickering sensation was
minimal. Specifically, the two colors had RGB values of the form red-hue =
[178, 178 - X, 89] and green-hue = [0, X, 89], where participants adjusted
the “X” parameter until isoluminance was achieved. This
procedure guarantees that the two colors are isoluminant and sum to neutral
gray, thereby equally stimulating all chromatic channels. Participants
performed ten trials of this procedure, and the average “X”
value was used to produce the initial colors. However, since this procedure
might theoretically have some associated imprecision, each color was
presented at either +10% or −10% of its initially calibrated luminance
value on any given run of the experiment, where the number of high-luminance
and low-luminance runs was balanced across the red and green colors. This
manipulation ensures that any residual between-hue luminance differences
will be far smaller than the within-hue luminance differences, reducing the
likelihood that luminance, rather than hue, could drive MVPA classification
during analysis. The luminance adjustment procedures were identical to those
of Seymour et al. (2010), with the
minor difference that their study varied the luminance settings of a given
color within a run, whereas we varied it between runs.
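The spiral geometry described above can be sketched in a few lines of Python. This is a minimal illustration rather than the stimulus code used in the experiment (which was written in Matlab with Psychtoolbox): the function names, the scale parameter `a`, and the choice of which sign of `direction` counts as clockwise are assumptions. The sketch uses the fact that, for a logarithmic spiral of the form r = a·exp(bθ), the angle between the radius and the arm has tangent 1/|b|, so a 45° pitch corresponds to |b| = 1.

```python
import math

N_LINES = 40       # spiral boundary lines, giving 20 colored arms (from the text)
PITCH_DEG = 45.0   # angle between radius and spiral arm (from the text)

def growth_rate(direction):
    """Parameter b in r = a * exp(b * theta); its sign sets the spiral's handedness."""
    return direction / math.tan(math.radians(PITCH_DEG))

def spiral_region(x, y, direction=1, a=1.0):
    """Index (0..39) of the region between adjacent spiral lines containing (x, y)."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    u = phi - math.log(r / a) / growth_rate(direction)  # constant along a spiral line
    return math.floor(u / (2 * math.pi / N_LINES)) % N_LINES

def is_colored(x, y, direction=1, phase=0):
    """True if the point is colored (rather than black) in the given phase."""
    return spiral_region(x, y, direction) % 2 == phase

def count_arms(radius, direction=1, phase=0, samples=3600):
    """Count colored arms crossed while walking once around a circle."""
    states = []
    for i in range(samples):
        ang = 2 * math.pi * i / samples
        states.append(is_colored(radius * math.cos(ang), radius * math.sin(ang),
                                 direction, phase))
    # Count black-to-colored transitions; index -1 wraps around the circle.
    return sum(1 for i in range(samples) if states[i] and not states[i - 1])

def pitch_angle_deg(theta, direction=1, a=1.0, h=1e-6):
    """Numerically estimate the angle between the radius and the arm at theta."""
    b = growth_rate(direction)
    def point(t):
        r = a * math.exp(b * t)
        return r * math.cos(t), r * math.sin(t)
    (x0, y0), (x1, y1) = point(theta - h), point(theta + h)
    px, py = point(theta)
    tx, ty = x1 - x0, y1 - y0
    cos_angle = abs(px * tx + py * ty) / (math.hypot(px, py) * math.hypot(tx, ty))
    return math.degrees(math.acos(cos_angle))
```

Calling `count_arms` at any eccentricity should return the same number of colored arms, and `pitch_angle_deg` should return approximately 45 at every point for both values of `direction`, which is the property that keeps the radial component of the arm orientation constant across the visual field. Switching `phase` swaps the colored and black regions.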
Experiment 2: Colored tessellation patterns
For this experiment, we constructed two different tessellation
stimuli, consisting either of a curvy or a spiky pattern within a circular
aperture (Fig. 1A). These stimuli were
deliberately designed so as not to resemble any real-world entities, and we
decided upon a curvy versus spiky contrast because curvature is a salient
mid-level visual feature, in contrast with orientation, which can be
considered a lower-level visual feature (Gallant et al., 1993; Srihasam
et al., 2014; Yue et al.,
2014). The “phase” of the tessellation stimuli
could also vary, based on whether a given region of the stimulus was
currently colored or black. Exactly the same procedure as Experiment 1 was used to calibrate the colors of
the two stimuli, and the stimuli subtended the same visual angle
(9.7°) as in Experiment 1.
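The color parameterization used in both experiments can be written out as a short worked example. The sketch below only restates the RGB forms given above (red-hue = [178, 178 - X, 89], green-hue = [0, X, 89]) and averages a set of flicker-adjustment settings; the function names and the example settings are hypothetical. The `scale_luminance` helper is a naive assumption: it scales RGB values proportionally and ignores display gamma, whereas the text does not specify how the ±10% luminance change was implemented.

```python
def calibrated_colors(x_settings):
    """Return (red_rgb, green_rgb) from a participant's flicker-adjustment settings."""
    x = sum(x_settings) / len(x_settings)   # average "X" across trials
    red = (178.0, 178.0 - x, 89.0)
    green = (0.0, x, 89.0)
    return red, green

def scale_luminance(rgb, factor):
    """Naive proportional scaling of an RGB triplet (assumption: no gamma correction)."""
    return tuple(min(255.0, v * factor) for v in rgb)

# Ten hypothetical settings standing in for one participant's trials.
settings = [90, 100, 95, 105, 98, 102, 97, 99, 101, 93]
red, green = calibrated_colors(settings)

# The first two channels of the two hues sum to the same value regardless of X.
print([r + g for r, g in zip(red, green)])

# High- and low-luminance versions used on different runs.
red_high, red_low = scale_luminance(red, 1.10), scale_luminance(red, 0.90)
```

Because X is subtracted from one hue's green channel and assigned to the other's, the channel sums are independent of the participant's setting; only the split of the green channel between the two hues changes.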
Procedure
Participants viewed 12 s blocks of the stimuli and had to detect a
30% luminance increment or decrement using a button press (index finger for
increase, middle finger for decrease). On any given block, two 500 ms
luminance changes were presented, one in the first half and one in the
second half of the block, and never in the first or last two stimuli of the
block. The number and timing of the increments and decrements within the
blocks were balanced across the whole experiment, and across all stimulus
conditions described below. There were 9 s fixation blocks between the
stimulus blocks and at the end of the run, with a 12 s fixation block at the
beginning of the run. This allowed us to better separate fMRI responses from
adjacent blocks (note that Seymour et al.,
2010 included no fixation blocks between stimulus blocks in their
design).
The experiment included two kinds of runs (Fig. 1B). In the single-conjunction
runs, only a single kind of spiral (RedCW, RedCCW, GreenCW, or
GreenCCW) was presented for a given block, with its phase alternating once
per second (for a total of 12 phase alternations per block), with no blank
period between phase alternations within a block (i.e., the alternations
between successive phases were instantaneous). This phase alternation
ensures that all conditions were equated in their retinotopic footprint over
the course of each block, removing this as a possible confound in form
decoding. Since two starting phases were possible, each of the four spiral
types could begin on either starting phase, resulting in 8 different block
types for these runs. Each run contained one instance of each of the 8 types
of block, totaling 180 s per run. Participants completed 12 such runs, thus
viewing a total of 24 blocks of each of the four spiral types over the whole
session. To ensure that block types were roughly matched in terms of their
placement within the runs and how frequently they appeared next to other
block types, a random balanced Latin square procedure was used to generate
the block order for each subject; specifically, two random 8 × 8
balanced Latin squares were generated, the second square was truncated to 4
× 8, and the two squares were concatenated, giving the block order
for 12 runs of 8 blocks each.
In the double-conjunction runs, there were two
block conditions: a block could either alternate between RedCW and GreenCCW,
or between RedCCW and GreenCW, with the phase of each spiral type
alternating at each presentation; for instance, an example block would
progress through the sequence RedCW-Phase1, GreenCCW-Phase2, RedCW-Phase2,
GreenCCW-Phase1, and then repeat. Since each block condition could begin on
either one of the two spirals in either one of the two phases, there were
therefore four different block types for each block condition. Due to how
the spirals were constructed and how the stimuli alternated phase within
each block type, varying the starting stimulus in this manner ensured that
the two block conditions were matched in how frequently each pixel took on
values of red (25% of the time), green (25% of the time), and black (50% of
the time) both over the course of any given block and at any given time
point across the four block types within each block condition. This ensured
that pixel-level information could not drive decoding during the MVPA
analysis. The stimulus timing, number of blocks, counterbalancing method,
and task for these runs were otherwise identical to those of the
single-conjunction runs. Participants completed 12 double-conjunction runs,
and thus viewed each kind of double conjunction block 48 times. The
single-conjunction runs and double-conjunction runs alternated in sets of
three (e.g., three double-conjunction runs, then three single-conjunction
runs), with the type of the initial run set counterbalanced across
participants. Note that while Seymour et al.
(2010) interleaved single-conjunction blocks and
double-conjunction blocks within the same run, we separated them into
different runs. This allowed us to form two completely independent datasets
to more rigorously validate results showing interactive coding of color and
form.
Exactly the same task and experimental design were used in Experiment 2 as in Experiment 1, with only the stimuli varying. Due
to how the tessellation stimuli were constructed and the manner in which
they alternated phase within the double conjunction blocks, they shared with
the spirals the property that each pixel was matched in its frequency of
taking on values of red, green, and black both over the course of the block,
and at corresponding timepoints for the two block conditions, across the
four block types within each block condition.
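The block-ordering scheme described above for the 12 runs of 8 blocks can be sketched as follows. The text specifies only that random balanced Latin squares were used; the sketch assumes a Williams-design construction (a standard way to build a balanced Latin square for an even number of conditions) randomized by relabeling conditions and shuffling rows. The function names and the seed are hypothetical.

```python
import random

def williams_square(n):
    """Balanced Latin square for even n: rows are runs, columns are block positions."""
    base = [0]
    lo, hi = 1, n - 1
    while len(base) < n:          # builds the sequence 0, 1, n-1, 2, n-2, ...
        base.append(lo)
        lo += 1
        if len(base) < n:
            base.append(hi)
            hi -= 1
    return [[(v + i) % n for v in base] for i in range(n)]

def random_balanced_square(n, rng):
    """Randomize a Williams square by relabeling conditions and shuffling rows."""
    labels = list(range(n))
    rng.shuffle(labels)
    rows = [[labels[v] for v in row] for row in williams_square(n)]
    rng.shuffle(rows)
    return rows

def block_order(n_runs=12, n_types=8, seed=0):
    """One full square plus a truncated second square, giving n_runs rows."""
    rng = random.Random(seed)
    first = random_balanced_square(n_types, rng)
    second = random_balanced_square(n_types, rng)[: n_runs - n_types]
    return first + second

order = block_order(seed=1)   # order[run][position] is a block-type index 0..7
```

Within the full 8 × 8 square, every block type occupies every serial position once and follows every other block type exactly once; relabeling and row shuffling preserve both properties. The truncated 4 × 8 square keeps each run a complete permutation but is only approximately balanced, which matches the "roughly matched" wording in the text.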
Localizer experiments
As regions of interest in both experiments, we included
retinotopically-defined regions V1, V2, V3, and V4 in early visual cortex,
and functionally-defined shape and color regions in occipitotemporal visual
cortex.
To localize topographic visual field maps, we followed standard
retinotopic mapping techniques (Sereno et
al., 1995). A 72° polar angle wedge swept either clockwise
or counterclockwise (alternating each run) across the entire screen, with a
sweeping period of 36.4 s and 10 cycles per run. The entire display
subtended 23.4 × 17.6° of visual angle. The wedge contained a
colored checkerboard pattern that flashed at 4 Hz. Participants were asked
to detect a dimming in the polar angle wedge. Each participant completed
4–6 runs, each lasting 364 s.
We localized two shape regions in lateral occipitotemporal (LOT) and
ventral occipitotemporal (VOT) cortex, following the procedure described by
Kourtzi and Kanwisher (2001), and
subsequently used in several of our own lab’s studies (Vaziri-Pashkam and Xu, 2017; Vaziri-Pashkam et al., 2019). LOT and
VOT approximately correspond to the locations of LO and pFs (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi and Kanwisher, 2001) but extend further
into the temporal cortex in order to include as many form-selective voxels
as possible in occipitotemporal regions. Specifically, in a separate
scanning session from the main experiment (usually the same one as the
retinotopic mapping session), participants viewed black-and-white pictures
of faces, places, common objects, arrays of four objects, phase-scrambled
noise, and white noise in a block design paradigm, and responded with a
button press whenever the stimulus underwent a slight spatial jitter, which
occurred randomly twice per block. Each block contained 20 images from the
same category, and each image was presented for 750 ms each, followed by a
50 ms blank display, totaling 16 s per block, with four blocks per stimulus
category. Each run also contained a 12 s fixation block at the beginning,
and an 8 s fixation block in the middle and end. Images subtended
9.5° of visual angle. Participants performed either two or three
runs, each lasting 364 s.
We also localized a series of color-selective regions in ventral
temporal cortex, using a procedure similar to Lafer-Sousa et al. (2016). Two runs of a color
localizer were presented during the main scan session, one at the middle and
one at the end of the session. In these runs, participants viewed 16 s
blocks consisting of either colorful, highly saturated natural scene images
selected from the online Places scene database (Zhou et al., 2018) or greyscale versions of these
images. Participants responded when an image jittered back and forth, which
occurred twice per block. Images subtended 9.5° of visual angle, and
were each presented for 750 ms (50 ms blank period between stimulus
presentations within a block). Each run contained 16 blocks, 8 for each of
the two stimulus types, for a total run duration of 292 s including an
initial 20 s fixation block, and an 8 s fixation block in the middle and the
end of the run.
MRI methods
MRI data were collected using a Siemens PRISMA 3T scanner, with a
32-channel receiver array headcoil. Participants lay on their backs inside the
scanner and viewed the back-projected display through an angled mirror mounted
inside the headcoil. The display was projected using an LCD projector at a
refresh rate of 60 Hz and a spatial resolution of 1280×1024. An Apple
Macbook Pro laptop was used to create the stimuli and collect the motor
responses. Stimuli were created using Matlab and Psychtoolbox (Brainard, 1997).
A high-resolution T1-weighted structural image (1.0 × 1.0
× 1.3 mm) was obtained from each participant for surface reconstruction.
All Blood-oxygen-level-dependent (BOLD) data were collected via a
T2*-weighted echo-planar imaging (EPI) pulse
sequence that employed multiband RF pulses and Simultaneous Multi-Slice (SMS)
acquisition. For the two main experiments, including the color localizer runs,
69 axial slices tilted 25° towards coronal from the AC-PC line (2 mm
isotropic) were collected covering the whole brain (TR = 1.5 s, TE = 30 ms, flip
angle = 75°, FOV = 208 mm, matrix = 104×104, SMS factor = 5). For
the retinotopic mapping and LOC localizer sessions, 64 axial slices tilted
25° towards coronal from the AC-PC line (2.3 mm isotropic) were collected
covering the whole brain (TR = 0.65 s, TE = 34.8 ms, flip angle = 52°,
matrix = 90×90, SMS factor = 8). Different slice prescriptions were used
here for the different localizers to be consistent with the parameters used in
our previous studies, and to optimize data collection for each paradigm (e.g.,
retinotopic mapping using a rotating checkerboard wedge benefits from a low TR
to more finely capture phase-varying voxel responses). The slices were used to
construct 3D brain volumes, which were then projected onto each
participant’s cortical surface, thus placing the data from different
localizers in a common anatomical space such that the exact slice prescriptions
used had minimal impact on the final results.
Data analysis
fMRI data were analyzed using FreeSurfer (surfer.nmr.mgh.harvard.edu), FsFast (Dale et al., 1999) and in-house
Python scripts. The exact same analysis pipeline was used for the two
experiments, except that any analyses comparing clockwise versus
counterclockwise spirals in Experiment 1
instead compared the spiky and curvy tessellation patterns in Experiment 2, due to the differing stimuli used.
Preprocessing was performed using FsFast. All functional data were
motion-corrected to the first image of the run of the experiment. Slice-timing
correction was applied, but smoothing was not. A generalized linear model (GLM)
with a boxcar function convolved with the canonical HRF was used to model the
response of each trial, with the three motion parameters and a linear and
quadratic trend used as covariates in the analysis. The first eight TRs of each
run (prior to the presentation of the first stimulus) were included as nuisance
regressors to remove them from further analysis. A beta value reflecting the
brain response was extracted for each trial block in each voxel. ROIs were
defined on the cortical surface (placing the results of the separate localizers
in a common anatomical space) and then projected back to the native 3D
functional space of the main experiment for further analysis.
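As a rough illustration of the block-wise GLM described above, the following sketch builds a boxcar regressor for each block, convolves it with a double-gamma HRF, and estimates one beta per block by ordinary least squares with linear and quadratic trend covariates. It is not the FsFast implementation: the HRF shape parameters, the block onsets and durations, and all function names are assumptions made for illustration, and the motion covariates and the initial nuisance TRs are omitted for brevity. Only the TR of 1.5 s is taken from the acquisition parameters.

```python
import math
import numpy as np

TR = 1.5  # seconds; TR of the main experiments


def double_gamma_hrf(tr, duration=30.0):
    """Double-gamma HRF sampled every `tr` seconds (assumed shape parameters)."""
    t = np.arange(0.0, duration, tr)
    # Gamma density with shape a and unit scale: t^(a-1) * exp(-t) / Gamma(a)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def block_regressor(n_vols, onset_s, dur_s, tr=TR):
    """Boxcar for one block convolved with the HRF, cut to the run length."""
    boxcar = np.zeros(n_vols)
    start = int(round(onset_s / tr))
    stop = int(round((onset_s + dur_s) / tr))
    boxcar[start:stop] = 1.0
    return np.convolve(boxcar, double_gamma_hrf(tr))[:n_vols]


def fit_betas(y, onsets_s, dur_s, tr=TR):
    """OLS beta per block, with intercept, linear and quadratic trend covariates."""
    n = len(y)
    task = [block_regressor(n, onset, dur_s, tr) for onset in onsets_s]
    time = np.linspace(-1.0, 1.0, n)
    nuisance = [np.ones(n), time, time ** 2]
    design = np.column_stack(task + nuisance)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[: len(onsets_s)]
```

Applied to the time course of a single voxel, `fit_betas` returns one value per block, which is the quantity that the later pattern analyses take as input.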
ROI definitions
Using independent localizers, we defined ROIs in early visual areas
and in higher visual regions showing univariate selectivity to shapes or
colors. Fig. 2 depicts all ROIs for an
example participant. For all ROIs, the results of the respective localizer
paradigms described above were projected onto the cortical surface using
FreeSurfer, and the ROIs were manually defined on that surface (details for different regions described
below); ROIs were then converted to the native functional volume space of
the main experiment to extract the voxels used in ROI analyses.
Fig. 2.
Lateral and ventral views of left and right hemispheres from an example
participant, showing all regions of interest used in the study. Retinotopically
defined areas V1, V2, V3, and V4 shown with black outlines; object-selective
regions LOT and VOT shown with white outlines; posterior, central, and anterior
color-selective regions shown with blue, green, and magenta outlines,
respectively, along with their activation maps from the color versus greyscale
localizer used to define them.
V1 to V4.
Areas V1 through V4 were localized on each participant’s
cortical surface by manually tracing the borders of these visual maps
activated by the vertical meridian of visual stimulation (identified by
locating the phase reversals in the phase-encoded mapping), following
the procedure outlined in Sereno et al.
(1995).
LOT and VOT.
Following the procedure described by Kourtzi & Kanwisher (2001), LOT and VOT
were defined as the clusters of voxels in lateral and ventral
occipitotemporal cortex, respectively, that respond more to photos of
real-world objects than to phase-scrambled versions of the same objects
(p < .001, uncorrected). These regions
correspond to the location of LO and pFs (Malach et al., 1995; Grill-Spector et al., 1998; Kourtzi & Kanwisher, 2001), but
extend further into the temporal cortex in our effort to include as many
object-selective voxels as possible in occipito-temporal regions.
Ventral Stream Color Regions.
Following Lafer-Sousa et al.
(2016), several color regions were identified in ventral
temporal cortex as clusters of voxels responding more to colored images
than to greyscale versions of the same images (p
< .001, uncorrected). Since participants had varying numbers of
such regions, we divided the regions in each hemisphere into anterior,
central, and posterior color regions, following Lafer-Sousa et al. (2016). We were able to
identify posterior and central color regions in every hemisphere of
every participant in both experiments. In Experiment 1, we were able to localize the anterior color
region in both hemispheres of 7/12 participants, one hemisphere of 3/12
participants, and neither hemisphere of 2/12 participants. In Experiment 2, we were able to
localize the anterior color region in both hemispheres of 8/13
participants, one hemisphere of 3/13 participants, and neither
hemisphere of 2/13 participants. The inconsistency in localizing this
color region was possibly due to its location being close to the ear
canals where large MRI susceptibility effects and signal dropoff could
occur. We note that our rate of localizing this color region was similar
to that of Lafer-Sousa et al.
(2016), who reported that this region was found in both
hemispheres of 6/13 participants, one hemisphere of 4/13 participants,
and neither hemisphere of 3/13 participants. These anterior regions were
generally relatively small (mean 49 voxels, std 46 voxels, min 4 voxels,
max 163 voxels), precluding us from conducting meaningful decoding
analyses in these regions. We thus omit them from further analysis. For
reference, Supplemental Figure 1 shows these color-selective regions
for all participants across both experiments (along with
retinotopically-defined V4 for comparison), since fewer studies have
examined these regions compared to the early visual and shape-selective
regions.
V4 and VOT with Color Regions Removed.
We observed that the color regions overlapped with areas V4 and
VOT in some cases. To document the extent to which color and form
decoding in V4 and VOT might be affected by the color regions within
them, we also ran several of the analyses on versions of V4 and VOT with
the color-selective regions removed.
ROI overlap analysis
As noted just previously, we observed that areas V4 (defined
retinotopically), the posterior color region (defined using a color versus
greyscale localizer), and area VOT (defined using an object versus scrambled
localizer) overlapped to some degree. To quantify this overlap, we computed
the pairwise percent overlap between each of these ROIs, where percent
overlap was defined as the percentage of the number of overlap voxels over
the averaged number of voxels for the two ROIs, as we did in a previous
study (Cant and Xu, 2012; see also
Kung et al., 2007).
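The percent overlap measure defined above can be written out in a few lines. In this minimal sketch each ROI is assumed to be given as a collection of voxel identifiers (for example, linear voxel indices); the function name and the example ROIs are hypothetical.

```python
def percent_overlap(roi_a, roi_b):
    """Shared voxels as a percentage of the mean size of the two ROIs."""
    a, b = set(roi_a), set(roi_b)
    if not a or not b:
        raise ValueError("both ROIs must contain at least one voxel")
    shared = len(a & b)
    mean_size = (len(a) + len(b)) / 2.0
    return 100.0 * shared / mean_size


# Hypothetical ROIs of 100 and 140 voxels that share 40 voxels:
# 40 / ((100 + 140) / 2) = 33.3%
overlap = percent_overlap(range(0, 100), range(60, 200))
```

Normalizing by the mean ROI size, rather than by either ROI alone, makes the measure symmetric in the two regions.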
Multivoxel pattern analysis
In order to equate the number of voxels used in each ROI, the top
300 most active voxels in a stimulus-versus-rest GLM contrast across all the
runs were selected. In addition to the ROIs described above, we also
constructed an ROI for each participant consisting of the 300 most active
voxels from the entire V1–V4 sector defined by the union of
V1–V4, in order to test more sensitively for potentially subtle
effects in several analyses. For several of the analyses (noted in each
section below that describes the analysis), we analyzed subsets of the 100,
200, 300, 400, and 500 most active voxels per ROI, to determine the extent
to which the presence of an effect depended on the number of voxels
selected. A beta value was extracted from each voxel of each ROI for every
trial block. To remove response amplitude differences across stimulus
conditions, trial blocks and ROIs, beta values were z-normalized across all
the voxels for each trial block in each ROI. For each of the contrasts of
interest (described below), these beta values were used to train and test a
linear support vector machine (SVM) classifier (with regularization
parameter c = 1), using leave-one-run-out cross-validation.
T-tests were performed to compare the decoding accuracy of the various
measures to chance (one-sample, one-tailed t-test;
one-tailed was used because below-chance decoding is not conceptually
meaningful). To account for the fact that four participants partook in both
experiments with the other participants being different between the two
experiments, in cases where decoding was compared between pairs of
conditions between the two experiments, a partially-overlapping
t-test (Derrick et al.,
2017) was performed. Likewise, to examine the influence of
experiment, feature type, and their interaction on decoding in each region
and between regions, a linear mixed effects analysis was performed (since
this analysis, unlike the classical ANOVA, is able to explicitly account for
subject-specific variance when only a subset of participants complete both
experiments). Correction for multiple comparisons was applied using the
Benjamini–Hochberg procedure with false discovery rate controlled at
q < 0.05, with the details of this correction
described for each analysis below (Benjamini
and Hochberg, 1995). Specific details for each analysis were as
follows.
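For reference, the Benjamini–Hochberg step-up procedure used for these corrections can be sketched as follows. The function name and the example p-values are illustrative only; which tests are grouped into one family (for example, the four tests of the same kind within an ROI) is specified for each analysis in the text.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (reject flags, BH-adjusted p-values), both in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    reject = [p <= q for p in adjusted]
    return reject, adjusted


# Illustrative family of four tests
reject, adjusted = benjamini_hochberg([0.001, 0.008, 0.039, 0.041])
```

In this example the third p-value (0.039) exceeds its own rank threshold of 0.0375, but it is still rejected because the fourth (0.041) falls below its threshold of 0.05. This is the step-up property that distinguishes the procedure from a per-rank cutoff.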
Feature Decoding.
To assess the extent to which regions carried information about
single features, in the single-conjunction blocks we trained and tested
the classifier on color (red vs. green) and form (CW vs. CCW spirals in
Experiment 1 and curvy vs.
spiky tessellations in Experiment
2), where both values of the other feature
were fed into each bin of the classifier (e.g., for color decoding,
RedCW and RedCCW versus GreenCW and GreenCCW). Decoding for each
condition was compared to chance (one sample t-test,
one-tailed). Decoding for each feature was also compared between
experiments (partially-overlapping t-test, two-tailed),
and color and form decoding were compared within experiments
(within-subjects t-test, two-tailed). Correction for
multiple comparisons was performed within each ROI across analyses of
the same kind: thus, for comparing decoding for each of the four
conditions (two features by two experiments) to chance, correction for
multiple comparisons was performed across these four comparisons; and
for the four pairwise comparisons in each ROI (comparing color versus
form decoding within each experiment, and comparing decoding for each
feature across experiments), correction was performed across these four
pairwise tests. We additionally performed mixed-effects analyses in each
ROI to directly compare the results of the two experiments. The
mixed-effect analysis is analogous to a two-way ANOVA, but takes into
account the fact that a partially overlapping set of participants took
part in the two experiments. We examined the main effect of feature type
(color vs. form), the form features used in the two experiments
(orientation and curvature), and their interaction. To test for broad
trends in feature coding across the visual hierarchy, we also averaged
the decoding accuracy of ROIs showing qualitatively similar response
profiles, as judged by their proximity and the ordinal pattern of their feature
decoding strengths over the two experiments, and the same analyses were
performed for these sectors as were performed in the individual ROIs,
with correction for multiple comparisons applied in the same manner.
Note that this averaging of the individual ROI decoding accuracies
across sectors is different from the V1–V4 macro-ROI described
earlier that is used in other analyses, where decoding is performed once
in a macro-ROI consisting of the most active voxels in the union of
V1–V4. Further linear mixed-effects analyses were used to verify
that the decoding profiles in these sectors in fact varied from one
another.
Additionally, to document whether there exist any hemispheric
differences in color and form coding, within each experiment we ran a
within-subjects t-test between the left and right
hemisphere for both color and form coding. Since this analysis was
exploratory, no corrections for multiple comparisons were performed.
Finally, to examine the extent to which feature decoding results
for V4 and VOT are driven by their overlap with the color regions, we
constructed ROIs consisting of V4 and VOT minus their overlap with the
color regions. The same feature decoding analyses were run for these
ROIs as for the other ROIs, with the same correction for multiple
comparisons applied. Additionally, two-way mixed-effects analyses with
ROI and experiment as factors were run to examine whether decoding for
either color or form significantly decreased in either region when the
color-selective regions were removed (analyses conducted separately for
each feature).
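The core decoding steps (voxel selection, z-normalization across voxels within each trial block, and a linear SVM with C = 1 evaluated with leave-one-run-out cross-validation) could be sketched as below. The use of scikit-learn is an assumption, since the text specifies only in-house Python scripts, and the names `contrast_stat`, `betas`, `labels`, and `runs` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC


def top_n_voxels(contrast_stat, n=300):
    """Indices of the n voxels with the largest stimulus-versus-rest statistic."""
    return np.argsort(contrast_stat)[::-1][:n]


def zscore_across_voxels(betas):
    """z-normalize each trial-block pattern (row) across its voxels."""
    mean = betas.mean(axis=1, keepdims=True)
    std = betas.std(axis=1, keepdims=True)
    return (betas - mean) / std


def decode(betas, labels, runs):
    """Mean leave-one-run-out accuracy of a linear SVM with C = 1.

    betas:  (n_blocks, n_voxels) array of beta values
    labels: (n_blocks,) class label per block (e.g. red vs. green)
    runs:   (n_blocks,) run index per block, used as the held-out group
    """
    clf = SVC(kernel="linear", C=1.0)
    scores = cross_val_score(clf, zscore_across_voxels(betas), labels,
                             groups=runs, cv=LeaveOneGroupOut())
    return scores.mean()
```

For color decoding, `labels` would code red versus green while both form values contribute blocks to each class (RedCW and RedCCW versus GreenCW and GreenCCW); form decoding uses the same blocks with the form labels instead.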
Feature Cross-Decoding.
To assess whether the two features were represented
independently in each ROI (i.e., whether the representation of one
feature was invariant to changes in the other feature) and whether there
was any evidence of interactive feature coding, in the single
conjunction blocks we performed cross-feature decoding in which we
trained a classifier to discriminate two values of a relevant feature
while the irrelevant feature was held at one value, and tested the
classifier’s performance on the relevant feature when the
irrelevant feature changed to the other value (e.g., train an
orientation classifier on RedCW vs. RedCCW, and test orientation
decoding on GreenCW vs. GreenCCW, or vice versa, with the results from
the two directions averaged together). We did this for both features
serving as the relevant feature. For comparison purposes, we also
performed within-feature decoding, where we held the irrelevant feature
constant between training and testing. This allowed us to compare the
cross- and within-feature decoding using a matched number of trials.
Decoding of each condition was compared to chance (one-sample
t-test, one-tailed). Additionally, within-feature
and cross-feature decoding were compared (within-subjects
t-test, one-tailed; one-tailed was performed
because only a decrease, not an increase, in performance from
cross-decoding is interpretable) within each feature and experiment to
determine whether coding for each feature is tolerant to changes in the
other. Correction for multiple comparisons was performed within the set
of comparisons done for each ROI (i.e., eight comparisons for comparing
each condition to chance; four comparisons for comparing within-feature
decoding to cross-decoding for the two experiments and two
features).
Since both kinds of cross-decoding drop (a drop in color
decoding across form features, or a drop in form decoding across colors)
are conceptually similar in that they both reflect a more
interactive feature representation, a one-tailed t-test
was performed within each experiment to take both effects into account
to test for an overall main effect of lower decoding in the
cross-feature versus within-feature decoding conditions (note that this
is the same as assessing a main effect of decoding difference between
the within-feature and cross-feature decoding conditions across the two
types of features using an ANOVA test, but looking at this main effect
in a particular direction). Since this comparison provides critical
evidence regarding whether or not interactive color and form coding may
exist in a brain region, to perform an exhaustive search, we ran this
particular analysis separately for the top 100, 200, 300, 400, and 500
most active voxels in each ROI, and also a combined V1–V4 ROI
that includes the most active n voxels across the entire early visual
sector. Given that SVM is sensitive to both power and noise (such that
including too few voxels may exclude some of the informative voxels and
thus provide insufficient power, whereas including too many voxels may
add noise), testing the effect at a range of voxel sizes allowed us to
assess the stability of any positive results obtained and how it may be
affected by the number of voxels included in the analysis. Correction
for multiple comparisons is applied within each voxel set, and
separately within the early visual ROIs (since these constitute a
replication of the results of Seymour et
al., 2010) and the ventral stream ROIs; thus, correction is
applied across five values for the early visual ROIs (V1, V2, V3, V4,
and the combined V1–V4 macro-ROI; while the last is not strictly
independent of the first four, we include it in the correction to err on
the conservative side), and four values for the ventral stream ROIs
(LOT, VOT, and the posterior and central color regions).
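A minimal sketch of the cross- versus within-feature comparison follows. It assumes that `betas` have already been z-normalized, that the irrelevant feature takes exactly two values, and that the held-out run supplies the test patterns in both the cross- and within-feature cases; this last point is our reading of how leave-one-run-out cross-validation combines with cross-decoding rather than a detail stated in the text. scikit-learn and all names are assumptions.

```python
import numpy as np
from sklearn.svm import SVC


def cross_decode(betas, relevant, irrelevant, runs, train_value, test_value):
    """Leave-one-run-out decoding of `relevant`, training at one value of the
    irrelevant feature and testing at another (equal values give the
    within-feature baseline)."""
    accuracies = []
    for held_out in np.unique(runs):
        train = (runs != held_out) & (irrelevant == train_value)
        test = (runs == held_out) & (irrelevant == test_value)
        clf = SVC(kernel="linear", C=1.0).fit(betas[train], relevant[train])
        accuracies.append(clf.score(betas[test], relevant[test]))
    return float(np.mean(accuracies))


def cross_vs_within(betas, relevant, irrelevant, runs):
    """Average the two train/test directions for cross- and within-feature decoding."""
    a, b = np.unique(irrelevant)  # assumes exactly two values
    cross = np.mean([cross_decode(betas, relevant, irrelevant, runs, a, b),
                     cross_decode(betas, relevant, irrelevant, runs, b, a)])
    within = np.mean([cross_decode(betas, relevant, irrelevant, runs, a, a),
                      cross_decode(betas, relevant, irrelevant, runs, b, b)])
    return cross, within
```

Because the cross and within conditions train on the same number of blocks, any drop from `within` to `cross` reflects a change in the relevant-feature code across values of the irrelevant feature rather than a difference in training set size.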
Pattern Difference MVPA.
To probe for the presence of interactive color and form
representation in an ROI, we ran a novel analysis to examine whether the
encoding of one feature (form or color) depends on the value of the
other feature—that is, whether voxels in an ROI show aggregate
evidence of a color-by-shape interaction effect in their tuning.
Specifically, we first took the difference between the z-normalized beta
values associated with RedCW and RedCCW, and between GreenCW and
GreenCCW (Fig. 3). We then trained
and tested an SVM (leave-one-run-out cross-validation) on these
difference vectors to examine whether the pattern differences between
the two orientations change based on the color of the stimulus. We also
performed the opposite analysis, comparing the beta value differences
for the two different orientations (RedCW − GreenCW versus RedCCW
− GreenCCW). The mean classification accuracies of these two
directions of the analysis were then averaged, since an interaction
effect implies a “difference of differences” in both
directions (i.e., a difference in form pattern differences across
colors, or a difference in color pattern differences across form
features). Simulations with a known ground truth verified that the two
directions of the analysis yield similar results, and so the results
from the two directions were averaged rather than arbitrarily choosing
one direction or the other. If the encoding of one feature is invariant
to values of the other feature (i.e., the voxels exhibit only main
effects with no interactions), SVM should discriminate these difference
vectors at chance (50%); by contrast, if the encoding of one feature
changes based on the other feature (i.e., an interaction effect), the
classification should be above chance. Thus, the SVM classification step
serves to aggregate small and potentially heterogeneous
interaction effects across voxels (e.g., one voxel
might show a superadditive interaction effect for RedCW, while another
voxel might show a superadditive interaction effect for GreenCCW),
analogous to how standard SVM decoding analyses aggregate small and
potentially heterogeneous pairwise effects (e.g., some
voxels might slightly prefer one condition and other voxels might prefer
another) across voxels. The same analysis was performed for the
tessellation stimuli in Experiment
2, replacing CW and CCW with the spiky and curvy stimulus
conditions. One sample, one-tailed t-tests were performed for each ROI
to determine if decoding of the pattern differences was above chance
(one-tailed t-tests were used because below-chance decoding is not
conceptually meaningful).
Fig. 3.
Logic of the pattern difference MVPA analysis for A, the
color and orientation spiral stimuli in Experiment 1, and B, the
color and curvature tessellation stimuli in Experiment 2. In this analysis, we
examined which ROIs might code features in a manner that depends on the value of
the other feature. From each ROI, we extracted and z-normalized the patterns
associated with pairs of conditions matched on one feature but varying on the
other, and took the difference between these patterns (e.g., GreenCCW - RedCCW).
We did the same for the other value of the constant feature (e.g., GreenCW -
RedCW). We then used SVM to determine whether these difference patterns were
distinguishable from each other. This was done both possible ways —
discriminating pattern differences in form across colors, and distinguishing
pattern differences in color across the two values of each form feature —
and the decoding accuracies were averaged.
As in the cross-decoding drop analysis, we also ran this
analysis separately on the top 100, 200, 300, 400, and 500 most active
voxels in each ROI, so as to test exhaustively for the presence of
interactive color-form coding in each ROI, and determine the extent to
which the results depend upon the number of voxels selected; we also
included the V1–V4 macro-ROI as in the previous analysis, and
corrected for multiple comparisons in the same manner (separately within
each voxel set and within the early visual and ventral stream ROIs).
We note that the information captured by this analysis is
distinct from the information conveyed by feature cross-decoding.
Feature cross-decoding would succeed so long as the pairs of patterns
being cross-decoded end up on the correct side of the SVM decision
boundary, even if the differences between the respective patterns were
distinct (i.e., if main effects in feature coding far exceeded any
interaction effects), or even if most units in the population exhibited
interactive tuning, with just a small subset of units exhibiting strong
invariant tuning for either feature (such that they provide an axis
along which cross-decoding could succeed). By contrast, this method
provides a more direct test regarding the existence of interactive
coding in the representational space.
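The logic of the pattern difference analysis can be illustrated with a short sketch. It assumes one z-normalized pattern per condition per run, arranged as an array indexed by run, color, and form; this data layout, the use of scikit-learn, and all names are assumptions for illustration rather than the authors' code.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC


def _classify(diff_a, diff_b):
    """Leave-one-run-out linear SVM accuracy for two sets of difference vectors."""
    n = len(diff_a)
    X = np.vstack([diff_a, diff_b])
    y = np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]
    runs = np.r_[np.arange(n), np.arange(n)]
    clf = SVC(kernel="linear", C=1.0)
    return cross_val_score(clf, X, y, groups=runs, cv=LeaveOneGroupOut()).mean()


def pattern_difference_accuracy(patterns):
    """patterns: array of shape (runs, 2 colors, 2 forms, voxels).

    Returns the mean accuracy over the two directions of the analysis.
    """
    p = np.asarray(patterns)
    form_diff_c0 = p[:, 0, 0] - p[:, 0, 1]   # e.g. RedCW - RedCCW
    form_diff_c1 = p[:, 1, 0] - p[:, 1, 1]   # e.g. GreenCW - GreenCCW
    color_diff_f0 = p[:, 0, 0] - p[:, 1, 0]  # e.g. RedCW - GreenCW
    color_diff_f1 = p[:, 0, 1] - p[:, 1, 1]  # e.g. RedCCW - GreenCCW
    return 0.5 * (_classify(form_diff_c0, form_diff_c1)
                  + _classify(color_diff_f0, color_diff_f1))
```

When the simulated voxel responses contain only additive color and form effects, the two sets of difference vectors are drawn from the same distribution and accuracy should stay near chance. Adding a color-by-form interaction term shifts the two sets apart and makes them separable.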
Double Conjunction Decoding.
As another way of examining which regions may contain
interactive coding of color and form, we trained and tested the
classifier on the two kinds of double conjunction blocks in each
experiment (e.g., RedCW/GreenCCW and RedCCW/GreenCW). These blocks
contained color and form features alternating once per second. Due to
the sluggishness of the hemodynamic response, the pattern of BOLD
activity present in each region would roughly constitute a superposition
of the patterns associated with the two kinds of stimuli in each block.
Since these two kinds of blocks both contained the two color and two
form features used (e.g., red, green, clockwise, and counterclockwise),
but differ in how they were conjoined, only regions encoding color and
form in an interactive manner should be able to decode the two kinds of
blocks from each other. The results of this analysis were compared
against chance (50% decoding) using a one-sample, one-tailed
t-test (one-tailed t-tests were used because
below-chance decoding is not conceptually meaningful).
As in the cross-decoding drop and pattern difference analyses,
we performed this analysis separately on the top 100, 200, 300, 400, and
500 most active voxels in each ROI and a V1–V4 macro-ROI
consisting of the most active voxels across the entire sector.
Correction for multiple comparisons is applied in the same way as the
previous two analyses: within each voxel set, and separately within the
early visual ROIs and the ventral stream ROIs.
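The superposition argument behind this analysis can be checked numerically. The sketch below assumes a simple linear voxel model (a baseline plus color, form, and optional interaction terms) and approximates each double conjunction block as the average of its two constituent patterns; the model, the parameter values, and all names are illustrative assumptions.

```python
import numpy as np


def make_patterns(interaction, n_vox=100, seed=0):
    """Mean voxel patterns for the four conjunctions under a linear model."""
    rng = np.random.default_rng(seed)
    base, color_pat, form_pat, inter_pat = rng.normal(size=(4, n_vox))
    patterns = {}
    for color, sc in (("Red", 1.0), ("Green", -1.0)):
        for form, sf in (("CW", 1.0), ("CCW", -1.0)):
            patterns[color + form] = (base + sc * color_pat + sf * form_pat
                                      + interaction * sc * sf * inter_pat)
    return patterns


def block_pattern(cond_a, cond_b, patterns):
    """Approximate a double conjunction block as the average of its two stimuli."""
    return 0.5 * (patterns[cond_a] + patterns[cond_b])


additive = make_patterns(interaction=0.0)
interactive = make_patterns(interaction=1.0)

# Under purely additive coding the two block types yield the same pattern...
same = np.allclose(block_pattern("RedCW", "GreenCCW", additive),
                   block_pattern("RedCCW", "GreenCW", additive))
# ...whereas an interaction term makes them differ.
differ = not np.allclose(block_pattern("RedCW", "GreenCCW", interactive),
                         block_pattern("RedCCW", "GreenCW", interactive))
```

In this model the color and form terms cancel within each block type, so the two block patterns differ only through the interaction term. This is why above-chance decoding of the two block types is taken as evidence of interactive coding.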
Results
Using fMRI MVPA, in the two experiments of this study, we examined the
representation of simple and complex form features, color, and their conjunction in
human early visual areas (V1 to V4) and higher-level ventral regions showing
univariate selectivity to shape (LOT and VOT) and color (posterior and central color
regions) (see Fig. 2 for examples of these
regions). This study served to both replicate the results of a study from Seymour et al. (2010), and extend their results
from early visual cortex to higher-level ventral visual regions and from orientation
to more complex form features. We aimed to understand the coding strength of these
two types of features within a given brain region and across different brain regions
along the ventral visual cortex, whether the multivariate feature selectivity of
each region matches the univariate selectivity reported in past literature (e.g.,
Lafer-Sousa et al., 2016), and whether
these two types of features are represented in a predominantly
independent/orthogonal, or an interactive manner when representations of both
features are found within the same brain region. We examined the coding of color and
orientation in Experiment 1 by showing
clockwise and counterclockwise spirals appearing in red and green colors, and the
coding of color and curvature in Experiment 2
by showing spiky and curvy tessellations appearing in red and green colors. The
phase of all stimuli alternated once per second, equating the overall stimulation
across the visual field (and ruling out the possibility that any
“form” decoding could merely be due to differences in the spatial
envelope of the stimuli). In some of the runs, only a single stimulus type was
present in each block. FMRI pattern decoding from these runs was used to determine
which brain regions contain color and/or form information and how the relative
coding strength of color and form may change across the ventral visual pathway. From
these blocks, two analyses were used to test for the presence of independent versus
interactive coding for color and shape: a standard cross-decoding analysis, and a
novel method that explicitly tested for color-shape interaction effects in the voxel
tuning across each ROI. In the other runs, these stimuli were presented in blocks
where stimuli of different forms and colors were alternated, which we analyzed using
a method adapted from Seymour et al. (2010)
as another metric to test for the presence of interactive coding.
ROI overlap
Since retinotopic V4, the posterior color region, and area VOT overlap
to some degree, we quantified this overlap for each pair of these ROIs. Across
all the participants from both Experiments
1 and 2, V4 and the posterior
color region overlapped by 40.7% +/− 2.4% (mean +/− s.e.). VOT and
the posterior color region overlapped by 16.4% +/− 2.7%. VOT and V4
overlapped by 17.5% +/− 3.5%. There is thus a sizable overlap between V4
and the posterior color region, with both also overlapping slightly with VOT.
Despite these overlaps, as described below, there were significant differences
in how color and form were represented in these brain regions that could not be
predicted by the amount of anatomical overlap. Consequently, we grouped brain
regions in a later analysis by their overall functional response profile, rather
than by the amount of anatomical overlap.
Color and form decoding
To document whether color and form information were present in a brain
region, we compared color and form decoding accuracy in each region against
chance level performance (Fig. 4). Here
decoding was performed between fMRI response patterns differing in one feature
dimension while allowing these patterns to take on either value of the other
feature dimension (e.g., color decoding in Experiment 1 was performed by contrasting the red clockwise and red
counterclockwise conditions against the green clockwise and green
counterclockwise conditions). Except for the central color region, which showed
no significant form decoding in either experiment (ts <
1.14, ps > 0.18), both color and form were decodable
significantly above chance in both experiments in every brain region examined,
including V1 through V4, the two shape regions LOT and VOT, and the posterior
color region (ts > 2.27, ps <
0.03, one-tailed as only values above chance-level performance are meaningful
here; results were corrected for multiple comparisons using the
Benjamini-Hochberg procedure across the four tests within each ROI, since these
tests were of the same kind; see Methods
for more details). Fig. 4 depicts these
results with the significance level of each t-test for
above-chance decoding labeled with asterisks at the top of each bar. Color and
form information is thus widely distributed throughout the ventral visual
cortex, with both features present in every ROI tested with the exception of
form information in the central color region.
Fig. 4.
Results of color and form decoding in both experiments for
(A) early visual areas, (B) shape regions,
(C) color regions, and (D) sectors, which were
formed by averaging the decoding of brain regions showing similar response
profiles. Overall, V1 and V2 showed a preference for orientation over curvature
and color. V3 showed an equal preference for orientation and curvature over color.
VOT and V4 showed equal preference to color and curvature over orientation; the
overlap of V4 and VOT with the color regions partially, but not entirely, drove
color decoding in these regions. Removing the color region overlap resulted in
VOT showing a preference for curvature over orientation and color. LOT showed a
preference for curvature over color and orientation. Lastly, the posterior color
region showed a preference for color over orientation but not over curvature,
while the central color region showed a preference for color over both form
features. * p <0.05; ** p <0.01;
*** p <0.001 for t-tests testing for above chance
(> 50%) decoding (all one-sample t-tests, one-tailed, and corrected for
multiple comparisons).
To characterize the coding strength of color and the two types of form
features (i.e., orientation and curvature) within a given region, we next
conducted detailed comparisons within and across the two experiments (all
statistical results are reported in Table
1). We noted that color coding did not vary between the two
experiments in any of the brain regions examined even though only a subset of
the participants completed both experiments. Because color stimulation was
comparable between the two experiments (as the stimuli in both experiments
subtended the same visual angle, with the same pixel-level presentation of
colors), this suggests that participant performance at the group level was
comparable and fairly stable across the two experiments. This enabled us to
directly compare orientation and curvature coding between the two experiments
and evaluate how the processing of these two form features may differ within a
brain region. To account for the fact that partially overlapping sets of
participants partook in both experiments, a linear mixed effects analysis
(analogous to an ANOVA test) was performed to determine the influence of
experiment, feature type, and their interaction on decoding in each region and
between regions, and a partially-overlapping t-test (Derrick et al., 2017; analogous to a
t-test) was performed to compare between pairs of
conditions across the two experiments. Within each experiment, within-subjects
t-tests were used to compare color and form decoding. For these pairwise t-tests
(two comparing color and shape decoding within each experiment, and two
comparing decoding for each feature across the two experiments), correction for
multiple comparisons was applied across the four tests performed within each
ROI.
Table 1
Summary of statistical comparisons within each ROI for color and form
decoding results. Mixed-effects analyses were conducted to test the effect of
experiment, feature type, and their interaction. Within-subject t-tests were
conducted to test the difference between color and form decoding within each
experiment. Partially-overlapping t-tests were conducted to compare the decoding
of each feature across experiments.
| ROI | Main effect: Experiment | Main effect: Feature | Interaction | Form vs. color within experiment: Spirals | Form vs. color within experiment: Tessellations | Spirals vs. tessellations within feature: Form | Spirals vs. tessellations within feature: Color |
|---|---|---|---|---|---|---|---|
| V1 | z = 1.06, p = .29 | z = 5.80, p < .001 *** | z = 3.51, p < .001 *** | t(11) = 4.81, p = .001 ** | t(12) = 1.10, p = .34 | t(16.6) = 6.21, p < .001 *** | t(16.6) = 0.97, p = .34 |
| V2 | z = 0.029, p = .98 | z = 5.50, p < .001 *** | z = 2.56, p = .01 * | t(11) = 5.13, p = .001 ** | t(12) = 2.05, p = .09 † | t(16.6) = 4.12, p = .002 ** | t(16.6) = 0.20, p = .84 |
| V3 | z = 0.85, p = .39 | z = 2.61, p = .009 ** | z = 0.082, p = .93 | t(11) = 2.21, p = .096 † | t(12) = 2.71, p = .075 † | t(16.6) = 1.14, p = .27 | t(16.6) = 1.20, p = .27 |
| V4 | z = 0.74, p = .46 | z = 3.87, p < .001 *** | z = 2.91, p = .004 ** | t(11) = −3.88, p = .01 * | t(12) = 0.18, p = .85 | t(16.6) = −2.59, p = .036 * | t(16.6) = 1.17, p = .34 |
| V4 w/out Color | z = 0.69, p = .49 | z = 0.97, p = .33 | z = 2.31, p = .02 * | t(11) = −0.15, p = .34 | t(12) = 2.01, p = .13 | t(16.6) = −2.38, p = .11 | t(16.6) = 0.97, p = .34 |
| LOT | z = 1.11, p = .26 | z = 0.42, p = .67 | z = 3.1, p = .002 ** | t(11) = 0.40, p = .70 | t(12) = 4.93, p = .001 ** | t(16.6) = −2.52, p = .04 * | t(16.6) = 1.24, p = .31 |
| VOT | z = 1.36, p = .18 | z = 4.06, p < .001 *** | z = 3.26, p = .001 ** | t(11) = −3.19, p = .03 * | t(12) = 0.58, p = .57 | t(16.6) = −2.66, p = .033 * | t(16.6) = 1.18, p = .33 |
| VOT w/out Color | z = 0.88, p = .37 | z = 0.55, p = .58 | z = 2.70, p = .007 ** | t(11) = −0.44, p = .66 | t(12) = 3.96, p = .008 ** | t(16.6) = −2.59, p = .038 * | t(16.6) = 0.92, p = .49 |
| Posterior Color | z = 0.62, p = .54 | z = 3.80, p < .001 *** | z = 1.69, p = .091 † | t(11) = −3.33, p = .026 * | t(12) = 1.59, p = .18 | t(16.6) = −2.23, p = .079 † | t(16.6) = 0.61, p = .55 |
| Central Color | z = 0.72, p = .47 | z = 4.19, p < .001 *** | z = 0.33, p = .74 | t(11) = −3.27, p = .015 * | t(12) = −5.71, p < .001 *** | t(16.6) = −1.44, p = .22 | t(16.6) = −0.65, p = .52 |
| V1–V3 | z = 0.75, p = .46 | z = 5.51, p < .001 *** | z = 2.40, p = .017 * | t(11) = 4.49, p = .002 ** | t(12) = 2.64, p = .03 * | t(16.6) = 4.23, p = .002 ** | t(16.6) = 0.88, p = .39 |
| V4/VOT | z = 1.09, p = .28 | z = 4.50, p < .001 *** | z = 3.52, p = .001 ** | t(11) = −3.77, p = .012 * | t(12) = 0.41, p = .69 | t(16.6) = −2.89, p = .02 * | t(16.6) = 1.33, p = .26 |
| Color Regions | z = 0.78, p = .43 | z = 5.33, p < .001 *** | z = 1.50, p = .13 | t(11) = −4.30, p = .005 ** | t(12) = −3.81, p = .005 ** | t(16.6) = −1.69, p = .15 | t(16.6) = 0.73, p = .47 |

† p < .10; * p < .05; ** p < .01; *** p < .001 (all two-tailed, and corrected for
multiple comparisons).
As shown in Fig. 4 and Table 1, overall, in early visual areas, V1
and V2 showed a main effect of higher form than color decoding, with decoding
further being higher for orientation than for either curvature or color. V3 also
showed a main effect of higher form than color decoding, but with similar
decoding for both form features. V4, on the other hand, showed a main effect of
higher color than form decoding, with decoding further being higher for color
and curvature than for orientation. In the two form-selective regions, VOT, like
V4, showed a main effect of higher color than form decoding, with decoding
further being higher for color and curvature than for orientation. LOT, on the
other hand, showed no main effect of feature decoding, but higher decoding for
curvature than for either color or orientation, consistent with its role in
object shape processing. Both color regions showed a main effect of higher color
than form decoding. While the posterior color region showed higher decoding for
color and curvature than for orientation with no significant difference between
decoding for color and curvature, the central color region showed higher
decoding for color than for either kind of form feature.
Since V4 and VOT overlapped somewhat with the posterior color region, we
performed additional analyses examining decoding in these regions when their
overlap with the color-selective regions was removed. The same feature decoding
analyses were performed in these regions as in the other regions (Fig. 4). Mixed-effects analyses were also performed
for each feature across the two experiments to directly compare form and color
decoding in these regions with or without the parts of these regions that
overlapped with the color regions. For form decoding, removing the overlap with
the color-selective regions produced no main effect of overlap in V4
(Z = 0.58, p = .57), whereas VOT showed a
trend towards an increase in form decoding
(Z = 1.66, p = .096). However, for color
decoding, in both ROIs there was a main effect of overlap, with color decoding
significantly decreasing when the posterior color region was removed
(Zs > 3.4, ps < 0.01),
though color decoding remained significantly above chance (ts
> 2.26, ps < 0.03, one-tailed; corrected for
multiple comparisons across the four t-tests of decoding performance against
chance in each ROI, as in all other ROIs). Removing the overlapping color region
from V4 and VOT also changed the relative coding strength of color and form in
these regions (see the detailed statistics in Table 1). Both regions no longer showed an overall
main effect of higher color than form decoding, with VOT now showing a greater
sensitivity to curvature than to color or orientation changes. The latter is
consistent with VOT’s role in object shape processing. Thus, removing the
color-sensitive voxels from VOT and V4 removed their apparent feature preference
for color.Based on the overall similarity of their response profiles and their
anatomical proximity, ROIs were grouped into sectors to allow us to directly
compare the feature coding characteristics between the different sectors: early
visual areas V1–V3, lateral visual area LOT, ventral visual areas V4/VOT,
and Color Regions (including the posterior and central color regions). Decoding
accuracies were averaged within each sector across the component brain regions.
The decoding profiles within each sector are reported in Fig. 4 and Table
1, and they are overall consistent with the profile of the individual
regions comprising the sector. Three-way mixed-effects models (sector ×
feature × experiment) performed on each pair of sectors reveal
significant or trending two-way and/or three-way interactions involving sector
for each pair, verifying that each of these sectors indeed exhibits a distinct
feature encoding profile from each of the others (significant or trending
effects included: for Color Regions vs. LOT, sector × feature and 3-way
interaction; for Color Regions vs. V1–V3, sector × feature and
3-way interaction; for Color regions vs. V4/VOT, 3-way interaction; for LOT vs.
V1–V3, sector × feature and 3-way interaction; for LOT vs. V4/VOT,
sector × feature; for V1–V3 vs. V4/VOT, sector × feature
and 3-way interaction; all Zs > 1.8, ps
< 0.07).
We found only scattered and limited evidence for hemispheric differences
in color or form coding. In Experiment 1,
V1 showed higher form decoding in the right hemisphere, and V3 showed higher
color decoding in the right hemisphere (ts > 2.36,
ps < 0.05; both two-tailed and uncorrected), but
these effects were not present in Experiment
2 (ts < 0.60,
ps > 0.56; two-tailed and uncorrected), and no other
ROIs exhibited a hemispheric difference for decoding of either feature
(ts < 1.7, ps >
0.12; two-tailed and uncorrected).
Overall, with the exception of the central color region, all other
regions examined showed significant decoding for both color and form, even for
shape and color regions defined based on their univariate selectivity for color
or form. At the same time, significant coding bias also exists in every region
examined: even early visual areas show some feature coding preference, and in
higher visual regions, such a preference appears to be largely consistent
between multivariate decoding and the univariate feature preferences that define
the regions.
Color and form cross-decoding
To understand how color and form are coded together in a brain region,
we next examined the extent to which each feature is encoded in a manner that is
tolerant to changes in the other feature. To do so, we performed cross-decoding
and trained an SVM classifier on one feature (e.g., form) within one value of
the other feature (e.g., red), and tested the classifier in the other value of
the other feature (e.g., green). Additionally, to obtain a baseline measure of
feature decoding with an equal amount of data for comparison purposes, we also
performed within-feature decoding, and trained and tested a classifier in one
feature within the same value of the other feature. Fig. 5 depicts the results of these analyses. Every
region that showed successful decoding of a given feature in the previous
analysis also exhibited significant cross-decoding of that feature
(ts > 1.92, ps < 0.05;
one-tailed t-test, corrected for multiple comparisons with the
eight comparisons performed for each ROI). Meanwhile, V1 and V2, but not other
regions, also exhibited a significant or trending drop in decoding when
performing cross-feature rather than within-feature decoding (Fig. 5): specifically, V1 showed a significant or
trending cross-decoding drop for both color and orientation in Experiment 1 (ts > 2.00,
ps < 0.08; one-tailed and corrected for the four
cross-decoding tests within the ROI), and V2 exhibited a significant or trending
cross-decoding drop for color in Experiment
1, and for both color and curvature in Experiment 2 (ts > 1.61, ps
< 0.09; one-tailed and corrected for the four cross-decoding tests within
the ROI).
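The contrast between within-feature decoding and cross-decoding can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: a nearest-centroid classifier stands in for the SVM used in the study, and the 20-voxel population, its tuning weights, the noise level, and the sample counts are all hypothetical. Color and form are coded as +1 / -1 (e.g., red/green and CW/CCW).

```python
import random

def make_pattern(color, form, interaction_gain):
    """Noise-free 20-voxel pattern for one (color, form) condition (hypothetical tuning)."""
    color_part = [1.0 * color] * 8   # voxels tuned to color only
    form_part = [1.0 * form] * 8     # voxels tuned to form only
    # voxels whose form tuning flips sign with color (interactive tuning)
    inter_part = [interaction_gain * color * form] * 4
    return color_part + form_part + inter_part

def sample(color, form, interaction_gain, n, rng):
    """Draw n noisy patterns (unit-variance Gaussian noise) for one condition."""
    mean = make_pattern(color, form, interaction_gain)
    return [[m + rng.gauss(0.0, 1.0) for m in mean] for _ in range(n)]

def centroid(samples):
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def form_decoding_accuracy(train_color, test_color, interaction_gain, rng, n=200):
    """Train CW vs. CCW within train_color; test on fresh samples from test_color."""
    c_cw = centroid(sample(train_color, +1, interaction_gain, n, rng))
    c_ccw = centroid(sample(train_color, -1, interaction_gain, n, rng))
    correct = 0
    for form, own, other in ((+1, c_cw, c_ccw), (-1, c_ccw, c_cw)):
        for x in sample(test_color, form, interaction_gain, n, rng):
            correct += sq_dist(x, own) < sq_dist(x, other)
    return correct / (2 * n)

def within_and_cross(interaction_gain, seed=0):
    rng = random.Random(seed)
    within = form_decoding_accuracy(+1, +1, interaction_gain, rng)  # train red, test red
    cross = form_decoding_accuracy(+1, -1, interaction_gain, rng)   # train red, test green
    return within, cross

print("additive    (within, cross):", within_and_cross(interaction_gain=0.0))
print("interactive (within, cross):", within_and_cross(interaction_gain=1.2))
```

In this toy population, purely additive tuning yields similar within-feature and cross-decoding accuracy, whereas adding interactively tuned voxels lowers cross-decoding accuracy while leaving it above chance. The exact values depend on the seed and on the assumed parameters.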
Fig. 5.
Results of feature cross-decoding analysis for (A) early
visual areas, (B) shape regions, and (C) color
regions. Solid bars show decoding accuracy for features trained and tested
within the same value of the other feature (e.g., train on RedCW vs. RedCCW,
test on RedCW vs. RedCCW); striped bars show decoding where training and testing
for a feature is done across values of the other feature (e.g., train on RedCW
vs. RedCCW, test on GreenCW vs. GreenCCW). Every region exhibiting successful
decoding of a feature also exhibits significant cross-decoding; that said, V1
and V2 show a significant or trending drop in cross-decoding in several cases.
† p < 0.10, * p < 0.05, **
p < 0.01, *** p < 0.001 for
t-tests testing for above chance (> 50%) decoding (one-sample t-tests,
one-tailed) and for t-tests testing for greater within-feature decoding than
cross-decoding (within-subjects t-tests, one-tailed), all corrected for multiple
comparisons.
As the presence of a cross-decoding drop is an informative index of an
interactive, rather than a completely orthogonal, relationship between color and
form coding, to examine this effect in detail, we performed a set of further
analyses. To increase power, we combined the effect from both color and form
decoding (since a drop in either is suggestive of interactive coding between the
features) and tested the amount of decoding drop in each ROI using one-tailed
t-tests. Fig. 6 shows the results of this
analysis for the main voxel set used throughout this study (i.e., 300 most
active voxels in each ROI). To examine how the results may depend upon the
number of voxels included in each ROI and reduce the possibility of obtaining
null results due to too few or too many voxels being included, we also conducted
this analysis separately for the top 100, 200, 300, 400, or 500 most active
voxels in each ROI. Given that SVM is sensitive to both power and noise (such
that including too few voxels may exclude some of the informative voxels and
thus provide insufficient power whereas including too many voxels may add
noise), testing the effect at a range of voxel sizes may provide us with a more
sensitive way to document the effect. Tables
2 and 3 (top panel) show the
results of this analysis for Experiment 1
(spirals) and Experiment 2
(tessellations), respectively. Correction for multiple comparisons was applied
within each voxel set, and separately within the early visual and ventral
ROIs.
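The voxel-selection sweep and the one-sample test of the cross-decoding drop can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the six-voxel "ROI", and the per-participant drop values are hypothetical, and the sketch computes only the t statistic (no p-value and no correction for multiple comparisons).

```python
import math
import statistics

def top_n_voxels(activation, n):
    """Return indices of the n voxels with the largest activation values."""
    ranked = sorted(range(len(activation)), key=lambda i: activation[i], reverse=True)
    return ranked[:n]

def one_sample_t(values, chance=0.0):
    """t statistic for a one-sample test against `chance` (df = n - 1)."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - chance) / sem

# Hypothetical task-versus-rest activation for a 6-voxel "ROI" (made-up numbers).
activation = [0.4, 2.9, 1.7, 3.3, 0.1, 2.2]
voxel_sets = {n: top_n_voxels(activation, n) for n in (2, 4)}
print(voxel_sets)  # {2: [3, 1], 4: [3, 1, 5, 2]}

# Hypothetical per-participant cross-decoding drops (percentage points).
drops = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(one_sample_t(drops), 2))  # 4.24
```

Because voxels are ranked once, each smaller voxel set is nested inside the larger ones, so the sweep from 100 to 500 voxels progressively adds less active voxels to the same core set.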
Fig. 6.
Results of the three analyses testing for interactive color/form coding
— cross-decoding drop, pattern difference decoding, and double
conjunction decoding — for Experiment 1 (A) and Experiment 2 (B), when
the most active 300 voxels from each ROI were included in the analysis.
Table 2
Statistical results from Experiment 1 (spirals) for the three types of
analyses that measure interactive coding for color and form: cross-decoding drop
(top), pattern difference decoding (middle), and double conjunction decoding
(bottom). Analyses were performed separately for the top 100 to 500 most active
voxels in each ROI. All results were from one-sample, one-tailed t-tests
examining whether the effects were significantly above chance. Each cell shows
the decoding accuracy (or cross-decoding accuracy drop) with its standard error
in parentheses, followed by the t-statistic and p-value separated by a slash;
statistical significance is indicated with a marker at the end of each cell
where applicable. Correction for multiple comparisons was applied
across the set of ROIs within each combination of analysis, voxel set, and
sector (e.g., across the 4 tests conducted for pattern difference decoding in
the Top300 voxel set for the higher-level ventral stream sector).

Cross-Decoding Drop

| ROI | Top100 | Top200 | Top300 | Top400 | Top500 |
|---|---|---|---|---|---|
| V1 | 2.5 (1.4); 1.78 / .085 † | 3.7 (1.6); 2.29 / .055 † | 4.0 (1.5); 2.67 / .055 † | 3.3 (1.6); 2.00 / .09 † | 2.5 (1.6); 1.48 / .14 |
| V2 | 3.6 (1.1); 3.18 / .02 * | 4.3 (1.4); 2.87 / .04 * | 2.0 (1.8); 1.05 / .19 | 3.6 (2.2); 1.54 / .12 | 2.2 (2.5); .85 / .26 |
| V3 | 0.5 (1.7); .31 / .48 | 2.0 (1.5); 1.25 / .15 | 2.4 (2.1); 1.10 / .19 | 2.9 (1.9); 1.40 / .12 | 4.1 (2.3); 1.73 / .14 |
| V4 | −1.6 (1.8); −.82 / .78 | 0.0 (1.5); 0.0 / .5 | 1.5 (1.5); .92 / .19 | .9 (1.8); .46 / .33 | .8 (1.6); .49 / .32 |
| V1–V4 | 3.2 (1.1); 2.70 / .03 * | 1.5 (1.0); 1.48 / .14 | 2.9 (1.6); 1.77 / .13 | 3.6 (1.5); 2.24 / .09 † | 2.8 (1.5); 1.86 / .14 |
| LOT | −.3 (2.6); −.129 / .55 | −.5 (2.1); −.22 / .73 | −.5 (2.3); −.20 / .58 | 0.0 (2.7); .016 / .67 | 0.7 (2.2); .32 / .59 |
| VOT | −.3 (2.1); −.14 / .55 | .2 (2.1); .07 / .73 | .7 (1.6); .41 / .58 | 0.0 (2.0); −.02 / .67 | −.4 (2.0); −.19 / .59 |
| Posterior Color | .4 (2.3); .18 / .55 | 1.8 (2.8); .60 / .73 | 2.8 (2.9); .92 / .58 | 2.3 (2.5); .87 / .67 | 2.5 (2.5); .94 / .59 |
| Central Color | .8 (2.2); .36 / .55 | −1.2 (1.9); −.62 / .73 | 0.0 (2.4); −.02 / .58 | −1.3 (2.9); −.44 / .67 | −.6 (2.4); −.23 / .59 |

Pattern Difference Decoding

| ROI | Top100 | Top200 | Top300 | Top400 | Top500 |
|---|---|---|---|---|---|
| V1 | 54.0 (2.4); 1.62 / .16 | 55.4 (2.6); 2.01 / .09 † | 55.9 (2.5); 2.22 / .06 † | 54.6 (3.6); 1.23 / .15 | 55.7 (2.6); 2.13 / .05 † |
| V2 | 52.4 (1.7); 1.38 / .16 | 56.8 (2.0); 3.28 / .02 * | 53.4 (2.3); 1.43 / .13 | 55.5 (2.2); 2.37 / .047 * | 54.3 (2.0); 2.07 / .05 † |
| V3 | 50.8 (1.3); .57 / .29 | 51.3 (2.0); .62 / .27 | 49.9 (1.7); −.05 / .51 | 52.1 (1.6); 1.25 / .15 | 51.9 (1.8); 1.01 / .21 |
| V4 | 51.9 (2.4); .75 / .29 | 52.9 (2.4); 1.16 / .22 | 52.8 (2.0); 1.33 / .13 | 51.5 (2.2); .64 / .27 | 51.7 (2.4); .70 / .25 |
| V1–V4 | 53.7 (2.5); 1.43 / .16 | 51.4 (1.7); .76 / .27 | 55.2 (2.0); 2.55 / .06 † | 56.1 (2.0); 2.84 / .04 * | 55.8 (2.0); 2.75 / .047 * |
| LOT | 48.2 (2.7); −.64 / .92 | 50.9 (2.8); .30 / .76 | 50.3 (2.9); .09 / .69 | 50.3 (3.2); .10 / .82 | 50.5 (2.9); .17 / .75 |
| VOT | 47.2 (1.8); −1.50 / .92 | 48.8 (2.2); −.52 / .89 | 49.7 (2.3); −.11 / .69 | 47.7 (2.4); −.94 / .82 | 48.2 (2.5); −.70 / .75 |
| Posterior Color | 51.5 (2.7); .53 / .92 | 52.7 (2.7); .95 / .72 | 52.0 (2.8); .69 / .69 | 51.6 (2.8); .54 / .82 | 50.3 (2.5); .13 / .75 |
| Central Color | 49.1 (2.5); −.33 / .92 | 47.7 (1.7); −1.3 / .89 | 49.0 (1.9); −.51 / .69 | 49.0 (2.3); −.39 / .82 | 49.6 (2.2); −.19 / .75 |

Double Conjunction Decoding

| ROI | Top100 | Top200 | Top300 | Top400 | Top500 |
|---|---|---|---|---|---|
| V1 | 55.9 (1.8); 3.06 / .03 * | 53.2 (1.9); 1.63 / .08 † | 52.8 (2.1); 1.25 / .20 | 53.1 (1.6); 1.87 / .055 † | 52.4 (1.1); 2.10 / .049 * |
| V2 | 51.9 (1.8); 1.02 / .21 | 54.4 (1.4); 3.1 / .012 * | 54.9 (1.2); 4.07 / .002 ** | 54.3 (1.1); 3.68 / .009 ** | 54.3 (1.1); 3.79 / .008 ** |
| V3 | 51.0 (1.2); .82 / .21 | 52.8 (1.6); 1.63 / .08 † | 51.8 (1.8); .99 / .21 | 52.9 (1.4); 1.92 / .055 † | 52.4 (1.5); 1.55 / .09 † |
| V4 | 52.2 (1.2); 1.80 / .08 † | 50.2 (1.1); .15 / .44 | 48.4 (1.6); −.96 / .82 | 50.1 (1.9); .044 / .48 | 50.7 (2.1); .32 / .38 |
| V1–V4 | 53.2 (1.3); 2.39 / .045 * | 54.8 (1.1); 4.16 / .004 ** | 56.9 (1.4); 4.64 / .002 ** | 53.7 (1.5); 2.32 / .05 † | 55.6 (1.7); 3.17 / .01 * |
| LOT | 50.3 (1.5); .19 / .43 | 50.0 (1.1); −.04 / .66 | 49.7 (1.6); −.21 / .78 | 50.2 (2.3); .07 / .77 | 50.4 (1.8); .23 / .77 |
| VOT | 51.6 (1.0); 1.59 / .28 | 51.7 (1.6); 1.01 / .44 | 50.9 (1.7); .50 / .78 | 48.7 (1.6); −.76 / .77 | 50.8 (1.6); .48 / .77 |
| Posterior Color | 51.6 (1.5); 1.09 / .30 | 51.3 (1.5); .80 / .44 | 48.9 (1.0); −1.0 / .84 | 49.8 (1.1); −.15 / .77 | 49.3 (.8); −.83 / .79 |
| Central Color | 51.2 (1.8); .62 / .37 | 49.2 (1.8); −.42 / .66 | 49.9 (1.9); −.05 / .78 | 49 (2.3); −.41 / .77 | 49.6 (2.2); −.19 / .77 |

† p < .10, * p < .05, ** p < .01, and *** p < .001.
Table 3
Statistical results from Experiment 2 (tessellations) for the three
types of analyses that measure interactive coding for color and form:
cross-decoding drop (top), pattern difference decoding (middle), and double
conjunction decoding (bottom). Analyses were performed separately for the top
100 to 500 most active voxels in each ROI. All results were from one-sample,
one-tailed t-tests examining whether the effects were significantly above
chance. Each cell shows the decoding accuracy (or cross-decoding accuracy drop)
with its standard error in parentheses, followed by the t-statistic and p-value
separated by a slash; statistical significance is indicated with a marker at
the end of each cell where applicable. Correction
for multiple comparisons was applied across the set of ROIs within each
combination of analysis, voxel set, and sector (e.g., across the 4 tests
conducted for pattern difference decoding in the Top300 voxel set for the
higher-level ventral stream sector).

Cross-Decoding Drop

| ROI | Top100 | Top200 | Top300 | Top400 | Top500 |
|---|---|---|---|---|---|
| V1 | 0.0 (1.2); −.03 / .51 | 2.2 (1.3); 1.62 / .30 | 1.5 (1.3); 1.08 / .33 | .9 (1.5); .58 / .47 | .8 (1.8); .42 / .57 |
| V2 | 1.8 (1.7); 1.04 / .51 | 1.8 (1.4); 1.25 / .30 | 2.7 (1.0); 2.59 / .06 † | 2.6 (1.5); 1.67 / .30 | 1.3 (2.0); .64 / .57 |
| V3 | .7 (1.5); .45 / .51 | −.5 (1.6); −.28 / .77 | −1.0 (1.4); −.67 / .91 | −1.7 (1.3); −1.29 / .89 | −1.5 (1.7); −.85 / .79 |
| V4 | .2 (1.4); .14 / .51 | −1.8 (1.4); −1.21 / .88 | −1.4 (.9); −1.41 / .91 | −.4 (.8); −.48 / .85 | −.4 (1.0); −.39 / .79 |
| V1–V4 | .3 (1.2); .23 / .51 | −.2 (1.3); −.17 / .77 | 1.4 (1.6); .87 / .33 | 1.2 (1.5); .78 / .47 | 1.6 (1.6); .93 / .57 |
| LOT | −2.4 (1.3); −1.8 / .95 | 1.0 (.6); 1.72 / .12 | 1.0 (1.3); .71 / .34 | .8 (1.0); .78 / .51 | .1 (.9); .14 / .57 |
| VOT | −1.6 (1.3); −1.15 / .95 | −.9 (1.1); −.76 / .77 | −.3 (1.8); −1.6 / .93 | −1.5 (2.2); −.68 / .75 | −.2 (2.3); −.08 / .57 |
| Posterior Color | 3.0 (1.0); 2.95 / .02 * | 2.0 (1.2); 1.67 / .12 | 1.0 (1.2); .77 / .34 | .7 (1.0); .68 / .51 | −.2 (1.2); −.19 / .57 |
| Central Color | −1.6 (1.7); −.88 / .95 | .7 (1.3); .52 / .41 | 1.1 (1.5); .67 / .34 | .3 (1.6); .17 / .58 | .4 (1.5); .28 / .57 |

Pattern Difference Decoding

| ROI | Top100 | Top200 | Top300 | Top400 | Top500 |
|---|---|---|---|---|---|
| V1 | 50.7 (1.7); .40 / .57 | 52.2 (2.1); 1.02 / .27 | 53.0 (2.0); 1.47 / .16 | 53.9 (1.5); 2.57 / .06 † | 52.2 (1.5); 1.44 / .22 |
| V2 | 52.3 (1.0); 2.19 / .12 | 52.9 (1.6); 1.75 / .13 | 53.0 (1.8); 1.56 / .16 | 52.0 (2.0); .97 / .29 | 52.6 (2.4); 1.08 / .25 |
| V3 | 50.2 (1.7); .09 / .57 | 47.4 (1.5); −1.70 / .95 | 48.4 (1.9); −.81 / .78 | 47.9 (1.3); −1.53 / .92 | 48.1 (1.6); −1.16 / .87 |
| V4 | 50.2 (2.1); .072 / .57 | 50.4 (1.9); .21 / .53 | 49.1 (1.8); −.48 / .78 | 48.8 (1.7); −.69 / .92 | 47.8 (1.8); −1.20 / .87 |
| V1–V4 | 49.7 (1.8); −.18 / .57 | 52.6 (1.4); 1.78 / .13 | 52.3 (1.6); 1.38 / .16 | 54.3 (2.0); 2.10 / .07 † | 53.3 (1.8); 1.73 / .22 |
| LOT | 49.9 (1.7); −.05 / .69 | 50.6 (1.5); .40 / .35 | 53.2 (1.6); 1.99 / .14 | 50.4 (1.9); .20 / .56 | 49.8 (2.3); −.07 / .59 |
| VOT | 50.4 (1.9); .21 / .69 | 50.6 (1.3); .41 / .35 | 49.0 (1.8); −.50 / .69 | 48.6 (2.3); −.58 / .71 | 49.4 (2.2); −.24 / .59 |
| Posterior Color | 51.8 (1.7); 1.00 / .67 | 52.3 (1.6); 1.39 / .35 | 51.1 (1.8); .59 / .43 | 51.0 (1.7); .54 / .56 | 49.5 (2.0); −.23 / .59 |
| Central Color | 47.9 (1.4); −.14 / .91 | 51.5 (1.7); .84 / .35 | 50.8 (1.6); .47 / .43 | 50.6 (1.4); .43 / .56 | 49.7 (1.7); −.18 / .59 |

Double Conjunction Decoding

| ROI | Top100 | Top200 | Top300 | Top400 | Top500 |
|---|---|---|---|---|---|
| V1 | 50.4 (1.3); .31 / .48 | 52.3 (1.7); 1.34 / .17 | 52.9 (1.8); 1.55 / .37 | 51.6 (2.0); .79 / .43 | 51.0 (2.0); .50 / .43 |
| V2 | 50.7 (1.6); .42 / .48 | 52.6 (1.3); 1.86 / .17 | 51.4 (1.9); .74 / .39 | 51.4 (2.0); .66 / .43 | 50.7 (1.7); .41 / .43 |
| V3 | 51.0 (1.9); .48 / .48 | 49.2 (1.9); −.41 / .65 | 51.7 (1.7); .94 / .39 | 49.8 (1.4); −.11 / .59 | 51.0 (2.3); .41 / .43 |
| V4 | 49.2 (1.6); −.49 / .68 | 49.4 (1.9); −.33 / .63 | 48.2 (2.4); −.70 / .75 | 49.4 (2.3); −.24 / .59 | 48.1 (2.5); −.73 / .76 |
| V1–V4 | 51.7 (1.8); .88 / .48 | 51.9 (1.4); 1.3 / .17 | 50.0 (1.8); 0.0 / .63 | 51.8 (1.8); .97 / .43 | 51.6 (1.6); .99 / .43 |
| LOT | 49.4 (1.7); −.37 / .64 | 49.9 (2.0); −.04 / .52 | 48.8 (1.7); −.67 / .74 | 51.1 (1.9); .58 / .49 | 50.4 (2.2); .17 / .55 |
| VOT | 50.7 (1.9); .36 / .53 | 53.4 (3.0); 1.11 / .29 | 49.7 (2.6); −.12 / .74 | 49.5 (2.0); −.23 / .59 | 49.8 (1.7); −.13 / .55 |
| Posterior Color | 50.8 (3.0); .26 / .53 | 50.7 (2.6); .27 / .52 | 51.3 (3.3); .37 / .74 | 51.2 (3.3); .35 / .49 | 50.6 (3.1); .20 / .55 |
| Central Color | 53.8 (1.6); 2.30 / .08 † | 51.8 (1.3); 1.31 / .29 | 49.5 (1.4); −.33 / .74 | 51.0 (1.3); .75 / .49 | 50.8 (1.3); .60 / .55 |

† p < .10, * p < .05, ** p < .01, and *** p < .001.
In Experiment 1, V1, V2, and a
macro-ROI composed of V1 through V4 exhibited a significant or trending drop in
cross-decoding across multiple voxel selection conditions. By contrast, in Experiment 2, V2 showed a trend for a
cross-decoding drop in just one voxel selection condition, and the posterior
color region showed a significant cross-decoding drop in just one voxel
selection condition. Thus, the strongest evidence for interactive coding based
on the cross-decoding drop metric is for orientation-color conjunctions in early
visual regions. Other than these cases, however, color and form exhibit no
significant drop in cross-decoding across the ventral visual pathway.
Directly testing for interactive color-form coding using pattern difference
analysis
Successful cross-decoding merely requires that the test patterns lie on
the same side of the SVM classification boundary as the corresponding training
patterns, and this can occur even in the presence of interactive tuning in the
population. For example, a population with many units exhibiting interactive
tuning can exhibit successful cross-decoding as long as the population also
contains units with invariant tuning to the feature being cross-decoded, and so
testing for a cross-decoding drop is only an indirect measurement of interactive
tuning in a neural population. To remedy this and more directly test for
interactive color-form coding across the human visual system, we performed a
novel pattern difference MVPA analysis to specifically focus on
the interactive effects that may be present in the response patterns in a brain
region. Specifically, we extracted two difference vectors, each
between two stimuli that differed on the same feature dimension (e.g., one
difference vector could be RedCW minus GreenCW, while the other could be RedCCW
minus GreenCCW). We then tested whether these two difference vectors could be
discriminated using SVM. We did this separately for both color and form and then
averaged the results (see Fig. 3 for a
detailed illustration of this approach). If the encoding of one feature is
completely independent and orthogonal to values of the other feature (i.e., only
main effects), then chance-level decoding is expected; by contrast, if the
encoding of one feature changes based on the other feature (i.e., an interaction
effect), then above chance-level decoding is expected. This analysis essentially
examines whether there is any interactive color and form coding in an ROI, with
the SVM classification step serving to aggregate small interaction effects
across voxels.
To test sensitively and exhaustively for the presence of interactive
coding using this analysis method, and reduce the possibility that any null
results are due to a nonoptimal number of voxels being used, for each ROI we
performed the analysis separately for the top 100, 200, 300, 400, or 500 most
active voxels. Fig. 6 depicts the results
of this analysis for the top 300 most active voxels (since this was the main
voxel set used throughout this study); Tables
2 and 3 (middle panel) show
the results for all voxel sets. In Experiment
1 (spirals), we found trending or above-chance pattern difference
decoding in multiple voxel sets from each of V1, V2, and the macro-ROI composed
of V1–V4 (one-sample, one-tailed t-tests; corrected for multiple
comparisons within each voxel set, and within each anatomical sector as
described in Methods). In Experiment 2 (tessellations), we only found a trend in
one voxel set each from V1 and the V1–V4 macro-ROI, which did not
replicate in any other voxel selection conditions. The overall pattern of
results, then, is similar to that of the cross-decoding drop analysis: evidence
of interactive color/form decoding is most reliably found in early visual cortex
but not in higher-level ventral regions, and for orientation but not for
curvature.
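The logic of the pattern difference analysis can be made explicit under a simple linear model of a single voxel's response. The notation below is introduced here purely for illustration and is not part of the analysis pipeline: we assume that the response of voxel v to a stimulus with color c and form f decomposes into a baseline, a color main effect, a form main effect, and an interaction term.

```latex
\begin{aligned}
r_v(c, f) &= \mu_v + \alpha_v(c) + \beta_v(f) + \gamma_v(c, f) \\[4pt]
d_v^{\mathrm{CW}} &= r_v(\mathrm{Red}, \mathrm{CW}) - r_v(\mathrm{Green}, \mathrm{CW})
  = \bigl[\alpha_v(\mathrm{Red}) - \alpha_v(\mathrm{Green})\bigr]
  + \bigl[\gamma_v(\mathrm{Red}, \mathrm{CW}) - \gamma_v(\mathrm{Green}, \mathrm{CW})\bigr] \\[4pt]
d_v^{\mathrm{CCW}} &= r_v(\mathrm{Red}, \mathrm{CCW}) - r_v(\mathrm{Green}, \mathrm{CCW})
  = \bigl[\alpha_v(\mathrm{Red}) - \alpha_v(\mathrm{Green})\bigr]
  + \bigl[\gamma_v(\mathrm{Red}, \mathrm{CCW}) - \gamma_v(\mathrm{Green}, \mathrm{CCW})\bigr] \\[4pt]
d_v^{\mathrm{CW}} - d_v^{\mathrm{CCW}} &=
  \gamma_v(\mathrm{Red}, \mathrm{CW}) - \gamma_v(\mathrm{Green}, \mathrm{CW})
  - \gamma_v(\mathrm{Red}, \mathrm{CCW}) + \gamma_v(\mathrm{Green}, \mathrm{CCW})
\end{aligned}
```

Under this assumed model, the baseline and the form main effect cancel within each difference vector, and the color main effect is shared by both. The two difference vectors therefore differ only through the interaction term, so that, apart from noise, they are indistinguishable when coding is purely additive and separable only when some voxels carry an interaction.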
Testing for interactive color-form coding using double conjunction
decoding
As another way to test for the presence of interactive coding of color
and form, in an independent set of data, following Seymour et al. (2010), we examined which ROIs are
able to discriminate between two pairs of stimuli, where each pair has the same
set of four individual features, but conjoined in different ways. Specifically,
we trained a classifier to discriminate between two kinds of blocks, each
consisting of alternating pairs of stimuli with different form and color
features, such that the same set of four features is present in each kind of
block, but combined in different ways (e.g., one kind of block alternated
between RedCW spirals and GreenCCW spirals, and the other alternated between
RedCCW and GreenCW spirals). If a region encodes these features in a completely
additive, orthogonal manner, such that tuning to a feature does not depend on
the value of the other feature, then patterns of activity in this region should
not be able to distinguish these two kinds of block; by contrast, if there is
any interactive coding of features, such that some voxels are sensitive to
particular pairings of color and form features, then an SVM
classifier should be able to distinguish these two kinds of blocks.
As in the decoding drop and pattern difference analyses, we performed
this analysis separately on the top 100, 200, 300, 400, and 500 most active
voxels from each ROI. Fig. 6 shows the
results of this analysis for the set of 300 voxels, and Tables 2 and 3 (bottom panel) show the results for all voxel sets (one-sample,
one-tailed t-test; correction for multiple comparisons within
each voxel set, and within each anatomical sector as described in Methods). In Experiment
1 (spirals), we found above-chance or trending double conjunction
decoding for multiple voxel sets in V1, V2, V3, and the V1–V4 macro ROI.
A trend was found in V4 for just one voxel set. By contrast, in Experiment 2 (tessellations), a trend was found for
one voxel set in the central color region, with no other trending or significant
results in any voxel set or ROI.
In order to compare our results more directly with those of Seymour et al. (2010), we also re-ran the
analysis with three changes to the pipeline to better match their analysis
approach. First, we included all voxels falling under p
< .01 in a task versus rest contrast, instead of using the top 300 voxels
in such a contrast. Second, instead of z-normalizing the beta values going into
the analysis across voxels within each trial, we normalized the beta values of
each voxel across all its trials. Third, we did not apply
correction for multiple comparisons. When we used the p
< .01 activation threshold for voxel selection, we found no significant
conjunction decoding in any individual ROI, or in the V1–V4 macro-ROI,
with either within-voxel normalization (ts < 1.07,
ps > 0.15, uncorrected) or across-voxel
normalization (ts < 1.17, ps >
0.13, uncorrected), with the exception of a trend in V1 (t(11)
= 1.56, p = .07, uncorrected). When we selected the most active
300 voxels (as we primarily used in our study), but used the within-voxel
normalization method used by Seymour et al.
(2010), we found significant conjunction coding in V2, V3, and the
V1–V4 macro-ROI (ts > 2.62, ps
< 0.02, uncorrected), along with a trend in V1 (t(11) =
1.59, p = .07, uncorrected) but no significant or trending
decoding in V4 (t(11) = −0.14, p = .55,
uncorrected). All in all, then, we replicate their finding of conjunction coding
for V1, V2, and V3 when we apply the normalization method their study used, but
not in V4.
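Two elements of the double conjunction analysis can be illustrated with a small noise-free sketch: the prediction that additive coding leaves the two block types indistinguishable, and the difference between the two normalization schemes compared above. All names and numbers here are hypothetical. The sketch further assumes that a block's response is the mean of the responses to its two alternating stimuli, and it uses the population standard deviation for z-scoring; neither choice is taken from the Methods.

```python
import statistics

RED, GREEN, CW, CCW = +1, -1, +1, -1

def response(color, form, gain):
    """Noise-free 4-voxel pattern for one stimulus (hypothetical tuning)."""
    return [
        1.0 * color,          # voxel tuned to color only
        1.0 * form,           # voxel tuned to form only
        gain * color * form,  # voxel tuned to the color-form pairing
        0.5,                  # untuned voxel
    ]

def block_pattern(stim_a, stim_b, gain):
    """Assumed block response: mean of the two alternating stimuli."""
    a = response(*stim_a, gain)
    b = response(*stim_b, gain)
    return [(x + y) / 2 for x, y in zip(a, b)]

def block_difference(gain):
    """Difference between the two block types that share all four features."""
    block1 = block_pattern((RED, CW), (GREEN, CCW), gain)
    block2 = block_pattern((RED, CCW), (GREEN, CW), gain)
    return [x - y for x, y in zip(block1, block2)]

def zscore(values):
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an arbitrary choice here
    return [(v - mu) / sd for v in values]

def normalize_across_voxels(trials):
    """z-score each trial's pattern across voxels (rows are trials)."""
    return [zscore(row) for row in trials]

def normalize_within_voxel(trials):
    """z-score each voxel across all of its trials (columns are voxels)."""
    columns = [zscore(col) for col in zip(*trials)]
    return [list(row) for row in zip(*columns)]

print(block_difference(gain=0.0))  # additive coding: block types are identical
print(block_difference(gain=0.5))  # interactive coding: only the pairing voxel differs

trials = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]  # 2 trials x 3 voxels, made-up values
print(normalize_across_voxels(trials))
print(normalize_within_voxel(trials))
```

Across-voxel normalization removes each trial's mean and scale over voxels, whereas within-voxel normalization removes each voxel's mean and scale over trials, so the two schemes can weight the same voxels quite differently before classification.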
Discussion
Using fMRI pattern decoding and examining color and orientation coding in
Experiment 1 and color and curvature
coding in Experiment 2, the present study
extends an earlier study by Seymour et al.
(2010) and provides a comprehensive and updated documentation of the
coding of color and form information across the ventral visual processing pathway in
the human brain.
Broadly, we found that color and form information is nearly always
anatomically commingled in the human ventral visual pathway. This includes early
visual areas V1 to V4, thus replicating the color and form decoding results of Seymour et al. (2010), and previously
documented higher ventral visual regions defined based on their univariate
selectivity for color or shape, including the posterior color region, LOT, and VOT.
This is especially striking in the case of LOT, since it is nowhere in the
anatomical vicinity of the color-selective regions. The only exception to this
pattern is the central color region which showed significant color decoding, but no
form decoding, in both experiments, making it unique among the regions we examined.
We were unable to reliably localize the anterior color region in every participant
here due to its location near the MRI signal dropout zone (at a rate similar to
Lafer-Sousa et al., 2016). Overall,
across the human ventral visual processing pathway, we found a largely distributed
representation of color and form features, even in higher visual regions defined by
their univariate selectivity for one feature or the other.
That said, coding preference for either feature, quantified using MVPA,
varied across regions, and depended on the specific form feature tested. V1 and V2
were most sensitive to orientation differences, and less so to either curvature or
color differences, thus showing a preference for orientation over curvature and
color. V3 showed higher sensitivity to either form feature than to color. VOT and
V4, which greatly overlapped, showed equally strong sensitivity to color and
curvature differences, but less sensitivity to orientation differences. The latter
could potentially be due to the mirror symmetry of the clockwise and
counterclockwise spirals used, since some evidence suggests that responses in VOT
may be invariant to mirror-symmetric transformations (Dilks et al., 2011). The overlap of V4 and VOT with the
color regions partially, but not entirely, drove color decoding in these regions:
removing the color region overlap significantly decreased color decoding in these
regions, but it remained above chance. Interestingly, removing the color region
overlap also resulted in VOT showing a preference for curvature over orientation and
color, consistent with this region’s univariate selectivity for complex
object shapes. LOT showed roughly equal sensitivity to color and orientation
changes, but far greater sensitivity to curvature changes, consistent with its
univariate selectivity for complex object shapes. Finally, the posterior color
region showed greater sensitivity to color than orientation, but an equal
sensitivity to color and curvature, while the central color region showed a greater
sensitivity to color than to either form feature. Thus, despite an overall
distributed representation of color and form features, even early visual areas show
a feature preference, and in higher visual regions, their feature preferences are
largely consistent between the multivariate measures used in this study and their
univariate feature selectivity extensively documented by previous studies. These
results show that color and form features are represented in the human brain in a
biased distributed manner.
That said, color and form information in different regions may
play different roles in visual information processing. For instance, achromatopsia
patients can perceive isoluminant, color-defined shapes (e.g., a red square on a
green background), even if they cannot report the colors that define the shape
(Victor et al., 1989; Heywood et al., 1991; Barbur et al., 1994, 1998). This suggests that feature information
in only some regions may be available to conscious perception.
To understand how color and form may be represented together in regions that
code for both features, we examined the extent to which color and form are encoded
in an orthogonal manner (with coding for each feature unaffected by
the value of the other feature), an interactive manner (where
coding for each feature depends on the value of the other feature), or some mixture
of these motifs. In order to exhaustively test for the presence of interactive
tuning and examine whether the results depend upon the set of voxels examined in
each ROI, we performed each of these analyses on the 100, 200, 300, 400 and 500 most
active voxels in each region.
Using a cross-decoding approach, we found most regions encode color and form
information in a manner that is tolerant to changes in the other feature,
demonstrating some independence in representation between these two features in each
region. To assess the existence of interactive coding, we examined the amount of
cross-decoding drop. We also devised a novel analysis method, pattern
difference MVPA, that tests for the presence of multivariate
interaction effects in voxel populations with greater sensitivity (see Methods). We reasoned that successful
cross-decoding could coexist with interaction effects in a population when the
interaction effects are small and leave the representations on the correct side of
the classification boundary, or if interactively tuned voxels coexist with voxels
that show strong independent tuning. By contrast, pattern difference MVPA presents a
more direct test of interactive coding in a population. As a final test of
interactive coding, in a separate data set, we also examined decoding using the
double conjunction methodology developed by Seymour
et al. (2010).

Across these three different analysis techniques and two independent
datasets, we found evidence for interactive coding for color and orientation in
early visual cortex, with these effects replicating across varying numbers of voxels
included in the decoding analysis. These results largely aligned with those of Seymour et al. (2010), with one exception:
while their study found significant interactive color/orientation coding in V4, we
found only weak and non-replicable evidence for such coding in V4, namely a
trend for one analysis method in one voxel set. On the other hand, evidence for
interactive coding for color and curvature was scarce, with no brain region showing
replicable significant results across different analysis methods or voxel sets. Thus
replicable evidence exists for interactive coding of color and form in early visual
cortex and for simple form features, but not in higher-level visual regions or for
more complex form features, where color and form appear to be encoded more
orthogonally. It should be noted that even in early visual cortex we obtained much
stronger decoding results for single features than for feature conjunctions and that
cross-decoding accuracy was above chance. This suggests that, despite the presence
of interactive color and orientation coding in early visual areas, color and form
representations still exhibit a high degree of independence in all regions
examined.

As an experimental method, fMRI depends on the heterogeneity of neuronal
tuning across voxels at the probed spatial resolution. As with all fMRI studies, our results
should thus be understood within the limitations of this method. That
said, the spatial scale measured by fMRI often reasonably tracks the documented
spatial heterogeneity of neuronal feature tuning in several of the ROIs that were
examined. For example, V1 orientation columns are organized at a scale visible to
fMRI and plausibly contribute to fMRI MVPA decoding (Yacoub et al., 2008; Pratte et al.,
2016), V2 neurons are organized into “stripe” patterns,
approximately 1–2 mm wide, with different kinds of stripes exhibiting
different feature tuning (Ts’o et al.,
2001), and monkey IT neurons are often organized into clusters 0.5 mm in
diameter containing neurons with similar tuning (Wang et al., 1996; Tsunoda et al.,
2001). As such, neural organization at the mesoscale visible to fMRI is
not arbitrary or meaningless, but well-suited to capture the spatial tuning
heterogeneity across neurons in many cases. This has enabled the representations
visible to fMRI to be linked to the underlying neural computations, with fMRI
decoding strength from human ventral and dorsal visual regions being tightly
correlated with behavioral performance. For example, color decoding in V4, but not
V1, reflected perceptual color space (Brouwer and
Heeger, 2009), orientation decoding in early visual areas and superior
intraparietal sulcus during the delay period of a visual working memory task tracked
behavioral change detection performance (Bettencourt
and Xu, 2016), and both object exemplar decoding and object category
decoding in ventral and dorsal regions reflected perceived object similarity as
measured by behavioral visual search and similarity judgement tasks (Mur et al., 2013; Charest
et al., 2014; Jeong and Xu, 2016;
Cohen et al., 2017; Xu and Vaziri-Pashkam, 2019). Thus, the mesoscale
neuronal organization visible to fMRI can be used to probe the underlying neural
computations.

In our study, decoding for each feature depends on the amount of variation
we introduced within each feature. Because similarity within a feature likely
changes across brain regions (e.g., two similar colors in one region may become
dissimilar in another region), it would not have been possible to equate color and
form variations for all the brain regions examined. Thus we have chosen what we
believe to be reasonably large variations within each feature, including choosing
two spirals with opposite directions, two tessellation stimuli with either all
straight or all curved contours (thereby greatly varying an important midlevel form
feature, curvature), and two hues that are maximally distinctive. These feature
variations allow us to make a reasonable evaluation of the relative coding strength
of color and form in each brain region, and more importantly, how the feature coding
bias may change across visual regions. Although it could be argued that perhaps a
wider array of colors and form features could have been sampled, by using a small
number of stimuli chosen to greatly differ with respect to a chosen dimension (hue,
orientation, curvature), we were able to maximize our power, giving us more
confidence that any null results were not due to an inadequate number of trials.
Furthermore, the logic of the double-conjunction design we used in one of the
analyses requires two pairs of stimuli that differ with respect to two features.

One confound in comparing MVPA decoding across different experiments is that
decoding accuracy can be affected not just by the strength of the underlying neural
tuning, but also by factors like different analysis parameters, different levels of
noise, and differences in data quality. For the present two experiments, however,
the analysis pipelines were completely identical, removing analysis-related
confounds. Furthermore, color decoding was statistically indistinguishable between
experiments for every ROI, providing a common metric that suggests that levels of
noise and data quality did not substantially differ, lending validity to the
between-experiment comparisons. Another important confound in fMRI decoding
approaches is that two experimental conditions can be discriminated by a linear
classifier purely on the basis of differences in their noise covariance across
voxels, even if their pattern centroids are the same (Hebart and Baker, 2018). Since our pattern difference
analysis was novel, we therefore performed a control analysis in which we subtracted
the mean pattern centroid of the training data within each condition to equate the
pattern centroids between conditions while maintaining differences in covariance
structure, and then fed these transformed patterns into our classifier. As a test
case, we examined the macro-ROI consisting of the most active 300 voxels across
V1–V4 in Experiment 1 (spirals), since
we found replicable evidence for interactive color-form coding in this sector. We
found chance level decoding with this data transformation (mean decoding accuracy
50.09%; t(11) = 0.21, p = .42 for one-sample,
one-tailed t-test comparing against chance), suggesting that this
confound does not account for our results.

One potential limitation of this study was that the stimuli were
non-naturalistic and arguably “texture-like”. This may have
contributed to several of the null results, such as the failure to find form coding
in the central color patches (which Chang et al.,
2017 found in the macaque), and the limited scope of conjunction coding
that the study identified. However, one key advantage of the stimuli used in
both experiments was that repeatedly presenting the two complementary phases of the
same stimuli allowed the whole central visual field to be equally
stimulated, increasing the odds of identifying conjunction decoding anywhere in the
central visual field. Moreover, past work has found that object ensembles containing
repeated shapes activate high-level object shape regions just like single objects,
supporting the use of such stimuli to drive these regions (Cant and Xu, 2012). Although the stimuli in Experiment 2 were not scaled for eccentricity,
this would only account for the null interactive coding findings if this coding
motif only occurs over specific spatial scales, which would imply that it plays a
specific rather than general role in visual processing.

In the present study, we found significant decoding of color and form much
more reliably and broadly than we found evidence of interactive coding for these
features, raising the question of what underlying patterns of neuronal tuning may
account for these results. It is possible that neurons exhibiting interactive
color/form tuning exist in higher-level ventral regions, but are not clustered in a
sufficiently heterogeneous manner across voxels to be visible to fMRI MVPA. However,
even if this is the case, it is interesting that such heterogeneity would be present
for form- and color-coding neurons in higher-level ventral regions so as to enable
decoding of individual features, and present for conjunction-coding neurons in early
visual cortex so as to enable conjunction decoding in these
regions, but absent for conjunction-coding neurons in higher-level ventral regions.
At the very least, if these neuronal populations do exist, we can conclude that they
are distributed very differently from the other neuronal populations involved in
color and form coding in the ventral visual cortex. It is also possible such
neuronal populations simply do not exist, thereby avoiding the potential
combinatorial explosion involved in having dedicated neurons for encoding the
combination of every form and every color.

Treisman and colleagues have famously argued that independently coded
features can be conjoined via their shared location (Treisman and Gelade, 1980). One proposed neural mechanism for achieving
this has been long-range synchronized firing between neurons corresponding to
different features of the same object at the same spatial location (Singer, 1999), with the posterior parietal cortex (PPC)
serving a critical role in mediating this process (Robertson, 2003) as damage to PPC can result in feature binding deficits
(Cohen and Rafal, 1991; Friedman-Hill et al., 1995). However, it is unclear how
such a code would be generated and read out, and the wiring patterns and temporal
firing precision of neurons between brain regions may be insufficient to implement
this code (Shadlen and Movshon, 1999).
Nevertheless, binding through a shared location via a neural mechanism other than
synchrony is still possible. Every region we examined was either defined through
retinotopic mapping or plausibly overlaps with a region that exhibits retinotopy
(e.g., the posterior color patches overlap with V4, and the central color patches
potentially overlap with retinotopic regions VO1 and/or VO2; see Brewer et al., 2005; Larsson and Heeger, 2006; Wandell and
Winawer, 2011). The co-existence of color and form representation,
together with the presence of a detailed spatial map, could facilitate a binding by
location mechanism at the local level without invoking long-range couplings between
brain regions through neural synchrony, thereby serving as a potential binding
mechanism (see also Di Lollo, 2012). How
should we then bridge our results with the documented role of parietal cortex in
binding? While past accounts posit that parietal cortex plays a purely spatial role
in linking different features (e.g., Cohen and
Rafal, 1991; Friedman-Hill et al.,
1995), more recent accounts emphasize its role in the direct encoding and
maintenance of task-relevant visual information (Bettencourt and Xu, 2016; Vaziri-Pashkam
and Xu, 2017; Xu, 2017; Xu, 2018a, 2018b). It is possible that the commingling of color and form
information on spatially organized ventral stream cortical maps serves to implicitly
define the binding of features, but that parietal cortex must then explicitly
extract these bindings for conscious perception and task-relevant processing. At
minimum, the present study charts the anatomical layout and coding scheme of the
ventral stream feature representations over which any putative parietal mechanism
involved in feature binding might operate.

To conclude, our comprehensive approach illuminates the overall architecture
of color and form processing in the human brain. Color and form information was not
segregated into distinct anatomical regions defined by their univariate
selectivity to either feature, but instead was generally co-localized in the same
brain regions in a biased distributed manner throughout the ventral visual
processing pathway, with decoding from color- and shape-selective regions largely
consistent with their univariate preferences. Convergent evidence from three
analyses and two independent data sets further shows that the joint coding of color
and form within a region is overwhelmingly additive, with an additional (and
relatively small) interactive component present in a subset of cases, reliably found
only for the joint coding of color with simple form features in early visual cortex.
Thus, the predominant relationship between color and form processing in the human
ventral visual hierarchy appears to be one of anatomical coexistence but mostly
representational independence.

Supplemental Figure 1. Ventral view of brain showing
color-sensitive patches for all participants; posterior color patches are
shown in red, central color patches are shown in green, anterior color
patches are shown in blue, and retinotopic V4 is shown as a black
outline.
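
As an illustration of the cross-decoding logic discussed above, the following is a minimal sketch on simulated data, not the analysis pipeline used in this study. It assumes a toy voxel population whose patterns are the sum of a color term, a form term, and an optional color-by-form interaction term, and it uses a simple nearest-centroid classifier as a stand-in for whatever classifier an actual analysis would use. All function names, amplitudes, and trial counts are hypothetical. A color classifier is trained on trials with one form value and tested on trials with the other: under a purely additive code, cross-decoding should remain high, whereas a sufficiently strong interaction term should produce a cross-decoding drop.

```python
import numpy as np

def make_axes(rng, n_voxels):
    """Three orthonormal pattern directions (color, form, interaction). Hypothetical."""
    q, _ = np.linalg.qr(rng.standard_normal((n_voxels, 3)))
    return q[:, 0], q[:, 1], q[:, 2]

def simulate(rng, axes, n_trials, color_amp, form_amp, inter_amp, noise_sd=1.0):
    """Simulate trials for a 2 (color) x 2 (form) design.

    Each condition mean is c*color + f*form + c*f*interaction, with c, f in {-1, +1}.
    inter_amp = 0 gives a purely additive (independent) code.
    """
    color_axis, form_axis, inter_axis = axes
    patterns, color, form = [], [], []
    for c in (-1, 1):
        for f in (-1, 1):
            mean = (c * color_amp * color_axis
                    + f * form_amp * form_axis
                    + c * f * inter_amp * inter_axis)
            noise = noise_sd * rng.standard_normal((n_trials, mean.size))
            patterns.append(mean + noise)
            color += [c] * n_trials
            form += [f] * n_trials
    return np.vstack(patterns), np.array(color), np.array(form)

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Assign each test pattern to the class with the closest training centroid."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in labels])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = labels[dists.argmin(axis=1)]
    return float((pred == y_test).mean())

def color_cross_decoding(X, color, form):
    """Train a color classifier on one form value, test it on the other."""
    train, test = form == -1, form == 1
    return nearest_centroid_accuracy(X[train], color[train], X[test], color[test])

rng = np.random.default_rng(0)
axes = make_axes(rng, n_voxels=300)

# Purely additive code: no interaction term.
X_add, c_add, f_add = simulate(rng, axes, 40, color_amp=5.0, form_amp=5.0, inter_amp=0.0)
# Strongly interactive code: the interaction term dominates the color term.
X_int, c_int, f_int = simulate(rng, axes, 40, color_amp=5.0, form_amp=5.0, inter_amp=10.0)

acc_additive = color_cross_decoding(X_add, c_add, f_add)
acc_interactive = color_cross_decoding(X_int, c_int, f_int)
print(f"cross-decoding, additive code:    {acc_additive:.2f}")
print(f"cross-decoding, interactive code: {acc_interactive:.2f}")
```

In this toy setting the interaction amplitude is deliberately extreme. As noted in the Discussion, a small interaction term can leave patterns on the correct side of the classification boundary, so successful cross-decoding alone does not rule out interactive coding.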