fMRI-based decoding of reward effects in binocular rivalry.

Gregor Wilbertz^1,2, Bianca M van Kemenade^1,2,3,4, Katharina Schmack^1,2, Philipp Sterzer^1,2,3,5.

Abstract

Binocular rivalry is a phenomenon where the simultaneous presentation of two different stimuli to the two eyes leads to alternating perception of the two stimuli. The temporary dominance of one stimulus over the other is influenced by several factors. Here, we studied the influence of reward on binocular rivalry dynamics and its neural representation in visual cortex. Orthogonal rotating grating stimuli were shown continuously, while monetary reward was given during the conscious perception of one stimulus but not the other. Periods of perceptual dominance were assessed both through participants' subjective report and objectively using functional magnetic resonance imaging and multi-voxel pattern analysis. Results did not confirm previous evidence for an effect of reward on perceptual dominance durations. Exploratory post-hoc analyses indicated that knowledge regarding both the reward contingency and the subjective nature of perceptual alternations may have interfered with potential reward effects on perceptual phase durations, suggesting a moderating role of meta-cognitive awareness in reward-based perceptual inference. Future studies of top-down influences on bistable perception should carefully consider the methodological challenges related to meta-cognitive awareness.

Keywords: MVPA; awareness; binocular rivalry; fMRI; reward

Year: 2017 PMID: 30042846 PMCID: PMC6007140 DOI: 10.1093/nc/nix013

Source DB: PubMed Journal: Neurosci Conscious ISSN: 2057-2107

Highlights

– MVPA and fMRI allow decoding of ongoing bistable perception.
– No evidence for a reward effect on reported or decoded perception.
– Conditioning of bistable perception might depend on meta-cognitive awareness.

Introduction

Binocular rivalry is a well-known phenomenon (e.g. Wheatstone 1838) that occurs when two incompatible stimuli are presented separately to the two eyes and consequently compete for awareness in an alternating fashion. It is well established that the dynamics of binocular rivalry, e.g. durations of perceptual dominance of a stimulus, are influenced by low level stimulus properties such as contrast and brightness (for review, see Blake and Logothetis 2002). There is also evidence that higher-level factors affect perceptual fluctuations during binocular rivalry and related bistable phenomena, including expectations (Sterzer ; Denison ; Schmack , 2016b), emotional relevance of a stimulus (Alpers ; Alpers and Gerdes 2007; Sterzer ; Gerdes and Alpers 2014; Schmack ) or motivational value (Balcetis ). Furthermore, voluntary and involuntary attention can increase the dominance of the attended stimulus or decrease the dominance of the unattended stimulus (Chong ; van Ee ; Hancock and Andrews 2007). It has long been hypothesized that the perceptual selection processes that underlie binocular rivalry are also subject to reinforcement learning (Bruner and Goodman 1947). However, evidence for top-down effects, specifically those regarding motivational factors such as reward, has been questioned due to methodological challenges related to the assessment of perceptual fluctuations on the basis of introspective reports, which are easily biased by subjective criteria (Orne 1962; Rees and Fishbein 1970; Erdelyi 1974; Firestone and Scholl 2016). Recent work overcame these challenges by using indirect measures of perceptual fluctuations in binocular rivalry. For example, rather than asking participants to report their perceptual states, a recent study used a probe detection task from which perceptual states could be inferred (Wilbertz ). 
This measure revealed a relative increase of rewarded percept durations in one group of participants and a relative decrease of punished percept durations in another group. Similarly, Marx and Einhauser (2015) reported differential reward and punishment effects using an eye-movement-based measure of perceptual dominance. In the present study, we sought to extend this research by using another indirect measure of perception based on brain activation. To this aim, we applied fMRI-based multivoxel pattern analysis (MVPA) to decode perceptual fluctuations from neural activation in the visual cortex – a method that has previously proven useful for the tracking of perception during binocular rivalry (Haynes and Rees 2005; Bertolino ) and other types of bistable perception (Brouwer and van Ee 2007; Schmack ; Brascamp ). We hypothesized that such decoding of perceptual fluctuations would reveal an increase in dominance durations for the perceptual state paired with reward. We further asked whether this putative effect of conditioning on perception would remain stable after termination of the conditioning procedure. Given the above-mentioned difficulties in separating genuine reward effects from other influences on perception, we not only built our test on an objective (and thus response-bias free) measure of conscious perception but also carefully monitored participants’ beliefs and insights into our experimental design as well as possible task strategies by rigorous debriefing.

Materials and Methods

Participants

Twenty-nine healthy volunteers participated in the study after giving written informed consent according to the ethical review committee of the German psychological association (Deutsche Gesellschaft für Psychologie). They were recruited from a larger pool of volunteers based on a separate behavioral screening session in order to ensure binocular rivalry with sufficiently long dominance durations (median > 3 s). A total of 7 out of 29 participants had to be excluded due to a) non-corrected vision deficits (i.e. myopia, n = 2), b) excessive head movement during fMRI scanning (>1.5 mm between two consecutive scans, n = 2), c) falling asleep (n = 2) or d) severe problems to fuse the two images (n = 1). The final sample consisted of N = 22 participants (16 female). They all were right-handed medical students, aged 18–30 years, free of any current mental or neurological disorder and with normal or corrected to normal vision. Participants were naïve with regard to binocular rivalry and had never participated in a similar experiment before.

Materials

Two red or blue rotating grating stimuli (similar to Haynes and Rees 2005; Wilbertz ) were used in a binocular rivalry paradigm, i.e. one stimulus was presented to the left eye, one to the right (see Fig. 1). Each stimulus consisted of a monochrome colored ring (0.6° and 3.0° visual angle eccentricity of inner and outer circle, respectively), spatially smoothed in front of a black background. The stimuli had black stripes (0.8° visual angle), orthogonally oriented between the left and right stimulus. Stimuli rotated at 360° per second around a central axis (0.2° visual angle, colored red or blue, respectively). The luminance of the red stimulus was fixed at 73.7 cd/m², whereas the blue one was individually adjusted for equivalent subjective brightness in a minimal flickering procedure prior to the experiment (M = 69.3, SD = 32.8 cd/m²; note, these measures refer to the colored part of the stimulus only, whereas brightness of the whole stimulus, including black stripes, could be approximated with half these values). A black divider separated the two stimuli, preventing the left eye from seeing the right stimulus and vice versa. Participants wore prism glasses to promote fusion of the stimuli (Schurger 2009). To enable complete fusion between the two eyes from the beginning, the horizontal distance between the stimuli was individually adjusted prior to their first presentation. Binocular fusion was further aided by a high-contrast square frame (6.4 × 6.4° visual angle) surrounding the stimuli. Finally, two colored dots (0.1° visual angle) in the upper left and upper right corner of each stimulus helped participants to remember the percept–response mapping throughout the experiment (see the “Procedure” section).

Figure 1.

Binocular rivalry stimuli and experimental design

Notes: Left and right eye were stimulated separately by two rotating gratings that competed for perceptual dominance continuously during 18 runs of 180 s each. Unbeknown to the participants, one of the two stimuli was randomly chosen before the experiment to be rewarded whenever it was perceptually dominant in the conditioning phase. During conditioning runs, an acoustic signal repeatedly indicated the delivery of €0.20 (on average every 3.3 s) as long as the respective stimulus dominated perception according to the participant’s report (see “Materials” and “Procedure” sections for more details).

Procedure

Outside the scanner, participants were instructed about the task as follows: “During 18 runs of 3 min each you will see a stimulus that alternates in color. Please press button A whenever the stimulus turns to red, and button B whenever the stimulus turns to blue (for mixed colors you have to decide which dominates and press for that color). Sometimes you will hear the sound of a falling coin, which indicates that your balance is increased by €0.20. The final balance will be paid to you at the end of the experiment (approximately between EUR 10 and 30).” Participants were kept naïve with regard to binocular rivalry and the aim of the reward manipulation in order to prevent voluntary control of percept alternation or dominance durations. Furthermore, participants were asked to fixate the center of the stimulus throughout the experiment. The stimulus–response mapping was swapped after each run and the meaning of each button (i.e. color) was presented continuously in the upper corners of the screen to aid memory. Each run consisted of 180 s continuous presentation of the rivaling stimuli. Unbeknown to the participants the experiment was divided into a baseline phase (runs 1–6), an instrumental conditioning phase (runs 7–12) and an extinction phase (runs 13–18, see Fig. 1). During the baseline phase, no reward was given. During the conditioning phase, reward was given contingent on reported perceptual dominance but only during the report of one of the two colors (i.e. red or blue, counterbalanced across participants as well as for left and right and for the a priori stronger or weaker eye). Application was based on the cumulative reported perceptual dominance of the rewarded stimulus, i.e. an online algorithm decided every 2.5 s (±0.5 jitter) whether a reward was delivered or not (50% probability). At the end of the conditioning phase, i.e. 
after run 12, participants were asked to answer three questions on a continuous analog scale regarding their hypotheses about associations with reward delivery, in order to assess subjective reward contingency awareness (“Was the reward given at random?”, “Did you notice any association with color?”, “Guess which color was more associated with reward?”). During the extinction phase, no reward was given. A functional localizer run (180 s) at the end of the task comprised six alternating blocks of either two red or two blue stimuli shown simultaneously to both eyes (i.e. no rivalry) for 15 s (stimulus-on), followed by 15 s with only the fixation cross (stimulus-off) each. After the experiment, participants were debriefed about their hypotheses regarding the aim of the experiment, the origin of color changes, the reward manipulation as well as possible strategies and responses to the reward. In order to assess the frequency of eye blinks as a potentially confounding factor, eye tracking of the left eye was performed throughout the whole experiment in a subgroup of n = 9 participants using an MRI-compatible high-resolution video eye tracker (iView X™ MRI 50 Hz, SensoMotoric Instruments, Teltow, Germany; see Supplementary Material for more details and results).
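The reward schedule described above (a decision roughly every 2.5 s of cumulative rewarded-percept dominance, each with a 50% chance of paying €0.20) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name, the uniform jitter distribution and the seeding are hypothetical choices, not details taken from the authors' online algorithm.

```python
import random


def simulate_rewards(rewarded_dominance_s, interval_s=2.5, jitter_s=0.5,
                     p_reward=0.5, amount_eur=0.20, seed=0):
    """Simulate reward decisions over cumulative dominance time.

    rewarded_dominance_s: total time (s) the rewarded percept was reported
    as dominant. Decision points are placed on this cumulative clock.
    ASSUMPTION: jitter is drawn uniformly from [-jitter_s, +jitter_s].
    """
    rng = random.Random(seed)
    decision_times = []
    t = 0.0
    while True:
        # Next decision point: 2.5 s plus or minus up to 0.5 s of jitter.
        t += interval_s + rng.uniform(-jitter_s, jitter_s)
        if t > rewarded_dominance_s:
            break
        decision_times.append(t)
    # Each decision point delivers a reward with probability p_reward.
    reward_times = [t for t in decision_times if rng.random() < p_reward]
    payout_eur = amount_eur * len(reward_times)
    return decision_times, reward_times, payout_eur


if __name__ == "__main__":
    decisions, rewards, payout = simulate_rewards(90.0)
    print(len(decisions), "decision points,", len(rewards), "rewards,",
          round(payout, 2), "EUR")
```

Placing the decision points on cumulative dominance time rather than on wall-clock time is one reading of "based on the cumulative reported perceptual dominance"; other readings are possible.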

fMRI data acquisition

Imaging was performed on a 3 Tesla Siemens Trio MR scanner (Siemens AG, Erlangen, Germany) with a standard 12-channel head coil. Ninety-three functional scans per run were acquired using a blood-oxygen-level-dependent (BOLD) sensitive T2*-gradient echo planar imaging (EPI) sequence [TR = 2 s, TE = 30 ms, flip angle = 78°, 33 axial slices (descending) with 3 mm thickness, field of view (FOV) = 192 mm, spatial resolution = 3 × 3 mm]. Structural images were acquired using a standard T1-weighted pulse sequence [TR = 1.90 s, TE = 2.52 ms, flip angle = 9°, FOV = 256 × 256 × 192 mm, spatial resolution = 1 × 1 × 1 mm].

Analysis

Functional images were corrected for movement artifacts applying the realignment procedure implemented in SPM8 (Wellcome Department of Cognitive Neurology, London, UK, http://www.fil.ion.ucl.ac.uk/spm). Neural activity during the functional localizer run was modeled in a first-level general linear model with one regressor for stimulus-on convolved with the hemodynamic response function (HRF). T-values of the stimulus-on contrast were used to select voxels within visual cortex (V1–V5, as defined in SPM’s anatomy toolbox) for the subsequent MVPA (see Fig. 2). The number n of selected voxels among the most significant voxels in this contrast could vary between 100 and 1000 in steps of 100 and was one of three optimized parameters in a nested cross-validation procedure (see below). Raw values for these voxels were extracted from realigned EPI images from the main experiment, i.e. for all assessed 93 time points, normalized to [−1, 1], and linearly detrended. Each time point was then labeled −1 or 1 for percept A or B, respectively, applying a boxcar function to the behavioral reports of perceptual changes. Time points for which no reported percept was available or for which the behavioral report did not follow a clear alternating perceptual time-course (i.e. if the same button was pressed twice in a row although participants should only report perceptual changes using two different buttons) were not used for training of the classifier and removed from the resulting matrix. In order to account for the latency of the HRF, which can differ substantially between subjects (Handwerker ; Steffener ) as well as reaction time, we optimized the degree to which this boxcar function was shifted in time (15 possible values in steps of 0.5 s between 0 and 7 s, see below) as a second optimized parameter t. Time points without a shift-adjusted stimulation at the beginning or end of each run were removed, resulting in k valid time points. 
Such aligned matrices of n × k voxel values (features) and 1 × k labels were entered into a support vector machine (SVM, LIBSVM; http://www.csie.ntu.edu.tw/~cjlin/libsvm). The SVM consisted of a linear kernel, two adjusted weight parameters w1 and w2 (in order to account for unbalanced training data sets) as well as an optimized cost parameter c (which could take 16 values between 2⁻¹⁰ and 2⁵ and was the third optimized parameter). The resulting model was used to predict the corresponding 1 × k* labels for independent data sets of n × k* voxel values. Decoding accuracy was calculated as the mean of two separate accuracies for label −1 (number of corresponding labels −1 in participant’s report and SVM predicted labels divided by the total number of labels −1 in the participant’s report) and label 1 (number of corresponding labels 1 in participant’s report and SVM predicted labels divided by the total number of labels 1 in the participant’s report). Parameters n (number of features, i.e. voxels), t (temporal shift between behavioral and neural data) and c (SVM cost parameter) were optimized in a leave-one-out nested cross-validation procedure: Labels of baseline run 1 were predicted based on a model which had been trained on the remaining five baseline runs 2–6 and used parameters n, t and c that had been optimized within these five training runs. To this end, five subsets of four runs each were used to train a model separately for each possible combination of n, t and c (10 × 15 × 16 = 2400 combinations), and tested on the fifth remaining run. The resulting 5 × 2400 accuracies were evaluated for the best combination of n, t and c which was then used for training on runs 2–6 and prediction of run 1. The whole procedure was repeated for prediction of the other baseline runs 2–6. 
Finally, in order to predict labels of conditioning and extinction runs (7–18), one model was trained on all six baseline runs using parameters n, t and c which had been optimized on six subsets of five training runs each. Note that the critical analyses for assessing the hypothesized reward effect were those that were performed on data from the conditioning and extinction runs, which were independent from the baseline data used for optimization of the SVM. The size of training sets during baseline decoding ranged on average from 330 to 350, the size of training sets during decoding of conditioning and extinction runs ranged on average from 418 to 435 samples (also see Supplementary Fig. S1 in the Supplementary Material for a detailed illustration of data processing steps).
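Some bookkeeping steps of the decoding procedure described above can be sketched in plain Python: the 2400-point parameter grid, the shift-adjusted labeling of volumes, the class-balanced accuracy, and the inner leave-one-run-out folds. This is an illustrative sketch rather than the authors' code. All function and variable names are hypothetical, the cost values are assumed to be the 16 powers of two from 2^-10 to 2^5, and volumes are assumed to be acquired at multiples of the TR. The SVM training itself (LIBSVM in the original) is deliberately left out.

```python
from itertools import product

# Candidate values for the three optimized parameters.
N_VOXELS = list(range(100, 1001, 100))       # n: 100, 200, ..., 1000 (10 values)
SHIFTS_S = [0.5 * i for i in range(15)]      # t: 0, 0.5, ..., 7.0 s (15 values)
COSTS = [2.0 ** e for e in range(-10, 6)]    # c: assumed 2^-10 ... 2^5 (16 values)
GRID = list(product(N_VOXELS, SHIFTS_S, COSTS))  # 10 * 15 * 16 combinations


def label_volumes(presses, n_volumes=93, tr_s=2.0, shift_s=0.0):
    """Label each volume -1 or 1 from button presses, with a temporal shift.

    presses: list of (time_s, label) sorted by time, label in {-1, 1}.
    A volume acquired at v * tr_s is assumed to reflect the percept reported
    at v * tr_s - shift_s. Volumes with no preceding report get None and
    would be dropped before training.
    """
    labels = []
    for v in range(n_volumes):
        t = v * tr_s - shift_s
        current = None
        for press_time, label in presses:
            if press_time <= t:
                current = label
            else:
                break
        labels.append(current)
    return labels


def balanced_accuracy(reported, predicted):
    """Mean of the per-class accuracies for labels -1 and 1."""
    per_class = []
    for label in (-1, 1):
        idx = [i for i, r in enumerate(reported) if r == label]
        if not idx:
            raise ValueError("label %d never reported" % label)
        hits = sum(1 for i in idx if predicted[i] == label)
        per_class.append(hits / len(idx))
    return sum(per_class) / 2


def inner_folds(training_runs):
    """Leave-one-run-out splits within the training runs."""
    return [([r for r in training_runs if r != held_out], held_out)
            for held_out in training_runs]


if __name__ == "__main__":
    print(len(GRID))                    # size of the parameter grid
    print(inner_folds([2, 3, 4, 5, 6])[0])
```

Averaging the two per-class accuracies keeps chance level at 50% even when one percept dominates for longer than the other, which is presumably why the text defines accuracy this way.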

Figure 2.

Multi-voxel pattern analysis (MVPA) procedure of percept decoding.

Notes: Voxels were selected within occipital cortex (V1–V5) based on a separate localizer run. From these voxels, raw values were extracted for each volume acquired during fMRI and labeled according to the current percept, i.e. whether the percept had been reported as red or blue. Voxel raw values and labels were used by a support vector machine to learn to predict corresponding percepts for independent data. The resulting decoded percepts were used to calculate perceptual dominance durations for each of the two stimuli (see the “Analysis” section in the main text and Supplementary Fig. S1 in the Supplementary Material for more details).

To test for an enhancing “acute effect of reward” on percept duration, we compared rewarded with non-rewarded (neutral) percept durations during the conditioning phase. Because of potential baseline differences in the two percepts, rewarded and neutral percept durations were corrected by subtracting their corresponding baseline durations. Moreover, phase durations were normalized, i.e. divided by the participant’s mean phase durations from baseline runs (see Fig. 3 for a distribution of phase durations). The resulting effect of interest can be expressed by the following formula:
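The formula itself is not reproduced in this version of the text. Based only on the verbal description above (baseline correction of each percept, then normalization by the participant's mean baseline phase duration), one consistent way to write the acute effect is sketched below. The symbols are notation introduced here for illustration, not necessarily the authors': D denotes a mean dominance duration, the subscript the percept, the superscript the experimental phase, and the denominator the participant's mean phase duration across baseline runs.

```latex
\[
\text{acute reward effect} =
\frac{\left(D_{\text{rewarded}}^{\text{cond}} - D_{\text{rewarded}}^{\text{base}}\right)
    - \left(D_{\text{neutral}}^{\text{cond}} - D_{\text{neutral}}^{\text{base}}\right)}
     {\bar{D}^{\text{base}}}
\]
```

Under this reading, a positive value means that the rewarded percept lengthened more, relative to its own baseline, than the neutral percept did.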

Figure 3.

Distributions of pooled normalized dominance durations derived from direct report (left) and MVPA decoding (right).

Notes: Both measures reveal skewed distributions that are well fitted by gamma functions. Shape and scale parameters differed significantly between the two measures (both P < 0.001). Note that normalization of phase duration included division of individual phase durations by the participant’s mean phase duration.

With regard to a secondary hypothesis, changes in percept durations were also analyzed regarding a potential “long-term reward effect:”

Analyses were primarily based on phase durations derived from MVPA-decoded neural activity, and secondarily on directly reported phase duration data. Before testing for significant effects, data were controlled for potential outliers by applying winsorizing (replacement of extreme values, defined as values outside the range M ± 2 × SD, by M + 2 × SD or M − 2 × SD, respectively). Because of the directed hypotheses, acute and long-term reward effects were tested for significance in a one-sided t-test (P < 0.05). Additionally, Bayesian statistics were calculated for these one-sided effects as implemented in the BayesFactor package in R (Morey and Rouder 2015; R Development Core Team 2017), using a Cauchy prior width of 0.707 (Rouder ). In order to further interpret the evidence for or against our hypothesis of a positive reward effect on dominance duration, we considered Bayes factors >3 as moderate evidence for our hypothesis (H1), Bayes factors <1/3 as evidence for the null hypothesis (H0), and Bayes factors between 1/3 and 3 as ambiguous evidence (Jeffreys 1961).

Participants’ responses given in the debriefing were recorded and evaluated by two independent raters to determine whether the participant had insight into the subjective nature of binocular rivalry (category 1) or not (category 2). A coding scheme was used that required raters to consider all recorded comments of the participant and to assign category 1 (aware) also if participants were unsure about the subjective nature of binocular rivalry (see the Supplementary Material for more details on the coding scheme). Raters agreed on the binary classification of all 22 participants (100%).
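The outlier handling and the Bayes factor thresholds described above are simple enough to sketch in a few lines. The following is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and the Bayes factors themselves would still come from a dedicated package such as BayesFactor in R.

```python
from math import sqrt
from statistics import mean, stdev


def winsorize(values, n_sd=2.0):
    """Replace values outside M +/- n_sd * SD by the nearest bound.

    ASSUMPTION: SD is the sample standard deviation (n - 1 denominator).
    """
    m, s = mean(values), stdev(values)
    lower, upper = m - n_sd * s, m + n_sd * s
    return [min(max(v, lower), upper) for v in values]


def one_sample_t(values, mu=0.0):
    """t statistic of a one-sample t-test against mu (df = n - 1)."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n))


def classify_bf10(bf10):
    """Map a Bayes factor BF10 onto the three categories used in the text."""
    if bf10 > 3:
        return "moderate evidence for H1"
    if bf10 < 1 / 3:
        return "evidence for H0"
    return "ambiguous evidence"


if __name__ == "__main__":
    effects = [0.0] * 9 + [10.0]           # made-up values with one outlier
    print(winsorize(effects))
    print(classify_bf10(1 / 4.96))         # a BF01 of 4.96 expressed as BF10
```

Because BF01 is the reciprocal of BF10, a reported BF01 above 3 falls into the "evidence for H0" category of this classification.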

Results

Average reported percept durations during baseline runs were 5.53 s (SD = 1.39, range = 3.33–8.27) and did not differ between left and right stimulus (t[21] = −0.31, P = 0.757), red and blue (t[21] = 1.67, P = 0.110) or subsequently rewarded and non-rewarded percept (t[21] = 0.34, P = 0.739). MVPA decoding of continuous percepts during baseline runs was correct in 59.38% of time points on average (SD = 5.08, range = 53.16–74.66, t[21] = 8.66, P < 0.001); individual binomial tests revealed above-chance decoding (Ps < 0.05, one-sided) in all but one participant [who had a baseline decoding accuracy of 53.16% which was only marginally above chance (P = 0.064) – post-hoc exclusion of this participant did not affect results; if anything, it slightly increased the significance of main results]. It has to be noted that – though in a nested cross-validation scheme – baseline runs were also used for the individual optimization of different parameters (including number of voxels, temporal shift of BOLD data and a cost parameter in the SVM, see the “Analysis” section). Nevertheless, the average accuracy in conditioning runs was not significantly different from the accuracy in baseline runs (M = −0.90%, SD = 3.53, t[21] = −1.20, P = 0.245, two-sided). [Note, there was a statistical trend for moderation of this decline by insight into the subjective nature of binocular rivalry (t[20] = 1.98, P = 0.062), with a significant decline in decoding accuracy only among participants who were aware of the subjective nature of perceived color changes (M = −2.30, SD = 2.53, t[10] = −3.02, P = 0.013) but not unaware participants (M = 0.49, SD = 3.94, t[10] = 0.42, P = 0.687; see below for further exploration of this factor).] During extinction runs, decoding accuracy showed a trend-wise decrease compared with baseline (M = −1.97%, SD = 4.96, t[21] = −1.86, P = 0.077, two-sided). 
Importantly, however, absolute decoding accuracy was still above chance both during conditioning and extinction (conditioning: M = 58.47, SD = 6.12, range = 47.60–73.99, t[21] = 6.50, P < 0.001; extinction: M = 57.41, SD = 5.84, range = 48.98–66.36, t[21] = 5.95, P < 0.001; see Supplementary Table S2 in the Supplementary Material for more details). Critically, we based our analyses on MVPA-decoded mean percept durations derived from a classifier that was trained exclusively on baseline runs, in order to rely on a bias-free proxy of perceptual dominance. Using these MVPA-decoded mean dominance periods revealed no significant differences between the rewarded and non-rewarded stimulus (no acute reward effect: t[21] = 0.84, P = 0.205, Cohen’s d = 0.18; no long-term reward effect: t[21] = −0.14, P = 0.554, Cohen’s d = −0.03; see Figs 4A and 5A). Similarly, reported percept durations did not show an acute reward effect during the conditioning phase either (t[21] = 1.04, P = 0.154, Cohen’s d = 0.22). However, reported percept durations for the rewarded stimulus were increased during the extinction phase (long-term reward effect: t[21] = 1.85, P = 0.042, Cohen’s d = 0.39, see Fig. 4B).
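The individual binomial tests of decoding accuracy mentioned above can be illustrated with an exact one-sided test against a chance level of 50%. This is a sketch only: the function name is hypothetical, the number of time points in the usage example is invented, and treating class-balanced accuracy as a simple proportion of correct time points is a simplifying assumption.

```python
from math import comb


def binomial_p_one_sided(n_correct, n_total):
    """Exact P(X >= n_correct) for X ~ Binomial(n_total, 0.5).

    Integer arithmetic is used for the tail sum so that large n_total
    does not overflow floating point.
    """
    tail = sum(comb(n_total, k) for k in range(n_correct, n_total + 1))
    return tail / 2 ** n_total


if __name__ == "__main__":
    # Hypothetical participant: 297 of 500 time points decoded correctly.
    p = binomial_p_one_sided(297, 500)
    print("one-sided p =", p)
```

A participant would count as decoded above chance in this scheme whenever the returned value is below 0.05.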

Figure 4.

MVPA-decoded and directly reported mean dominance durations.

Notes: Displayed are normalized mean dominance durations for the non-rewarded (neutral) and the rewarded stimulus: before (baseline), during reward manipulation (conditioning) and after (extinction). The only significant effect is the long-term reward effect in directly reported percept durations. Error bars denote standard error of the mean (SEM).

Given the absence of evidence for an acute reward effect (based on both decoded and reported dominance durations) and for a long-term reward effect (based on decoded dominance durations), we used Bayes statistics to distinguish between the two possibilities that our data either provided evidence for H0 or no conclusive evidence to support H1 or H0 (Dienes 2014). The Bayes factors for acute reward effects (BF01 = 2.09 and 1.66 for MVPA-decoded and report-based data, respectively) marginally favored H0 over H1, but did not suggest any clear evidence in favor of either according to Jeffreys’ classification (Jeffreys 1961). Regarding long-term reward effects, MVPA-based Bayes factors suggested moderate evidence for H0 (BF01 = 4.96), whereas the report-based effect (which was significant according to conventional t-test, see above) revealed insufficient evidence (BF10 = 1.72). These results also hold for various priors (see Fig. 5B). For exploratory reasons, data were also analyzed for potential negative reward effects (i.e. decreases of the rewarded vs. the non-rewarded percepts), which could, however, be rejected based on moderate to strong evidence (BF01 = 7.26 and 8.52, for MVPA-decoded and reported acute reward effect, respectively, and BF01 = 4.20 and 10.27, for the corresponding long-term effects, respectively).

Figure 5.

Box plots of individual reward effect sizes (A) and sequential Bayes analyses (B).

Notes: In the Bayesian analyses, neither MVPA decoded, nor directly reported data provided evidence for an acute reward effect, or a long-term reward effect. However, evidence against the existence of such reward effects was also weak for all but the MVPA-based long-term effect, where the Bayes factor suggested moderate evidence for the H0. These results hold for various priors (as evident from the different curves).

Ratings immediately after the last conditioning run indicated that most participants were unsure about the “reward contingency”. When asked whether reward was delivered at random, the majority of subjects said yes [n = 15 (68.2%) chose various confidence levels of “yes”, n = 7 (31.8%) chose various levels of “no”]; however, visual analog rating data were not significantly different from zero (t[21] = 1.43, P = 0.166, two-sided; note, answers were given on a visual analog rating scale which allowed the participants to simultaneously specify their confidence about each answer: −100 very sure no, 100 very sure yes), indicating that participants were on average very uncertain about the randomness of reward. They also were overall indifferent when asked about an association between reward and color in general [n = 8 (36.4%) chose various confidence levels of “no”, 13 (59.1%) chose various confidence levels of “yes”, 1 (4.5%) could not decide for one of the two directions; t-test of visual analog rating data against zero: t[21] = −1.75, P = 0.095, two-sided]. However, when asked specifically to indicate whether one of the two colors had a higher probability for reward, participants overall tended toward the correct color [n = 11 (50.0%) chose the rewarded color, 8 (36.4%) were indifferent, 3 (13.6%) chose the non-rewarded color; t-test against zero: t[21] = 2.32, P = 0.030, two-sided]. Debriefing after completion of the experiment indicated that 11 out of 22 participants (50.0%) were still unaware regarding the “subjective nature” of perceived color and color changes during the experiment, i.e. they said they thought that “color changes were presented by the computer” and did not mention any subjective influence on perception or perceptual changes. In contrast, the other half of the sample (11 participants, 50.0%) said they had noticed at least some kind of subjectivity with regard to perception and/or perceptual changes, e.g. 
“eye blinks”, “eye blinks but not sure”, “focusing but not sure”, “hypothesized different input to the two eyes” (see the Supplementary Material for additional results of the debriefing interview on reported behavioral strategies during the task). Taken together, the results of the debriefing indicated substantial inter-individual differences in meta-cognitive awareness regarding both reward contingency and the subjective nature of binocular rivalry. Given our Bayesian analyses, which overall indicated inconclusive evidence regarding the absence or presence of a reward effect on binocular rivalry, we suspected that the differences in meta-cognitive awareness may have been a major source of variability that may have contributed to this null-effect. Therefore, we performed additional exploratory analyses to assess the influence of reward contingency awareness and insight into the subjective nature of binocular rivalry on the reward effect. Including “reward contingency awareness” as a between-subject variable in a 3 × 2 × 2 ANOVA [with either decoded or reported dominance durations as dependent variable and the factors “time” (baseline, conditioning, extinction), “percept” (rewarded vs. neutral), and “contigency awareness” (unaware vs. aware)] revealed a significant three-way interaction effect for MVPA-decoded data (F[2,40] = 4.38, P = 0.019) but not reported data (F[2,40] = 2.30, P = 0.113). Subgroup analyses showed that significant acute and long-term reward effects were only found for participants who were considered aware of the reward contingency (based on MVPA-decoded data, t[10] = 2.27, P = 0.023, and t[10] = 1.85, P = 0.047, respectively, see Fig. 6B) but not for unaware participants (all P > 0.789, see Fig. 6A; note that effects in the unaware subgroup suggest a numerical decrease of rewarded percepts, which, however, was not significant, all P > 0.118). 
Including “insight into the subjective nature of binocular rivalry” as a between-subject factor in the ANOVA revealed no moderation effect for MVPA-decoded data (F[2,40] = 0.25, P = 0.780), but a significant three-way interaction effect for reported data (F[2,40] = 7.78, P = 0.001). Participants who were still unaware of the subjective nature exhibited significant acute and long-term reward effects (based on reported percepts, t[10] = 1.84, P = 0.048, and t[10] = 3.92, P = 0.001, respectively, see Fig. 6C), but aware participants did not (all P > 0.579).
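
As a rough illustration of the descriptive statistics reported in this section, the following sketch recomputes the percentages from the reported counts (N = 22) and shows how a one-sample t statistic against zero is formed for visual analog ratings. The counts are taken from the text; the rating values, the function names and the dictionary keys are hypothetical and are not the study's data or code.

```python
import math
import statistics


def percent(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one-sample t-test of the mean against mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Counts as reported in the text (N = 22 participants).
N = 22
reported_counts = {
    "reward random: yes": 15,
    "reward random: no": 7,
    "reward-color association: no": 8,
    "reward-color association: yes": 13,
    "reward-color association: undecided": 1,
    "chose rewarded color": 11,
    "indifferent": 8,
    "chose non-rewarded color": 3,
}
percentages = {k: percent(v, N) for k, v in reported_counts.items()}

# HYPOTHETICAL ratings on the -100 (very sure no) .. 100 (very sure yes)
# scale; invented for illustration only, not the study's data.
hypothetical_ratings = [40, -20, 10, 60, -50, 30, 0, 20, -10, 70, 15,
                        -30, 25, 5, 45, -60, 35, 10, -5, 50, -15, 20]
t_value, df = one_sample_t(hypothetical_ratings)

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"t[{df}] = {t_value:.2f} (hypothetical ratings)")
```

With 22 ratings the test has 21 degrees of freedom, matching the t[21] notation used above. Converting t to a P value requires the t distribution, which is left to a statistics package.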

Figure 6.

Exploratory post-hoc analyses of subgroups of participants.

Notes: MVPA-decoded reward effects differed between participants who were aware of reward contingency (B, showing a significant increase of the rewarded percepts) and those who were not (A, showing an opposite pattern, which was not significant, however). Report-based reward effects differed between participants who were still unaware of the subjective nature of binocular rivalry at the end of the experiment (C, showing significant increase of the rewarded percepts) and those who were aware (D, showing no significant change). Displayed are normalized mean dominance durations for rewarded and non-rewarded percepts, error bars denote standard error of the mean (SEM).


Discussion

We investigated the effect of monetary reward on neural activity patterns associated with alternating perception during binocular rivalry. More specifically, we trained a classifier to distinguish between patterns of activation in visual cortex before administering reward, hence in the absence of any bias toward a later rewarded or non-rewarded percept. The same unbiased classifier was then used to decode conscious perception during the delivery of monetary reward. Using this bias-free and objective measure of perceptual fluctuations during binocular rivalry, we found no evidence for any effect of reward on perceptual dominance durations. With the exception of an MVPA-decoded long-term effect (for which moderate evidence for the null hypothesis was found; see below), Bayesian statistics suggested no clear evidence for either H1 or H0, which means that neither the presence nor the absence of reward effects can be firmly concluded from the present data. The same conclusion holds for directly reported perceptual dominance durations. The failure to show an immediate enhancing effect of reward on binocular rivalry dominance durations is at odds with two prior studies (Wilbertz ; Marx and Einhäuser 2015) and allows several possible interpretations. First, it is possible that positive reward effects found in previous studies depended on specific aspects of these experiments but do not generalize to other conditions. For instance, it has been argued that positive findings in prior studies (i.e. increase of rewarded percepts) could have derived from “pre-perceptual” effects like attention, intentional strategies, eye blinks, etc. rather than direct effects of reward on perception (Masrour ). 
Attentional confounds were well controlled using an additional experimental attention modulation in the study by Marx and Einhäuser (2015), but intentional strategies, though speculative, might indeed have been an issue in that study because participants were not naïve regarding the subjective nature of binocular rivalry. In contrast, increases of rewarded percepts in our own previous study (Wilbertz ) might have been facilitated by simultaneous attention toward the rewarded stimulus, whereas intentional strategies were unlikely as a confounding factor because participants were kept naïve throughout the whole experiment using an orthogonal probe detection task. In principle, it is thus possible that different confounding factors contributed to the positive findings in previous studies. The particular design of the present study – direct report of perception in combination with naïve participants – might have reduced the influence of such confounding factors and hence diminished the reward effect. The potential relevance of meta-cognitive awareness in the study of bistable perception is discussed below. Second, the null-effect in the present study may be explained by a high degree of unexplained variance in the data. This interpretation is supported by the fact that three out of four main analyses revealed ambiguous, i.e. inconclusive evidence according to our Bayesian analyses (favoring neither H0 nor H1 with clear evidence). Such unexplained variance may have been due to general (reward-independent) sources of variability in perceptual dominance durations. Previous studies that used continuous presentation of binocular rivalry stimuli found an increase of mixed percepts over time (Klink ). This could have reduced the validity of assessed perceptual dominance durations in later blocks of our experiment (note that reward was administered during blocks 7–12). 
Less clear percepts or less valid reports could have increased the amount of unexplained within-subject variability in percept durations and thus affected the measurement of the reward effect. In addition, between-subject variability in the reward effect may have been due to inter-individual differences regarding reward sensitivity (Kim ), or conditionability (Schweckendiek ), but also controllability of perceptual dominance (Dowlati ). Based on rigorous debriefing, we identified two additional potentially relevant factors for the unexplained variance in our data and for reward effects on binocular rivalry in general. Exploratory post-hoc analyses identified reward contingency awareness and insight into the subjective nature of binocular rivalry as significant moderators of decoded or reported reward effects. Although these moderator analyses were purely exploratory and will have to be confirmed in future studies, it is possible that the lack of reward contingency awareness in some of the participants hindered effective conditioning (cf. Weidemann ). Moreover, attention may have been drawn away from the percept–reward relation and toward the perceptual changes and their subjective controllability, particularly in participants who noticed some control of their own over their perception in the course of the experiment (note that all participants were initially unfamiliar with the phenomenon of binocular rivalry). Both prior studies (Wilbertz ; Marx and Einhäuser 2015) differed markedly from the present study with regard to participants’ levels of insight into the subjective nature of binocular rivalry. Marx and Einhäuser (2015) included participants who were familiar with binocular rivalry experiments beforehand and hence possibly less distracted by the phenomenon itself. Moreover, these participants could effectively adapt their perception to the reward, given a high reward contingency awareness in this sample, too. 
In contrast, our own previous study (Wilbertz et al. 2014) achieved a low proportion of only 14% of participants gaining (partial) insight into the rivalry phenomenon, which was probably due to the orthogonal task that directed attention away from perceptual changes. It is possible that either full familiarity with bistability or the absence of any knowledge about it is necessary for an adaptive response to rewarded percepts (though on the basis of potentially different mechanisms, i.e. intentional control of perception vs. automatic adaptation, respectively). Future studies should investigate the effect of rivalry knowledge on the conditionability of bistable percepts. As an intermediate conclusion, researchers might want to fully instruct participants about the bistable phenomenon of their paradigm [in order to avoid distraction from the task and invalid task behavior, mediated by (reasonable) curiosity and fascination about experienced bistable perception], if naivety of participants regarding the subjective nature of perceptual changes cannot be guaranteed throughout the experiment. The absence of any reward effect on reported percept durations is the most likely reason for a similar null-effect on the MVPA-decoded percept durations. Nevertheless, the use of biological proxies for conscious perception is an important step in the investigation of top-down effects on bistable perception, as these methods [and even more so no-report paradigms (Tsuchiya )] seem appropriate to control the influence of “post-perceptual” factors (like reporting bias). Recently, it has been argued that many claims about top-down effects on perception actually fail to provide unequivocal evidence for true perceptual effects but rather could be explained by effects solely on the measurement of perception (Firestone and Scholl 2016). 
With regard to the present approach of fMRI-based decoded perception, one might be concerned that changes in decoded occipital activation patterns are related to the delivery of reward per se rather than reflecting its effect on perception. There is evidence that reward alters neural activity on different levels of visual processing, even as early as V1 (Arsenault ). In fact, it is likely that neural activation patterns change, e.g. from baseline to conditioning runs, due to reward (or sound) presentation alone. However, with regard to the direction of this potentially confounding effect, neural activation patterns would be expected to be distorted by reward (or sound presentation), i.e. to be less similar to those used for training of the classifier during baseline conditions. If the delivery of reward directly triggered a neural response in visual cortex, the classifier would probably recognize fewer of those perceptual dominance phases that were associated with reward and rather favor the non-rewarded percept prediction, which would result in an opposite reward effect, i.e. a decrease of rewarded percepts. In the present data this is clearly not the case, according to a secondary Bayesian analysis (indicating moderate evidence against a decrease of decoded rewarded percept durations). Thus, we believe that a “directly” reward-related signal in visual cortex is unlikely to play a role in the emergence of any increases in decoded rewarded percepts. One could also argue that an “indirect” effect of reward might confound the decoding by increasing the precision of those neural activation patterns that are associated with the rewarded percept and thus help the classifier to decode these (but not the non-rewarded percept phases). There is evidence for modulation of stimulus-related activity in primary visual cortex by reward (Shuler and Bear 2006; Serences 2008). 
It is therefore possible that increased precision or decreased uncertainty (van Bergen ) might play a role in any observed predominance of MVPA-decoded rewarded percepts. A secondary question in this study dealt with the long-term effect of reward, which, however, revealed heterogeneous results. Whereas participants’ reports indicated a significant long-term increase of the rewarded percept in the conventional t-test, Bayesian analysis did not confirm this effect; moreover, a potential long-term reward effect was entirely absent in the MVPA-decoded percept durations. This might point to the possibility that the behavioral effect is confounded with non-perceptual variables like response bias or voluntary manipulation of perception – even more, since questions asked between conditioning and extinction phase might have stimulated corresponding hypotheses in the participants. It remains unclear whether MVPA-decoded data could not map this effect due to a drop in decoding accuracy or because behavioral data erroneously suggest a long-term effect which actually was caused by any missed confound. With regard to the hypothesis that (rewarding) consequences shape our perception (Wilbertz ) it is important to note that potential reward effects would be expected to generalize at least temporarily (i.e. should reinforce rewarded percepts also in the absence of reward). In order to investigate the temporal aspect of reward effects, future research could explicitly focus on interleaved test phases within the learning process, e.g. in unpaired trials of a partial reinforcement paradigm.
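
The decoding logic discussed in this section (train a classifier on baseline data acquired before any reward, then apply the same classifier to later runs and derive dominance durations from the decoded sequence) can be sketched as follows. This is a minimal illustration under stated assumptions: a nearest-centroid classifier stands in for whatever MVPA classifier was actually used, the voxel patterns are simulated, and the names `simulate_pattern`, `train_nearest_centroid`, `decode`, `dominance_durations` as well as the 2.0 s sampling interval are hypothetical.

```python
import random


def train_nearest_centroid(patterns, labels):
    """Return one mean pattern (centroid) per label."""
    grouped = {}
    for pattern, label in zip(patterns, labels):
        grouped.setdefault(label, []).append(pattern)
    return {
        label: [sum(col) / len(col) for col in zip(*group)]
        for label, group in grouped.items()
    }


def decode(model, pattern):
    """Assign the label whose centroid is closest (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sqdist(model[label], pattern))


def dominance_durations(decoded, seconds_per_sample):
    """Collapse a decoded label sequence into (label, duration) phases."""
    phases = []
    for label in decoded:
        if phases and phases[-1][0] == label:
            phases[-1][1] += seconds_per_sample
        else:
            phases.append([label, seconds_per_sample])
    return [(label, duration) for label, duration in phases]


def simulate_pattern(label, rng, n_voxels=20, noise=0.5):
    """HYPOTHETICAL voxel pattern: each percept drives half of the voxels."""
    half = n_voxels // 2
    signal = [1.0] * half + [0.0] * half
    if label == "B":
        signal.reverse()
    return [s + rng.gauss(0.0, noise) for s in signal]


rng = random.Random(0)

# Baseline: labelled data acquired before reward, used for training only.
baseline_labels = ["A", "B"] * 30
baseline = [simulate_pattern(lab, rng) for lab in baseline_labels]
model = train_nearest_centroid(baseline, baseline_labels)

# Conditioning: the classifier is applied unchanged to new data.
true_sequence = ["A"] * 4 + ["B"] * 3 + ["A"] * 5
conditioning = [simulate_pattern(lab, rng) for lab in true_sequence]
decoded = [decode(model, p) for p in conditioning]

# 2.0 s per sample is an assumed value, not taken from the study.
for label, duration in dominance_durations(decoded, seconds_per_sample=2.0):
    print(label, duration)
```

Because the classifier never sees data from the reward phase during training, any difference in decoded durations between rewarded and non-rewarded percepts cannot stem from a reward-related bias in the training labels, which is the point made above. It can still reflect changes in the patterns themselves, as discussed for the “direct” and “indirect” confounds.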

Conclusions

Using a task design that relied on introspective reports of perception, we were not able to replicate previous findings of reinforcement of perceptual dominance durations in binocular rivalry. Because our analyses did not provide clear evidence against this effect either, future studies are necessary to clarify the role of reward in bistable perception.

Funding

This work was supported by the German Research Foundation [http://www.dfg.de/en/ grant number 1430/7-1 to P.S.]; the Berlin School of Mind and Brain [http://www.mind-and-brain.de to B.M.v.K.] and the Stichting Dr Hendrik Muller’s Vaderlandsch Fonds [http://www.mullerfonds.nl]. K.S. is a participant in the Charité Clinical Scientist Program [https://clinical-scientist.charite.de] funded by the Charité Universitätsmedizin Berlin and the Berlin Institute of Health.

Supplementary Data

Supplementary data is available at NCONCC Journal online. Conflict of interest statement. None declared.

38 in total

1. Predicting Subjective Affective Salience from Cortical Responses to Invisible Object Stimuli.

Authors: Katharina Schmack; Julia Burk; John-Dylan Haynes; Philipp Sterzer
Journal: Cereb Cortex Date: 2015-08-01 Impact factor: 5.357

2. Here is looking at you: emotional faces predominate in binocular rivalry.

Authors: Georg W Alpers; Antje B M Gerdes
Journal: Emotion Date: 2007-08

3. Believing is seeing: expectations alter visual awareness.

Authors: Philipp Sterzer; Chris Frith; Predrag Petrovic
Journal: Curr Biol Date: 2008-08-26 Impact factor: 10.834

4. Value and need as organizing factors in perception.

Authors: J S BRUNER; C C GOODMAN
Journal: J Abnorm Psychol Date: 1947-01

5. Delusions and the role of beliefs in perceptual inference.

Authors: Katharina Schmack; Ana Gòmez-Carrillo de Castro; Marcus Rothkirch; Maria Sekutowicz; Hannes Rössler; John-Dylan Haynes; Andreas Heinz; Predrag Petrovic; Philipp Sterzer
Journal: J Neurosci Date: 2013-08-21 Impact factor: 6.167

6. Reward modulates perception in binocular rivalry.

Authors: Svenja Marx; Wolfgang Einhäuser
Journal: J Vis Date: 2015-01-14 Impact factor: 2.240

7. Predicting the stream of consciousness from activity in human visual cortex.

Authors: John-Dylan Haynes; Geraint Rees
Journal: Curr Biol Date: 2005-07-26 Impact factor: 10.834

8. Reinforcement of perceptual inference: reward and punishment alter conscious visual perception during binocular rivalry.

Authors: Gregor Wilbertz; Joanne van Slooten; Philipp Sterzer
Journal: Front Psychol Date: 2014-12-03