
The human prefrontal cortex mediates integration of potential causes behind observed outcomes.

Klaus Wunderlich, Ulrik R Beierholm, Peter Bossaerts, John P O'Doherty.

Abstract

Prefrontal cortex has long been implicated in tasks involving higher order inference in which decisions must be rendered, not only about which stimulus is currently rewarded, but also which stimulus dimensions are currently relevant. However, the precise computational mechanisms used to solve such tasks have remained unclear. We scanned human participants with functional MRI, while they performed a hierarchical intradimensional/extradimensional shift task to investigate what strategy subjects use while solving higher order decision problems. By using a computational model-based analysis, we found behavioral and neural evidence that humans solve such problems not by occasionally shifting focus from one to the other dimension, but by considering multiple explanations simultaneously. Activity in human prefrontal cortex was better accounted for by a model that integrates over all available evidences than by a model in which attention is selectively gated. Importantly, our model provides an explanation for how the brain determines integration weights, according to which it could distribute its attention. Our results demonstrate that, at the point of choice, the human brain and the prefrontal cortex in particular are capable of a weighted integration of information across multiple evidences.

Year: 2011; PMID: 21697443; PMCID: PMC3174823; DOI: 10.1152/jn.01051.2010

Source DB: PubMed; Journal: J Neurophysiol; ISSN: 0022-3077; Impact factor: 2.714

The human prefrontal cortex (PFC) has long been implicated in the ability to solve decision problems that contain a hierarchical dimensional structure, in which inferences must be rendered over different levels of a stimulus hierarchy (Koechlin et al. 2003). This type of decision problem is typified in the Wisconsin Card Sorting Task (Grant and Berg 1948) and its modern descendants, such as the intra-/extradimensional set shifting task (Downes et al. 1989). In such tasks, a participant is presented with stimuli that contain multiple dimensions, such as color and shape, and, at any one moment in time, one of these dimensions is causally linked to reward. However, within the “relevant” dimension, only one out of a set of possible within-dimension features is currently rewarded (e.g., if color is relevant, then either “red” or “green” might at a given point in time be rewarded). The decision problem faced by the participant is therefore twofold: to identify which stimulus dimension is currently relevant, and within a given dimension to identify which feature is currently rewarded. This task can become especially challenging if the relevant dimensions and features change over time, as the participant must constantly update his/her inferences at each level of the decision hierarchy. It is well established that lesions to the PFC impair the capacity to perform such hierarchical decision problems (Dias et al. 1996; Drewe 1974; Milner 1963; Robinson et al. 1980). However, the specific computations underlying the capacity of PFC to solve such decision problems are much less well understood. This kind of problem could be solved by one of two strategies. The first is an attention-gated strategy, in which the agent first identifies which dimension is most likely to be relevant at a given point in time. The agent then focuses exclusively on what is considered to be the relevant dimension, deciding separately which feature is likely to be rewarding. 
The advantage of such an attention-gating approach is that, by focusing only on the dimension for which there is the strongest evidence, there is no need to incorporate evidence for the features in the less compelling dimension in the decision process, thereby imposing less demand on limited cognitive resources. An alternative integration strategy is to simultaneously evaluate evidence from both levels of the hierarchy, weighted by how much this dimension is judged to be relevant. In other words, even if color is deemed very likely to be relevant, information about which feature could be correct if shape were the relevant dimension is also taken into account to some extent and used in the decision over which compound stimulus ultimately to select. This type of “integration” strategy typifies Bayesian analysis (Berger 1980). Here we aimed to address whether, during the performance of a hierarchical reversal learning task, the human brain follows an attention-gated approach to problem solving, or rather the integrative procedure, whereby less likely explanations for observed phenomena still influence decisions. Our task is a probabilistic variant of a Wisconsin card-sorting task in which each stimulus possessed attributes along two dimensions: color and motion. Within each dimension there were two features, so that color was red or green, while motion (indicated by moving dots) was either leftward or rightward. At any one point in time, one dimension e.g., color was “relevant”, while within this dimension one of the two features was correct (e.g., green), in that choice of a stimulus possessing that feature yielded monetary reward with a high probability and monetary loss with a low probability, while choice of the other stimulus yielded only a low probability of reward and loss with a high probability. 
We explicitly imposed a hierarchical structure between dimension and feature on this task by reversing the correct feature at a higher rate than the relevant dimension. Human subjects were scanned with functional MRI (fMRI) while they participated in this probabilistic hierarchical decision task. We then used a variety of computational models to account for subjects' behavioral performance on the task. After confirming that subjects used a hierarchical strategy rather than simple model-free learning of either stimulus or feature values, we compared two possible models: one that implemented an attention-gating strategy, and another that implemented an integration strategy. We then applied these models to the fMRI data to uncover brain regions in which activity at the time of decision-making reflected one or the other of these computational strategies. To discriminate which of these models was contributing to the decision process, we used the value signals estimated by each of the models and tested for regions of the PFC correlating with such signals. We focused in particular on the ventromedial PFC (vmPFC), as this region has previously been found to encode value signals at the time of decision making (Boorman et al. 2009; Daw et al. 2006; FitzGerald et al. 2009; Hampton et al. 2006; Hare et al. 2008; Knutson et al. 2005; Wunderlich et al. 2009). We compared and contrasted the capacity of the integration and attention-gating models to capture value-related computations in this region of PFC. We hypothesized that activity in vmPFC would be best captured by the value signal derived from the model that best explained participants' behavioral choice patterns.

MATERIALS AND METHODS

Subjects

Sixteen healthy subjects (drawn from the Caltech student population; 6 women; 18–28 yr old) with no history of neurological or psychiatric illness participated in the study. The study was approved by the Institutional Review Board of the California Institute of Technology, and all subjects provided informed written consent before their participation.

Task Description

The task is a variant of a hierarchical intra-/extradimensional (ID/ED) shift task. In any given trial, subjects were presented with a choice between two stimuli, each of which consists of a feature in the motion dimension, and one feature in the color dimension. The two features belonging to the same stimulus are shown horizontally next to each other, grouped together by a bone-shaped structure in the background (Fig. 1). The motion dimension (always presented on the left side of the stimulus) consisted of a moving dot sequence where the dots moved either to the left or the right; the color dimension could take on either a red or green color-filled rectangle. The features for the upper stimulus were assigned pseudorandomly in each trial, and converse features were assigned to the lower stimulus, i.e., if green and right motion were assigned to the upper stimulus, then the lower stimulus contained red and left motion. We implemented a constraint that identical color/motion pairings within the same stimulus did not occur more than two times in a row to avoid strings of trials in which subjects cannot associate the outcome unambiguously to a chosen motion or color feature. Subjects chose the upper or lower stimulus by pressing one of two distinct buttons on a button box with their right thumb.

Fig. 1.

Task and models. A: subjects chose one of two items, of which each had a color (red or green) and a motion (left- or rightwards moving dots) attribute. The features were randomly assigned to both items. Once the subject selected an item, a box was placed around the target and remained on the screen until 2 s after stimulus onset. After a 3-s delay, they either received a 25 cent reward or a subtraction of 25 cents from their payout. One feature was designated the correct feature, and the choice of the item carrying that feature led to a reward on 80% of the occasions and a loss 20% of the time. Consequently, by choosing this correct item subjects accumulated monetary gain. The other item was incorrect, and choosing it led to a reward 20% of the time and a loss 80% of the time, leading to a cumulative monetary loss. After subjects chose the correct item on three consecutive occasions, the contingencies reversed, with a probability of 50% in every consecutive trial. After two to four of such within-dimension reversals, the relevant dimension changed (extradimensional switch). The intertrial interval (ITI) was variable. B: hierarchical decision model based on attentional shifts. Stimulus outcome associations are learned for color (C) and motion direction (M). According to this model, subjects choose in a two-step process: they first form a hypothesis about which dimension is relevant (either C or M), and then base their reward expectation and choice exclusively on the information learnt about that dimension. C: in the integration model, the available information from both dimensions (C and M) is integrated as a weighted sum, and the decision is based on a linear combination of evidence from both dimensions. Subjects form a hypothesis about the likelihoods that each dimension is relevant, corresponding to weights (w) in the linear combination. Weights are updated on every trial. D: Bayes factors of the comparison of the integration vs. attention-gating model. 
The integration model fits better to subjects' behavior in every single subject, and, for 13 out of 16 subjects, the log Bayes factor indicated strong evidence in favor of the integration model. At any given time, only one stimulus dimension (color or motion) was causally linked to reinforcement, and subjects were rewarded at a higher rate if they selected the stimulus that contained the correct feature (either red or green for color and leftward or rightward for motion) based on the currently relevant dimension, i.e., one dimension was “relevant”, and within that dimension a particular feature was correct. For example, if “color” was the relevant dimension, then within color, “green” may have been correct. Choice of the stimulus that had the correct feature yielded monetary rewards on a probabilistic basis with 80% probability, whereas selection of the other stimulus yielded reward with only 20% probability. Rewarded trials yielded a prize of 25 cents, while the other trials resulted in a loss of 25 cents (there were no neutral outcomes). This trial outcome, i.e., whether subjects gained or lost money on this trial, was the only feedback provided. Note that, due to the probabilistic contingencies, subjects may occasionally see a loss outcome (−25 cents) after they chose correctly, or see a rewarding outcome (+25 cents) after they chose incorrectly. After subjects chose the stimulus with the correct feature three times in a row (indicating that they learned the relevant dimension and correct feature), there was a 50% probability in each further trial that the correct feature would switch. Furthermore, after a variable number of such within-category switches (1–4; uniformly distributed), the relevant dimension also switched. The adaptive switching rule used here and previously (Hampton et al. 
2006) served two purposes: it created a reasonable degree of unpredictability for the next switch and prevented feature switches from occurring before the subject detected which feature is correct. Such a situation is to be avoided, as it would no longer provide us with the desired dynamic range in certainty and value estimates in the imaging analysis. The total number of trials in the fMRI experiment varied across subjects due to these probabilistic switching rules and differences in how fast individual subjects learned the correct response. However, every subject experienced the same number of ID and ED switches. One experiment always contained 9 dimensional switches (5 motion and 5 color blocks) and 30 feature switches (uniformly distributed and pseudorandomly arranged). At the end of the experiment, subjects were paid a flat amount of $25 plus their accumulated earnings (in total $26–41, SD = 4.3). This design imposes a hierarchical structure on the task with feature reversals occurring on a faster timescale than the dimensional switches. As in many real-life problems, participants in our experiment have more than one theory (“which is the relevant dimension?”) to go by in any trial and corresponding characteristics (“which is the correct feature?”) on which to base their decisions between the upper or lower stimulus. Subjects were instructed that, at any given time, only one dimension was relevant for determining reward, and within that dimension one feature was correct. We told subjects that choice of the stimulus containing the correct feature would yield reward with a high probability and choice of the other stimulus with a much lower probability. Subjects were also provided with the information that the correct feature would change after a number of trials, and that the relevant dimension would change at a slower timescale than the correct feature, i.e., the problem was specifically posed as a probabilistic hierarchical reversal task. 
However, subjects were not given any further explicit information about the exact reward probabilities or about the mechanism of how changes in correct features and dimension were realized. It is also important to note that participants never got direct feedback about whether their theory was currently correct, but only whether their choice proved successful or not. Immediately preceding the fMRI scan, subjects underwent a series of practice blocks in which we familiarized them with the task by gradually increasing its complexity. This training consisted of three blocks: first, subjects worked on a simple reversal task with only one modality. Next, we introduced the second dimension, but indicated every dimensional switch by an auditory signal. Finally, subjects practiced the task without the help of the auditory signal. Subjects experienced four dimensional switches during this stage, which happened according to the same rules as during the following fMRI experiment. As at all times, reversals only progressed after subjects chose consistently the correct stimulus; this rule imposed a performance criterion before subjects progressed to the fMRI stage. The task was presented via back projection on a translucent screen, viewable through a head-coil mounted mirror.
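The reward contingencies and the adaptive switching rule described above can be illustrated with a small simulator. This is only a sketch under stated assumptions: the class name `ShiftTask`, its attributes, and the exact trial on which the 50% reversal check is applied are hypothetical choices made for illustration and are not taken from the authors' task code.

```python
import random

FEATURES = {"color": ("red", "green"), "motion": ("left", "right")}
P_REWARD = {True: 0.8, False: 0.2}  # reward probability for correct / incorrect choices
PAYOFF = 0.25                       # dollars gained or lost on each trial


class ShiftTask:
    """Hypothetical simulator of the contingencies and switching rule."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.dimension = self.rng.choice(sorted(FEATURES))
        self.correct = self.rng.choice(FEATURES[self.dimension])
        self.streak = 0                               # consecutive correct choices
        self.reversals_left = self.rng.randint(1, 4)  # feature reversals before a dimension switch
        self.n_feature_switches = 0
        self.n_dimension_switches = 0

    def step(self, chosen):
        """`chosen` maps each dimension to the chosen feature; returns the payoff."""
        is_correct = chosen[self.dimension] == self.correct
        won = self.rng.random() < P_REWARD[is_correct]
        self.streak = self.streak + 1 if is_correct else 0
        # Assumption: once three correct choices in a row are reached, each
        # trial carries a 50% chance that the contingencies change afterwards.
        if self.streak >= 3 and self.rng.random() < 0.5:
            self._switch()
        return PAYOFF if won else -PAYOFF

    def _switch(self):
        self.streak = 0
        if self.reversals_left > 0:
            # Within-dimension reversal: the other feature becomes correct.
            a, b = FEATURES[self.dimension]
            self.correct = b if self.correct == a else a
            self.reversals_left -= 1
            self.n_feature_switches += 1
        else:
            # Extradimensional switch: the other dimension becomes relevant.
            self.dimension = "motion" if self.dimension == "color" else "color"
            self.correct = self.rng.choice(FEATURES[self.dimension])
            self.reversals_left = self.rng.randint(1, 4)
            self.n_dimension_switches += 1
```

Because switches occur only after a run of correct choices, a simulated agent that never learns would never trigger a reversal, which mirrors the stated purpose of the adaptive rule.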

Algorithms

We compared a number of computational strategies for how subjects would solve this task and will focus our investigation on the comparison of two hierarchical strategies, an attention-gating vs. an integration strategy.

Attention-gating model.

The first model in this class, which we will call an attention-gating strategy, computes a decision in a two-step procedure by first evaluating which dimension is currently relevant, and then allocating attention to only that dimension, before subsequently working out which feature within the dimensional category is currently reinforced (Fig. 1). Dimension relevance is evaluated using a reinforcement learning (RL) mechanism that integrates over the previous rewards obtained when selecting that dimension (Sutton and Barto 1998). In decision theoretic terms, this type of strategy is called maximizing, because probabilistic information about which feature is currently correct is used to guide choice only over the dimension deemed currently relevant (and hence gated by selective attention). For instance, if the model decides color is relevant, then choice is computed by taking into account the probability that, within color, red is correct or that green is correct, but probabilistic information over the unselected dimension of motion is ignored. The attention-gating strategy relates to complexity-reducing heuristics, wherein choices are proposed to be taken using the best available evidence (Gigerenzer and Goldstein 1996).

Integration model.

While the attention-gating model maintains only information deemed relevant, an alternative approach is to distribute attention and fully integrate over all of the probabilistic information available to the individual at that moment in time and use this complete information to guide choice (Fig. 1). In other words, even if color is deemed highly likely to be the relevant dimension (i.e., color has a high probability), the model not only takes into account the probability that red or green is correct, but also uses the information it has from the less likely motion dimension about which movement direction might be correct, weighted by the low probability of motion being relevant. In decision theoretic terms, such an integration strategy is formally called marginalization and is one important step into the direction of the full Bayesian approach (see the supplemental materials; the online version of this article contains supplemental data). According to Bayesian theory (Berger 1980), an optimal estimate results from combining information across dimensions with evidence from past trials. The available information is integrated based on weights that are assigned to the dimension and correspond to the likelihood that the dimension is relevant (hereby no theory, even the least likely, is ever excluded). This issue of whether to commit to a theory is frequently encountered in the machine-learning domain, where it is well known that integrating over parameters is optimal (MacKay 1999). The difference between this integration model and a full Bayesian approach is that the latter requires a change in the updating scheme from prediction error based to likelihood ratio based (Yang and Shadlen 2007). We also implemented a full Bayesian model (supplemental methods), which, in a test against behavior, did not result in any significant improvement of model likelihoods compared with the integration model.

Hierarchical Model Implementations

For modeling the data, we assumed that subjects learned the relevant values assigned to the different features on the basis of trial-by-trial experience using prediction error-based updating (Sutton and Barto 1998). If stimulus s is selected on trial t, the values of its two features a (one each from the color and motion dimensions) are updated via a prediction error, δ(t), as follows: V_a(t + 1) = V_a(t) + α·δ(t), where α is a learning rate between 0 and 1. The prediction error δ(t) is calculated by comparing the actual reward received, r(t), with the reward that the subject expected to receive from that action in that trial; that is, δ(t) = r(t) − V_a(t). Specifically for this task, three variables had to be learned and updated in each trial: V_green, V_right, and W_color, keeping track, respectively, of the value of the green feature (vs. red feature), the rightward motion feature (vs. left feature), and the weight of the color vs. motion modality. We assume that V_red = −V_green, V_left = −V_right, and W_motion = −W_color. Hence, the updating can be done with the inverse prediction error, −δ(t), if the complementary action is chosen. In each trial, a subject chooses either UP or DOWN and thus selects a combination of one color feature (red or green) and one motion feature (left or right). The values of these “intermodality” choices are updated in each trial using the following RL scheme: δ_color(t) = r(t) − V_i(t), where i = red (green), if red (green) was chosen; δ_motion(t) = r(t) − V_j(t), where j = right (left), if right (left) was chosen. The dimensional weight is updated according to: W_color(t + 1) = W_color(t) + α_W·δ_W(t), with δ_W(t) = r(t)·[V_i(t) − V_j(t)] − W_color(t), where α_W is the learning rate for the dimensional weight. The dimensional weight is increased if the difference in expected value between the two modalities was larger than the expected dimensional weight. In terms of the typical RL model, the dimensional weight tracks the ability of one dimension to predict reward, relative to the other. 
Using these learned values we compared two ways to generate choices.
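The prediction-error updates described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' code: the class name `Learner`, the default learning rates, and the coding of reward as +1/−1 are assumptions. Only the values for green and rightward motion and the weight for color are stored; the complementary quantities are obtained by sign flips, as the antisymmetry assumption implies.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    v_green: float = 0.0  # value of green; the value of red is -v_green
    v_right: float = 0.0  # value of rightward motion; leftward is -v_right
    w_color: float = 0.0  # weight of color; the weight of motion is -w_color
    alpha: float = 0.3    # hypothetical learning rate for feature values
    alpha_w: float = 0.1  # hypothetical learning rate for the dimensional weight

    def update(self, color, motion, reward):
        """Update after choosing the stimulus with the given features.

        `reward` is assumed to be coded as +1 (gain) or -1 (loss).
        """
        sign_c = 1.0 if color == "green" else -1.0
        sign_m = 1.0 if motion == "right" else -1.0
        v_color = sign_c * self.v_green   # value of the chosen color feature
        v_motion = sign_m * self.v_right  # value of the chosen motion feature

        # All prediction errors use the values held before this trial's update.
        delta_color = reward - v_color
        delta_motion = reward - v_motion
        delta_w = reward * (v_color - v_motion) - self.w_color

        # A chosen complementary feature (red, left) flips the sign of the update.
        self.v_green += self.alpha * sign_c * delta_color
        self.v_right += self.alpha * sign_m * delta_motion
        self.w_color += self.alpha_w * delta_w
```

For example, after `Learner(alpha=0.5, alpha_w=0.5).update("green", "left", 1.0)`, the stored value for green rises to 0.5 and the stored value for rightward motion falls to −0.5, while the dimensional weight stays at 0 because both chosen features had equal value beforehand.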

Attention-gating.

Determine which dimension D has the highest expected dimensional weight, W_color or W_motion = −W_color. The decision whether the top or bottom stimulus is chosen is then based on that modality only, by choosing the stimulus that contains the feature with the higher value within dimension D. Given a choice between UP = {V_i, V_j} and DOWN = {−V_i, −V_j}, where i and j denote the color and motion features of the upper stimulus, calculate V_UP = V_i·H(W_color) + V_j·H(−W_color), where H is the Heaviside operator, H(+) = 1, H(−) = 0.
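A minimal sketch of the gated stimulus value follows. The function names are hypothetical, and the behavior at a dimensional weight of exactly zero (both gates closed) is an assumption, since the text defines the Heaviside operator only for positive and negative arguments.

```python
def heaviside(x):
    """H(+) = 1, H(-) = 0; returning 0 at x == 0 is an assumption."""
    return 1.0 if x > 0 else 0.0


def gated_value(v_color, v_motion, w_color):
    """Value of a stimulus under attention gating.

    v_color and v_motion are the values of the stimulus's color and motion
    features; only the dimension with the positive weight contributes.
    """
    return v_color * heaviside(w_color) + v_motion * heaviside(-w_color)
```

With `w_color = 0.3`, `gated_value(0.6, -0.2, 0.3)` returns the color feature value 0.6; with `w_color = -0.3`, it returns the motion feature value −0.2.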

Integration.

The value for choosing the top or bottom stimulus is calculated as a weighted sum of both feature values in each stimulus. The integration weights are determined by the current dimensional weight, i.e., the two features are weighted by how well their dimension has performed. The value for UP is then V_UP = V_i·w + V_j·(1 − w), where i and j denote the color and motion features of the upper stimulus and w = exp(κ·W_color)/[exp(κ·W_color) + exp(κ·W_motion)]. Using the definition of σ from below and W_motion = −W_color, this last equation simplifies to w = σ(2·κ·W_color). We used a softmax procedure to generate choices, where in every trial the probability (P) of choosing action a ∈ {UP, DOWN} over b is given by: P_a(t) = σ{β[V_a(t) − V_b(t)]}, where σ(z) = 1/(1 + e^(−z)) is the Luce choice rule or logistic sigmoid, and β determines the degree of stochasticity involved in making decisions. We fit the parameters for each model and each subject (learning rates α and α_W, softmax β, as well as the extra parameter κ for the integration model) such that the model best explained subjects' choices (maximum likelihood, with likelihood l = ∏_t P_{x_t}(t), where x_t is the actual choice in trial t).
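The integration weight, the softmax choice rule, and the quantity minimized during fitting can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the likelihood is assumed to be the product over trials of the probability the model assigns to each observed choice, and the optimizer used to fit the parameters is not specified in the text and is therefore omitted.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def integration_weight(w_color, kappa):
    """Weight on the color feature; equals exp(kW) / (exp(kW) + exp(-kW))."""
    return sigmoid(2.0 * kappa * w_color)


def integrated_value(v_color, v_motion, w_color, kappa):
    """Weighted sum of a stimulus's color and motion feature values."""
    w = integration_weight(w_color, kappa)
    return v_color * w + v_motion * (1.0 - w)


def choice_prob_up(v_up, v_down, beta):
    """Softmax probability of choosing UP over DOWN."""
    return sigmoid(beta * (v_up - v_down))


def neg_log_likelihood(choices, v_ups, v_downs, beta):
    """Negative log-likelihood of observed choices ("UP" or "DOWN")."""
    total = 0.0
    for choice, v_up, v_down in zip(choices, v_ups, v_downs):
        p_up = choice_prob_up(v_up, v_down, beta)
        total -= math.log(p_up if choice == "UP" else 1.0 - p_up)
    return total
```

Setting `kappa = 0` gives a weight of 0.5 regardless of the dimensional weight, so both dimensions are mixed equally.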

Nonhierarchical-RL Models

To account for the possibility that subjects solved the task without considering the hierarchical structure at all, we considered several nonhierarchical RL models. 1) A compound stimulus-based RL model. This model learns a value per stimulus s (the actual composite stimuli, not individual features), that is, it separately learns the values of the four composite stimuli using a Rescorla-Wagner learning rule (Rescorla and Wagner 1972) for V_{red,left}, V_{red,right}, V_{green,left}, and V_{green,right}. In each trial, the values of the two presented stimuli are considered for choice, and only the value of the chosen stimulus is updated via a prediction error. In this model, not only the information on hierarchical structure, but also the information about dimensional grouping, is neglected. 2) A feature-based RL model in which each of the four features (red, green, left, right) is considered to be a separate item for the RL model to learn about (i.e., V_red and V_green are not assumed anti-correlated) and, furthermore, W_color is also kept at zero, with the choice therefore based on a sum of the two features encountered within each stimulus, e.g., for UP = {left, green}, V_UP = (V_left + V_green)/2. Thus, after the choice (DOWN = {red, left}), no updating would be done for V_green or V_right. 3) A one-layer version of the hierarchical model above, where W_color is always kept at zero, and hence there is never any information with regard to which modality is more likely to be correct. This is equivalent to the special case of the integration model with κ = 0. Such a learner mixes the dimensions equally, e.g., for UP = {left, green}, V_UP = (V_left + V_green)/2, and since V_left = −V_right and V_green = −V_red, for our simple setup effectively just chooses the option with the larger value of the four feature values {V_green, V_red = −V_green, V_right, V_left = −V_right}. All other implementation details of models 2 and 3 were otherwise identical to the hierarchical models above. We used a softmax rule to generate choices and fitted the two parameters (learning rate α, softmax scaling β) by maximizing the likelihood of the subject choices.
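As an illustration of the simplest of these alternatives, the compound stimulus-based learner (model 1) can be sketched as a table of four values updated with a Rescorla-Wagner rule. The function names and the dictionary representation are hypothetical choices for this sketch.

```python
from itertools import product


def make_values():
    """One value per composite stimulus, keyed by (color, motion)."""
    return {pair: 0.0 for pair in product(("red", "green"), ("left", "right"))}


def rw_update(values, chosen, reward, alpha):
    """Rescorla-Wagner update of the chosen composite stimulus only."""
    values[chosen] += alpha * (reward - values[chosen])
```

Because each composite stimulus has its own entry, an outcome for (red, left) tells this learner nothing about (red, right), which is the sense in which dimensional grouping is neglected.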

Behavioral Model Comparison

To compare the behavioral fits for the different models while accounting for differences in model complexity, we report the Bayesian Information Criterion (BIC) (Burnham and Anderson 2002) in Supplemental Table S1, which corrects for the number of parameters, k, in a model based on the number of data points n: BIC = −2·log(l) + k·log(n). The model with the lower BIC explains the subjects' behavior better. For the model comparison between integration and attention-gating, we can also calculate an approximation to the log Bayes Factor (BF) (Kass and Raftery 1995) as log BF = −0.5·(BIC_integration − BIC_attention). The BF specifies the ratio of marginal model likelihoods, and a probability that the attention-gating model is more likely is thus given as P_attention = exp(−log BF). This allows us to assign a probability to our null hypothesis that the attention-gating model is more likely and, in analogy to a P value in statistical testing, reject the attention-gating model if P_attention < 0.05. This criterion is met if the log BF > ∼3 (i.e., BF > 20). To allow inference on the population, we used a pairwise Bayesian model comparison (Stephan et al. 2009) between all our tested models using the BIC-corrected likelihoods as model evidences. Note that we base our model comparison on log evidences of the models instead of the fraction of correctly predicted choices. This method is superior to comparing the number of correctly predicted choices (which are calculated from binary data) because such data introduce noise into the system when the model predictions are passed through softmax and fail to take into account parametric variations in the model predictions (Daw 2011).
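The comparison quantities defined above are simple to compute. The sketch below follows the formulas as stated in the text; the function names are hypothetical, and the natural logarithm is assumed throughout.

```python
import math


def bic(log_likelihood, k, n):
    """BIC = -2*log(l) + k*log(n) for k parameters and n data points."""
    return -2.0 * log_likelihood + k * math.log(n)


def log_bayes_factor(bic_integration, bic_attention):
    """Approximate log BF in favor of the integration model."""
    return -0.5 * (bic_integration - bic_attention)


def p_attention(log_bf):
    """P_attention = exp(-log BF), as defined in the text."""
    return math.exp(-log_bf)
```

For example, a BIC of 210 for the integration model and 220 for the attention-gating model gives a log BF of 5, and `p_attention(5.0)` is about 0.007, below the 0.05 criterion. A log BF of 3 gives about 0.0498, which is why a log BF of roughly 3 (BF of about 20) marks the threshold.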

fMRI Data Acquisition

Data were acquired with a 3T scanner (Trio, Siemens, Erlangen, Germany) using an eight-channel phased array head coil. Functional images were taken with a gradient echo T2*-weighted echo-planar sequence (repetition time = 2.65 s, flip angle = 90°, echo time = 30 ms, 64 × 64 matrix). Whole brain coverage was achieved by taking 45 slices (3-mm thickness, no gap, in-plane resolution 3 × 3 mm), tilted in an oblique orientation at −30° to the anterior commissure-posterior commissure line to minimize signal dropout in orbitofrontal cortex. Each subject's head was restrained with foam pads to limit head movement during acquisition. A high-resolution T1-weighted anatomical scan of the whole brain (magnetization-prepared rapid gradient echo sequence, 1 × 1 × 1 mm resolution) was also acquired for each subject.

fMRI Data Analysis

Imaging analysis was performed using SPM5 (Wellcome Trust Centre for Neuroimaging, Institute of Neurology, London, UK). Images were first slice time corrected to repetition time/2, realigned to the first volume to correct for subject motion, spatially normalized to a standard T2* template with a voxel size of 3 mm, and spatially smoothed with an isotropic Gaussian kernel of 8 mm full-width half-maximum to account for anatomical differences between subjects and to allow for valid statistical inference at the group level. Intensity normalization and high-pass temporal filtering (using a filter width of 128 s) were also applied to the data. First, we estimated for each individual subject a general linear model (GLM) for the attention-gating model and separately another GLM for the integration model, differing only in the model-predicted parametric modulator values. Two events were modeled in each trial: the time of the stimulus presentation, parametrically modulated by the variables described below, and the time of the presentation of the outcome, modulated by the binary outcome (+1/−1).

Parametric Modulators

The regressor at the time of choice was parametrically modulated by the following decision variables (in the two separate GLMs, the parametric modulators contained values from the respective model, attention-gating or integration).

1) Stimulus value of the better choice [max(V_UP, V_DOWN)].

We reasoned that, if subjects make a value-based decision between the upper or lower stimulus, we would necessarily find a representation of a decision variable in the brain that encodes the stimulus value. This value differs between our two models: in the case of the integration model, this value is a weighted combination of the two feature values within the stimulus; and in the attention-gating model it represents the feature value of only the dimension that subjects deem more likely (as indicated by the dimensional weight in our model). We also considered alternative representations of the stimulus values: one possibility is chosen values, which are stimulus values V_UP, V_DOWN weighted by subjects' actual choices. We tested for chosen values in a separate GLM in which we replaced the maximum stimulus value by a chosen value modulator. In this test, we found similar results as for the maximum stimulus value (although with a slightly weaker effect size), but, as our study was not designed to distinguish between these highly correlated value signals, this difference was not further explored. Another possibility is that the brain might encode the value difference between the top and bottom stimuli instead of the value of the best stimulus. However, in our present task, the values of the two stimuli are mutually dependent on each other, i.e., V_DOWN = 1 − V_UP. A value difference between the higher and lower valued stimuli (|V_UP − V_DOWN|) is therefore, in our case, identical to a scaled representation of max(V_UP, V_DOWN). Another alternative would be a representation of the choice probability P_UP = σ[β(V_UP − V_DOWN)], i.e., the value passed through the sigmoidal softmax function. Applying such a sigmoid essentially asks to what degree subjects use the available information: two subjects could have the exact same information but utilize it differently due to differences in the steepness of the softmax. Thus the sigmoid-transformed variables encode the usage of the available information rather than the available value. As the inverse temperatures (β, κ) for most of our subjects are close to (or smaller than) 1, the sigmoidal transformation leads to an almost linear transformation over a large part of the input space, and, therefore, value and the sigmoid-transformed variable will be highly correlated anyway.

2) Within-dimensional certainty for color [max(V_red, V_green)], and 3) motion [max(V_right, V_left)].

We defined within-dimensional certainty as confidence about one feature being correct. A high value for one feature indicates that choice of a stimulus containing this feature was repeatedly rewarded. On the other hand, a feature value close to zero means that, in the recent past, neither feature within this dimension was rewarded more often than the other, and we expect that, in this situation, subjects have high uncertainty about which feature within this dimension is currently correct. We can calculate certainty as the maximum over the feature values because V_red and V_green fluctuate within the range [−1, 1] and V_red = 1 − V_green (V_left = 1 − V_right). We also tested for neural signals related to uncertainty, which we defined in our case as 1 − certainty. Uncertainty is at its maximum when subjects assign equal value to the two features (e.g., V_red and V_green), whereas the lowest uncertainty occurs when one value is much larger than the other (e.g., large V_green). We also considered alternative definitions for uncertainty, specifically the entropy V_red·V_green = V_red·(1 − V_red). In practice, this signal is highly correlated with 1 − certainty, and, when we ran an additional GLM with the entropy-based uncertainty signals, we found very similar activations to the ones reported here.

4) Across-dimensional certainty [max(Wmotion, Wcolor)].

The across-dimension certainty was similarly defined as the relative ability to predict reward based on the dimension. In a separate GLM, we tested alternatively for the signed values V, V (instead of the within-dimensional certainties) and W (instead of the across-dimensional certainty). We did not find any activity at a corrected significance level in these contrasts. Note, however, that these contrasts are on a scale from red to green and left to right. As there is no common reference point for encoding the dimensions, it is unlikely that the subject group encoded the features on the same scale (i.e., red to green or green to red). We entered all regressors and modulators described above under 1–4 independently (without serial orthogonalization) into the design matrix. Thereby only the additional variance that cannot be explained by any other regressor is assigned to the effect, preventing spurious confounds between regressors (Andrade et al. 1999; Draper and Smith 1998). The regressors were convolved with the canonical hemodynamic response function, and low-frequency drifts were excluded with a high-pass filter (128-s cutoff). Short-term temporal autocorrelations were modeled using an autoregressive process of order 1. Motion correction regressors estimated from the realignment procedure were entered as covariates of no interest. Statistical significance was assessed using linear compounds of the regressors in the GLM, generating statistical parametric maps of t values across the brain for each subject and contrast of interest. The within-dimension certainty contrast shown in Fig. 3 and 3 is an equally weighted linear combination of within-dimension certainty regressors 2 and 3. We also looked at correlations with 2 and 3 separately and found that areas activated by 2 and 3 overlap and are both exclusively located at the same region as the combined contrast shown in Fig. 3.
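
The statement that, without serial orthogonalization, each effect receives only the variance not explained by the other regressors can be illustrated with ordinary least squares on synthetic data. The sketch below is not the imaging pipeline used in the study; the data, the effect sizes, and the helper names are all hypothetical. It shows that the coefficient of a regressor in a joint fit equals the coefficient obtained after first removing from that regressor everything the other regressor explains.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ols_two_regressors(x1, x2, y):
    """Solve the 2x2 normal equations for y = b1*x1 + b2*x2 (no intercept)."""
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def residualize(target, other):
    """Remove from `target` the part that is linearly explained by `other`."""
    scale = dot(target, other) / dot(other, other)
    return [t - scale * o for t, o in zip(target, other)]

# Synthetic, deliberately correlated regressors (illustrative values only).
rng = random.Random(0)
n = 500
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * b + 0.6 * rng.gauss(0, 1) for b in x2]
y = [1.0 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Joint fit: both regressors entered side by side, no orthogonalization.
b1_joint, b2_joint = ols_two_regressors(x1, x2, y)

# Fit using only the part of x1 that x2 cannot explain.
x1_unique = residualize(x1, x2)
b1_unique = dot(x1_unique, y) / dot(x1_unique, x1_unique)

print(f"b1 from joint fit:            {b1_joint:.6f}")
print(f"b1 from unique variance only: {b1_unique:.6f}")
```

The two estimates agree up to floating-point error, which is the sense in which shared variance is credited to neither regressor in a joint design.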

Fig. 3.

Neural correlates of decision value and model comparison. A: blood-oxygenation-level-dependent (BOLD) responses in medial prefrontal cortex (PFC) correlate significantly with the stimulus value signal from the integration model. B: neural activity in ventromedial PFC (vmPFC) correlates with the trial-by-trial stimulus value signal. Shown is the area that is commonly activated by both the integration and the attention-gating model at P < 0.001 corrected. The comparison between the two models in C is based on this commonly activated region. C: we used a Bayesian model comparison to identify the model that can better explain neural activity in this area. Overall, the integration model (I) explains activity in this vmPFC area better than the attention-gating model (A). The exceedance probability for the integration model is 0.98. The exceedance probability is the probability that one model is more likely than the other one, i.e., that the posterior probability for the integration model is larger than 0.5.

These contrast images were then entered into a second-level random effects analysis using a one sample t-test against zero. The structural T1 images were coregistered to the mean functional echo-planar images for each subject and normalized using the parameters derived from the echo-planar images. Anatomical localization was carried out by overlaying the t-maps on a normalized structural image averaged across subjects and with reference to an anatomical atlas (Duvernoy 1999).

We used a Bayesian model comparison (Stephan et al. 2009) to determine which model (GLM attention-gating or GLM integration) better explained the neural activity in vmPFC. First, we computed log-likelihoods for the two models, averaged within a 12-mm sphere (1.5 × smoothing kernel size) in vmPFC. Since the attention-gating model-activated cluster was completely contained by the integration model-activated area, we centered the sphere on the group peak of the weaker attention-gating model (any selection bias toward the stronger correlating model would then be working against us). Next, we calculated posterior model probabilities in this region for every subject and the group of subjects. In brief, the procedure by Stephan et al. rests on treating the model as a random variable and estimating the parameters of a Dirichlet distribution, which describes the probabilities for all models considered. These probabilities then define a multinomial distribution over model space, allowing one to compute how likely it is that a model generated the subjects' data. To decide which model is more likely, we use the conditional model probabilities to quantify an exceedance probability, i.e., a belief that a particular model is more likely than the other model, given the group data. For a detailed description of the technique, also see the methods paper by Stephan, Friston, and colleagues (Stephan et al. 2009).

All reported activations survive P < 0.05 family-wise error corrections for multiple comparisons across the whole brain at the cluster level (P < 0.001 height threshold for each voxel). Cluster level correction is more sensitive for activations with a larger spatial extent than for activations with very small clusters. As our functional neuroanatomical hypotheses are all about cortical areas, as opposed to subcortical areas (in which activations with smaller spatial extent are more typically detected), the use of cluster correction is appropriate in this case.

RESULTS

Behavioral Analysis and Model Comparison

On average across subjects, participants performed 294 trials (SD = 62), chose the correct stimulus 60% of the time (SD = 4), and got rewarded in 56% of the trials (SD = 3). We tested if subjects' performance improved over time by splitting the trials in half and comparing the fraction of correct choices in the first half with that in the second half. There was no difference in the number of correct choices (t(df = 15) = 0.26, P = 0.80). To test if subjects became better at anticipating ID and ED shifts during the task, we further compared the number of trials required for the first half of switches (n1 = 146, SD = 31) with the second half of switches (n2 = 147, SD = 36) and did not observe any difference (t-test, P > 0.93). Likewise, response times did not differ between correct (rt = 1.00 s, SD = 0.16) and incorrect (rt = 1.02 s, SD = 0.17) trials (paired t-test, P > 0.15). Subjects responded slightly faster when color was the relevant feature than when it was motion (color rt = 0.97 s, motion rt = 1.04 s; paired t-test, P < 0.001). While all subjects experienced the same number of switches, one would nevertheless predict that, in relative terms (compared with the total number of trials), fast learners experience a higher proportion of switching trials. To investigate this hypothesis, we correlated the learning rate against the proportion of switches/number of total trials and found a significant relation (r = 0.83, P = 0.0002).

We found behavioral evidence that, when solving our task, subjects tend to consider evidences from all dimensions concurrently and rely more on the integration strategy than on a hierarchical strategy that only occasionally shifts attention between dimensions (Supplemental Table S1). To statistically evaluate the fit of the attention-gating and the integration model, we next calculated BFs for the comparison of the integration with the attention-gating strategy. BFs indicate the evidence that one model is more likely than the other, with a log BF of 3 or greater corresponding to an exceedance probability of 95% (strong evidence) (MacKay 2003). Our results indicate that the integration model was more likely to explain subjects' behavior than the attention-gating model in all subjects, and 13 out of 16 subjects showed strong evidence for the integration model (Fig. 1; Supplemental Table S2). Alternatively, as the models are nested, we performed a statistical likelihood ratio test, which provides support for the integrating model based on all subject choices (χ²(df = 1) = 500.5, P < 0.00001; or on the individual subject level for every subject, P < 0.01). Strong support in favor of the integration model is also provided if we compare both models using a Bayesian Model Comparison (Stephan et al. 2009) (α = 17.0 for integration vs. 1 for attention-gating; posterior probability = 0.94 vs. 0.06, exceedance probability ≈ 1.0 vs. 0). Together, this suggests that subjects utilized information from the characteristics of both dimensions, rather than concentrating only on the dimension that is believed more likely correct. Figure 2 shows profiles of hidden targets, the subject's choices and rewards, and the values predicted by the integration model for a representative subject.
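
The likelihood ratio test for nested models reported above can be sketched as follows. This is a minimal illustration, not the authors' code: the statistic is twice the log-likelihood difference between the full and the nested model and, with one extra parameter, it is compared against a chi-square distribution with 1 degree of freedom. The log-likelihood values used below are hypothetical and chosen only so that the statistic matches the reported 500.5; the function names are also assumptions.

```python
import math

def likelihood_ratio_statistic(loglik_full, loglik_nested):
    """Twice the log-likelihood gain of the full model over the nested one."""
    return 2.0 * (loglik_full - loglik_nested)

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1 the statistic is a squared standard normal, so
    P(X > stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical summed log-likelihoods (NOT taken from the paper); only their
# difference matters, and it is set to reproduce the reported statistic.
loglik_integration = -1000.0
loglik_attention_gating = -1250.25

stat = likelihood_ratio_statistic(loglik_integration, loglik_attention_gating)
p_value = chi2_sf_df1(stat)
print(f"LR statistic = {stat}, p < 0.00001: {p_value < 1e-5}")

# Sanity check against the familiar 5% critical value for df = 1.
p_at_critical = chi2_sf_df1(3.841)
print(f"p at 3.841 = {p_at_critical:.3f}")
```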

Fig. 2.

Integration model predictions and behavior. A, top: model-predicted value for tracking the color feature. Bottom: model-predicted value for tracking the motion feature. Middle: learned weight for the dimension. X-axis is time; unit is trial. The relevant dimension is indicated by a gray background in either the color or motion plot. Feature reversals within this block are shown as black lines, and the icon directly above this line denotes the new correct feature. The green/red boxes below the time courses show whether the subject's choice on the trial was correct/incorrect, and a blue box indicates that the trial was rewarded. Data are shown for a representative subject over the time of the entire experiment. B: model certainty for the two features (red = color; blue = motion) and the relevant dimension. C: model-predicted choice vs. actual choice for the subject shown in A and B. The proportion of choices of the upper stimulus (dark shading) increases with higher model-predicted value for the upper stimulus. The proportion of bottom stimulus choices (light shading) follows the opposite course. The model-predicted value for the bottom stimulus is 1 − (top stimulus value).

In addition to the attention-gating and integration models, we also tested other model variants, such as a fully Bayesian model or simpler nonhierarchical variants of the RL model, but found that none of these could explain subject behavior as well as our integrating RL model, even after correcting for the different levels of complexity of the models (Table 1). The attention-gating model can be seen as a special case of the integration model (i.e., a nested model). However, this only happens when the transformation from value to weighting w approximates a Heaviside function (when κ → ∞), as implemented in the attention-gating model. The fitted parameter κ is shown in Supplemental Fig. S3 for all subjects. Notice that the function is far from being a Heaviside for most subjects. Although the extra parameter gives the integration model more flexibility in fitting, the BIC correction takes this into account.
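
The nesting relation described above can be illustrated with a short sketch. The exact mapping from value to the weight w is defined in a part of the Methods not reproduced here, so the logistic form below, the `evidence_for_color` input, and all function names are assumptions made for illustration only. The sketch shows the integrated stimulus value approaching the gated value as κ grows, i.e., as the weighting function approaches a step function.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def color_weight(evidence_for_color, kappa):
    """ASSUMED weighting: logistic in the evidence, with slope kappa."""
    return sigmoid(kappa * evidence_for_color)

def integration_value(v_color, v_motion, w_color):
    """Stimulus value as a weighted combination of both feature values."""
    return w_color * v_color + (1.0 - w_color) * v_motion

def gated_value(v_color, v_motion, w_color):
    """Stimulus value taken only from the dimension deemed more likely."""
    return v_color if w_color >= 0.5 else v_motion

# Hypothetical feature values and dimensional evidence for a single trial.
v_color, v_motion, evidence = 0.8, 0.3, 0.2

kappas = [1.0, 5.0, 50.0, 500.0]
gaps = []
for kappa in kappas:
    w = color_weight(evidence, kappa)
    gap = abs(integration_value(v_color, v_motion, w)
              - gated_value(v_color, v_motion, w))
    gaps.append(gap)
    print(f"kappa = {kappa:6.1f}  w_color = {w:.4f}  |integration - gated| = {gap:.6f}")
```

For small κ the two models assign clearly different values to the same stimulus, which is what makes them distinguishable in behavior and in the fMRI regressors; for very large κ the difference vanishes.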

Table 1.

Pairwise model comparison using Bayesian model comparison

	Hierarchical Model Category			Single-layer Model Category
	Attention	Integration	Bayes	Stimuli	1-Layer	4-Option
Attention		>0.99	0.89	0.12	<0.01	<0.01
Integration	<0.01		0.19	0.02	<0.01	<0.01
Bayes	0.11	0.81		0.01	<0.01	<0.01
Stimuli	0.88	0.98	0.99		0.93	<0.01
1-Layer	>0.99	>0.99	>0.99	0.17		<0.01
4-Option	>0.99	>0.99	>0.99	>0.99	>0.99

Numbers indicate exceedance probabilities of the column model vs. the row model.

Adhering to the model predicted choices is directly task relevant. Those subjects, whose choices could be relatively well explained by the integration strategy, also earned more money, as indicated by a significant correlation between the model likelihood of the integration model with the average earned money/trial (r = 0.50, P < 0.05). Furthermore, the experienced ratio of switches in the task correlates both with the log likelihood of the integration model (r = 0.86, P < 0.001) and the average across-dimensional certainty (r = 0.55, P = 0.03). Subjects adhering to the model (good model fit) experience relatively high switch ratios as the task progresses fast. A high average certainty suggests that subjects have a better grasp of what dimension is correct; again this is related to how fast the task progresses. To analyze whether the success of one strategy over another depends on the frequency of the switches, we correlated across subjects the fraction of experienced switches/total number of trials against the subjects' BFs from the model comparison (indicating the relative model fit). This relationship is not significant (r = 0.35, P > 0.17).

Neural Correlates of Decision Variables

To identify neural correlates of the valuation process, we separately regressed neural activity onto trial-by-trial value signals of the attention-gating and integration models. In particular, we were interested in the maximum stimulus value signal on each trial, as this is the key output variable of the decision-making process. In the case of the attention-gating model, this value corresponds to the RL value of selecting the better stimulus within the dimension that the model predicts is currently relevant. In the integration model, this value is a linear combination of RL values for the color and motion features, weighted by the model predicted across dimension likelihood. Based on previous findings (Hare et al. 2008; Padoa-Schioppa and Assad 2006; Plassmann et al. 2007; Wunderlich et al. 2010), we specifically predicted to find stimulus value signals in vmPFC. We found that the stimulus value signal correlated most strongly with activity in vmPFC, extending dorsally and medially along PFC (Fig. 3). We also found chosen value signals in vmPFC (Supplemental Fig. S1), although the effect size of the stimulus value was higher than that of the chosen value signal at the peak of the activation. We found analogous activation patterns correlating with the attention-gating model's decision variables, albeit with a much weaker effect size and smaller extent (Supplemental Fig. S2). In particular, neural activity in medial PFC correlated with the stimulus value of both the gated and integration model. The area correlating with the attention-gating model-predicted variables was thereby entirely contained within the larger area activated by the integration model (Fig. 3).

Neurometric Model Comparison Between Attention-Gating and Integration Models

To discriminate between our two hypotheses about potential strategies, we tested if one model's prediction correlated better with blood-oxygenation-level-dependent (BOLD) activation than the other model's prediction. To perform a quantitative comparison of the value signals from both models within vmPFC, we estimated the two models in separate GLMs and determined their relative probability to explain the measured neural signal in every subject by means of a Bayesian model comparison approach (Stephan et al. 2009). The advantage of this method is that it circumvents any collinearity issues arising from correlated regressors in a single GLM. Consistent with our behavioral results, we found that, across subjects, the integration model was more likely to be the underlying cause of the neural variability in vmPFC (Dirichlet α = 13.34 for integration vs. α = 4.66 for attention-gating; posterior probability = 0.74, exceedance probability = 0.98; Fig. 3). The decisive measure in this comparison is the exceedance probability, which, in our case, indicates that the integration model explains neural data better than the attention-gating model with 98% confidence.
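
To make the link between the Dirichlet parameters and the reported probabilities concrete, here is a minimal sketch for the two-model case. It is an illustration under stated assumptions, not the estimation procedure of Stephan et al. (2009): it takes the fitted α values as given and does not estimate them from data. With two models, the frequency of the first model follows a Beta(α1, α2) distribution, its mean gives the expected posterior model probability, and the exceedance probability is the mass above 0.5, estimated here by Monte Carlo sampling. The function names are hypothetical.

```python
import random

def expected_model_frequency(alpha_model, alpha_other):
    """Mean of the Beta(alpha_model, alpha_other) distribution."""
    return alpha_model / (alpha_model + alpha_other)

def exceedance_probability(alpha_model, alpha_other, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(model frequency > 0.5) for two models."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(alpha_model, alpha_other) > 0.5
               for _ in range(n_samples))
    return hits / n_samples

# Dirichlet parameters reported for the vmPFC comparison.
alpha_integration, alpha_gating = 13.34, 4.66

mean_integration = expected_model_frequency(alpha_integration, alpha_gating)
xp_integration = exceedance_probability(alpha_integration, alpha_gating)
print(f"expected frequency of the integration model: {mean_integration:.2f}")
print(f"estimated exceedance probability:            {xp_integration:.3f}")
```

The expected frequency computed this way agrees with the reported posterior probability of 0.74, and the same formula applied to the behavioral comparison (α = 17.0 vs. 1) gives 17/18 ≈ 0.94, matching the value quoted in the behavioral results.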

Certainty and Uncertainty Related Signals

To further investigate the neural substrates of this hierarchical decision task, we next looked more closely at decision-related variables of the integration model. Specifically, we tested for correlations between certainty signals and the fMRI data. The across-dimension certainty reflects how likely it is that one of the complementary dimensions is the correct one. The within-dimensions certainty pertains to subjects getting the features for the two dimensions right, which, in our model, reflects a metric of subjects' choices being reinforced. We also hypothesized that we would see increased activity relating to within-dimension uncertainty in areas related to the allocation of cognitive resources and control on the basis of task difficulty. Areas most prominently implicated in this function on the basis of past literature are the anterior cingulate cortex (ACC) and dorsolateral PFC (dlPFC) (Kerns et al. 2004; MacDonald et al. 2000; Matsumoto and Tanaka 2004). We found that within-dimension certainty correlated with activity located in ventral medial PFC (Fig. 4). The peak of this certainty-related activity was located slightly anterior to the peak of the value-related activity described above (Table 2).

Fig. 4.

Certainty and uncertainty of the integration model. A: BOLD responses in a subcluster of medial PFC correlate significantly with the within-dimension certainty (averaged across color and motion) from the integration model. B: dorsomedial and dorsolateral PFC and the frontal poles show negative correlations with the within-dimension certainty (thus indicating a positive correlation with uncertainty). The color-coding is identical to that in Fig. 3.

Table 2.

Locations of significant correlation with value signals of the integration model

Cluster	x	y	z	Z-peak	No. of Voxels	Region
Maximum stimulus value signal
01	0	48	−3	4.71	1,069	Ventromedial prefrontal cortex (BA 10,32)
02	0	−39	54	4.5	1,133	Precuneus (BA 7)
03	−54	−63	−6	4.44	92	Left inferior temporal gyrus (BA 19,37)
04	45	−3	−18	4.24	35	Right circular insular sulcus (BA 21)
05	42	−78	27	4.14	165	Right angular gyrus (BA 19,39)
06	−48	−66	30	4.02	111	Left angular gyrus (BA 19,39)
07	−60	−12	−27	3.89	51	Left inferior temporal sulcus (BA 20)
08	63	−54	−3	3.83	92	Right inferior temporal sulcus (BA 21)
09	24	−45	−15	3.79	87	Parahippocampal gyrus (BA 36)
Intradimensional certainty
01	−3	57	3	3.83	101	Anterior medial prefrontal cortex (BA 10)
Intradimensional uncertainty
01	51	15	45	5.3	537	Right middle frontal gyrus (BA 8)
02	33	51	15	4.65	213	Right frontal pole (BA 10)
03	3	21	48	4.64	43	Dorsomedial prefrontal cortex (BA 8)
04	48	−48	51	4.51	340	Right inferior parietal lobule (BA 40)
05	−42	45	12	4.41	169	Left frontal pole (BA 10)
06	−48	24	36	4.1	131	Left middle frontal gyrus (BA 9)
07	−42	−51	54	3.86	118	Left inferior parietal lobule (BA 40)

BA, Brodmann area. Threshold P < 0.05 family-wise error corrected for multiple comparisons at the cluster level. Montreal Neurological Institute coordinates (x, y, z) denote the group peak voxel of each cluster.

We found within-dimension uncertainty correlating with activity in dorsomedial PFC adjacent to the ACC, bilaterally in dorsolateral PFC, the frontal poles, and parietal cortex (Fig. 4). No significant correlation was found between the BOLD signal and our measure of either certainty or uncertainty across dimensions, even at P < 0.001 uncorrected. A list of all activated regions is shown in Table 2.

DISCUSSION

We investigated how humans deal with complex decision tasks with multiple dimensions and evaluated the computations subserved by the PFC in this function. We specifically compared two strategies: do humans first determine the most likely dimension that explains how rewards are tied to stimuli and then choose on the basis of that theory (attention gated), or choose by integrating over multiple evidences and weighting each dimension by the likelihood that it is correct (integration strategy)? The integration model had a significantly higher likelihood for explaining behavior than the attention-gating model, and neural activity in vmPFC correlated more strongly with trial-by-trial valuations according to the integration model than the attention-gating model.

Although there is much prior evidence indicating that PFC plays a critical role in hierarchical decision making (Dias et al. 1996; Downes et al. 1989; Drewe 1974; Lawrence et al. 1996; Milner 1963; Owen et al. 1993; Owen et al. 1991; Robinson et al. 1980), up to now, little was known about the specific computational processes underlying such a contribution. Our results begin to shed light on this question by indicating that, at the level of inference (when working out what choice to take next), the PFC uses probability information in an integrated fashion. In this approach, even the least likely explanation of observed phenomena is taken into account to generate a decision, and its weight is commensurate with its likelihood. Such integration by likelihood has been formalized in Bayesian analysis. Bayesian-like processes have been suggested to occur during cross-modal integration, in terms of how information from perceptual cues in different sensory modalities is integrated during perceptual decision making (Battaglia et al. 2003; Beers et al. 1999; Ernst and Banks 2002; Knill and Pouget 2004; Rao et al. 2002), and to account for human cognition in perception (Jacobs 1999; Körding et al. 2007), action selection (Körding and Wolpert 2004; Trommershauser et al. 2003), and human causal inference (Gopnik et al. 2004; Griffiths and Tenenbaum 2005; Körding et al. 2007; Tenenbaum 1999).

Note that optimality in our task refers to the integration of possible evidences rather than optimal updating of evidences. Indeed, our decision models do not differ at the learning level: both use the same RL principles to update values, ensuring that gated and integration approaches to decision making were compared in a clean fashion. When we fit a fully Bayesian model (with Bayesian learning of likelihoods, see supplemental methods) to behavior, results were slightly inferior, although the fully Bayesian model still performed better than the attention-gating model. We further examined post hoc whether the integration weights would affect learning by fitting a model that allows for a variable learning rate based on the weights of a dimension. Such a model gave very similar fits and produced very similar regressors to those produced by the integration model used in the fMRI analysis, indicating that we have no evidence on the basis of the present results that the integration weights are being incorporated into the learning rate. Instead these weights appear only to exert influence at the time of choice. Taken together, these findings suggest that, while we have evidence that subjects integrated evidence at the point of choice, we cannot rule out the possibility that, during learning, the updating of values may proceed through a variant of RL, as opposed to full Bayesian updating.

Our findings add significantly to current understanding about the role of the vmPFC in computing value during choice. The present results show that, at the point of choice, the computations appear to involve dynamic integration of relevant information from the environment to compute value, as opposed to selectively representing the value of the most relevant features.
These findings build on previous evidence that value signals in vmPFC dynamically reflect knowledge of the underlying structure in a task, as opposed to solely reflecting the predictions of a RL model (Hampton et al. 2006), as well as findings that value signals in this area reflect strategic computations during social decision making (Hampton et al. 2008). It is important to note that the present findings do not rule out a role for attention in the computation of value (Krajbich et al. 2010). Rather, our findings can be interpreted as suggesting that, during the choice process, attention might get distributed between the features of relevant stimuli during value computations. The most prominent performance deficit reported on such hierarchical tasks following lesions of PFC is perseveration, a tendency to persist in responding on the previously reinforced category following a change in stimulus contingencies (Milner 1963; Nelson 1976). These deficits have led to the proposal that PFC, and particularly its lateral aspect, contributes to facilitating the switching of attentional focus between relevant cues (Dias et al. 1996; Rogers et al. 2000), which is predicated on the notion that attention is allocated selectively to only one dimension at a time. A failure to switch behavior following a change in contingencies would, therefore, according to this suggestion, occur due to an inability to disengage attention from the previously reinforced category and relocate it toward the new category. Our results suggest that, rather than facilitating the allocation of attention to only one dimension selectively at a time, PFC instead may contribute toward the distribution of attention across multiple probable causes. Thus PFC keeps track of multiple contingencies, even if some of them are temporarily unlikely to be correct. 
Therefore, the pattern of impairment following lesions in PFC could result from an inability to integrate across contingencies, and hence an inability to distribute attention (or indeed readjust the distribution of attention) optimally across multiple causes, rather than exclusively controlling the attentional focus toward the most likely cause. In the evidence integration approach, trust in one's decision is best measured by our within-dimension certainty measure, which combines the likelihood that one has identified the right feature for both color and motion dimensions. Accordingly, we discovered positive correlations between the within-dimension certainty measure and activation in anterior medial PFC. We found a correlation with within-dimensional uncertainty in dorsomedial PFC, dlPFC, the frontal poles, and parietal cortex. These latter areas are more active in those trials in which subjects are unsure about the correct choice. Our finding of uncertainty related signals in dorsomedial PFC resonates with a recent implication of this region in tracking the volatility in the environment to adjust the rate at which learning occurs (Behrens et al. 2007). Furthermore, ACC and dlPFC have long been suggested to interact together for the purposes of cognitive control (Kerns et al. 2004; Koechlin et al. 2003; MacDonald et al. 2000; Matsumoto and Tanaka 2004), while frontopolar cortex has previously been associated with uncertainty (Yoshida and Ishii 2006) and exploring multiple behavioral alternatives to maximize reward (Koechlin et al. 1999; Koechlin and Hyafil 2007). It is not efficient to maintain a high level of control all the time, because cognitive control requires effort, and the brain therefore regulates control according to demand. The best strategy for this regulation is, in our task, a variation according to the uncertainty about the correct choice, which is captured by our within-dimension uncertainty signal. 
Note that, in a two-step attention-gated approach, for the effective allocation of attention between dimensions, cognitive control would have to be recruited in relation to uncertainty about what dimension is relevant. High uncertainty across dimensions would then render the task more difficult for subjects because they first have to settle on one dimension before they decide on the feature. In contrast, the continuously integrating model utilizes a one-step procedure, and thus task difficulty is only reflected in the uncertainty of the final output. Indeed our findings indicate that areas involved in recruiting cognitive control correlated only with the general uncertainty about the correct features (within-dimension uncertainty), but not the across-dimension uncertainty, suggesting that, in our task, cognitive control is recruited as a result of general uncertainty about which stimulus is correct rather than which dimension. A number of previous studies have attempted to differentiate the functional contributions of different regions of PFC in hierarchical decision making (Badre et al. 2010; Bechara et al. 2000; Goldman-Rakic 1996; Hampton et al. 2006; Koechlin and Summerfield 2007; Owen 1997; Wallis et al. 2001). Using a selective lesion approach, it was recently shown that damage to a number of different areas of PFC can cause impairment on hierarchical decision making in monkeys, including the orbitofrontal cortex, anterior cingulate, and ventral and dorsal PFCs, and moreover that these regions may make distinct contributions underlying performance (Buckley et al. 2009). Our finding of distinct task-relevant computational signals within PFC is broadly consistent with these findings. 
While vmPFC is involved in computing the value of particular choices on the basis of prior reinforcement history to guide choice, lateral PFC, frontopolar cortex, and dorsomedial PFC may contribute to the allocation of attention across the stimuli according to the degree of uncertainty (or indeed conflict) about which feature is currently correct.

It is also important to consider the relationship between the present results and previous trial-based analyses (Cools et al. 2002; Ghahremani et al. 2010; Hampton et al. 2006; O'Doherty et al. 2001) and neurochemical investigations (e.g., Clarke et al. 2005; Robbins and Roberts 2007; Roberts et al. 1994) of reversal learning. Most importantly, these prior studies have reported activity in lateral orbital and adjacent PFC following a change in reinforcement contingencies, leading to a switch in behavior (Cools et al. 2002). Those previous findings have implicated these areas in the evaluation of negative reinforcement per se, in the facilitation of behavioral switches through response inhibition and/or response selection, or in the detection of contingency changes. The present results are not incompatible with those previous findings on reversal, but rather address a separate issue. Here we address the issue of how value computations are generated at the point of choice when information about value is available through multiple dimensions, while those previous studies pertain to the mechanism by which choice is alternated between different stimulus/action pairs. To assess the extent to which these previous results could be replicated in the present data, we ran an analysis in which we looked for activity immediately preceding a switch in stimulus choice and replicated at least some of the findings from this previous literature (see Supplemental Fig. S4).

However, it should be noted that, because in the present study we used a probabilistic structure at both levels of the hierarchy, the present task design is not optimized for analyzing the data in such a trial-based manner. This is because we cannot unambiguously ascertain when subjects subjectively understood that a switch in contingencies had occurred for a particular stimulus dimension. Here, each stimulus contained both dimensions, and subjects' choices therefore could not be uniquely decomposed into whether they were based on color or motion or, in the case of the integration model, on a weighted combination of both. Instead, the design of the task was optimized for a computational model-based analysis, as opposed to the trial-based “switch” analysis used in those previous studies. Specifically, switches in our task were more rapid than in previous studies, increasing sensitivity for differentiating between integrating and attention gating: in a stable environment, the weight for the irrelevant dimension in the integration model would become very small over time, and the integration model would eventually predict the same outcomes as the attention-gating model.

One possible concern with the present approach is that it is difficult to determine whether attention is distributed at the same time to both dimensions or switched stochastically from one to the other on a trial-by-trial basis. In this scenario, gated attention is noisy, and the dimensions are considered one at a time, but stochastically. However, this can produce similar fits to the integration model only if alternative dimensions are not considered purely randomly, but with a frequency that mimics integrating behavior. Importantly, such a strategy still relies on quantifying the relevance of the two dimensions, which is captured by the weights in our model. Furthermore, stochastic switching would still require a two-stage decision with a choice about dimension in every trial. To perform such a behavior, the subject would require knowledge about the certainty of the most relevant dimension (across-dimensional certainty), a signal that is conspicuously absent from the imaging data, even though other signals, such as the within-dimensional certainty, were present at robust statistical significance levels. Although we cannot rule out the presence of this signal on the basis of a null result, the most parsimonious explanation for our imaging data seems to be an integrative account rather than a stochastic switching account.

In general, it also remains possible that, if humans are faced with a hierarchical problem of sufficient complexity (for example, requiring integration over many more than two dimensions), keeping track of all theories simultaneously becomes both cognitively too challenging and normatively ill-advised (Bröder and Schiffer 2003; Diaconis and Freedman 1986). In those instances, subjects might switch to a simpler strategy, like attention gating, which employs fewer resources.

Our results indicate that the human brain, and the PFC in particular, is capable of integrating information in an optimally weighted manner. These findings could be used as a basis to promote better decision-making strategies in everyday life and in policy making, whereby decision makers are encouraged to take into account probabilistic information about multiple causes simultaneously rather than being trained to generate decisions based on sequential assumptions that may ultimately prove false.
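
The convergence argument made in this discussion, that the integration and attention-gating accounts differ most when both dimensions carry appreciable weight and become nearly indistinguishable once the weight on the irrelevant dimension shrinks, can be illustrated with a toy calculation. The sketch below is not the model fitted in this study: the function names, the two-entry value and weight dictionaries, and all numbers are hypothetical and chosen only for illustration.

```python
# Toy illustration only. Names and numbers are hypothetical assumptions,
# not the model or parameters used in the study.

def integrated_value(values, weights):
    """Integration account: weighted sum of the per-dimension values."""
    return sum(weights[d] * values[d] for d in weights)


def gated_value(values, weights):
    """Attention-gating account: use only the highest-weighted dimension."""
    top = max(weights, key=weights.get)
    return values[top]


# Hypothetical reward estimates for one stimulus on each dimension.
option = {"color": 0.8, "motion": 0.3}

# Volatile environment: both dimensions carry weight, so the accounts disagree.
uncertain = {"color": 0.6, "motion": 0.4}
# Stable environment: the irrelevant dimension's weight has shrunk toward zero.
stable = {"color": 0.99, "motion": 0.01}

gap_uncertain = abs(integrated_value(option, uncertain) - gated_value(option, uncertain))
gap_stable = abs(integrated_value(option, stable) - gated_value(option, stable))

print(f"gap with uncertain weights: {gap_uncertain:.3f}")
print(f"gap with stable weights:    {gap_stable:.3f}")
```

Under these assumed numbers the two accounts assign clearly different values to the stimulus when the weights are close to each other, and almost the same value when one weight dominates, which is why rapid contingency switches make the two accounts easier to tell apart.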

GRANTS

This research was supported by grants from the Gordon and Betty Moore Foundation to J. P. O'Doherty, a Gordon and Betty Moore Foundation Scholarship to K. Wunderlich, a Marie Curie European reintegration grant to U. R. Beierholm, a Searle Scholarship to J. O'Doherty, and the Caltech Brain Imaging Center and the Gatsby Charitable Foundation.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

57 in total

1. Abstract reward and punishment representations in the human orbitofrontal cortex.

Authors: J O'Doherty; M L Kringelbach; E T Rolls; J Hornak; C Andrews
Journal: Nat Neurosci Date: 2001-01 Impact factor: 24.884

2. Bayesian integration in sensorimotor learning.

Authors: Konrad P Körding; Daniel M Wolpert
Journal: Nature Date: 2004-01-15 Impact factor: 49.962

3. Anterior cingulate conflict monitoring and adjustments in control.

Authors: John G Kerns; Jonathan D Cohen; Angus W MacDonald; Raymond Y Cho; V Andrew Stenger; Cameron S Carter
Journal: Science Date: 2004-02-13 Impact factor: 47.728

4. Visual fixations and the computation and comparison of value in simple choice.

Authors: Ian Krajbich; Carrie Armel; Antonio Rangel
Journal: Nat Neurosci Date: 2010-09-12 Impact factor: 24.884

5. A behavioral analysis of degree of reinforcement and ease of shifting to new responses in a Weigl-type card-sorting problem.

Authors: D A GRANT; E A BERG
Journal: J Exp Psychol Date: 1948-08

6. Neural computations underlying action-based decision making in the human brain.

Authors: Klaus Wunderlich; Antonio Rangel; John P O'Doherty
Journal: Proc Natl Acad Sci U S A Date: 2009-09-28 Impact factor: 11.205

7. Contrasting cortical and subcortical activations produced by attentional-set shifting and reversal learning in humans.

Authors: R D Rogers; T C Andrews; P M Grasby; D J Brooks; T W Robbins
Journal: J Cogn Neurosci Date: 2000-01 Impact factor: 3.225

8. 6-Hydroxydopamine lesions of the prefrontal cortex in monkeys enhance performance on an analog of the Wisconsin Card Sort Test: possible interactions with subcortical dopamine.

Authors: A C Roberts; M A De Salvia; L S Wilkinson; P Collins; J L Muir; B J Everitt; T W Robbins
Journal: J Neurosci Date: 1994-05 Impact factor: 6.167

9. Extra-dimensional versus intra-dimensional set shifting performance following frontal lobe excisions, temporal lobe excisions or amygdalo-hippocampectomy in man.

Authors: A M Owen; A C Roberts; C E Polkey; B J Sahakian; T W Robbins
Journal: Neuropsychologia Date: 1991 Impact factor: 3.139

Review 10. Differential regulation of fronto-executive function by the monoamines and acetylcholine.

Authors: T W Robbins; A C Roberts
Journal: Cereb Cortex Date: 2007-09 Impact factor: 5.357
