Mario Martinez-Saito, Rodion Konovalov, Michael A Piradov, Anna Shestakova, Boris Gutkin, Vasily Klucharev.
Abstract
Competition for resources is a fundamental characteristic of evolution. Auctions have been widely used to model competition between individuals for resources, and bidding behaviour plays a major role in social competition. Yet how humans learn to bid efficiently remains an open question. We used model-based neuroimaging to investigate the neural mechanisms of bidding behaviour under different types of competition. Twenty-seven subjects (nine male) played a prototypical bidding game, a double auction, in three "market" types that differed in the number of competitors. We compared two families of computational learning models of bidding: directional learning (DL) models, in which the bid is "nudged" depending on whether it was accepted or rejected, and standard reinforcement learning (RL) models. We found that DL fit the behaviour best and resulted in higher payoffs. The binary learning signal associated with DL was represented by neural activity in the striatum, in a region distinctly posterior to one carrying a weaker reward prediction error signal. We posited that DL is an efficient valuation heuristic when the action (bid) space is continuous. Indeed, we found that the posterior parietal cortex represents the continuous action space of the task, and that the frontopolar prefrontal cortex distinguishes among conditions of social competition. Based on our findings, we proposed a conceptual model of the sequence of processes required for successful and flexible bidding under different types of competition.
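To make the directional learning idea concrete, here is a toy simulation of a single buyer who nudges a preferred bid up after a rejection and down after an acceptance (the natural direction for a buyer). It is a minimal sketch under stated assumptions, not the authors' fitted model: the fixed hidden ask price, the 0 to 10 bid scale, the nudge sizes `n_up` and `n_down` (named after the naive DL parameters in the Table 1 footnote), the acceptance rule `bid >= ask` and the Laplace-distributed jitter standing in for "leptokurtic" bid noise are all illustrative choices, and every function name is hypothetical.

```python
import random


def naive_dl_update(bid, accepted, n_up, n_down, lo=0.0, hi=10.0):
    """One directional-learning ("nudge") step for a buyer.

    Hypothetical parameterization: a rejected bid is nudged up by n_up and an
    accepted bid is nudged down by n_down; the result is clipped to [lo, hi].
    """
    new_bid = bid - n_down if accepted else bid + n_up
    return min(hi, max(lo, new_bid))


def laplace_jitter(center, scale, rng):
    """Add Laplace (leptokurtic) noise: the difference of two i.i.d.
    exponential draws with mean `scale` is Laplace(0, scale)."""
    return center + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate_dl(ask=6.0, start=2.0, n_up=0.5, n_down=0.25,
                noise=0.2, trials=200, seed=0):
    """Return the preferred bid after each trial against a fixed ask price."""
    rng = random.Random(seed)
    preferred = start
    history = []
    for _ in range(trials):
        # Actual bid = preferred bid plus jitter, kept on the bid scale
        bid = min(10.0, max(0.0, laplace_jitter(preferred, noise, rng)))
        accepted = bid >= ask  # assumed acceptance rule for a lone buyer
        preferred = naive_dl_update(preferred, accepted, n_up, n_down)
        history.append(preferred)
    return history
```

Averaging the last few values returned by `simulate_dl()` gives a preferred bid that hovers around the hidden ask: the learner tracks a single number instead of maintaining a value for every possible bid.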
Keywords: adaptive learning; internal number line; social competition; striatum; value-based decision-making
Year: 2019 PMID: 31219633 PMCID: PMC6899836 DOI: 10.1111/ejn.14492
Source DB: PubMed Journal: Eur J Neurosci ISSN: 0953-816X Impact factor: 3.386
Figure 1. Task design and behavioural results. (a) Each trial consisted of four stages: market type announcement, lottery, bid selection and game outcome feedback. During the market announcement stage (MARKET), the subject was informed of the market type of the current trial. The next stage (LOTTERY) indicated whether the subject would proceed to the next stage or be redirected to the beginning of the next trial. In the former case, a Likert scale was displayed, and the subject had to choose her bid by sliding a vertical bar (CHOICE). Finally, the game outcome stage (OUTCOME) signalled whether the bid was accepted (ACCEPTED) or rejected (REJECTED). (b) Upper: behavioural learning dynamics of bids across all subjects. Lower: pairwise differences of bid sizes among market types. Box “hinges” represent the first and third quartiles. (c) Bid adjustments were contingent on the outcome of the previous trial of the same market type. [Colour figure can be viewed at http://wileyonlinelibrary.com]
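The analysis in panel (c), in which bid adjustments are conditioned on the outcome of the previous trial of the same market type, can be sketched as follows. The trial-log format (a list of `(market, bid, accepted)` tuples in presentation order, containing only trials that reached the CHOICE stage) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def adjustments_by_previous_outcome(trials):
    """Mean bid change from one trial to the next trial of the same market
    type, grouped by the outcome of the earlier trial.

    `trials` is a hypothetical list of (market, bid, accepted) tuples in
    presentation order; trials skipped at the LOTTERY stage are assumed absent.
    """
    last = {}  # market -> (bid, accepted) of the most recent trial in that market
    changes = defaultdict(list)
    for market, bid, accepted in trials:
        if market in last:
            prev_bid, prev_accepted = last[market]
            key = "after ACCEPTED" if prev_accepted else "after REJECTED"
            changes[key].append(bid - prev_bid)
        last[market] = (bid, accepted)
    return {key: sum(v) / len(v) for key, v in changes.items()}


# Invented log interleaving two market types
log = [("NC", 5.0, True), ("BC", 6.0, False), ("NC", 4.5, False),
       ("BC", 7.0, True), ("NC", 5.5, True)]
print(adjustments_by_previous_outcome(log))
# {'after ACCEPTED': -0.5, 'after REJECTED': 1.0}
```

Keying the "previous trial" by market type, rather than by the immediately preceding trial, matches the caption's description of adjustments within the same market.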
Figure 2. Comparison of RL‐ and DL‐type algorithms. (a) Normal form (top centre) of a one‐seller versus one‐buyer (NC market) game: matrix cell colours represent the buyer's payoff. The buyer holds an estimate of the (possibly varying) seller's ask price (horizontal fuzzy white stripe) and tries to maximize profit by choosing the lowest possible bid that does not fall in a cell of the zero‐profit upper right triangle. (b) Simulations of the bidding behaviour of learning algorithms. Artificial bidders (left column: best‐fitting DL algorithm; right column: best‐fitting RL algorithm) were pitted against the subjects of the prerecorded dataset for 29 sessions, and their preferred bids were averaged within each trial and market type. (Upper left) Estimated prior parametric action‐value functions (using a Beta distribution with rescaled support and range) for each market type. (Lower left) Simulated maxima of each market's action‐value function at each trial. (Upper right) Estimated initial preferred bids. (Lower right) Simulated preferred bids at each trial. Error bars indicate s.e.m.
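For contrast with DL, the RL side of the comparison can be illustrated with a softmax learner that keeps one value per discrete bid and updates it with the delta rule (learning rate α and inverse temperature β, as in the Table 1 footnote). This is a minimal sketch under hypothetical assumptions: a bid grid of 0 to 10, a buyer valuation of 10, a fixed ask price, and a trade that clears at the buyer's own bid whenever bid ≥ ask. That pricing rule reproduces the zero‐profit triangle of panel (a) but is not necessarily the rule used in the task, and all names are illustrative.

```python
import math
import random

BIDS = list(range(11))  # hypothetical discretized bid scale 0..10
VALUE = 10.0            # hypothetical buyer valuation


def buyer_payoff(bid, ask, value=VALUE):
    """Buyer profit in a one-buyer/one-seller round, assuming the trade
    clears at the buyer's bid when bid >= ask and yields nothing otherwise."""
    return value - bid if bid >= ask else 0.0


def softmax_choice(q, beta, rng):
    """Sample an action index with probability proportional to exp(beta * q)."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def delta_update(q, action, reward, alpha):
    """Delta rule: move the chosen bid's value toward the obtained reward."""
    q[action] += alpha * (reward - q[action])


def run_rl(ask=6, alpha=0.2, beta=1.0, trials=500, seed=1):
    """Return the learned value of every bid after `trials` rounds."""
    rng = random.Random(seed)
    q = [0.0] * len(BIDS)
    for _ in range(trials):
        a = softmax_choice(q, beta, rng)
        delta_update(q, a, buyer_payoff(BIDS[a], ask), alpha)
    return q
```

Because only the chosen bid's value is updated, bids below the ask keep a value of exactly zero and each remaining bid has to be sampled separately. This offers one way to see why a value table over a fine‐grained bid space may learn more slowly than nudging a single preferred bid.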
Table 1. Ranks and BIC scores for all fitted algorithms
| Rank by BIC | RFX negative log‐likelihood per subject | RFX BIC per subject | RFX fitted parameters | FFX negative log‐likelihood per subject | FFX BIC per subject | FFX fitted parameters | Agent name | Agent type | Number of parameters |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 28.04 ± 3.97 | 64.97 ± 11.50 | …; k = 0.30 ± 0.03 | 34.15 ± 2.64 | 68.63 ± 5.41 | …; k = 0.39 | Leptokurtic DL with delta rule | Leptokurtic jitter + delta rule + DL | 5 |
| 2 | 31.79 ± 2.08 | 67.14 ± 7.72 | … | 39.21 ± 2.55 | 78.55 ± 5.21 | … | Gaussian DL with delta rule | Gaussian jitter + delta rule + DL | 2 |
| 3 | 35.90 ± 1.89 | 78.91 ± 7.33 | … | 46.26 ± 2.10 | 92.78 ± 4.32 | … | Leptokurtic naive DL | Leptokurtic jitter + DL | 4 |
| 4 | 96.19 ± 1.71 | 195.94 ± 6.98 | … | 101.98 ± 2.04 | 204.09 ± 4.21 | … | Model‐based counterfactual RL | Softmax + Counterfactual learning RL | 2 |
| 5 | 101.13 ± 1.00 | 205.82 ± 6.15 | … | 103.88 ± 1.55 | 207.89 ± 3.22 | … | Model‐free RL with coarse bid space | Softmax + model‐free RL | 2 |
| 6 | 100.65 ± 1.26 | 204.86 ± 6.07 | … | 103.45 ± 1.66 | 207.03 ± 3.44 | … | Model‐free RL | Softmax + model‐free RL | 2 |
| 7 | 120.26 | 242.30 | – | 120.26 | 242.30 | – | Null model | Null | 1 |
“Jitter” refers to the shape of the probability distribution function used to model the variability of the bid selection process. α: learning rate; β: inverse temperature; σ, σ, σ0: variance of Laplace distributions; k: proportion of trials with explorative (risky) versus exploitative (safe) bids; n_up, n_down: fixed nudge sizes in the naive nudger algorithm. ± signifies the standard error of the mean across subjects.
Some instances of the RFX log‐likelihood optimization did not converge; only those that converged were used.
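Models in Table 1 are ranked by the Bayesian information criterion. As a reminder of the standard definition, BIC = 2·NLL + p·ln(n), where NLL is the negative log‐likelihood, p the number of free parameters and n the number of observations; lower is better. The sketch applies it to made‐up numbers: the per‐subject normalization and the value of n behind the tabulated scores are not stated in this section, so it is not meant to reproduce them, and the model names in the example are placeholders.

```python
import math


def bic(neg_log_likelihood, n_params, n_obs):
    """Bayesian information criterion: 2 * NLL + k * ln(n). Lower is better."""
    return 2.0 * neg_log_likelihood + n_params * math.log(n_obs)


def rank_models(fits, n_obs):
    """Rank models from best (lowest BIC) to worst.

    `fits` maps a model name to a (negative log-likelihood, n_params) pair.
    """
    return sorted(fits, key=lambda name: bic(*fits[name], n_obs))


# Hypothetical fits for one subject with 100 choices
example = {"DL": (30.0, 5), "RL": (40.0, 2), "Null": (60.0, 1)}
print(rank_models(example, 100))  # ['DL', 'RL', 'Null']
```

The ln(n) term is what lets a five‐parameter model outrank a two‐parameter one only when its likelihood gain is large enough to pay for the extra parameters.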
Figure 3. Algorithm fit scores and correlations with individual profits during the task. (a) BIC scores averaged within algorithm classes (DL: models 1‐3; RL: models 4‐6 in Table 1). Error bars indicate 95% confidence intervals. (b) Correlation of the market differentiation index with profits averaged across the whole task. The fitted line corresponds to a Pearson product‐moment correlation coefficient of 0.524 (p = 0.003). (c) Scatter plot of subjects’ DL‐compliance scores and profits averaged across the whole task. The fitted line corresponds to a correlation coefficient of 0.466 (p = 0.01). N = 27.
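Panels (b) and (c) report Pearson product‐moment correlations across subjects. A standard‐library sketch of the coefficient and of the t statistic t = r·√(n − 2)/√(1 − r²) conventionally used to test it against zero; the p‐value would then come from a t distribution with n − 2 degrees of freedom (for example via a statistics library), which is omitted here. Function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```

Given hypothetical per‐subject lists `mdi` and `profits`, `pearson_r(mdi, profits)` would yield the coefficient plotted in panel (b), and `r_to_t(r, 27)` its test statistic.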
Table 2. Neural activity related to market type recognition and expected value (Figure 4)
| Contrast (Figure) | Region | Cluster p‐value FWE‐corrected | Cluster extent k | Peak T statistic | MNI (x, y, z) |
|---|---|---|---|---|---|
| MARKETxBC vs MARKETxNC (Figure 4) | Left SPL | 0.085 | 43 | 5.31 | −33 −46 48 |
| | Right SPL | 0.044 | 53 | 4.55 | 36 −46 60 |
| | Right ANG | | | 3.92 | 39 −46 45 |
| MARKETxSC vs MARKETxNC (Figure 4) | Left SPL | 0.818 | 9 | 3.75 | −33 −52 48 |
| CHOICE_PBV (Figure 4) | Left SPL | 0.630 | 15 | 3.99 | −47 −48 52 |
| REJECTED vs ACCEPTED, MDI‐modulated, group level (Figure 4) | Right SFG | 0.031 | 76 | 5.05 | 21 59 19 |
| | Left SFG | 0.125 | 47 | 4.53 | −24 53 23 |
| | Right MFC | 0.582 | 17 | 4.46 | 6 29 −14 |
| | Right ANG | 0.301 | 30 | 4.26 | 60 −52 23 |
| | Right TrIFG | 0.258 | 33 | 4.18 | 54 32 4 |
| | Left MSFG | 0.528 | 19 | 4.11 | −3 50 4 |
Activity is thresholded at p < 0.001 (uncorrected for the whole brain), except for non‐orthogonalized contrasts in striatal areas, which are thresholded at FWER p < 0.05 voxelwise. x, y, z: stereotactic coordinates of the MNI template. Atlas labels were provided by Neuromorphometrics, Inc.
Abbreviations: ACgG, anterior cingulate gyrus; AIns, anterior insula; ANG, angular gyrus; CblExt, cerebellum exterior; MFC, medial frontal cortex; MFG, middle frontal gyrus; MorG, medial orbital gyrus; MSFG, superior frontal gyrus medial segment; NAcc, accumbens area; OCP, occipital pole; SFG, superior frontal gyrus; SMG, supramarginal gyrus; SPL, superior parietal lobule; STG, superior temporal gyrus; TrIFG, triangular part of the inferior frontal gyrus.
Table 3. Neural activity coding error signals pseudo‐RPE and DS (Figure 5)
| Contrast (Figure) | Region | Cluster p‐value FWE‐corrected | Cluster extent k | Peak T statistic | MNI (x, y, z) |
|---|---|---|---|---|---|
| DS (Figure 5) | Left Putamen | <0.001 | 47 | 7.90 | −30 −10 8 |
| | Right CblExt | <0.001 | 147 | 7.70 | 33 −58 −40 |
| | Left MorG | <0.001 | 20 | 7.68 | −24 35 −18 |
| | Right Putamen | <0.001 | 35 | 7.62 | 30 −10 4 |
| | Left CblExt | <0.001 | 83 | 7.36 | −15 −52 −18 |
| | Left Caudate | <0.001 | 16 | 7.30 | −24 −19 23 |
| | Right Caudate | <0.001 | 51 | 7.29 | 24 −10 26 |
| | Right Putamen | | | 6.99 | 24 14 0 |
| | Right CblExt | 0.001 | 9 | 7.11 | 6 −70 −33 |
| | Right OCP | 0.001 | 12 | 7.02 | 18 −100 8 |
| | Left Caudate | <0.001 | 13 | 6.46 | −21 11 19 |
| | Right SPL | 0.003 | 6 | 6.38 | 45 −43 60 |
| Pseudo‐RPE (Figure 5) | Right CblExt | <0.001 | 119 | 8.49 | 18 −67 −22 |
| | Left OCP | <0.001 | 25 | 7.26 | −12 −103 4 |
| | Right NAcc | <0.001 | 48 | 7.18 | 12 17 −11 |
| | Right Putamen | | | 7.16 | 21 14 −11 |
| | Right Putamen | <0.001 | 14 | 6.87 | 30 −13 8 |
| | Left SMG | 0.003 | 7 | 6.81 | −57 −34 45 |
| | Left MFG | 0.001 | 10 | 6.66 | −36 35 30 |
| | Left MFG | 0.002 | 9 | 6.30 | −39 38 15 |
| | Right OCP | 0.004 | 6 | 6.11 | 15 −100 11 |
| | Left CblExt | 0.003 | 7 | 6.09 | −12 −52 −22 |
| Ort‐pseudo‐RPE (Figure 5) | Left MFG | <0.001 | 197 | 5.14 | −24 20 63 |
| | Right SPL | 0.315 | 29 | 4.65 | 27 −61 34 |
| | Right MFG | 0.196 | 38 | 4.63 | 42 14 56 |
| | Left SPL | 0.023 | 82 | 4.58 | −21 −46 45 |
| | Right SFG | 0.283 | 31 | 4.31 | 27 14 63 |
| | Right MFG | 0.501 | 20 | 4.19 | 36 38 30 |
| | Right MFG | 0.924 | 5 | 4.09 | 48 41 26 |
| | Right ACgG | 0.728 | 12 | 4.03 | 12 38 11 |
| | Left NAcc | 0.609 | 16 | 4.01 | −9 8 −7 |
| | Left Caudate | 0.788 | 10 | 3.89 | −15 −4 23 |
| | Right MFG | 0.924 | 5 | 3.88 | 39 47 8 |
| | Left ACgG | 0.924 | 5 | 3.65 | −3 32 −11 |
| Ort‐DS (Figure 5) | Left Caudate | 0.070 | 56 | 5.36 | −27 −7 26 |
| | Left Putamen | | | 4.43 | −27 −10 8 |
| | Right Caudate | 0.227 | 34 | 5.06 | 24 −10 26 |
| | Right Putamen | | | 4.15 | 27 −10 11 |
| | Right STG | 0.057 | 60 | 4.87 | 57 −28 8 |
| | Right Caudate | 0.543 | 18 | 4.71 | 21 20 15 |
Table 4. Neural activity during OUTCOME stage associated with follow‐up bid increases (Figure 6)
| Contrast (Figure) | Region | Cluster p‐value FWE‐corrected | Cluster extent k | Peak T statistic | MNI (x, y, z) |
|---|---|---|---|---|---|
| ACCEPTED bid increase‐modulated (Figure 6) | Right Caudate | 0.515 | 16 | 5.21 | 18 5 19 |
| | Right Putamen | 0.020 | 59 | 5.13 | 18 8 −11 |
| | Right AIns | | | 4.16 | 33 11 −18 |
| | Left MFG | 0.764 | 10 | 4.70 | −33 56 19 |
| | Left MFG | 0.035 | 51 | 4.62 | −30 41 34 |
| | Right SMG | 0.047 | 47 | 4.59 | 63 −34 19 |
| | Left Putamen | 0.202 | 28 | 4.50 | −21 8 −7 |
| | Right SFG | 0.917 | 6 | 3.91 | 24 44 26 |
| | Left MSFG | 0.806 | 9 | 3.82 | −9 50 0 |
| REJECTED bid increase‐modulated (Figure 6) | Right Putamen | 0.818 | 9 | 4.19 | 24 14 −3 |
Figure 4. Neural activity related to market type recognition and expected value. (a) Left: stronger superior parietal cortex activity in the BC than in the NC condition during market entrance (MARKET_BC vs MARKET_NC). Right: stronger left superior parietal cortex activity in the SC than in the NC market during market entrance (MARKET_SC vs MARKET_NC). (b) Activation reflecting modulation by the preferred bid during bid choice (CHOICE_PBV). (c) Feedback processing‐related activity (outcome stage, REJECTED vs ACCEPTED) modulated by individual differences in the market differentiation index in the right medial frontal cortex (c, left) and frontopolar cortex (c, right). Activation maps are thresholded at p < 0.001 uncorrected, indicated by black contour lines. Clusters are listed in Table 2. Dual‐coded images represent both significance level and effect size, by means of colour saturation and hue, respectively.
Figure 5. Neural correlates of pseudo‐RPE and DS signals, based on the best‐fitting DL algorithm, in the anterior putamen and nucleus accumbens area and in the posterior putamen during OUTCOME. (a) During feedback, activity in the anterior (y = 16) putamen correlated more strongly with pseudo‐RPE, and activity in the posterior (y = −10) putamen more strongly with DS. From left to right: pseudo‐RPE (p < 0.05, FWER), DS (p < 0.05, FWER), pseudo‐RPE orthogonalized with respect to DS (p < 0.001, unc.) and DS orthogonalized with respect to pseudo‐RPE (p < 0.001, unc.). The exemplary design matrix illustrates the correspondence between first and second parametric modulators and non‐orthogonalized and orthogonalized regressors, respectively. (b) Bar chart of signal estimates (in grand mean percentage) by brain region. Signals were averaged within anatomical ROIs for the basal ganglia (Palminteri et al., 2015) and within an 8‐mm sphere in PPC. oDS and oRPE denote the DS and pseudo‐RPE signals after orthogonalization with respect to each other. Activation maps for DS and pseudo‐RPE are thresholded at p < 0.05 FWER‐corrected, and those for ort‐DS and ort‐pseudo‐RPE at p < 0.001 uncorrected. Clusters are listed in Table 3.
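The orthogonalized regressors (ort‐DS, ort‐pseudo‐RPE) are obtained by removing from one parametric modulator the part it shares with the other. A minimal sketch of that step on per‐trial modulator values, assuming the common approach of mean‐centring both modulators and keeping the least‐squares residual of one on the other (the Gram‐Schmidt step that SPM applies by default to successive parametric modulators). It ignores HRF convolution and the onset regressor, and the DS and pseudo‐RPE values in the example are invented.

```python
def mean_centre(x):
    """Subtract the sample mean from every element."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def orthogonalize(target, reference):
    """Residual of `target` after least-squares regression on `reference`.

    Both vectors are mean-centred first, so the result has zero mean and a
    zero dot product with the centred reference.
    """
    t, r = mean_centre(target), mean_centre(reference)
    beta = sum(a * b for a, b in zip(t, r)) / sum(b * b for b in r)
    return [a - beta * b for a, b in zip(t, r)]


# Invented per-trial values: DS as a binary +/-1 signal, pseudo-RPE as a graded one
ds = [1.0, -1.0, 1.0, 1.0, -1.0]
pseudo_rpe = [0.8, -0.6, 0.5, 0.9, -0.4]
ort_pseudo_rpe = orthogonalize(pseudo_rpe, ds)  # pseudo-RPE orthogonalized w.r.t. DS
ort_ds = orthogonalize(ds, pseudo_rpe)          # DS orthogonalized w.r.t. pseudo-RPE
```

Whatever variance the two signals share is assigned to the reference regressor, so a region that survives in the orthogonalized map responds to the component unique to the target signal.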
Figure 6. (a) Neural activity during positive feedback (ACCEPTED) in dlPFC (left) and striatal (right) areas that was modulated by bid increases in the next trial of the same market type. (b) Neural activity during negative feedback (REJECTED) in the putamen that was modulated by bid increases in the next trial of the same market type. Clusters are listed in Table 4.