Przemysław Eligiusz Cieślak, Woo-Young Ahn, Rafał Bogacz, Jan Rodriguez Parkitna.
Abstract
Selecting the most advantageous actions in a changing environment is a central feature of adaptive behavior. The midbrain dopamine (DA) neurons, along with the major targets of their projections, including dopaminoceptive neurons in the frontal cortex and basal ganglia, play a key role in this process. Here, we investigate the consequences of a selective genetic disruption of the NMDA receptor and metabotropic glutamate receptor 5 (mGluR5) in the DA system on adaptive choice behavior in mice. We tested the effects of the mutations on performance in the probabilistic reinforcement learning and probability-discounting tasks. In the probabilistic choice task, the loss of either NMDA receptors in dopaminergic neurons or mGluR5 receptors in D1 receptor-expressing dopaminoceptive neurons reduced the probability of selecting the more rewarded alternative and lowered the likelihood of returning to the previously rewarded alternative (win-stay). When the observed behavior was fitted to reinforcement learning models, we found that these two mutations were associated with a reduced effect of the expected outcome on choice (i.e., more random choices). None of the mutations affected probability discounting, which indicates that all animals had a normal ability to assess probability. However, in both behavioral tasks, animals with targeted loss of NMDA receptors in dopaminergic neurons or mGluR5 receptors in D1 neurons were significantly slower to perform choices. In conclusion, these results show that glutamate receptor-dependent signaling in the DA system is essential for the speed and accuracy of choices but is probably not critical for the correct estimation of probable outcomes.
Keywords: decision-making; dopamine; glutamate receptors; mouse behavior; reinforcement learning
Year: 2018 PMID: 30302389 PMCID: PMC6175304 DOI: 10.1523/ENEURO.0331-18.2018
Source DB: PubMed Journal: eNeuro ISSN: 2373-2822
Figure 1. The probabilistic reinforcement learning task. A, Schematic representation of the probabilistic reinforcement learning task. The animal could make a nose-poke in one of two ports. Following a nose-poke, water was delivered with a probability that depended on the chosen port. The nose-poke ports were randomly assigned 80% or 20% reward probabilities. During each session, the reward probabilities were reversed after 60 trials. B, An example of the choice behavior of a mouse in 600 trials (sessions 6–10). The black line shows the probability of choosing the left side (data smoothed with a 21-point moving average). The cyan bars indicate the side with the higher probability of reward delivery. The red dashed line indicates session boundaries. C–H, Probability of selecting the alternative with the higher reward probability by the NR1DATCreERT2 (mutant, n = 6; control, n = 8; C, F), mGluR5KD-D1 (mutant, n = 8; control, n = 9; D, G), and NR1D1CreERT2 (mutant, n = 6; control, n = 9; E, H) strains. C–E, Session-by-session analysis; data were collapsed across trials. F–H, Trial-by-trial analysis; data were collapsed across sessions. Data are represented as the mean ± SEM.
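The schedule described in the Figure 1 legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0-based trial index, and the way the moving average treats the edges of the series are not taken from the paper.

```python
def reward_probabilities(trial, high_side_first="left", reversal_trial=60):
    """Return (p_left, p_right) for a 0-based trial index within one session.

    The 80%/20% values and the reversal after 60 trials follow the legend;
    which side starts as the high-probability side is an input because the
    legend states that the assignment was random.
    """
    high, low = 0.8, 0.2
    in_first_block = trial < reversal_trial
    left_is_high = (high_side_first == "left") == in_first_block
    return (high, low) if left_is_high else (low, high)


def moving_average(values, window=21):
    """Centered moving average (21 points by default, as in panel B).

    Assumption: near the edges, the average uses only the part of the
    window that exists. The paper does not say how edges were handled.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Coding each choice as 1 (left) or 0 (right) and passing the sequence to `moving_average` gives a curve comparable in spirit to the black line in panel B.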
Figure 5. The probability-discounting task. A, Schematic representation of the probability-discounting task. One nose-poke port was associated with the delivery of small certain rewards, while the other nose-poke port was associated with the delivery of large uncertain rewards. Each session consisted of 20 forced trials during which only one port was active, followed by 40 free choice trials during which both ports were active. B–D, The graphs show the frequency of choosing the larger reward as a function of its probability in the NR1DATCreERT2 (mutant, n = 6; control, n = 7; B), mGluR5KD-D1 (mutant, n = 8; control, n = 9; C), and NR1D1CreERT2 (mutant, n = 5; control, n = 9; D) strains. Data are represented as the mean ± SEM.
Statistical table
| Figure | Data structure | Type of test | 95% CIs or 95% HDIs |
|---|---|---|---|
| | Assumed normal distribution | Hyperposterior distribution | (−0.3601, 0.0779) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.2952, 0.129) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.461, −0.1018) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.3081, 0.2367) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.2532, 0.4388) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.5163, −0.1429) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.3631, 0.0847) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.2793, 0.2792) |
| | Assumed normal distribution | Hyperposterior distribution | (−0.4919, 0.2115) |
| | Assumed normal distribution | Two-tailed | (−0.126, −0.03798) |
| | Assumed normal distribution | Two-tailed | (−0.05501, 0.1081) |
| | Assumed normal distribution | Two-tailed | (−0.1675, −0.03254) |
| | Assumed normal distribution | Two-tailed | (−0.01564, 0.1316) |
| | Assumed normal distribution | Two-tailed | (−0.1263, 0.01433) |
| | Assumed normal distribution | Two-tailed | (−0.01022, 0.1521) |
| | Assumed normal distribution | Two-tailed | (−0.126, −0.03624) |
| | Assumed normal distribution | Two-tailed | (−0.0746, 0.04841) |
| | Assumed normal distribution | Two-tailed | (−0.1797, −0.03847) |
| | Assumed normal distribution | Two-tailed | (−0.02373, 0.05712) |
| | Assumed normal distribution | Two-tailed | (−0.1268, 0.01674) |
| | Assumed normal distribution | Two-tailed | (−0.02587, 0.1276) |
| | Assumed normal distribution | Bonferroni-corrected | (−0.151, −9.144) |
| | Assumed normal distribution | Bonferroni-corrected | (−5.636, −14.629) |
| | Assumed normal distribution | Bonferroni-corrected | (0.535, −9.078) |
| | Assumed normal distribution | Bonferroni-corrected | (−5.594, −13.919) |
| | Assumed normal distribution | Two-tailed | (0.07629, 0.2383) |
| | Assumed normal distribution | Bonferroni-corrected | (1.211, −5.670) |
| | Assumed normal distribution | Bonferroni-corrected | (−2.656, −9.537) |
| | Assumed normal distribution | Bonferroni-corrected | (−4.688, −11.769) |
| | Assumed normal distribution | Bonferroni-corrected | (−8.758, −15.433) |
| | Assumed normal distribution | Two-tailed | (0.03127, 0.2603) |
| | Assumed normal distribution | Bonferroni-corrected | (3.862, −5.833) |
| | Assumed normal distribution | Bonferroni-corrected | (2.707, −6.988) |
| | Assumed normal distribution | Bonferroni-corrected | (1.383, −9.238) |
| | Assumed normal distribution | Bonferroni-corrected | (−0.746, −9.418) |
| | Assumed normal distribution | Two-tailed | (−0.09834, 0.06706) |
| | Assumed normal distribution | Bonferroni-corrected | (44.207, −33.477) |
| | Assumed normal distribution | Bonferroni-corrected | (42.318, −35.366) |
| | Assumed normal distribution | Bonferroni-corrected | (46.199, −31.485) |
| | Assumed normal distribution | Bonferroni-corrected | (23.622, −54.062) |
| | Assumed normal distribution | Bonferroni-corrected | (18.145, −26.659) |
| | Assumed normal distribution | Bonferroni-corrected | (13.173, −31.631) |
| | Assumed normal distribution | Bonferroni-corrected | (18.416, −26.388) |
| | Assumed normal distribution | Bonferroni-corrected | (18.624, −26.179) |
| | Assumed normal distribution | Bonferroni-corrected | (25.957, −20.935) |
| | Assumed normal distribution | Bonferroni-corrected | (18.868, −28.024) |
| | Assumed normal distribution | Bonferroni-corrected | (1.613, −45.279) |
| | Assumed normal distribution | Bonferroni-corrected | (19.101, −27.791) |
| | Assumed normal distribution | Bonferroni-corrected | (−8.519, −16.751) |
| | Assumed normal distribution | Bonferroni-corrected | (−7.398, −15.630) |
| | Assumed normal distribution | Bonferroni-corrected | (−6.524, −14.756) |
| | Assumed normal distribution | Bonferroni-corrected | (−4.346, −12.578) |
| | Assumed normal distribution | Bonferroni-corrected | (−6.166, −15.530) |
| | Assumed normal distribution | Bonferroni-corrected | (−2.561, −11.925) |
| | Assumed normal distribution | Bonferroni-corrected | (−1.947, −11.312) |
| | Assumed normal distribution | Bonferroni-corrected | (−1.615, −10.979) |
| | Assumed normal distribution | Bonferroni-corrected | (−2.772, −8.595) |
| | Assumed normal distribution | Bonferroni-corrected | (−1.419, −7.243) |
| | Assumed normal distribution | Bonferroni-corrected | (−0.924, −6.748) |
| | Assumed normal distribution | Bonferroni-corrected | (−1.544, −7.368) |
| | Assumed normal distribution | Bonferroni-corrected | (−0.135, −7.745) |
| | Assumed normal distribution | Bonferroni-corrected | (0.421, −7.189) |
| | Assumed normal distribution | Bonferroni-corrected | (0.512, −7.098) |
| | Assumed normal distribution | Bonferroni-corrected | (0.636, −6.974) |
| | Assumed normal distribution | Bonferroni-corrected | (2.767, −5.107) |
| | Assumed normal distribution | Bonferroni-corrected | (3.120, −4.754) |
| | Assumed normal distribution | Bonferroni-corrected | (3.388, −4.486) |
| | Assumed normal distribution | Bonferroni-corrected | (2.297, −5.578) |
| | Assumed normal distribution | Bonferroni-corrected | (2.447, −5.754) |
| | Assumed normal distribution | Bonferroni-corrected | (3.649, −4.552) |
| | Assumed normal distribution | Bonferroni-corrected | (4.181, −4.020) |
| | Assumed normal distribution | Bonferroni-corrected | (0.952, −7.249) |
Model comparisons using the LOOIC
| Group | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| NR1DATCreERT2 | 16,167.7 | 14,203.5 | |
| mGluR5KD-D1 | 18,640.8 | 17,888.8 | |
| NR1D1CreERT2 | 17,247.8 | 16,011.6 | |
Lower values of LOOIC indicate better model fits. The best-performing model is highlighted with bold type; model 3 outperformed the other models in all groups. Model 1, separate learning rates for positive and negative reward PE; model 2, a single learning rate for PE and fictitious updating for the unchosen option; model 3, separate learning rates for positive and negative PE and fictitious updating for the unchosen option.
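The verbal description of model 3 can be made concrete with a short Python sketch. This record does not give the model equations, so everything below is an assumption made for illustration: the outcome coding (+1 for reward, −1 for no reward), the rule that the unchosen option is updated as if it had produced the opposite outcome, the use of the sign of each prediction error to pick the learning rate, and the softmax choice rule with inverse temperature `beta`. All names are hypothetical.

```python
import math


def update_values(q, choice, outcome, lr_pos, lr_neg):
    """Update the values of two options after a single trial.

    q       : [value of option 0, value of option 1]
    choice  : 0 or 1, the option that was chosen
    outcome : +1 (reward) or -1 (no reward); this coding is an assumption
    lr_pos  : learning rate applied to positive prediction errors (PE)
    lr_neg  : learning rate applied to negative prediction errors
    """
    unchosen = 1 - choice
    pe_chosen = outcome - q[choice]
    # Fictitious updating (assumed form): treat the unchosen option as if
    # it had delivered the opposite outcome.
    pe_unchosen = -outcome - q[unchosen]

    new_q = list(q)
    new_q[choice] += (lr_pos if pe_chosen >= 0 else lr_neg) * pe_chosen
    new_q[unchosen] += (lr_pos if pe_unchosen >= 0 else lr_neg) * pe_unchosen
    return new_q


def choice_probability(q, beta):
    """Softmax probability of choosing option 1 (assumed choice rule)."""
    return 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
```

In a sketch of this kind, a smaller `beta` moves the choice probability toward 0.5 for the same value difference, which is one way to express a reduced effect of the expected outcome on choice (more random choices).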
Figure 2. Computational modeling results. A–C, Density plots of posterior group parameter distributions with the best model (model 3) for the NR1DATCreERT2 (A), mGluR5KD-D1 (B), and NR1D1CreERT2 (C) strains. Credible differences are marked with stars, and vertical bars below the plots show 95% HDI ranges.
Figure 3. Effects of previous outcomes on choice. A–C, Probabilities of repeating the same choice when the previous response was rewarded (win-stay) or switching to an alternative choice when the preceding response yielded no reward (lose-shift) in the NR1DATCreERT2 (mutant, n = 6; control, n = 8; A), mGluR5KD-D1 (mutant, n = 8; control, n = 9; B), and NR1D1CreERT2 (mutant, n = 6; control, n = 9; C) strains. The probability of win-stay was calculated as the number of times the animal chose the same side as the side chosen during the previously rewarded trial divided by the total number of rewarded trials, while the lose-shift probability was calculated as the number of times the animal changed its choice when the preceding response yielded no reward divided by the total number of unrewarded trials. D–F, Simulation performance of the best model (model 3) with respect to mimicking win-stay/lose-shift choice behavior. Data are represented as the mean ± SEM. **p < 0.01 (t test).
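The win-stay and lose-shift definitions in the Figure 3 legend translate directly into a small Python function. The sketch below makes one assumption that the legend does not state: the final trial of a sequence is left out of the denominators because no following choice exists to score. The function name and the input format are hypothetical.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay, lose_shift) probabilities for one trial sequence.

    choices : sequence of chosen sides, e.g. "L" or "R", one per trial
    rewards : sequence of outcomes, truthy if the trial was rewarded
    """
    wins = losses = stays_after_win = shifts_after_loss = 0
    # Pair every trial with the choice made on the following trial.
    for prev_choice, prev_reward, next_choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            wins += 1
            stays_after_win += next_choice == prev_choice
        else:
            losses += 1
            shifts_after_loss += next_choice != prev_choice
    # NaN marks a probability that is undefined (no wins or no losses).
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift
```

For example, with choices `["L", "L", "R", "R", "L"]` and rewards `[1, 0, 1, 1, 0]`, two of the three scored rewarded trials are followed by the same choice and the one scored unrewarded trial is followed by a switch.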
Figure 4. Reaction times in the probabilistic reinforcement learning task. A–I, Graphs show the reaction times observed in the NR1DATCreERT2 (mutant, n = 6; control, n = 8; A–C), mGluR5KD-D1 (mutant, n = 8; control, n = 9; D–F), and NR1D1CreERT2 (mutant, n = 6; control, n = 9; G–I) strains. A, D, and G show the time elapsed from the trial onset to the choice port entry. B, E, and H show the time from the new trial onset to the choice port entry following previously unrewarded (lose) or rewarded (win) trials. C, F, and I summarize the time from the reward delivery to the reward port entry. Values represent the mean choice latency (all sessions combined) ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 (Bonferroni-corrected t test or t test).
Figure 6. Reaction times in the probability-discounting task. A–C, Time elapsed from the trial onset to the choice port entry during the forced choice (left) and free choice (right) trials in the NR1DATCreERT2 (mutant, n = 6; control, n = 7; A), mGluR5KD-D1 (mutant, n = 8; control, n = 9; B), and NR1D1CreERT2 (mutant, n = 5; control, n = 9; C) strains. Bars represent the mean choice latency ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 (Bonferroni-corrected t test).