Neil M Dundon, Neil Garrett, Viktoriya Babenko, Matt Cieslak, Nathaniel D Daw, Scott T Grafton.
Abstract
Appraising sequential offers relative to an unknown future opportunity and a time cost requires an optimization policy that draws on a learned estimate of an environment's richness. Converging evidence points to a learning asymmetry, whereby estimates of this richness update with a bias toward integrating positive information. We replicate this bias in a sequential foraging (prey selection) task and probe associated activation within the sympathetic branch of the autonomic system, using trial-by-trial measures of simultaneously recorded cardiac autonomic physiology. We reveal a unique adaptive role for the sympathetic branch in learning. It was specifically associated with adaptation to a deteriorating environment: it correlated with both the rate of negative information integration in belief estimates and downward changes in moment-to-moment environmental richness, and was predictive of optimal performance on the task. The findings are consistent with a framework whereby autonomic function supports the learning demands of prey selection.
Keywords: Decision-making; Learning; Sequential foraging; Sympathetic stress
Year: 2020 PMID: 32462432 PMCID: PMC7651516 DOI: 10.3758/s13415-020-00799-0
Source DB: PubMed Journal: Cogn Affect Behav Neurosci ISSN: 1530-7026 Impact factor: 3.282
Fig. 1 Prey selection paradigm. Subjects decide whether to capture or release serially approaching invaders during their 2-s approach to the cockpit (panel A). Releasing an invader progresses immediately to the next invader, while capturing the invader incurs a capture time cost and yields a fuel reward. Four invader identities (panel B) map onto a two-by-two reward-by-cost value space, and can be described categorically as high (green), mid (blue), or low (brown) profitability. Subjects foraged for 12 min in each of two environments (panel C) with different proportions of invader profitability. Panel D: Replication of the Garrett and Daw (2019) learning asymmetry. Order of foraging (boom – downturn, BD; downturn – boom, DB) predicts optimal behavior (higher rank 3 captures in downturn relative to boom state). Learning that an environment has deteriorated takes longer than learning that it has improved. Error bars illustrate the standard error of the mean across subjects
Fig. 2 Dynamics of a template heart beat (k), as measured by electrocardiogram (ECG; green) and impedance cardiogram (ICG; blue). Pre-ejection period (PEP) indexes sympathetic-mediated myocardial contractility, computed as the time between early ventricular depolarization (point Q on the ECG) and the opening of the aortic valve (point B on the ICG). Note that in our analyses we used a more easily identified ECG landmark for early ventricular depolarization (point R) and reverse-signed each estimate (see Methods). Heart rate (influenced by both sympathetic and parasympathetic activity) is computed as the reciprocal of the R-R intervals
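The two cardiac measures described in the Fig. 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the use of seconds for landmark times, the conversion to beats per minute, and the reading of "reverse-signed" as multiplication by −1 are all assumptions, and the landmark times are made-up values.

```python
from typing import List, Sequence


def heart_rate_bpm(r_peaks_s: Sequence[float]) -> List[float]:
    """Heart rate per beat as the reciprocal of successive R-R intervals."""
    rates = []
    for prev, curr in zip(r_peaks_s, r_peaks_s[1:]):
        rr = curr - prev  # R-R interval in seconds
        if rr <= 0:
            raise ValueError("R-peak times must be strictly increasing")
        rates.append(60.0 / rr)  # assumption: express the reciprocal in beats/min
    return rates


def reverse_signed_pep_ms(r_peak_s: float, b_point_s: float) -> float:
    """R-to-B interval in ms, sign-flipped (assumed meaning of 'reverse-signed').

    After the flip, a larger (less negative) value corresponds to a shorter
    interval, i.e., increased contractility.
    """
    return -(b_point_s - r_peak_s) * 1000.0


# Hypothetical landmark times (seconds) for four consecutive beats.
r_peaks = [0.00, 0.80, 1.65, 2.45]   # ECG R points
b_points = [0.07, 0.86, 1.72, 2.51]  # ICG B points (aortic valve opening)

hr = heart_rate_bpm(r_peaks)
pep = [reverse_signed_pep_ms(r, b) for r, b in zip(r_peaks, b_points)]
print(hr)
print(pep)
```

Using point R rather than point Q shortens every interval by the Q-R delay, so values from this sketch are not directly comparable with Q-to-B estimates.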
Fig. 3 Relationship between autonomic states, value and capture. Panel A: A single model of trial-wise choice returns a significant main effect, where increased contractility (shorter PEP) increases capture (left panel) regardless of value, and a significant interaction, where increased low-value capture is associated with decreased heart rate (HR). Here, value is described as in Eq. [1], i.e., reward relative to opportunity cost in current objective rate of reward, with two levels of the continuous value (0 = low; 2 = high) modelled for illustration. As a guide, gray arrows above the plot describe the direction of increased (+) or decreased (-) drive within the respective physiological variable. Panel B: PEP tracks deterioration in environmental richness. Only trial-wise changes in PEP significantly predicted trial-wise changes in environmental richness; the negative coefficient indicates that decreases in reward rate, i.e., deterioration, increased contractility (shorter PEP). Panel C: PEP predicts optimal learning. Blockwise changes in percentage of rank 3 captures (downturn – boom) are modeled as a function of blockwise changes in PEP (red) and HR (gray), separately for mean physiological changes in the early (0–360 s) and later (360–720 s) portions of blocks. Optimal performance (higher D-B rank 3 score) was predicted by higher relative drive in the early downturn state, relative to early boom, only for PEP. *p<0.05; ***p<0.001
Table 1 Model fitting and parameters for the Symmetry and Asymmetry Models. The table summarizes each model's fitting performance and average parameters
| Model | LOOcv | α | α+ | α- | β0 | β1 |
|---|---|---|---|---|---|---|
| Symmetry Model | 1230.54 | -2.49 [95% CI: ± 0.54] | - | - | 1.80 [95% CI: ± 0.66] | 0.15 [95% CI: ± 0.05] |
| Asymmetry Model | 1076.02** | - | -2.17 [95% CI: ± 0.38] | -2.38 [95% CI: ± 0.49] | -0.15 [95% CI: ± 1.79] | 0.13 [95% CI: ± 0.03] |
LOOcv: leave-one-out cross-validation scores, summed over participants; α: learning rate for both positive and negative prediction errors (Symmetry Model); α+: learning rate for positive prediction errors; α-: average learning rate for negative prediction errors (Asymmetry Model); β0: softmax intercept (bias towards reject); β1: softmax slope (sensitivity to the difference in the value of rejecting versus the value of accepting an option). Data are expressed as mean and 95% confidence intervals (CIs) (calculated as 1.96*standard error). Note: learning rates displayed here (α, α+, and α-) are untransformed parameters from the model fitting procedure; the function 0.5 + 0.5*erf(α/sqrt(2)) is subsequently applied to transform these to conventional learning rates within the range 0–1. **p<0.01 comparing LOOcv scores between the two models, paired-sample t-test
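The transfer function in the table note, 0.5 + 0.5*erf(α/sqrt(2)), is the standard normal cumulative distribution function, so it maps any real-valued fitted parameter into the range 0–1. The sketch below applies it to the group-mean untransformed values from the table and also shows the stated CI convention (1.96*standard error). The function and dictionary names are illustrative, and transforming a group mean is not the same as averaging per-participant transformed rates, so the printed values are only indicative.

```python
import math


def to_learning_rate(alpha_raw: float) -> float:
    """Map an unbounded fitted parameter onto the conventional 0-1 range."""
    return 0.5 + 0.5 * math.erf(alpha_raw / math.sqrt(2))


def ci95_half_width(standard_error: float) -> float:
    """Half-width of a 95% CI under the table's stated convention."""
    return 1.96 * standard_error


# Group-mean untransformed parameters as reported in the table.
fitted = {
    "alpha (Symmetry)": -2.49,
    "alpha+ (Asymmetry)": -2.17,
    "alpha- (Asymmetry)": -2.38,
}

rates = {name: to_learning_rate(value) for name, value in fitted.items()}
for name, rate in rates.items():
    print(f"{name}: raw = {fitted[name]:+.2f}, transformed = {rate:.4f}")
```

Because the transform is monotonically increasing, the ordering of the untransformed parameters (α+ above α-) carries over to the transformed learning rates.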
Model fitting and parameters for physiology-modulated learning models. Each model allows either α+ or α- (untransformed) to be modulated on a trial-by-trial (t) basis by one of the physiological readouts (PEP or HR) through an intercept and a slope (e.g., α+(t) = intercept + w*PEP(t)). A transfer function [0.5 + 0.5*erf(α/sqrt(2))] is then used to convert this to a conventional learning rate. The alternate learning rate is not modulated by physiology and is fit as a single free parameter. The table summarizes each model's fitting performance and average parameters
| Model | LOOcv | α+ | Intercept (α+) | w(α+) | α- | Intercept (α-) | w(α-) | β0 | β1 |
|---|---|---|---|---|---|---|---|---|---|
| PEP Modulate α+ | 1023.63 | - | -2.60 [95% CI: ± 1.23] | 0.002 [95% CI: ± 0.02] | -2.67 [95% CI: ± 0.40] | - | - | -0.19 [95% CI: ± 1.88] | 0.13 [95% CI: ± 0.03] |
| PEP Modulate α- | 1008.02* | -2.41 [95% CI: ± 0.21] | - | - | - | -2.11 [95% CI: ± 1.05] | -0.007 [95% CI: ± 0.01] | -0.30 [95% CI: ± 1.83] | 0.14 [95% CI: ± 0.03] |
| HR Modulate α+ | 1031.42 | - | -2.05 [95% CI: ± 0.96] | -0.49 [95% CI: ± 1.25] | -2.71 [95% CI: ± 0.34] | - | - | -0.22 [95% CI: ± 1.79] | 0.13 [95% CI: ± 0.04] |
| HR Modulate α- | 1082.25 | -2.14 [95% CI: ± 0.34] | - | - | - | -1.50 [95% CI: ± 0.83] | -1.09 [95% CI: ± 1.31] | -0.33 [95% CI: ± 1.97] | 0.13 [95% CI: ± 0.03] |
LOOcv: leave-one-out cross-validation scores, summed over participants; α+: learning rate for positive prediction errors (untransformed); α-: average learning rate for negative prediction errors (untransformed); intercept: unstandardized intercept for regressing learning rate (α+ or α-) against physiology; w: unstandardized slope for regressing learning rate against physiology; β0: softmax intercept (bias towards reject); β1: softmax slope (sensitivity to the difference in the value of rejecting versus the value of accepting an option). Data are expressed as mean and 95% confidence intervals (CIs) (calculated as 1.96*standard error). *p<0.05 comparing LOOcv scores against scores for the basic Asymmetry Model, which does not include physiology measures (see Table 1), paired-sample t-test
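The physiology-modulated learning rate described in the caption can be sketched as follows, using the "PEP Modulate α-" variant as the example. Only the linear form intercept + w*PEP(t) and the erf transfer function come from the text. The delta-rule update of the richness estimate, the function names, and the outcome and PEP values are assumptions made for illustration; the units in which PEP entered the authors' model are not stated here.

```python
import math


def to_learning_rate(alpha_raw: float) -> float:
    """Transfer function from the caption: 0.5 + 0.5*erf(alpha/sqrt(2))."""
    return 0.5 + 0.5 * math.erf(alpha_raw / math.sqrt(2))


def update_richness(rho: float, outcome: float, pep: float,
                    alpha_pos_raw: float, intercept_neg: float,
                    w_neg: float) -> float:
    """One update of the richness estimate rho (assumed delta rule).

    Positive prediction errors use a fixed learning rate; negative ones use
    a rate modulated by the current PEP reading: intercept + w * PEP(t).
    """
    delta = outcome - rho  # prediction error
    if delta >= 0:
        alpha = to_learning_rate(alpha_pos_raw)
    else:
        alpha = to_learning_rate(intercept_neg + w_neg * pep)
    return rho + alpha * delta


# Group-mean parameters from the "PEP Modulate α-" row of the table.
ALPHA_POS_RAW, INTERCEPT_NEG, W_NEG = -2.41, -2.11, -0.007

# Hypothetical outcomes and reverse-signed PEP readings, for illustration only.
outcomes = [0.0, 0.0, 2.0]
peps = [-70.0, -90.0, -80.0]

rho = 1.0
trace = [rho]
for outcome, pep in zip(outcomes, peps):
    rho = update_richness(rho, outcome, pep, ALPHA_POS_RAW, INTERCEPT_NEG, W_NEG)
    trace.append(rho)
print(trace)
```

Setting `w_neg` to zero recovers an unmodulated asymmetric learner with separate fixed rates for positive and negative prediction errors, which is the structure of the basic Asymmetry Model.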