Noah Zarr, William H Alexander, Joshua W Brown.
Abstract
Humans are known to discount future rewards hyperbolically in time. Nevertheless, a formal recursive model of hyperbolic discounting has been elusive until recently, with the introduction of the hyperbolically discounted temporal difference (HDTD) model. Prior to that, models of learning (especially reinforcement learning) have relied on exponential discounting, which generally provides poorer fits to behavioral data. Recently, it has been shown that hyperbolic discounting can also be approximated by a summed distribution of exponentially discounted values, instantiated in the μAgents model. The HDTD model and the μAgents model differ in one key respect, namely how they treat sequences of rewards. The μAgents model is a particular implementation of a Parallel discounting model, which values sequences based on the summed value of the individual rewards, whereas the HDTD model contains a non-linear interaction. To discriminate among these models, we observed how subjects discounted a sequence of three rewards, and then we tested how well each candidate model fit the subject data. The results show that the Parallel model generally provides a better fit to the human data.
Keywords: Parallel model; behavioral research; discounting; exponential discounting; hyperbolic discounting; model fitting; recursive model; temporal difference learning
Year: 2014 PMID: 24639662 PMCID: PMC3944395 DOI: 10.3389/fpsyg.2014.00178
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. An example display of a choice as seen by participants.
Figure 2. Each plot displays each participant's discounting curve for a particular delayed reward schedule. The vertical axis represents the proportion of total available reward subjects would have to be given immediately for them to be indifferent between the immediate and delayed rewards. The horizontal axis represents the delay before the first reward in the sequence would be received. The black line shows the mean responses with error bars showing the standard deviation of responses for that particular spacing and onset time.
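As a rough illustration of the quantity plotted on the vertical axis of Figure 2, the sketch below computes an indifference proportion for a reward sequence under a Parallel-style valuation, in which each reward is discounted on its own and the discounted values are summed. The hyperbolic form A / (1 + kD) and the exponential form A · exp(−kD) are the standard textbook discount functions, assumed here rather than taken from the fitted models. The function names, the value of `k`, and the use of months as the delay unit are illustrative assumptions.

```python
import math


def hyperbolic(amount, delay, k):
    """Standard hyperbolic discounting: A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def exponential(amount, delay, k):
    """Standard exponential discounting: A * exp(-k * D)."""
    return amount * math.exp(-k * delay)


def reward_sequence(amount, onset, spacing, n=3):
    """Build n equal rewards: the first at `onset`, the rest `spacing` apart."""
    return [(amount, onset + i * spacing) for i in range(n)]


def parallel_value(rewards, discount, k):
    """Parallel-style valuation: sum of individually discounted rewards."""
    return sum(discount(amount, delay, k) for amount, delay in rewards)


def indifference_proportion(rewards, discount, k):
    """Discounted value as a fraction of the total undiscounted reward."""
    total = sum(amount for amount, _ in rewards)
    return parallel_value(rewards, discount, k) / total


if __name__ == "__main__":
    # Hypothetical discount rate; not a fitted value from the study.
    k = 0.1
    # Three $1000 rewards, first after 12 months, then every 12 months.
    seq = reward_sequence(1000.0, onset=12, spacing=12)
    print(indifference_proportion(seq, hyperbolic, k))
    print(indifference_proportion(seq, exponential, k))
```

Because the valuation is a plain sum, swapping `hyperbolic` for `exponential` changes only the per-reward curve. A model with a non-linear interaction between rewards, such as HDTD, could not be written as this kind of sum.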
Model fitting results.
| Parallel2 | 321.87 | 6.19 | 5 (20%) |
| Mixed | 322.37 | 6.16 | 3 (12%) |
| Parallel | 324.61 | 6.11 | 2 (8%) |
| Exponential | 325.85 | 5.91 | 7 (28%) |
| HDTD | 327.02 | 5.77 | 3 (12%) |
| HDTD2 | 327.85 | 5.87 | 0 (0%) |
| Serial | 333.76 | 6.08 | 5 (20%) |
Pairwise comparisons between models based on BIC scores.
| HDTD | |||||||
| Parallel | |||||||
| Exponential | |||||||
| HDTD2 | |||||||
| Parallel2 | |||||||
| Mixed | |||||||
| Serial |
An asterisk indicates significance. For significant comparisons, the winning model is in parentheses.
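For readers unfamiliar with BIC-based comparison, the following minimal sketch shows the standard definition, BIC = p · ln(n) − 2 · ln(L), where lower scores indicate a better fit after penalizing parameter count, together with a per-subject tally of the winning model. How the likelihoods were obtained in the study is not stated in this record, so the helper names and the example scores are hypothetical.

```python
import math
from collections import Counter


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model_counts(bic_by_subject):
    """Count, per model, how many subjects it fits best (lowest BIC).

    bic_by_subject: list of dicts mapping model name -> BIC score,
    one dict per subject.
    """
    return Counter(min(scores, key=scores.get) for scores in bic_by_subject)


if __name__ == "__main__":
    # Made-up scores for three hypothetical subjects, for illustration only.
    scores = [
        {"Parallel": 320.0, "HDTD": 325.0},
        {"Parallel": 331.0, "HDTD": 328.5},
        {"Parallel": 318.2, "HDTD": 319.0},
    ]
    print(best_model_counts(scores))
```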
Summary statistics of the best-fitting discount parameters for Parallel model in each condition.
| $3000 single reward | 0.135 | 0.484 |
| Three $1000 rewards with 12 month spacing | 0.262 | 0.728 |
| Three $1000 rewards with 60 month spacing | 0.189 | 0.671 |
| $1000 single reward | 0.092 | 0.244 |
Pairwise comparisons between the best-fitting discount parameters for the Parallel model in each condition.
| $3000 single | ||||
| 12 mo. spacing | ||||
| 60 mo. spacing | ||||
| $1000 single |