Michael R W Dawson, Brian Dupuis, Marcia L Spetch, Debbie M Kelly.
Abstract
The matching law (Herrnstein 1961) states that response rates become proportional to reinforcement rates; this is related to the empirical phenomenon called probability matching (Vulkan 2000). Here, we show that a simple artificial neural network generates responses consistent with probability matching. This behavior was then used to create an operant procedure for network learning. We use the multiarmed bandit (Gittins 1989), a classic problem of choice behavior, to illustrate that operant training balances exploiting the bandit arm expected to pay off most frequently with exploring other arms. Perceptrons provide a medium for relating results from neural networks, genetic algorithms, animal learning, contingency theory, reinforcement learning, and theories of choice.
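The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, only an assumed setup: each bandit arm is signalled by its own cue, a logistic output unit has one weight per cue, the arm is chosen with probability proportional to the unit's activation for that cue (the operant step), and only the chosen arm's weight is updated with a delta-rule-style step. The function name `train_bandit`, the learning rate, and the reward probabilities are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def train_bandit(reward_probs, trials=20000, lr=0.05, seed=0):
    """Operant-style training of a one-layer logistic network on a bandit.

    Assumptions (not taken from the paper): one cue and one weight per arm,
    choice probability proportional to activation, and an update of
    lr * (reward - activation) applied only to the chosen arm.
    Returns the final activation for each arm and how often each was chosen.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    weights = [0.0] * n_arms   # activation starts at 0.5 for every arm
    counts = [0] * n_arms

    for _ in range(trials):
        acts = [sigmoid(w) for w in weights]
        # Operant step: the network's own activations decide which arm is pulled.
        arm = rng.choices(range(n_arms), weights=acts)[0]
        counts[arm] += 1
        # The arm pays off with its (hidden) reward probability.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # The expected update is zero when the activation equals the reward probability.
        weights[arm] += lr * (reward - acts[arm])

    return [sigmoid(w) for w in weights], counts


if __name__ == "__main__":
    acts, counts = train_bandit([0.2, 0.5, 0.8])
    print("activations:", [round(a, 2) for a in acts])
    print("choice counts:", counts)
```

Under these assumptions each activation drifts toward its arm's reward probability, so the unit's responses match probabilities. Because choices are proportional to activation, the best arm is pulled most often (exploitation) while the other arms are still sampled (exploration).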
Year: 2009 PMID: 19596631 DOI: 10.1109/TNN.2009.2025588
Source DB: PubMed Journal: IEEE Trans Neural Netw ISSN: 1045-9227