Spike-based decision learning of Nash equilibria in two-player games.

Abstract

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change in time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
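
To make the notion of a mixed (stochastic) Nash equilibrium concrete, the sketch below solves a generic 2x2 game by the standard indifference conditions: each player randomizes so that the opponent gains nothing by switching actions. It is not the spiking population model from the paper, and the payoff numbers are hypothetical values chosen for an inspector-style game (wage 4, effort cost 1, inspection cost 1); the function name `mixed_nash_2x2` is likewise an illustrative assumption.

```python
from fractions import Fraction


def mixed_nash_2x2(A, B):
    """Fully mixed Nash equilibrium of a 2x2 bimatrix game.

    A[i][j] is the row player's payoff and B[i][j] the column player's
    payoff when row i meets column j (integer payoffs assumed).
    Returns (p, q): p = P(row player picks row 0),
                    q = P(column player picks column 0).
    Assumes the game has a unique equilibrium in strictly mixed strategies.
    """
    # The row player's mix p must leave the column player indifferent:
    #   p*B[0][0] + (1-p)*B[1][0] == p*B[0][1] + (1-p)*B[1][1]
    denom_p = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    # The column player's mix q must leave the row player indifferent:
    #   q*A[0][0] + (1-q)*A[0][1] == q*A[1][0] + (1-q)*A[1][1]
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if denom_p == 0 or denom_q == 0:
        raise ValueError("indifference conditions are degenerate")
    p = Fraction(B[1][1] - B[1][0], denom_p)
    q = Fraction(A[1][1] - A[0][1], denom_q)
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no strictly mixed equilibrium")
    return p, q


# Hypothetical inspector-style payoffs (not taken from the paper).
# Rows: employee works / shirks. Columns: employer inspects / does not.
employee = [[3, 3],
            [0, 4]]
employer = [[1, 2],
            [-1, -4]]

p_work, q_inspect = mixed_nash_2x2(employee, employer)
print(f"P(work) = {p_work}, P(inspect) = {q_inspect}")
```

With these assumed payoffs, neither player has a pure best response that the other would tolerate, so any learner that converges to a deterministic policy is exploitable. A learning rule has to settle on the right action probabilities rather than on a single action.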

Year: 2012 PMID: 23028289 PMCID: PMC3459907 DOI: 10.1371/journal.pcbi.1002691

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

29 in total

1. Learning spike-based population codes by reward and population feedback.

Authors: Johannes Friedrich; Robert Urbanczik; Walter Senn
Journal: Neural Comput Date: 2010-07 Impact factor: 2.026

2. Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning.

Authors: Jean-Pascal Pfister; Taro Toyoizumi; David Barber; Wulfram Gerstner
Journal: Neural Comput Date: 2006-06 Impact factor: 2.026

3. Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity.

Authors: Yonatan Loewenstein; H Sebastian Seung
Journal: Proc Natl Acad Sci U S A Date: 2006-09-28 Impact factor: 11.205

4. Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity.

5. A spiking neural network model of an actor-critic learning agent.

6. Reinforcement learning in populations of spiking neurons.

Review 7. A neural substrate of prediction and reward.

8. Synaptic theory of replicator-like melioration.

9. An electrophysiological analysis of coaching in Blackjack.

10. Spatio-temporal credit assignment in neuronal population learning.

Review 1. Building functional networks of spiking model neurons.

2. Goal-Directed Decision Making with Spiking Neurons.