Spike-based decision learning of Nash equilibria in two-player games.

Johannes Friedrich, Walter Senn.

Abstract

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change over time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
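To make the notion of a mixed (stochastic) Nash equilibrium concrete, the following minimal sketch solves a 2x2 inspector-style game by the standard indifference conditions. The payoff values and the function name `mixed_equilibrium_2x2` are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not implement the spiking-neuron learning rule.

```python
def mixed_equilibrium_2x2(A, B):
    """Return (p, q) for a 2x2 game with a fully mixed Nash equilibrium.

    A[i][j] is the row player's payoff and B[i][j] the column player's
    payoff when row plays action i and column plays action j.
    p = probability that row plays action 0,
    q = probability that column plays action 0.
    """
    # q is chosen so that the row player is indifferent between its actions.
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # p is chosen so that the column player is indifferent between its actions.
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[1][0] - B[0][1] + B[1][1])
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("no fully mixed equilibrium for these payoffs")
    return p, q


# Hypothetical payoffs (assumption, not from the paper).
# Rows: employee plays shirk (0) or work (1).
# Columns: employer plays inspect (0) or not inspect (1).
employee = [[0, 4],
            [2, 2]]
employer = [[-1, -4],
            [1, 2]]

p_shirk, q_inspect = mixed_equilibrium_2x2(employee, employer)
print(p_shirk, q_inspect)  # 0.25 0.5
```

With these illustrative payoffs no pure strategy pair is stable, since each player would deviate from every deterministic outcome, so the only equilibrium is the stochastic one: the employee shirks with probability 0.25 and the employer inspects with probability 0.5.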

Year: 2012; PMID: 23028289; PMCID: PMC3459907; DOI: 10.1371/journal.pcbi.1002691

Source DB: PubMed; Journal: PLoS Comput Biol; ISSN: 1553-734X; Impact factor: 4.475


