
Multiagent reinforcement learning in the Iterated Prisoner's Dilemma.

T W Sandholm; R H Crites.

Abstract

Reinforcement learning (RL) is based on the idea that the tendency to produce an action should be strengthened (reinforced) if it produces favorable results, and weakened if it produces unfavorable results. Q-learning is a recent RL algorithm that does not need a model of its environment and can be used on-line. Therefore, it is well suited for use in repeated games against an unknown opponent. Most RL research has been confined to single-agent settings or to multiagent settings where the agents have totally positively correlated payoffs (team problems) or totally negatively correlated payoffs (zero-sum games). This paper is an empirical study of reinforcement learning in the Iterated Prisoner's Dilemma (IPD), where the agents' payoffs are neither totally positively nor totally negatively correlated. RL is considerably more difficult in such a domain. This paper investigates the ability of a variety of Q-learning agents to play the IPD game against an unknown opponent. In some experiments, the opponent is the fixed strategy Tit-For-Tat, while in others it is another Q-learner. All the Q-learners learned to play optimally against Tit-For-Tat. Playing against another learner was more difficult because the adaptation of the other learner created a non-stationary environment, and because the other learner was not endowed with any a priori knowledge about the IPD game such as a policy designed to encourage cooperation. The learners that were studied varied along three dimensions: the length of history they received as context, the type of memory they employed (lookup tables based on restricted history windows or recurrent neural networks that can theoretically store features from arbitrarily deep in the past), and the exploration schedule they followed. Although all the learners faced difficulties when playing against other learners, agents with longer history windows, lookup table memories, and longer exploration schedules fared best in the IPD games.
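The setup the abstract describes can be made concrete with a small sketch. The code below is not the authors' implementation; it is a minimal tabular Q-learner whose state is the previous joint action (a one-step history window), trained with a linearly decaying epsilon-greedy exploration schedule against a fixed Tit-For-Tat opponent. The payoff values (T=5, R=3, P=1, S=0), learning parameters, and function names are illustrative assumptions, not values taken from the paper.

import random

# Iterated Prisoner's Dilemma payoffs for the row player
# (illustrative values: T=5, R=3, P=1, S=0); actions: 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

def tit_for_tat(opponent_previous_move):
    # Tit-For-Tat: cooperate on the first move, then copy the opponent's last move.
    return 0 if opponent_previous_move is None else opponent_previous_move

def q_learn_vs_tft(rounds=20000, alpha=0.1, gamma=0.95, eps_start=1.0, eps_end=0.01):
    # Tabular Q-learner whose state is the previous joint action (one-step history window).
    q = {}
    state = ("start",)          # sentinel state before any move has been played
    my_last = None
    for t in range(rounds):
        # linearly decaying epsilon-greedy exploration schedule
        eps = eps_start + (eps_end - eps_start) * t / rounds
        q.setdefault(state, [0.0, 0.0])
        if random.random() < eps:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q[state][a])
        opp_action = tit_for_tat(my_last)      # TFT reacts to the learner's previous move
        reward = PAYOFF[(action, opp_action)]
        next_state = (action, opp_action)
        q.setdefault(next_state, [0.0, 0.0])
        # standard one-step Q-learning update
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state, my_last = next_state, action
    return q

if __name__ == "__main__":
    table = q_learn_vs_tft()
    for s, (qc, qd) in sorted(table.items(), key=str):
        print(f"state={s}: Q(cooperate)={qc:.2f}  Q(defect)={qd:.2f}")

With a sufficiently high discount factor, always cooperating is the best response to Tit-For-Tat, which is consistent with the abstract's report that all the Q-learners learned to play optimally against that opponent; the longer-history and recurrent-network variants studied in the paper differ mainly in how this state is represented.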

MeSH:

Year:  1996        PMID: 8924633     DOI: 10.1016/0303-2647(95)01551-5

Source DB:  PubMed          Journal:  Biosystems        ISSN: 0303-2647            Impact factor:   1.973


  8 in total

1.  Game theory and neural basis of social decision making. (Review)

Authors:  Daeyeol Lee
Journal:  Nat Neurosci       Date:  2008-03-26       Impact factor: 24.884

2.  Special agents can promote cooperation in the population.

Authors:  Xin Wang; Jing Han; Huawei Han
Journal:  PLoS One       Date:  2011-12-21       Impact factor: 3.240

3.  Sustainability is possible despite greed - Exploring the nexus between profitability and sustainability in common pool resource systems.

Authors:  Friedrich Burkhard von der Osten; Michael Kirley; Tim Miller
Journal:  Sci Rep       Date:  2017-05-23       Impact factor: 4.379

4.  Confronting barriers to human-robot cooperation: balancing efficiency and risk in machine behavior.

Authors:  Tim Whiting; Alvika Gautam; Jacob Tye; Michael Simmons; Jordan Henstrom; Mayada Oudah; Jacob W Crandall
Journal:  iScience       Date:  2020-12-17

5.  Nash equilibria in human sensorimotor interactions explained by Q-learning with intrinsic costs.

Authors:  Cecilia Lindig-León; Gerrit Schmid; Daniel A Braun
Journal:  Sci Rep       Date:  2021-10-21       Impact factor: 4.379

6.  Learning and innovative elements of strategy adoption rules expand cooperative network topologies.

Authors:  Shijun Wang; Máté S Szalay; Changshui Zhang; Peter Csermely
Journal:  PLoS One       Date:  2008-04-09       Impact factor: 3.240

7.  A game theoretic framework for incentive-based models of intrinsic motivation in artificial systems.

Authors:  Kathryn E Merrick; Kamran Shafi
Journal:  Front Psychol       Date:  2013-10-30

8.  Cooperating with machines.

Authors:  Jacob W Crandall; Mayada Oudah; Fatimah Ishowo-Oloko; Sherief Abdallah; Jean-François Bonnefon; Manuel Cebrian; Azim Shariff; Michael A Goodrich; Iyad Rahwan
Journal:  Nat Commun       Date:  2018-01-16       Impact factor: 14.919

