Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning.

Daming Shi¹, Xudong Guo¹, Yi Liu¹, Wenhui Fan¹.

Abstract

Poker is regarded as a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, features it shares with many real-world problems such as auctioning, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy is wise in multi-player games, and it is infeasible to validate theoretically whether a policy is optimal. Designing an effective method for learning an optimal policy therefore has greater practical significance. This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. First, it builds an Actor network that makes decisions with imperfect information and a Critic network that evaluates policies with perfect information. Second, it proposes two novel policy update methods for multi-player poker: the asynchronous policy update algorithm (APU) for multi-player multi-policy scenarios and the dual-network asynchronous policy update algorithm (Dual-APU) for multi-player sharing-policy scenarios. Finally, it uses the most popular variant, six-player Texas hold 'em, to validate the performance of the proposed method. The experiments demonstrate that the policies learned by the proposed methods perform well and gain steadily compared with existing approaches. In sum, policy learning methods for imperfect-information games based on Actor-Critic reinforcement learning perform well on poker and can be transferred to other imperfect-information games. Training with perfect information and testing with imperfect information in this way offers an effective and explainable approach to learning an approximately optimal policy.
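
The split between an Actor that sees only imperfect information and a Critic that sees perfect information can be illustrated with a minimal sketch. This is not the paper's network architecture, nor its APU or Dual-APU algorithms: the linear models, the three-action set, the feature vectors, the reward table, and every name below are hypothetical choices made only to show the idea. The actor chooses from the acting player's own observation, while the critic scores the full state (including an opponent's hidden hand strength) and supplies the baseline for the policy-gradient step.

```python
import math
import random

ACTIONS = ["fold", "call", "raise"]  # hypothetical, simplified action set


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class Actor:
    """Linear softmax policy over the player's own (imperfect) observation."""

    def __init__(self, obs_dim, n_actions, lr=0.05):
        self.w = [[0.0] * obs_dim for _ in range(n_actions)]
        self.lr = lr

    def probs(self, obs):
        return softmax([dot(row, obs) for row in self.w])

    def act(self, obs, rng):
        return rng.choices(range(len(self.w)), weights=self.probs(obs))[0]

    def update(self, obs, action, advantage):
        # Policy-gradient step: grad log pi(a|obs) = (1[a == k] - pi_k) * obs
        p = self.probs(obs)
        for k, row in enumerate(self.w):
            coeff = ((1.0 if k == action else 0.0) - p[k]) * advantage * self.lr
            for i, x in enumerate(obs):
                row[i] += coeff * x


class Critic:
    """Linear value estimate over the full (perfect-information) state."""

    def __init__(self, state_dim, lr=0.05):
        self.w = [0.0] * state_dim
        self.lr = lr

    def value(self, state):
        return dot(self.w, state)

    def update(self, state, target):
        err = target - self.value(state)
        for i, x in enumerate(state):
            self.w[i] += self.lr * err * x


def train_step(actor, critic, obs, state, reward_of, rng):
    """One-step episode: act on obs, use the critic's view of state as baseline."""
    action = actor.act(obs, rng)
    r = reward_of[action]
    advantage = r - critic.value(state)  # critic sees hidden information
    actor.update(obs, action, advantage)
    critic.update(state, r)
    return action


def demo(seed=0, steps=2000):
    rng = random.Random(seed)
    obs = [1.0, 0.9]         # assumed features: bias, own hand strength
    state = [1.0, 0.9, 0.2]  # the same, plus the opponent's hidden hand strength
    reward_of = [-1.0, 0.0, 1.0]  # toy payoffs for fold, call, raise
    actor, critic = Actor(len(obs), len(ACTIONS)), Critic(len(state))
    for _ in range(steps):
        train_step(actor, critic, obs, state, reward_of, rng)
    return actor.probs(obs), critic.value(state)
```

Calling `demo()` returns the actor's action probabilities and the critic's value estimate for the toy situation. At play time only `Actor.probs` is needed, so the hidden opponent feature in `state` is never required outside training.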


Keywords: Actor-Critic; multi-agent; multi-player; optimal policy; poker; reinforcement learning

Year: 2022 PMID: 35741495 PMCID: PMC9222241 DOI: 10.3390/e24060774

Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
