Abstract
The combination of deep neural networks and reinforcement learning has received growing attention in recent years, and the focus of reinforcement learning research is gradually shifting from the single-agent setting to the multiagent setting. Regret minimization is a relatively new concept in game theory. In some games where the Nash equilibrium is not the optimal solution, regret minimization performs better. Herein, we introduce regret minimization into multiagent reinforcement learning and propose a multiagent regret minimization algorithm. We first introduce the Nash Q-learning algorithm, then use its overall framework to bring regret minimization into multiagent reinforcement learning, and finally verify the effectiveness of the algorithm experimentally.
Year: 2022 PMID: 36156954 PMCID: PMC9507689 DOI: 10.1155/2022/8683616
Source DB: PubMed Journal: Comput Intell Neurosci
Definition of Q value under different algorithms.
| | Single | Nash | Regret |
|---|---|---|---|
| Updated | Largest value under the next state | Product of agent's united Nash strategy and | |
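For orientation, the first two columns can be related to the standard textbook update rules. The block below is a sketch drawn from the general Q-learning and Nash Q-learning literature, not a reconstruction of the cells missing from this record; the symbols $\alpha$ (learning rate), $\gamma$ (discount factor), $r$ (reward), and $\pi^j$ (agent $j$'s Nash strategy) are assumed notation. The regret column is left out because its definition does not survive here.

```latex
% Single-agent Q-learning: bootstrap from the largest value in the next state
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a)
  + \alpha\Bigl[r + \gamma \max_{a'} Q(s',a')\Bigr]

% Nash Q-learning for agent i: bootstrap from the Nash value of the next state
Q_i(s,a^1,\dots,a^n) \leftarrow (1-\alpha)\,Q_i(s,a^1,\dots,a^n)
  + \alpha\Bigl[r_i + \gamma\,\mathrm{Nash}Q_i(s')\Bigr],
\qquad
\mathrm{Nash}Q_i(s') = \pi^1(s')\cdots\pi^n(s')\cdot Q_i(s')
```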
Figure 1. Experimental results of the traveler game.
Figure 2. Flow diagram of the centipede game.
Regret matrix of the first round.
| Player 1 \ Player 2 | 2 (2) | 4 (1) | 6 (1) |
|---|---|---|---|
| 1 (4) | 0,0 | 2,0 | 4,0 |
| 3 (2) | 1,0 | 0,1 | 2,1 |
| 5 (1) | 1,2 | 1,0 | 0,1 |
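The parenthesized number next to each action appears to be that action's maximum regret over the opponent's choices. The short Python sketch below checks this reading against the table entries and then picks each player's regret-minimizing actions. The data layout and the helper names `max_regrets` and `minimizers` are illustrative assumptions, not part of the original algorithm description.

```python
# Regret pairs (player 1, player 2) copied from the first-round table.
# Keys are (player 1 action, player 2 action).
regret = {
    (1, 2): (0, 0), (1, 4): (2, 0), (1, 6): (4, 0),
    (3, 2): (1, 0), (3, 4): (0, 1), (3, 6): (2, 1),
    (5, 2): (1, 2), (5, 4): (1, 0), (5, 6): (0, 1),
}
p1_actions = [1, 3, 5]
p2_actions = [2, 4, 6]


def max_regrets(regret, p1_actions, p2_actions):
    """Return each action's worst-case regret for both players."""
    # Player 1: maximum of the first entry along each row.
    p1 = {a: max(regret[(a, b)][0] for b in p2_actions) for a in p1_actions}
    # Player 2: maximum of the second entry along each column.
    p2 = {b: max(regret[(a, b)][1] for a in p1_actions) for b in p2_actions}
    return p1, p2


def minimizers(max_regret):
    """Return the actions whose worst-case regret is smallest."""
    best = min(max_regret.values())
    return [a for a, r in max_regret.items() if r == best]


p1_max, p2_max = max_regrets(regret, p1_actions, p2_actions)
print(p1_max)  # {1: 4, 3: 2, 5: 1}
print(p2_max)  # {2: 2, 4: 1, 6: 1}
print(minimizers(p1_max), minimizers(p2_max))  # [5] [4, 6]
```

Under this reading, player 1 keeps action 5 and player 2 keeps actions 4 and 6, which matches the rows and columns that remain in the second-round matrix.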
Regret matrix of the second round.
| Player 1 \ Player 2 | 4 (0) | 6 (1) |
|---|---|---|
| 5 (0) | 0,0 | 0,1 |
Figure 3. Experimental results of the centipede game.
Figure 4. Game rules of grid-world.
Figure 5. Experimental results of grid-world.