| Literature DB >> 33286724 |
Xiaoling Mo1, Daoyun Xu1, Zufeng Fu1,2.
Abstract
In a general Markov decision process (MDP) system, only one agent's learning evolution is considered. However, modeling the learning evolution of a single agent is limiting in many problems, and more and more applications involve multiple agents, whose interactions may be either cooperative or game-theoretic. Therefore, this paper introduces a Cooperation Markov Decision Process (CMDP) system with two agents, which is suitable for the learning evolution of cooperative decisions between two agents. It is further shown that the value function in the CMDP system converges, and that the convergence value is independent of the choice of the initial value function. This paper presents an algorithm for finding the optimal strategy pair (πk0, πk1) in the CMDP system, whose fundamental task is to find an optimal strategy pair and form an evolutionary system CMDP(πk0, πk1). Finally, an example is given to support the theoretical results.
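The abstract's two central claims, that the cooperative value function converges and that the limit is independent of the initial value function, can be illustrated with a minimal value-iteration sketch for a two-agent cooperative MDP. Everything below (the toy transition/reward tables, state and action counts) is an illustrative assumption, not the paper's own construction; only the joint-action Bellman iteration and the initialization-independence check reflect the abstract.

```python
import numpy as np

# Hypothetical 2-state cooperative MDP: two agents pick a joint action
# (a0, a1), receive one shared reward, and a single social value function
# V is iterated. All numbers are randomly generated for illustration.
n_states = 2
joint_actions = [(a0, a1) for a0 in range(2) for a1 in range(2)]
gamma = 0.9  # discount factor

rng = np.random.default_rng(0)
# P[s][(a0, a1)] -> next-state distribution; R[s][(a0, a1)] -> shared reward
P = {s: {a: rng.dirichlet(np.ones(n_states)) for a in joint_actions}
     for s in range(n_states)}
R = {s: {a: rng.uniform(0.0, 1.0) for a in joint_actions}
     for s in range(n_states)}

def value_iteration(V0, tol=1e-10):
    """Iterate the cooperative Bellman optimality operator to a fixed point."""
    V = np.array(V0, dtype=float)
    while True:
        V_new = np.array([
            max(R[s][a] + gamma * P[s][a] @ V for a in joint_actions)
            for s in range(n_states)
        ])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# The Bellman operator is a gamma-contraction, so iteration converges
# to the same fixed point regardless of the initial value function:
V_a = value_iteration([0.0, 0.0])
V_b = value_iteration([100.0, -50.0])
print(np.allclose(V_a, V_b, atol=1e-6))  # same limit from both starts
```

The greedy joint action at each state of the converged V then yields an optimal strategy pair in the sense the abstract describes, though the paper's actual algorithm for constructing (πk0, πk1) may differ from this plain value-iteration sketch.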
Keywords: cooperation markov decision process; multi-agent; optimal pair of strategies; reinforcement learning
Year: 2020 PMID: 33286724 PMCID: PMC7597243 DOI: 10.3390/e22090955
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. V-function convergence diagram.
Figure 2. Optimal strategy solution algorithm framework.
Figure 3. Convergence diagram of the social value function of the Cooperation Markov Decision Process (CMDP) system.
Figure 4. Optimal strategy solution algorithm framework of the CMDP system.
Figure 5. The environment of the two agents in this example.