A statistical property of multiagent learning based on Markov decision process.

Kazunori Iwata, Kazushi Ikeda, Hideaki Sakai.

Abstract

We exhibit an important property called the asymptotic equipartition property (AEP) on empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine how the conditions among the agents affect the achievement of a cooperative policy in three different cases: blind, visible, and communicable. We also derive a bound on the speed with which the empirical sequence converges in probability to the best sequence, so that the multiagent learning yields the best cooperative result.
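
As a rough illustration of what the AEP asserts, the sketch below simulates a single ergodic Markov chain and compares the per-symbol negative log-probability of a long sample path with the chain's entropy rate. It is not the multiagent MDP setting of the paper: the two-state transition matrix `P`, the helper names, the sample length, and the seed are all hypothetical choices made only for this example.

```python
import math
import random

# Hypothetical 2-state ergodic transition matrix (an assumption, not from the paper).
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P):
    """Stationary distribution of a 2-state chain, in closed form."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]

def entropy_rate(P):
    """Entropy rate in nats: -sum_i pi_i sum_j P_ij log P_ij."""
    pi = stationary(P)
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(2) for j in range(2) if P[i][j] > 0)

def empirical_rate(P, n, seed=0):
    """Return -(1/n) log Pr(x_1, ..., x_n) for one sampled path of length n."""
    rng = random.Random(seed)
    pi = stationary(P)
    state = 0 if rng.random() < pi[0] else 1
    log_prob = math.log(pi[state])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[state][0] else 1
        log_prob += math.log(P[state][nxt])
        state = nxt
    return -log_prob / n

if __name__ == "__main__":
    print("entropy rate  :", entropy_rate(P))
    print("empirical rate:", empirical_rate(P, 100_000))
```

For long paths the two printed values should be close, which is the sense in which almost all empirical sequences are "typical".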

Year: 2006 PMID: 16856649 DOI: 10.1109/TNN.2006.875990

Source DB: PubMed Journal: IEEE Trans Neural Netw ISSN: 1045-9227

Cited

1 in total

1. The Convergence of a Cooperation Markov Decision Process System.

Authors: Xiaoling Mo; Daoyun Xu; Zufeng Fu
Journal: Entropy (Basel) Date: 2020-08-30 Impact factor: 2.524