Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning.

Kazunori Iwata

Abstract

Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.
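
For readers unfamiliar with the baseline, the following is a minimal sketch of standard softmax (Boltzmann) action selection. It is not the extended setting proposed in the paper, which the abstract does not spell out. The single tunable parameter is assumed here to be a temperature, and the names `softmax_probabilities`, `select_action`, `action_values`, and `temperature` are illustrative, not taken from the paper.

```python
import math
import random


def softmax_probabilities(action_values, temperature):
    """Return softmax (Boltzmann) probabilities over action-value estimates.

    Assumption: the single parameter is a positive temperature; the paper
    may parameterize the method differently.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    max_value = max(action_values)
    weights = [math.exp((q - max_value) / temperature) for q in action_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(action_values, temperature, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(action_values, temperature)
    return rng.choices(range(len(action_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q_estimates = [1.0, 2.0, 3.0]  # hypothetical action-value estimates
    for tau in (0.1, 1.0, 10.0):
        print(tau, softmax_probabilities(q_estimates, tau))
```

A low temperature makes selection nearly greedy, while a high temperature makes it nearly uniform. Under this reading, the "bandwidth of good parameters" discussed in the abstract would correspond to the range of temperatures over which learning performs well.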

Year:  2016        PMID: 27187974     DOI: 10.1109/TNNLS.2016.2558295

Source DB:  PubMed          Journal:  IEEE Trans Neural Netw Learn Syst        ISSN: 2162-237X            Impact factor:   10.451


