
From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning.

Stefan Elfwing, Eiji Uchibe, Kenji Doya.

Abstract

Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method works well only with binary, or close-to-binary, state input in which the number of active states is smaller than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and robustness of the FERL method can be improved by scaling the free energy by a constant related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL) instead of the negative free energy; the proposed method also handles continuous state input. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions.
Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
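The distinction the abstract draws can be sketched numerically. For an RBM with visible state s, hidden bias c, visible bias b, and weights W, the free energy is F(s) = -b·s - Σ_k log(1 + exp(c_k + W_k·s)), while the expected energy under p(h|s) replaces each softplus term with σ(pre_k)·pre_k. The sketch below (a minimal illustration, not the paper's implementation; all variable names and the toy parameter values are assumptions) contrasts the two value estimates, V(s) = -F(s) for FERL versus V(s) = -⟨E(s,h)⟩ for EERL:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_free_energy(s, W, b, c):
    """FERL-style value estimate V(s) = -F(s), where
    F(s) = -b.s - sum_k log(1 + exp(c_k + W_k.s))."""
    pre = c + W @ s                       # hidden pre-activations
    return b @ s + np.sum(np.log1p(np.exp(pre)))

def negative_expected_energy(s, W, b, c):
    """EERL-style value estimate V(s) = -<E(s,h)>_{p(h|s)}, where
    <E> = -b.s - sum_k sigmoid(pre_k) * pre_k."""
    pre = c + W @ s
    return b @ s + np.sum(sigmoid(pre) * pre)

# Toy parameters (illustrative only): 2 visible units, 2 hidden units.
W = np.array([[1.0, -0.5],
              [0.3,  0.2]])
b = np.array([0.1, 0.2])
c = np.array([0.0, 0.1])
s = np.array([1.0, 0.0])

v_ferl = negative_free_energy(s, W, b, c)
v_eerl = negative_expected_energy(s, W, b, c)
```

Since the free energy equals the expected energy minus the (non-negative) entropy of p(h|s), the negative free energy always upper-bounds the negative expected energy; the two estimates coincide only when the hidden units are saturated.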


Keywords:  Expected energy; Function approximation; Reinforcement learning; Restricted Boltzmann machine; SZ-Tetris


Year:  2016        PMID: 27639720     DOI: 10.1016/j.neunet.2016.07.013

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


Related articles: 4 in total

1.  Bayesian mechanics of perceptual inference and motor control in the brain.

Authors:  Chang Sub Kim
Journal:  Biol Cybern       Date:  2021-01-20       Impact factor: 2.086

2.  Dark control: The default mode network as a reinforcement learning agent.

Authors:  Elvis Dohmatob; Guillaume Dumas; Danilo Bzdok
Journal:  Hum Brain Mapp       Date:  2020-06-05       Impact factor: 5.038

3. [Review] Variational ecology and the physics of sentient systems.

Authors:  Maxwell J D Ramstead; Axel Constant; Paul B Badcock; Karl J Friston
Journal:  Phys Life Rev       Date:  2019-01-07       Impact factor: 11.025

4.  Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning.

Authors:  Shota Ohnishi; Eiji Uchibe; Yotaro Yamaguchi; Kosuke Nakanishi; Yuji Yasui; Shin Ishii
Journal:  Front Neurorobot       Date:  2019-12-10       Impact factor: 2.650

