
An information-theoretic approach to curiosity-driven reinforcement learning.

Susanne Still, Doina Precup.

Abstract

We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration-exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration-exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emerging behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.
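As a rough illustration of the Boltzmann-style exploration described in the abstract, the following minimal sketch draws action probabilities proportional to exp((Q + bonus) / temperature). The names `boltzmann_policy`, `sample_action`, `q_values`, `temperature`, and `bonus` are hypothetical, and the bonus values are arbitrary placeholders: the abstract does not give the paper's actual bonus term (which it relates to predictive power), nor its exact weighting, so this should not be read as the authors' method. A uniform prior over actions is also assumed.

```python
import math
import random


def boltzmann_policy(q_values, temperature, bonus=None):
    """Return action probabilities proportional to exp((Q + bonus) / temperature).

    `bonus` is a hypothetical per-action exploration bonus (placeholder only).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if bonus is None:
        bonus = [0.0] * len(q_values)
    scores = [(q + b) / temperature for q, b in zip(q_values, bonus)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(probs, rng=random):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Illustrative action values for a single state (made-up numbers).
q = [1.0, 2.0, 0.5]

# High temperature: probability mass is spread over all actions.
probs_warm = boltzmann_policy(q, temperature=1.0)

# Low temperature: the policy becomes nearly deterministic (greedy in Q).
probs_cold = boltzmann_policy(q, temperature=0.01)

# Low temperature with a bonus: the nearly deterministic choice now follows
# Q + bonus, so the bonus still shapes behavior without any randomness.
probs_bonus = boltzmann_policy(q, temperature=0.01, bonus=[0.0, 0.0, 2.0])

action = sample_action(probs_warm)
```

In this sketch, lowering the temperature removes the randomness but not the influence of the bonus, which mirrors the abstract's point that the exploration-exploitation trade-off persists in the deterministic limit.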

Year: 2012    PMID: 22791268    DOI: 10.1007/s12064-011-0142-z

Source DB: PubMed    Journal: Theory Biosci    ISSN: 1431-7613    Impact factor: 1.919

