
Lipschitzness is all you need to tame off-policy generative adversarial imitation learning.

Lionel Blondé, Pablo Strasser, Alexandros Kalousis.

Abstract

Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and often depend on engineering feats that are essential to their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. We show that forcing the learned reward function to be locally Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. We complement these guarantees with empirical evidence attesting to the strong positive effect that the consistent satisfaction of the Lipschitzness constraint on the reward has on imitation performance. Finally, we tackle a generic pessimistic reward preconditioning add-on spawning a large class of reward shaping methods, which makes the base method it is plugged into provably more robust, as shown in several additional theoretical guarantees. We then discuss these through a fine-grained lens and share our insights. Crucially, the guarantees derived and reported in this work are valid for any reward satisfying the Lipschitzness condition; nothing is specific to imitation. As such, these may be of independent interest.
© The Author(s) 2022.
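
As an illustration of the central condition, the sketch below shows what a local Lipschitzness constraint on a reward function can look like in practice. The abstract does not say how the constraint is enforced; a penalty on the norm of the reward's input gradient is one commonly used option and is assumed here. The toy reward, its weights, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical toy reward over a 2-D input (e.g. a concatenated state-action pair).
W = (0.8, -0.5)
B = 0.1

def reward(x):
    """Toy reward: tanh of a linear score. Not the paper's model."""
    return math.tanh(W[0] * x[0] + W[1] * x[1] + B)

def reward_grad(x):
    """Analytic gradient of the toy reward w.r.t. its input: (1 - tanh(z)^2) * W."""
    z = W[0] * x[0] + W[1] * x[1] + B
    s = 1.0 - math.tanh(z) ** 2
    return (s * W[0], s * W[1])

def gradient_penalty(points, k=1.0):
    """Mean of (||grad r(x)|| - k)^2 over the given points.

    Assumed regularizer: added to the reward's training loss, it pushes the
    gradient norm at the sampled points toward k, which encourages the
    reward to be roughly k-Lipschitz around those points.
    """
    total = 0.0
    for x in points:
        g = reward_grad(x)
        total += (math.hypot(g[0], g[1]) - k) ** 2
    return total / len(points)

def empirical_local_lipschitz(center, radius, n_pairs=2000, seed=0):
    """Largest observed |r(x) - r(y)| / ||x - y|| for pairs sampled in a box
    of half-width `radius` around `center`. This is a lower bound on the true
    local Lipschitz constant in that neighborhood, not the constant itself."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        x = tuple(c + rng.uniform(-radius, radius) for c in center)
        y = tuple(c + rng.uniform(-radius, radius) for c in center)
        dist = math.hypot(x[0] - y[0], x[1] - y[1])
        if dist > 1e-9:
            best = max(best, abs(reward(x) - reward(y)) / dist)
    return best

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.3, -0.2), (-0.5, 0.4)]
    print("gradient penalty:", gradient_penalty(pts))
    print("empirical local constant:", empirical_local_lipschitz((0.0, 0.0), 0.5))
```

Because tanh is 1-Lipschitz, the toy reward is globally Lipschitz with constant at most the norm of `W`, so the empirical estimate can never exceed that value. For a learned neural reward no such bound holds by construction, which is why a regularizer of this kind would be needed.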


Keywords:  Deep learning; Generative adversarial networks; Imitation learning; Lipschitz-continuity; Reinforcement learning

Year:  2022        PMID: 35602587      PMCID: PMC9114147          DOI: 10.1007/s10994-022-06144-5

Source DB:  PubMed          Journal:  Mach Learn        ISSN: 0885-6125            Impact factor:   5.414


