Wouter Caarls, Erik Schuitema.
Abstract
Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. However, in real systems, this method is often avoided in favor of policy search methods because of its long learning time. But policy search suffers from its own drawbacks, such as the necessity of informed policy parameterization and initialization. In this paper, we show that TD learning can work effectively in real robotic systems as well, using parallel model learning and planning. Using locally weighted linear regression and trajectory sampled planning with 14 concurrent threads, we can achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60×, with a real-time learning speed of less than half a minute. The results are competitive with state-of-the-art policy search.
Year: 2015 PMID: 26111402 DOI: 10.1109/TNNLS.2015.2442233
Source DB: PubMed Journal: IEEE Trans Neural Netw Learn Syst ISSN: 2162-237X Impact factor: 10.451
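The abstract names locally weighted linear regression as the model learner. As a rough illustration of that general technique only, and not the authors' implementation, the sketch below fits a line around a query point with samples weighted by a Gaussian kernel. The function name `lwlr_predict`, the one-dimensional input, the Gaussian kernel, and the `bandwidth` parameter are all assumptions made for this example.

```python
import math


def lwlr_predict(xs, ys, query, bandwidth=0.5):
    """Predict y at `query` with a locally weighted linear fit (1-D sketch).

    Hypothetical helper for illustration; the kernel and bandwidth are
    assumptions, not details taken from the paper.
    """
    # Gaussian kernel: samples near the query get weights close to 1,
    # distant samples get weights close to 0.
    ws = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    total = sum(ws)
    if total == 0.0:
        raise ValueError("no sample is close enough to the query point")

    # Weighted means of inputs and outputs.
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total

    # Weighted least-squares slope; fall back to a local constant fit
    # when the weighted inputs have (almost) no spread.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0

    return mean_y + slope * (query - mean_x)


if __name__ == "__main__":
    # Noise-free samples of sin(x); the local fit should track the curve.
    xs = [0.1 * i for i in range(63)]
    ys = [math.sin(x) for x in xs]
    print(lwlr_predict(xs, ys, query=1.0, bandwidth=0.2))
```

In a model-learning setting, the inputs would be state-action samples and the outputs the observed next states, so that a planner can query the fitted model instead of the real system. That mapping is an interpretation of the abstract, not a detail it states.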