Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures.



Johannes Günther^1,2, Nadia M Ady^1, Alex Kearney^1, Michael R Dawson^2,3, Patrick M Pilarski^1,2,3.

Abstract

Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well-suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rates or step sizes). Typically, these parameters are chosen based on an extensive parameter search, an approach that neither scales well nor is well-suited for tasks that require changing step sizes due to non-stationarity. To begin to address this challenge, we examine the use of online step-size adaptation using the Modular Prosthetic Limb: a sensor-rich robotic arm intended for use by persons with amputations. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. As a first contribution, we show that TIDBD is a practical alternative to classic Temporal-Difference (TD) learning tuned via an extensive parameter search. Both approaches perform comparably in terms of predicting future aspects of a robotic data stream, but TD only achieves comparable performance with a carefully hand-tuned learning rate, while TIDBD uses a robust meta-parameter and tunes its own learning rates. As a second contribution, our results show that, for this particular application, TIDBD allows the system to automatically detect patterns characteristic of sensor failures common to a number of robotic applications. As a third contribution, we investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.
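
To make the idea of feature-level step-size adaptation concrete, the following is a minimal sketch of one TIDBD-style update for a linear predictor. It is not the authors' implementation and nothing in it is taken from this abstract: the update rule follows the semi-gradient TIDBD(λ) form as it is usually described in earlier TIDBD work, and the function names, argument names, and the toy constant-signal stream are all hypothetical choices made for illustration.

```python
import numpy as np


def tidbd_step(w, beta, h, z, phi, phi_next, reward, gamma, lam, theta):
    """One assumed semi-gradient TIDBD(lambda) update; arrays are modified in place."""
    # TD error of the current linear prediction.
    delta = reward + gamma * (w @ phi_next) - (w @ phi)
    # Meta-update: adjust each log step size using the meta step size theta.
    beta += theta * delta * phi * h
    alpha = np.exp(beta)  # one positive step size per feature
    # Accumulating eligibility trace.
    z *= gamma * lam
    z += phi
    # TD(lambda) weight update with a vector of step sizes instead of a scalar.
    w += alpha * delta * z
    # h is a decaying trace of recent weight updates; its decay is clipped at 0.
    h *= np.maximum(0.0, 1.0 - alpha * phi * z)
    h += alpha * delta * z
    return delta, alpha


def run_constant_stream(steps=2000, gamma=0.9, lam=0.0, theta=0.01, alpha0=0.1):
    """Toy stream (hypothetical): one always-on feature and a constant signal of 1."""
    phi = np.array([1.0])
    w = np.zeros(1)
    beta = np.full(1, np.log(alpha0))  # initial step size alpha0 for every feature
    h = np.zeros(1)
    z = np.zeros(1)
    alpha = np.exp(beta)
    for _ in range(steps):
        _, alpha = tidbd_step(w, beta, h, z, phi, phi, 1.0, gamma, lam, theta)
    return w, alpha


if __name__ == "__main__":
    w, alpha = run_constant_stream()
    print("learned prediction:", w[0], "adapted step size:", alpha[0])
```

On this toy stream the discounted sum of the signal is 1 / (1 - 0.9) = 10, so the learned weight should approach 10 while the step size grows above its initial value. Classic TD would instead keep the step size fixed at whatever value was chosen by hand.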


Keywords: continual learning; long-term autonomy; prediction; reinforcement learning; robot learning

Year: 2020; PMID: 33501202; PMCID: PMC7805647; DOI: 10.3389/frobt.2020.00034

Source DB: PubMed; Journal: Front Robot AI; ISSN: 2296-9144

5 in total

1. Neuronal coding of prediction errors. (Review)

Authors: W Schultz; A Dickinson
Journal: Annu Rev Neurosci Date: 2000 Impact factor: 12.449

2. Application of real-time machine learning to myoelectric prosthesis control: A case series in adaptive switching.

Authors: Ann L Edwards; Michael R Dawson; Jacqueline S Hebert; Craig Sherstan; Richard S Sutton; K Ming Chan; Patrick M Pilarski
Journal: Prosthet Orthot Int Date: 2015-09-30 Impact factor: 1.895

3. Representing high-dimensional data to intelligent prostheses and other wearable assistive robots: A first comparison of tile coding and selective Kanerva coding.

Authors: Jaden B Travnik; Patrick M Pilarski
Journal: IEEE Int Conf Rehabil Robot Date: 2017-07

4. Pavlovian control of intraspinal microstimulation to produce over-ground walking.

Authors: Ashley N Dalrymple; David A Roszko; Richard S Sutton; Vivian K Mushahwar
Journal: J Neural Eng Date: 2020-06-02 Impact factor: 5.379

5. Surprise and destabilize: prediction error influences episodic memory reconsolidation.

Authors: Alyssa H Sinclair; Morgan D Barense
Journal: Learn Mem Date: 2018-07-16 Impact factor: 2.460


1 in total

1. Prediction, Knowledge, and Explainability: Examining the Use of General Value Functions in Machine Knowledge.

Authors: Alex Kearney; Johannes Günther; Patrick M Pilarski
Journal: Front Artif Intell Date: 2022-03-31
