Yonatan Hutabarat1, Kittipong Ekkachai2, Mitsuhiro Hayashibe1,3, Waree Kongprawechnon4.
Abstract
In this study, we investigated a control algorithm for a semi-active prosthetic knee based on reinforcement learning (RL). Model-free Q-learning control with a reward shaping function was proposed as the voltage controller of a magnetorheological damper-based prosthetic knee. The reward function was designed as a function of a performance index that accounts for the trajectory of the subject-specific knee angle. We compared our proposed reward function with a conventional single reward function under the same random initialization of the Q-matrix. We trained this control algorithm to adapt to several walking speed datasets under one control policy and subsequently compared its performance with that of other control algorithms. The results showed that our proposed reward function performed better than the conventional single reward function in terms of the normalized root mean squared error and also showed a faster convergence trend. Furthermore, our control strategy converged within our desired performance index and could adapt to several walking speeds. Our proposed control structure also performed better overall than user-adaptive control, and at some walking speeds it performed better than the neural network predictive control from existing studies.
Keywords: Q-learning; magnetorheological damper; reinforcement learning; reward shaping; semi-active prosthetic knee
Year: 2020 PMID: 33324190 PMCID: PMC7726251 DOI: 10.3389/fnbot.2020.565702
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
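As a rough illustration of the control idea summarized in the abstract, the sketch below shows a tabular Q-learning update next to a single (all-or-nothing) reward and a shaped reward that varies with a performance index. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the tolerance-based single reward, the exponential form of the shaped reward, and the `sharpness`, `alpha`, `gamma`, and `epsilon` parameters are all hypothetical, and the state and action encodings (for example, discretized knee angle and damper voltage levels) are left abstract.

```python
import math
import random


def single_reward(error, tolerance):
    """Assumed single reward: +1 inside the tolerance band, -1 outside."""
    return 1.0 if error <= tolerance else -1.0


def shaped_reward(error, tolerance, sharpness=4.0):
    """Hypothetical shaped reward: 1 at zero error, decaying smoothly.

    The exponential form and `sharpness` are illustrative assumptions,
    not the reward function defined in the paper.
    """
    return math.exp(-sharpness * error / tolerance)


def make_q_table(n_states, n_actions, seed):
    """Random Q-table; reusing a seed reproduces the same initialization."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy choice over the actions available in `state`."""
    row = q_table[state]
    if rng.random() < epsilon:
        return rng.randrange(len(row))
    return max(range(len(row)), key=row.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update, applied in place."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
```

Building two tables with the same `seed` gives both reward functions an identical starting point, which mirrors the comparison under the same random Q-matrix initialization described above.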
Figure 1. (A) Control structure of the magnetorheological (MR) damper (Ekkachai et al., 2013). (B) Double pendulum model used to simulate the swing phase, with the MR damper attached at distance d from the knee joint.
Figure 2. (A) Average knee angle data used in this study. (B) β as an exponential function with n = 4. (C) Proposed reward shaping function as a function of E.
Figure 3. Block diagram of the proposed Q-learning control in a simulated environment with external experiment data.
Figure 4. Summary of simulation results over a constrained 3000 iterations. (A) Comparison of the single reward mechanism and our proposed reward shaping function. (B) Effect of various learning rates on the overall performance (normalized root mean squared error, NRMSE). (C) Comparison of cumulative reward over iterations for each of the simulated learning rates.
Figure 5. Overall training process for multiple walking speeds under one control policy in simulation.
Figure 6. Comparison between user-adaptive control (green dashed line), neural network predictive control (NNPC) (red line), and Q-learning control (black line) for different walking speeds: (A) 2.4 km/h, (B) 3.6 km/h, and (C) 5.4 km/h.
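The NRMSE used as the performance measure in the captions above can be sketched as follows. This is a minimal example that assumes the error is normalized by the range of the reference knee-angle trajectory; the record does not state which normalization the authors used, and the function name `nrmse` is hypothetical.

```python
import math


def nrmse(reference, predicted):
    """Root mean squared error divided by the range of the reference.

    Range normalization is an assumption; other conventions (mean,
    standard deviation) exist and may be what the paper uses.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("trajectories must be non-empty and equal in length")
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference trajectory has zero range")
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    return math.sqrt(mse) / span
```

For example, a reference of `[0.0, 2.0]` tracked as `[1.0, 3.0]` has an RMSE of 1 and a range of 2, so the function returns 0.5.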
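Comparison between user-adaptive control, neural network predictive control (NNPC), and Q-learning control.
| Walking speed (km/h) | User-adaptive | NNPC | Q-learning |
| --- | --- | --- | --- |
| 2.4 | 2.70 | 0.81 | **0.78** |
| 3.6 | 3.65 | **0.61** | 0.88 |
| 5.4 | 3.46 | 2.42 | **0.52** |
| Average | 3.27 | 1.28 | **0.73** |

Bold values indicate the best performance.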
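The averages in the table can be checked with a few lines of arithmetic. The sketch below assumes that the three value columns follow the order given in the table caption (user-adaptive, NNPC, Q-learning) and that lower values are better, as for an error measure; the dictionary keys and function names are hypothetical.

```python
# Values copied from the table; column order is assumed from the caption.
table = {
    2.4: {"user_adaptive": 2.70, "nnpc": 0.81, "q_learning": 0.78},
    3.6: {"user_adaptive": 3.65, "nnpc": 0.61, "q_learning": 0.88},
    5.4: {"user_adaptive": 3.46, "nnpc": 2.42, "q_learning": 0.52},
}


def column_averages(rows):
    """Mean of each controller's values across walking speeds, to 2 decimals."""
    controllers = next(iter(rows.values())).keys()
    return {
        name: round(sum(row[name] for row in rows.values()) / len(rows), 2)
        for name in controllers
    }


def best_controller(row):
    """Controller with the lowest value in a row (assumes lower is better)."""
    return min(row, key=row.get)


averages = column_averages(table)
best_per_speed = {speed: best_controller(row) for speed, row in table.items()}
```

Under these assumptions, the computed averages match the reported row (3.27, 1.28, 0.73), and Q-learning has the lowest value at 2.4 and 5.4 km/h while NNPC has the lowest at 3.6 km/h.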