Benton Girdler, William Caldbeck, Jihye Bae.
Abstract
Creating flexible and robust brain machine interfaces (BMIs) is currently a popular topic of research that has been explored for decades in the medical, engineering, commercial, and machine-learning communities. In particular, techniques based on reinforcement learning (RL) have demonstrated impressive results but remain under-represented in the BMI community. To shine more light on this promising relationship, this article aims to provide an exhaustive review of RL's applications to BMIs. Our primary focus in this review is to provide a technical summary of various algorithms used in RL-based BMIs to decode neural intention, without emphasizing preprocessing techniques on the neural signals and reward modeling for RL. We first organize the literature based on the type of RL methods used for neural decoding, and then each algorithm's learning strategy is explained along with its application in BMIs. A comparative analysis highlighting the similarities and uniqueness among neural decoders is provided. Finally, we end this review with a discussion about the current stage of RLBMIs, including their limitations and promising directions for future research.
Keywords: brain machine interface (BMI); neural decoder; neural interface; policy optimization; reinforcement learning (RL); value function approximation
Year: 2022 PMID: 36090185 PMCID: PMC9459159 DOI: 10.3389/fnsys.2022.836778
Source DB: PubMed Journal: Front Syst Neurosci ISSN: 1662-5137
FIGURE 1. A review flow chart, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Page et al., 2021).
TABLE 1. A summary of reviewed neural decoders in RLBMIs.
| RL base model | Function approximator | Learning algorithm | Neural signal type | Subject | No. | Task | External device | Closed or open loop | Best reported success rate | Data amount for evaluation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Action-value function, Q | Linear function | P300 Linear Upper Confidence Bound (PLUCB) | EEG | Healthy human | 20 | Symbol selection in a standard 6 × 6 matrix of symbols | 2D screen | Open | Overall symbol accuracy: 80.4 ± 12.8% | Two sessions (14 runs/session, 18 symbol selections/run, 6 sequences/symbol, and 12 flashes/sequence) |
| | | Transferred P300 Linear Upper Confidence Bound (TPLUCB) | | | | | | | Overall symbol accuracy: 79.6 ± 14% | |
| Q-learning | Kernel expansion | Kernel Temporal Difference (KTD)(λ) | Intracortical | Female bonnet macaque | 1 | 2-target center-out reaching task | 2D screen | Open | Around 100% | 43 trials/epoch |
| Q-learning | Kernel expansion | Kernel Temporal Difference (KTD)(λ) | Intracortical | Female bonnet macaque | 1 | 2-target 1-step center-out reaching task | 2D screen | Open | 2-target: 99% after 3 epochs | Average over 50 Monte Carlo runs |
| | | | | | | 4-target 1-step center-out reaching task | 2D screen | Open | 4-target: 99% after 5 epochs | Average over 50 Monte Carlo runs |
| | | | | | | 8-target 1-step center-out reaching task | 2D screen | Open | 8-target: 98% after 6 epochs | Average over 50 Monte Carlo runs |
| | | | | | | 3-target 4-step center-out reaching task | 2D screen | Open | Above 60% after 1 epoch | |
| | | | | | | 4-target 2-step center-out reaching task | 2D screen | Open | Above 60% after 1 epoch | |
| | | | Intracortical | Marmoset monkey | 1 | 2-target reaching task | Robotic arm | Closed | 90% for Day 1 | 20 trials (10 trials per target) |
| Q-learning | Kernel expansion | Correntropy Kernel Temporal Differences (CKTD) | Intracortical | Female bonnet macaque | 1 | 4-target center-out reaching task | 2D screen | Open | 100% after 5 epochs | Average over 50 Monte Carlo runs |
| Q-learning | Convolutional neural networks (CNNs) | Dueling Deep Q-Networks | EEG | Healthy human | 7 | 6-class imagery action classification | N/A | Open | Average classification accuracy of 93.63% | 34,560 samples per subject |
| Watkins' Q(λ) | Feedforward neural network | Recursive least squares | Intracortical | Rat | 1 | Go/no-go task | Robotic arm | Closed | 93.7% | One session: 16 trials |
| Watkins' Q(λ) | Feedforward neural network | Back-propagation | Intracortical | Rat | 1 | 2-target reaching task | Robotic arm | Open | Max observed: 81.3% | 10 sessions (16 trials/session) |
| Watkins' Q(λ) | Feedforward neural network | Back-propagation | Intracortical | Male Sprague-Dawley rat | 3 | 2-target reaching task | Robotic arm | Closed | Avg. performance: rat01: 68%, rat02: 74%, and rat03: 73% | Avg. 2.1 ± 1.2 sessions (1 session/day) |
| Watkins' Q(λ) | Feedforward neural network | Back-propagation | Intracortical | Female bonnet macaque | 1 | 8-target center-out reaching task | 2D screen | Open | Reached 100% after 18 epochs | 43 trials/epoch |
| Watkins' Q(λ) | Feedforward neural network | Back-propagation | Simulated neurons | N/A | N/A | 8-target center-out reaching task | 2D screen | Open | Over 95% with optimal Izhikevich-tuning depth | 80 neurons |
| Attention-Gated Reinforcement Learning (AGREL) | Feedforward neural network | Attention-Gated Reinforcement Learning (AGREL) | Intracortical | Male rhesus macaque | 1 | 4-target center-out reaching task | 2D screen | Open | Average target acquisition rate reached 90.16% | Days 1, 2, 3, and 6 (40 min of data/day) |
| Attention-Gated Reinforcement Learning (AGREL) | Feedforward neural network | Attention-Gated Reinforcement Learning (AGREL) | Intracortical | Male Sprague-Dawley rat | 6 | One-lever press task | Lever | Open | Average success rate of 87.5% | Six subjects over 300 training epochs |
| Attention-Gated Reinforcement Learning (AGREL) | Feedforward neural network | Transfer Learning and Mini-batch based Attention-Gated Reinforcement Learning (TMAGREL) | Intracortical | Male rhesus macaque | 2 | 3-target reaching and grasping task | N/A | Open | Approximately 90% for both monkeys | Monkey01: 600 trials |
| Attention-Gated Reinforcement Learning (AGREL) | Feedforward neural network | Maximum correntropy based attention-gated reinforcement learning | Intracortical | Rhesus macaque | 1 | 4-target obstacle avoidance task | 2D screen | Open | Average success rate of 68.79% | 552 trials total over 30 Monte Carlo runs |
| Attention-Gated Reinforcement Learning (AGREL) | Kernel expansion | Quantized Attention-Gated Kernel Reinforcement Learning (QAGKRL) | Intracortical | Male rhesus macaque | 1 | 4-target obstacle avoidance task | 2D screen | Open | Average success rate of 80.83 ± 10.3% | One learning scenario |
| Attention-Gated Reinforcement Learning (AGREL) | Kernel expansion | Clustering-based kernel reinforcement learning | Four simulated neurons | N/A | N/A | 4-target reaching task | 2D screen | Open | 99.8 ± 6.6% | 20 Monte Carlo runs of 600 epochs |
| Attention-Gated Reinforcement Learning (AGREL) | Kernel expansion | Clustering-based kernel reinforcement learning | Intracortical | Male macaque | 1 | 4-target reaching task | Robotic arm | Open | 94.3 ± 0.9% | 20 Monte Carlo runs |
| Attention-Gated Reinforcement Learning (AGREL) | Kernel expansion | Clustering-based kernel reinforcement learning with weight transfer | Three simulated neurons | N/A | N/A | Two-lever discriminative task | Lever | Open | Avg. approximately 95% | 20 Monte Carlo runs |
| Actor-Critic | Feedforward neural network | Back-propagation | Simulated neurons | N/A | N/A | 4-target reaching task (2D workspace) | Robotic arm | Closed | Reached 98% after fewer than 200 trials per target | One session |
| | | | Intracortical | Male Sprague-Dawley rat | 1 | 2-target reaching task | Robotic arm | Closed | Reached 100% after 16 trials | One session: 40 trials |
| Actor-Critic | Feedforward neural network | Hebbian reinforcement learning | Intracortical | Marmoset monkey (Callithrix jacchus) | 1 | 2-target reaching task | Robotic arm | Closed | Avg. 90% for the first 50 trials | Eight sessions (50–60 trials/session) |
| Actor-Critic | Feedforward neural network | Hebbian reinforcement learning | Simulated neurons | N/A | N/A | 2-target center-out reaching task | 2D screen | Closed | 100% after 2 trials | One session |
| | | | | | | 4-target center-out reaching task | 2D screen | Closed | 100% within fewer than 50 additional trials | One session |
| | | | Intracortical | Marmoset monkey (Callithrix jacchus) | 2 | Go/no-go task | Robotic arm | Open | Over 95% after 20 trials for both monkeys | Three sessions (1 session/day) |
| Actor-Critic | Feedforward neural network | Hebbian reinforcement learning | Intracortical | Marmoset monkey (Callithrix jacchus) | 2 | Go/no-go task | Robotic arm | Open | Avg. 94%: monkey01 | 1,000 sessions: monkey01 |
| | | | | | | | | Closed | Avg. 93%: monkey01 | Four sessions (1 session/day) |
| Actor-Critic | Feedforward neural network | Hebbian reinforcement learning | Intracortical | Marmoset monkey (Callithrix jacchus) | 1 | Go/no-go task | Robotic arm | Open | From 77 to 83% when Critic accuracy is 90% | 100 trials/session |
| Actor-Critic | Feedforward neural network | Hebbian reinforcement learning | EEG | Human with chronic spinal cord injury | 1 | Hand grasp or open task | Functional electrical stimulation device | Closed | Avg. around 65% | Four closed-loop sessions (1st session: 300 trials, 2nd and 3rd sessions: 450 trials, and 4th session: 300 trials) |
EEG, electroencephalogram; M1, primary motor cortex; mPFC, medial prefrontal cortex; NAcc, nucleus accumbens; PMd, primate dorsal premotor cortex; PPC, posterior parietal cortex; S1, somatosensory cortex. Note that in the Neural signal type column, when the input state includes both single- and multi-unit activities, the term "signal" is used, following the terminology of the original studies.
FIGURE 2. RLBMI architecture with labeled RL components. This figure is modified based on Figure 1 in DiGiovanna et al. (2009).
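As an illustration of how such labeled components map onto code, the closed-loop interaction can be sketched as below: the user's neural activity is the RL state, the decoder is the agent, the device command is the action, and the task outcome is the reward. All names, dimensions, the toy state-reward coupling, and the ε-greedy linear value update are illustrative assumptions, not the pipeline of any reviewed study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16 preprocessed neural features, 4 device commands.
n_features, n_actions = 16, 4

def read_neural_state():
    """Placeholder for preprocessed neural features (e.g., binned firing rates)."""
    return rng.normal(size=n_features)

def task_reward(state, action):
    """Toy environment feedback: the rewarded action is tied to the state."""
    return 1.0 if action == int(np.argmax(state[:n_actions])) else 0.0

weights = np.zeros((n_actions, n_features))  # linear action-value estimate
alpha, epsilon = 0.05, 0.1                   # learning and exploration rates

for trial in range(500):
    s = read_neural_state()
    # The decoder (agent) selects a device command via epsilon-greedy Q-values.
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(weights @ s))
    r = task_reward(s, a)
    # Single-step value update toward the observed reward (no bootstrapping).
    weights[a] += alpha * (r - weights[a] @ s) * s
```

The same loop structure carries over to the closed-loop studies in the table above; there, the environment feedback comes from the actual task and prosthetic device rather than a toy function.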
FIGURE 3. The decoding structure of an RLBMI using Q-learning via Kernel Temporal Difference (KTD)(λ). This figure is modified based on Figure 1 in Bae et al. (2015).
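The core idea behind this decoding structure, an action-value function represented as a kernel expansion that grows by one center per temporal-difference update, can be sketched minimally as follows. This is a TD(0)-only illustration under assumed kernel width and step size; eligibility traces (λ > 0) and any quantization of the expansion are omitted.

```python
import numpy as np

class KernelQ:
    """Minimal kernel-expansion Q estimator with TD(0) updates (illustrative)."""

    def __init__(self, n_actions, eta=0.5, gamma=0.9, width=1.0):
        self.n_actions = n_actions
        self.eta, self.gamma, self.width = eta, gamma, width
        self.centers = [[] for _ in range(n_actions)]  # per-action kernel centers
        self.coefs = [[] for _ in range(n_actions)]    # per-action coefficients

    def _k(self, x, c):
        """Gaussian kernel between a state and a stored center."""
        return np.exp(-np.sum((x - c) ** 2) / (2 * self.width ** 2))

    def q(self, x, a):
        """Q(x, a) as a weighted sum of kernels over the stored centers."""
        return sum(w * self._k(x, c) for w, c in zip(self.coefs[a], self.centers[a]))

    def update(self, x, a, r, x_next, terminal):
        """TD update: append the state as a new center weighted by the TD error."""
        target = r if terminal else r + self.gamma * max(
            self.q(x_next, b) for b in range(self.n_actions))
        delta = target - self.q(x, a)          # TD error
        self.centers[a].append(np.asarray(x, dtype=float))
        self.coefs[a].append(self.eta * delta)
```

On a toy 2-target single-step task (two distinct states, reward 1 for the matching action), repeated `update` calls with `terminal=True` drive the greedy action at each state toward the rewarded one, mirroring the single-step reaching tasks in the table above.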
FIGURE 4. The decoding structure of an RLBMI using the Actor-Critic. This figure is modified based on Figure 6.15 in Sutton and Barto (1998).
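The Actor-Critic structure can likewise be sketched in a few lines: the Critic estimates a state value, and its TD error modulates both its own update and the Actor's policy-gradient update. The toy single-step task, feature dimensions, and step sizes below are assumptions for illustration, not values from any reviewed study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 8 neural features, 2 device actions (e.g., go vs. no-go).
# Episodes are single-step, so the TD error needs no bootstrapped next-state value.
n_features, n_actions = 8, 2
actor_w = np.zeros((n_actions, n_features))  # policy (Actor) parameters
critic_w = np.zeros(n_features)              # state-value (Critic) parameters
alpha_actor, alpha_critic = 0.1, 0.05

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reward(s, a):
    """Toy task: the rewarded action is determined by the sign of feature 0."""
    return 1.0 if a == int(s[0] > 0) else 0.0

for episode in range(2000):
    s = rng.normal(size=n_features)        # neural state (e.g., binned firing rates)
    probs = softmax(actor_w @ s)           # Actor: stochastic policy over actions
    a = rng.choice(n_actions, p=probs)
    delta = reward(s, a) - critic_w @ s    # TD error from the Critic's value estimate
    critic_w += alpha_critic * delta * s   # Critic update
    grad = -probs[:, None] * s             # gradient of log-policy w.r.t. actor_w:
    grad[a] += s                           #   (1[b == a] - probs[b]) * s for row b
    actor_w += alpha_actor * delta * grad  # Actor update, modulated by the TD error
```

The TD error acting as a shared teaching signal for both networks is what distinguishes this structure from the pure value-based decoders earlier in the table.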