Aline Xavier Fidêncio, Christian Klaes, Ioannis Iossifidis.
Abstract
The human brain has been an object of extensive investigation in different fields. While several studies have focused on understanding the neural correlates of error processing, advances in brain-machine interface systems using non-invasive techniques have further enabled the use of the measured signals in different applications. The possibility of detecting these error-related potentials (ErrPs) on a single-trial basis under different experimental setups has further increased interest in integrating them into closed-loop settings to improve system performance, for example, by performing error correction. Fewer works have, however, aimed at reducing future mistakes or at learning. We present a review of the current literature on non-invasive systems that have combined ErrP information specifically with a reinforcement learning framework to go beyond error correction and use these signals for learning.
Keywords: EEG; brain-machine (computer) interface; electroencephalography; error-related potentials; reinforcement learning; self-organization
Year: 2022 PMID: 35814961 PMCID: PMC9263570 DOI: 10.3389/fnhum.2022.806517
Source DB: PubMed Journal: Front Hum Neurosci ISSN: 1662-5161 Impact factor: 3.473
Figure 1. The upper left plot shows the grand average ERPs for each trial class (error/correct) at electrode Cz (n = 12). The experimental setup considered a human-robot interaction scenario in which the subject had to respond to a stimulus with a keypress (left, right, or up) indicating the target position, and a real robot turned its head either in the given direction or not, eliciting interaction ErrPs. On the right side, the difference ERP displays the measured N2-P3 complex. The topographical distribution of the difference ERP shows that the N2 component is fronto-centrally located, and the characteristic P3 is also centrally located at around 300 ms. Results were reproduced from Ehrlich and Cheng (2016) using the publicly available datasets (Ehrlich and Cheng, 2019).
Figure 2. R²-values for the difference in power between correct and error trials across frequency bands, showing activity mainly in the alpha range for the measured error.
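The caption does not state how the R²-values were computed. One common choice, assumed here purely for illustration, is the squared point-biserial correlation between a band-power feature and the trial class (error vs. correct). The function name and the toy numbers below are hypothetical and are not taken from the reviewed studies.

```python
from math import sqrt


def r_squared(error_values, correct_values):
    """Squared point-biserial correlation between a feature and the class label.

    Returns a value in [0, 1]: 0 means the feature does not separate the two
    classes at all, 1 means it separates them perfectly.
    """
    n1, n0 = len(error_values), len(correct_values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes need at least one trial")
    all_values = list(error_values) + list(correct_values)
    n = n1 + n0
    mean_all = sum(all_values) / n
    # Population standard deviation over all trials, regardless of class.
    std_all = sqrt(sum((v - mean_all) ** 2 for v in all_values) / n)
    if std_all == 0.0:
        return 0.0  # constant feature: no discriminative power
    mean1 = sum(error_values) / n1
    mean0 = sum(correct_values) / n0
    r = (mean1 - mean0) / std_all * sqrt(n1 * n0) / n
    return r * r


# Toy band-power values for one electrode and one frequency band (made up).
alpha_power_error = [2.0, 2.0]
alpha_power_correct = [0.0, 0.0]
score = r_squared(alpha_power_error, alpha_power_correct)
```

Evaluating such a score per electrode and per frequency band would give a map of the kind the caption describes, under the stated assumption about the metric.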
Summary of the error-related potentials defined in Section 2.
| ErrP type | Task | Modality | Subjects | Error rate (%) | Electrodes | Components (peak latency) | Time-locked to | Reference |
|---|---|---|---|---|---|---|---|---|
| Response | Choice reaction | Visual | 91 healthy | 5−25 | Fz, FCz, Cz, Pz | Neg. 80 ms (Ne), pos. [200-500] ms (Pe) | Response onset | Falkenstein et al., |
| Response | Modified Eriksen flanker | Visual | 6 healthy | n.a. | Cz | Neg. 100 ms (ERN) | Response onset | Gehring et al., |
| Response | Modified Eriksen flanker | Visual | 18 healthy | 7.9 | Cz | Neg. 80 ms (ERN) | Response onset | van Schie et al., |
| Feedback | Time estimation | Visual | 18 healthy | 50 | FPz, Fz, Cz, Pz | Neg. [230–330] ms (FRN) | Feedback onset | Miltner et al., |
| Feedback | Time estimation | Visual | 5 healthy | 50 | FCz | Neg. 320 ms | Feedback onset | Lopez-Larraz et al., |
| Target | Manual aiming | Visual | 15 healthy | n.a. | Pz | Pos. 341 ms (P300) | Target jump onset | Krigolson et al., |
| Interaction | Cursor control | Visual | 3 healthy | 20 | Cz | Neg. 270 ms (Ne), pos. [350-450] ms (Pe), Neg. 550 ms | Feedback onset | Ferrez and Millan, |
| Interaction | Cursor control | Visual | 5 healthy | 20, 50 | FCz | Pos. 200 ms, Neg. 250 ms, pos. 320 ms, Neg. 450 ms | Feedback onset | Ferrez and Millan, |
| Interaction | Cursor control | Visual | 2 healthy | 20 | FCz | Pos. 200 ms, Neg. 270 ms, pos. 300 ms, Neg. 430 ms | Feedback onset | Ferrez and Millan, |
| Interaction | Cursor control | Visual | 6 healthy | 20 | FCz | Pos. 230 ms, Neg. 290 ms, pos. 350 ms, Neg. 470 ms | Feedback onset | Ferrez and Millan, |
| Interaction | Robot control | Visual | 13 healthy | 35 | Cz | Neg. 200 ms (N2), pos. 300 ms (P3) | Feedback onset | Ehrlich and Cheng, |
| Interaction | BCI speller | Visual | 16 healthy | 38 | Cz | Neg. 350 ms, pos. 480 ms | Feedback onset | Margaux et al., |
| Interaction | BCI speller | Visual, none | 10 healthy | 20 | CPz | Neg. 250 ms, pos. 500 ms | Feedback onset | Bevilacqua et al., |
| Interaction | Simulated car | Visual | 7 healthy | 30 | FCz, Cz | Neg. 250 ms, pos. 400 ms | Feedback onset | Zhang et al., |
| Execution | Cursor control | Visual | 10 healthy | 55 | FCz | Pos. 229 ms, Neg. 287 ms (FRN), pos. 367 ms (Pe), Neg. 461 ms (N400) | Error onset | Spüler and Niethammer, |
| Execution | Cursor control | Visual | 15 healthy | 30 | FCz | Neg. 246 ms, pos. 354 ms, Neg. 568 ms | Error onset | Lopes-Dias et al., |
| Execution | Cursor control | Visual normal | 15 healthy | 30 | FCz | Neg. 196 ms, pos. 404 ms, Neg. 616 ms | Error onset | Lopes Dias et al., |
| Execution | Robot control | Visual | 5 healthy | n.a. | Fz | Neg. 150 ms, pos. [300-500] ms | Error onset | Rakshit et al., |
| Outcome | Cursor control | Visual | 15 healthy | n.a. | Cz | Neg. 268 ms (ERN) | Movement end | Krigolson et al., |
| Outcome | Cursor control | Visual | 10 healthy | 15 | FCz | Neg. 2 ms (ERN), pos. 268 ms (Pe), Neg. 486 ms (N400), pos. 742 ms | Movement end | Spüler and Niethammer, |
| Outcome | Car game | Visual | 10 healthy | 22.7 | Fz, Cz, Pz | Pos. 200 ms, Neg. [400-500] ms | Collision onset | Kreilinger et al., |
| Observation | Modified Eriksen flanker | Visual | 18 healthy | 9.1 | Cz | Neg. 252 ms (ERN) | Response onset | van Schie et al., |
| Observation | Cursor control | Visual | 3 healthy | 20 | FCz | Pos. 200 ms, Neg. 250 ms, pos. 350 ms, Neg. 500 ms | Feedback onset | Chavarriaga et al., |
| Observation | Cursor control | Visual | 6 healthy | 20, 40 | FCz | Pos. 200 ms, Neg. 260 ms, pos. 330 ms | Feedback onset | Chavarriaga and Millan, |
| Observation | Robot control | Visual | 2 healthy | 80 | FCz | Pos. 300 ms, Neg. 400 ms | Feedback onset | Iturrate et al., |
| Observation | Robot control | Visual | 4 healthy | 80 | FCz | Pos. 300 ms, Neg. 400 ms | Feedback onset | Iturrate et al., |
Please note that this table does not include an exhaustive list of available works. The ErrP components are presented in terms of the reported peak/latencies and the component name given by the authors, when applicable. Values marked gray were inferred from available information, and n.a. indicates not available.
Figure 3. The agent-environment interaction process in an RL setup (Sutton and Barto, 2018).
Figure 4. Error-based learning framework concept: once the EEG data have been classified, the error information can be used to update, for example, the likelihood of performing each action, decreasing it when an error is detected and increasing it otherwise. This approach has been used by Chavarriaga et al. (2007, 2010). Iturrate et al. (2015b), on the other hand, have used the error information to update the likelihood of each possible position being the desired goal.
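The likelihood update described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the update rule of Chavarriaga et al.: the multiplicative form, the `step` parameter, and the function name are hypothetical choices made only to show the idea of lowering an action's probability after a detected error and raising it otherwise.

```python
def update_action_likelihoods(probs, action, error_detected, step=0.2):
    """Return new action probabilities after one classified EEG trial.

    probs: dict mapping action name -> probability (sums to 1).
    action: the action the system just performed.
    error_detected: True if the ErrP classifier flagged the action as wrong.
    step: assumed update strength in (0, 1); hypothetical parameter.
    """
    updated = dict(probs)
    # Scale the executed action down after an error, up otherwise.
    factor = (1.0 - step) if error_detected else (1.0 + step)
    updated[action] *= factor
    # Renormalize so the values remain a probability distribution.
    total = sum(updated.values())
    return {name: value / total for name, value in updated.items()}


# Two actions, as in the 1D cursor tasks summarized in this section.
probs = {"left": 0.5, "right": 0.5}
probs = update_action_likelihoods(probs, "left", error_detected=True)
```

After a detected error on `left`, the probability of `left` drops below 0.5 and that of `right` rises correspondingly. Applying the same rule to goal positions instead of actions would correspond to the second variant mentioned in the caption.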
Overview of brain-machine interface systems that use error-related potential-based learning frameworks.
| Application | Task | Reference | Subjects | Subject's task | Learning goal | ErrP type | Reward function | State space | Action space | Method |
|---|---|---|---|---|---|---|---|---|---|---|
| Cursor control | 1D reaching task | Chavarriaga et al., | 3 healthy | Observe cursor | Learn user's optimal policy to reach target (offline) | Observation ErrP | - | 1 target position | 2 actions [left, right] | Likelihood |
| Cursor control | 1D reaching task | Chavarriaga and Millan, | 6 healthy | Observe cursor | Learn user's optimal policy to reach target (offline) | Observation ErrP | - | 1 target position | 2 actions [left, right] | Likelihood |
| Cursor control | 2D reaching task | Iturrate et al., | 8 healthy | Observe cursor | Learn user's chosen target and optimal policy to reach it and simultaneously train ErrP classifier (online) | Observation ErrP | | 25 grid positions | 5 actions [left, right, up, down, reach] | Likelihood |
| Robot control | 2D reaching task | Iturrate et al., | 1 healthy | Observe mobile robot | Learn user's chosen target and intended strategy to reach it (online) | Observation ErrP | - | 3 values [position, orientation] | 2 actions [rotation, linear movement] | Likelihood |
Figure 5. Error-based reinforcement learning framework concept: the ErrP is used as a reward to guide the RL agent as it learns the optimal policy to achieve the desired task. The error-based reward functions penalize wrong actions and reinforce those evaluated as the correct expected behavior by the subject.
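The idea in this caption can be illustrated with tabular Q-learning on a 1D reaching task in which the only reward comes from a decoded ErrP. This is a sketch under stated assumptions rather than the implementation of any reviewed study: the ±1 reward values, the `simulated_errp` stand-in for an EEG classifier, its `accuracy` parameter, and all hyperparameters are hypothetical.

```python
import random

ACTIONS = (-1, +1)  # index 0: step left, index 1: step right


def errp_reward(error_detected):
    """Map a decoded ErrP to a reward: penalize errors, reinforce the rest."""
    return -1.0 if error_detected else 1.0


def simulated_errp(state, step, target, accuracy, rng):
    """Hypothetical stand-in for an EEG-based ErrP classifier.

    A move counts as a true error if it increases the distance to the target;
    the label is flipped with probability 1 - accuracy to mimic misclassification.
    """
    true_error = abs(state + step - target) > abs(state - target)
    return true_error if rng.random() < accuracy else not true_error


def train(n_states=9, target=6, episodes=200, max_steps=20,
          alpha=0.1, gamma=0.5, epsilon=0.1, accuracy=0.8, seed=0):
    """Learn a Q-table for reaching `target` using only ErrP-based rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = rng.randrange(n_states)
        for _ in range(max_steps):
            if state == target:
                break  # target reached: the episode ends
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            step = ACTIONS[a]
            next_state = min(max(state + step, 0), n_states - 1)
            reward = errp_reward(
                simulated_errp(state, step, target, accuracy, rng))
            # Standard Q-learning update; no bootstrapping from the terminal state.
            future = 0.0 if next_state == target else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state
    return q


q_table = train()
greedy_policy = [max(range(2), key=lambda i: row[i]) for row in q_table]
```

With a perfect detector (`accuracy=1.0`), the greedy policy points toward the target from every other state. Lower accuracies add label noise to the reward, which is the practical difficulty that ErrP-based reward signals introduce.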
Overview of brain-machine interface systems that use error-related potential-based reinforcement learning frameworks.
| Application | Task | Reference | Subjects | Subject's task | Learning goal | ErrP type | Reward function | State space | Action space | Algorithm |
|---|---|---|---|---|---|---|---|---|---|---|
| Cursor control | 2D reaching task | Iturrate et al., | 4 healthy | Observe cursor | Learn optimal policy to reach target (offline) and learn user's chosen target and reach it (online) | Observation ErrP | Not mentioned | 25 grid positions | 5 actions [left, right, up, down, reach] | Q-learning |
| Cursor control | 1D reaching task | Iturrate et al., | 12 healthy | Observe cursor | Learn user's chosen target and optimal policy to reach it (online) | Observation ErrP | | 9 target positions | 2 actions [left, right] | Q-learning |
| Robot control | 1D reaching task | Iturrate et al., | 2 healthy | Observe virtual robot | Learn user's chosen target and reach it (online) | Observation ErrP | | 5 target positions | 5 actions [one for each target] | Q-learning |
| Robot control | 2D reaching task | Iturrate et al., | 12 healthy | Observe virtual robot | Learn user's chosen target and optimal policy to reach it (online) | Observation ErrP | | 13 grid positions | 4 actions [left, right, up, down] | Q-learning |
| Robot control | 2D reaching task | Kim et al., | 7 healthy | Make gesture and | Learn to recognize human gestures and map them into robot commands (online) | Observation ErrP | | palm normal vector and grip strength | 3 actions [left, right, forward] | LinUCB |
| Robot control | Guessing game | Ehrlich and Cheng, | 16 healthy | Observe robot | Adapt gazing policy to subject's guessing policy (online) | Feedback ErrP | | 4 gazing options [3 objects, human] | 4 actions [gazing options] | Policy gradient |
| Robot control | 2D reaching task | Schiatti et al., | 8 healthy | Observe robot | Learn user's chosen target and intended strategy to reach it (online) | Observation ErrP | | 25 grid positions | 4 actions [left, right, up, down] | Modified Q-learning |
| Robot control | 2D reaching task | Kim et al., | 8 healthy | Make gesture and | Learn to recognize human gestures and map them into robot commands (online) | Observation ErrP | | palm normal vector and grip strength | 4 actions [left, right, forward, upward] | LinUCB |
| Robot control | 2D reaching task | Akinola et al., | 7 healthy | Observe robot | Learn to navigate in environment with obstacles to reach a target position (online) | Observation ErrP | | 13 values [laser range data, target displacement and robot yaw] | 3 actions [forward, turn left, turn right] | PPO |
| Robot control | Binary selection | Luo et al., | 12 healthy | Observe robot | Learn to solve RL problem (four different problems) | Observation ErrP | Not specified | Not specified | Not specified | Not specified |
| Rehabilitation device | Open/close hand | Roset et al., | 1 healthy | Perform hand | Learn to classify motor potentials into device commands (online) | Interaction ErrP | | 12 normalized PSD z-scores (ErrPs) | 2 actions [open, close] | Actor-critic |