Frida Heskebeck, Carolina Bergeling, Bo Bernhardsson.
Abstract
The multi-armed bandit (MAB) problem models a decision-maker that optimizes its actions based on current and acquired new knowledge to maximize its reward. This type of online decision is prominent in many procedures of Brain-Computer Interfaces (BCIs) and MAB has previously been used to investigate, e.g., what mental commands to use to optimize BCI performance. However, MAB optimization in the context of BCI is still relatively unexplored, even though it has the potential to improve BCI performance during both calibration and real-time implementation. Therefore, this review aims to further describe the fruitful area of MABs to the BCI community. The review includes a background on MAB problems and standard solution methods, and interpretations related to BCI systems. Moreover, it includes state-of-the-art concepts of MAB in BCI and suggestions for future research.
Keywords: Brain-Computer Interface (BCI); calibration; multi-armed bandit (MAB); real-time optimization; reinforcement learning
Year: 2022 PMID: 35874164 PMCID: PMC9298543 DOI: 10.3389/fnhum.2022.931085
Source DB: PubMed Journal: Front Hum Neurosci ISSN: 1662-5161 Impact factor: 3.473
Overview of MAB variants and their characteristics compared to the original MAB problem.
| MAB variant | Characteristic compared to the original MAB problem |
|---|---|
| Original MAB problem | Static reward and fixed set of actions |
| Restless and switching bandits | Non-static reward |
| Mortal and sleeping bandits | Set of available actions changes |
| Contextual bandits | Rewards change based on state of surrounding environment |
| Dueling bandits | Agent chooses two actions at each time step |
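To make the original MAB problem from the table concrete, the following is a minimal sketch of a standard solution method, epsilon-greedy action selection, on a Bernoulli bandit with a static reward and a fixed set of arms. The arm probabilities, step count, and epsilon value are illustrative assumptions, not parameters from the reviewed paper.

```python
import random

def epsilon_greedy_bandit(true_probs, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli MAB: with probability epsilon pick a
    random arm (explore), otherwise pick the arm with the highest estimated
    mean reward (exploit). true_probs are the (hidden) arm success rates."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward estimate per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0    # Bernoulli draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total_reward += reward
    return values, counts, total_reward
```

In a BCI setting, each arm could correspond to a candidate mental command and the reward to a successful classification; the agent gradually concentrates its pulls on the best-performing arm. The variants in the table relax the sketch's assumptions, e.g., restless bandits let `true_probs` drift over time, and mortal or sleeping bandits change which arms are available at each step.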