Anis Najar, Mohamed Chetouani.
Abstract
In this paper, we provide an overview of the existing methods for integrating human advice into a reinforcement learning process. We first propose a taxonomy of the different forms of advice that can be provided to a learning agent. We then describe the methods that can be used for interpreting advice when its meaning is not determined beforehand. Finally, we review different approaches for integrating advice into the learning process.
Keywords: advice-taking systems; human-robot interaction; interactive machine learning; reinforcement learning; unlabeled teaching signals
Year: 2021 PMID: 34141726 PMCID: PMC8205518 DOI: 10.3389/frobt.2021.584075
Source DB: PubMed Journal: Front Robot AI ISSN: 2296-9144
Figure 1. Taxonomy of advice.
Types of advice.
| General constraints | Hayes-Roth et al., |
| General instructions | Maclin and Shavlik, |
| Guidance | Thomaz, |
| Contextual instructions | Utgoff and Clouse, |
| Corrective feedback | Nicolescu and Mataric, |
| Evaluative feedback | Dorigo and Colombetti, |
| | Griffith et al., |
Interpreting advice.
| Kate and Mooney, | GI | Text | SVM | Demonstration |
| Kim and Scassellati, | EFB | Speech | kNN | Binary EFB classes |
| Chen and Mooney, | GLI | Text | SVM | Demonstration |
| Tellex et al., | GHI | Text | Graphical model | Demonstration |
| Artzi and Zettlemoyer, | GHI | Text | Perceptron | Rewards or demonstration + language model |
| Duvallet et al., | GLI | Text | MCC | Demonstration + language model |
| Tellex et al., | GHI | Text | Gradient descent | Demonstration |
| Pradyot et al., | CLI | Gestures | MLN | Demonstration |
| Lopes et al., | EFB and CFB | Simulation | IRL | EFB and CFB |
| Grizou et al., | EFB or CLI | Speech | EM | Task models |
| Grizou et al., | EFB | EEG | EM | Task models |
| MacGlashan et al., | GHI | Text | EM | Task and language models |
| MacGlashan et al., | GHI | Text | EM | EFB + language model |
| Loftin et al., | EFB | Buttons | EM | Task models |
| Branavan et al., | GLI | Text | PGRL | Rewards |
| Branavan et al., | GHI | Text | MB-PGRL | Rewards |
| Vogel and Jurafsky, | GLI | Text | SARSA | Demonstration |
| Najar et al., | CLI | Simulation | XCS | Rewards |
| Najar et al., | CLI | Gestures | XCS | EFB |
| Najar et al., | CLI | Gestures | Q-learning | EFB |
| Mathewson and Pilarski, | CLI | EMG | ACRL | Rewards and/or EFB |
| Najar et al., | CLI | Gestures | ACRL | Rewards and/or EFB |
GI, General instruction; GLI, general low-level instruction; GHI, general high-level instruction; CLI, contextual low-level instruction; EFB, evaluative feedback; CFB, corrective feedback; SVM, Support Vector Machines; kNN, k-nearest neighbors; MCC, multi-class classification; MLN, Markov Logic Networks; IRL, Inverse Reinforcement Learning; PGRL, policy-gradient RL; MB-PGRL, model-based policy-gradient RL; ACRL, Actor-Critic RL.
The term demonstration here is taken in the general sense as a trajectory, not necessarily the optimal one.
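Several rows above pair contextual instructions whose meaning is not known in advance with an evaluative signal that grounds them. A minimal Python sketch of that idea, assuming a tabular setting: the agent keeps a value estimate for each (signal, action) pair, updates it from evaluative feedback, and interprets a signal as the action with the highest estimate. The class `SignalInterpreter`, its methods, and the learning rate `alpha` are hypothetical names chosen for illustration; this is not the implementation of any cited work.

```python
from collections import defaultdict


class SignalInterpreter:
    """Learns which action an unlabeled teaching signal refers to (illustrative only)."""

    def __init__(self, actions, alpha=0.5):
        self.actions = list(actions)
        self.alpha = alpha  # assumed learning rate
        # (signal, action) -> running estimate of how much the teacher approves
        self.value = defaultdict(float)

    def update(self, signal, action, feedback):
        """Move the estimate for (signal, action) toward the evaluative feedback."""
        key = (signal, action)
        self.value[key] += self.alpha * (feedback - self.value[key])

    def interpret(self, signal):
        """Return the action currently believed to be the meaning of the signal."""
        return max(self.actions, key=lambda a: self.value[(signal, a)])


interp = SignalInterpreter(["left", "right"])
interp.update("wave", "left", -1.0)   # teacher disapproved of going left after "wave"
interp.update("wave", "right", 1.0)   # teacher approved of going right after "wave"
meaning = interp.interpret("wave")
```

Ties are broken in favor of the first action in the list, which is an arbitrary choice made for this sketch.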
Shaping methods.
| Reward shaping | Model-free | Contextual instructions | Clouse and Utgoff, |
| Reward shaping | Model-free | Evaluative feedback | Isbell et al., |
| Reward shaping | Model-based | Contextual instructions | Najar et al., |
| Reward shaping | Model-based | Evaluative feedback | Knox and Stone, |
| Value shaping | Model-free | General instructions | Utgoff and Clouse, |
| Value shaping | Model-free | Evaluative feedback | Dorigo and Colombetti, |
| Value shaping | Model-based | Contextual instructions | Najar et al., |
| Value shaping | Model-based | Evaluative feedback | Knox and Stone, |
| Policy shaping | Model-free | Contextual instructions | Rosenstein et al., |
| Policy shaping | Model-free | Evaluative feedback | Ho et al., |
| Policy shaping | Model-based | Contextual instructions | Pradyot et al., |
| Policy shaping | Model-based | Evaluative feedback | Knox and Stone, |
| Policy shaping | Model-based | Corrective feedback | Lopes et al., |
| Decision biasing | | Guidance | Thomaz and Breazeal, |
| Decision biasing | | Contextual instructions | Nicolescu and Mataric, |
Figure 2. Shaping with evaluative feedback. 1: model-free reward shaping. 2: model-based reward shaping. 3: model-free value shaping. 4: model-based value shaping. 5: model-free policy shaping. 6: model-based policy shaping.
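The three model-free variants named in this caption differ in where the evaluative feedback enters the learner. A minimal Python sketch under stated assumptions: a tabular Q-learning agent, scalar feedback, and a softmax policy. All function names, the weight `beta`, and the separate preference table `prefs` are hypothetical; the cited works use their own formulations.

```python
import math


def reward_shaping_update(q, s, a, r, s_next, feedback,
                          alpha=0.1, gamma=0.9, beta=1.0):
    """Reward shaping: feedback is added to the task reward before the Q-update."""
    shaped_reward = r + beta * feedback
    target = shaped_reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def value_shaping_update(q, s, a, feedback, alpha=0.1):
    """Value shaping: feedback adjusts the stored value Q(s, a) directly."""
    q[s][a] += alpha * feedback


def policy_shaping_probs(q, prefs, s, temperature=1.0):
    """Policy shaping: feedback-derived preferences bias action selection,
    leaving the Q-values themselves untouched."""
    logits = [(qv + p) / temperature for qv, p in zip(q[s], prefs[s])]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


q = [[0.0, 0.0], [0.0, 0.0]]          # two states, two actions
reward_shaping_update(q, s=0, a=1, r=0.0, s_next=1, feedback=1.0)
value_shaping_update(q, s=0, a=0, feedback=-1.0)
prefs = [[0.0, 2.0], [0.0, 0.0]]      # accumulated approval for action 1 in state 0
probs = policy_shaping_probs(q, prefs, s=0)
```

Keeping the preferences in a table separate from `q` is one way to stop human feedback from overwriting what the agent has learned from task rewards; it is a design choice of this sketch.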
Figure 3. Shaping with contextual instructions. 1: model-free reward shaping. 2: model-based reward shaping. 3: model-free value shaping. 4: model-based value shaping. 5: model-free policy shaping. 6: model-based policy shaping. 7: decision biasing.
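Decision biasing, the seventh case in this caption, can be read as using an instruction only at the moment an action is chosen, without changing what the agent has learned. A minimal Python sketch under that reading, assuming an epsilon-greedy agent over a row of Q-values; `select_action` and its parameters are hypothetical names.

```python
import random


def select_action(q_row, instruction=None, epsilon=0.1, rng=random):
    """Pick an action index; an instruction, when present, overrides the choice
    for this step only and leaves q_row unmodified."""
    if instruction is not None:
        return instruction
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))          # explore
    return max(range(len(q_row)), key=q_row.__getitem__)  # exploit


q_row = [0.0, 1.0]
instructed = select_action(q_row, instruction=0)  # teacher says: take action 0
greedy = select_action(q_row, epsilon=0.0)        # no advice: follow the Q-values
```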
Figure 4. Exploration-control spectrum. As we move to the right, teaching signals inform more directly about the optimal policy and provide more control to the human over the learning process.
Figure 5. Shaping with advice, a unified view. When advice is provided to the learning agent, it first has to be encoded into an appropriate representation. If the mapping between teaching signals and their corresponding internal representation is not predetermined, then the advice has to be interpreted by the agent. The advice can then be integrated into the learning process (shaping), in either a model-free or a model-based fashion. The optional steps, interpretation and modeling, are sketched in light gray.
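The flow in this caption (encode, optionally interpret, optionally model, then shape) can be sketched in a few lines of Python. This is an illustration under stated assumptions: advice is scalar evaluative feedback, shaping is a direct value update, and the model is a running average per state-action pair. `PREDETERMINED`, `encode`, `FeedbackModel`, and `shape` are hypothetical names.

```python
# Assumed fixed mapping from raw teaching signals to internal feedback values.
PREDETERMINED = {"good": 1.0, "bad": -1.0}


def encode(signal, interpreter=None):
    """Encode a raw signal; fall back to an interpreter when no mapping is predetermined."""
    if signal in PREDETERMINED:
        return PREDETERMINED[signal]
    if interpreter is None:
        raise ValueError(f"no predetermined meaning for {signal!r}")
    return interpreter(signal)  # optional interpretation step


class FeedbackModel:
    """Optional modeling step: running average of feedback per (state, action)."""

    def __init__(self):
        self.total = {}
        self.count = {}

    def update(self, s, a, fb):
        self.total[(s, a)] = self.total.get((s, a), 0.0) + fb
        self.count[(s, a)] = self.count.get((s, a), 0) + 1

    def predict(self, s, a):
        n = self.count.get((s, a), 0)
        return self.total[(s, a)] / n if n else 0.0


def shape(q, s, a, fb, model=None, alpha=0.1):
    """Integrate feedback into Q. Model-free: use fb as given.
    Model-based: update the model first, then use its prediction."""
    if model is not None:
        model.update(s, a, fb)
        fb = model.predict(s, a)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * fb


q_free = {}
shape(q_free, "s0", "a0", encode("good"))                      # model-free path
q_model, model = {}, FeedbackModel()
shape(q_model, "s0", "a0", encode("good"), model=model)        # model-based path
shape(q_model, "s0", "a0", encode("beep", lambda _: -1.0), model=model)
```

In the model-based path the second, contradictory signal pulls the model's average to zero, so it leaves the stored value unchanged rather than cancelling the first update outright.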