Emma M. van Zoelen, Karel van den Bosch, Mark Neerincx.
Abstract
Becoming a well-functioning team requires continuous collaborative learning by all team members. This is called co-learning, conceptualized in this paper as comprising two alternating iterative stages: partners adapting their behavior to the task and to each other (co-adaptation), and partners sustaining successful behavior through communication. This paper focuses on the first stage in human-robot teams, aiming at a method for the identification of recurring behaviors that indicate co-learning. Studying this requires a task context that allows behavioral adaptation to emerge from the interactions between human and robot. We address the requirements for conducting research into co-adaptation by a human-robot team, and designed a simplified computer simulation of an urban search and rescue task accordingly. A human participant and a virtual robot were instructed to discover how to collaboratively free victims from the rubble of an earthquake. The virtual robot was designed to learn in real time which actions best contributed to good team performance. The interactions between human participants and robots were recorded. The observations revealed patterns of interaction used by human and robot to adapt their behavior to the task and to one another. The results therefore show that our task environment enables us to study co-learning, and suggest that more participant adaptation improved robot learning and thus team-level learning. The identified interaction patterns can emerge in similar task contexts, forming a first description and analysis method for co-learning. Moreover, the identification of interaction patterns supports awareness among team members, providing the foundation for human-robot communication about the co-adaptation (i.e., the second stage of co-learning). Future research will focus on these human-robot communication processes for co-learning.
Keywords: co-adaptation; co-learning; emergent interactions; human-robot collaboration; human-robot team; interaction patterns
Year: 2021 PMID: 34295926 PMCID: PMC8290358 DOI: 10.3389/frobt.2021.692811
Source DB: PubMed Journal: Front Robot AI ISSN: 2296-9144
The concepts of co-adaptation, co-learning, and co-evolution, defined in terms of the timespan in which they occur, their persistence, and the intention behind them.
| | Co-adaptation | Co-learning | Co-evolution |
|---|---|---|---|
| Timespan | Short (seconds) | Medium (hours) | Long (weeks) |
| Persistence | Developed behavior/mental state does not necessarily persist over time, and probably not at all across contexts | Developed behavior/mental state persists over time and possibly across contexts | Developed behavior/mental state might persist for a while but possibly continues to evolve, similar to the development of habituation |
| Intention | Changes and developments happen as a consequence of interactions and an implicit or explicit drive to improve performance or experience | Explicitly goal-driven: Attempts to improve performance or experience; learning is an explicit goal | Changes and developments happen as a consequence of interactions and possibly an implicit drive to improve performance or experience |
FIGURE 1 A storyboard describing how a human-robot team might free an earthquake victim from underneath a pile of rocks. In this storyboard, the robot picks up a large rock, unaware that this will cause another rock to fall on the head of the victim (panel C). The human notices the issue and steps in to prevent the rock from falling (panel D). This event can help the robot learn about the task, and that it apparently made a mistake. The human can learn about the capabilities of the robot, namely that it did not understand how the rocks would fall and cause harm.
FIGURE 2 The USAR task environment programmed in MATRX. It shows a victim underneath a pile of rocks, and a human and a robot representing the team members. The dashed red square (above the human’s head) represents the hand of the human that can be moved to pick up rocks. The dashed blue square represents the hand of the robot. Scene (A) was used as level 1 in the experiment, while scene (B) was used as level 2.
The task conditions specified for each Phase Variable used in the state space of the Reinforcement Learning algorithm.
| Phase | Description |
|---|---|
| Phase 1 | The starting phase: Describes the state of the task environment when no rocks have been moved |
| Phase 2 | The heights of all piles of rocks added up is now at least 10 rocks lower than in phase 1 |
| Phase 3 | Phase 2 has been reached, and the heights of all piles of rocks added up is now at least 20 rocks lower than in phase 1 |
| Phase 4 | Phases 2 and 3 have been reached, and either there are no more rocks directly on top of the victim, OR one side of the task field is cleared of rocks, meaning there is an access route to the victim from either the left or the right side |
| Goal phase | Phases 2, 3, and 4 have been reached, there are no more rocks directly on top of the victim, AND one side of the task field is cleared of rocks, meaning there is a free route to the victim from either the left or the right side. The task terminates when this phase is reached |
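The phase conditions above amount to a small state-abstraction function over the task environment. The following is a hypothetical sketch of that logic; the function name, argument names, and the use of the integer 5 for the goal phase are illustrative assumptions, not details from the paper.

```python
def current_phase(total_height, start_height, rocks_on_victim, left_clear, right_clear):
    """Map the simplified task state onto the phases in the table.

    total_height / start_height: summed heights of all rock piles (now / at the start).
    rocks_on_victim: True if any rock lies directly on top of the victim.
    left_clear / right_clear: True if that side of the field is free of rocks.
    Returns 1-4 for the numbered phases, 5 for the goal phase.
    """
    removed = start_height - total_height
    phase = 1
    if removed >= 10:          # Phase 2: at least 10 rocks removed in total
        phase = 2
    if removed >= 20:          # Phase 3: at least 20 rocks removed in total
        phase = 3
    side_clear = left_clear or right_clear
    if phase >= 3 and (not rocks_on_victim or side_clear):
        phase = 4              # Phase 4: victim uncovered OR one side cleared
    if phase >= 4 and not rocks_on_victim and side_clear:
        phase = 5              # Goal phase: victim uncovered AND one side cleared
    return phase
```

Note that the goal phase strengthens Phase 4's OR into an AND, which is why both conditions are checked again in the final step.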
FIGURE 3 A flowchart showing the rule-based decision making the agent would go through when using Macro-Action 1.
FIGURE 4 A flowchart showing the rule-based decision making the agent would go through when using Macro-Action 2.
FIGURE 5 A flowchart showing the rule-based decision making the agent would go through when using Macro-Action 3.
FIGURE 6 An overview of the representation of the learning problem embedded in the experiment. It shows the different runs that a participant went through (five runs for level 1, three runs for level 2), as well as how the runs were separated into four phases defined by the Phase Variables. The colors show how in R1.1, R1.2, and R1.3 the robot usually used O1—picking up all, O2—passive large rocks, and O3—breaking, respectively, in each phase. From R1.4 onwards, the robot chose a Macro-action based on the learned Q-values. The Future Run portrays the behavior that the robot would engage in if there were another run, based on the Q-values after R2.3.
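The run schedule described for Figure 6—scripted macro-actions in the first three runs, then selection based on learned Q-values—pairs naturally with a standard tabular Q-learning update. The sketch below illustrates that structure only; the run labels, parameter values (alpha, gamma, epsilon), and reward signal are assumptions for illustration and are not specified at this level of detail in the record.

```python
import random

ACTIONS = ["O1", "O2", "O3"]  # the three macro-actions (strategy options)

def choose_macro_action(run, phase, Q, epsilon=0.0):
    """Follow the Figure 6 schedule: scripted O1/O2/O3 in the first three
    runs of level 1, then pick the macro-action with the highest Q-value
    for the current phase (with optional epsilon-greedy exploration)."""
    scripted = {"R1.1": "O1", "R1.2": "O2", "R1.3": "O3"}
    if run in scripted:
        return scripted[run]
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((phase, a), 0.0))

def q_update(Q, phase, action, reward, next_phase, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update over (phase, macro-action) pairs."""
    best_next = max(Q.get((next_phase, a), 0.0) for a in ACTIONS)
    old = Q.get((phase, action), 0.0)
    Q[(phase, action)] = old + alpha * (reward + gamma * best_next - old)
```

Using the phase as the state makes the Q-table tiny (4 phases x 3 macro-actions), which is what allows the robot to learn within a handful of runs per participant.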
FIGURE 7 The Collaboration Fluency scores per run in the experiment for all participants.
The interaction patterns identified from the behavioral data, including a description of what they entail.
| Category | Concept | Description |
|---|---|---|
| Stable situation | Actively synchronizing actions with a team member | Human understands the capabilities of another team member and actively uses their own actions to make optimal use of the combined capabilities |
| | Alternating actively working on the task and waiting for a team member | Human switches between performing their own task for a while, then waiting for a team member to perform their task, and so on |
| | Being generally passive and letting a team member do most of the work | Human is overall passive and lets the other team member do the work |
| | Damage control: prevent damage caused by a team member | Human performs actions that prevent their team member from causing intentional or unintentional harm or damage |
| | Focusing on own task | Human performs their own task without paying much attention to their team member |
| Sudden adaptation | Avoiding communication with a team member | One of the team members actively avoids the other team member to prevent unwanted interpretations of communication |
| | Being confused by non-human-like behavior | A human team member is confused by non-human-like behavior performed by a team member |
| | Being confused by unexpected behavior (negative) | One of the team members is confused or frustrated by behavior of their team member that they did not expect |
| | Being happy that a team member does as expected | One of the team members is happy that their team member performs the kind of behavior that they expected and hoped for |
| | Being surprised by unexpected behavior (positive) | One of the team members is positively surprised by behavior of their team member that they did not expect |
| | Coming into action when a team member comes into action | A team member starts to actively perform their task after a period of inaction, when their team member also starts to actively perform their task after a period of inaction |
| | Doing useless or harmful actions because there is nothing else to do | A team member is unable to perform useful actions and therefore starts performing useless or harmful actions |
| | Feeling alone, as if the team member does not help | A human team member feels left alone |
| | Following a team member’s action | A team member follows or copies the action performed by another team member |
| | Learning about behavioral cues | A team member gains insight into specific behavior performed by another team member |
| | Learning about own capabilities | A team member gains insight into their own capabilities |
| | Learning about team member’s capabilities or strategy | A team member gains insight into the capabilities or strategy of another team member |
| | Moving around different task components | A team member moves around different task components without actually performing any task |
| | Team member changes strategy, which is visible through a behavioral cue | A team member observes, through a behavioral cue, that another team member changes strategy |
| | Team member performs an action that makes no sense | A team member performs a useless action |
| | Trying to communicate by interacting with a team partner | A team member attempts to communicate with another team member by directly interacting with them, for example by coming close to them |
| | Trying to communicate by signaling task actions | A team member attempts to communicate with another team member by trying out actions that they want their team member to perform |
| | Waiting for a team member to start acting | A team member waits for another team member to start performing their task |
FIGURE 8 An overview of how often certain Macro-actions were chosen by the robot across all participants per phase (A) and per run (B).
FIGURE 9 An overview of how many participants used specific behavioral strategies per run.
The clusters resulting from manually clustering participants based on whether they adapted to the robot across the whole experiment.
| Cluster | Participants |
|---|---|
| Does not adapt | 2, 6, 9, 15, 21, 22, 23, 24 (n = 8) |
| Adapts by balancing passively waiting and acting | 12, 13, 14, 16, 27, 28 (n = 6) |
| Adapts by actively using O2 or O3 | 3, 8, 10, 17, 19, 20, 26 (n = 7) |
FIGURE 10 An overview of how often certain Macro-actions were chosen by the robot across all participants per run, split up by the level of adaptation the participant showed: (A) shows participants who adapted by actively using O2—passive large rocks and/or O3—breaking, (B) shows participants who adapted by balancing waiting and acting, and (C) shows participants who did not adapt.
The claims, as presented in the section Claims: Expected Observations, that need to be justified in a co-adaptation experiment, including whether they were validated and an explanation of that conclusion.
| Claim | Justified | Explanation |
|---|---|---|
| Different participants develop different ways of performing the task | Yes | When looking at the different interaction patterns that people engage in, and categorizations of their adaptive behavior, we can see that different people indeed performed the task in a variety of ways |
| The agent learns different sequences of strategy options for different participants | Partly | The results showed that not all agents learned the same model at the individual level. However, the models had much in common, suggesting that all agents learned similar behavior. When splitting this up into groups based on human adaptive behavior, there seems to be a difference in learned agent behavior between the groups. We have not yet, however, performed statistical analysis to test whether this difference is significant |
| Different teams converge to different ways of performing the task | Partly | When looking at the different interaction patterns that participants engaged in with their robot team partner, different teams solved the task in a variety of ways (see H1). However, it is unclear to what extent the robot contributed to this. Moreover, while participants generally gained more confidence in their strategy and reported experiencing greater subjective collaboration fluency toward the end of the experiment, it is unclear to what extent the team's strategy truly converged to a stable one |
| The agent converges to a specific sequence of strategy options for most participants | No | While we did observe a logical development of the Q-values at the population level, this does not hold for all of the individual agents. Moreover, it is not clear to what extent the agents truly converged to a stable set of actions |
| The human converges to a specific strategy within the experiment | Partly | The categorizations of participant behavior show that participants increasingly settled on a stable strategy over the course of the experiment. This is also shown by the development of the confidence scores and the subjective collaboration fluency. True convergence to a stable strategy, however, is not clearly visible within the eight runs of the experiment |