Hian Lee Kwa1,2, Jabez Leong Kit1, Roland Bouffanais3.
Abstract
Multi-agent systems and multi-robot systems have been recognized as unique solutions to complex dynamic tasks distributed in space. Their effectiveness in accomplishing these tasks rests upon the design of cooperative control strategies, which is acknowledged to be challenging and nontrivial. In particular, the effectiveness of these strategies has been shown to be related to the so-called exploration-exploitation dilemma, i.e., the existence of a distinct balance between exploitative actions and exploratory ones while the system is operating. Recent results point to the need for a dynamic exploration-exploitation balance to unlock high levels of flexibility, adaptivity, and swarm intelligence. This point is especially apparent when dealing with fast-changing environments. Problems involving dynamic environments have been addressed by different scientific communities using theory, simulations, and large-scale experiments. Because these results are spread across a range of disciplines, it can be difficult to understand and manage the intricacies of the exploration-exploitation challenge. In this review, we summarize and categorize the methods used to control the level of exploration and exploitation carried out by a multi-agent system. Lastly, we discuss the critical need for suitable metrics and benchmark problems to quantitatively assess and compare the levels of exploration and exploitation, as well as the overall performance of a system with a given cooperative control algorithm.
Keywords: dynamic environment; exploitation; exploration; multi-agent systems; multi-robot systems; swarm intelligence; swarm robotics
Year: 2022 PMID: 35178430 PMCID: PMC8844516 DOI: 10.3389/frobt.2021.771520
Source DB: PubMed Journal: Front Robot AI ISSN: 2296-9144
FIGURE 1. Classification of MAS exploration and exploitation control methods.
List of works using spatial distribution metrics to quantify MAS exploration and exploitation.
| References | Metric | Task |
|---|---|---|
| | Area observed by at least two agents | Area Coverage |
| | Euclidean distance between solutions | Optimization |
| | Euclidean distance between solution and mean system solution | Optimization |
| | Euclidean distance between solution and median system solution | Optimization |
| | Standard deviation of all candidate solutions | Optimization |
| | Fitness value between sub-swarm leaders and mean fitness value | Optimization |
| | Local agent density | Optimal Area Aggregation |
| | Agent sensor overlap region | Target Search |
| | Euclidean distance between agents | Target Search |
| | Number of agents within visual range | Target Search |
| | | Target Tracking |
| | Distance between agents, distance between targets and agents & distance moved by agents in one time-step | Target Tracking |
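
Several of the spatial metrics listed above reduce to simple distance statistics over agent positions or candidate solutions. The following is a minimal sketch of two of them, not the formulation used in any of the listed works: the function names are hypothetical, and reading a larger spread as a more exploratory state is an assumption made here for illustration.

```python
import math
from itertools import combinations


def mean_pairwise_distance(positions):
    """Average Euclidean distance over all pairs of agents (or solutions)."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_distance_to_centroid(positions):
    """Average Euclidean distance between each agent and the mean position."""
    n = len(positions)
    dims = len(positions[0])
    centroid = [sum(p[d] for p in positions) / n for d in range(dims)]
    return sum(math.dist(p, centroid) for p in positions) / n


# Illustrative positions only (not data from the review).
spread_out = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
clustered = [(0.0, 0.0), (0.4, 0.0), (0.0, 0.3)]

print(mean_pairwise_distance(spread_out), mean_pairwise_distance(clustered))
print(mean_distance_to_centroid(spread_out), mean_distance_to_centroid(clustered))
```

Both functions accept positions of any dimension, so the same code applies to agents in physical space and to candidate solutions in an optimization search space.
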
List of works using probability-based metrics to quantify MAS exploration and exploitation dynamics.
| References | Metric | Task |
|---|---|---|
| | Lévy parameter | Area Mapping |
| | Opinion switching probability | Best-of- |
| | Lévy parameter | Resource Foraging |
| | Lévy parameter and memory utilization probability | Resource Foraging |
| | Task switching probability threshold | Task Allocation |
| | Lévy parameter | Target Search |
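
To make the role of the Lévy parameter concrete, the sketch below draws Lévy-walk step lengths from a power-law tail p(l) ∝ l^(-mu) by inverse-transform sampling. This is an illustration under stated assumptions, not the sampling scheme of any listed work: the symbol `mu`, the minimum step length `l_min`, and the function names are all hypothetical choices. Smaller `mu` gives a heavier tail and therefore occasional long relocations, which is commonly read as more exploratory motion.

```python
import random


def levy_step_length(mu, l_min=1.0, rng=random):
    """Draw one step length from p(l) proportional to l**(-mu), for l >= l_min.

    Uses the inverse CDF of a Pareto distribution; requires mu > 1.
    """
    if mu <= 1.0:
        raise ValueError("mu must be greater than 1 for a normalizable tail")
    u = rng.random()  # uniform in [0, 1), so 1 - u is in (0, 1]
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))


def sample_steps(mu, n, seed=0):
    """Draw n step lengths with a seeded generator for repeatability."""
    rng = random.Random(seed)
    return [levy_step_length(mu, rng=rng) for _ in range(n)]


# Same random draws, two different Lévy parameters (illustrative values).
heavy_tail = sample_steps(mu=1.5, n=1000)
light_tail = sample_steps(mu=3.0, n=1000)
print(max(heavy_tail), max(light_tail))
```

Because both samples reuse the same seed, every step drawn with `mu=1.5` is at least as long as its counterpart drawn with `mu=3.0`, which isolates the effect of the parameter from sampling noise.
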
List of works using their own methods or metrics to quantify the level of exploration and exploitation carried out by an MAS.
| References | Metric | Task |
|---|---|---|
| | Number of stubborn agents | Best-of- |
| | Number of unique messages broadcast & proportion of agents broadcasting unique messages | Collective Decision-Making |
| | Correlation between an agent’s direction of travel and directional bearing to target | Target Tracking |
| | Tracking fairness | Target Tracking |
| | Engagement ratio | Target Tracking |
| | Tracking Fairness and time-based occupancy grid map | Target Tracking |
FIGURE 2. Flowchart detailing how an agent deterministically decides on its response when provoked by environmental stimuli.
List of works using self-determined state change strategies to influence a system’s exploration and exploitation dynamics.
| References | Strategy | Task |
|---|---|---|
| | Agent behaviors selected using AIS | Dynamic Shepherding |
| | Random walk search phase & agent assignment exploitation phase | Resource Foraging |
| | Lévy Walk exploration & memory driven exploitation | Resource Foraging |
| | Dedicated exploration & exploitation Lévy parameters | Resource Foraging |
| | Random search & PSO tracking strategy | Target Capture |
| | Random search & target following strategy | Target Tracking (CMOMMT) |
| | Pattern search & target following strategy | Target Tracking (CMOMMT) |
| | Repulsion based exploration & PSO tracking strategy | Target Tracking |
| | Repulsion based exploration, PSO tracking strategy & adjustable memory length | Target Tracking |
| | Agent behaviors selected using AIS | Target Tracking |
| | Dedicated exploration & localization Lévy parameters | Target Suppression |
| | Agent behaviors selected using AIS | Task Assignment |
List of works using assigned agent responses to influence a system’s exploration and exploitation dynamics.
| References | Strategy | Task |
|---|---|---|
| | Dynamic sub-swarm membership | Area Exploration |
| | Stubborn agents that do not change opinion of site | Best-of- |
| | Dedicated environment exploration and long term monitoring (exploitation) agents | Environment Monitoring |
| | Forced re-initialization of a subset of candidate solutions | Optimization |
| | Dedicated exploratory and exploitative particles | Optimization |
| | Ant Colony System paired with agents with preset pheromone sensitivity | Optimization (Traveling Salesman Problem) |
| | Dynamic sub-swarm membership | Target Search |
| | Search strategies assigned based on agent’s distance to target and target-to-searcher ratio | Target Search |
| | Forced re-initialization of robots | Target Search |
| | Predetermined static agents for target search and mobile agents for target tracking | Target Tracking (CMOMMT) |
| | Specified number of agents closest to target chosen to utilize a target tracking strategy | Target Tracking |
List of works using small response changes to influence the level of exploration and exploitation of an MAS.
| References | Strategy | Task |
|---|---|---|
| | Exponential inter-agent repulsion strength | Area Coverage |
| | Exponential inter-agent repulsion & attraction gradient | Area Coverage |
| | Exponential inter-agent repulsion with “selfishness” term to drive exploration | Area Coverage |
| | Varying agent environment sampling time | Area Characterization |
| | Varying time interval at which robots regroup to trade map information | Area Exploration |
| | Adaptive step size | Optimization |
| | Exponential inter-agent repulsion strength | Optimization |
| | Varying agent wait time based on local fitness value | Optimal Area Aggregation |
| | Variable Lévy parameter | Resource Foraging |
| | Variable Lévy parameter & interruption of Lévy walks | Resource Foraging |
| | Random walk that gradually becomes more directed | Resource Foraging |
| | Exponential inter-agent repulsion strength | Target Search |
| | Using Lévy walks and Firefly optimization algorithm to generate points of attraction for each agent with different attraction weights | Target Search |
| | SRLA and SRLAMR agent interaction schemes | Target Search |
| | Exponential inter-agent repulsion and attraction strength | Target Tracking |
| | Linear inter-agent repulsion strength & variable target attraction strength | Target Tracking (CMOMMT) |
FIGURE 3. Inter-agent interaction forces generated by various repulsion and attraction schemes. Positive values indicate an attractive force and negative values indicate a repulsive force.
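
The sign convention in this caption can be illustrated with a small sketch of distance-dependent interaction forces. The functional forms, parameter names, and default values below are assumptions chosen for illustration; they are not the schemes plotted in the figure. The sketch only shows how an exponentially decaying repulsion, alone or combined with a longer-range attraction, yields negative (repulsive) values at short range and, in the combined case, positive (attractive) values further out.

```python
import math


def exponential_repulsion(distance, strength=1.0, length_scale=1.0):
    """Purely repulsive force: always negative, decaying with distance."""
    return -strength * math.exp(-distance / length_scale)


def repulsion_attraction(distance, repulsion=2.0, attraction=1.0,
                         rep_scale=1.0, att_scale=3.0):
    """Short-range repulsion plus longer-range attraction.

    Positive return values are attractive, negative values repulsive,
    matching the sign convention stated in the caption.
    """
    attract = attraction * math.exp(-distance / att_scale)
    repel = repulsion * math.exp(-distance / rep_scale)
    return attract - repel


# Tabulate both hypothetical schemes over a few inter-agent distances.
for d in (0.0, 0.5, 1.0, 2.0, 3.0, 5.0):
    print(d, round(exponential_repulsion(d), 3), round(repulsion_attraction(d), 3))
```

In this sketch, raising `strength` or `length_scale` pushes agents further apart, which is one way a single tunable parameter could shift a system toward exploration.
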
List of works using agent area and task assignments to influence the level of exploration and exploitation of an MAS.
| References | Strategy | Task |
|---|---|---|
| | Area partitioning | Patrolling |
| | Agent deployment probability threshold | Resource Foraging |
| | Agent utility distribution to drive robot movement | Task Allocation |
| | Belief maps & task transition probability | Task Allocation |
List of works using different information dissemination strategies to influence the level of exploration and exploitation of an MAS or MRS.
| References | Strategy | Task |
|---|---|---|
| | Adjusting communications range | Collective Decision Making |
| | Changing number of communication neighbors | Distributed Consensus |
| | Changing attention limit of agents | Distributed Consensus |
| | Changing interaction radius and field of vision angle | Distributed Consensus |
| | Changing interaction radius | Distributed Consensus |
| | Forced switching network model | Distributed Consensus |
| | Changing number of communication neighbors | Optimization |
| | Dynamic sub-swarm membership | Optimization |
| | Adjusting communications range | Optimal Size Aggregation |
| | Adjusting communications range | Resource Foraging |
| | Adjusting communications range based on internal workload state | Task Allocation |
| | Changing an agent’s number of neighbors in network based on fitness value | Target Search |
| | Changing number of communication neighbors | Target Tracking |
| | Varying communication link strength and number of communication neighbors | Target Tracking (CMOMMT) |
FIGURE 4. (Left) A standard communication model with an agent in communication with its 10 nearest neighbors. (Right) A forced switching communication model where k = 5 of the nearest neighbors are substituted with a more distant set of neighbors within the black circle with a probability of p (Kuan, 2018).
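
The neighbor substitution described in this caption could be sketched roughly as follows. This is a hedged illustration, not the model of the cited work: the function name `select_neighbors`, the `radius` argument standing in for the black circle, and the choice to replace the k farthest of the nearest neighbors are all assumptions, since the caption does not say which neighbors are swapped out.

```python
import math
import random


def select_neighbors(agent, others, n_neighbors=10, k=5, p=0.5,
                     radius=float("inf"), rng=random):
    """Return the communication neighbors of `agent`.

    With probability p, k of the n_neighbors nearest agents are replaced by
    k randomly chosen, more distant agents lying within `radius`.
    """
    ranked = sorted(others, key=lambda q: math.dist(agent, q))
    nearest = ranked[:n_neighbors]
    # Candidates beyond the nearest set but still inside the allowed radius.
    distant_pool = [q for q in ranked[n_neighbors:]
                    if math.dist(agent, q) <= radius]
    if rng.random() >= p or len(distant_pool) < k:
        return nearest  # standard nearest-neighbor model
    kept = nearest[:n_neighbors - k]  # assumption: keep the closest ones
    return kept + rng.sample(distant_pool, k)


# Illustrative layout: 30 agents on a line, queried from the origin.
others = [(float(i), 0.0) for i in range(1, 31)]
standard = select_neighbors((0.0, 0.0), others, p=0.0)
switched = select_neighbors((0.0, 0.0), others, p=1.0, radius=20.0,
                            rng=random.Random(1))
print(standard)
print(switched)
```

Setting `p=0.0` recovers the standard model in the left panel, while larger `p` injects information from more distant agents more often.
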
List of works using stigmergy to influence the level of exploration and exploitation carried out by an MAS.
| References | Strategy | Task |
|---|---|---|
| | Repulsive pheromones | Area Exploration |
| | Ant Colony Optimization with pheromone evaporation | Optimization |
| | Ant Colony Optimization with pheromone diffusion | Optimization |
| | Ant Colony System with agents of different pheromone sensitivities | Optimization (Traveling Salesman Problem) |
| | ACO with solution re-initialization | Path Finding |
| | Gradient descent with physical data carriers | Target Search |
| | Gradient descent with vectorial pheromones and physical data carriers | Target Search |
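
As an illustration of how pheromone evaporation can act as an exploration control, the sketch below applies the textbook Ant Colony Optimization update, in which each trail decays by a fixed fraction before new pheromone is deposited. It is a generic sketch rather than the update rule of any listed work; the function name, the dictionary representation of trails, and the parameter `evaporation_rate` are hypothetical.

```python
def update_pheromone(tau, deposits, evaporation_rate=0.1):
    """Apply one evaporation-plus-deposit step to every trail.

    tau: mapping from edge to current pheromone level.
    deposits: mapping from edge to pheromone laid down this step.
    """
    if not 0.0 <= evaporation_rate <= 1.0:
        raise ValueError("evaporation_rate must lie in [0, 1]")
    return {
        edge: (1.0 - evaporation_rate) * level + deposits.get(edge, 0.0)
        for edge, level in tau.items()
    }


# Illustrative trails: only edge "a" is reinforced in this step.
trails = {"a": 1.0, "b": 1.0}
trails = update_pheromone(trails, {"a": 0.5}, evaporation_rate=0.5)
print(trails)
```

A higher `evaporation_rate` makes unreinforced trails fade faster, so agents rely less on old information and are more likely to try new paths.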