Meng-Li Cao, Qing-Hao Meng, Jia-Ying Wang, Bing Luo, Ya-Qi Jing, Shu-Gen Ma.
Abstract
Maintaining contact between the robot and the plume is important in chemical plume tracing (CPT). In the time immediately following the loss of chemical detection during CPT, Track-Out activities bias the robot heading relative to the upwind direction so that the robot can rapidly re-contact the plume. To determine the bias angle used in a Track-Out activity, we propose an online instance-based reinforcement learning method, namely virtual trail following (VTF). In VTF, the action value is generalized from recently stored instances of successful Track-Out activities. We also propose a collaborative VTF (cVTF) method, in which multiple robots store their own instances in, and learn from the instances stored in, the same database. The proposed VTF and cVTF methods are compared with the biased upwind surge (BUS) method, in which all Track-Out activities use an offline-optimized universal bias angle, in an indoor environment with three different airflow fields. Under our experimental conditions, VTF and cVTF adapt better to different airflow environments than BUS, and cVTF yields higher success rates and time efficiencies than VTF.
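The instance-based generalization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of robot position as the state, the neighborhood `radius`, the inverse-distance weighting, the database size of 50, and all names (`VirtualTrail`, `q_estimate`, `select_action`, `shared_db`) are assumptions made for illustration only.

```python
import math
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class VirtualTrail:
    """One stored instance of a successful Track-Out activity (hypothetical layout)."""
    start: tuple   # assumed state: robot position where the Track-Out began
    action: int    # index of the discretized bias angle that was used
    q: float       # Q-value credited to this instance


def q_estimate(db, state, action, radius=1.0):
    """Generalize Q(state, action) from nearby stored trails with the same action.

    Uses an assumed inverse-distance weighting. Returns None when no stored
    trail with this action starts within `radius` of `state`.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for vt in db:
        if vt.action != action:
            continue
        d = math.dist(vt.start, state)
        if d <= radius:
            w = 1.0 / (d + 1e-6)  # closer trails count more
            weighted_sum += w * vt.q
            weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None


def select_action(db, state, n_actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the generalized Q-values."""
    estimates = {a: q_estimate(db, state, a) for a in range(n_actions)}
    known = {a: q for a, q in estimates.items() if q is not None}
    if not known or rng.random() < epsilon:
        return rng.randrange(n_actions)   # explore (or nothing known nearby)
    return max(known, key=known.get)      # exploit the best known action


# A bounded deque keeps only recent instances. In a collaborative setting,
# several robots would append to and read from this same object.
shared_db = deque(maxlen=50)
shared_db.append(VirtualTrail(start=(0.0, 0.0), action=2, q=1.0))
shared_db.append(VirtualTrail(start=(0.2, 0.1), action=5, q=0.4))
best = select_action(shared_db, state=(0.1, 0.0), n_actions=8, epsilon=0.0)
```

With `epsilon=0.0` the call is purely greedy, so the sketch picks the action whose nearby stored instance carries the higher Q-value. The bounded `deque` is one simple way to honor a database size limit, since the oldest instances are discarded first.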
Year: 2015 PMID: 25825974 PMCID: PMC4431220 DOI: 10.3390/s150407512
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Circulation process following the first chemical detection event.
Figure 2. Robot trajectories obtained using BUS in the Track-Out activity. Due to the variation of wind direction, the chemical patches in the plume were carried from their past positions (i.e., dotted ellipses) to current positions (i.e., grey oblong plates).
Figure 3. Discretizing the continuous action space to a set of eight actions.
Figure 4. Flow chart of the VTF method. s and s denote the nearby VTs of and the VTs associated with action , respectively.
Figure 5. Pseudo-code of the VTF method. denotes the VT associated with the action .
Figure 6. Q-value approximation based on nearby VTs. The solid circle encloses the neighboring area of the starting state in the following Track-Out activity, i.e., .
Figure 7. Mobile olfactory robots used in the experiments. (a) One of the MrCollie robots; (b) a scene of controlling the robots in the experiments.
Figure 8. Plan sketch of the laboratory. The valid search region is represented as a rectangular area with dotted edges.
Figure 9. Comparison of the robot trajectories obtained by setting different values of . (a) Robot trajectories obtained using BUS; (b) Robot trajectories obtained using cVTF.
Figure 10. Comparison of the robot trajectories obtained by setting different values of . (a) Robot trajectories obtained using BUS; (b) Robot trajectories obtained using cVTF.
Figure 11. Success rates of the Track-Out activities in ten CPT runs obtained using different values of , , and . The tested values are displayed on top of the bars.
Figure 12. Numerical results. (a) Success rates in the three groups of experiments. The numerator and denominator of the fraction on each bar are the corresponding number of successful Track-Out activities and total number of Track-Out activities, respectively; (b) Box plots with the whisker lengths specified as 1.0 times the interquartile range for each method in the three groups.
Figure 13. Typical robot trajectories obtained in M group.
Figure 14. Typical robot trajectories obtained in S group.
Figure 15. Typical robot trajectories obtained in N group.
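The discretization named in the Figure 3 caption, combined with the abstract's description of biasing the heading relative to the upwind direction, can be sketched as follows. This is only an illustration: the symmetric range of ±70° (`max_bias_deg`), the even spacing, the convention that `wind_dir_deg` is the direction the wind blows toward, and the function names are all assumptions, not values taken from the paper.

```python
def bias_angles(n_actions=8, max_bias_deg=70.0):
    """Evenly spaced bias angles in [-max_bias_deg, +max_bias_deg] (assumed range)."""
    step = 2.0 * max_bias_deg / (n_actions - 1)
    return [-max_bias_deg + i * step for i in range(n_actions)]


def track_out_heading(wind_dir_deg, bias_deg):
    """Heading for a Track-Out move, in degrees wrapped to [0, 360).

    Assumes wind_dir_deg is the direction the wind blows toward, so the
    upwind direction is its opposite; the bias angle rotates away from it.
    """
    upwind = (wind_dir_deg + 180.0) % 360.0
    return (upwind + bias_deg) % 360.0


angles = bias_angles()                      # eight candidate bias angles
heading = track_out_heading(90.0, angles[2])  # wind toward 90 deg, third action
```

Each action index then maps to one fixed bias angle, so a learning method only has to choose among eight options instead of a continuous angle.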
Averaged re-contact distance overheads of successful Track-Out activities in the qualitatively analyzed typical experiments.
| | BUS | rBUS | VTF | cVTF |
|---|---|---|---|---|
| M group | 1.0162 | 1.0178 | 1.2474 | 1.2647 |
| S group | 1.0116 | 1.0184 | 1.4782 | 1.5280 |
| N group | 1.0201 | 1.0193 | 1.6045 | 1.6893 |
Figure 16. (a) Typical wind magnitudes measured in the three groups of experiments; (b) The robot’s actual velocities recorded in 30 tests.
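The Figure 12(b) caption specifies whiskers of 1.0 times the interquartile range (IQR), rather than the more common 1.5. A minimal sketch of what that setting means is given below. It assumes the usual box-plot convention that each whisker ends at the most extreme sample lying within k × IQR of the box, and it assumes the "inclusive" quartile method; the plotting tool used for the paper may compute quartiles differently. The function name and sample data are hypothetical.

```python
import statistics


def whisker_bounds(samples, k=1.0):
    """Return (lower, upper) whisker ends for a box plot with k * IQR whiskers."""
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - k * iqr
    high_limit = q3 + k * iqr
    # Whiskers stop at the most extreme samples inside the limits;
    # anything beyond them would be drawn as an outlier.
    inside = [x for x in samples if low_limit <= x <= high_limit]
    return min(inside), max(inside)


# Made-up data, not from the paper.
bounds = whisker_bounds([1, 2, 3, 4, 5, 6, 7, 8, 20], k=1.0)
```

In this made-up data set the value 20 falls outside the upper limit and would be plotted as an outlier.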
The notation used in this paper.
| Symbol | Description |
|---|---|
| | Number of cycles from the last chemical detection event till the current time. |
| | Cycle limit for the Track-Out activity. |
| | Cycle limit for the plume re-acquiring behavior. |
| | Robot heading at the |
| | Bias angle at the |
| | Angle of wind direction measured at the |
| | Position of the robot at the |
| | State, action, and reward at the |
| | Action value when action |
| | Learning rate used in VTF/cVTF. |
| | Discount rate used in VTF/cVTF. |
| | Probability of selecting random action in the |
| | Number of actions. |
| | Start state, end state, and Q-value of the new VT |
| | Nearby VT of |
| | The VTs associated with action |
| | Database for storing VTs. |
| | Size limit of . |
| | The weight for the |
| | Transient concentration measurement at the |
| | Adaptive concentration threshold at the |
| | Constant parameter for calculating |
| | Goal position of the robot. |
| | Distance threshold for determining whether to generate repulsive force or not. |
| | A distance that is big enough for APF to generate sufficient attractive force. |
| | Number of times that the robot has arrived at |
| | Scanning span added to the scanning width in “casting” behavior. |
| | The maximal velocity of the robot. |