Xinxing Chen, Yuquan Leng, Chenglong Fu.
Abstract
Fuzzy inference systems have been widely applied in robotic control. Previous studies proposed various methods to tune the fuzzy rules and the parameters of the membership functions (MFs). Training the systems with only supervised learning requires a large amount of input-output data, and the performance of the trained system is confined by that of the target system. Training the systems with only reinforcement learning (RL) does not require prior knowledge but is time-consuming, and the initialization of the system remains a problem. In this paper, a supervised-reinforced successive training framework is proposed for a multi-continuous-output fuzzy inference system (MCOFIS). The parameters of the fuzzy inference system are first tuned by a limited number of input-output data from an existing controller with supervised training and then are utilized to initialize the system in the reinforcement training stage. The proposed framework is applied in a robotic odor source searching task and the evaluation results demonstrate that the performance of the fuzzy inference system trained by the successive framework is superior to the systems trained by only supervised learning or RL. The system trained by the proposed framework can achieve around a 10% higher success rate compared to the systems trained by only supervised learning or RL.Entities:
Keywords: Monte Carlo test; fuzzy inference system; reinforcement learning; robotic odor source searching; supervised learning
Year: 2022 PMID: 35711281 PMCID: PMC9194852 DOI: 10.3389/fnbot.2022.914706
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 3.493
Figure 1Illustration of the supervised-reinforced successive training framework for the multi-continuous-output TSK fuzzy inference system (MCOTSK).
Figure 2 Illustration of a snapshot of the searching area in the odor source searching problem. The odor source is represented by the star. The yellow patches represent the area within 2 m of the odor source. The wind field is illustrated by the black arrows. The red puffs are the simulated odor plumes, which closely resemble real-world plumes, as shown in Figure 1.
Figure 3The mean squared errors on the evaluation set during the supervised training stage.
Figure 4The average reward in each episode in the reinforced training stage.
Figure 5 Performance evaluation results of the three controllers in the Monte Carlo tests. (A) Success rate, (B) number of searching steps, and (C) distance overhead. Model 1: the Fuzzy Lévy Taxis algorithm used in the supervised training; Models 2–20: the trained MCOTSK model after every 20 reinforced training episodes (from 0 episodes to 360 episodes); Model 21: the MCOTSK model trained with RL only.
Figure 6A typical odor source searching trajectory with the MCOTSK model trained with the proposed successive training framework. The blue curves represent the trajectories.