| Literature DB >> 32316134 |
Xiao Wang1, Peng Shi1, Yushan Zhao1, Yue Sun2.
Abstract
To help the pursuer find an advantageous control policy in a one-to-one pursuit-evasion game in space, this paper proposes a pre-trained fuzzy reinforcement learning algorithm that is conducted in the x, y, and z channels separately. In contrast to previous algorithms applied to ground games, this is the first time reinforcement learning has been introduced to help a pursuer in space optimize its control policy. The known part of the environment is utilized to pre-train the pursuer's consequent set before learning. An actor-critic framework is built in each moving channel of the pursuer, and the consequent set is updated through the gradient descent method in the fuzzy inference systems. Numerical experimental results validate the effectiveness of the proposed algorithm in improving the game ability of the pursuer.
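The abstract's core mechanism, a per-channel fuzzy inference system whose consequent set is tuned by gradient descent inside an actor-critic loop, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the Gaussian membership functions, zero-order Takagi-Sugeno consequents, the `td_step` helper, and all parameter values are assumptions introduced here for clarity.

```python
import numpy as np

class FuzzyConsequent:
    """Zero-order Takagi-Sugeno fuzzy system over one channel's tracking
    error: fixed Gaussian antecedents and a trainable consequent vector
    theta, adjusted by a gradient step scaled by an external error signal
    (here, the TD error). Illustrative only; membership shapes and the
    update rule are assumptions, not taken from the paper."""

    def __init__(self, centers, width, lr):
        self.centers = np.asarray(centers, dtype=float)
        self.width = width
        self.theta = np.zeros_like(self.centers)  # consequent set
        self.lr = lr

    def firing(self, x):
        # Gaussian rule firing strengths, normalized to sum to 1
        w = np.exp(-((x - self.centers) ** 2) / (2 * self.width ** 2))
        return w / w.sum()

    def output(self, x):
        # Fuzzy inference: firing-strength-weighted sum of consequents
        return float(self.firing(x) @ self.theta)

    def update(self, x, delta):
        # d(output)/d(theta_i) equals the normalized firing strength, so a
        # gradient step on theta is simply lr * delta * firing(x)
        self.theta += self.lr * delta * self.firing(x)

# One actor-critic pair would run in each moving channel (x, y, z).
def td_step(actor, critic, e, e_next, reward, gamma=0.95):
    """One temporal-difference update on a single channel's tracking error."""
    delta = reward + gamma * critic.output(e_next) - critic.output(e)
    critic.update(e, delta)  # critic: shrink the TD error
    actor.update(e, delta)   # actor: shift control toward higher value
    return delta
```

Under this reading, the pre-training step described in the abstract would amount to fitting each channel's `theta` against the known part of the dynamics before the actor-critic learning loop starts, so that learning begins from an informed consequent set rather than zeros.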
Keywords: actor-critic; differential game; fuzzy system; reinforcement learning
Year: 2020 PMID: 32316134 PMCID: PMC7218890 DOI: 10.3390/s20082253
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The location of the pursuer and the evader.
Figure 2. The membership functions for one input.
Figure 3. The diagram of the learning logic.
Figure 4. The diagram of the pre-training process.
Table: Initial states of the pursuer and the evader.
Figure 5. Trajectories of the pursuer and the evader in the X–Y plane.
Figure 6. Trajectories of the pursuer and the evader in the Y–Z plane.
Figure 7. Variations of tracking errors in the x, y, and z channels.
Figure 8. Comparisons of the max tracking errors in different periods.
Figure 9. Comparisons of total rewards in different periods.