Abstract
Hierarchical reinforcement learning relies on temporally extended actions, or skills, to facilitate learning. How to form such abstractions automatically is challenging, and many efforts tackle this issue within the options framework. While various approaches exist to construct options from different perspectives, few of them concentrate on the adaptability of options during learning. This paper presents an algorithm to create options and enhance their quality online. Both aspects operate on detected communities of the learning environment's state transition graph. We first construct options from initial samples as the basis of online learning. A rule-based community revision algorithm is then proposed to update graph partitions, based on which existing options can be continuously tuned. Experimental results on two problems indicate that options built from initial samples may perform poorly in more complex environments, and that the presented strategy effectively improves options and obtains better results than flat reinforcement learning.
Year: 2018 PMID: 29849543 PMCID: PMC5937602 DOI: 10.1155/2018/2085721
Source DB: PubMed Journal: Comput Intell Neurosci
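The pipeline described in the abstract — build a state-transition graph from samples, partition it into communities, and derive options from the partition — can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes networkx (>= 2.8) for Louvain community detection, and it takes option subgoals to be "border" states between communities, a common choice in graph-partition-based option discovery.

```python
# Hedged sketch: state-transition graph -> Louvain communities -> subgoals.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def build_transition_graph(transitions):
    """transitions: iterable of (state, action, next_state) samples."""
    G = nx.Graph()
    for s, _a, s2 in transitions:
        G.add_edge(s, s2)  # undirected edge between successive states
    return G

def detect_communities(G, seed=0):
    # Returns a list of sets of states (requires networkx >= 2.8).
    return louvain_communities(G, seed=seed)

def border_states(G, communities):
    """States with at least one neighbor in another community."""
    label = {s: i for i, c in enumerate(communities) for s in c}
    return {s for s in G if any(label[n] != label[s] for n in G[s])}
```

Each option would then use one community as its initiation set, an inner policy learned toward a border subgoal, and termination upon leaving the community.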
Algorithm 1: Incremental processing algorithm for transition graph changes.
Algorithm 2: Option learning with evolving communities.
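The exact revision rules of Algorithm 1 are not given in this record; the sketch below shows one plausible rule-based update, assumed purely for illustration: when a new edge arrives, an unlabeled endpoint joins the community holding the majority of its labeled neighbors, or starts a fresh community if it has none.

```python
# Assumed illustrative rule, NOT the paper's Algorithm 1: majority-vote
# community assignment for states added to the transition graph online.
from collections import Counter

def revise_on_new_edge(G, label, s, s2, next_id):
    """Update community labels after adding edge (s, s2) to graph G."""
    G.add_edge(s, s2)
    for node in (s, s2):
        if node not in label:
            votes = Counter(label[n] for n in G[node] if n in label)
            if votes:
                label[node] = votes.most_common(1)[0][0]  # majority vote
            else:
                label[node] = next_id  # isolated cluster: new community
                next_id += 1
    return next_id
```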
Figure 1: Four-room grid world.
Figure 2: Results on the four-room problem, comparing different learning algorithms.
Figure 3: A snapshot of the evaluated small-scale Pac-Man problem. Only one dot and one ghost are set.
Figure 4: Results on the small-scale Pac-Man problem, comparing the proposed incremental-option learning method with Q-learning using primitive actions and Louvain-option learning. Shaded areas denote standard deviation.
Table 1: Sampled state-transition-graph changes during learning (standard deviations in parentheses).
| Scenario | Pac-Directional | Pac-Random |
|---|---|---|
| **Nodes** | | |
| Initial amount | 561 | 967 |
| Final amount | 839.5 (25.4) | 1105.4 (11.1) |
| Increased amount | 278.5 (25.4) | 138.4 (11.1) |
| **Edges** | | |
| Initial amount | 1504 | 3406 |
| Final amount | 2368.1 (60.8) | 4331.6 (16.5) |
| Increased amount | 864.1 (60.8) | 925.6 (16.5) |
Figure 5: Modularity changes of incremental-option learning in the small-scale Pac-Man problem, comparing the presented rule-based revision method with iteratively calling the Louvain algorithm. Shaded areas denote standard deviation.
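For reference, the modularity tracked in Figure 5 is presumably the standard Newman–Girvan measure that Louvain optimizes, for a partition assigning node $i$ to community $c_i$:

$$
Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
$$

where $A$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the total number of edges, and $\delta(c_i, c_j) = 1$ when nodes $i$ and $j$ share a community and 0 otherwise.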