Abstract
This study proposes a novel hybrid imitation learning (HIL) framework in which behavior cloning (BC) and state cloning (SC) are combined in a mutually complementary manner to make robotic manipulation task learning more efficient. The framework combines the BC and SC losses using an adaptive loss mixing method. It uses pretrained dynamics networks to make SC more efficient, and it performs stochastic state recovery during SC, transforming the learner's task state into a demo state on the demo task trajectory so that the policy networks learn stably. The training efficiency and policy flexibility of the framework are demonstrated in a series of experiments on major robotic manipulation tasks (pick-up, pick-and-place, and stack). In these experiments, the HIL framework showed about a 2.6-fold performance improvement over pure BC and about four times faster training than pure SC imitation learning. It also showed about a 1.6-fold performance improvement and about 2.2 times faster training than another hybrid learning method that combines BC with reinforcement learning (BC + RL).
Keywords: behavior cloning; dynamics modeling; hybrid imitation learning; robotic object manipulation task; trajectory cloning
Year: 2021 PMID: 34068422 PMCID: PMC8153608 DOI: 10.3390/s21103409
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Behavior cloning (BC) vs. state cloning (SC).
Figure 2. An example of a robotic manipulation task.
Variable descriptions.
| Variable | Description |
|---|---|
|  | Demo state |
|  | Demo action |
|  | Task state |
|  | Task action |
|  | State space |
| A | Action space |
|  | Stochastic state transition probability |
|  | Assigned robotic task |
|  | Initial state condition |
|  | Goal state condition |
|  | Stochastic policy network |
|  | Demo task trajectory |
|  | Demo dataset |
|  | Length of each demo task trajectory |
|  | Behavior cloning (BC) loss |
|  | State cloning (SC) loss |
|  | Mixed loss |
|  | Degree of policy convergence |
|  | Dynamics network |
|  | Loss mixing weight |
|  | State recovery probability |
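The record names a BC loss, an SC loss, a mixed loss, a loss mixing weight, and a degree of policy convergence, but it does not give the formulas that connect them. The following is a minimal Python sketch of one way such quantities could fit together. It assumes mean squared error for both losses, a convex combination for the mixed loss, and a simple linear schedule that shifts weight from BC to SC as convergence rises. All function names and the schedule itself are hypothetical and are not taken from the paper.

```python
def mean_squared_error(predicted, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(predicted) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def mixed_loss(bc_loss, sc_loss, weight):
    """Convex combination of the two losses (assumed form, not the paper's).

    weight = 1.0 uses only the BC loss; weight = 0.0 uses only the SC loss.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * bc_loss + (1.0 - weight) * sc_loss


def adaptive_weight(convergence):
    """Hypothetical schedule: rely on BC early, shift to SC as the policy converges.

    `convergence` is clamped to [0, 1] before use.
    """
    clamped = min(max(convergence, 0.0), 1.0)
    return 1.0 - clamped


if __name__ == "__main__":
    # Toy values: the learner's actions vs. demo actions (BC), and
    # dynamics-predicted states vs. demo states (SC).
    bc = mean_squared_error([0.1, 0.4], [0.0, 0.5])
    sc = mean_squared_error([1.0, 2.0], [1.0, 2.5])
    for convergence in (0.0, 0.5, 1.0):
        w = adaptive_weight(convergence)
        print(convergence, w, mixed_loss(bc, sc, w))
```

In this sketch the weight is a function of a single scalar, so the trade-off between the two losses is easy to inspect; the actual adaptive rule used by the HIL framework may differ.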
Figure 3. Hybrid imitation learning (HIL) framework.
Figure 4. State cloning with dynamics network.
Figure 5. Three robotic manipulation tasks.
Performance comparison among the methods using different loss mixing weights.
| Loss Mixing | Pick-Up Success Rate | Pick-Up Time (min) | Pick-and-Place Success Rate | Pick-and-Place Time (min) | Stack Success Rate | Stack Time (min) |
|---|---|---|---|---|---|---|
|  | 0.58 | 112 | 0.48 | 128 | 0.16 | 135 |
|  | 0.75 | 293 | 0.52 | 422 | 0.46 | 414 |
|  | 0.72 | 393 | 0.78 | 456 | 0.72 | 545 |
|  | 0.64 | 460 | 0.76 | 625 | 0.66 | 724 |
|  | 0.75 | 119 | 0.84 | 156 | 0.78 | 181 |
Figure 6. Changes in for learning manipulation tasks.
Performance comparison depending on the state recovery probability.
| Recovery | Pick-Up | Pick-and-Place | Stack |
|---|---|---|---|
|  | 0.64 | 0.46 | 0.52 |
|  | 0.75 | 0.84 | 0.78 |
|  | 0.816 | 0.75 | 0.76 |
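The abstract describes stochastic state recovery as transforming the learner's task state into a demo state on the demo task trajectory during SC, governed by a state recovery probability. The Python sketch below illustrates that idea under stated assumptions: the replacement demo state is chosen as the nearest one by Euclidean distance, and recovery is a single Bernoulli draw per step. The selection rule and the names `nearest_demo_state` and `maybe_recover` are hypothetical; the record does not specify how the demo state is picked.

```python
import math
import random


def nearest_demo_state(task_state, demo_trajectory):
    """Return the demo state closest to task_state (Euclidean distance).

    Nearest-neighbor selection is an assumption made for this sketch.
    """
    if not demo_trajectory:
        raise ValueError("demo_trajectory must not be empty")
    return min(demo_trajectory, key=lambda s: math.dist(task_state, s))


def maybe_recover(task_state, demo_trajectory, recovery_prob, rng):
    """With probability recovery_prob, replace the task state by a demo state.

    Otherwise the learner's task state is returned unchanged.
    """
    if not 0.0 <= recovery_prob <= 1.0:
        raise ValueError("recovery_prob must lie in [0, 1]")
    if rng.random() < recovery_prob:
        return nearest_demo_state(task_state, demo_trajectory)
    return task_state


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs draw the same sequence
    demo = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    drifted = (0.9, 1.2)  # a task state that has drifted off the demo trajectory
    for _ in range(5):
        print(maybe_recover(drifted, demo, 0.5, rng))
```

Passing the random generator explicitly keeps the sketch reproducible and makes the two boundary cases (probability 0 and probability 1) easy to check.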
Figure 7. Comparison with conventional learning methods.
Figure 8. Qualitative comparison among the learned policies.