| Literature DB >> 35548780 |
Chunmiao Yu, Peng Wang.
Abstract
With the increasing demand for dexterity in robotic operation, dexterous manipulation with multi-fingered robotic hands through reinforcement learning has become an active subject in robotics research. Our purpose is to present a comprehensive review of techniques for dexterous manipulation with multi-fingered robotic hands, from the early model-based approaches without learning to the latest research and methodologies based on reinforcement learning and its variations. This work summarizes the evolution and the state of the art in this field and surveys the current challenges and future directions in a way that allows future researchers to understand this field.
Keywords: dexterous manipulation; learn from demonstration; multi-fingered robotic hand; reinforcement learning; sim2real
Year: 2022 PMID: 35548780 PMCID: PMC9083362 DOI: 10.3389/fnbot.2022.861825
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 3.493
Figure 1. Typical tasks of dexterous manipulation with a multi-fingered hand. (A) Relocation, (B) Reorientation & Relocation, (C) Tool use, (D) Door opening, (E) Valve turning, (F) In-hand manipulation, (G) Screwing, (H) Dexterous manipulation, and (I) Pouring.
Typical dexterous hands.
| | | | | | | |
|---|---|---|---|---|---|---|
| Fingers | 3 | 3 | 4 | 4 | 5 | 5 |
| DoF | 9 | 9 | 16 | 16 | 24 | 15 |
Figure 2. Overall presentation of this work.
Figure 3. Method based on an accurate model of the multi-fingered hand and object.
Figure 4. Dexterous manipulation with a multi-fingered hand through reinforcement learning (part of this figure comes from [89]).
Classification and corresponding advantages and disadvantages of the RL methods.
| Category | Advantages | Disadvantages |
|---|---|---|
| Value-based RL | Easy to implement | Poor performance in tasks with discontinuous, large state spaces |
| Policy-based RL | Easier to converge | Prone to converging to a local optimum |
| Model-based RL | More data efficient | Model accuracy has a large impact on the learned task |
| Model-free RL | Easier to implement | Requires large amounts of data |
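The model-free, value-based family in the table is easiest to see in its tabular form. Below is a minimal sketch (not from the paper; the two-state toy problem is an illustrative assumption) of the tabular Q-learning update:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q

Q = np.zeros((2, 2))  # 2 states x 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0, 1])  # 0.5
```

Deep value-based methods such as DQN replace the table with a neural network, which motivates the stabilization tricks summarized next.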
Comparison between typical value-based algorithms and policy-based algorithms.
| Algorithm | Key idea | Category | Limitation |
|---|---|---|---|
| DQN (Mnih et al.) | Approximates the optimal Q-value function with a deep convolutional neural network; uses a target network and experience replay | Value-based | Only handles discrete, low-dimensional action spaces |
| Double DQN (van Hasselt et al.) | Uses two networks to address the overestimation problem of DQN | Value-based | |
| DQN with prioritized experience replay (Schaul et al.) | Replays experience with priority to increase sample utilization and exploration | Value-based | |
| Dueling DQN (Wang et al.) | Replaces Q(s, a) with V(s) + A(s, a) to alleviate the overestimation problem of DQN | Value-based | |
| REINFORCE (Williams) | The starting point of policy gradient algorithms | Policy-based | Low efficiency and high variance |
| TRPO (Schulman et al.) | Finds the right step size to stably improve the policy | Policy-based | |
| PPO (Schulman et al.) | An advanced version of TRPO that is easier to implement | Policy-based | |
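The overestimation fix in Double DQN amounts to decoupling action selection from action evaluation. A sketch (not from the paper) contrasting the two targets on toy next-state Q-value vectors:

```python
import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99):
    # DQN: the target network both selects and evaluates the max action
    return reward + gamma * np.max(q_target_next)

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99):
    # Double DQN: the online network selects the action,
    # the target network evaluates it
    a = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a]

q_online = np.array([1.0, 2.0, 0.5])  # online-network Q(s', .)
q_target = np.array([0.8, 1.5, 3.0])  # target-network Q(s', .)
print(dqn_target(1.0, q_target))                   # uses the target net's own max
print(double_dqn_target(1.0, q_online, q_target))  # evaluates the online argmax instead
```

When the target network's max is an overestimate (here 3.0), the Double DQN target is lower because it evaluates the action the online network actually prefers.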
Summary of typical algorithms under the actor-critic framework.
| Algorithm | Key idea | Policy type |
|---|---|---|
| A3C (Mnih et al.) | Adopts an asynchronous training framework | On-policy |
| DDPG (Lillicrap et al.) | Handles continuous action spaces | Off-policy |
| TD3 (Fujimoto et al.) | An advanced version of DDPG that mitigates overestimation and variance in actor-critic methods | Off-policy |
| SAC (Haarnoja et al.) | Adopts a maximum-entropy objective to improve robustness and speed up training | Off-policy |
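A mechanism shared by the off-policy actor-critic methods in the table (DDPG, TD3, SAC) is the Polyak (soft) target-network update. A minimal sketch, with plain numpy arrays standing in for network weights:

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    # target <- (1 - tau) * target + tau * online, per parameter tensor
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

target = [np.zeros(3)]
online = [np.ones(3)]
target = soft_update(target, online, tau=0.5)
print(target[0])  # [0.5 0.5 0.5]
```

A small tau makes the target network track the online network slowly, which stabilizes the bootstrapped value targets.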
Figure 5. Basic process of learning dexterous manipulation by RL from scratch (part of this figure comes from OpenAI et al., 2019).
Overview of the dexterous manipulation solved by RL from scratch.
| Work | Algorithm | Platform | Sensors | Setting | Task |
|---|---|---|---|---|---|
| Popov et al. | Improved DDPG | Jaco arm | - | Simulation only | Lego assembly |
| Fakoor et al. | DDPG++ | ADROIT hand | - | Simulation only | Door opening |
| He and Ciocarlie | DisoSyn (based on PPO) | Shadow hand | - | Simulation only | Multiple tasks |
| Huang et al. | DDPG + HER + multi-task learning | Shadow hand | - | Simulation only | In-hand rotation |
| Katyal et al. | DQN | Modular Prosthetic Limb (MPL) | - | Simulation only | In-hand manipulation |
| Li S. et al. | DDPG + HER | Shadow hand | - | Simulation only | Solving a 2 × 2 × 2 Rubik's Cube |
| Omer et al. | MPC-SAC | Dclaw and Shadow hand | - | Simulation only | Valve turning and cube manipulation |
| He et al. | Soft HER | Shadow hand | - | Simulation only | In-hand block manipulation and others |
| Xu et al. | SAC | Allegro hand | Tactile sensors | Simulation only | Playing piano |
| Kumar et al. | RL with linear-Gaussian controllers (model-based RL) | Adroit platform | Pressure sensors and piston length sensors | Simulation and real robot | Hand positioning and object manipulation |
| van Hoof et al. | NPREPS | An under-actuated compliant robot hand | Tactile sensor | Real world | Rolling an object between fingertips |
| Nagabandi et al. | PDDM (model-based RL) | Shadow hand | Camera tracker | Real world | Baoding balls |
| Haarnoja et al. | SAC | Dclaw | Visual sensor | Real world | Valve rotation |
| Zhu H. et al. | TNPG | Dclaw and Allegro hand | - | Real world | Valve rotation and door opening |
| Gupta et al. | MTRF | D'Hand | - | Real world | Pipe insertion and in-hand manipulation |
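Several entries above pair DDPG with hindsight experience replay (HER). A sketch of the HER relabeling idea (the transition tuple layout and the distance-based sparse reward are illustrative assumptions, not the authors' code): a failed episode is replayed as if its final achieved state had been the goal all along, turning sparse-reward failures into useful learning signal.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    # 0 within tolerance of the goal, -1 otherwise
    return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < tol else -1.0

def her_relabel(episode, reward_fn):
    """episode: list of (state, action, achieved_goal, desired_goal).
    Returns extra transitions whose goal is the final achieved goal."""
    final_goal = episode[-1][2]
    relabeled = []
    for state, action, achieved, _ in episode:
        relabeled.append((state, action, achieved, final_goal,
                          reward_fn(achieved, final_goal)))
    return relabeled

# A two-step episode that never reached the desired goal (1.0,)
episode = [((0,), 0, (0.0,), (1.0,)),
           ((0,), 1, (0.4,), (1.0,))]
extra = her_relabel(episode, sparse_reward)
print(extra[-1][-1])  # 0.0: the final step succeeds under the hindsight goal
```

These relabeled transitions are added to the replay buffer alongside the originals, so an off-policy learner such as DDPG can learn from them.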
Figure 6. Two ways of combining RL and demonstration.
Overview of the dexterous manipulation solved by RL with a demonstration.
| Work | Algorithm | Platform | Extra hardware | Setting | Demonstration source | Task |
|---|---|---|---|---|---|---|
| Qin et al. | DexMV | Adroit hand | - | Simulation only | Raw video | Relocating, pouring, and placing inside |
| Zhu H. et al. | DAPG | Dclaw and Allegro hand | - | Real robot | Kinesthetic teaching | Valve rotation and door opening |
| Orbik et al. | IRL | Adroit hand | - | Simulation only | CyberGlove | Object relocation, tool use, in-hand manipulation, and door opening |
| Rajeswaran et al. | DAPG | Adroit hand | - | Simulation only | CyberGlove | Object relocation, tool use, in-hand manipulation, and door opening |
| Gupta et al. | Learning-from-demonstrations algorithm based on GPS | RBO Hand 2 | PhaseSpace Impulse system | Real robot | LED markers tracking the motion of the object demonstrated by a human | Turning a valve, pushing beads on an abacus, and grasping a bottle from a table |
| Jeong et al. | REQfSE | Bimanual Shadow hand | - | Simulation only | Waypoint controllers | LEGO stacking |
| Alakuijala et al. | RRLfD | Adroit hand | - | Simulation only | Script or a previously trained RL agent | Object relocation, tool use, in-hand manipulation, and door opening |
| Radosavovic et al. | SOIL | Adroit hand | - | Simulation only | Virtual reality headset and a motion capture glove | Object relocation, tool use, in-hand manipulation, and door opening |
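DAPG, which appears twice above, augments the policy gradient with a behavior-cloning term on the demonstrations whose weight decays geometrically over training, so demonstrations guide early exploration without constraining the final policy. A toy sketch of that weighting (the gradient vectors and decay constants are illustrative assumptions, not values from the paper):

```python
import numpy as np

def demo_augmented_gradient(pg_grad, bc_grad, iteration, lam0=0.1, lam1=0.95):
    # weight on the demonstration (behavior-cloning) term decays
    # geometrically with the training iteration: lam0 * lam1**k
    w = lam0 * (lam1 ** iteration)
    return pg_grad + w * bc_grad

pg = np.array([1.0, -0.5])  # policy-gradient term
bc = np.array([0.2, 0.2])   # behavior-cloning term on demonstrations
print(demo_augmented_gradient(pg, bc, iteration=0))    # full demo weight lam0
print(demo_augmented_gradient(pg, bc, iteration=500))  # demo term nearly vanished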
Figure 7. Categories of approaches for sim-to-real in this domain.
Overview of the dexterous manipulation from simulation to reality.
| Work | RL algorithm | Sim-to-real method | Platform | Simulator | Sensors | Task |
|---|---|---|---|---|---|---|
| Andrychowicz et al. | PPO | Domain randomization | Shadow hand | MuJoCo physics engine (Todorov et al.) | 3D tracking system and RGB cameras | Manipulating a block |
| OpenAI et al. | PPO | Automatic domain randomization | Shadow hand | MuJoCo physics engine (Todorov et al.) | 3D tracking system and RGB cameras | Solving a Rubik's cube |
| Zhu Y. et al. | A method based on GAIL (IL) + PPO (RL) | Domain randomization | Jaco arm | MuJoCo physics engine (Todorov et al.) | RGB cameras | Block lifting and stacking |
| Kumar et al. | Contextual RL (PPO) | Domain randomization | Allegro hand | - | RGB cameras and tactile sensor | Grasping |
| Allshire et al. | PPO | Domain randomization | TriFinger | Isaac Gym | RGB cameras | In-hand manipulation |
| Rusu et al. | A3C | Progressive net | Jaco arm | MuJoCo physics engine (Todorov et al.) | RGB cameras | Reaching to a visual target |
| Fernandes Veiga et al. | Hierarchical control (RL + tactile feedback control) | Hierarchical RL | Allegro hands | PyBullet (Coumans and Bai) | Tactile sensor | In-hand manipulation |
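Domain randomization, the dominant sim-to-real method in the table, draws new physical parameters for every simulated episode so the learned policy must be robust to the real system's unknown dynamics. A minimal sketch (the parameter names and the ±20% range are illustrative assumptions, not values from any of the cited works):

```python
import random

def randomize_physics(rng, nominal):
    # scale each nominal parameter by an independent random factor in [0.8, 1.2]
    return {k: v * rng.uniform(0.8, 1.2) for k, v in nominal.items()}

nominal = {"object_mass": 0.05, "friction": 1.0, "actuator_gain": 1.0}
rng = random.Random(0)
for episode in range(3):
    params = randomize_physics(rng, nominal)
    # ...reset the simulator with `params`, then roll out the policy...
    print(params)
```

Automatic domain randomization (as in OpenAI et al.'s Rubik's cube work) goes further by widening these ranges adaptively as the policy improves.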