Minjeong Yoo, Yuseung Na, Hamin Song, Gamin Kim, Junseong Yun, Sangho Kim, Changjoo Moon, Kichun Jo.
Abstract
As an alternative to traditional remote controllers, vision-based hand gesture recognition is being actively researched for interaction between humans and unmanned aerial vehicles (UAVs). However, vision-based gesture systems struggle to recognize dynamic gestures because it is difficult to estimate the pose of a multi-dimensional hand gesture from 2D images. This leads to complex algorithms that require tracking in addition to detection to recognize dynamic gestures, which makes them unsuitable for human-UAV interaction (HUI) systems that require a safe design with high real-time performance. Therefore, in this paper, we propose a hybrid hand gesture system that combines an inertial measurement unit (IMU)-based motion capture system with a vision-based gesture system to increase real-time performance. First, commands are divided into IMU-based and vision-based commands according to whether the drone operation command is input continuously. Second, IMU-based control commands are mapped intuitively so that the UAV moves in the same direction as the orientation estimated by a thumb-mounted micro-IMU, while vision-based control commands are mapped to the hand's appearance through real-time object detection. The proposed system is verified in a simulation environment through an efficiency evaluation against the dynamic gestures of an existing vision-based system, together with a usability comparison against a traditional joystick controller conducted with participants who had no prior experience operating a UAV. The results show a safer and more intuitive HUI design, with a 0.089 ms processing speed and an average lap time about 19 s shorter than that of the joystick controller. In other words, the proposed system is viable as an alternative to existing HUIs.
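To make the hybrid split concrete, here is a minimal sketch of the dispatch idea described above. All names, gains, and mode flags are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch of the hybrid command split described in the abstract:
# continuous motion set-points come from a thumb-mounted micro-IMU, discrete
# commands from a vision-based static-gesture detector. All names, gains,
# and mode flags here are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class ThumbOrientation:
    roll: float   # deg, estimated by the micro-IMU
    pitch: float  # deg
    yaw: float    # deg

def imu_to_velocity(o: ThumbOrientation, gain: float = 0.02) -> dict:
    """Map thumb orientation to a UAV velocity set-point so the UAV moves
    in the same direction the thumb points (the intuitive mapping)."""
    return {
        "vx": -gain * o.pitch,     # pitch down -> move forward
        "vy": gain * o.roll,       # roll right -> move right
        "yaw_rate": gain * o.yaw,  # yaw right -> turn right
    }

def dispatch(mode: str, thumb: ThumbOrientation, detected: str):
    """Continuous commands in IMU control mode [I]; discrete commands
    in camera control mode [C]; nothing in neutral mode [N]."""
    if mode == "I":
        return ("velocity", imu_to_velocity(thumb))
    if mode == "C":
        return ("discrete", detected)  # e.g., "take_off", "land", "stop"
    return ("idle", None)
```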
Keywords: IMU-based motion capture system; deep learning; hand-gesture-based recognition; human–UAV interaction; hybrid-based hand gesture recognition
Year: 2022 PMID: 35408128 PMCID: PMC9002368 DOI: 10.3390/s22072513
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Advantages and disadvantages of the four categories of HUI.
| HUI | Advantages | Disadvantages |
|---|---|---|
| Wearable Sensors | Intuitive, natural; low computation compared to other interfaces; suitable performance for human motion capture | Expensive equipment required; lack of ability to distinguish between unconscious and predefined similar behaviors |
| More User-Friendly Remote Controller | Less training time than a traditional remote controller; additional features, such as path planning using touch-screen devices | Less intuitive than other interfaces; distance limited by WiFi signal transmission characteristics |
| Speech | Intuitive, natural; no additional devices required | Performance degraded by the surrounding environment, such as noise; affected by language and intonation differences |
| Gesture | Intuitive, natural; no additional devices required; no need for many commands | Lower control performance than other interfaces due to limited discriminant ability; high computation compared to other interfaces |
Figure 1. The proposed architecture for hand gesture recognition for UAV control.
Figure 2. Multi-copter reference coordinate system and its reference directions.
Figure 3. How to control a multi-copter using a twin-stick controller.
Command list covering joystick-based multi-copter direct operation instructions and the proposed system.
| No. | Command | No. | Command |
|---|---|---|---|
| 1 | Move forward (Pitch down) | 8 | Descend (Throttle down) |
| 2 | Move backward (Pitch up) | 9 | Arming |
| 3 | Move left (Roll left) | 10 | Disarming |
| 4 | Move right (Roll right) | 11 | Take off |
| 5 | Turn left (Yaw left) | 12 | Land |
| 6 | Turn right (Yaw right) | 13 | Back home |
| 7 | Ascend (Throttle up) | 14 | Stop |
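As a reading aid, the 14 commands above can be encoded and split into the continuous (IMU-based) and discrete (vision-based) groups the abstract describes; the grouping of Nos. 1-8 versus 9-14 follows Figure 4 and the accuracy table later in this entry, while the code itself is an illustrative sketch:

```python
# Illustrative encoding of the 14-command list; Nos. 1-8 are continuous
# motion commands handled by IMU-based dynamic gestures, Nos. 9-14 are
# discrete commands handled by vision-based static gestures.
from enum import Enum, auto

class Command(Enum):
    MOVE_FORWARD = auto()   # 1, pitch down
    MOVE_BACKWARD = auto()  # 2, pitch up
    MOVE_LEFT = auto()      # 3, roll left
    MOVE_RIGHT = auto()     # 4, roll right
    TURN_LEFT = auto()      # 5, yaw left
    TURN_RIGHT = auto()     # 6, yaw right
    ASCEND = auto()         # 7, throttle up
    DESCEND = auto()        # 8, throttle down
    ARMING = auto()         # 9
    DISARMING = auto()      # 10
    TAKE_OFF = auto()       # 11
    LAND = auto()           # 12
    BACK_HOME = auto()      # 13
    STOP = auto()           # 14

CONTINUOUS = {
    Command.MOVE_FORWARD, Command.MOVE_BACKWARD, Command.MOVE_LEFT,
    Command.MOVE_RIGHT, Command.TURN_LEFT, Command.TURN_RIGHT,
    Command.ASCEND, Command.DESCEND,
}

def is_imu_command(cmd: Command) -> bool:
    """Continuous commands must be input repeatedly, so they are routed
    to the IMU-based path; everything else goes to the vision-based path."""
    return cmd in CONTINUOUS
```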
Figure 4. (a) Dynamic gestures for IMU-based gesture recognition; (b) static gestures for vision-based gesture recognition.
Figure 5. Configuration of the orientation range of the thumb for recognizing each dynamic gesture.
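Figure 5 implies that each dynamic gesture is recognized by testing whether the thumb's estimated orientation falls inside a configured range. A minimal sketch under that assumption, covering six of the eight gestures for brevity; the 20° neutral band and the axis-to-gesture assignment are assumed values, not the paper's configuration:

```python
# Illustrative orientation-range classifier for the thumb IMU (cf. Figure 5).
# The 20-degree neutral band and the axis-to-gesture assignment are assumed.
NEUTRAL_BAND = 20.0  # deg; orientations inside this band count as neutral

def classify_dynamic_gesture(roll: float, pitch: float, yaw: float) -> str:
    """Return the dynamic gesture whose configured orientation range
    contains the current thumb orientation (dominant axis wins)."""
    axes = {"roll": roll, "pitch": pitch, "yaw": yaw}
    axis, value = max(axes.items(), key=lambda kv: abs(kv[1]))
    if abs(value) < NEUTRAL_BAND:
        return "neutral"
    if axis == "pitch":
        return "move_forward" if value < 0 else "move_backward"
    if axis == "roll":
        return "move_right" if value > 0 else "move_left"
    return "turn_right" if value > 0 else "turn_left"
```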
Figure 6. Example from the dataset showing the gesture definition for each labeled class for vision-based gesture recognition.
Figure 7. Class distribution of the dataset. The x-axis represents the class, and the y-axis represents the number of images per class.
Figure 8. Graph of the relationships between the static gesture commands.
Figure 9. Wearable system for gesture recognition.
Figure 10. The simulation setup of the proposed system.
Test scenario procedure.
| No. | Segment | Operation |
|---|---|---|
| 1 | IMU alignment | Proceed with IMU alignment in neutral mode [N] |
| 2 | Arming | Switch to camera control mode [C], then perform the arming command |
| 3 | Take off | Perform the take-off command |
| 4 | Straight and level flight | Switch to IMU control mode [I], then move from the home position to point C (5 s waiting) using IMU-based gesture recognition |
| 5 | Backward and level flight | Move from point C back to the home position using IMU-based gesture recognition |
| 6 | Rhombus flight | Move from the home position through points B, C, and D and back home sequentially using IMU-based gesture recognition |
| 7 | Target approach | Move from the home position to the building structure using IMU-based gesture recognition |
| 8 | Back home | Switch to camera control mode [C], then perform the back-home command |
| 9 | Stop | Perform the stop command |
| 10 | Land | Perform the land command |
| 11 | Disarming | Perform the disarming command |
Figure 11. Implementation of the simulation environment using Gazebo.
Figure 12. Confusion matrices of the trained YOLOv4 model on the test set: (a) confusion matrix without normalization; (b) normalized confusion matrix.
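For reference, a row-normalized confusion matrix like the one in Figure 12b can be derived from raw counts as follows; this is a generic sketch with hypothetical counts, not the paper's evaluation code:

```python
# Generic sketch: row-normalize a confusion matrix so each entry becomes
# the fraction of that true class predicted as each label (cf. Figure 12b).
import numpy as np

def normalize_confusion(cm: np.ndarray) -> np.ndarray:
    """Divide each row by its total; rows with no samples stay zero."""
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals,
                     out=np.zeros(cm.shape, dtype=float),
                     where=totals > 0)

# Hypothetical 3-class count matrix, for demonstration only.
counts = np.array([[48, 2, 0],
                   [1, 45, 4],
                   [0, 3, 47]])
print(normalize_confusion(counts).round(3))
```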
Accuracy results from evaluating each IMU-based dynamic gesture command.
| IMU-Based Gesture Command | Accuracy |
|---|---|
| Move forward | 97.78% |
| Move backward | 97.78% |
| Move left | 98.89% |
| Move right | 100% |
| Turn right | 91.11% |
| Turn left | 92.22% |
| Ascend | 96.67% |
| Descend | 100% |
Comparison between the proposed system and existing vision-based systems for dynamic gesture recognition.
| Authors | Interacted System | Deep-Learning Algorithm | Number of Dynamic Gestures | Processing Speed of Dynamic Gesture Recognition (ms) |
|---|---|---|---|---|
| Chen, B. [ ] | UAV | Yes | 6 | 45 |
| Kasab, Mohamed A. [ ] | UAV | Yes | 10 | 42.7786 |
| Liu, C. [ ] | UAV | Yes | 2 | 20 |
| Proposed system | UAV | No (IMU-based) | 8 | 0.089 |
Lap time comparison between the joystick-based system and the proposed system for 10 participants following the given scenario.
| Participant | Joystick-Based Control (mm:ss) | Proposed Method (mm:ss) |
|---|---|---|
| Participant 1 | 02:34 | 02:29 |
| Participant 2 | 03:01 | 02:34 |
| Participant 3 | 02:11 | 01:59 |
| Participant 4 | 03:04 | 02:13 |
| Participant 5 | 02:14 | 01:31 |
| Participant 6 | 02:07 | 02:19 |
| Participant 7 | 02:46 | 02:38 |
| Participant 8 | 02:54 | 02:38 |
| Participant 9 | 02:25 | 02:02 |
| Participant 10 | 02:34 | 02:16 |
| Average | 02:35 | 02:16 |
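The averages in the last row can be reproduced from the mm:ss entries, and the same arithmetic recovers the roughly 19 s gap cited in the abstract; a small self-contained sketch:

```python
# Reproduce the averages in the table above and the ~19 s gap from mm:ss times.
def to_seconds(mmss: str) -> int:
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

joystick = ["02:34", "03:01", "02:11", "03:04", "02:14",
            "02:07", "02:46", "02:54", "02:25", "02:34"]
proposed = ["02:29", "02:34", "01:59", "02:13", "01:31",
            "02:19", "02:38", "02:38", "02:02", "02:16"]

avg_joystick = sum(map(to_seconds, joystick)) / len(joystick)  # 155.0 s = 02:35
avg_proposed = sum(map(to_seconds, proposed)) / len(proposed)  # 135.9 s ~ 02:16
print(f"gap: {avg_joystick - avg_proposed:.1f} s")             # ~19.1 s
```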