Seunghyun Oh, Chanhee Bae, Jaechan Cho, Seongjoo Lee, Yunho Jung.
Abstract
In-vehicle infotainment systems are increasingly common and offer a growing number of functions. However, a driver whose attention is diverted to operating these systems can cause a fatal accident, so human-vehicle interaction is becoming more important. In this paper, we propose a human-vehicle interaction system that reduces driver distraction during driving. The system uses a voice sensor and a continuous-wave radar sensor, both of which have low complexity and therefore suit vehicles as resource-constrained platforms. It applies sensor fusion to overcome the limits of single-sensor monitoring. For command classification, we use a binarized convolutional neural network, which significantly reduces the computational workload compared with a conventional convolutional neural network. In a performance evaluation in noisy and cluttered environments, the proposed system achieved a recognition accuracy of 96.4%, an improvement of 7.6% over a system based on a single voice sensor and 9.0% over a system based on a single radar sensor.
Keywords: binarized convolutional neural network; gesture recognition; human vehicle interaction; sensor fusion; voice recognition
Year: 2021 PMID: 34198830 PMCID: PMC8201086 DOI: 10.3390/s21113906
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Overview of the proposed system.
Voice sensor parameters.
| Parameter | Value |
|---|---|
| Frequency response | 45 Hz–20 kHz |
| Polar pattern | Omnidirectional |
| Signal-to-noise ratio | 65 dB |
| Maximum sound pressure level | 124 dB |
Radar sensor parameters.
| Parameter | Value |
|---|---|
| Center frequency | 24 GHz |
| Output power | 6 dBm |
| Antenna gain | 10 dBi |
| Maximum distance | 15 m |
| Horizontal field of view | 29° |
| Vertical field of view | 80° |
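
The 24 GHz center frequency in the table determines how strongly hand motion shows up in the radar signal. As a rough illustration, the standard Doppler relation for a continuous-wave radar, f_d = 2 v f_c / c, can be evaluated as below. The hand speeds are hypothetical values chosen for illustration and do not come from the paper.

```python
# Doppler shift seen by a continuous-wave radar: f_d = 2 * v * f_c / c.
SPEED_OF_LIGHT = 299_792_458.0  # m/s
CENTER_FREQUENCY_HZ = 24e9      # from the radar sensor parameter table


def doppler_shift_hz(radial_velocity_mps: float,
                     center_frequency_hz: float = CENTER_FREQUENCY_HZ) -> float:
    """Return the Doppler shift for a target moving toward (+) or away from (-) the radar."""
    return 2.0 * radial_velocity_mps * center_frequency_hz / SPEED_OF_LIGHT


# Hypothetical hand speeds (assumed for illustration, not taken from the paper).
for speed in (0.5, 1.0, 2.0):
    print(f"{speed:.1f} m/s -> {doppler_shift_hz(speed):.1f} Hz")
```

At 24 GHz this works out to roughly 160 Hz of Doppler shift per 1 m/s of radial velocity, so push and pull gestures produce shifts of opposite sign.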
Figure 2. Voice signal processing flow.
Figure 3. Voice spectrogram: (a) right; (b) left; (c) yes; (d) no; (e) stop; (f) pull; (g) once; (h) twice; (i) unknown.
Figure 4. Hand gesture examples: (a) Hand swipe from right side to left side (symbol for right); (b) Hand swipe from right side to left side (symbol for left); (c) Hand swipe shape O (symbol for yes); (d) Hand swipe shape X (symbol for no); (e) Hand push from body to radar (symbol for stop); (f) Hand pull from radar to body (symbol for pull); (g) Hand push and pull performed once (symbol for once); (h) Hand push and pull performed twice (symbol for twice).
Figure 5. Gesture spectrogram: (a) right; (b) left; (c) yes; (d) no; (e) stop; (f) pull; (g) once; (h) twice; (i) unknown.
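
The voice and gesture spectrograms are time-frequency images of the sensor signals. The following is a minimal sketch of how such a magnitude spectrogram can be computed with a short-time Fourier transform. The frame length, hop size, Hann window, sample rate, and test tone are all assumptions for illustration; the paper's actual processing settings are not given in this record.

```python
import numpy as np


def magnitude_spectrogram(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Return |STFT| with shape (frame_len // 2 + 1, n_frames) using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft over each frame keeps only the non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic 1 kHz tone sampled at 16 kHz (assumed values, for illustration only).
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1_000 * t)

spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
print(spec.shape, peak_bin * sample_rate / 256)
```

Each column of `spec` is the spectrum of one short frame, so a spoken command or a hand gesture appears as a pattern that changes along the time axis.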
Figure 6. Experiment setup for voice and hand gesture recognition in vehicle.
Accuracy according to network architecture.
| Convolution Layers | 1 Fully Connected Layer | 2 Fully Connected Layers | 3 Fully Connected Layers | 4 Fully Connected Layers |
|---|---|---|---|---|
| 1 | 84.5 ± 4.5% | 89.5 ± 4.5% | 91 ± 3% | 90.5 ± 3.5% |
| 2 | 91 ± 3% | 91 ± 3% | 92.5 ± 2.5% | 95 ± 1% |
| 3 | 92 ± 2% | 92.5 ± 2.5% | 95 ± 1.5% | 94 ± 1% |
| 4 | 91 ± 3% | 92.5 ± 2.5% | 93 ± 3% | 94 ± 1% |
Number of parameters and inference computation times.
| | 2CLs + 4FCLs | 3CLs + 3FCLs |
|---|---|---|
| Number of parameters | 140,256 | 56,320 |
| Computation time | 0.581 ms | 0.622 ms |
Figure 7. Architecture of binarized convolutional neural network.
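
The abstract credits the binarized convolutional neural network with a much lower computational workload. A common way binarized networks achieve this is to constrain weights and activations to +1 or -1, so that a dot product reduces to an XOR and a bit count. The sketch below illustrates that general idea only; it is not the authors' implementation, and the function names and sample values are hypothetical.

```python
from typing import List


def binarize(values: List[float]) -> List[int]:
    """Map real values to +1 / -1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs: List[int]) -> int:
    """Pack a +1/-1 vector into an integer, one bit per element (+1 -> 1, -1 -> 0)."""
    word = 0
    for s in signs:
        word = (word << 1) | (1 if s > 0 else 0)
    return word


def binary_dot(a_bits: int, b_bits: int, length: int) -> int:
    """Dot product of two packed +1/-1 vectors: length - 2 * popcount(a XOR b)."""
    mismatches = bin(a_bits ^ b_bits).count("1")
    return length - 2 * mismatches


# Hypothetical activations and weights, chosen only to exercise the functions.
activations = binarize([0.7, -1.2, 0.1, -0.4, 2.0, -0.3, 0.0, 1.5])
weights = binarize([0.2, 0.9, -0.5, -0.8, 1.1, 0.4, -0.6, 0.3])

reference = sum(a * w for a, w in zip(activations, weights))
fast = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
print(reference, fast)
```

Both results agree, but the packed version replaces the multiply-accumulate operations with one XOR and one bit count per word, which is where the savings on resource-constrained hardware come from.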
Figure 8. Learning curve of the 2-channel binarized convolutional neural network.
Figure 9. Confusion matrix of HVI system: (a) voice only; (b) radar only; (c) voice and radar fusion.
Classification performance of each fused command.
| | Right | Left | Yes | No | Stop | Pull | Once | Twice | Unknown |
|---|---|---|---|---|---|---|---|---|---|
| Precision | 0.96 | 0.94 | 0.99 | 0.98 | 0.98 | 0.96 | 0.99 | 0.96 | 0.92 |
| Recall | 0.94 | 0.95 | 0.99 | 0.95 | 0.95 | 0.97 | 0.94 | 0.99 | 0.98 |
| F1 score | 0.95 | 0.94 | 0.99 | 0.96 | 0.97 | 0.96 | 0.97 | 0.98 | 0.95 |
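
The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the precision and recall rows of the table. Because those rows are already rounded to two decimals, the recomputed values can differ from the reported F1 row in the last digit.

```python
# Precision, recall, and reported F1 per fused command, copied from the table.
precision = {"Right": 0.96, "Left": 0.94, "Yes": 0.99, "No": 0.98, "Stop": 0.98,
             "Pull": 0.96, "Once": 0.99, "Twice": 0.96, "Unknown": 0.92}
recall = {"Right": 0.94, "Left": 0.95, "Yes": 0.99, "No": 0.95, "Stop": 0.95,
          "Pull": 0.97, "Once": 0.94, "Twice": 0.99, "Unknown": 0.98}
reported_f1 = {"Right": 0.95, "Left": 0.94, "Yes": 0.99, "No": 0.96, "Stop": 0.97,
               "Pull": 0.96, "Once": 0.97, "Twice": 0.98, "Unknown": 0.95}


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2.0 * p * r / (p + r)


recomputed = {cmd: f1_score(precision[cmd], recall[cmd]) for cmd in precision}
for cmd, value in recomputed.items():
    print(f"{cmd:8s} recomputed={value:.3f} reported={reported_f1[cmd]:.2f}")
```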
Figure 10. Accuracy of the HVI systems for each scenario.
Accuracies of HVI systems for each fold.
| Validation Sets | Voice Only | Gesture Only | Fusion |
|---|---|---|---|
| Driver 1 | 87.6% | 84.4% | 95.2% |
| Driver 2 | 88.3% | 89.7% | 97.2% |
| Driver 3 | 85.2% | 83.5% | 94.0% |
| Driver 4 | 86.8% | 78.8% | 90.9% |
| Driver 5 | 84.9% | 88.6% | 96.8% |
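
To summarize the per-fold results, the mean accuracy and the range across the five validation drivers can be computed directly from the table. This is a reader-side summary of the listed values, not a figure reported in the paper.

```python
from statistics import mean

# Per-fold accuracies (%) for Drivers 1-5, copied from the table.
fold_accuracy = {
    "Voice only": [87.6, 88.3, 85.2, 86.8, 84.9],
    "Gesture only": [84.4, 89.7, 83.5, 78.8, 88.6],
    "Fusion": [95.2, 97.2, 94.0, 90.9, 96.8],
}

mean_accuracy = {name: mean(values) for name, values in fold_accuracy.items()}
accuracy_range = {name: max(values) - min(values) for name, values in fold_accuracy.items()}

for name in fold_accuracy:
    print(f"{name:12s} mean={mean_accuracy[name]:.2f}% range={accuracy_range[name]:.1f}")
```

Across these five folds the means are about 86.6% for voice only, 85.0% for gesture only, and 94.8% for fusion, and fusion is the most accurate system for every validation driver.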