DuYeong Heo, Jae Yeal Nam, Byoung Chul Ko.
Abstract
Semi-supervised learning is known to achieve better generalisation than a model learned solely from labelled data. We therefore propose a new method for estimating pedestrian pose orientation using a soft-target method, a type of semi-supervised learning. Because convolutional neural network (CNN)-based pose orientation estimation requires large numbers of parameters and operations, we apply the teacher–student algorithm to generate a compressed student model with high accuracy and compactness resembling those of the teacher model by combining a deep network with a random forest. After the teacher model is generated using hard-target data, the softened outputs (soft-target data) of the teacher model are used for training the student model. Moreover, because the orientation of a pedestrian has specific shape patterns, a wavelet transform is applied to the input image as a pre-processing step, owing to its good spatial-frequency localisation property and its ability to preserve both the spatial and gradient information of an image. As benchmark datasets reflecting real driving situations based on a single camera, we used the TUD and KITTI datasets. We applied the proposed algorithm to various driving images in these datasets, and the results indicate that its classification performance with regard to pose orientation is better than that of other state-of-the-art CNN-based methods. In addition, the computational speed of the proposed student model is faster than that of other deep CNNs owing to its shallower structure and smaller number of parameters.
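The soft-target training described in the abstract follows the general knowledge-distillation recipe: the teacher's logits are softened with a temperature before being used as training targets for the student. The sketch below is a minimal, hypothetical illustration of that recipe, not the authors' implementation; the temperature, weighting factor `alpha`, and loss form are assumptions.

```python
import math

def softened_probs(logits, temperature=4.0):
    """Temperature-softened softmax: a higher temperature flattens the
    distribution, exposing the teacher's relative confidence across the
    non-target orientation classes (the 'soft target')."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                         # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=4.0, alpha=0.5):
    """Soft-target objective: weighted sum of cross-entropy with the hard
    label and cross-entropy with the teacher's temperature-softened output."""
    p_s_soft = softened_probs(student_logits, temperature)
    p_t_soft = softened_probs(teacher_logits, temperature)
    soft_ce = -sum(t * math.log(s + 1e-12)
                   for t, s in zip(p_t_soft, p_s_soft))
    p_s = softened_probs(student_logits, 1.0)
    hard_ce = -math.log(p_s[hard_label] + 1e-12)
    # T^2 rescaling keeps soft-target gradients comparable across temperatures
    return alpha * hard_ce + (1.0 - alpha) * temperature ** 2 * soft_ce
```

In a full pipeline, `distillation_loss` would be minimised over the student network's parameters while the teacher's logits stay fixed.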
Keywords: model compression; pedestrian orientation; soft-target training; teacher–student algorithm; wavelet transform
Year: 2019 PMID: 30845772 PMCID: PMC6427411 DOI: 10.3390/s19051147
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1 Teacher–student learning framework using hard- and soft-target data (in order of the learning process): (a) dataset A, labelled with hard targets, is input into (b) the teacher deep network and used to train it; (c) the teacher random forest (RF) is trained using feature maps from the teacher deep network; (d) after training of the two-teacher model is complete, the unlabelled training dataset B is input into the trained two-teacher model; (e) the soft outputs of the two teachers are combined into one soft-target vector; (f) the soft-target dataset from the teacher model is used for training the student network; (g) the student model, composed of a student network and a student RF, is trained; and (h) the final class probability is generated by combining the output probabilities of the two-part student model.
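Step (e) of the caption fuses the deep network's and the RF's class-probability outputs into a single soft-target vector. The exact fusion rule is not given here; a simple convex combination, renormalised for safety, is one plausible sketch (the `weight` parameter is an assumption for illustration):

```python
def fuse_soft_targets(p_network, p_forest, weight=0.5):
    """Combine the two teachers' class-probability vectors (deep network
    and random forest) into one soft-target vector via a convex
    combination, then renormalise to guard against rounding drift."""
    fused = [weight * a + (1.0 - weight) * b
             for a, b in zip(p_network, p_forest)]
    total = sum(fused)
    return [f / total for f in fused]
```

The same combination idea would apply at step (h), where the student network's and student RF's probabilities are merged into the final class prediction.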
Figure 2 Eight orientations recognized by the proposed system and corresponding examples of pedestrian poses.
Figure 3 Confusion matrix based on the accuracy (ACC) of the proposed method (%).
Performance comparison results for eight methods (the results for the first three methods are taken from the experimental evaluations in [19]).
| Methods | AP (%) | AR (%) | AFPR (%) |
|---|---|---|---|
| MoAWG [ | 67.4 | 65.4 | 4.9 |
| PLS-RF [ | 66.3 | 62.3 | 5.0 |
| MACF [ | 41.1 | 40.0 | 8.5 |
| VGG-16 [ | 67.6 | 68.6 | 4.3 |
| ResNet101 [ | 73.5 | 74.6 | 3.9 |
| Proposed T-Model without handcraft filters | 75.4 | 76.8 | 3.1 |
| Proposed T-Model | 85.6 | 84.6 | 2.0 |
| Proposed S-Model | | | |
Figure 4 Five possible pairs of experimental results for determining the number of trees in the student RF.
Comparison of the accuracy, number of parameters, and number of operations for the proposed method and two state-of-the-art compression models using the TUD dataset.
| Methods | Accuracy (%) | No. of Parameters (M) | No. of Operations (M) |
|---|---|---|---|
| MobileNet V2 [ | 60.73 | 2.2 | 430 |
| SqueezeNet V1.1 [ | 53.06 | 0.72 | 283 |
| Teacher model | 85.08 | 47.2 | 7564 |
| Proposed student model | | | |
Comparison of the average orientation similarity (AOS) for the proposed method and deep neural network-based approaches using the KITTI dataset (the results for the four comparison methods are taken from the experimental evaluations in [44]).
| Methods | AOS Easy (%) | AOS Moderate (%) | AOS Hard (%) |
|---|---|---|---|
| DPM-VOC+VP [ | 53.66 | 39.83 | 35.73 |
| Mono3D [ | 68.58 | 58.12 | 54.94 |
| SubCNN [ | 78.33 | 66.28 | 61.37 |
| FRCNN [ | 66.84 | 52.62 | 48.72 |
| Proposed | | | |
Figure 5 Sample orientation classification results for the TUD and KITTI datasets using the proposed method: (a) estimation of the pedestrian's body orientation using the TUD dataset; (b) results of pose orientation estimation (POE) using KITTI.