Haoyi Zhao1,2, Bo Tao1,3, Licheng Huang4,5, Baojia Chen6.
Abstract
We propose FPN PoseEstimateNet, a deep learning-based vehicle pose estimation method that uses a monocular camera. FPN PoseEstimateNet consists of a feature extractor and a pose calculate network. The feature extractor is based on a Siamese network, and a feature pyramid network (FPN) is adopted to handle features at different scales. Through the feature extractor, a correlation matrix between the input images is obtained for feature matching. With the time interval as the label, the feature extractor can be trained independently of the pose calculate network. From the correlation matrix and the standard matrix, the pose calculate network predicts the vehicle pose changes. Results show that the network runs at 6 FPS with a parameter size of 101.6 M. Across the different sequences, the angle error is within 8.26° and the maximum translation error is within 31.55 m.
Keywords: contrast learning; correlation matrix; feature pyramid network; pose estimation; siamese network
Year: 2022 PMID: 36118568 PMCID: PMC9478513 DOI: 10.3389/fbioe.2022.948726
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
FIGURE 1. Overview of FPN PoseEstimateNet.
FIGURE 2. Overview of feature extraction.
Architecture of feature extractor of the FPN PoseEstimateNet.
| Feature extraction |
|---|
| Input [128,380,3] |
| Conv [7,7,64] ReLU stride 2 BN |
| Conv [5,5,128] ReLU stride 1 BN |
| Conv [5,5,256] ReLU stride 2 BN |
| Conv [3,3,512] ReLU stride 2 BN |
| Conv [3,3,512] ReLU stride 1 BN |
| Conv [3,3,512] ReLU stride 2 BN |
| Conv [3,3,512] ReLU stride 1 BN |
| ZeroPadding [2,1] |
| Conv [3,3,1024] ReLU stride 2 BN |
| ZeroPadding [1,1] |
| Conv [3,3,512] ReLU stride 1 BN |
| ZeroPadding [1,1] |
| Conv [3,3,512] ReLU stride 2 BN |
| ZeroPadding [1,1] |
| Conv [3,3,512] ReLU stride 1 BN |
| ZeroPadding |
| Conv [1,1,256] ReLU stride 1 |
| Concatenate |
| MaxPool [2,2] |
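Read top to bottom, the feature-extraction table can be sketched as a PyTorch stack. This is a hedged reconstruction, not the authors' code: the exact per-layer padding is not given in this excerpt (the explicit ZeroPadding rows are folded into a `padding = k // 2` assumption), the table's "ReLU … BN" ordering is kept as written, and the Siamese weight sharing, FPN branches, and Concatenate step are omitted.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, k, stride):
    """One 'Conv [k,k,c_out] ReLU stride s BN' row from the table.
    padding = k // 2 is an assumption; the paper's exact padding is not given here."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(c_out),
    )

# One branch of the Siamese feature extractor, following the table top to bottom.
feature_extractor = nn.Sequential(
    conv_bn_relu(3, 64, 7, 2),
    conv_bn_relu(64, 128, 5, 1),
    conv_bn_relu(128, 256, 5, 2),
    conv_bn_relu(256, 512, 3, 2),
    conv_bn_relu(512, 512, 3, 1),
    conv_bn_relu(512, 512, 3, 2),
    conv_bn_relu(512, 512, 3, 1),
    conv_bn_relu(512, 1024, 3, 2),
    conv_bn_relu(1024, 512, 3, 1),
    conv_bn_relu(512, 512, 3, 2),
    conv_bn_relu(512, 512, 3, 1),
    nn.Conv2d(512, 256, 1),          # Conv [1,1,256]; the table lists no BN here
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
)

x = torch.randn(1, 3, 128, 380)      # Input [128,380,3], written in NCHW order
with torch.no_grad():
    y = feature_extractor(x)
```

Under these padding assumptions, a 128×380 input is reduced to a 1×3 map with 256 channels.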
FIGURE 3. Feature pyramid network for multi-scale feature fusion.
FIGURE 4. Correlation matrix φ.
FIGURE 5. Standard matrix ζ.
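Figures 4 and 5 show the correlation matrix φ and the standard matrix ζ. The exact construction of φ is not given in this excerpt; a common way to build such a matrix for feature matching is pairwise cosine similarity between the two frames' feature vectors, sketched below as an assumption (the 32×32 size matches the pose calculate network's input).

```python
import torch
import torch.nn.functional as F

def correlation_matrix(f1, f2):
    """Pairwise cosine-similarity matrix between two sets of feature vectors.

    f1, f2: [N, C] tensors (N feature locations, C channels each).
    Entry (i, j) measures how well feature i of frame 1 matches feature j of
    frame 2. This pairwise-similarity form is an assumption; the paper's exact
    definition of φ is not given in this excerpt.
    """
    f1 = F.normalize(f1, dim=1)   # unit-normalize each feature vector
    f2 = F.normalize(f2, dim=1)
    return f1 @ f2.t()

# 32 feature locations per frame -> a 32x32 matrix, the pose network's input size.
phi = correlation_matrix(torch.randn(32, 256), torch.randn(32, 256))
```

With unit-normalized features, every entry of φ lies in [-1, 1], so the matrix can be treated directly as a single-channel image by the pose calculate network.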
FIGURE 6. Contrast loss function.
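The abstract states that the feature extractor is trained with the time interval as the label. A classic margin-based contrastive loss is one way to realize this; the sketch below uses that standard form as an assumption, since the paper's exact loss in Figure 6 is not reproduced in this excerpt, and the `margin` value is hypothetical.

```python
import torch

def contrastive_loss(d, y, margin=1.0):
    """Classic margin-based contrastive loss (assumed form, not the paper's exact one).

    d: distances between paired feature embeddings.
    y: 1 for positive pairs (e.g. frames close in time), 0 for negative pairs.
    Deriving y from the frame time interval follows the abstract; the margin
    value is a hypothetical choice for illustration.
    """
    pos = y * d.pow(2)                                     # pull positive pairs together
    neg = (1 - y) * torch.clamp(margin - d, min=0).pow(2)  # push negatives past the margin
    return (pos + neg).mean()

d = torch.tensor([0.1, 2.0])   # one close pair, one distant pair
y = torch.tensor([1.0, 0.0])   # positive, negative
loss = contrastive_loss(d, y)
```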
FIGURE 7. Pose calculate network. Conv denotes a convolution module.
Pose calculate network of FPN PoseEstimateNet.
| Pose calculate network |
|---|
| Input [32,32,1] |
| Conv [3,3,512] ReLU stride 2 |
| Conv [3,3,256] ReLU stride 2 |
| ChannelAttention1 [1,1,32] ReLU |
| ChannelAttention2 [1,1,256] Sigmoid |
| SpatialAttention1 [3,3,128] ReLU stride 2 |
| SpatialAttention2 [3,3,1] Sigmoid |
| Conv [3,3,128] ReLU |
| Conv [1,1,3] |
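The pose calculate network table reads naturally as a small CNN with CBAM-style channel and spatial attention. The sketch below is a hedged reading, not the authors' implementation: the table lists stride 2 for SpatialAttention1, but an attention map must match the feature size to be multiplied in element-wise, so stride 1 is used here and flagged as an assumption; the final pooling of the Conv [1,1,3] output down to a 3-vector of pose parameters is also assumed.

```python
import torch
import torch.nn as nn

class PoseCalculateNet(nn.Module):
    """Hedged sketch of the pose calculate network (CBAM-style attention assumed)."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # ChannelAttention1/2: squeeze to 32 channels, restore to 256, sigmoid gate.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(256, 32, 1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 256, 1), nn.Sigmoid(),
        )
        # SpatialAttention1/2: stride 1 here so the map matches the feature size
        # (the table lists stride 2; treating that as a typo is an assumption).
        self.spatial_att = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 3, padding=1), nn.Sigmoid(),
        )
        self.head = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 3, 1),              # Conv [1,1,3]: 3 pose parameters
            nn.AdaptiveAvgPool2d(1),           # pooling to a 3-vector is assumed
        )

    def forward(self, x):
        x = self.backbone(x)            # [B,256,8,8] for a 32x32 input
        x = x * self.channel_att(x)     # channel-wise gating
        x = x * self.spatial_att(x)     # spatial gating
        return self.head(x).flatten(1)  # [B,3]

pose = PoseCalculateNet()(torch.randn(1, 1, 32, 32))  # Input [32,32,1] in NCHW
```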
FIGURE 8. Channel attention module.
FIGURE 9. Spatial attention module.
FIGURE 10. Spatial attention in translation and rotation movement.
FIGURE 11. Correlation matrix φ in translation (left) and rotation (right).
FIGURE 12. Trend of distance over the time interval.
FIGURE 13. FPN PoseEstimateNet training and validation curves.
Prediction accuracy on different sequences.
| Sequence | FPN PoseEstimateNet ATE (m) | FPN PoseEstimateNet ARE (°) | FlowNet ATE (m) | FlowNet ARE (°) |
|---|---|---|---|---|
| 00 | 9.41 | 4.87 | 5.43 | 4.62 |
| 03 | 12.07 | 1.38 | 18.05 | 8.82 |
| 05 | 8.96 | 4.26 | 7.92 | 3.54 |
| 07 | 21.55 | 5.92 | 23.61 | 4.11 |
Inference speed and parameters for different networks.
| Metric | FPN PoseEstimateNet | FlowNet |
|---|---|---|
| Inference speed (FPS) | 6 | 2 |
| Parameter size (M) | 101.6 | 581 |
FIGURE 14. Results of estimated poses. (A) Estimated poses of sequence 00. (B) Estimated poses of sequence 03. (C) Estimated poses of sequence 05. (D) Estimated poses of sequence 07.