Chuandong Wang, Chi Zhang, Yujian Feng, Yimu Ji, Jianyu Ding.
Abstract
Visible-thermal person re-identification (VT Re-ID) is the task of matching pedestrian images collected by thermal and visible-light cameras. The two main challenges in VT Re-ID are the intra-class variation between pedestrian images and the cross-modality difference between visible and thermal images. Existing works have principally focused on local representation through cross-modality feature distribution, but ignore the internal connections among the local features of pedestrian body parts. Therefore, this paper proposes a dual-path attention network model to establish the spatial dependencies between the local features of the pedestrian feature map and to effectively enhance feature extraction. Meanwhile, we propose a cross-modality dual-constraint loss, which adds center and boundary constraints to each class distribution in the embedding space to promote compactness within classes and enhance separability between classes. Our experimental results show that our proposed approach has advantages over state-of-the-art methods on the two public datasets SYSU-MM01 and RegDB: Rank-1/mAP reaches 57.74%/54.35% on SYSU-MM01 and 76.07%/69.43% on RegDB.
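The dual-constraint idea described above (a center pull within each class plus a boundary margin between classes) can be sketched as follows. This is a minimal illustrative formulation, not the paper's implementation: the function name, the squared-hinge form, and the margin value are assumptions.

```python
import numpy as np

def cmdc_loss(features, labels, margin=0.5):
    """Illustrative sketch of a cross-modality dual-constraint loss.

    Center constraint: pull every embedding (visible or thermal)
    toward its class center. Boundary constraint: push the centers
    of different classes at least `margin` apart.
    The exact formulation here is an assumption, not the paper's code.
    """
    classes = np.unique(labels)
    centers = {c: features[labels == c].mean(axis=0) for c in classes}

    # Center constraint: mean squared distance to the own-class center.
    center_term = np.mean([
        np.sum((f - centers[c]) ** 2)
        for f, c in zip(features, labels)
    ])

    # Boundary constraint: squared hinge on the distance between class centers.
    boundary_term = 0.0
    pairs = 0
    for i, ci in enumerate(classes):
        for cj in classes[i + 1:]:
            d = np.linalg.norm(centers[ci] - centers[cj])
            boundary_term += max(0.0, margin - d) ** 2
            pairs += 1
    if pairs:
        boundary_term /= pairs

    return center_term + boundary_term
```

With tight, well-separated classes both terms vanish; overlapping class centers activate the boundary hinge, which is the intended push toward inter-class separability.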
Keywords: VT Re-ID; cross-modality dual-constraint loss; dual-path attention; local feature
Year: 2022 PMID: 35455106 PMCID: PMC9030151 DOI: 10.3390/e24040443
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
Figure 1. The pipeline of our proposed method for cross-modality person Re-ID contains two components: a dual-path attention network and a cross-modality dual-constraint loss. The network consists of two stages. In the first stage, modality-specific feature extraction learns features specific to the visible and thermal modalities. In the second stage, modality-shared feature extraction learns features common to the two modalities. The network is trained under two constraints: (1) the cross-modality dual-constraint (CMDC) loss; (2) the identity (ID) loss.
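The two-stage layout in Figure 1 (per-modality branches followed by shared layers that map both modalities into one embedding space) can be sketched as below. The layer sizes, plain matrix layers, and ReLU activations are placeholders assumed for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: modality-specific parameters, one branch per modality.
W_visible = rng.standard_normal((16, 8))
W_thermal = rng.standard_normal((16, 8))
# Stage 2: modality-shared parameters applied to both branches.
W_shared = rng.standard_normal((8, 4))

def embed(x, modality):
    """Two-stage forward pass: modality-specific branch, then shared layers."""
    W_specific = W_visible if modality == "visible" else W_thermal
    h = np.maximum(x @ W_specific, 0.0)   # modality-specific features
    return np.maximum(h @ W_shared, 0.0)  # modality-shared features

v = embed(rng.standard_normal(16), "visible")
t = embed(rng.standard_normal(16), "thermal")
assert v.shape == t.shape == (4,)  # both modalities land in one embedding space
```

The design point the sketch captures is that only stage 1 differs between modalities; stage 2 shares weights, so visible and thermal embeddings become directly comparable for matching.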
Figure 2. The details of the cross-modality attention module.
Figure 3. Example pictures from SYSU-MM01 and RegDB. The pictures in the first group are from SYSU-MM01: the first row shows images captured by a visible camera and the second row shows images captured by a thermal camera. The same holds for RegDB. Each column shows pictures of the same person.
Comparison with the state-of-the-art methods on the SYSU-MM01 dataset. Re-identification rates at Rank-r and mAP.
| Method | Source | Rank-1 (All-search) | Rank-10 (All-search) | Rank-20 (All-search) | mAP (All-search) | Rank-1 (Indoor-search) | Rank-10 (Indoor-search) | Rank-20 (Indoor-search) | mAP (Indoor-search) |
|---|---|---|---|---|---|---|---|---|---|
| HCML | AAAI18 | 14.32 | 53.16 | 69.17 | 16.16 | 24.52 | 73.25 | 86.73 | 30.08 |
| BDTR | IJCAI18 | 17.00 | 55.40 | 69.20 | 16.20 | - | - | - | - |
| DRL | CVPR19 | 28.90 | 70.60 | 82.40 | 29.20 | 31.60 | 77.20 | 89.20 | 44.20 |
| MAC | MM19 | 33.26 | 79.04 | 90.09 | 36.22 | 36.43 | 63.36 | 71.63 | 37.03 |
| AlignGAN | ICCV19 | 42.40 | 85.00 | 93.70 | 40.70 | 45.90 | 87.60 | 94.40 | 54.30 |
| CMSP | IJCV20 | 43.56 | 86.25 | - | 44.98 | 48.62 | 89.50 | - | 57.50 |
| AGW | Arxiv20 | 47.50 | - | - | 47.65 | 54.17 | - | - | 62.97 |
| DFE | MM19 | 48.71 | 88.86 | 95.27 | 48.59 | 52.25 | 89.86 | 95.85 | 59.68 |
| XIV | AAAI20 | 49.92 | 89.79 | 95.96 | 50.73 | - | - | - | - |
| DDAG | ECCV20 | 54.75 | 90.39 | 95.81 | 53.02 | 61.02 | 94.06 | 98.41 | 67.98 |
| HAT | TIFS20 | 55.29 | - | - | 53.89 | 62.10 | 95.75 | 99.20 | 69.37 |
| NFS | CVPR21 | 56.91 | 91.34 | 96.52 | - | - | - | - | - |
| Our | - | 57.74 | 90.53 | 96.27 | 54.35 | 61.56 | 94.86 | 98.34 | 68.13 |
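Rank-r and mAP in the tables above are standard retrieval metrics: gallery images are ranked by distance to the query, Rank-r asks whether a correct identity appears in the top r, and mAP averages the per-query average precision. A minimal single-query sketch (standard ranking logic, not taken from the paper's evaluation code):

```python
import numpy as np

def rank_r_and_ap(query_feat, gallery_feats, query_id, gallery_ids, r=1):
    """Single-query Rank-r hit and average precision (AP).

    Gallery images are ranked by Euclidean distance to the query.
    Rank-r: does a correct identity appear in the top r results?
    AP: mean of precision at each correct hit; mAP is the mean of
    AP over all queries.
    """
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)
    matches = gallery_ids[order] == query_id

    rank_r_hit = bool(matches[:r].any())

    hits = np.flatnonzero(matches)
    precisions = [(i + 1) / (pos + 1) for i, pos in enumerate(hits)]
    ap = float(np.mean(precisions)) if precisions else 0.0
    return rank_r_hit, ap
```

Averaging the returned AP over every query in the test protocol yields the mAP figures reported in the tables.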
Comparison with the state-of-the-art methods on the RegDB dataset in the visible → thermal setting.
| Method | Source | Rank-1 | Rank-10 | Rank-20 | mAP |
|---|---|---|---|---|---|
| HCML | AAAI18 | 24.44 | 47.53 | 56.78 | 20.80 |
| BDTR | IJCAI18 | 33.56 | 58.61 | 67.43 | 32.76 |
| DRL | CVPR19 | 43.40 | 66.10 | 76.30 | 44.10 |
| MAC | MM19 | 36.43 | 62.36 | 71.63 | 44.10 |
| AlignGAN | ICCV19 | 57.90 | - | - | 53.60 |
| XIV | AAAI20 | 62.21 | 83.13 | 91.72 | 60.18 |
| CMSP | IJCV20 | 65.07 | 83.71 | - | 64.50 |
| DDAG | ECCV20 | 69.34 | 86.19 | 91.49 | 63.46 |
| AGW | Arxiv20 | 70.05 | - | - | 66.37 |
| DFE | MM19 | 70.13 | 86.32 | 91.96 | 67.56 |
| HAT | TIFS20 | 71.83 | 87.16 | 92.16 | 67.56 |
| Our | - | 76.07 | - | - | 69.43 |
Evaluation of each component on SYSU-MM01.
| Setting | Rank-1 | Rank-10 | mAP |
|---|---|---|---|
| Baseline | 51.35 | 85.80 | 49.48 |
| Baseline + DPAN | 55.10 | 87.89 | 51.73 |
| Baseline + CMDC | 54.69 | 87.19 | 51.12 |
| Baseline + DPAN + CMDC | 57.74 | 90.53 | 54.35 |
Figure 4. Influence of the balance weight on the SYSU-MM01 dataset.
Figure 5. Convergence curves of CMDC on the SYSU-MM01 and RegDB datasets. (a) Training convergence curve of CMDC. (b) Validation convergence curve of CMDC.