Wenjing Yang, Liejun Wang, Shuli Cheng.
Abstract
Deep hashing methods are widely applied in image retrieval because of their low storage consumption and fast retrieval speed. However, existing deep hashing methods extract insufficient semantic features when using a convolutional neural network (CNN) to encode images. Some studies propose adding channel-based or spatial-based attention modules, but embedding these modules into the network increases model complexity and leads to overfitting during training. In this study, a novel deep parameter-free attention hashing (DPFAH) method is proposed to solve these problems by designing a parameter-free attention (PFA) module in the ResNet18 network. PFA is a lightweight module that defines an energy function to measure the importance of each neuron and infers 3-D attention weights for the feature maps in a layer. A fast closed-form solution to this energy function guarantees that the PFA module adds no parameters to the network. In addition, this paper designs a novel hashing framework that includes a hash-code learning branch and a classification branch to exploit more label information. The like-binary codes are constrained by a regularization term to reduce the quantization error introduced by continuous relaxation. Experiments on CIFAR-10, NUS-WIDE and Imagenet-100 show that the DPFAH method achieves better performance.
Year: 2022 PMID: 35490175 PMCID: PMC9056524 DOI: 10.1038/s41598-022-11217-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
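The PFA module described in the abstract closely resembles the SimAM family of parameter-free attention: each neuron is assigned an energy, and lower energy marks a more important neuron. As a hedged reconstruction (the paper's exact energy function may differ), the SimAM-style energy admits the fast closed-form minimum mentioned above:

$$
e_t^{*} = \frac{4\left(\hat{\sigma}^{2}+\lambda\right)}{\left(t-\hat{\mu}\right)^{2}+2\hat{\sigma}^{2}+2\lambda},
$$

where $t$ is the target neuron's activation, $\hat{\mu}$ and $\hat{\sigma}^{2}$ are the mean and variance over the remaining neurons in the same channel, and $\lambda$ is a small constant. Because $\hat{\mu}$ and $\hat{\sigma}^{2}$ are computed from the features themselves, the 3-D attention weight $1/e_t^{*}$ (passed through a sigmoid) introduces no learnable parameters.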
Figure 1. The overall framework of the deep parameter-free attention hashing model, which is composed of three parts: (1) pairs of images are fed into the convolution and max-pooling layers to obtain feature maps; (2) the feature maps are fed into the PFA module, and the result is combined with the original feature maps by element-wise summation; (3) a hashing layer generates the hash codes, and three loss functions are used to optimize the network. (Created with Microsoft Office Visio 2013, https://www.microsoft.com/zh-cn/microsoft-365/previous-versions/microsoft-visio-2013).
Figure 2. Detail of the PFA module. The mean and channel variance of the input feature maps are computed to judge the importance of each channel and spatial position, generating 3-D weights. The 3-D weights are then passed through a sigmoid activation function and multiplied element-wise by the input to obtain the output feature maps. (Created with Microsoft Office Visio 2013, https://www.microsoft.com/zh-cn/microsoft-365/previous-versions/microsoft-visio-2013).
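A minimal PyTorch sketch of a PFA-style module consistent with this caption (channel mean and variance, then 3-D weights, sigmoid, and element-wise multiplication); the class name `PFA` and the constant `lam` are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class PFA(nn.Module):
    """Parameter-free attention sketch (SimAM-style): infers 3-D weights
    from the per-channel mean and variance of the input feature map."""

    def __init__(self, lam: float = 1e-4):
        super().__init__()
        self.lam = lam  # small positive constant in the energy function (assumed value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        _, _, h, w = x.shape
        n = h * w - 1                                       # "other" neurons per channel
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # (t - mu)^2 at each position
        v = d.sum(dim=(2, 3), keepdim=True) / n             # channel variance estimate
        e_inv = d / (4 * (v + self.lam)) + 0.5              # inverse energy: lower energy -> larger weight
        return x * torch.sigmoid(e_inv)                     # apply 3-D weights to the input
```

Every quantity is derived from the input statistics, which is what makes the module parameter-free.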
Figure 3. Visualization of feature activations. (Created with Microsoft Office Visio 2013, https://www.microsoft.com/zh-cn/microsoft-365/previous-versions/microsoft-visio-2013).
Configuration of the ResNet18 network (the per-layer values below are the standard ResNet18 settings, with the original fully connected layer replaced by the hashing layer).

| Layer | Configuration |
|---|---|
| Conv1 | 7 × 7, 64, stride 2 |
| Maxpool | 3 × 3 max pool, stride 2 |
| Layer1 | [3 × 3, 64; 3 × 3, 64] × 2 residual blocks |
| Layer2 | [3 × 3, 128; 3 × 3, 128] × 2 residual blocks |
| Layer3 | [3 × 3, 256; 3 × 3, 256] × 2 residual blocks |
| Layer4 | [3 × 3, 512; 3 × 3, 512] × 2 residual blocks |
| Avgpool | global average pool |
| Hashing Layer | fully connected, output dimension = hash code length |
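To make the table concrete, here is a hedged sketch of how a DPFAH-style backbone could be assembled: torchvision's ResNet18 trunk with the `PFA` module sketched above applied after each residual stage, the fully connected layer replaced by a tanh hashing layer, and a classification branch on top of the like-binary codes. The PFA insertion points and the branch wiring are assumptions, not the paper's verified design:

```python
import torch.nn as nn
from torchvision.models import resnet18

class DPFAHNet(nn.Module):
    """Sketch: ResNet18 trunk + PFA modules + hashing layer + classification branch."""

    def __init__(self, hash_bits: int = 48, num_classes: int = 10):
        super().__init__()
        net = resnet18(weights=None)
        self.features = nn.Sequential(              # Conv1 ... Avgpool rows of the table
            net.conv1, net.bn1, net.relu, net.maxpool,
            net.layer1, PFA(), net.layer2, PFA(),
            net.layer3, PFA(), net.layer4, PFA(),
            net.avgpool, nn.Flatten(),
        )
        self.hash_layer = nn.Sequential(            # hashing layer with tanh relaxation
            nn.Linear(512, hash_bits), nn.Tanh(),
        )
        self.classifier = nn.Linear(hash_bits, num_classes)  # classification branch

    def forward(self, x):
        f = self.features(x)
        h = self.hash_layer(f)                      # like-binary codes in (-1, 1)
        return h, self.classifier(h)
```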
Environment configuration.
| Item | Configuration |
|---|---|
| OS | Ubuntu 16.04 |
| GPU | Tesla V100 |
mAP for different values of the first loss hyperparameter (CIFAR-10: mAP@ALL; NUS-WIDE: mAP@5000).

| Value | CIFAR-10, 16 bit | 32 bit | 48 bit | 64 bit | NUS-WIDE, 16 bit | 32 bit | 48 bit | 64 bit |
|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.7929 | 0.8161 | 0.8445 | 0.8264 | 0.8246 | 0.8442 | 0.8506 | 0.8538 |
| 0.1 | 0.8382 | 0.8445 | 0.8522 | 0.8549 | 0.8288 | 0.8490 | 0.8541 | 0.8580 |
| 0.5 | 0.8128 | 0.8123 | 0.8293 | 0.8444 | 0.8104 | 0.8407 | 0.8516 | 0.8534 |
| 1.0 | 0.8077 | 0.8285 | 0.8208 | 0.8363 | 0.8013 | 0.8342 | 0.8406 | 0.8436 |
mAP for different values of the second loss hyperparameter (CIFAR-10: mAP@ALL; NUS-WIDE: mAP@5000).

| Value | CIFAR-10, 16 bit | 32 bit | 48 bit | 64 bit | NUS-WIDE, 16 bit | 32 bit | 48 bit | 64 bit |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.8134 | 0.8331 | 0.8345 | 0.8338 | 0.8238 | 0.8456 | 0.8517 | 0.8550 |
| 5 | 0.8237 | 0.8168 | 0.8239 | 0.8395 | 0.8232 | 0.8475 | 0.8547 | 0.8593 |
| 10 | 0.8382 | 0.8445 | 0.8522 | 0.8549 | 0.8288 | 0.8490 | 0.8541 | 0.8580 |
| 15 | 0.8153 | 0.8173 | 0.8377 | 0.8391 | 0.8160 | 0.8430 | 0.8512 | 0.8541 |
Figure 4. (a,b) mAP under different values of the first loss hyperparameter. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 5. (a,b) mAP under different values of the second loss hyperparameter. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
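The two swept quantities behave like loss weights, with the best results at 0.1 (first table) and 10 (second table). As an illustrative sketch only (the symbols were lost in extraction, and the term forms are assumptions), a typical deep-hashing objective combining a pairwise likelihood term, the quantization regularizer on the like-binary codes mentioned in the abstract, and the classification branch might look like the following, with the defaults set to the best-performing swept values:

```python
import torch
import torch.nn.functional as F

def dpfah_loss(h, logits, labels, sim, alpha=0.1, beta=10.0):
    """Illustrative combined objective; term forms and weight names are assumptions.
    h:      (B, K) like-binary codes in (-1, 1) from the hashing layer
    logits: (B, C) outputs of the classification branch
    labels: (B,)   ground-truth class indices
    sim:    (B, B) pairwise similarity (1 if two images share a label, else 0)
    """
    theta = h @ h.t() / 2                                  # scaled code inner products
    pair = (F.softplus(theta) - sim * theta).mean()        # pairwise likelihood loss
    quant = (h - h.sign()).pow(2).mean()                   # quantization regularizer on like-binary codes
    cls = F.cross_entropy(logits, labels)                  # classification-branch loss
    return pair + alpha * quant + beta * cls
```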
Ablation experiments (the unlabeled module row is inferred from the abstract to be the classification branch).

| Modules | DBDH | DPFAH-1 | DPFAH-2 | DPFAH-3 |
|---|---|---|---|---|
| AlexNet | √ | | | |
| ResNet18 | | √ | √ | √ |
| PFA module | | | √ | √ |
| Classification branch | | | | √ |
| mAP (48 bit) | 0.7839 | 0.8129 | 0.8424 | 0.8522 |
Figure 6. (a,b) The PR curves and P@N curves, respectively. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 7. (a,b) The t-SNE visualization of hash codes on CIFAR-10. (Created with Python 3.6, https://www.python.org/downloads/release/python-3614).
mAP at different code lengths on three datasets (CIFAR-10: mAP@ALL; NUS-WIDE: mAP@5000; Imagenet-100: mAP@1000).

| Method | CIFAR-10, 16 bit | 32 bit | 48 bit | 64 bit | NUS-WIDE, 16 bit | 32 bit | 48 bit | 64 bit | Imagenet-100, 16 bit | 32 bit | 48 bit | 64 bit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DPFAH | 0.8382 | 0.8445 | 0.8522 | 0.8549 | 0.8298 | 0.8490 | 0.8541 | 0.8580 | 0.6420 | 0.7009 | 0.7212 | 0.7795 |
| DBDH | 0.8021 | 0.8113 | 0.8129 | 0.8209 | 0.8084 | 0.8345 | 0.8393 | 0.8492 | 0.3358 | 0.3215 | 0.5626 | 0.6321 |
| DSDH | 0.7761 | 0.7881 | 0.8086 | 0.8183 | 0.8085 | 0.8373 | 0.8265 | 0.8441 | 0.1612 | 0.3011 | 0.3638 | 0.4268 |
| DHN | 0.7695 | 0.7871 | 0.7869 | 0.7966 | 0.8108 | 0.8069 | 0.7854 | 0.7910 | 0.4900 | 0.4808 | 0.4747 | 0.5664 |
| LCDSH | 0.7383 | 0.7661 | 0.8083 | 0.8202 | 0.8071 | 0.8304 | 0.8425 | 0.8436 | 0.2269 | 0.3177 | 0.4517 | 0.4671 |
| HashNet | 0.6975 | 0.7892 | 0.7878 | 0.7949 | 0.7453 | 0.8004 | 0.8268 | 0.8297 | 0.3017 | 0.4690 | 0.5400 | 0.5719 |
| IDHN | 0.6641 | 0.7296 | 0.7762 | 0.7681 | 0.7820 | 0.7795 | 0.7601 | 0.7366 | 0.2721 | 0.3255 | 0.4477 | 0.5539 |
| DFH | 0.5947 | 0.6347 | 0.7298 | 0.7662 | 0.7893 | 0.8185 | 0.8350 | 0.8372 | 0.1727 | 0.3435 | 0.3445 | 0.3430 |
| DSH | 0.5095 | 0.4663 | 0.4702 | 0.4714 | 0.6680 | 0.7383 | 0.7563 | 0.7940 | 0.3109 | 0.3848 | 0.4294 | 0.4403 |
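For reference, the mAP@N figures above follow the standard Hamming-ranking protocol: rank the database by Hamming distance to each query, truncate at N, and average the precision at each relevant rank. A compact NumPy sketch of that computation (the function name is illustrative):

```python
import numpy as np

def map_at_n(query_codes, db_codes, query_labels, db_labels, n=None):
    """Mean average precision under Hamming ranking.
    Codes are {-1, +1} arrays of shape (num, K); labels are multi-hot arrays.
    n=None gives mAP@ALL (CIFAR-10); n=5000 gives mAP@5000 (NUS-WIDE)."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        dist = 0.5 * (db_codes.shape[1] - db_codes @ q)     # Hamming distance to every database code
        order = np.argsort(dist)[:n]                        # rank the database, keep top n
        rel = (db_labels[order] @ ql) > 0                   # relevant if >= 1 shared label
        if rel.sum() == 0:
            continue                                        # skip queries with no relevant hits
        prec = np.cumsum(rel) / np.arange(1, len(rel) + 1)  # precision at each rank
        aps.append(float((prec * rel).sum() / rel.sum()))   # average precision for this query
    return float(np.mean(aps))
```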
Figure 8. (a–d) PR curves on CIFAR-10 for all methods at different code lengths. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 9. (a–d) PR curves on NUS-WIDE for all methods at different code lengths. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 10. (a–d) PR curves on Imagenet-100 for all methods at different code lengths. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 11. (a–c) Precision within Hamming radius 2 (P@H=2) on the three datasets. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 12. (a–d) P@N curves on CIFAR-10 for all methods at different code lengths. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 13. (a–d) P@N curves on NUS-WIDE for all methods at different code lengths. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 14. (a–d) P@N curves on Imagenet-100 for all methods at different code lengths. (Created with MATLAB R2019a, https://ww2.mathworks.cn/products/matlab.html).
Figure 15. Top 10 retrieved results from the Imagenet-100 dataset by DPFAH with 64-bit hash codes. (Created with Microsoft Office Visio 2013, https://www.microsoft.com/zh-cn/microsoft-365/previous-versions/microsoft-visio-2013).