Shubai Chen, Song Wu, Li Wang.
Abstract
Owing to the efficiency of hashing and the high-level abstraction provided by deep networks, deep hashing has achieved appealing effectiveness and efficiency for large-scale cross-modal retrieval. However, two challenges remain for high-performance cross-modal hashing retrieval: how to efficiently measure the similarity of fine-grained multi-labels for multi-modal data, and how to thoroughly exploit the layer-specific information in the intermediate layers of the networks. In this paper, we therefore propose a novel Hierarchical Semantic Interaction-based Deep Hashing Network (HSIDHN) for large-scale cross-modal retrieval. In the proposed HSIDHN, multi-scale and fusion operations are first applied to each layer of the network. A Bidirectional Bi-linear Interaction (BBI) policy is then designed to achieve hierarchical semantic interaction among different layers, so that the capability of the hash representations is enhanced. Moreover, a dual-similarity measurement ("hard" similarity and "soft" similarity) is designed to calculate the semantic similarity of data from different modalities, aiming to better preserve the semantic correlation of multi-labels. Extensive experimental results on two large-scale public datasets show that the performance of HSIDHN is competitive with state-of-the-art deep cross-modal hashing methods.
Keywords: Bidirectional Bi-linear Interaction; Cross-Modal Hashing; Deep Neural Network; Dual-Similarity Measurement
Year: 2021; PMID: 34141884; PMCID: PMC8176532; DOI: 10.7717/peerj-cs.552
Source DB: PubMed; Journal: PeerJ Comput Sci; ISSN: 2376-5992
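The abstract names a dual-similarity measurement but does not give its formulas. The sketch below shows one common way to read the two terms. It assumes "hard" similarity is 1 when two samples share at least one label, and "soft" similarity is the cosine similarity of their multi-hot label vectors. Both definitions, and the function names `hard_similarity` and `soft_similarity`, are illustrative assumptions rather than the authors' stated formulation.

```python
import math


def hard_similarity(labels_a, labels_b):
    """Binary similarity (assumed definition): 1 if any label is shared, else 0."""
    return 1 if any(a and b for a, b in zip(labels_a, labels_b)) else 0


def soft_similarity(labels_a, labels_b):
    """Graded similarity (assumed definition): cosine of the two label vectors."""
    dot = sum(a * b for a, b in zip(labels_a, labels_b))
    norm_a = math.sqrt(sum(a * a for a in labels_a))
    norm_b = math.sqrt(sum(b * b for b in labels_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sample without labels is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical multi-hot label vectors for an image and a text sample.
image_labels = [1, 1, 0, 1]
text_labels = [1, 0, 0, 0]

print(hard_similarity(image_labels, text_labels))            # 1
print(round(soft_similarity(image_labels, text_labels), 4))  # 0.5774
```

Under these assumptions the hard measure only records that the pair is related, while the soft measure also reflects that just one of the image's three labels is shared.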
Figure 1: The architecture of our proposed HSIDHN, which consists of two parts.
One part is the backbone network used to extract hash representations. The other is the Bidirectional Bi-linear Interaction (BBI) module, used to capture the hierarchical semantic correlation of each modality's data across different levels.
Figure 2: The generation of multi-scale and multi-level hash representations.
Details of the dataset division.
| Dataset name | Total number | Training set / test set |
|---|---|---|
| MIRFlickr-25K | 20,015 | 10,000/2,000 |
| NUS-WIDE | 190,421 | 10,500/2,100 |
Mean Average Precision (MAP) comparison results for MIRFlickr-25K.
| Method | Image-query-text, 16 bits | Image-query-text, 32 bits | Image-query-text, 64 bits | Text-query-image, 16 bits | Text-query-image, 32 bits | Text-query-image, 64 bits |
|---|---|---|---|---|---|---|
| SCM | 0.6354 | 0.5618 | 0.5634 | 0.6340 | 0.6458 | 0.6541 |
| SePH | 0.6740 | 0.6813 | 0.6830 | 0.7139 | 0.7258 | 0.7294 |
| DCMH | 0.7316 | 0.7343 | 0.7446 | 0.7607 | 0.7737 | 0.7805 |
| CHN | 0.7504 | 0.7495 | 0.7461 | 0.7776 | 0.7775 | 0.7798 |
| PRDH | 0.6952 | 0.7072 | 0.7108 | 0.7626 | 0.7718 | 0.7755 |
| SSAH | 0.7745 | 0.7882 | 0.7990 | 0.7860 | 0.7974 | 0.7910 |
| CMHH | 0.7334 | 0.7281 | 0.7444 | 0.7320 | 0.7183 | 0.7279 |
| HSIDHN | 0.7978 | 0.8097 | 0.8179 | 0.7802 | 0.7946 | 0.8115 |
Mean Average Precision (MAP) comparison results for NUS-WIDE.
| Method | Image-query-text, 16 bits | Image-query-text, 32 bits | Image-query-text, 64 bits | Text-query-image, 16 bits | Text-query-image, 32 bits | Text-query-image, 64 bits |
|---|---|---|---|---|---|---|
| SCM | 0.3121 | 0.3111 | 0.3121 | 0.4261 | 0.4372 | 0.4478 |
| SePH | 0.4797 | 0.4859 | 0.4906 | 0.6072 | 0.6280 | 0.6291 |
| DCMH | 0.5445 | 0.5597 | 0.5803 | 0.5793 | 0.5922 | 0.6014 |
| CHN | 0.5754 | 0.5966 | 0.6015 | 0.5816 | 0.5967 | 0.5992 |
| PRDH | 0.5919 | 0.6059 | 0.6116 | 0.6155 | 0.6286 | 0.6349 |
| SSAH | 0.6163 | 0.6278 | 0.6140 | 0.6204 | 0.6251 | 0.6215 |
| CMHH | 0.5530 | 0.5698 | 0.5924 | 0.5739 | 0.5786 | 0.5889 |
| HSIDHN | 0.6498 | 0.6787 | 0.6834 | 0.6396 | 0.6529 | 0.6792 |
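For readers unfamiliar with the metric reported in the MAP tables, the following is a minimal sketch of how MAP is commonly computed for hashing-based retrieval. It assumes database items are ranked by Hamming distance to the query code, and that an item counts as relevant when it shares at least one label with the query. The evaluation protocol actually used is not stated in this record, and the function names and toy data are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length hash codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def average_precision(query_code, query_labels, db_codes, db_labels):
    """AP for one query over a database ranked by Hamming distance."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming_distance(query_code, db_codes[i]))
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        # Assumed relevance rule: at least one label shared with the query.
        if any(q and d for q, d in zip(query_labels, db_labels[idx])):
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes, query_labels, db_codes, db_labels):
    """Mean of the per-query average precision values."""
    aps = [average_precision(code, labels, db_codes, db_labels)
           for code, labels in zip(query_codes, query_labels)]
    return sum(aps) / len(aps)


# Toy example: one image query against three text items, 4-bit codes.
query_codes = [[1, 1, -1, -1]]
query_labels = [[1, 0]]
db_codes = [[1, 1, -1, -1], [1, -1, -1, -1], [-1, -1, 1, 1]]
db_labels = [[1, 0], [0, 1], [1, 1]]

print(round(mean_average_precision(query_codes, query_labels,
                                   db_codes, db_labels), 4))  # 0.8333
```

In the toy example the relevant items sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.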
Figure 3: Performance on MIRFlickr-25K evaluated by precision-recall (PR) curves.
Figure 4: Performance on NUS-WIDE evaluated by precision-recall (PR) curves.
Ablation study results.
| Method | MIRFlickr-25K, image-query-text | MIRFlickr-25K, text-query-image | NUS-WIDE, image-query-text | NUS-WIDE, text-query-image |
|---|---|---|---|---|
| HSIDHN-SIM | 0.8140 | 0.8097 | 0.6432 | 0.6401 |
| HSIDHN-BBI | 0.8034 | 0.8004 | 0.6316 | 0.6275 |
| HSIDHN | 0.8179 | 0.8115 | 0.6834 | 0.6792 |