Quadruplet-Based Deep Cross-Modal Hashing
Huan Liu, Jiang Xiong, Nian Zhang, Fuming Liu, Xitao Zou.
Abstract
Recently, benefitting from the storage and retrieval efficiency of hashing and the powerful discriminative feature-extraction capability of deep neural networks, deep cross-modal hashing retrieval has drawn increasing attention. To preserve the semantic similarities of cross-modal instances during the hash-mapping procedure, most existing deep cross-modal hashing methods learn deep hashing networks with a pairwise loss or a triplet loss. However, these methods may not fully explore the similarity relations across modalities. To solve this problem, in this paper we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (termed QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the effectiveness of the quadruplet loss in cross-modal hashing.
Year: 2021 PMID: 34306059 PMCID: PMC8270718 DOI: 10.1155/2021/9968716
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. (a) Triplet loss-based cross-modal hashing methods generalize weakly from the training set to the test set: test instances of the highlighted category cannot be mapped into compact binary codes (see the lower-right corner). (b) Quadruplet loss-based cross-modal hashing methods can project the test instances of that category into a compact binary space (see the lower-right corner).
Figure 2. Flowchart of the proposed quadruplet-based deep cross-modal hashing (QDCMH) method. QDCMH encompasses three components: (1) a quadruplet-based cross-modal semantic-preserving module; (2) a classical convolutional neural network that learns image-modality features, with the TxtNet of SSAH [15] adopted to learn text-modality features; and (3) an intermodal quadruplet loss that efficiently captures the relevant semantic information during feature learning, together with a quantization loss that reduces information loss during hash-code generation. (a) Quadruplet (V, T1, T2, T3), which utilizes an image instance V to retrieve three text instances T1, T2, and T3. V and T1 share at least one label, while each of the pairs (V, T2), (V, T3), and (T2, T3) shares no label. (b) Quadruplet (T, V1, V2, V3), which utilizes a text instance T to retrieve three image instances V1, V2, and V3. T and V1 share at least one label, while each of the pairs (T, V2), (T, V3), and (V2, V3) shares no label.
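The record reproduces no loss formulas, so the following PyTorch sketch shows one common margin-based formulation of a quadruplet loss adapted to the intermodal quadruplets (V, T1, T2, T3) described above, together with a typical quantization penalty; the margins `alpha1`/`alpha2` and both function bodies are assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def intermodal_quadruplet_loss(f_v, f_t1, f_t2, f_t3, alpha1=0.5, alpha2=0.25):
    """One common quadruplet-loss form for quadruplets (V, T1, T2, T3):
    V and T1 share a label; (V, T2), (V, T3), and (T2, T3) share none.
    f_* are (batch, code_length) real-valued hash-layer outputs;
    alpha1/alpha2 are assumed margins, not values from the paper."""
    d_pos = F.pairwise_distance(f_v, f_t1)   # similar pair: pull together
    d_neg1 = F.pairwise_distance(f_v, f_t2)  # dissimilar pairs: push apart
    d_neg2 = F.pairwise_distance(f_v, f_t3)
    d_nn = F.pairwise_distance(f_t2, f_t3)
    # Triplet-style terms for both anchor-negative pairs, plus a term
    # that also separates the two negatives from each other.
    return (F.relu(d_pos - d_neg1 + alpha1)
            + F.relu(d_pos - d_neg2 + alpha1)
            + F.relu(d_pos - d_nn + alpha2)).mean()

def quantization_loss(features):
    """Penalize the gap between continuous features and their signs, so
    little information is lost when binarizing (one standard choice)."""
    return (features - torch.sign(features)).pow(2).mean()
```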
Algorithm 1. QDCMH: quadruplet-based deep cross-modal hashing.
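The body of Algorithm 1 is not reproduced in this record. Below is a minimal training-loop sketch consistent with Figure 2, reusing the loss sketches above; `img_net`, `txt_net`, the quadruplet loader, and the loss weights `beta`/`gamma` (possibly the β and γ analyzed in Figure 3, though the record does not say) are all assumptions.

```python
import torch

def train_qdcmh(img_net, txt_net, quadruplet_loader, epochs=100,
                beta=1.0, gamma=0.1, lr=1e-4):
    """Hypothetical QDCMH training loop: CNN image features + TxtNet text
    features, an intermodal quadruplet loss, and a quantization loss."""
    params = list(img_net.parameters()) + list(txt_net.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(epochs):
        for v, t1, t2, t3 in quadruplet_loader:   # one image + three texts
            f_v = img_net(v)
            f_t1, f_t2, f_t3 = txt_net(t1), txt_net(t2), txt_net(t3)
            loss = (beta * intermodal_quadruplet_loss(f_v, f_t1, f_t2, f_t3)
                    + gamma * sum(quantization_loss(f)
                                  for f in (f_v, f_t1, f_t2, f_t3)))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return img_net, txt_net  # binary codes are taken as torch.sign(net(x))
```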
Brief description of the experimental datasets.
| Dataset | Instances used | Training set | Query set | Retrieval set | Tag dimension | Label classes |
|---|---|---|---|---|---|---|
| MIRFLICKR-25K | 20,015 | 10,000 | 2,000 | 18,015 | 1,386 | 24 |
| MS-COCO2014 | 122,218 | 10,000 | 5,000 | 117,218 | 2,026 | 80 |
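Both datasets are multi-label (24 and 80 label classes), and Figure 2 defines two instances as similar when they share at least one label. A quadruplet sampler consistent with that rule might look like the sketch below; all names are illustrative, not from the paper.

```python
import numpy as np

def sample_quadruplet(img_labels, txt_labels, rng=None):
    """Draw indices (v, t1, t2, t3) such that image v and text t1 share
    at least one label, while (v, t2), (v, t3), and (t2, t3) share none.
    img_labels/txt_labels: 0/1 matrices of shape (n_instances, n_classes),
    e.g. 24 classes for MIRFLICKR-25K or 80 for MS-COCO2014."""
    rng = rng or np.random.default_rng()
    while True:
        v = rng.integers(len(img_labels))
        similar = txt_labels @ img_labels[v] > 0     # shares a label with v
        pos, neg = np.flatnonzero(similar), np.flatnonzero(~similar)
        if len(pos) == 0 or len(neg) < 2:
            continue                                  # resample the anchor
        t1 = rng.choice(pos)
        t2, t3 = rng.choice(neg, size=2, replace=False)
        if txt_labels[t2] @ txt_labels[t3] == 0:      # negatives dissimilar
            return v, t1, t2, t3
```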
Figure 3. Sensitivity analysis of the hyperparameters. (a) Hyperparameter β on the MIRFLICKR-25K dataset. (b) Hyperparameter γ on the MIRFLICKR-25K dataset.
Comparison with baselines in terms of MAP on the MIRFLICKR-25K and MS-COCO2014 datasets. The best accuracy is shown in boldface.
| Task | Category | Method | MIRFLICKR-25K, 16 bits | 32 bits | 64 bits | MS-COCO, 16 bits | 32 bits | 64 bits |
|---|---|---|---|---|---|---|---|---|
| I⟶T | Handcrafted | CMSSH [ ] | 0.5600 | 0.5709 | 0.5836 | 0.5439 | 0.5450 | 0.5410 |
| | | SePH [ ] | 0.6740 | 0.6813 | 0.6803 | 0.4295 | 0.4353 | 0.4726 |
| | | SCM [ ] | 0.6354 | 0.6407 | 0.6556 | 0.4252 | 0.4344 | 0.4574 |
| | | GSPH [ ] | 0.6068 | 0.6191 | 0.6230 | 0.4427 | 0.4733 | 0.4840 |
| | Deep | DCMH [ ] | 0.7316 | 0.7343 | 0.7446 | 0.5228 | 0.5438 | 0.5419 |
| | | PRDH [ ] | 0.6952 | 0.7072 | 0.7108 | 0.5238 | n/a | n/a |
| | | SSAH [15] | n/a | n/a | n/a | 0.5127 | 0.5256 | 0.5067 |
| | | TDH [ ] | 0.7423 | 0.7478 | 0.7512 | 0.5164 | 0.5222 | 0.5276 |
| | | DSePH [ ] | 0.7128 | 0.7285 | 0.7422 | 0.4621 | 0.4958 | 0.5112 |
| | | QDCMH | 0.7635 | 0.7688 | 0.7713 | 0.5313 | 0.5371 | n/a |
| T⟶I | Handcrafted | CMSSH [ ] | 0.5726 | 0.5776 | 0.5753 | 0.3793 | 0.3876 | 0.3899 |
| | | SePH [ ] | 0.7139 | 0.7258 | 0.7294 | 0.4348 | 0.4606 | 0.5195 |
| | | SCM [ ] | 0.6340 | 0.6458 | 0.6541 | 0.4118 | 0.4183 | 0.4345 |
| | | GSPH [ ] | 0.6282 | 0.6458 | 0.6503 | 0.5435 | 0.6039 | 0.6461 |
| | Deep | DCMH [ ] | 0.7607 | 0.7737 | 0.7805 | 0.4883 | 0.4942 | 0.5145 |
| | | PRDH [ ] | 0.7626 | 0.7718 | 0.7755 | 0.5122 | 0.5190 | 0.5404 |
| | | SSAH [15] | n/a | n/a | n/a | 0.4832 | 0.4831 | 0.4922 |
| | | TDH [ ] | 0.7516 | 0.7577 | 0.7634 | 0.5198 | 0.5332 | 0.5399 |
| | | DSePH [ ] | 0.7422 | 0.7578 | 0.7760 | 0.4616 | 0.4882 | 0.5305 |
| | | QDCMH | 0.7762 | 0.7725 | 0.7859 | n/a | n/a | n/a |

(n/a marks values not preserved in this record; bracketed reference numbers other than SSAH [15] are likewise missing.)
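For reference, MAP in cross-modal hashing papers is typically computed over a Hamming ranking of the retrieval set, counting two instances as relevant when they share at least one label. A minimal sketch of that standard protocol (not code from the paper):

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """MAP over Hamming ranking.  Codes are +/-1 matrices of shape
    (n, code_length); labels are 0/1 matrices of shape (n, n_classes)."""
    average_precisions = []
    for q, ql in zip(query_codes, query_labels):
        # For +/-1 codes, Hamming distance = (code_length - <a, b>) / 2.
        hamming = 0.5 * (db_codes.shape[1] - db_codes @ q)
        order = np.argsort(hamming)
        relevant = (db_labels[order] @ ql) > 0
        if not relevant.any():
            continue
        ranks = np.flatnonzero(relevant) + 1        # 1-based hit positions
        precisions = np.arange(1, len(ranks) + 1) / ranks
        average_precisions.append(precisions.mean())
    return float(np.mean(average_precisions))
```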
Figure 4. Precision-recall curves on the MIRFLICKR-25K and MS-COCO2014 datasets.
Figure 5. Top-N precision curves on the MIRFLICKR-25K and MS-COCO2014 datasets.
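The curves in Figures 4 and 5 follow from the same Hamming ranking; for example, top-N precision is the average fraction of relevant items among the N nearest codes (again a sketch of the standard protocol, not the paper's code):

```python
import numpy as np

def top_n_precision(query_codes, db_codes, query_labels, db_labels,
                    ns=(100, 300, 500, 1000)):
    """Average precision within the top-N Hamming neighbors, for each N."""
    curve = np.zeros(len(ns))
    for q, ql in zip(query_codes, query_labels):
        hamming = 0.5 * (db_codes.shape[1] - db_codes @ q)
        order = np.argsort(hamming)
        relevant = (db_labels[order] @ ql) > 0
        curve += [relevant[:n].mean() for n in ns]
    return curve / len(query_codes)
```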