| Literature DB >> 33287034 |
Shuli Cheng1, Liejun Wang1,2, Anyu Du1.
Abstract
Deep hashing is the mainstream algorithm for large-scale cross-modal retrieval due to its high retrieval speed and low storage capacity, but the problem of reconstruction of modal semantic information is still very challenging. In order to further solve the problem of unsupervised cross-modal retrieval semantic reconstruction, we propose a novel deep semantic-preserving reconstruction hashing (DSPRH). The algorithm combines spatial and channel semantic information, and mines modal semantic information based on adaptive self-encoding and joint semantic reconstruction loss. The main contributions are as follows: (1) We introduce a new spatial pooling network module based on tensor regular-polymorphic decomposition theory to generate rank-1 tensor to capture high-order context semantics, which can assist the backbone network to capture important contextual modal semantic information. (2) Based on optimization perspective, we use global covariance pooling to capture channel semantic information and accelerate network convergence. In feature reconstruction layer, we use two bottlenecks auto-encoding to achieve visual-text modal interaction. (3) In metric learning, we design a new loss function to optimize model parameters, which can preserve the correlation between image modalities and text modalities. The DSPRH algorithm is tested on MIRFlickr-25K and NUS-WIDE. The experimental results show that DSPRH has achieved better performance on retrieval tasks.Entities:
Keywords: cross-modal retrieval; global covariance pooling; semantic-preserving; spatial pooling
Year: 2020 PMID: 33287034 DOI: 10.3390/e22111266
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524