| Literature DB >> 26259218 |
Ran He, Man Zhang, Liang Wang, Ye Ji, Qiyue Yin.
Abstract
In multimedia applications, the text and image components in a web document form a pairwise constraint that potentially indicates the same semantic concept. This paper studies cross-modal learning via the pairwise constraint and aims to find the common structure hidden in different modalities. We first propose a compound regularization framework to address the pairwise constraint, which can be used as a general platform for developing cross-modal algorithms. For unsupervised learning, we propose a multi-modal subspace clustering method to learn a common structure for different modalities. For supervised learning, to reduce the semantic gap and the outliers in pairwise constraints, we propose a cross-modal matching method based on compound ℓ21 regularization. Extensive experiments demonstrate the benefits of joint text and image modeling with semantically induced pairwise constraints, and they show that the proposed cross-modal methods can further reduce the semantic gap between different modalities and improve the clustering/matching accuracy.Year: 2015 PMID: 26259218 DOI: 10.1109/TIP.2015.2466106
Source DB: PubMed Journal: IEEE Trans Image Process ISSN: 1057-7149 Impact factor: 10.856