Yan Song, Ian Vince McLoughlin, Li-Rong Dai.
Abstract
This paper focuses on how to measure visual similarity effectively and efficiently for local-feature-based representations. Among existing methods, metrics based on Bag of Visual Words (BoV) techniques are efficient and conceptually simple, at the expense of effectiveness. By contrast, kernel based metrics are more effective, but at the cost of greater computational complexity and increased storage requirements. We show that a unified visual matching framework can be developed to encompass both BoV and kernel based metrics, in which a local kernel plays an important role between feature pairs or between features and their reconstruction. Generally, local kernels are defined using Euclidean distance or its derivatives, based either explicitly or implicitly on an assumption of Gaussian noise. However, local features such as SIFT and HoG often follow a heavy-tailed distribution, which tends to undermine the motivation behind Euclidean metrics. Motivated by recent advances in feature coding techniques, a novel and efficient local coding based matching kernel (LCMK) method is proposed. This exploits the manifold structures in Hilbert space derived from local kernels. The proposed method combines the advantages of both BoV and kernel based metrics, and achieves linear computational complexity. This enables efficient and scalable visual matching to be performed on large scale image sets. To evaluate the effectiveness of the proposed LCMK method, we conduct extensive experiments on widely used benchmark datasets, including the 15-Scenes, Caltech101/256, and PASCAL VOC 2007 and 2011 datasets. Experimental results confirm the effectiveness of the relatively efficient LCMK method.
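The unified view mentioned in the abstract can be illustrated with a minimal sketch. It rests on a standard identity: the inner product of two unnormalised BoV count histograms equals a sum, over all feature pairs, of a local kernel that is 1 when both features fall in the same codeword cell and 0 otherwise. The function names, the random data, and the hard-assignment indicator kernel are assumptions chosen for illustration; this is not the paper's LCMK formulation.

```python
import numpy as np

def hard_assign(features, codebook):
    """Index of the nearest codeword (squared Euclidean) for each feature."""
    # Pairwise squared distances, shape (n_features, n_codewords).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bov_histogram(features, codebook):
    """Unnormalised Bag-of-Visual-Words histogram of codeword counts."""
    idx = hard_assign(features, codebook)
    return np.bincount(idx, minlength=len(codebook)).astype(float)

def sum_match_kernel(X, Y, codebook):
    """Sum over all feature pairs of an indicator local kernel:
    1 if both features share a codeword, 0 otherwise. Cost is O(n * m)."""
    ix = hard_assign(X, codebook)
    iy = hard_assign(Y, codebook)
    return float((ix[:, None] == iy[None, :]).sum())

# Hypothetical toy data: two "images" with 50 and 70 local descriptors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
X = rng.normal(size=(50, 16))
Y = rng.normal(size=(70, 16))

# Histogram inner product: linear in the number of features per image.
k_bov = float(bov_histogram(X, codebook) @ bov_histogram(Y, codebook))
# Explicit pairwise sum: quadratic in the number of features.
k_pairs = sum_match_kernel(X, Y, codebook)
```

Both routes give the same similarity value, but the histogram route never enumerates feature pairs, which is the efficiency argument for BoV. Replacing the indicator with a smoother local kernel is what the kernel based metrics in the abstract do, at higher cost.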
Year: 2014 PMID: 25119982 PMCID: PMC4132086 DOI: 10.1371/journal.pone.0103575
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Some examples from the Caltech101/256 and PASCAL VOC 2007/2011 datasets.
a) Caltech101. b) Caltech256. c) PASCAL VOC 2007/2011.
Figure 2. Mean accuracy of each category on the 15-Scenes dataset.
Figure 3. Image representation using local coding based spectral embedding.
Image classification results on the 15-Scenes dataset: mAP (%) with standard deviation in parentheses.
| Method | Result |
| BoV | 81.40(0.39) |
| Sparse Coding | 80.3(0.5) |
| Macrofeature | 85.6(0.2) |
| LLC | 81.6(-) |
| LLC+ | 84.21 |
| LCMK | |
Image classification results on the Caltech101 dataset: mAP (%) with standard deviation in parentheses.
| Method | Result |
| BoV | 76.95(0.39) |
| NBNN | 73.0(-) |
| Sparse Coding | 73.2(0.5) |
| LLC | 73.4(-) |
| O2P | 79.3(0.5) |
| Fisher Vector | 77.8(0.6) |
| HMP | 76.8(-) |
| LCMK | |
Image classification results on the Caltech256 dataset: mAP (%) with standard deviation in parentheses.
| Method / training images | 15 | 30 | 45 | 60 |
| BoV | 28.30 | 34.10 | - | - |
| NBNN | - | 42.7(-) | - | - |
| Sparse Coding | 27.73 | 34.02 | 37.46 | 40.14 |
| LScSPM | 30.0 | 35.74 | 38.54 | 40.43 |
| Super Vector | 36.72 | 43.77 | 47.24 | 50.98 |
| LLC | 34.36 | 41.19 | 45.31 | 47.68 |
| LLC+ | 35.2(-) | 42.8(-) | 47.5(-) | 51.2(-) |
| O2P | - | 42.6(0.4) | - | - |
| Fisher Vector | 38.5(0.2) | 47.4(0.1) | 52.1(0.4) | 54.8(0.4) |
| HMP | 40.5(0.4) | 48.0(0.2) | 51.9(0.2) | 55.2(0.3) |
| LCMK | | | | |
Image classification results on the PASCAL VOC 2007 dataset in terms of mAP (%).
| Method / object class | aero | bicyc | bird | boat | bott | bus | car | cat | chair | cow |
| Winner of VOC07 | 77.5 | 63.6 | 56.1 | 71.9 | 33.1 | 60.6 | 78.0 | 58.8 | 53.5 | 42.6 |
| LLC+ | 78.0 | 70.9 | 55.3 | 72.1 | 31.5 | 69.2 | 80.9 | 62.8 | 55.3 | 51.4 |
| LLC | 74.1 | 64.9 | 51.5 | 68.3 | 27.2 | 62.9 | 78.4 | 61.4 | 54.4 | 47.2 |
| Fisher Vector | 79.0 | 67.4 | 51.9 | 70.9 | 30.8 | 72.2 | 79.9 | 61.4 | 56.0 | 49.6 |
| Super Vector | 79.4 | 72.5 | 55.6 | 73.8 | 34.0 | 72.4 | 83.4 | 63.6 | 56.6 | 52.8 |
| LCMK | 76.9 | 66.1 | 55.6 | 70.2 | 40.6 | 66.3 | 77.2 | 62.2 | 77.2 | 48.7 |
Image classification results on the PASCAL VOC 2011 dataset in terms of mAP (%).
| Codebook size / object class | aero | bicyc | bird | boat | bott | bus | car | cat | chair | cow |
| 4096 | 75.9 | 46.4 | 40.6 | 50.0 | 18.7 | 74.1 | 52.2 | 55.1 | 45.8 | 21.9 |
| 8192 | 77.8 | 49.6 | 44.8 | 53.8 | 19.5 | 76.4 | 55.0 | 58.1 | 48.6 | 26.6 |
| 16384 | 78.6 | 51.3 | 48.1 | 55.8 | 22.6 | 77.2 | 56.6 | 61.4 | 51.8 | 26.7 |
| 24576 | 78.3 | 51.3 | 50.4 | 58.0 | 24.7 | 77.9 | 58.2 | 61.3 | 52.1 | 26.5 |
Table 1. Algorithm 1: Learning the embedding from the local coding based matching kernel (LCMK).
| Input: |
| Output: Embedding matrix |
| 1. Generate |
| 2. Obtain the adjacency matrix using local coding technique |
| 3. Calculate |
| 4. Calculate the embedding matrix |
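
The symbols and formulas of Algorithm 1 are not preserved in this record, so the following is only a generic sketch of the kind of pipeline the step names suggest: build local codes against a set of anchor points, form a low-rank adjacency from those codes, and take its leading eigenvectors as an embedding. Everything specific here is an assumption and not the authors' algorithm: random anchors stand in for a learned codebook, the coding is a k-nearest-anchor Gaussian soft assignment, and the adjacency is taken to be W = Z diag(1/colsum) Zᵀ.

```python
import numpy as np

def local_codes(features, anchors, k=3, sigma=1.0):
    """Soft-assign each feature to its k nearest anchors; rows sum to 1.
    (Assumed coding scheme, used here purely for illustration.)"""
    d2 = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    rows = np.arange(features.shape[0])[:, None]
    nearest = np.argsort(d2, axis=1)[:, :k]           # (n, k) anchor indices
    d_near = d2[rows, nearest]
    # Subtract the row minimum before exponentiating for numerical stability;
    # this does not change the normalised weights.
    w = np.exp(-(d_near - d_near.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    Z = np.zeros_like(d2)
    Z[rows, nearest] = w
    return Z

def embedding_matrix(Z, dim):
    """Return E of shape (K, dim) and the eigenvalues, such that the columns
    of Z @ E are orthonormal eigenvectors of W = Z diag(1/colsum) Z^T."""
    col_mass = Z.sum(axis=0) + 1e-12                  # avoid division by zero
    Zs = Z / np.sqrt(col_mass)                        # Z diag(col_mass)^(-1/2)
    M = Zs.T @ Zs                                     # small K x K matrix
    vals, vecs = np.linalg.eigh(M)                    # ascending eigenvalues
    order = np.argsort(vals)[::-1][:dim]              # keep the largest ones
    vals = np.maximum(vals[order], 1e-12)
    vecs = vecs[:, order]
    E = (vecs / np.sqrt(col_mass)[:, None]) / np.sqrt(vals)
    return E, vals

# Hypothetical toy data: 200 descriptors, 16 random anchors.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
anchors = rng.normal(size=(16, 16))

Z = local_codes(features, anchors, k=3, sigma=1.0)
E, vals = embedding_matrix(Z, dim=4)
emb = Z @ E                                           # (200, 4) embedded features
```

The n x n adjacency is never formed: only a K x K eigenproblem is solved, so the cost grows linearly with the number of features, in line with the linear complexity the abstract claims for the method.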